By Andrea De Mauro and Mahantesh Pattadkal

As we pick up from where we left off in Part 1 of the blog series “Job Trends in Data Analytics”, our journey through the world of data analytics job trends and the role of Natural Language Processing (NLP) continues.

In Part 1, we introduced the “Data Analytics Job Trends” application, which is all about gathering data and applying NLP to analyze it, powered by KNIME Analytics Platform. We discussed the web scraping phase used to collect live data regarding the data analytics job market, followed by the process of cleaning up the data using NLP techniques. We then introduced a topic model that revealed seven homogeneous skillsets within job postings. Such skillsets represent the competencies and activities employers across various industries seek in data analytics professionals.

In the second part of the blog series, we will describe the identified skillsets and make some data-backed considerations on the evolving landscape of professional careers in Data Science.


To label the skillsets, we use the most frequent terms and weights identified through the LDA algorithm that was previously applied to the job postings. We further analyze the job descriptions in each topic to highlight the key activities, essential skills, and industries where they are most commonly found. Understanding these topics can help job seekers align their skillsets with the market demands and increase their chances of securing a suitable position in the field of Data Analytics. In the following paragraphs, you will find a brief description of each skillset.
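The term-weight tables used for labeling can be read off any fitted topic model. The snippet below is a minimal sketch in plain Python rather than the KNIME workflow used in the study: the function name `top_terms` and the toy weights are assumptions made for illustration and are not values from the analysis.

```python
from typing import Dict, List, Tuple


def top_terms(term_weights: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-weighted terms of one topic, heaviest first."""
    ranked = sorted(term_weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical weights for a single topic (illustration only, not study data).
toy_topic = {
    "research": 120.0,
    "analysis": 95.0,
    "statistics": 60.0,
    "study": 55.0,
    "report": 40.0,
    "team": 10.0,
}

for term, weight in top_terms(toy_topic, k=5):
    print(f"{term}\t{weight}")
```

Reading the top-ranked terms together with a sample of the postings assigned to the topic is what turns a list like this into a human-readable label.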


Topic 0: Research and Data Analysis

The following table shows the top five terms and their weights for topic 0. The weights refer to the significance of the term in defining that particular topic. Considering these terms and the documents labeled as topic 0, we interpret this skillset to be “Research and Data Analysis”.



Table 0: Term-Weights for Topic 0


This skillset encompasses activities such as conducting research, analyzing data, and providing insights that drive decision-making in various sectors. As a cornerstone of data analytics, this skillset facilitates the extraction of valuable insights from data, trend identification, and informed decision-making.
From what we gathered within the corpus of job posts, the fundamental competency requirements connected with this skillset are:

  • Strong analytical and problem-solving abilities
  • Expertise in statistical software (R, Python) 
  • Experience with data visualization tools
  • Effective communication and documentation skills
  • A background in a relevant field (mathematics, statistics, or data science)


Topic 1: Administration and Customer Support


By looking at the terms and weights from Table 1 and at the documents associated with Topic 1, we decided to label it as “Administration and Customer Support”. This skillset entails managing customer interactions, providing administrative support, and coordinating logistics or procurement processes.



Table 1: Term-Weights for Topic 1


In our opinion, the fundamental competencies necessary to succeed in jobs requiring this skillset are:

  • Strong organizational and time management abilities
  • Attention to detail
  • Proficiency in office software and communication tools
  • Excellent interpersonal and problem-solving skills


Topic 2: Marketing and Product Management


Based on the terms shown in Table 2, we interpret this to be the “Marketing and Product Management” skillset.



Table 2: Term-Weights for Topic 2


This skillset revolves around developing marketing strategies, managing product lifecycles, and driving market growth. It’s vital in data analytics-focused jobs, as it allows professionals to use data-driven insights to make informed decisions regarding market trends, customer preferences, and product performance.

The essential competencies required within the Marketing and Product Management skillset are:

  • Strong analytical and strategic thinking abilities
  • Expertise in market research and competitive intelligence
  • Experience with marketing tools and platforms
  • Excellent communication and leadership skills
  • A background in business, marketing, or a related field


Topic 3: Business Management, Data Governance, and Compliance


Based on the terms shown in Table 3, we concluded that this topic refers to the “Business Management, Data Governance, and Compliance” skillset.

This skillset encompasses overseeing business operations, ensuring data quality and security, and managing risk and regulatory requirements. In data analytics-intensive jobs, this skillset enables maintaining data integrity, compliance monitoring, risk identification, and business process optimization using data-driven insights.



Table 3: Term-Weights for Topic 3


According to our findings, the required competencies within this skillset are:

  • Strong organizational and leadership abilities
  • Expertise in data management, data governance and risk assessment
  • Experience with regulatory frameworks and industry standards
  • Effective communication and problem-solving skills
  • A background in business, finance, or a related field


Topic 4: Business Intelligence and Data Visualization


Looking at the terms we found within Topic 4, we call it the “Business Intelligence and Data Visualization” skillset.

This skillset involves designing ever-present BI solutions such as dashboards and reports, creating insightful visualizations, and analyzing data for informed decision-making. It’s pivotal in jobs leveraging data analytics, transforming raw data into actionable insights that drive strategic decisions.

Term        Weight
Power BI    7359
SQL         5836

Table 4: Term-Weights for Topic 4


In our opinion, the fundamental competency requirements within BI and Data Visualization are:

  • Strong analytical and problem-solving abilities
  • Expertise in BI tools (like Power BI, Tableau, SQL)
  • Experience with data visualization techniques
  • Effective communication and storytelling skills


Topic 5: Data Warehouse and Cloud Infrastructure


Based on the terms shown in Table 5, we interpret this to be the “Data Warehouse and Cloud Infrastructure” skillset.

Job posts requiring a cloud and big data engineering skillset are typically connected with activities such as designing and implementing cloud-based solutions, managing large-scale data processing, and developing software applications. It’s vital in data analytics-focused jobs, enabling efficient processing and analysis of large data volumes for valuable insights.



Table 5: Term-Weights for Topic 5


In our opinion, the fundamental competency requirements related to this skillset are:

  • Strong programming and problem-solving abilities
  • Expertise in cloud platforms (like AWS, Azure, and Google Cloud)
  • Experience with big data technologies (like Hadoop, Spark, and NoSQL databases)
  • Knowledge of Information Security policies and related processes


Topic 6: Machine Learning


Based on the terms shown in Table 6, we interpret this to be the “Machine Learning” skillset, which revolves around designing AI models, researching cutting-edge ML techniques, and developing intelligent software solutions. In data analytics-intensive jobs, it forms the basis for AI model training and performance optimization.



Table 6: Term-Weights for Topic 6


According to our findings, the fundamental competencies required in machine learning today are:

  • Strong programming and mathematical abilities
  • Expertise in machine learning frameworks (like TensorFlow, PyTorch)
  • Experience with advanced AI techniques (like deep learning and natural language processing)
  • Effective communication and collaboration skills 


In this installment, we also analyze how the skillsets revealed through topic modeling are associated with three distinct professional profiles: Data Engineer, Data Analyst, and Data Scientist. To align these professional profiles with job postings, we used a rule-based classifier, which determines the profile of a job listing from keywords found in the job title. For instance, a job post titled “Data Architect” is categorized as a Data Engineer role, while a posting titled “Machine Learning Engineer” is attributed to the Data Scientist category.
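A rule-based title classifier of this kind can be sketched in a few lines of Python. Only the two mappings quoted above (“Data Architect” and “Machine Learning Engineer”) come from the article; the remaining keywords, the rule order, and the names `RULES` and `classify_title` are assumptions made for illustration, not the rules used in the study.

```python
from typing import List, Optional, Tuple

# Ordered (keyword, profile) rules; the first match wins.
# "data architect" and "machine learning" reflect the examples in the text,
# every other keyword is a hypothetical placeholder.
RULES: List[Tuple[str, str]] = [
    ("machine learning", "Data Scientist"),
    ("data scientist", "Data Scientist"),
    ("data architect", "Data Engineer"),
    ("data engineer", "Data Engineer"),
    ("analyst", "Data Analyst"),
]


def classify_title(title: str) -> Optional[str]:
    """Map a job title to a professional profile, or None if no rule matches."""
    lowered = title.lower()
    for keyword, profile in RULES:
        if keyword in lowered:
            return profile
    return None


print(classify_title("Data Architect"))
print(classify_title("Machine Learning Engineer"))
```

The rules are checked in order so that a more specific keyword such as “machine learning” can take precedence over a more generic one that might also appear in the same title.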

Latent Dirichlet Allocation (LDA) topic modeling gives us a set of topic weights for each job posting, one for each of the seven skillsets. By calculating the mean weight of each skillset across all job postings assigned to a professional profile, we arrive at the average skillset weight specific to each role. These weights are then normalized and represented as percentages.
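The averaging step can be illustrated with a short Python sketch. It assumes that the normalization rescales each profile's mean weights so that they sum to 100; the article does not spell out the exact scheme. The function name `profile_skill_percentages` and the toy postings (three skillsets instead of seven) are hypothetical.

```python
from typing import Dict, List, Tuple


def profile_skill_percentages(
    postings: List[Tuple[str, List[float]]]
) -> Dict[str, List[float]]:
    """Average topic weights per profile, then rescale each profile to sum to 100."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for profile, weights in postings:
        if profile not in sums:
            sums[profile] = [0.0] * len(weights)
            counts[profile] = 0
        sums[profile] = [s + w for s, w in zip(sums[profile], weights)]
        counts[profile] += 1

    percentages: Dict[str, List[float]] = {}
    for profile, totals in sums.items():
        means = [total / counts[profile] for total in totals]
        norm = sum(means)
        # Assumed normalization: share of each skillset in the profile's total.
        percentages[profile] = [100.0 * m / norm if norm else 0.0 for m in means]
    return percentages


# Toy postings: (profile, topic weights). Values are illustrative, not study data.
toy_postings = [
    ("Data Engineer", [0.5, 0.25, 0.25]),
    ("Data Engineer", [0.25, 0.25, 0.5]),
    ("Data Analyst", [1.0, 0.0, 0.0]),
]

result = profile_skill_percentages(toy_postings)
print(result)
```

Each resulting list holds one percentage per skillset, which is the kind of per-profile vector a radar plot can display along its axes.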

Figure 1 visualizes the interplay between professional designations and their corresponding skillsets. It summarizes what employers collectively expect as the fundamental proficiencies of Data Engineers, Data Analysts, and Data Scientists.

As anticipated, the role of Data Engineer prominently necessitates mastery in the “Data Warehouse & Cloud Infrastructure” skillset. Moreover, a supplementary grasp of Visualization and Machine Learning is imperative. This emphasis on skill diversity can be attributed to the anticipation that Data Engineers will be integral in supporting both Data Analysts and Data Scientists.

Conversely, the paramount expertise projected for Data Scientists lies in “Machine Learning,” closely followed by a proficiency in “Research” methodologies. Notably, a hybrid skillset encompassing “Business Management” and “Product Management” also ranks high in significance. This encapsulates the intricate array of competencies sought by the job market for aspiring Data Scientists.

Turning our attention to the Data Analyst domain, a pivotal requirement emerges for proficiency in “BI and Visualization.” Given their role in generating business reports, driving dashboards, and monitoring business vitality, this comes as no surprise. The parallel demand for “Business Management” as a secondary key skill mirrors the strategic acumen expected from this role. Moreover, akin to the Data Scientist role, there’s a parallel requirement for “Product Management” and “Research” proficiencies within the Data Analyst spectrum.

In summation, this exploration underscores the nuanced landscape of skillset prerequisites across various Data Analytics roles. It portrays employers’ multifaceted expectations for candidates aspiring to excel in the capacities of Data Engineers, Data Analysts, and Data Scientists.


Figure 1: The Radar Plot displays the association between professional profiles plotted against the skillsets shown in dimensions.


Our analysis of job postings in the expanding field of Data Analytics aims to categorize jobs based on distinct skillsets and clarify the diverse range of abilities required in each category. With exponential growth in this domain and the critical nature of decisions made based on data, the process of collecting, storing, and analyzing data has seen remarkable advances, leading to an insatiable demand for professionals skilled in data analytics.

Through the classification of job postings into seven notable skillset topics, we shed light on the necessity for both specialized and multifaceted skills in this rapidly changing field. The topics ranged from data analysis and business intelligence to machine learning and artificial intelligence, underscoring the surging demand for individuals adept at harnessing data, technology, and cross-functional teamwork.

Nevertheless, this study has several limitations. The dynamic nature of the job market and the emergence of novel technologies and methodologies call for continuous updating of our analysis, rather than the static “snapshot” view we took here. Furthermore, our approach may not have captured every nuance of the diverse job roles and skills in the Data Analytics arena, given its reliance on the job postings available at the time of research.

All our work is freely available in the “Job Competency Application” public space on KNIME Community Hub. You can download the workflows, try them out for yourself, and extend or improve them.


Looking ahead, we see the potential for considerable expansion of this study. This includes the development of KNIME components to implement the ‘Stop Phrases removal’ method described in Part 1, and a human-in-the-loop interactive visualization framework in KNIME. Such a framework would simplify the human judgment involved in selecting the most coherent topic model for a given corpus, making our work easier to scale. We also envision the application of LLM-aided mechanisms to support and simplify the topic modeling phase: this scenario certainly leaves room for further experimentation and research.

Professionals in the Data Analytics field must remain informed and adaptable in the face of emerging technologies. This ensures that their skillsets stay relevant and valuable in the ever-changing landscape of data-driven decision-making. By recognizing and cultivating the skills related to the identified topics, job seekers can gain a competitive edge in this vibrant market. To protect their relevance in the field, Data Analytics professionals must remain curious and keep learning throughout their careers.

Mahantesh Pattadkal brings more than 6 years of experience in consulting on data science projects and products. With a Master’s Degree in Data Science, his expertise shines in Deep Learning, Natural Language Processing, and Explainable Machine Learning. Additionally, he actively engages with the KNIME Community for collaboration on data science-based projects.

Andrea De Mauro has over 15 years of experience building business analytics and data science teams at multinational companies such as P&G and Vodafone. Apart from his corporate role, he enjoys teaching Marketing Analytics and Applied Machine Learning at several universities in Italy and Switzerland. Through his research and writing, he has explored the business and societal impact of Data and AI, convinced that a broader analytics literacy will make the world better. His latest book is ‘Data Analytics Made Easy’, published by Packt. He appeared in CDO magazine’s 2022 global ‘Forty Under 40’ list.
