Data science as a profession is still extremely new. Data and statistical modelling were used to make predictions long before data scientists existed, but this role has become more established as the world shifts towards a data-driven approach. As a result, the specification for a perfect data scientist still lacks clarity.
To be considered for a data science role, an applicant needs to meet a range of prerequisites. A degree with high mathematical content is essential, with some employers preferring it to be from a Russell Group university. A postgraduate qualification might also be requested; in the current data scientist pool, 88% have master's degrees and 34% have completed a PhD. Next on the shopping list are relevant coding skills: R or Python is likely to be specified, with languages such as Java, C++ or SQL considered useful. In the era of “Big Data”, experience using a platform or analytical software for large datasets is another nice-to-have, alongside specialist knowledge in areas such as natural language processing or neural networks.
Clearly, the barrier to entry is already high. A misunderstanding of the data scientist role can cause employers to ask for excessive experience and skills. In some cases, a data engineer or data architect might be better suited to the job, which only adds to the blurred expectations of a data scientist.
The number of data scientist positions advertised on Indeed increased by 78% over the course of 2019. This is a significant increase, but the rate of growth has been falling in recent years. This may be attributed to a rise in the number of other roles, with the fastest-growing job listed as “Machine Learning Engineer”.
In theory, a machine learning engineer combines aspects of data science, data engineering, and software development. More formally, a data scientist does the statistical analysis required to determine which machine learning approach to use, then models the algorithm and prototypes it for testing. At that point, a machine learning engineer takes the prototyped model and makes it work in a production environment at scale.
However, in practice, rather than narrowing the specification for a data scientist, organisations bucket all of these responsibilities within the data scientist’s domain. This increases the pressure on data scientists to have an unrealistically broad set of skills and limits their effectiveness for the organisation.
In essence, the role of a data scientist is becoming diluted as more specialist responsibilities begin to be defined in this area. Often these focus on data infrastructure or management, which, as we’ve discussed previously, are vitally important. It has been estimated that only 5% of machine learning projects make it off the ground, which means a lot of time and money is spent on the projects that do not. Ensuring that solutions can be appropriately operationalised should dramatically improve this rate.
Employers should work to understand what capabilities they require within their organisation to effectively execute machine learning projects and assign clearly defined roles and responsibilities. The demand for dedicated machine learning engineers is surely going to rise as they become an important part of the data science puzzle.
As for data scientists, their role should be keenly focused on creating models, not on data management or programming, in order to really maximise machine learning results.