From a skills perspective, this has been made possible by a rapid increase in academically trained scientists moving into the private and commercial sectors. This shift has been driven in equal parts by:
- A lack of funding in science and academia
- An increase in graduates trained to PhD and MSc level
- Investment in data by the private sector
- Easier access to large real-world datasets in industry than in academia
However, the demand for suitably skilled data scientists still outstrips supply.
WHAT IS DATA SCIENCE?
It is perhaps unhelpful to view ‘Data Science’ as a distinct discipline. It is more useful to think of it as an interdisciplinary toolkit applied across a wide range of sectors and industries. The answer to “What is Data Science?” may therefore lie not in the headline label ‘Data Science’, but in a general characterisation of what it means to “do data science”. We should ask ourselves: “Are there common themes in the problems solved, the methods used, the technologies leveraged, and the philosophy applied?”
Here is the list we came up with at T-DAB:
- Solving problems through the use of data, advanced mathematical techniques, and modern computing technology at scale
- Question- and hypothesis-driven, but less reductionist in philosophy than traditional research: models are informed by data, and by humans as little as possible
- Motivated by the return of value and the extraction of actionable insight
- Defined by a focus on using algorithms to uncover patterns within data and make predictions
- A focus on algorithms that can update themselves as new data arrive (machine learning)
- Use of larger-than-normal (big) datasets and high-dimensional data
- The integration and use of multiple data sources and types
THE DATA SCIENCE SAUCE
Data science has some defining quirks that set it apart from more traditional analytics:
- Data science is often more concerned with predictive power than with statistical significance per se.
- Establishing causation is not always imperative, provided predictive accuracy is high and there is no need to act on the predictive features themselves.
- The focus on prediction has led to a tendency to use complex, difficult-to-interpret algorithms such as neural networks.
- Finally, data science has extended into areas outside the domain of traditional data analysis and business intelligence: image recognition, computer vision, and the wider use of unstructured data.
One might also be tempted to define data science by the traits of those who practise it. Search Google for ‘what is a data scientist’ and you will get the same repeated list of skills that a data scientist ‘must have’: