Too many words without an agreed meaning make for confusion and low-quality discussions. I am personally moving (back) into the field of “Data Science” (DS), and a lot of initial discussions revolve around what the term actually means. In this post I attempt to give what I think are sensible and correct descriptions.
Machine Learning (ML)
This is the simplest one. ML is a technique in computer science for creating computer programs that can perform actions without a programmer explicitly programming their logic (their ifs and elses).
Such a system “learns” from data, rather than having its logic explicitly programmed. This means that if you “ask” the system the same question twice, you might get a different answer if the system has been exposed to new data in the meantime (that is, if it has been learning).
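To make the “same question, different answer” point concrete, here is a minimal sketch in Python. The class `NearestMeanClassifier`, the temperature readings and the labels are all made up for illustration; it is a toy, not how any particular ML library works. The program contains no hand-written rule about what counts as “cold” or “warm”; it only compares a value with the averages of the examples it has seen.

```python
from collections import defaultdict


class NearestMeanClassifier:
    """Toy learner (hypothetical): labels a number by the closest class average."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def learn(self, value, label):
        # "Learning" here just means updating the running average per label.
        self._sums[label] += value
        self._counts[label] += 1

    def predict(self, value):
        if not self._counts:
            return None  # nothing learned yet, so no answer
        means = {
            label: self._sums[label] / self._counts[label]
            for label in self._counts
        }
        # Pick the label whose average is closest to the value asked about.
        return min(means, key=lambda label: abs(means[label] - value))


model = NearestMeanClassifier()
model.learn(5, "cold")
model.learn(25, "warm")
first_answer = model.predict(14)   # asked before seeing more data

model.learn(10, "warm")            # new (made-up) examples arrive
model.learn(13, "warm")
second_answer = model.predict(14)  # the same question, asked again

print(first_answer, second_answer)
```

With these made-up examples the first answer is "cold" and the second is "warm", even though the code itself did not change between the two questions; only the data did.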
Artificial Intelligence (AI)
This is a much more general term. It does not refer to a technique in computer science, nor to a specific set of algorithms. It is a term for a computer system that exhibits “intelligence”, typically by mimicking “human reasoning”. It can be implemented using all sorts of techniques; its simplest form could be a completely standard if/else type of program. But machine learning is often used to implement AI, which is why the two are often confused with each other.
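As an illustration of the “simplest form” mentioned above, here is a small sketch in Python of a purely rule-based program. The function `suggest_clothing` and its thresholds are invented for this example. Whether something this simple deserves to be called “intelligent” is debatable, but it mimics a piece of human reasoning without any learning involved.

```python
def suggest_clothing(temperature_c, is_raining):
    """Hypothetical rule-based adviser: every decision is a hand-written rule."""
    if is_raining:
        return "raincoat"
    if temperature_c < 10:
        return "winter jacket"
    if temperature_c < 20:
        return "sweater"
    return "t-shirt"


print(suggest_clothing(5, False))
print(suggest_clothing(5, True))
```

Because all the logic is written out by the programmer, this program always gives the same answer to the same question, no matter how much data it has been shown.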
Data Science (DS)
Data Science (DS) is even more general, and is in my opinion a poorly defined buzzword with no authoritative definition. However, it does carry some meaning: it has to do with the analysis of data. But the same can be said of the work of data-warehouse analysts, machine learning experts, statisticians and so on. Based on my impression, which is as subjective and imprecise as the term itself, I suggest that:
- DS usually aims to predict something, often using machine learning techniques.
- DS is often used by analysts to make (operational) decisions or give advice, as opposed to being used for reporting (of past events).
- DS is often related to the processing of large data sets or high-frequency data.
- DS is often related to the analysis of data across different systems that are not normally analyzed together.
- DS focuses on visualization of these data.
As a data scientist, you are expected to cover “all” these topics: collecting and preparing data, visualizing data, and doing “some kind of advanced” analysis on the data. The latter usually involves machine learning.
The main problem with data science (compared to ML or AI) is that it seems superfluous. There are already a lot of people working with large data outside their source systems, using prediction-based methods, statistics and so on, without calling themselves data scientists. Thus, to me it appears to be a branding thing more than an actual defined “science”. It consists of different parts, of which machine learning and statistics are the core.