I believe there is a new role in data that businesses need to start taking into account: the data science architect.
What is a data science architect? It is a mix of a data scientist and a data engineer. Wikipedia defines data science as follows:
Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD).
The role of a data engineer is described as follows:
A data engineer is a worker whose primary job responsibilities involve preparing data for analytical or operational uses. The specific tasks handled by data engineers can vary from organization to organization but typically include building data pipelines to pull together information from different source systems; integrating, consolidating and cleansing data; and structuring it for use in individual analytics applications.
The data science architect (DSA) sits between the two. The DSA designs the data collection, storage and analysis processes, taking into account time and cost trade-offs as well as business requirements.
Some example problems are:
1) What variables should be stored?
This is mostly an early-stage company problem, and I have already discussed it in my article about data science strategy.
2) What issues might arise regarding data quality?
Should additional measures be taken to ensure that the appropriate data is in place? If so, what could these measures be, and at what stage of the architecture should they apply (e.g. a data firewall, or filling in missing values during the analysis)?
3) What are the different options for a database, and which one best suits the company at its current and future stages?
Is it more important to choose a solution that makes storage easy but querying more difficult, or might a relational database be a better choice?
4) Are there any concerns regarding the choice of database, programming language, the data being collected, and other technologies?
For example, a particular type of analysis might be easier to do with a library that exists only in R. However, if no one in the company can use R, a second-best option has to be found in Python. The DSA needs to decide on the best way to adapt and move forward.
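To make the second question more concrete, here is a minimal Python sketch of the two places a data quality measure can live: a "data firewall" that rejects bad records at ingestion, and a fix applied later during analysis (filling in missing values). The record fields (`user_id`, `event`, `amount`), the validation rules, the function names and the choice of median imputation are all hypothetical, chosen only for illustration rather than as a prescribed design.

```python
from __future__ import annotations

from statistics import median

# Hypothetical record schema (illustration only): user_id and event are
# required, amount is optional but must not be negative when present.
REQUIRED_FIELDS = ("user_id", "event")


def firewall(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Ingestion-time check: split records into accepted and quarantined."""
    accepted: list[dict] = []
    quarantined: list[dict] = []
    for record in records:
        missing_required = any(record.get(f) is None for f in REQUIRED_FIELDS)
        amount = record.get("amount")
        negative_amount = amount is not None and amount < 0
        if missing_required or negative_amount:
            quarantined.append(record)  # kept aside, never written to storage
        else:
            accepted.append(record)
    return accepted, quarantined


def fill_missing_amounts(records: list[dict]) -> list[dict]:
    """Analysis-time fix: replace missing amounts with the median of known ones."""
    known = [r["amount"] for r in records if r.get("amount") is not None]
    fallback = median(known) if known else None
    return [
        {**r, "amount": r["amount"] if r.get("amount") is not None else fallback}
        for r in records
    ]


# Usage with made-up records.
raw = [
    {"user_id": 1, "event": "purchase", "amount": 20.0},
    {"user_id": 2, "event": "purchase", "amount": None},
    {"user_id": None, "event": "purchase", "amount": 5.0},
    {"user_id": 3, "event": "purchase", "amount": -3.0},
    {"user_id": 4, "event": "purchase", "amount": 40.0},
]
clean, rejected = firewall(raw)  # users 1, 2 and 4 pass; two records are quarantined
analysed = fill_missing_amounts(clean)  # user 2's missing amount becomes 30.0
```

This is the kind of trade-off the DSA has to weigh: the firewall keeps storage clean but may throw away data that could still be useful, while analysis-time imputation keeps everything but pushes the cleanup (and its assumptions) onto every downstream analysis.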
So, a DSA starts by analyzing a company’s needs with the end goal in mind: using data to generate value. Starting from that goal, the DSA designs the architecture and the analytics pipelines while taking into account appropriate time frames and costs.
The DSA role is especially relevant for startups, since every startup that deals with data will have to make these decisions.
Now, someone might argue that the DSA is not so much a separate role as a separate function within a data scientist’s repertoire. I think this could be right, but it is still important to stress the existence of this function. A data scientist is valuable once the data is already in place. A data engineer does not have the appropriate skills and knowledge to design the architecture in a way that maximises value in the long run. A data science architect enters the scene at an early stage and then paves the way for the other two.