2. INTRODUCTION
A promising future for data scientists.
Data science is a methodological approach.
The entire process is impossible without the help of trained
individuals who specialize in data science, called “data
scientists.”
The demand for such data-intensive jobs is increasing
rapidly.
3. 1. TECHNICAL SKILLS
The next generation of data scientists will maintain a breadth of
hard technical skills such as mathematics, statistics, probability
theory, machine learning, coding, and data visualization.
The data science process needs to be fully cultivated, including
exploratory data analysis (EDA), creative feature engineering, and
managing a large number of models.
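As a rough illustration of two of the steps named above, the sketch below runs a minimal EDA summary and derives one engineered feature. The dataset, the column names, and the helper functions `summarize` and `add_revenue` are hypothetical and exist only for this example; it uses the Python standard library rather than any particular data science package.

```python
import statistics

# Hypothetical toy dataset: each record is one customer order.
orders = [
    {"customer": "a", "price": 10.0, "quantity": 3},
    {"customer": "b", "price": 25.0, "quantity": 1},
    {"customer": "c", "price": 7.5, "quantity": 4},
    {"customer": "d", "price": 40.0, "quantity": 2},
]


def summarize(rows, column):
    """EDA step: basic descriptive statistics for one numeric column."""
    values = [row[column] for row in rows]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def add_revenue(rows):
    """Feature engineering step: derive a new feature from existing ones."""
    return [{**row, "revenue": row["price"] * row["quantity"]} for row in rows]


price_summary = summarize(orders, "price")
enriched = add_revenue(orders)

print(price_summary)
print([row["revenue"] for row in enriched])
```

In practice the summary would be read before any modeling starts, and the derived feature would be checked against domain knowledge before it is trusted.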
4. 2. SLOW DOWN AND PROCEED METHODICALLY
•Understand the data properly.
•Focus on customer requirements.
•Avoid using data that is voluminous but irrelevant.
•Proceed with careful consideration to get meaningful results.
•Spend more time getting the collected data into proper shape.
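One possible way to picture "getting the collected data into proper shape" is sketched below: irrelevant columns are dropped, values are normalized, and incomplete or duplicate rows are skipped. The records, the choice of which column is irrelevant, and the `clean` helper are assumptions made for illustration, not a prescribed procedure.

```python
# Hypothetical raw records, as they might arrive from a collection step.
raw_rows = [
    {"name": " Alice ", "age": "34", "session_id": "x1"},
    {"name": "bob", "age": "", "session_id": "x2"},
    {"name": " Alice ", "age": "34", "session_id": "x3"},
    {"name": "Carol", "age": "41", "session_id": "x4"},
]


def clean(rows):
    """Keep only name and age, normalize them, drop bad and duplicate rows.

    Assumption for this sketch: session_id is irrelevant to the analysis.
    """
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        age_text = row["age"].strip()
        if not name or not age_text.isdigit():
            continue  # skip rows with missing or malformed values
        record = (name, int(age_text))
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        cleaned.append({"name": record[0], "age": record[1]})
    return cleaned


tidy_rows = clean(raw_rows)
print(tidy_rows)
```

Whether a row with a missing value should be dropped or repaired depends on the customer requirements, which is one reason to settle those first.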
5. 3. SOFT SKILLS
What are soft skills for the next generation of data scientists?
Soft skills are non-technical skills that relate to how data
scientists work. They include how they interact with
colleagues, how they solve problems, and how they manage
their work.
Some examples of soft skills include:
•Leadership
•Communication
•Teamwork
•Time management
6. 4. APPLY THE SCIENTIFIC METHOD
Data scientists should subscribe to the “scientific method” in
the way they test hypotheses and welcome challenges and
alternative theories.
It’s important to ask a lot of questions.
Don’t worry about appearing stupid.
Don’t be afraid to ask for clarification.
Do not confuse correlation and causation.
Data scientists should remain skeptical.
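The warning about correlation and causation can be made concrete with a small simulation. In the sketch below, two variables never influence each other, yet they are clearly correlated because both depend on a shared hidden factor. Once that factor is accounted for, the correlation largely disappears. The data is synthetic and the helper names `pearson` and `residuals` are made up for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """Remove the linear effect of zs from ys using a least-squares fit."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = sum((z - mean_z) * (y - mean_y) for y, z in zip(ys, zs)) / sum(
        (z - mean_z) ** 2 for z in zs
    )
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hidden common cause; x and y each depend on it but not on each other.
confounder = [rng.gauss(0, 1) for _ in range(n)]
x = [z + rng.gauss(0, 1) for z in confounder]
y = [z + rng.gauss(0, 1) for z in confounder]

raw_corr = pearson(x, y)
adjusted_corr = pearson(residuals(x, confounder), residuals(y, confounder))

print(f"correlation of x and y: {raw_corr:.2f}")
print(f"after removing the confounder: {adjusted_corr:.2f}")
```

A skeptical reading of the first number alone would ask what else could be driving both variables before concluding that one causes the other.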
7. 5. PROCEED WITH ETHICS
It’s important to realize that algorithms are not only
capable of predicting the future, but also of directing the
future. Next-generation data scientists shouldn’t let their
salaries blind them to the point that their models are used
for unethical purposes. Instead, they should seek out
opportunities to solve problems of social value and
consider the impact and consequences of their models.
8. 6. DATA SCIENCE TOOLS AND WORKFLOWS
Becoming a data scientist is hard. In any hard task, focus is critical. As a data scientist,
Python should probably be the first tool you master.
Python, SQL, and R are the top performers among data science tools. Other sources, such as
KDnuggets’ poll results, also support the prevalence of Python and R.
9. TRANSFORMATION TOOLS
1. TensorFlow: Focused on deep learning and launched by Google,
TensorFlow has about 153k stars on GitHub at the time of writing.
2. PyTorch: Open source and Python-based, with about 45k stars on GitHub. Most
data science teams that I personally know rely on PyTorch, which
includes libraries for most machine learning approaches.
3. DataRobot: DataRobot offers a machine learning platform for data
scientists of all skill levels to build and deploy accurate predictive
models in a fraction of the time it used to take.
4. Alteryx
5. Qubole
10. QUERYING AND PROCESSING
Once data is accessible, either in more structured
forms like relational databases or in a data lake,
data scientists can start querying it.
For querying data from multiple data sources, data
scientists also have query engines in their toolkit.
These engines allow data scientists to use SQL
queries on disparate data silos, so that they can
query the data where it already exists rather than
having to move around massive amounts of data to
run queries.
Queries often involve massive amounts of data,
which requires processing engines to help speed up
and simplify the querying process.
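The idea of querying data where it already lives can be sketched on a small scale with Python's built-in `sqlite3` module. Here SQLite's `ATTACH DATABASE` statement stands in for a real query engine: two separate in-memory databases play the role of two data silos, and one SQL query joins across them without copying either table. The database names `crm` and `billing`, the tables, and the rows are all hypothetical.

```python
import sqlite3

# One connection with two separately attached databases, standing in for
# two independent data silos (an assumption made for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")

conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)

# A single SQL query that joins data held in both silos.
query = """
    SELECT c.name, SUM(i.amount) AS total
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""
totals = conn.execute(query).fetchall()
print(totals)

conn.close()
```

A production query engine does the same kind of thing across networked sources and much larger volumes, which is where the processing engines mentioned above come in.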
11. IDES AND WORKSPACES
Data scientists can now begin to conduct analyses. This piece of the
workflow happens in integrated development environments (IDEs)
and workspaces (often notebooks).
IDEs: IDEs combine different automated application development
tools into one interface and can serve as a fast way to test and debug
code.
Notebooks: Notebooks are tools that allow data scientists to combine
text that explains their thoughts and work with code and graphs.
Advanced notebooks such as Observable, Hex.tech, Deepnote, and
Noteable have emerged that allow for more powerful visualizations
and collaboration.