2. WHAT IS DATA SCIENCE & BIG DATA?
Data Science is an interdisciplinary field that
combines statistics, computer science, and
operations research. It has numerous applications
such as in Fintech, Genomics, and even the Social
Sciences, just to name a few.
Big Data is data science applied to large
data sets, usually in the terabyte range and
above. It has its roots in Web 2.0 which
emphasized user-generated content, thus
resulting in greater variety, volume, and
velocity of data.
7. DATA SCIENCE VENN DIAGRAM
Hacking Skills
Having a proper mathematical background and
domain expertise may not be sufficient to succeed
as a data scientist. The ability to combine
different tools and visualizations is key to becoming
an effective data scientist.
Math & Statistics
Computer Science, Math, Statistics, and
Linear Algebra provide a solid foundation from which
a data scientist can draw the necessary knowledge to
apply analysis to data sets.
SME & Job Experience
There is no substitute for solid work experience as
a business analyst, programmer, and/or statistician
for the domain in which you are applying your skills
and knowledge. The absence of such experience can
lead to biased statistical models or irrelevant
conclusions.
8. WHAT DOES A GOOD DATA SCIENTIST LOOK LIKE?
Inquisitive – skeptical and curious
Knowledgeable – knows machine learning, statistics, and probability
Scientific Method – creates hypotheses, tests them, and updates understanding
Coding – is good at coding, hacking, and general programming
Product Oriented – knows how to build data products and visualizations to make data understandable to mere mortals
Domain Knowledge – understands the business and how to tell the relevant story from business data. Able to find answers to known unknowns.
12. DEMAND & OPPORTUNITY
Data Science has been dubbed by the Harvard Business Review (Thomas H. Davenport
and D.J. Patil, October 2012) as…
“The Sexiest Job of the 21st Century”
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
And by the New York Times (April 11, 2013) as a…
“hot new field [that] promises to revolutionize
industries from business to government,
healthcare to academia”
Data Science, however, is NOT NEW! It’s basically just data mining rebranded.
13. DEMAND & OPPORTUNITY
Data Scientist was identified by Glassdoor as the top job for Work-Life Balance in 2015
(out of 25), with the highest salary… (in the USA)
1. Data Scientist
• Work-Life Balance Rating: 4.2 (out of 5)
• Salary: $114,808 (highest salary)
• Number of Job Openings: 1,315 (highest in the top 9)
https://www.glassdoor.com/blog/25-jobs-worklife-balance-2015/
According to McKinsey, there will be a shortage of the talent needed to take advantage of data
science and big data. By 2018, the USA alone could face a shortage of 140,000-190,000 skilled data
scientists and 1.5 million managers and analysts with the know-how to use the analysis of big
data to make effective decisions.
http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
14. DATA SCIENCE PRINCIPLES
1. Socio-Technical Systems are complex!
2. Data is never at rest
3. Data is dirty, deal with it!
4. SVoT = LOL! (Single Version of Truth)
5. Data munging/wrangling & data wrestling > 70% time – this is the
reality of the data scientist
6. Simplification. Reduction. Distillation.
7. Curiosity. Empiricism. Skepticism.
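As a rough illustration of principles 3 and 5 (dirty data and the time spent wrangling it), here is a minimal Python sketch of a cleaning pass. The records, the field names, the list of "missing" markers, and the helper functions are all hypothetical and invented for this example. Real data sets usually need far more checks than this.

```python
from typing import Optional

# Hypothetical raw records, invented for illustration only.
RAW_ROWS = [
    {"name": " Alice ", "age": "34", "city": "new york"},
    {"name": "BOB", "age": "n/a", "city": "Boston"},
    {"name": "alice", "age": "34", "city": "New York"},  # same person as row 1
    {"name": "", "age": "29", "city": "Chicago"},         # name is missing
]

# Assumed spellings of "no value"; a real data set may use others.
MISSING = {"", "n/a", "na", "null", "none"}


def clean_text(value: str) -> Optional[str]:
    """Trim whitespace and title-case the text; return None if it is missing."""
    text = value.strip()
    if text.lower() in MISSING:
        return None
    return text.title()


def clean_int(value: str) -> Optional[int]:
    """Parse an integer; return None if the value is missing or not a number."""
    text = value.strip()
    if text.lower() in MISSING:
        return None
    try:
        return int(text)
    except ValueError:
        return None


def clean_rows(rows):
    """Normalize each row, drop rows without a name, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            clean_text(row["name"]),
            clean_int(row["age"]),
            clean_text(row["city"]),
        )
        if record[0] is None:  # a row without a name cannot be identified
            continue
        if record in seen:     # exact duplicate after normalization
            continue
        seen.add(record)
        cleaned.append({"name": record[0], "age": record[1], "city": record[2]})
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```

Even in this toy case, four raw rows shrink to two usable ones, and one of those still has an unknown age. Cleaning steps like these tend to dominate the work before any modeling starts.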
15. KNOWNS AND UNKNOWNS
There are known knowns. These are things we know that we know.
There are known unknowns. That is to say, there are things that we know
we don’t know.
But there are also unknown unknowns. There are things we don’t know
we don’t know.
Donald Rumsfeld
18. APPLICATIONS OF DATA SCIENCE
Data-Driven Decision Making (DDD) refers to the practice of basing decisions on
data, rather than purely on intuition.
Data Science for Business, O’Reilly Media
33. DATA-DRIVEN ORGANIZATION
Organizations become data-driven by developing data products.
What is a data product?
• Curated and crafted from raw data
• A result of exploration and iterations
• A machine that learns from data
• An answer to known unknowns or unknown unknowns
• A mechanism that triggers immediate business value
• A probabilistic window of future events or behavior
34. DEVELOPING DATA PRODUCTS
OBJECTIVES
What outcome am I
trying to achieve?
LEVERS
What inputs can we
control?
DATA
What data can we
collect?
MODELS
How do the levers
influence the
objectives?
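One way to picture the four boxes together is a toy model that links a single lever to a single objective. The sketch below is only an assumption-laden illustration: the lever (discount percent), the objective (units sold), the observations, and the choice of a straight-line least-squares fit are all hypothetical and not taken from the slides.

```python
# DATA: hypothetical observations of (discount percent offered, units sold).
OBSERVATIONS = [(0, 100), (5, 110), (10, 120), (15, 130)]


def fit_line(points):
    """MODEL: ordinary least-squares fit of objective = intercept + slope * lever."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(lever_value, slope, intercept):
    """Estimate the OBJECTIVE for a given setting of the LEVER."""
    return intercept + slope * lever_value


if __name__ == "__main__":
    slope, intercept = fit_line(OBSERVATIONS)
    print("estimated extra units per discount point:", slope)
    print("predicted units at a 20% discount:", predict(20, slope, intercept))
```

The fitted slope is the model's answer to "how do the levers influence the objectives?" for this one lever. A real data product would typically involve several levers, noisier data, and a check that the relationship holds outside the observed range.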
39. DATA SCIENCE AS A CAREER
DJ Patil, Chief Data Scientist of the United States
is the perfect prototype of the Data Scientist. He brings a deep understanding of mathematics from
his Ph.D. in applied mathematics. He has created multiple data products, and collaborated with
people in various data science roles. He’s headed up strategy and led teams to build out entire new
extensions of LinkedIn’s data, from the creation of “People You May Know” to Talent Match, a
function that automatically sources the best candidate for any job posted on LinkedIn.
Doug Cutting, Creator of Hadoop & Chief Architect at Cloudera
is somebody who has dedicated his time to creating technical solutions to store and process data at
scale. Hadoop is widely used to distribute data across several hardware servers so that huge data
sets can become manageable. Doug Cutting is the prototypical example of a data engineer and he
is now the chief architect at Cloudera, one of the largest data engineering organizations in the world.
40. DATA SCIENCE EDUCATION FRAMEWORK
LEARN TO CODE
HIGH-LEVEL: PYTHON, R, JULIA
LOWER-LEVEL: JAVA, SCALA/CLOJURE, C++/GO