From Rocket Science to Data Science
Sanghamitra Deb
Data Scientist, Accenture Tech Lab
Sexiest Job of the 21st Century
Nate Silver correctly predicted how all 50 states would vote in the 2012 presidential election.
Target predicted teen pregnancy from retail data.
The Big Data Challenge
“With the need for data scientists growing at about 3x those for statisticians and BI analysts… and an anticipated 100,000+ person analytic talent shortage through 2020…”
Gartner article
“… three core data science skills: data management, analytics modeling and
business analysis. But beyond these, there’s an art to data science. We detail several
soft skills that our research showed are also critical to success, i.e., communication,
collaboration, leadership, creativity, discipline and passion (for information and
truth).”
Who are you?
• Front-end engineer (UX/UI)
• Backend engineer
• Project manager
• Academic (PhD in physics, neuroscience, economics, CS, …) trying to find a niche in the tech industry
• Quantitative background, curiosity, and the ability to understand business needs
Start a data-driven project relevant to the industry you want to join.
Where to start
• Blogs: yhat, DataRobot, DataTau, Upshot, …
• Twitter: follow data science news, …
• Data exploration/discovery: open a dataset in your favorite coding language (Python, R, Scala, Julia, …)
• Learn to pipe data into a database such as MySQL/MongoDB
• Kaggle competitions, live and older ones, e.g., digit recognition, Titanic
• Data frameworks: Apache Spark
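As a minimal sketch of the "pipe data into a database" step, the snippet below loads CSV text into a database and then explores it with SQL. It assumes Python's built-in `csv` and `sqlite3` modules, with SQLite swapped in for MySQL only so the example runs without a database server; the `trips` table, its columns, and the rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV contents standing in for an open dataset;
# in practice you would open a downloaded file instead.
CSV_TEXT = """route,stop,riders
A,stop_1,120
A,stop_2,95
B,stop_3,210
"""

def load_csv_into_sqlite(csv_text, conn):
    """Create a table, bulk-insert the CSV rows, and return the row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips (route TEXT, stop TEXT, riders INTEGER)"
    )
    rows = [(r["route"], r["stop"], int(r["riders"])) for r in reader]
    # ? placeholders let the driver handle quoting instead of string concatenation
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
n = load_csv_into_sqlite(CSV_TEXT, conn)

# Once the data is in a database, exploration becomes a SQL query.
riders_by_route = conn.execute(
    "SELECT route, SUM(riders) FROM trips GROUP BY route ORDER BY route"
).fetchall()
print(n, riders_by_route)
```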
Do a few online courses on data science, big data, machine learning, Python, R, … from Coursera, Udemy, Khan Academy, …; form study groups, go to meetups.
Pros: DIY, bite-size videos, flexibility, discussion forums, interactivity; a great way to figure out if a new field is interesting.
Cons: DIY, choosing the right course, signing up and not participating after the first few weeks.
Small Data Project Flow
• Get open data. Sources: city data (San Francisco, LA, Seattle, Chicago, transit data, …).
• Load it up in Python; if the data is too big, I put it in MySQL (for structured data) or MongoDB (for free-form JSON).
• Machine learning, statistics, counting statistics, and histograms are very powerful. If you are a Python user, data frameworks such as “GraphLab” are open source and easy to learn.
• Create a dashboard/viz/app.
• Ask the right question!!!
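Counting statistics and histograms need nothing beyond the standard library. A sketch, assuming hypothetical (area, response time in minutes) records from some city service dataset and a bin width of 10 minutes:

```python
from collections import Counter

# Hypothetical toy records: (area, response_time_minutes).
records = [
    ("north", 12), ("north", 35), ("south", 8),
    ("south", 22), ("north", 41), ("east", 5),
]

# Counting statistics: how many records fall in each category.
requests_per_area = Counter(area for area, _ in records)

BIN_WIDTH = 10

def histogram(values, bin_width):
    """Fixed-width histogram: maps each bin's lower edge to the number of values in that bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

hist = histogram([minutes for _, minutes in records], BIN_WIDTH)

print(requests_per_area.most_common())
# Crude text histogram: one row per non-empty bin.
for lower, count in hist.items():
    print(f"{lower:>3}-{lower + BIN_WIDTH - 1:<3} {'#' * count}")
```

Only non-empty bins appear, which is usually enough for a first look at a distribution.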
Data Wrangling/Cleaning
• Open your dataset and profile it.
• Look for missing data, and distinguish bad data points from true outliers.
• Look at the pattern of your data: is it a phone number, a timestamp, or a Social Security number? Is it structured data or unstructured text?
• Prep your data: identify the features that influence your outcome (feature selection and feature engineering).
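A rough way to profile a column is to count missing values and check which pattern each value matches. This sketch assumes US-style formats for phone numbers, Social Security numbers, and timestamps; the regexes and rows are illustrative assumptions, not a complete validator.

```python
import re
from typing import Optional

# Hypothetical raw rows with a mix of good, missing, and malformed values.
rows = [
    {"id": "1", "contact": "415-555-0100", "created": "2015-03-01 10:22:00"},
    {"id": "2", "contact": "", "created": "2015-03-02 09:10:00"},
    {"id": "3", "contact": "123-45-6789", "created": "not a date"},
]

# Assumed patterns for recognizing what a field "looks like".
PATTERNS = {
    "phone": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "timestamp": re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$"),
}

def classify(value: Optional[str]) -> str:
    """Return the name of the first matching pattern, 'missing', or 'unknown'."""
    if value is None or value.strip() == "":
        return "missing"
    for name, pattern in PATTERNS.items():
        if pattern.match(value):
            return name
    return "unknown"

def profile(rows, column):
    """Count how many values in a column fall into each pattern class."""
    counts = {}
    for row in rows:
        label = classify(row.get(column))
        counts[label] = counts.get(label, 0) + 1
    return counts

print(profile(rows, "contact"))  # {'phone': 1, 'missing': 1, 'ssn': 1}
print(profile(rows, "created"))  # {'timestamp': 2, 'unknown': 1}
```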
Let's start …
Question: What is a data scientist?
Data: scraped indeed.com for all jobs containing “data” in the title, ~5000 jobs …
Metadata: job title, job description, city, state
Job description: unstructured text …
Processing: job title + job description → text cleaning + bag of words
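The "text cleaning + bag of words" step can be sketched as below. The stop-word list, the cleaning rules (drop HTML tags and punctuation, keep `+` and `#` so tokens like `c++` survive), and the two descriptions are assumptions for illustration, not the exact preprocessing used in the original analysis.

```python
import re
from collections import Counter

# A small, assumed stop-word list; real projects typically use a fuller one.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "in", "to", "with", "for", "is"}

def clean_text(text):
    """Strip HTML tags and punctuation, lowercase, tokenize, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)                 # remove leftover HTML tags
    text = re.sub(r"[^a-z0-9+#]+", " ", text.lower())    # keep letters, digits, + and #
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def bag_of_words(docs):
    """Return a sorted vocabulary and one word-count vector per document."""
    tokenized = [Counter(clean_text(d)) for d in docs]
    vocab = sorted(set().union(*tokenized))
    vectors = [[counts[word] for word in vocab] for counts in tokenized]
    return vocab, vectors

# Hypothetical job-description snippets.
descriptions = [
    "<p>Experience with Python and SQL.</p>",
    "Strong background in statistics; Python or R.",
]
vocab, vectors = bag_of_words(descriptions)
print(vocab)
print(vectors)
```

Each row of `vectors` is one description; columns line up with `vocab`, which is the shape most classifiers and similarity measures expect.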
What are the data jobs?
participates in evaluation of hardware and software platforms and
integrating systems as they relate to the data architecture
participates in selection of application packages, agency services,
and technology/infrastructure capabilities to ensure alignment to
data architecture works in an environment, which includes data
modeling, data design, metadata and repository creation reviews
object and data models and the metadata repository to structure
the data for better management and quicker access plays a
liaison role with business data owner/stewards
Title: 'data_architect' + description
Job title disambiguation:
{'data_integration_architect',
'data_architect',
'data_warehouse_architect',
'data_warehouse_lead',
'sr_data_architect'}
Job categories: data architect, data scientist, data engineer, data entry, database developer, data analyst
Algorithm: word2vec synonym
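Title disambiguation with word2vec rests on two ideas: collapse each raw title into a single token (as in 'sr_data_architect'), then rank other tokens by the cosine similarity of their vectors. The sketch below assumes hand-made 3-dimensional vectors purely for illustration; real word2vec vectors are learned from the job-description corpus (for example with gensim, whose `KeyedVectors.most_similar` performs this kind of ranking).

```python
import math
import re

def title_to_token(title):
    """Turn a raw job title into one token, e.g. 'Sr. Data Architect' -> 'sr_data_architect'."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(word, vectors, topn=3):
    """Rank all other tokens by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy vectors, NOT trained embeddings; real word2vec vectors
# usually have 100+ dimensions.
vectors = {
    "data_architect":           [0.9, 0.1, 0.0],
    "sr_data_architect":        [0.85, 0.15, 0.05],
    "data_warehouse_architect": [0.8, 0.2, 0.1],
    "data_scientist":           [0.1, 0.9, 0.2],
    "data_entry":               [0.0, 0.1, 0.95],
}

print(title_to_token("Sr. Data Architect"))
print(most_similar("data_architect", vectors, topn=2))
```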
Where are the data jobs?
Processing: job title + job description → text cleaning + bag of words → Word2Vec
stitchx blog
What do the job descriptions mean?
Algorithm: word2vec synonym
degree
report
team
written
Hadoop
Python
Statistics
• mathematics (0.854)
• economics (0.838)
• applied (0.830)
• physics (0.821)
• math (0.804)
• quantitative (0.800)
• phd (0.795)
• fields (0.749)
• science (0.723)
• masters (0.705)
Regression
• segmentation (0.704)
• statistical (0.688)
• mining (0.680)
• graph (0.670)
• algorithm (0.670)
• theory (0.656)
• predictive (0.647)
• matlab (0.636)
• recommendation (0.620)
• analyses (0.612)
Graph
• text (0.759)
• manipulating (0.717)
• visualization (0.708)
• matlab (0.706)
• mining (0.701)
• unstructured (0.687)
• regression (0.670)
• algorithms (0.669)
• natural (0.663)
• engines (0.663)
Visualization
• tableau (0.720)
• graph (0.708)
• matlab (0.699)
• libraries (0.682)
• visualizations (0.675)
• mining (0.652)
• spss (0.652)
• text (0.615)
• qlikview (0.605)
• js (0.596)
Machine Learning
• learning (0.834)
• algorithms (0.766)
• natural (0.716)
• physics (0.708)
• mining (0.697)
• ideally (0.668)
• graph (0.660)
• predictive (0.656)
• applied (0.653)
• statistics (0.650)
Fun with words
data + engineer - software = {cleansing, analyst, modeler, scientist}
python + ruby - html = {perl, scala, bash, scripting}
storm + hadoop - scripting = {hive, hbase, spark, pig}
visualizations + algorithms - predictive = {backend, libraries, js, jquery}
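The "fun with words" results come from vector arithmetic: add the unit vectors of the positive words, subtract those of the negative words, and return the nearest remaining words by cosine similarity. gensim's `most_similar(positive=..., negative=...)` works in this spirit (it takes a mean rather than a sum, which gives the same ranking). The tiny vectors here are hypothetical and chosen by hand, not trained.

```python
import math

def unit(vec):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def analogy(positive, negative, vectors, topn=2):
    """Rank words by cosine similarity to sum(unit(positive)) - sum(unit(negative)),
    skipping the query words themselves."""
    dims = len(next(iter(vectors.values())))
    query = [0.0] * dims
    for word in positive:
        query = [q + x for q, x in zip(query, unit(vectors[word]))]
    for word in negative:
        query = [q - x for q, x in zip(query, unit(vectors[word]))]
    query = unit(query)
    scores = []
    for word, vec in vectors.items():
        if word in positive or word in negative:
            continue
        similarity = sum(q * x for q, x in zip(query, unit(vec)))  # cosine of two unit vectors
        scores.append((word, similarity))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical hand-made vectors; the axes loosely mean (data, engineering, software).
vectors = {
    "data":      [1.0, 0.0, 0.0],
    "engineer":  [0.0, 0.6, 0.8],
    "software":  [0.0, 0.0, 1.0],
    "scientist": [0.8, 0.6, 0.0],
    "analyst":   [0.6, 0.8, 0.0],
    "developer": [0.0, 0.8, 0.6],
}

print(analogy(["data", "engineer"], ["software"], vectors))
```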
http://www.datasciencecentral.com/profiles/blogs/how-to-become-a-data-scientist
Data Science in a Nutshell
Digging deeper …
• Create a data story, i.e., put all the visualizations and insights into a dashboard, or create an infographic using Tableau, D3, …
• Get data (say, from Crunchbase) on the companies that are hiring and figure out which industries dominate the data world.
• Get data for at least the past 6 months and compute exact statistics for skills in the data world, using advanced text analytics (bi-gram and tri-gram modeling, topic modeling).
• Create an app that tells you how “hot” your skills are and which skills are easiest for you to acquire to become “hotter”.
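Bi-gram and tri-gram statistics are counts of contiguous token windows. A minimal sketch over two hypothetical, already-cleaned descriptions (topic modeling is not shown):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token windows; n=2 gives bi-grams, n=3 tri-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical, already-cleaned job-description tokens.
docs = [
    "experience with machine learning and big data".split(),
    "machine learning models on big data platforms".split(),
]

bigram_counts = Counter(bg for doc in docs for bg in ngrams(doc, 2))
trigram_counts = Counter(tg for doc in docs for tg in ngrams(doc, 3))

print(bigram_counts.most_common(2))
```

Multi-word skills such as "machine learning" only show up as a unit at the bi-gram level, which is why single-word counts can understate them.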
Right questions?
Take different slices of the data and look for patterns that might be interesting to you.
Retail: What affects customers' shopping habits? What are the control variables? Are promos and discounts influencing any of these habits?
Crime: What is the sequence of crimes that happen every day? Do initiatives led by government or non-profit organizations have an effect on certain crime rates?
Education: Does regular feedback to parents about their children’s education
have an effect on the grades or engagement of the children?
Healthcare: Does sending preventive care emails reduce knee surgeries?
Managing the Interview Process
• 3-5 hours long
• 4-6 people, depending on company size
• Statistics whiteboarding … A/B testing calculations
• Formulation of a machine learning use case relevant to the company, with parameter tuning and edge cases
• An open question that the team is trying to solve
• CS algorithms … Cracking the Coding Interview
• Databases, SQL queries …
http://deblivingdata.net/wp-content/uploads/2014/05/DSTalk.slides.html
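A common whiteboard A/B testing calculation is the two-proportion z-test. The sketch below assumes large samples (so the normal approximation holds), a pooled standard error, and a two-sided test; the conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates, using a pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical numbers: control A converts 200/2000, variant B converts 230/2000.
z, p = two_proportion_z_test(200, 2000, 230, 2000)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Here the lift from 10% to 11.5% is not significant at the usual 0.05 level, a typical follow-up question being how large a sample would be needed to detect it.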
Now that you have landed the job …
References
• Data sources: data.gov, Kaggle, open city data
• Volunteering opportunities: DataKind, Bayes Impact, Data for Good
• DS schools: Insight Data Science, Zipfian Academy, …
• sqlzoo.net
• meetup.com
@sangha_deb,
deblivingdata.net,
sangha123.github.io
Thank You

 
Top Rated Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Call Girls in Nagpur High Profile
 
Virgin Call Girls Delhi Service-oriented sexy call girls ☞ 9899900591 ☜ Rita ...
Virgin Call Girls Delhi Service-oriented sexy call girls ☞ 9899900591 ☜ Rita ...Virgin Call Girls Delhi Service-oriented sexy call girls ☞ 9899900591 ☜ Rita ...
Virgin Call Girls Delhi Service-oriented sexy call girls ☞ 9899900591 ☜ Rita ...poojakaurpk09
 
reStartEvents 5:9 DC metro & Beyond V-Career Fair Employer Directory.pdf
reStartEvents 5:9 DC metro & Beyond V-Career Fair Employer Directory.pdfreStartEvents 5:9 DC metro & Beyond V-Career Fair Employer Directory.pdf
reStartEvents 5:9 DC metro & Beyond V-Career Fair Employer Directory.pdfKen Fuller
 
Joshua Minker Brand Exploration Sports Broadcaster .pptx
Joshua Minker Brand Exploration Sports Broadcaster .pptxJoshua Minker Brand Exploration Sports Broadcaster .pptx
Joshua Minker Brand Exploration Sports Broadcaster .pptxsportsworldproductio
 
Internship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmkInternship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmkSujalTamhane
 
Personal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando NegronPersonal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando Negronnegronf24
 
Delhi Call Girls South Ex 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls South Ex 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls South Ex 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls South Ex 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur DubaiBur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubaiparisharma5056
 
Escorts Service Cambridge Layout ☎ 7737669865☎ Book Your One night Stand (Ba...
Escorts Service Cambridge Layout  ☎ 7737669865☎ Book Your One night Stand (Ba...Escorts Service Cambridge Layout  ☎ 7737669865☎ Book Your One night Stand (Ba...
Escorts Service Cambridge Layout ☎ 7737669865☎ Book Your One night Stand (Ba...amitlee9823
 
Call Girls Hosur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hosur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hosur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hosur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Booking open Available Pune Call Girls Ambegaon Khurd 6297143586 Call Hot In...
Booking open Available Pune Call Girls Ambegaon Khurd  6297143586 Call Hot In...Booking open Available Pune Call Girls Ambegaon Khurd  6297143586 Call Hot In...
Booking open Available Pune Call Girls Ambegaon Khurd 6297143586 Call Hot In...Call Girls in Nagpur High Profile
 
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)Delhi Call girls
 
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...amitlee9823
 
Dombivli Call Girls, 9892124323, Kharghar Call Girls, chembur Call Girls, Vas...
Dombivli Call Girls, 9892124323, Kharghar Call Girls, chembur Call Girls, Vas...Dombivli Call Girls, 9892124323, Kharghar Call Girls, chembur Call Girls, Vas...
Dombivli Call Girls, 9892124323, Kharghar Call Girls, chembur Call Girls, Vas...Pooja Nehwal
 
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 

KĂźrzlich hochgeladen (20)

Get To Know About "Lauren Prophet-Bryant''
Get To Know About "Lauren Prophet-Bryant''Get To Know About "Lauren Prophet-Bryant''
Get To Know About "Lauren Prophet-Bryant''
 
0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf
0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf
0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf0425-GDSC-TMU.pdf
 
Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jayanagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Top Rated Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
 
Virgin Call Girls Delhi Service-oriented sexy call girls ☞ 9899900591 ☜ Rita ...
Virgin Call Girls Delhi Service-oriented sexy call girls ☞ 9899900591 ☜ Rita ...Virgin Call Girls Delhi Service-oriented sexy call girls ☞ 9899900591 ☜ Rita ...
Virgin Call Girls Delhi Service-oriented sexy call girls ☞ 9899900591 ☜ Rita ...
 
reStartEvents 5:9 DC metro & Beyond V-Career Fair Employer Directory.pdf
reStartEvents 5:9 DC metro & Beyond V-Career Fair Employer Directory.pdfreStartEvents 5:9 DC metro & Beyond V-Career Fair Employer Directory.pdf
reStartEvents 5:9 DC metro & Beyond V-Career Fair Employer Directory.pdf
 
Joshua Minker Brand Exploration Sports Broadcaster .pptx
Joshua Minker Brand Exploration Sports Broadcaster .pptxJoshua Minker Brand Exploration Sports Broadcaster .pptx
Joshua Minker Brand Exploration Sports Broadcaster .pptx
 
Internship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmkInternship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmk
 
Personal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando NegronPersonal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando Negron
 
Delhi Call Girls South Ex 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls South Ex 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls South Ex 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls South Ex 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur DubaiBur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
Bur Dubai Call Girl Service #$# O56521286O Call Girls In Bur Dubai
 
VVVIP Call Girls In East Of Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In East Of Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In East Of Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In East Of Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Escorts Service Cambridge Layout ☎ 7737669865☎ Book Your One night Stand (Ba...
Escorts Service Cambridge Layout  ☎ 7737669865☎ Book Your One night Stand (Ba...Escorts Service Cambridge Layout  ☎ 7737669865☎ Book Your One night Stand (Ba...
Escorts Service Cambridge Layout ☎ 7737669865☎ Book Your One night Stand (Ba...
 
Call Girls Hosur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hosur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hosur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hosur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hoodi Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Booking open Available Pune Call Girls Ambegaon Khurd 6297143586 Call Hot In...
Booking open Available Pune Call Girls Ambegaon Khurd  6297143586 Call Hot In...Booking open Available Pune Call Girls Ambegaon Khurd  6297143586 Call Hot In...
Booking open Available Pune Call Girls Ambegaon Khurd 6297143586 Call Hot In...
 
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
 
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
 
Dombivli Call Girls, 9892124323, Kharghar Call Girls, chembur Call Girls, Vas...
Dombivli Call Girls, 9892124323, Kharghar Call Girls, chembur Call Girls, Vas...Dombivli Call Girls, 9892124323, Kharghar Call Girls, chembur Call Girls, Vas...
Dombivli Call Girls, 9892124323, Kharghar Call Girls, chembur Call Girls, Vas...
 
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Devanahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

From Rocket Science to Data Science

  • 1. From Rocket Science to Data Science Sanghamitra Deb Data Scientist, Accenture Tech Lab
  • 2. Sexiest Job of the 21st Century: Nate Silver correctly predicted how all 50 states would vote in the 2012 presidential election. Target predicted teen pregnancy from retail data.
  • 3. The Big Data Challenge: “With the need for data scientists growing at about 3x those for statisticians and BI analysts… and an anticipated 100,000+ person analytic talent shortage through 2020…” (Gartner article). “… three core data science skills: data management, analytics modeling and business analysis. But beyond these, there’s an art to data science. We detail several soft skills that our research showed are also critical to success, i.e., communication, collaboration, leadership, creativity, discipline and passion (for information and truth).”
  • 4. Who are you? • Front-end engineer (UX/UI) • Backend engineer • Project manager • Academic (PhD, physics, neuroscience, economics, CS, …) trying to find a niche in the tech industry • Quantitative background, curiosity and the ability to understand business needs. Start a data-driven project relevant to the industry you want to join.
  • 5. Where to start • Blogs: yhat, DataRobot, DataTau, Upshot, … • Twitter: follow data science news … • Data exploration/discovery: open a dataset in your favorite coding language (Python, R, Scala, Julia, …) • Learn to pipe data into a database such as MySQL/MongoDB • Kaggle competitions, live and older ones, e.g. digit recognition, Titanic • Data frameworks: Apache Spark • Take a few online courses on data science, big data, machine learning, Python, R, … from Coursera, Udemy, Khan Academy, …; form study groups and go to meetups. Pros: DIY, bite-size videos, flexibility, discussion forums, interactivity, and a great way to figure out whether a new field is interesting. Cons: DIY, choosing the correct course, signing up and then not participating after the first few weeks.
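To make "pipe data into a database" concrete, here is a minimal sketch that loads CSV text into a table and aggregates it with SQL. It uses Python's built-in `sqlite3` as a stand-in for MySQL (an assumption, so it runs with no server), and the `incidents` table and its rows are hypothetical sample data.

```python
import csv
import io
import sqlite3

# Hypothetical rows standing in for a downloaded open city dataset.
RAW_CSV = """city,category,count
San Francisco,theft,120
Seattle,theft,95
Chicago,burglary,80
San Francisco,burglary,40
"""

def load_into_db(raw_csv: str) -> sqlite3.Connection:
    """Parse CSV text and insert its rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE incidents (city TEXT, category TEXT, count INTEGER)")
    reader = csv.DictReader(io.StringIO(raw_csv))
    rows = [(r["city"], r["category"], int(r["count"])) for r in reader]
    # Parameter placeholders avoid building SQL strings by hand.
    conn.executemany("INSERT INTO incidents VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def totals_by_city(conn: sqlite3.Connection) -> list:
    """Aggregate with plain SQL, largest total first."""
    cur = conn.execute(
        "SELECT city, SUM(count) FROM incidents GROUP BY city ORDER BY SUM(count) DESC"
    )
    return cur.fetchall()

conn = load_into_db(RAW_CSV)
print(totals_by_city(conn))
```

The same `INSERT`/`GROUP BY` pattern carries over to MySQL through a MySQL client library. Only the connection setup and placeholder style differ.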
  • 6. Small Data Project Flow • Get open source data. Sources: city data (San Francisco, LA, Seattle, Chicago, transit data, …) • Load it into Python; if the data is too big, put it in MySQL (for structured data) or MongoDB (for free-form JSON) • Machine learning, statistics, counting statistics and histograms are very powerful. If you are a Python user, data frameworks such as GraphLab are open source and easy to learn • Create a dashboard/viz/app • Ask the right question!
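Counting statistics and histograms need very little code. The sketch below bins a list of numbers with `collections.Counter` and prints a text histogram. The trip durations are hypothetical, and the bin labels assume integer values.

```python
from collections import Counter

# Hypothetical trip durations in minutes, e.g. from a city transit dataset.
durations = [4, 7, 12, 15, 9, 22, 31, 8, 14, 11, 5, 27]

def histogram(values, bin_width):
    """Count how many values fall into each [start, start + bin_width) bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def print_histogram(hist, bin_width):
    """Print one row of '#' per bin; labels assume integer-valued data."""
    for start, n in hist.items():
        print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * n}")

hist = histogram(durations, bin_width=10)
print_histogram(hist, 10)
```

Changing `bin_width` is a quick way to check whether a pattern is real or just an artifact of the binning.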
  • 7. Data Wrangling/Cleaning • Open your dataset and profile it • Look for missing data, and distinguish bad data points from true outliers • Identify the pattern of your data: is it a phone number, a timestamp or a social security number? Is it structured data or unstructured text? • Prep your data: identify the features that influence your outcome (feature selection and feature engineering).
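The profiling steps above can be sketched with the standard library alone: count missing values, guess a column's pattern with regular expressions, and flag numeric outliers with Tukey's 1.5 × IQR rule. The regexes are illustrative US-style assumptions, and the IQR rule is one common choice of outlier test, not the only one.

```python
import re
import statistics

# Illustrative patterns (assumptions); adapt them to the formats in your data.
PATTERNS = {
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "timestamp": re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(:\d{2})?$"),
}

def detect_pattern(values):
    """Return the pattern name matching the most non-missing values, or 'unknown'."""
    present = [v for v in values if v not in (None, "")]
    best, best_hits = "unknown", 0
    for name, rx in PATTERNS.items():
        hits = sum(1 for v in present if rx.match(v))
        if hits > best_hits:
            best, best_hits = name, hits
    return best

def iqr_outliers(nums):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(nums, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in nums if x < low or x > high]

def profile(values):
    """Summarize a text column: size, missing count and most likely pattern."""
    missing = sum(1 for v in values if v in (None, ""))
    return {"n": len(values), "missing": missing, "pattern": detect_pattern(values)}

phones = ["415-555-0100", "(206) 555-0199", "", None, "312.555.0142"]
print(profile(phones))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```

A flagged value is only a candidate. Whether 95 is a bad data point or a true outlier is a judgment about the domain, not the arithmetic.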
  • 8. Let's start … Question: What is a data scientist? Data: scraped indeed.com for all jobs containing “data” in the title (~5,000 jobs). Metadata: job title, job description, city, state. The job description is unstructured text.
  • 9. Job title, job description → text cleaning + bag of words
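A minimal sketch of "text cleaning + bag of words": lowercase, strip HTML tags, keep alphabetic tokens, drop stop words, then count tokens per document over a shared vocabulary. The short stop-word list and the two description snippets are assumptions for illustration.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; real projects usually use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "with", "for", "or", "is"}

def clean(text):
    """Lowercase, strip HTML tags, keep alphabetic tokens, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Build a shared sorted vocabulary and one count vector per document."""
    counts = [Counter(clean(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]

descriptions = [  # hypothetical job-description snippets
    "<p>Experience with Python and SQL.</p>",
    "Build predictive models in Python; statistics a plus.",
]
vocab, vectors = bag_of_words(descriptions)
print(vocab)
print(vectors)
```

Note that the `[a-z]+` tokenizer splits or drops skill names such as "C++" or "D3.js". For a skills analysis, a custom token pattern is worth the extra effort.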
  • 10. What are the data jobs? Example description for 'data_architect': participates in the evaluation of hardware and software platforms and integrating systems as they relate to the data architecture; participates in the selection of application packages, agency services, and technology/infrastructure capabilities to ensure alignment with the data architecture; works in an environment that includes data modeling, data design, metadata and repository creation; reviews object and data models and the metadata repository to structure the data for better management and quicker access; plays a liaison role with business data owners/stewards. Job title disambiguation: {'data_integration_architect', 'data_architect', 'data_warehouse_architect', 'data_warehouse_lead', 'sr_data_architect'} → data architect. Title categories: data architect, data scientist, data engineer, data entry, database developer, data analyst. Algorithm: word2vec synonyms.
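The title set above suggests that each job title was collapsed into a single underscore-joined token (e.g. 'sr_data_architect') so that word2vec could learn one vector per title. The sketch below shows one way such a normalization step could look. The function name and the abbreviation map are hypothetical, not the author's code.

```python
import re

# Hypothetical abbreviation map; extend it with the variants in your own data.
ABBREVIATIONS = {"senior": "sr", "junior": "jr", "sr.": "sr", "jr.": "jr"}

def title_token(title):
    """Turn a free-text job title into one underscore-joined token."""
    words = re.findall(r"[a-z.]+", title.lower())
    # Map known abbreviations, otherwise strip stray periods.
    words = [ABBREVIATIONS.get(w, w.strip(".")) for w in words]
    return "_".join(w for w in words if w)

for raw in ["Sr. Data Architect", "Senior Data Architect", "Data Warehouse Lead"]:
    print(raw, "->", title_token(raw))
```

Mapping spelling variants to one token before training keeps them from being split across separate, sparsely trained vectors. Word2vec can then group the remaining true synonyms.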
  • 11. Where are the data jobs?
  • 12. Job title, job description → text cleaning + bag of words
  • 14. What do the job descriptions mean? (Terms shown: degree, report, team, written.) Algorithm: word2vec synonyms
  • 17. Statistics (word2vec nearest neighbors, similarity): mathematics 0.854, economics 0.838, applied 0.830, physics 0.821, math 0.804, quantitative 0.800, phd 0.795, fields 0.749, science 0.723, masters 0.705
  • 18. Regression (word2vec nearest neighbors, similarity): segmentation 0.704, statistical 0.688, mining 0.680, graph 0.670, algorithm 0.670, theory 0.656, predictive 0.647, matlab 0.636, recommendation 0.620, analyses 0.612
  • 19. Graph (word2vec nearest neighbors, similarity): text 0.759, manipulating 0.717, visualization 0.708, matlab 0.706, mining 0.701, unstructured 0.687, regression 0.670, algorithms 0.669, natural 0.663, engines 0.663
  • 20. Visualization (word2vec nearest neighbors, similarity): tableau 0.720, graph 0.708, matlab 0.699, libraries 0.682, visualizations 0.675, mining 0.652, spss 0.652, text 0.615, qlikview 0.605, js 0.596
  • 21. Machine Learning (word2vec nearest neighbors, similarity): learning 0.834, algorithms 0.766, natural 0.716, physics 0.708, mining 0.697, ideally 0.668, graph 0.660, predictive 0.656, applied 0.653, statistics 0.650
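The neighbor lists above are what a word2vec "most similar" query returns. In gensim this is `model.wv.most_similar(word, topn=10)`. Underneath, every other word's vector is ranked by cosine similarity to the query word's vector. The sketch below reproduces that ranking in plain Python on hypothetical 3-dimensional vectors. Real word2vec vectors typically have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings; real word2vec vectors are learned from text.
VECTORS = {
    "statistics": [0.9, 0.1, 0.0],
    "mathematics": [0.8, 0.2, 0.1],
    "economics": [0.7, 0.3, 0.0],
    "tableau": [0.1, 0.9, 0.2],
    "hadoop": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

for w, score in most_similar("statistics", VECTORS):
    print(f"{w:<12} {score:.3f}")
```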
  • 22. Fun with words (word2vec vector arithmetic): data + engineer - software = {cleansing, analyst, modeler, scientist}; python + ruby - html = {perl, scala, bash, scripting}; storm + hadoop - scripting = {hive, hbase, spark, pig}; visualizations + algorithms - predictive = {backend, libraries, js, jquery}
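Word equations like "python + ruby - html" are typically answered by adding and subtracting unit-normalized word vectors, then returning the words closest (by cosine) to the result, excluding the input words. This is similar in spirit to gensim's `most_similar(positive=[...], negative=[...])`. The sketch below uses hypothetical toy vectors chosen for illustration. Its outputs are not the deck's results.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def analogy(positive, negative, vectors, topn=3):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dims = len(next(iter(vectors.values())))
    query = [0.0] * dims
    for w in positive:
        query = [q + x for q, x in zip(query, normalize(vectors[w]))]
    for w in negative:
        query = [q - x for q, x in zip(query, normalize(vectors[w]))]
    query = normalize(query)
    excluded = set(positive) | set(negative)  # never return an input word
    scores = [
        (w, sum(a * b for a, b in zip(query, normalize(v))))
        for w, v in vectors.items()
        if w not in excluded
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical 3-d vectors: roughly "scripting", "web markup", "JVM/big data".
VECTORS = {
    "python": [1.0, 0.2, 0.0],
    "ruby": [0.9, 0.4, 0.0],
    "html": [0.0, 1.0, 0.0],
    "perl": [1.0, 0.0, 0.1],
    "css": [0.0, 0.9, 0.1],
    "scala": [0.5, 0.0, 0.9],
}
print(analogy(["python", "ruby"], ["html"], VECTORS))
```

Subtracting "html" removes the web-markup direction shared by python and ruby, so the closest remaining word is another general-purpose scripting language.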
  • 24. Digging deeper … • Create a data story, i.e., put all the visualizations and insights in a dashboard, or create an infographic using Tableau, d3, … • Get data (say from Crunchbase) on the companies that are hiring and figure out which industries dominate the data world • Get data for at least the past 6 months and compute exact statistics for skills in the data world, using advanced text analytics (bi-gram and tri-gram modeling, topic modeling) • Create an app that tells you how “hot” your skills are and which skills are easiest for you to acquire to become “hotter”.
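Bi-gram and tri-gram modeling starts with counting consecutive token sequences. This catches multi-word skills like "machine learning" that a bag of single words splits apart. The sketch assumes the descriptions have already been cleaned into token lists, and the documents shown are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return consecutive n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(docs, n, k=3):
    """Count n-grams across pre-cleaned token lists and return the k most common."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

docs = [  # hypothetical cleaned job descriptions
    ["machine", "learning", "python", "sql"],
    ["experience", "machine", "learning", "models"],
    ["natural", "language", "processing", "machine", "learning"],
]
print(top_ngrams(docs, 2, k=1))
```

Tracking these counts month by month over the six-month window gives a simple measure of how "hot" a skill is becoming.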
  • 25. Right questions? Take different slices of the data and look for patterns that might be interesting to you. Retail: What affects customers’ shopping habits? What are the control variables? Are promos and discounts influencing any of these habits? Crime: What is the sequence of crimes that happen every day? Do initiatives led by government or non-profit organizations have an effect on certain crime rates? Education: Does regular feedback to parents about their children’s education have an effect on the children’s grades or engagement? Healthcare: Does sending preventive care emails reduce knee surgeries?
  • 27. Interview Process • 3-5 hours long • 4-6 interviewers, depending on company size • Statistics whiteboarding … A/B testing calculations • Formulating a machine learning use case relevant to the company, with parameter tuning and edge cases • An open question that the team is trying to solve • CS algorithms … Cracking the Coding Interview • Databases, SQL queries … http://deblivingdata.net/wp-content/uploads/2014/05/DSTalk.slides.html
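A common whiteboard A/B testing calculation asks whether variant B's conversion rate differs from A's. The sketch below uses a two-sided two-proportion z-test with a pooled standard error, one standard approach among several. The traffic and conversion numbers are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled proportion)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical test: 4.0% vs 5.0% conversion on 5,000 visitors each.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here z is about 2.41 and p is about 0.016, so the lift would be significant at the 5% level. Interviewers often follow up on sample size, multiple comparisons, and stopping the test early.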
  • 28. Now that you have landed the job …
  • 29. References • Data sources: data.gov, Kaggle, open city data • Volunteering opportunities: DataKind, Bayes Impact, Data for Good • DS schools: Insight Data Science, Zipfian Academy, … • sqlzoo.net • meetup.com