Data Scientist has been called the sexiest job of the twenty-first century. As data in every industry keeps growing, the need to organize, explore, analyze, predict and summarize is insatiable. Data science is creating new paradigms in data-driven business decisions. As the field emerges from its infancy, a wide range of skill sets are becoming an integral part of being a data scientist. In this talk I will discuss the different data-driven roles and the expertise required to be successful in them. I will highlight some of the unique challenges and rewards of working in a young and dynamic field.
From Rocket Science to Data Science
1. From Rocket Science to
Data Science
Sanghamitra Deb
Data Scientist, Accenture Tech Lab
2. Sexiest Job of the 21st century
Nate Silver predicted
correctly how all 50
states would go in the
presidential election
2012
Target predicted teen
pregnancy from retail data.
3. The Big Data Challenge
“With the need for data
scientists growing at about
3x those for statisticians
and BI analysts … and an
anticipated 100,000+
person analytic talent
shortage through 2020 …”
Gartner article
“… three core data science skills: data management, analytics modeling and
business analysis. But beyond these, there’s an art to data science. We detail several
soft skills that our research showed are also critical to success, i.e., communication,
collaboration, leadership, creativity, discipline and passion (for information and
truth).”
4. Who are you?
• Front-end engineer, UX/UI
• Backend engineer
• Project manager
• Academic (PhD, physics, neuroscience, economics, CS
… ) trying to find a niche in the tech industry
• Quantitative background, curiosity and ability to
understand business needs.
Start a data-driven project relevant to the industry you want to join
5. Where to start
blogs: yhat, DataRobot, DataTau,
Upshot …
twitter: follow data science
news …
Data Exploration/Discovery …
open a dataset in your favorite
coding language: Python, R,
Scala, Julia, …
Learn to pipe data into a
database such as MySQL/
MongoDB
Kaggle competitions, live and
older ones … e.g.: digit
recognition, titanic
Data Frameworks: Apache Spark.
Do a few online courses on data science, big data,
machine learning, Python, R, … from Coursera, Udemy,
Khan Academy, … form study groups, go to meetups.
pros: DIY, bite-size videos, flexibility, discussion
forums, interactivity, great way to figure out if a new field
is interesting.
cons: DIY, choosing the correct course, signing up and
not participating after the first few weeks.
6. Small Data Project Flow
Get open source data.
Sources: city data
(San Francisco, LA, Seattle,
Chicago, transit data, …)
Load it up in
Python; if the data
is too big I put it
in MySQL (for
structured data) or
MongoDB for free-form
JSON.
Machine Learning, Statistics:
counting statistics and
histograms are very powerful. If
you are a Python user, data frameworks such
as “GraphLab” are open source & easy to
learn.
Create a dashboard/
viz/app
Ask the right
Question!!!
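The flow above (get data, load it in Python, push structured rows into a SQL database, then count and histogram) can be sketched in a few lines. This is only an illustration under stated assumptions: the CSV content is invented, the table and column names are hypothetical, and the standard-library sqlite3 module stands in for MySQL so the example runs without a server.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical stand-in for an open city data export (invented rows).
RAW_CSV = """incident_id,category,district
1,theft,north
2,assault,south
3,theft,south
4,vandalism,north
5,theft,north
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

def store_rows(conn, rows):
    """Pipe structured rows into a SQL table (sqlite3 standing in for MySQL)."""
    conn.execute(
        "CREATE TABLE incidents (incident_id INTEGER, category TEXT, district TEXT)"
    )
    conn.executemany(
        "INSERT INTO incidents VALUES (:incident_id, :category, :district)", rows
    )
    conn.commit()

def count_by_category(conn):
    """Counting statistics in SQL: number of incidents per category."""
    query = (
        "SELECT category, COUNT(*) FROM incidents "
        "GROUP BY category ORDER BY COUNT(*) DESC, category"
    )
    return conn.execute(query).fetchall()

def text_histogram(counts):
    """Render (label, count) pairs as a plain-text histogram."""
    return "\n".join(f"{label:<10} {'#' * n} {n}" for label, n in counts)

rows = load_rows(RAW_CSV)
conn = sqlite3.connect(":memory:")
store_rows(conn, rows)
print(text_histogram(count_by_category(conn)))

# The same kind of count done directly in Python, without a database.
print(Counter(row["district"] for row in rows))
```

For a small file the Counter alone is enough; the database step only pays off once the data no longer fits comfortably in memory.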
7. Data Wrangling/Cleaning
• Open your data set and profile it
• Look for missing data, bad data
points vs true outliers
• Pattern of your data: is it a
phone number, timestamps or a
social security number? Is it
structured data or unstructured
text?
• Prep your data, identify the
features that influence your
outcome, feature selection and
feature engineering.
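A minimal profiling sketch for the checks listed above: classify each raw value by pattern (phone number, social security number, timestamp, missing, free text) and flag numeric outlier candidates. The regular expressions, the list of "missing" markers, and the 1.5 x IQR rule are all assumptions chosen for illustration; real data usually needs looser or domain-specific patterns.

```python
import re
import statistics
from collections import Counter

# Illustrative US-style patterns; adjust for the data at hand.
PATTERNS = {
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "timestamp": re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(:\d{2})?$"),
}
# Assumed spellings of "no value" in the raw file.
MISSING = {"", "na", "n/a", "null", "none"}

def classify(value):
    """Return the pattern name a raw string matches, 'missing', or 'text'."""
    text = value.strip()
    if text.lower() in MISSING:
        return "missing"
    for name, pattern in PATTERNS.items():
        if pattern.match(text):
            return name
    return "text"

def profile_column(values):
    """Count how many values in a column fall into each class."""
    return Counter(classify(v) for v in values)

def flag_outliers(numbers, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in numbers if x < low or x > high]

column = ["415-555-0100", "N/A", "123-45-6789", "2014-05-01 13:45", "call later"]
print(profile_column(column))
print(flag_outliers([10, 12, 11, 13, 12, 11, 95]))
```

A flagged value is only a candidate: deciding whether it is a bad data point or a true outlier still takes a look at the record and some domain knowledge.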
8. Let's start …
Question: What is a Data Scientist?
Data: scraped indeed.com for all jobs containing
“data” in the title. ~5000 jobs …
Metadata: job title, job description, city, state
Job description: unstructured text …
10. What are the data jobs?
participates in evaluation of hardware and software platforms and
integrating systems as they relate to the data architecture
participates in selection of application packages, agency services,
and technology/infrastructure capabilities to ensure alignment to
data architecture works in an environment, which includes data
modeling, data design, metadata and repository creation reviews
object and data models and the metadata repository to structure
the data for better management and quicker access plays a
liaison role with business data owner/stewards
'data_architect' (title) + description
Job title disambiguation:
{'data_integration_architect',
'data_architect',
'data_warehouse_architect',
'data_warehouse_lead',
'sr_data_architect'}
data architect
data scientist
data engineer
data entry
database developer
data analyst
Algorithm: word2vec synonym
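One way to read the slide: each job title is collapsed into a single token such as 'data_architect', that token is prepended to the description's words, and word2vec is trained on the result so that similar titles end up as synonyms. The sketch below covers only the preprocessing and the final lookup. The word2vec training itself is omitted, the synonym group is copied from the slide and hard-coded, and the function names are hypothetical.

```python
import re

# Synonym group copied from the slide. In the talk it came out of word2vec;
# here it is hard-coded purely for illustration.
SYNONYMS = {
    "data_architect": {
        "data_integration_architect",
        "data_architect",
        "data_warehouse_architect",
        "data_warehouse_lead",
        "sr_data_architect",
    }
}

def words(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def title_token(title):
    """Collapse a job title into one token, e.g. 'Sr. Data Architect' -> 'sr_data_architect'."""
    return "_".join(words(title))

def training_sentence(title, description):
    """Title token followed by description words: one input sentence for an embedding model."""
    return [title_token(title)] + words(description)

def canonical_title(title):
    """Map a title to its canonical form if it belongs to a known synonym group."""
    token = title_token(title)
    for canonical, variants in SYNONYMS.items():
        if token in variants:
            return canonical
    return token

print(training_sentence("Data Architect", "Reviews object and data models"))
print(canonical_title("Data Warehouse Lead"))
print(canonical_title("Data Scientist"))
```

Fusing the title into one token keeps "data architect" from being split into two unrelated words, so the model can learn a vector for the role itself.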
24. Digging deeper …
• Create a data story, i.e. put all the visualizations and insights in
a dashboard, or create an infographic using Tableau, D3, …
• Get data (say from Crunchbase) on the companies that are
hiring and figure out which industries dominate in the data
world
• Get data for at least the past 6 months and have exact
statistics for skills in the data world. Advanced text analytics
(bi-gram, tri-gram modeling, topic modeling)
• Create an app that tells you how “hot” your skills are and
what skills are easiest for you to acquire to become “hotter”.
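The bi-gram and tri-gram statistics mentioned above come down to counting adjacent word sequences in the job descriptions. A minimal sketch, assuming a simple regex tokenizer and three invented descriptions (not taken from the scraped data):

```python
import re
from collections import Counter

# Invented job-description snippets, for illustration only.
DESCRIPTIONS = [
    "Experience with machine learning and SQL",
    "Machine learning models in Python and SQL",
    "Build data pipelines with Python and SQL",
]

def tokenize(text):
    """Lowercase and split into words, keeping '+' and '#' for skills like C++ or C#."""
    return re.findall(r"[a-z0-9+#]+", text.lower())

def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(descriptions, n):
    """Count n-grams per description so that no n-gram spans two postings."""
    counts = Counter()
    for text in descriptions:
        counts.update(ngrams(tokenize(text), n))
    return counts

bigrams = ngram_counts(DESCRIPTIONS, 2)
trigrams = ngram_counts(DESCRIPTIONS, 3)
print(bigrams.most_common(3))
print(trigrams.most_common(1))
```

On real postings, a stop-word filter helps a lot; otherwise pairs such as "and sql" crowd out the skill phrases that matter.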
25. Right questions?
Take different slices of the data and look for patterns that
might be interesting to you.
Retail: What affects customers' shopping habits?
What are the control variables? Are promos and discounts influencing any of these habits?
Crime: What is the sequence of crimes that happen every day? Do initiatives led
by government or non-profit organizations have an effect on certain crime rates?
Education: Does regular feedback to parents about their children’s education
have an effect on the grades or engagement of the children?
Healthcare: Does sending preventive care emails reduce knee surgeries?
27. Interview Process
• 3-5 hours long
• Depending on company size, 4-6 people
• Statistics whiteboarding … A/B testing
calculations
• Formulation of a machine learning use case
with parameter tuning, edge cases relevant
to the company
• Open question that the team is trying to solve
• CS algorithms … Cracking the Coding
Interview.
• Databases, SQL queries …
http://deblivingdata.net/wp-content/uploads/2014/05/DSTalk.slides.html
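As an example of the kind of A/B testing calculation that comes up on the whiteboard, here is a sketch of a two-proportion z-test with a pooled standard error. The choice of this particular test and the conversion numbers are assumptions made for illustration; the slides do not say which calculations are asked.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B.

    conv_* are conversion counts and n_* are visitor counts.
    Returns (z, p_value), using the pooled rate for the standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 100/1000 conversions for A, 130/1000 for B.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The normal approximation behind this test is reasonable when both groups have a decent number of conversions and non-conversions; with very small counts an exact test is the safer choice.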
29. References
• Data Sources: data.gov, Kaggle, open city data
• Volunteering opportunities: DataKind, Bayes Impact,
Data for Good
• DS Schools: Insight Data Science, Zipfian
Academy, …
• sqlzoo.net
• meetup.com