Predict oscars (4:17)

Predicting the Oscars with Data Science
http://bit.ly/tf-predict-oscars

About me
• Jasjit Singh
• Self-taught developer
• Worked in finance & tech
• Co-Founder Hotspot
• Thinkful General Manager

About us
Thinkful prepares students for web development &
data science jobs with 1-on-1 mentorship programs

What’s your background?
• I have a software background
• I have a math or stats background
• None of the above

Data Science Process
• Frame the question.
• Collect the raw data.
• Process the data.
• Explore the data.
• Communicate results.

Frame the question
• Who will win the Oscar for Best Picture?

Collect the Data
• What kind of data do we need?
• Financial data (Budget, box office…)
• Reviews, ratings and scores.
• Awards and nominations.

Process the data
• How is the data “dirty” and how can we fix it?
• User input, redundancies, missing data…
• Formatting: adapt the data to meet certain
specifications.
• Cleaning: detecting and correcting
corrupt or inaccurate records.
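
A minimal sketch of the formatting and cleaning steps above, using pandas on a small made-up table. The column names and values are hypothetical, not the workshop's dataset, and filling a missing budget with the median is only one possible choice.

```python
import pandas as pd

# Hypothetical raw records: a duplicate row, a missing value, and
# budgets stored as text with currency formatting.
raw = pd.DataFrame({
    "movie": ["Film A", "Film A", "Film B", "Film C"],
    "budget": ["$10,000,000", "$10,000,000", "$25,500,000", None],
})

# Cleaning: drop redundant (duplicate) rows.
clean = raw.drop_duplicates().copy()

# Formatting: strip "$" and "," so the budget can be parsed as a number.
clean["budget"] = pd.to_numeric(
    clean["budget"].str.replace("[$,]", "", regex=True)
)

# Cleaning: fill the missing budget with the median of the known budgets.
clean["budget"] = clean["budget"].fillna(clean["budget"].median())

print(clean)
```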

Explore the data
• What are the meaningful patterns in the
data?
• How meaningful is each data point for our
predictions?

Goals
• Introduction to a data scientist's tools and
methods:
• Jupyter notebooks, numpy, pandas,
sklearn…
• Overview of basic machine learning
concepts:
• Data formatting and cleaning, Decision
trees, Overfitting, Random Forests…

Jupyter Notebooks
• One of a data scientist’s everyday tools.
• Find the links in our classroom tool.
• Contains cells with code.

NumPy
• The fundamental package for scientific
computing with Python.
• Provides powerful multi-dimensional array
objects.
• Many methods for fast operations on arrays.
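
A short sketch of what these array operations might look like. The numbers are invented for illustration (budget and box office in millions of dollars).

```python
import numpy as np

# A 2-D array: each row is a (hypothetical) movie, the columns are
# budget and box office in millions of dollars.
money = np.array([[10.0, 40.0],
                  [25.0, 50.0],
                  [5.0, 30.0]])

print(money.shape)  # (number of rows, number of columns)

# Vectorized operation: box office divided by budget for every row at once,
# with no explicit Python loop.
ratio = money[:, 1] / money[:, 0]
print(ratio)

# Aggregation along an axis: the mean of each column.
column_means = money.mean(axis=0)
print(column_means)
```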

Pandas
• Fundamental high-level building block for
doing practical, real world data analysis in
Python.
• Built on top of NumPy.
• Offers data structures and operations for
manipulating numerical tables and time
series.
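
As a rough illustration of these data structures, the sketch below builds a small DataFrame, filters it, and aggregates it. The table contents are made up for the example.

```python
import pandas as pd

# Hypothetical table: one row per movie.
movies = pd.DataFrame({
    "movie": ["Film A", "Film B", "Film C", "Film D"],
    "year": [2014, 2014, 2015, 2015],
    "box_office": [40.0, 50.0, 30.0, 80.0],
})

# Filter rows with a boolean condition.
hits = movies[movies["box_office"] >= 50.0]

# Group and aggregate: average box office per year.
per_year = movies.groupby("year")["box_office"].mean()

print(hits)
print(per_year)
```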

Scikit-learn
• Python module for machine learning, built on
top of NumPy and SciPy.
• Provides tools for classification, regression,
clustering, model selection, preprocessing,
etc.

Initial imports and loading data with Pandas
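
A minimal sketch of what this step might look like. The column names and rows are hypothetical, and an in-memory buffer stands in for the workshop's CSV file so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier

# Stand-in for a CSV file on disk. With a real file you would pass its
# path or URL to pd.read_csv instead of this buffer.
csv_text = io.StringIO(
    "movie,rate,nominations,winner\n"
    "Film A,PG-13,8,1\n"
    "Film B,R,5,0\n"
    "Film C,PG,3,0\n"
)

train = pd.read_csv(csv_text)
print(train.shape)
print(train.dtypes)
```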

Understanding your data
• .head(n) method: Returns the first n rows.
• .value_counts() method: Returns the count
of each unique value in a column.
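
A small sketch of both methods on a made-up DataFrame (the "rate" column here is a hypothetical stand-in for the film rating).

```python
import pandas as pd

train = pd.DataFrame({
    "movie": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "rate": ["R", "PG-13", "R", "PG", "R"],
})

# First two rows of the table.
first_two = train.head(2)
print(first_two)

# How often each rate occurs, most frequent first.
rate_counts = train["rate"].value_counts()
print(rate_counts)
```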

Formatting your Data
• Rate values are in a non-numeric format, so
we need to assign each rate a unique
integer that the model can handle.
• With the .loc indexer (the replacement for
.ix, which was removed in pandas 1.0) you
select a subset of rows and assign a value
to a chosen column of that subset.
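
A sketch of this encoding step, assuming a hypothetical "rate" column. It uses .loc because .ix was removed in pandas 1.0. The particular integer codes are arbitrary, and the .map call at the end is an alternative way to get the same result in one step.

```python
import pandas as pd

train = pd.DataFrame({"rate": ["G", "PG", "PG-13", "R", "PG"]})

# Select the rows matching each rate and assign an integer code
# to a new column for just those rows.
train["rate_code"] = -1
train.loc[train["rate"] == "G", "rate_code"] = 0
train.loc[train["rate"] == "PG", "rate_code"] = 1
train.loc[train["rate"] == "PG-13", "rate_code"] = 2
train.loc[train["rate"] == "R", "rate_code"] = 3

# Alternative: map every value through a dictionary.
codes = {"G": 0, "PG": 1, "PG-13": 2, "R": 3}
mapped = train["rate"].map(codes)

print(train)
```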

Decision Trees
• A decision tree breaks down a dataset into
smaller and smaller subsets.
• The final result is a model with a tree
structure that has:
• Decision nodes: ask a question and have
two or more branches.
• Leaf nodes: represent a classification or
decision.

Classification vs Regression
• Classification: predict categories.
• Identifying group membership.
• Regression: predict values.
• Involves estimating or predicting a
response.
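
To make the distinction concrete, here is a toy sketch with scikit-learn's two decision tree estimators. The single feature (number of nominations) and both targets are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical feature: number of nominations for four movies.
X = [[1], [2], [3], [4]]

# Classification: the target is a category (1 = won, 0 = did not win).
clf = DecisionTreeClassifier(random_state=0).fit(X, [0, 0, 1, 1])

# Regression: the target is a number (say, box office in millions).
reg = DecisionTreeRegressor(random_state=0).fit(X, [10.0, 20.0, 30.0, 40.0])

predicted_class = clf.predict([[4]])[0]
predicted_value = reg.predict([[4]])[0]
print(predicted_class, predicted_value)
```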

Creating your first Decision Tree
You will use the scikit-learn and numpy
libraries to build your first decision tree. We
will need the following to build a decision tree:
• target: A one-dimensional numpy array
containing the target from the train data.
• features: A multidimensional numpy array
containing the features/predictors from the
train data.

Creating your first Decision Tree
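
A minimal sketch of building the target and features arrays and fitting a tree. The training table and its column names are hypothetical, not the workshop's dataset.

```python
import numpy as np
import pandas as pd
from sklearn import tree

# Hypothetical training data: one row per nominated movie.
train = pd.DataFrame({
    "nominations": [8, 5, 3, 10, 2, 7],
    "rating": [8.1, 7.0, 6.5, 8.5, 6.0, 7.9],
    "winner": [1, 0, 0, 1, 0, 1],
})

# target: one-dimensional array with the value we want to predict.
target = train["winner"].values

# features: two-dimensional array with one column per predictor.
features = train[["nominations", "rating"]].values

# Fit the decision tree on the training data.
my_tree = tree.DecisionTreeClassifier(random_state=1)
my_tree = my_tree.fit(features, target)

# Predict for a new, unseen movie (9 nominations, rating 8.3).
prediction = my_tree.predict(np.array([[9, 8.3]]))
print(prediction)
```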

Importances and Score
• .feature_importances_ attribute: tells us
how important each feature is for the final
result.
• .score() method: returns the mean accuracy
of the fitted model on the given data.
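
A short sketch of reading both from a fitted tree, again on made-up data. Note that the score here is computed on the same data the tree was trained on, so it says little about how well the model generalizes.

```python
import pandas as pd
from sklearn import tree

# Hypothetical training data.
train = pd.DataFrame({
    "nominations": [8, 5, 3, 10, 2, 7],
    "rating": [8.1, 7.0, 6.5, 8.5, 6.0, 7.9],
    "winner": [1, 0, 0, 1, 0, 1],
})
feature_names = ["nominations", "rating"]
features = train[feature_names].values
target = train["winner"].values

my_tree = tree.DecisionTreeClassifier(random_state=1).fit(features, target)

# One importance per feature; the values are normalized to sum to 1.
importances = my_tree.feature_importances_
for name, importance in zip(feature_names, importances):
    print(name, importance)

# Mean accuracy on the data passed in (here: the training data).
train_accuracy = my_tree.score(features, target)
print(train_accuracy)
```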

Pretty bad results :(
Let’s improve it!

Overfitting
• The resulting model is too closely tied to
the training set.
• It doesn’t generalize to new data, which is
the point of prediction.
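
One way to see this effect is to compare accuracy on the training data with accuracy on held-out data. The sketch below uses synthetic data from scikit-learn rather than the Oscars dataset, and the max_depth and min_samples_split values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise: flip_y assigns a random class to
# roughly 20% of the samples.
X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# An unconstrained tree keeps splitting until it fits the training set.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A constrained tree is forced to stay simpler.
shallow = DecisionTreeClassifier(max_depth=3, min_samples_split=5,
                                 random_state=0).fit(X_train, y_train)

deep_train = deep.score(X_train, y_train)
deep_test = deep.score(X_test, y_test)
shallow_train = shallow.score(X_train, y_train)
shallow_test = shallow.score(X_test, y_test)

print("deep:    train", deep_train, "test", deep_test)
print("shallow: train", shallow_train, "test", shallow_test)
```

A large gap between the training and test accuracy of the unconstrained tree is the typical sign of overfitting.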

Random Forest Classifier
• Random Forest Classifiers use many
Decision Trees to build a classifier.
• We introduce a bit of randomness.
• Each Tree can give a different answer (a
vote). The final classification is the most
common amongst the Trees.

Predicting with Random Forest Classifiers
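
A minimal sketch of fitting a forest and predicting with it, using synthetic data in place of the Oscars dataset. The hyperparameter values are arbitrary. One detail worth knowing: scikit-learn's implementation combines the trees by averaging their predicted class probabilities rather than by counting hard votes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 400 samples, 10 features, two classes.
X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# n_estimators is the number of trees. Each tree is trained on a random
# bootstrap sample and considers a random subset of features at each split.
forest = RandomForestClassifier(n_estimators=100, max_depth=5,
                                random_state=1)
forest = forest.fit(X_train, y_train)

# Predict classes for data the forest has not seen.
predictions = forest.predict(X_test)
test_accuracy = forest.score(X_test, y_test)
print(predictions[:10])
print(test_accuracy)
```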

The End
Nothing happened after that.
Right?? RIGHT??

We can predict the Oscars
Except for 2017 ¯_(ツ)_/¯

More about Thinkful
• Anyone who’s committed can learn to code
• 1-on-1 mentorship is the best way to learn
• Flexibility! Learn anywhere, anytime, & at your
own pace

Our Program
You’ll learn concepts, practice with drills, and build
capstone projects — all guided by a personal mentor

Our Mentors
Mentors have, on average, 10+ years of experience

Data Science Syllabus
• Managing data with SQL and Python
• Modeling with both supervised and unsupervised
models
• Data visualization and communicating with data
• Technical interviews + Career prep

Our Results
[Charts: Job Titles after Graduation; Months until Employed]

Special Introductory Offer
• Prep course for 50% off — $250 instead of $500
• Covers math, stats, Python, and data science toolkit
• Option to continue into full program
• Talk to me (or email me) if you’re interested

October 2015
Questions?
jas@thinkful.com
schedule a call through thinkful.com
