This Data Science with Python presentation will help you understand what Data Science is, the basics of Python for data analysis, why you should learn Python, how to install Python, Python libraries for data analysis, exploratory analysis using Pandas, Series and DataFrames, the loan prediction problem, data wrangling using Pandas, building a predictive model using Scikit-Learn, and implementing a logistic regression model in Python. This video aims to give beginners who are new to Python for data analysis a comprehensive overview of the basic concepts they need to learn. Now, let us understand how Python is used in Data Science for data analysis.
This Data Science with Python presentation will cover the following topics:
1. What is Data Science?
2. Basics of Python for data analysis
- Why learn Python?
- How to install Python?
3. Python libraries for data analysis
4. Exploratory analysis using Pandas
- Introduction to series and dataframe
- Loan prediction problem
5. Data wrangling using Pandas
6. Building a predictive model using Scikit-learn
- Logistic regression
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you'll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the top-ranking role in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with Python certification training course. With the Simplilearn Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques.
Learn more at: https://www.simplilearn.com
Data Science With Python | Python For Data Science | Python Data Science Course | Simplilearn
2. What’s in it for you?
What is Data Science?
Basics of Python for Data Analysis
Why learn Python?
How to Install Python?
Python Libraries for Data Analysis
Exploratory analysis using Pandas
Introduction to series and data frame
Loan Prediction Problem
Data Wrangling using Pandas
Building a Predictive Model using Scikit-Learn
Logistic Regression
3. What is Data Science?
Examples:
• Service Planning: Restaurants can predict how many customers will visit on a weekend and plan their food inventory to handle the demand
• Customer Prediction: A system can be trained on customer behavior patterns to predict the likelihood of a customer buying a product
Data Science is about finding and exploring data in the real world, and then using that knowledge to solve business problems
5. Why Python?
Usage statistics from Google Trends show that Python is currently more popular than R or SAS for Data Science!
6. Why Python?
But there are various factors you should consider before deciding which language is best for your data analysis:
Speed, Packages, Design Goal
9. Why Python?
Design Goal:
Python’s syntax rules help in building applications with a concise and readable code base
Packages:
There are numerous packages in Python to choose from, such as pandas to aggregate and manipulate data, and Seaborn or Matplotlib to visualize relational data, to mention a few
Speed:
Studies suggest that Python is faster than several widely used languages. Python can also be sped up further using efficient algorithms and tools, such as vectorized NumPy operations that run in compiled code
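The idea of speeding Python up with better tools can be illustrated with vectorization: a pure-Python loop and a NumPy operation compute the same sum of squares, but NumPy runs its loop in compiled code. This is a minimal sketch assuming NumPy is installed; the function names and array size are illustrative, and the printed timings will vary by machine.

```python
import timeit

import numpy as np

# Illustrative data: 200,000 random floats from a seeded generator
rng = np.random.default_rng(seed=0)
values = rng.random(200_000)


def python_sum_of_squares(data):
    """Pure-Python loop: every iteration runs in the interpreter."""
    total = 0.0
    for x in data:
        total += x * x
    return total


def numpy_sum_of_squares(data):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    return float(np.dot(data, data))


# Both approaches give the same value, up to floating-point rounding
loop_result = python_sum_of_squares(values)
vector_result = numpy_sum_of_squares(values)

# Time one call of each (results depend on the machine)
loop_time = timeit.timeit(lambda: python_sum_of_squares(values), number=1)
vector_time = timeit.timeit(lambda: numpy_sum_of_squares(values), number=1)
print(f"loop: {loop_time:.4f}s, vectorized: {vector_time:.4f}s")
```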
11. Installing Python
• Go to: http://continuum.io/downloads
• Scroll down to download the graphical installer suitable for your operating system
After successful installation, you can launch Jupyter Notebook from Anaconda Navigator.
Anaconda comes with pre-installed libraries.
In this tutorial, we will be working in Jupyter Notebook using Python 3.
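After installing Anaconda, a quick check in a Jupyter cell confirms that you are on Python 3 and that the libraries used in this tutorial import correctly. A minimal sketch, assuming the standard Anaconda distribution, which bundles these packages; the exact version numbers you see will differ.

```python
import sys

import matplotlib
import numpy
import pandas
import scipy
import sklearn

# Confirm that the interpreter is Python 3, as this tutorial assumes
print("Python version:", sys.version.split()[0])

# Record and print the version of each data analysis library
versions = {m.__name__: m.__version__ for m in (numpy, pandas, scipy, sklearn, matplotlib)}
for name, version in versions.items():
    print(f"{name:<12} {version}")
```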
12. Python libraries for Data Analysis
Let’s get to know some important Python libraries for Data Analysis
13. Python libraries for Data Analysis
There are many interesting libraries that have made Python popular with Data Scientists:
14. Python libraries for Data Analysis
• SciPy: the most useful library for a variety of high-level science and engineering modules, such as discrete Fourier transform, linear algebra, optimization and sparse matrices
• Pandas: for structured data operations and manipulations. It is extensively used for data munging and preparation
• NumPy: its most powerful feature is the n-dimensional array. The library also contains basic linear algebra functions, Fourier transforms and advanced random number capabilities
• Matplotlib: for plotting a vast variety of graphs, from histograms to line plots to heat maps
• Scikit-learn: contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction
15. Python libraries for Data Analysis
Additional libraries you might need:
NetworkX & igraph
TensorFlow
BeautifulSoup
os
16. Python libraries for Data Analysis
os for operating system and file operations
NetworkX and igraph for graph-based data manipulation
TensorFlow for machine learning and deep learning
BeautifulSoup for web scraping
17. What is SciPy?
SciPy is a set of scientific and numerical tools for Python
• It currently supports special functions, integration, ordinary
differential equation (ODE) solvers, gradient optimization, and
others
• It has fully-featured versions of the linear algebra modules
• It is built on top of NumPy
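A few of the SciPy capabilities listed above in action: numerical integration, solving a linear system, and a one-dimensional minimization. This is a minimal sketch; the function, equations and objective are illustrative choices, picked so the answers are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: the area under x**2 on [0, 1] is exactly 1/3
area, abs_error = integrate.quad(lambda x: x**2, 0, 1)

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8 (answer: x = 2, y = 3)
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(a, b)

# Optimization: the minimum of (x - 2)**2 + 1 lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

print(area, solution, result.x)
```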
18. What is NumPy?
NumPy is the fundamental package for scientific computing with
Python. It contains:
• Powerful N-dimensional array object
• Tools for integrating C/C++ and Fortran code
• It has useful linear algebra, Fourier transform, and random number
capabilities
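The NumPy features listed above, in a short sketch: an N-dimensional array, element-wise arithmetic, a matrix inverse, a Fourier transform and seeded random numbers. All values are illustrative.

```python
import numpy as np

# N-dimensional array: a 2 x 3 matrix built from the integers 0..5
matrix = np.arange(6).reshape(2, 3)
print(matrix.shape, matrix.ndim)  # (2, 3) 2

# Arithmetic applies element-wise, with no explicit Python loop
doubled = matrix * 2

# Linear algebra: the inverse of a square matrix
square = np.array([[4.0, 7.0], [2.0, 6.0]])
inverse = np.linalg.inv(square)
identity_check = square @ inverse  # approximately the 2 x 2 identity

# Fourier transform of a short signal
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))

# Random numbers from a seeded generator, so results are reproducible
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)
```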
19. What is Pandas?
• The most useful Data Analysis library in Python
• Instrumental in increasing the use of Python in the Data Science community
• It is extensively used for data munging and preparation
Pandas is used for structured data operations & manipulations
20. Exploratory analysis using Pandas
Let’s understand the two most common terms used in Pandas: Series and DataFrame
21. Exploratory analysis using Pandas
Pandas
• Series: a one-dimensional object that can hold any data type, such as integers, floats and strings
• DataFrame: a two-dimensional object whose columns can have potentially different data types
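A minimal sketch of both structures; the names and values are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
ages = pd.Series([25, 32, 47], index=["a", "b", "c"], name="age")

# A DataFrame is two-dimensional; each column can have its own data type
people = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],       # strings
        "age": [25, 32, 47],                   # integers
        "income": [4200.0, 5100.5, 6100.0],    # floats
    }
)

# Selecting a single column of a DataFrame returns a Series
age_column = people["age"]
print(type(age_column).__name__)  # Series
print(people.dtypes)
```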
24. Exploratory analysis using Pandas
Problem Statement: Based on customer data, predict whether a particular customer’s loan
will be approved or not
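A sketch of the first exploratory steps: loading the data and looking at its size, first rows and summary statistics. The CSV below is a small made-up sample, and its column names are assumptions modeled on a typical loan dataset; with a real file you would pass its path to pd.read_csv instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical sample rows; an empty field is read as a missing value (NaN)
csv_text = """Loan_ID,Gender,ApplicantIncome,LoanAmount,Credit_History,Loan_Status
A01,Male,5800,,1.0,Y
A02,Male,4600,128,1.0,N
A03,Female,3000,66,1.0,Y
A04,Male,2600,120,0.0,N
A05,Female,6000,141,1.0,Y
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)       # (number of rows, number of columns)
print(df.head())      # the first five rows
print(df.describe())  # summary statistics of the numeric columns
```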
30. Exploratory analysis using Pandas
Distribution of categorical values (Credit History) using the matplotlib library:
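A minimal sketch of plotting a categorical column's distribution with matplotlib through pandas. The Credit_History values are hypothetical, and the Agg backend is selected only so the script also runs without a display; in Jupyter you can drop that line and the chart appears inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical Credit_History values for seven applicants
credit_history = pd.Series([1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0], name="Credit_History")

# Count how often each category occurs, ordered by category value
counts = credit_history.value_counts().sort_index()

# Bar chart: one bar per category
fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_xlabel("Credit_History")
ax.set_ylabel("Number of applicants")
ax.set_title("Applicants by Credit_History")
fig.savefig("credit_history.png")
```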
31. Exploratory analysis using Pandas
Hence, ‘loanAmount’ and ‘ApplicantIncome’ need data wrangling, as some extreme values are observed.
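One common way to flag extreme values is the 1.5 × IQR rule used for box-plot whiskers, and one common treatment is a log transform, which compresses a long right tail. This is a sketch with hypothetical income values; the presentation does not state which rule or transformation it applies.

```python
import numpy as np
import pandas as pd

# Hypothetical applicant incomes with one extreme value
income = pd.Series([2500, 3000, 3200, 4100, 4500, 5000, 81000], name="ApplicantIncome")

# Flag outliers with the 1.5 * IQR rule (the box-plot whisker rule)
q1, q3 = income.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (income < q1 - 1.5 * iqr) | (income > q3 + 1.5 * iqr)
print(income[is_outlier])

# A log transform pulls the extreme value much closer to the rest
income_log = np.log(income)
```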
32. Data Wrangling using Pandas
Before proceeding further, let’s understand what Data Wrangling is and why we need it.
33. Data Wrangling using Pandas
Data Wrangling: the process of cleaning and unifying messy and complex data sets
• It reveals more information about your data
• It supports decision-making in the organization
• It helps to gather meaningful and precise data for the business
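Handling missing values is a typical wrangling step. A minimal sketch on a small hypothetical frame; filling categorical gaps with the most frequent value and numeric gaps with the median is one common strategy, not necessarily the one used in this presentation.

```python
import numpy as np
import pandas as pd

# Hypothetical loan records with gaps (None / NaN mark missing values)
df = pd.DataFrame(
    {
        "Gender": ["Male", None, "Female", "Male"],
        "LoanAmount": [128.0, np.nan, 66.0, 120.0],
    }
)

# Count the missing values in each column
print(df.isnull().sum())

# Fill the categorical gap with the mode and the numeric gap with the median
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())
```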
36. Data Wrangling using Pandas
You can access the data types of each column in a DataFrame:
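A minimal sketch with hypothetical loan columns: the dtypes attribute reports each column's type, and astype converts a column when the inferred type is not the one you want.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ApplicantIncome": [5849, 4583, 3000],       # hypothetical values
        "LoanAmount": [130.0, 128.0, 66.0],
        "Property_Area": ["Urban", "Rural", "Urban"],
    }
)

# dtypes is a Series mapping each column name to its data type
print(df.dtypes)

# Convert a repetitive text column to the memory-efficient 'category' dtype
df["Property_Area"] = df["Property_Area"].astype("category")
```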
37. Data Wrangling using Pandas
You can perform basic math operations to know more about your data:
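A sketch of basic math operations on a hypothetical frame: summary statistics with describe(), column-wise aggregates, and element-wise arithmetic to derive a new column (the name 'TotalIncome' is illustrative).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ApplicantIncome": [5849, 4583, 3000, 2583],  # hypothetical values
        "CoapplicantIncome": [0, 1508, 0, 2358],
    }
)

# Summary statistics: count, mean, std, min, quartiles and max per column
print(df.describe())

# Column-wise aggregates
print(df.mean())
print(df["ApplicantIncome"].max())

# Element-wise addition of two columns creates a new column
df["TotalIncome"] = df["ApplicantIncome"] + df["CoapplicantIncome"]
```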
38. Data Wrangling using Pandas
You can combine your DataFrames:
Combining DataFrame objects can be done using simple concatenation (provided they have the same columns). In the example, NumPy creates an array of the specified shape filled with random values.
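A minimal sketch of concatenation. It uses NumPy's Generator API (default_rng().random) to fill the arrays with random values; the original slide may have used a different NumPy random function.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Two DataFrames with the same columns, filled with random values
df1 = pd.DataFrame(rng.random((3, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(rng.random((2, 4)), columns=list("ABCD"))

# Stack the rows of both frames; ignore_index renumbers the rows 0..n-1
combined = pd.concat([df1, df2], ignore_index=True)
print(combined.shape)  # (5, 4)
```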
45. Model Building using Scikit-learn
Extracting the variables and then splitting the data into training and test sets:
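A sketch of extracting the feature columns and the target, then splitting them with scikit-learn's train_test_split. The column names and values are hypothetical, and the 70/30 split and random_state are illustrative choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical prepared data: two features and a binary target (1 = approved)
df = pd.DataFrame(
    {
        "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4006, 12841],
        "Credit_History": [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
        "Loan_Status": [1, 0, 1, 1, 1, 1, 1, 0, 1, 0],
    }
)

# Features (X) and dependent variable (y)
X = df[["ApplicantIncome", "Credit_History"]]
y = df["Loan_Status"]

# Hold out 30% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```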
46. Model Building using Scikit-learn
In this case, we will use a Logistic Regression model. Logistic Regression is appropriate when the dependent variable is binary.
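Logistic regression passes a linear combination of the inputs through the logistic (sigmoid) function to get a probability for the positive class, then thresholds that probability. A minimal sketch with made-up coefficients; a fitted model learns these from the data.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Model: P(y = 1 | x) = sigmoid(b0 + b1 * x)
# Hypothetical coefficients, for illustration only
b0, b1 = -4.0, 2.0
x = np.array([0.0, 2.0, 4.0])
probabilities = sigmoid(b0 + b1 * x)

# Predict class 1 when the probability is at least 0.5
predicted_class = (probabilities >= 0.5).astype(int)
```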
47. Model Building using Scikit-learn
Fitting the data to the Logistic Regression model:
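A minimal sketch of fitting scikit-learn's LogisticRegression. Synthetic data from make_classification stands in for the prepared loan features, which are not reproduced here; max_iter=1000 simply gives the solver room to converge.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the loan features)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the model on the training set
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predicted classes (0 or 1) and class-1 probabilities for the test set
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
```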
49. Model Building using Scikit-learn
To describe the performance of the model, let’s build the confusion matrix on the test data:
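A sketch of building the matrix with scikit-learn's confusion_matrix on hypothetical labels. Note scikit-learn's layout: rows are actual classes and columns are predicted classes, so for labels 0 and 1 the flattened cells unpack as TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions (1 = approved, 0 = not approved)
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

# Rows: actual class (0, then 1); columns: predicted class (0, then 1)
cm = confusion_matrix(y_true, y_pred)

# For binary labels, ravel() flattens the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(cm)
```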
50. Model Building using Scikit-learn
Let’s calculate ACCURACY and PRECISION from the confusion matrix, whose cells are:
• True Positive (TP)
• False Positive (FP)
• False Negative (FN)
• True Negative (TN)
51. Model Building using Scikit-learn
• Accuracy
Overall, how often is the classifier correct?
(TP+TN)/total = (103+18)/150 = 121/150 ≈ 0.81
• Precision
When it predicts yes, how often is it correct?
TP/predicted yes = 103/130 ≈ 0.79
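The arithmetic can be checked in a few lines of Python. The slide gives TP = 103, TN = 18, 130 predicted "yes" and 150 test rows; the FP and FN counts in this sketch are derived from those totals rather than read off the slide.

```python
# Values stated on the slide
tp, tn = 103, 18
predicted_yes, total = 130, 150

# The remaining cells follow from the totals
fp = predicted_yes - tp      # predicted "yes" but actually "no"
fn = total - tp - tn - fp    # predicted "no" but actually "yes"

accuracy = (tp + tn) / total  # 121 / 150
precision = tp / (tp + fp)    # 103 / 130

print(f"accuracy = {accuracy:.4f}, precision = {precision:.4f}")
# accuracy = 0.8067, precision = 0.7923
```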
52. Model Building using Scikit-learn
We can also compute the accuracy directly using a Python module:
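One way to get the same metrics is scikit-learn's accuracy_score and precision_score; with a fitted model you would pass y_test and the model's predictions. This sketch instead rebuilds label lists that reproduce the slide's counts (TP = 103 and TN = 18, with FP = 130 - 103 = 27 and FN = 150 - 103 - 18 - 27 = 2 derived from the totals), so the results can be compared with the hand calculation.

```python
from sklearn.metrics import accuracy_score, precision_score

# 1 = "yes", 0 = "no"; blocks in the order TP, FP, FN, TN
y_true = [1] * 103 + [0] * 27 + [1] * 2 + [0] * 18
y_pred = [1] * 103 + [1] * 27 + [0] * 2 + [0] * 18

acc = accuracy_score(y_true, y_pred)     # fraction of correct predictions
prec = precision_score(y_true, y_pred)   # TP / (TP + FP)
print(acc, prec)
```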
54. Summary
• Data Science and its popularity with Python
• Data analysis libraries in Python
• Series and DataFrame in Pandas
• Exploratory analysis
• Data wrangling
• Logistic Regression using scikit-learn