Data Science for Advanced Dummies
Introduction to Big Data
What is Big Data?
What makes data “Big” Data?
Big Data Definition
• No single standard definition…
“Big Data” is data whose scale, diversity, and complexity require new architecture,
techniques, algorithms, and analytics to manage it and extract value and hidden
knowledge from it…
Characteristics of Big Data:
1-Scale (Volume)
• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes to 35 zettabytes (35 / 0.8 ≈ 44)
• Data volume is increasing exponentially
Figure: Exponential increase in collected/generated data
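Exponential growth means a roughly constant yearly multiplier. As a quick sanity check of the slide's numbers, the sketch below computes the overall growth factor and the constant annual rate it implies. The function names are illustrative, and the constant-rate model is a simplifying assumption, not how the original estimate was produced.

```python
# Figures from the slide: 0.8 ZB in 2009, 35 ZB in 2020.
ZB_2009 = 0.8
ZB_2020 = 35.0
YEARS = 2020 - 2009  # 11 yearly steps


def growth_factor(start, end):
    """How many times larger `end` is than `start`."""
    return end / start


def annual_growth_rate(start, end, years):
    """Constant yearly rate r such that start * (1 + r) ** years == end."""
    return (end / start) ** (1 / years) - 1


factor = growth_factor(ZB_2009, ZB_2020)
rate = annual_growth_rate(ZB_2009, ZB_2020, YEARS)
print(f"{factor:.2f}x overall, about {rate:.1%} per year")
```

With these inputs the overall factor is 43.75 (rounded to 44x on the slide), and the implied constant rate is roughly 41% a year.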
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and structures
• Text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
• Static data vs. streaming data
• A single application can be generating/collecting many types of data
To extract knowledge, all these types of data need to be linked together.
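Linking usually means finding a key that the different sources share. A minimal sketch, assuming every source already carries a common `customer_id` (an assumption; real sources often need entity resolution first). The three sources stand for three data types (a structured table, free text, and an event stream), and all names and records are hypothetical.

```python
# Hypothetical records from three sources, all keyed by a shared customer_id.
profiles = [  # structured table rows
    {"customer_id": 1, "name": "Ana", "city": "Lisbon"},
    {"customer_id": 2, "name": "Ben", "city": "Oslo"},
]
reviews = [  # free text
    {"customer_id": 1, "text": "Great service, fast delivery"},
    {"customer_id": 1, "text": "Package arrived damaged"},
]
clicks = [  # event stream: (customer_id, unix_timestamp, page)
    (2, 1700000000, "/shoes"),
    (1, 1700000050, "/returns"),
    (2, 1700000100, "/checkout"),
]


def link_by_customer(profiles, reviews, clicks):
    """Build one combined record per customer from all three data types."""
    # Start from the structured profile and attach empty slots for the other types.
    linked = {p["customer_id"]: {**p, "reviews": [], "clicks": []} for p in profiles}
    for r in reviews:
        if r["customer_id"] in linked:  # ignore records with no matching profile
            linked[r["customer_id"]]["reviews"].append(r["text"])
    for cid, ts, page in clicks:
        if cid in linked:
            linked[cid]["clicks"].append((ts, page))
    return linked


linked = link_by_customer(profiles, reviews, clicks)
```

Once linked, a question such as "do customers who complain in reviews also visit the returns page?" can be answered from a single record.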
Characteristics of Big Data:
3-Speed (Velocity)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions → missing opportunities
• Examples
• E-Promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
• Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires an immediate reaction
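Processing data fast usually means handling each reading as it arrives instead of in a later batch. A minimal sketch of the healthcare-monitoring example, assuming a simple rule: flag a reading that lies more than `threshold` standard deviations from the mean of the last `window` readings. The class name, window size, threshold, and heart-rate values are hypothetical; real monitoring systems use clinically validated rules.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag a reading that lies far from the mean of the last `window` readings.

    `window` and `threshold` are illustrative defaults, not clinical values.
    """

    def __init__(self, window=5, threshold=3.0):
        self.recent = deque(maxlen=window)  # the oldest reading drops out automatically
        self.threshold = threshold

    def update(self, reading):
        """Process one reading as it arrives; return True if it looks abnormal."""
        abnormal = False
        if len(self.recent) == self.recent.maxlen:  # wait until the window is full
            values = list(self.recent)
            mu, sigma = mean(values), pstdev(values)
            if sigma == 0:
                abnormal = reading != mu
            else:
                abnormal = abs(reading - mu) / sigma > self.threshold
        self.recent.append(reading)  # abnormal readings are kept too (a design choice)
        return abnormal


# Hypothetical heart-rate stream (beats per minute).
detector = RollingAnomalyDetector(window=5, threshold=3.0)
flags = [detector.update(bpm) for bpm in [72, 74, 71, 73, 75, 72, 140, 74]]
```

Each call does a constant amount of work on a bounded window, which is what lets this kind of check keep up with a continuous stream.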
Big Data: 3V’s
Some Make it 4V’s
Who’s Generating Big Data
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Sensor technology and networks (measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data
• But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion
What Technology Do We Have for Big Data?
Which Movie Do You Like?
Designing a movie recommendation system
Can you describe the movie you would like?
Recommender Systems
• Movie problem: find movies “similar” to my taste.
• Movies have many “Features” – Western, Clint Eastwood, Tarantino, 90s, etc.
• A viewer has preferences over the same “Features” – e.g. likes ‘Western’. Matching movie features against viewer preferences is called content-based filtering.
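Matching a viewer's preferences against movie features can be sketched as vector similarity: each movie is a 0/1 vector over the features named on the slide, the viewer is a weight vector over the same features (negative means dislike), and movies are ranked by cosine similarity. The titles, feature values, and weights are hypothetical, and cosine similarity is one common choice rather than the only one.

```python
import math

FEATURES = ["western", "clint_eastwood", "tarantino", "90s"]

# Hypothetical catalog: 1 means the movie has the feature, 0 means it does not.
MOVIES = {
    "movie_a": [1, 1, 0, 0],
    "movie_b": [1, 0, 1, 1],
    "movie_c": [0, 0, 1, 1],
}

# Hypothetical viewer: likes Westerns, mildly likes Eastwood, dislikes the 90s.
VIEWER = [1.0, 0.5, 0.0, -1.0]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_taste(profile, movies):
    """Sort movie titles from most to least similar to the viewer's profile."""
    return sorted(movies, key=lambda title: cosine(profile, movies[title]), reverse=True)


ranking = rank_by_taste(VIEWER, MOVIES)
```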
Netflix Prize
From Wikipedia, the free encyclopedia
The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict
user ratings for films, based on previous ratings without any other information about the users or
films, i.e. without the users or the films being identified except by numbers assigned for the contest.
The competition was held by Netflix, an online DVD-rental service, and was open to anyone not
connected with Netflix (current and former employees, agents, close relatives of Netflix employees,
etc.) or a resident of Cuba, Iran, Syria, North Korea, Burma or Sudan.[1] On 21 September 2009, the
grand prize of US$1,000,000 was given to the BellKor's Pragmatic Chaos team which bested Netflix's
own algorithm for predicting ratings by 10.06%.[2]
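Collaborative filtering predicts a rating from the ratings matrix alone, as in the contest data where users and films were just numbers. The sketch below shows one textbook variant, user-based filtering with a Pearson-style similarity; it is not the prize-winning method (the winners blended many models), and the tiny rating matrix is hypothetical. Submissions were scored by root-mean-square error (RMSE), so a helper for that is included.

```python
import math

# Hypothetical ratings: user id -> {movie id: stars}.
RATINGS = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "u3": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}


def mean_rating(r):
    """Average of all of one user's ratings."""
    return sum(r.values()) / len(r)


def similarity(ra, rb):
    """Pearson-style similarity: cosine of mean-centred ratings on co-rated movies.

    Means are taken over all of a user's ratings, a common simplification.
    """
    common = ra.keys() & rb.keys()
    ma, mb = mean_rating(ra), mean_rating(rb)
    dot = sum((ra[m] - ma) * (rb[m] - mb) for m in common)
    na = math.sqrt(sum((ra[m] - ma) ** 2 for m in common))
    nb = math.sqrt(sum((rb[m] - mb) ** 2 for m in common))
    return dot / (na * nb) if na and nb else 0.0


def predict_rating(ratings, user, movie):
    """User's mean plus similarity-weighted deviations of users who rated `movie`."""
    base = mean_rating(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or movie not in r:
            continue
        s = similarity(ratings[user], r)
        num += s * (r[movie] - mean_rating(r))
        den += abs(s)
    return base + num / den if den else base


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


guess = predict_rating(RATINGS, "u1", "m4")
```

Here `u1` agrees with `u2` and disagrees with `u3`, so the prediction for `m4` is pulled toward `u2`'s high rating and away from `u3`'s low one.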
A Highly Simple Solution
Comedy  Action  Blockbuster  …  Is Tom Cruise the Lead?  Saurav
6       5       0            …  1                        2
7       8       1            …  0                        8
…       …       …            …  …                        …
Saurav’s Score = 0.2*Comedy + 0.1*Action + 10*Blockbuster + … + … - 0.9*Tom Cruise
Comedy  Action  Blockbuster  …  Is Tom Cruise the Lead?  Saurav
2       8       0            …  0                        7
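The score formula is a weighted sum of feature values, so recommending reduces to scoring every candidate and sorting. A minimal sketch using only the four weights visible in the formula; the features hidden behind "…" are omitted, so these scores will not reproduce the ratings in the table. The movie titles are hypothetical; the feature values follow the table rows.

```python
# Weights visible on the slide; features elided as "…" are left out (assumption).
SAURAV_WEIGHTS = {
    "comedy": 0.2,
    "action": 0.1,
    "blockbuster": 10.0,
    "tom_cruise_lead": -0.9,
}

# Hypothetical titles; feature values follow the table rows.
CATALOG = {
    "movie_a": {"comedy": 6, "action": 5, "blockbuster": 0, "tom_cruise_lead": 1},
    "movie_b": {"comedy": 7, "action": 8, "blockbuster": 1, "tom_cruise_lead": 0},
    "movie_c": {"comedy": 2, "action": 8, "blockbuster": 0, "tom_cruise_lead": 0},
}


def score(weights, movie):
    """Linear score: sum of weight * feature value; missing features count as 0."""
    return sum(w * movie.get(name, 0.0) for name, w in weights.items())


def recommend(weights, catalog, k=2):
    """Return the titles of the k highest-scoring movies."""
    ranked = sorted(catalog, key=lambda title: score(weights, catalog[title]), reverse=True)
    return ranked[:k]


top = recommend(SAURAV_WEIGHTS, CATALOG, k=2)
```

The large Blockbuster weight dominates, which is why `movie_b` ranks first despite everything else.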
Quiz #1
• Is Google Search a recommender system?
Supervised Learning
Design an Accurate Vending Machine
This is a Classification Problem – this line is called the Decision Boundary or Separating Hyperplane
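A separating line can be learned from labelled examples. A minimal sketch with the perceptron, one of the simplest linear classifiers (it is not on the slide's algorithm list, so treat it as an illustrative choice). The setup is hypothetical: the machine tells two coin types apart from diameter (mm) and weight (g), and all measurements are invented. Features are mean-centred so the perceptron converges quickly.

```python
def train_perceptron(points, labels, epochs=100):
    """Perceptron on mean-centred 2-D features; labels must be -1 or +1.

    The learned boundary is the line w[0]*c1 + w[1]*c2 + b = 0.
    """
    n = len(points)
    mean = [sum(p[j] for p in points) / n for j in range(2)]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            c1, c2 = x1 - mean[0], x2 - mean[1]
            if y * (w[0] * c1 + w[1] * c2 + b) <= 0:  # wrong side of, or on, the line
                w[0] += y * c1
                w[1] += y * c2
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b, mean


def classify(model, point):
    """Return +1 or -1 depending on which side of the boundary `point` falls."""
    w, b, mean = model
    c1, c2 = point[0] - mean[0], point[1] - mean[1]
    return 1 if w[0] * c1 + w[1] * c2 + b > 0 else -1


# Hypothetical coins: (diameter_mm, weight_g); -1 = coin type A, +1 = coin type B.
POINTS = [(19.0, 2.5), (24.0, 5.6), (19.2, 2.6), (24.3, 5.7), (18.9, 2.4), (23.8, 5.5)]
LABELS = [-1, 1, -1, 1, -1, 1]
model = train_perceptron(POINTS, LABELS)
```

The perceptron is guaranteed to stop only when the classes are linearly separable; otherwise the `epochs` cap ends training.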
Quiz #2
• Give an example where you think supervised learning is used.
• Hint: spam vs. ham (legitimate mail) in emails
Some Common Supervised Algorithms
• Classification
• Decision Trees
• Random Forest
• Support Vector Machine
• Neural Network
• Logistic Regression
• Regression
• Linear Regression
• Non-linear Regression
• Logistic Regression
• Association Rule Learning
• Arules
• Event Sequence Analysis
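For the regression branch, ordinary least-squares linear regression with one input has a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), where Sxy and Sxx are sums of centred products. A minimal sketch; `fit_line` is a hypothetical name, and the sample points are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one input feature)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Invented points lying exactly on y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

A fitted line then predicts a continuous value for a new input as `intercept + slope * x`, which is what separates regression from classification.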
In Action
• Handwriting Recognition System
• Classification
• Input?
• Output?
Features (pixel values of the image):
200 200 10 …
200 200 8 …
180 200 20 …
… … … …
Label: 6
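The features → label mapping can be made concrete with k-nearest neighbours: flatten each image into a vector of pixel values and give a new image the majority label of its closest training images. k-NN is an illustrative choice that is not among the algorithms listed earlier. The tiny 3×3 "images" and their labels are hypothetical; real digit datasets use larger images, such as MNIST's 28×28.

```python
from collections import Counter


def distance2(a, b):
    """Squared Euclidean distance between two flattened pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, pixels, k=3):
    """Majority label among the k training images closest to `pixels`."""
    nearest = sorted(train, key=lambda item: distance2(item[0], pixels))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical 3x3 images (row by row, values 0-255) with their digit labels.
TRAIN = [
    ([0, 255, 0, 0, 255, 0, 0, 255, 0], 1),
    ([0, 250, 0, 0, 240, 0, 0, 255, 10], 1),
    ([10, 255, 0, 0, 255, 0, 0, 230, 0], 1),
    ([255, 255, 255, 0, 0, 255, 0, 0, 255], 7),
    ([250, 255, 240, 0, 10, 255, 0, 0, 250], 7),
    ([255, 240, 255, 0, 0, 230, 0, 20, 255], 7),
]

guess = knn_predict(TRAIN, [0, 245, 0, 0, 250, 5, 0, 255, 0])
```

k-NN has no training step at all; the cost moves to prediction time, when every training image is compared.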
Note the similarity
Classification Algorithms Try to Separate Items into “Classes”
Demo
Quiz #3
• Are driverless cars a learning problem?
• What are the features?
• What is the label?
Unsupervised Learning
Flowers
Figure: Tetramerous flower of Ludwigia octovalvis showing petals and sepals
Sepal length (cm)  Sepal width (cm)  Petal length (cm)  Petal width (cm)
5.1                3.5               1.4                0.2
4.9                3.0               1.4                0.2
4.7                3.2               1.3                0.2
4.6                3.1               1.5                0.2
5.0                3.6               1.4                0.2
5.4                3.9               1.7                0.4
4.6                3.4               1.4                0.3
5.0                3.4               1.5                0.2
4.4                2.9               1.4                0.2
4.9                3.1               1.5                0.1
5.4                3.7               1.5                0.2
Clustering
• Cluster: a collection/group of data objects/points
• similar (or related) to one another within the same group
• dissimilar (or unrelated) to the objects in other groups
• Cluster analysis
• finding similarities between data according to the characteristics underlying the data, and grouping similar data objects into clusters
• Unsupervised learning
• no predefined classes for a training data set
• Two general tasks: identify the “natural” number of clusters, and properly group objects into “sensible” clusters
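k-means is one common way to carry out the grouping task once the number of clusters k has been chosen; choosing k, the other task above, is not addressed here. A minimal sketch of Lloyd's algorithm, assuming the first k points as initial centroids (a naive but deterministic choice; k-means++ seeding is the usual improvement). The 2-D points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means for points of any dimension; starts from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


# Hypothetical 2-D points forming two visible groups.
POINTS = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(POINTS, k=2)
```

No labels are given to the algorithm; the groups emerge only from distances between points, which is what makes this unsupervised.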
Plot
Quiz #4
• How many types (species) of flowers are there?
Can you see 3 species?
Examples of Unsupervised Learning
• Clustering
• Dimensionality Reduction
• Feature Extraction
• Self-Organizing Maps
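Dimensionality reduction can be illustrated with principal component analysis (PCA): find the direction of greatest variance and project the data onto it. The sketch below computes the first principal component by power iteration on the 11 measurement rows from the flower table (4 features reduced to 1). It is written in plain Python to stay self-contained; function names are illustrative, and a real project would use a linear-algebra library.

```python
import math

# The 11 rows of the flower table: sepal length, sepal width, petal length, petal width (cm).
ROWS = [
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2], [5.0, 3.6, 1.4, 0.2], [5.4, 3.9, 1.7, 0.4],
    [4.6, 3.4, 1.4, 0.3], [5.0, 3.4, 1.5, 0.2], [4.4, 2.9, 1.4, 0.2],
    [4.9, 3.1, 1.5, 0.1], [5.4, 3.7, 1.5, 0.2],
]


def covariance(rows):
    """Sample covariance matrix (divides by n - 1) and the column means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]
    return cov, means


def first_principal_component(rows, iters=500):
    """Top eigenvector (unit length) and eigenvalue of the covariance, via power iteration."""
    cov, means = covariance(rows)
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalise so the vector neither explodes nor vanishes
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue, means


def project(rows, v, means):
    """1-D coordinate of each row along direction v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]


v, lam, means = first_principal_component(ROWS)
scores = project(ROWS, v, means)
cov, _ = covariance(ROWS)
explained = lam / sum(cov[i][i] for i in range(len(cov)))  # share of total variance kept
```

`explained` reports how much of the total variance the single projected coordinate keeps.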
Quiz #5
• Which of the following are supervised and which are unsupervised?
• Take a collection of 1000 essays written on the US economy, and find a way to automatically group these essays into a small number of groups of essays that are somehow "similar" or "related".
• Examine a large collection of emails that are known to be spam, to discover if there are sub-types of spam mail.
• Given historical data of children's ages and heights, predict children's height as a function of their age.
• Have a computer examine an audio clip of a piece of music, and classify whether or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and no vocals).
• Given a set of news articles from many different news websites, find out what the main topics covered are.
• Suppose you are working on weather prediction, and you would like to predict whether or not it will be raining at 5pm tomorrow. You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem?
Where is Big Data???
Let’s Start from (Big) Data
• How do you design this system?
• How do you pay for this?
• How do you trust someone to do it right?
• How expensive will such a system be?
I need data. Good, reusable data. High-quality data. Otherwise all the smarts are wasted.
Here comes BIG Data to help
• Image
• Audio
• Learning
• HUGE data sets
Thank you!
