Introduction to
Data Science

Prithwis Mukerjee, PhD
Praxis Business School, Calcutta
prithwis mukerjee, ph.d.
Agenda
● Why data science ?
● Techniques
○ Statistics
○ Data Mining
○ Visualisation
● Tools & Platforms
○ R
○ Hadoop / MapReduce
○ Real Time Systems
● Business Domains

Volume
Data is being acquired from a variety of sources
● EFT in Banks, Credit card payments
● Cell phones
● Sensors attached to a variety of equipment
● Surveillance cameras, CCTV
● Social Media Updates
● Blogs
● Websites

Variety / Velocity
● Numeric data
● Structured text data
● Unstructured text data
● Images
● Sound and video recordings
● Graph Nodes
○ Social Media “friends”
○ Websites linked to each other

Data is being generated fast and is becoming obsolete or useless equally fast
● Realtime (or near realtime) data from sensors, cameras
● Website traffic
● Social media “trends”

So what is Big Data ?
● Volume
● Velocity
● Variety ?

A new term coined by IT vendors to push new technology like
● Map Reduce
● Hadoop
● NOSQL

A new way to
● collect
● store
● manage
● analyse
● visualise data

Big Data is like Crude Oil { not new Oil }
Think of data as crude oil !
Big Data is like extracting the crude oil, transporting it in mega tankers, pumping it through pipelines and storing it in massive silos

But what about refining ?

The Science (and Art ) of Data
Think of data as crude oil !
Big Data is like extracting the crude oil, transporting it in mega tankers, pumping it through pipelines and storing it in massive silos

Refining
Data Science
● Discovering what we do not know about the data
● Obtaining predictive, actionable insight
● Creating data products that have business impacts
● Communicating relevant business stories

Two Perspectives

[Figure: two diagrams of overlapping skill areas. Labels: Programming or “Hacking” Skills; Machine Learning; Mathematics, Statistics Knowledge; Data Science; RDBMS, ERP / BI; Operations Research; Business Domain Knowledge]

10 Things {most} Data Scientists do ...
1. Ask good questions
   What is what ? We do not know ! We would like to know
2. Define, Test Hypothesis, Run experiments
3. Scoop, scrape, sample business data
4. Wrestle and tame data
5. Play with data, discover unknowns
6. Create models, algorithms
7. Understand data relationships
8. Tell the machine how to learn from the data
9. Create data products that deliver actionable insights
10. Tell relevant business stories from data

Statistics - World of Data
● Data comes in various types
○ Nominal - colour, gender, PIN code
○ Ordinal - scale of 1-10, {high, medium, low}
○ Interval - Dates, Temperature (Centigrade)
○ Ratio - length, weight, count
● Data comes in various structures
○ Structured data - nominal, ordinal, interval, ratio
○ Unstructured text - email, tweets, reviews
○ Images, voice prints
○ Graphs, networks - social media friendships, likes

Descriptive Statistics
● Numeric Description
○ Mean, Median, Mode
○ Quartile, Percentile
○ Variance / Standard Deviation

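A minimal sketch of the numeric descriptions listed above, using Python's standard `statistics` module. The `sales` list is made-up illustration data, not taken from the slides.

```python
import statistics

# Hypothetical sample: daily sales counts (made-up numbers for illustration).
sales = [12, 15, 15, 18, 20, 22, 30]

mean = statistics.mean(sales)                  # arithmetic average
median = statistics.median(sales)              # middle value of the sorted data
mode = statistics.mode(sales)                  # most frequent value
quartiles = statistics.quantiles(sales, n=4)   # three cut points: Q1, Q2, Q3
variance = statistics.variance(sales)          # sample variance
std_dev = statistics.stdev(sales)              # sample standard deviation

print(mean, median, mode, quartiles, variance, std_dev)
```

The second quartile is the median, and the standard deviation is the square root of the variance, so the measures can be cross-checked against each other.
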
Statistics : The Path Ahead

● Probability, Distributions
● Testing of Hypothesis
● Regression, Testing
● Predictive Analysis

Data Mining / Machine Learning
Is the process of obtaining
● novel
● valid
● potentially useful
● understandable
patterns in data

Typical tasks are
● classification
● clustering
● association rules
● sequential patterns
● regression
● deviation detection

Some definitions
Instance (an item or record)
● an observation that is characterised by a number of attributes
○ person - with attributes like age, salary, qualification
○ sale - with product, quantity, price

Attribute
● measuring characteristics of an instance

Class
● grouping of an instance into
○ acceptable, not acceptable
○ mammal, fish, bird

Nominal
● colour, PIN code, state

Ordinal
● ranking : tall, medium, short or feedback on a scale of 1 - 10

Ratio
● length, price, duration, quantity

Interval
● date, temperature

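One way to picture these definitions in code, as a rough sketch: each record is an instance, each field an attribute, and one field carries the class. The `LoanApplicant` type, its field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LoanApplicant:
    """One instance (record); every field is an attribute."""
    pin_code: str      # nominal: a label with no order
    income_band: str   # ordinal: "low" < "medium" < "high"
    birth_year: int    # interval: differences are meaningful, ratios are not
    salary: float      # ratio: has a true zero
    label: str         # class: "acceptable" or "not acceptable"

# An ordinal attribute needs an explicit order before it can be ranked.
INCOME_ORDER = {"low": 0, "medium": 1, "high": 2}

applicants = [
    LoanApplicant("700091", "high", 1980, 90000.0, "acceptable"),
    LoanApplicant("700064", "low", 1992, 20000.0, "not acceptable"),
]

ranked = sorted(applicants, key=lambda a: INCOME_ORDER[a.income_band])
# Ratio attributes support statements such as "4.5 times as large".
salary_ratio = applicants[0].salary / applicants[1].salary
print([a.income_band for a in ranked], salary_ratio)
```
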
Data Mining : Classification
Classification
● Which loan applicant will not default on the loan ?
● Which potential customer will respond to a mailer campaign ?

Classification Example
[Figure: a table whose columns are labelled categorical, categorical, continuous and class. Diagram labels: Training Set, Learn Classifier, Model, Test Set]

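The training set / test set flow can be sketched with a very small classifier. The slide does not say which classifier is learned, so this example swaps in 1-nearest-neighbour as an assumption, and the loan data is made up.

```python
import math

# Hypothetical training set: (age, income in thousands) -> class label.
training_set = [
    ((25, 20), "default"),
    ((30, 25), "default"),
    ((45, 80), "repay"),
    ((50, 95), "repay"),
]
# Hypothetical test set with known labels, used only to check the model.
test_set = [((28, 22), "default"), ((48, 90), "repay")]

def predict(features, training):
    """Return the label of the nearest training instance (1-nearest-neighbour)."""
    nearest = min(training, key=lambda item: math.dist(features, item[0]))
    return nearest[1]

predictions = [predict(x, training_set) for x, _ in test_set]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test_set)) / len(test_set)
print(predictions, accuracy)
```

Here the "model" is simply the stored training set; other classifiers would replace `predict` while the train-then-test flow stays the same.
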
Data Mining : Clustering
Given a set of unclassified data points, how to find a natural grouping within them
● Can we segment the market in some way that is not yet known ?

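A rough sketch of finding such a grouping with k-means, one common clustering method (the slide does not name a specific algorithm). The customer data and the starting centroids are made-up assumptions.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 90)]
centroids, clusters = k_means(customers, centroids=[(0, 0), (10, 100)])
print(centroids)
print(clusters)
```

No class labels are supplied; the two market segments emerge from the distances alone.
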
Example of Document Clustering
Clustering points : 3204 articles from the Los Angeles Times
Similarity Measure : How many words are common in these documents (after excluding some common words)

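The similarity measure described above can be sketched as a count of shared words. The stop-word list and the three sample sentences are illustrative assumptions, not the data used in the original study.

```python
# Illustrative list of common words to exclude; a real list would be much longer.
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is"}

def content_words(text):
    """Lower-case the text and drop common words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def shared_word_count(doc_a, doc_b):
    """Similarity = number of distinct non-common words the two documents share."""
    return len(content_words(doc_a) & content_words(doc_b))

doc1 = "The market rallied in the afternoon"
doc2 = "A rally in the stock market"
doc3 = "The team won the final game"

print(shared_word_count(doc1, doc2))
print(shared_word_count(doc1, doc3))
```

Note that "rallied" and "rally" do not match here; a fuller version would normalise word forms before comparing.
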
Clustering of S&P Stock Data
● Observe Stock Movements every day.
● Clustering points: Stock{UP/DOWN}
● Similarity Measure: Two points are more similar if the events described by them frequently happen together on the same day.
● We used association rules to quantify a similarity measure.

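A hedged sketch of "events that frequently happen together on the same day". The slide does not give the exact measure derived from association rules, so this uses a simple co-occurrence ratio as a stand-in; the ticker names and daily observations are hypothetical.

```python
# Hypothetical daily observations: the set of events seen on each trading day.
days = [
    {"AAA-UP", "BBB-UP", "CCC-DOWN"},
    {"AAA-UP", "BBB-UP"},
    {"AAA-UP", "CCC-DOWN"},
    {"BBB-UP", "CCC-DOWN"},
]

def co_occurrence_similarity(event_a, event_b, observations):
    """Days on which both events happen, divided by days on which either happens."""
    both = sum(1 for d in observations if event_a in d and event_b in d)
    either = sum(1 for d in observations if event_a in d or event_b in d)
    return both / either if either else 0.0

sim = co_occurrence_similarity("AAA-UP", "BBB-UP", days)
print(sim)
```
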
Regression
● Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
○ Greatly studied in statistics, neural network fields.
● Examples:
○ Predicting sales amounts of new product based on advertising expenditure.
○ Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
○ Time series prediction of stock market indices.

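For the linear case with one input variable, the fit can be sketched with ordinary least squares. The advertising and sales figures are made up and deliberately lie on an exact line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend vs. sales (made-up, exactly linear).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = intercept + slope * 5.0   # predicted sales for a spend of 5.0
print(slope, intercept, predicted)
```
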
Data Mining : Association Rules Mining
Association Rules
● which products should be kept along with other products
● which two products should never be discounted together

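Questions like these are usually answered with two standard measures, support and confidence. The sketch below computes both for a rule such as "bread => butter"; the shopping baskets are made-up illustration data.

```python
# Hypothetical market baskets: the set of products in each transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
print(rule_support, rule_confidence)
```

A rule with high support and high confidence suggests products worth placing together.
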
Visualisation : The need to tell a story

Definitions
Data Mining
● Is the process of extracting unknown, valid and actionable information from large databases and using this to make business decisions
● Non trivial process of identifying valid, novel, potentially useful and understandable / explainable patterns in data

Data Science is a rare combination of multiple skills that include
● Technology : obviously !
but also
● Curiosity - a desire to go below the surface and discover a hypothesis that can be tested
● Storytelling - create a business story around the data
● Cleverness - again obviously, to look at the problem from different angles

R : Your first step into Data Science

Try out this free interactive tutorial just now
Statistical Tools

http://r4stats.com/articles/popularity/
Some Comparisons

Map Reduce
● Input : A set of (key, value) pairs
● User supplies two functions
○ Map (k,v) => List(k1,v1)
○ Reduce (k1, list(v1)) => v2
● Output is the set of (k1,v2) pairs

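The two user-supplied functions can be illustrated with the classic word count, run here in a single process. This is only a sketch of the programming model: `map_fn`, `reduce_fn` and `map_reduce` are hypothetical names, and a real framework would distribute the same steps across machines.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map (k, v) => list of (k1, v1): emit (word, 1) for each word in a line."""
    return [(word, 1) for word in value.lower().split()]

def reduce_fn(key, values):
    """Reduce (k1, list(v1)) => v2: sum the counts for one word."""
    return sum(values)

def map_reduce(records, mapper, reducer):
    """Run the mapper on every record, group values by key, then reduce each group."""
    grouped = defaultdict(list)
    for k, v in records:
        for k1, v1 in mapper(k, v):
            grouped[k1].append(v1)   # the "shuffle" step: collect values per key
    return {k1: reducer(k1, vs) for k1, vs in grouped.items()}

# Input: (line number, line text) pairs.
lines = [(0, "big data is big"), (1, "data science")]
counts = map_reduce(lines, map_fn, reduce_fn)
print(counts)
```
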
Hadoop
A programming framework that allows you to run Map-Reduce jobs on a distributed cluster of low cost machines without having to bother about anything except
● the Map and Reduce functions
● loading data into HDFS

1. HIVE
a. A plug-in that allows one to use SQL like queries that are converted into map-reduce jobs
2. PIG
a. A scripting language for writing long queries
3. HBASE
a. A non-relational DBMS
4. SQOOP
a. moves data to and from HDFS

Data-in-Flight

JavaScript for Data Visualisation

Business Domain
● Financial Sector
○ Risk Management, Credit Scoring
○ Predict Customer Spend
○ Stock and Investment Analysis
○ Loan approval
● Telecom Sector
○ Fraud Detection
○ Churn Prediction
● Retail and Marketing
○ Market segmentation
○ Promotional strategy
○ Market Basket Analysis
○ Trend Analysis
● Healthcare & Insurance
○ Fraud Detection
○ Drug Development
○ Medical Diagnostic Tools

Conclusion
● Why data science ?
● Techniques
○ Statistics
○ Data Mining
○ Visualisation
● Tools & Platforms
○ R
○ Hadoop / MapReduce
○ Real Time Systems
● Business Domains

Data Science is a rare combination of multiple skills that include
● Technology : obviously !
but also
● Curiosity - a desire to go below the surface and discover a hypothesis that can be tested
● Storytelling - create a business story around the data
● Cleverness - again obviously, to look at the problem from different angles

Thank You
Contact
Prithwis Mukerjee
Professor, Praxis Business School
prithwis@praxis.ac.in

This presentation is accessible at the blog
http://blog.yantrajaal.com
at the following URL
http://bit.ly/pm-datascience

Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 

Introduction to Data Science: Techniques, Tools and Business Applications

  • 1. Introduction to Data Science Prithwis Mukerjee, PhD Praxis Business School, Calcutta prithwis mukerjee, ph.d.
  • 2. Agenda
      ● Why data science?
      ● Techniques
          ○ Statistics
          ○ Data Mining
          ○ Visualisation
      ● Tools & Platforms
          ○ R
          ○ Hadoop / MapReduce
          ○ Real Time Systems
      ● Business Domains
  • 4. Volume
      Data is being acquired from a variety of sources
      ● EFT in banks, credit card payments
      ● Cell phones
      ● Sensors attached to a variety of equipment
      ● Surveillance cameras, CCTV
      ● Social media updates
      ● Blogs
      ● Websites
  • 5. Variety / Velocity
      Variety
      ● Numeric data
      ● Structured text data
      ● Unstructured text data
      ● Images
      ● Sound and video recordings
      ● Graph nodes
          ○ Social media “friends”
          ○ Websites linked to each other
      Velocity: data is being generated fast and is becoming obsolete or useless equally fast
      ● Realtime (or near realtime) data from sensors, cameras
      ● Website traffic
      ● Social media “trends”
  • 6. So what is Big Data?
      ● Volume
      ● Velocity
      ● Variety?
      A new term coined by IT vendors to push new technology like
      ● MapReduce
      ● Hadoop
      ● NoSQL
      A new way to
      ● collect
      ● store
      ● manage
      ● analyse
      ● visualise data
  • 7. Big Data is like Crude Oil { not new Oil }
      Think of data as crude oil! Big Data is like extracting the crude oil, transporting it in mega tankers, pumping it through pipelines and storing it in massive silos.
      But what about refining?
  • 8. The Science (and Art) of Data
      Think of data as crude oil! Big Data is like extracting the crude oil, transporting it in mega tankers, pumping it through pipelines and storing it in massive silos.
      Data Science: Refining
      ● Discovering what we do not know about the data
      ● Obtaining predictive, actionable insight
      ● Creating data products that have business impacts
      ● Communicating relevant business stories
  • 10. 10 Things {most} Data Scientists do ...
      1. Ask good questions
          ○ What is what?
          ○ We do not know! We would like to know
      2. Define and test hypotheses, run experiments
      3. Scoop, scrape, sample business data
      4. Wrestle and tame data
      5. Play with data, discover unknowns
      6. Create models, algorithms
      7. Understand data relationships
      8. Tell the machine how to learn from the data
      9. Create data products that deliver actionable insights
      10. Tell relevant business stories from data
  • 11. Statistics - World of Data
      ● Data comes in various types
          ○ Nominal - colour, gender, PIN code
          ○ Ordinal - scale of 1-10, {high, medium, low}
          ○ Interval - dates, temperature (Centigrade)
          ○ Ratio - length, weight, count
      ● Data comes in various structures
          ○ Structured data - nominal, ordinal, interval, ratio
          ○ Unstructured text - email, tweets, reviews
          ○ Images, voice prints
          ○ Graphs, networks - social media friendships, likes
  • 12. Descriptive Statistics
      ● Numeric Description
          ○ Mean, Median, Mode
          ○ Quartile, Percentile
          ○ Variance / Standard Deviation
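The numeric descriptions listed on this slide can be tried out with Python's standard `statistics` module. This is only a minimal sketch: the `sales` figures are invented for illustration and are not from the presentation, and the quartiles use the module's default ("exclusive") method, which is one of several common conventions.

```python
import statistics

# Hypothetical sample (invented for illustration): daily sales counts.
sales = [12, 15, 15, 18, 20, 22, 30]

mean = statistics.mean(sales)                 # arithmetic average
median = statistics.median(sales)             # middle value of the sorted data
mode = statistics.mode(sales)                 # most frequent value
quartiles = statistics.quantiles(sales, n=4)  # three cut points: Q1, Q2, Q3
variance = statistics.variance(sales)         # sample variance (divides by n - 1)
std_dev = statistics.stdev(sales)             # square root of the sample variance

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("quartiles:", quartiles)
print("variance:", variance)
print("standard deviation:", std_dev)
```

Percentiles follow the same pattern: `statistics.quantiles(sales, n=100)` returns 99 cut points.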
  • 13. Statistics: The Path Ahead
      ● Probability, Distributions
      ● Testing of Hypothesis
      ● Regression, Testing
      ● Predictive Analysis
  • 14. Data Mining / Machine Learning
      Is the process of obtaining
      ● novel
      ● valid
      ● potentially useful
      ● understandable
      patterns in data
      Typical tasks are
      ● classification
      ● clustering
      ● association rules
      ● sequential patterns
      ● regression
      ● deviation detection
  • 15. Some definitions
      Instance (an item or record)
      ● an observation that is characterised by a number of attributes
          ○ person - with attributes like age, salary, qualification
          ○ sale - with product, quantity, price
      Attribute
      ● measuring characteristics of an instance
      Class
      ● grouping of an instance into
          ○ acceptable, not acceptable
          ○ mammal, fish, bird
      Nominal
      ● colour, PIN code, state
      Ordinal
      ● ranking: tall, medium, short or feedback on a scale of 1 - 10
      Ratio
      ● length, price, duration, quantity
      Interval
      ● date, temperature
  • 16. Data Mining: Classification
      ● Which loan applicant will not default on the loan?
      ● Which potential customer will respond to a mailer campaign?
  • 17. Classification Example
      [Figure: a Training Set table with columns labelled categorical, categorical, continuous and class is used to Learn Classifier, producing a Model that is applied to a Test Set]
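The training set / learn classifier / model / test set flow on this slide can be sketched in a few lines of Python. The slide does not name an algorithm, so the 1-nearest-neighbour rule used here is an assumption chosen for brevity, and the records (age, annual income in thousands, and a "default"/"repay" class) are invented for illustration.

```python
import math

# Hypothetical labelled records: (age, income in thousands) -> class label.
training_set = [
    ((25, 30), "default"),
    ((30, 35), "default"),
    ((45, 90), "repay"),
    ((50, 120), "repay"),
]
# Held-out records used only to check the learned model.
test_set = [
    ((28, 32), "default"),
    ((48, 100), "repay"),
]


def learn_classifier(training):
    """Return a model; for 1-nearest-neighbour, learning just stores the data."""
    def model(attributes):
        # Predict the class of the closest training instance (Euclidean distance).
        nearest = min(training, key=lambda rec: math.dist(rec[0], attributes))
        return nearest[1]
    return model


model = learn_classifier(training_set)

# Apply the model to the test set and measure the fraction classified correctly.
correct = sum(model(attrs) == label for attrs, label in test_set)
accuracy = correct / len(test_set)
print(f"accuracy on test set: {accuracy:.2f}")
```

In practice the attributes would be scaled before computing distances, since income would otherwise dominate age.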
  • 18. Data Mining: Clustering
      Given a set of unclassified data points, how to find a natural grouping within them
      ● Can we segment the market in some way that is not yet known?
  • 19. Example of Document Clustering
      Clustering points: 3204 articles from the Los Angeles Times
      Similarity measure: how many words are common in these documents (after excluding some common words)
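The similarity measure described on this slide, the number of shared words after excluding common ones, can be sketched as follows. The stop-word list and the three short documents are invented for illustration; the sketch covers only the similarity measure, not the clustering step that would use it.

```python
# Hypothetical stop-word list (the "common words" to exclude).
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is"}


def content_words(text):
    """Lower-case the text, split on whitespace and drop stop words."""
    return {word for word in text.lower().split() if word not in STOP_WORDS}


def similarity(doc_a, doc_b):
    """Number of distinct non-stop words that both documents contain."""
    return len(content_words(doc_a) & content_words(doc_b))


# Hypothetical documents standing in for newspaper articles.
docs = {
    "finance": "the stock market fell in early trading",
    "markets": "stock prices and the bond market rallied",
    "sports": "the home team won the final game",
}

finance_vs_markets = similarity(docs["finance"], docs["markets"])
finance_vs_sports = similarity(docs["finance"], docs["sports"])
print("finance vs markets:", finance_vs_markets)
print("finance vs sports:", finance_vs_sports)
```

A clustering algorithm would then group documents whose pairwise similarity is high, so the two market stories would land together and the sports story apart.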
  • 20. Clustering of S&P Stock Data
      ● Observe stock movements every day.
      ● Clustering points: Stock{UP/DOWN}
      ● Similarity measure: two points are more similar if the events described by them frequently happen together on the same day.
      ● We used association rules to quantify a similarity measure.
  • 21. Regression
      ● Predict a value of a given continuous valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
          ○ Greatly studied in statistics, neural network fields.
      ● Examples:
          ○ Predicting sales amounts of new product based on advertising expenditure.
          ○ Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
          ○ Time series prediction of stock market indices.
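For the linear case, the idea can be sketched with ordinary least squares on a single predictor. The advertising and sales figures below are invented for illustration (they lie exactly on a straight line so the result is easy to check by hand), and `fit_line` is a hypothetical helper name, not something from the presentation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend vs. sales, constructed as y = 2x + 1.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(advertising, sales)

# Predict sales for a new advertising budget of 6.0.
predicted = slope * 6.0 + intercept
print("slope:", slope, "intercept:", intercept, "prediction:", predicted)
```

Real data will not lie on a perfect line, so the fitted line only minimises the squared errors rather than passing through every point.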
  • 22. Data Mining: Association Rules
      Mining Association Rules
      ● which products should be kept along with other products
      ● which two products should never be discounted together
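Association rules are commonly scored with support and confidence. These two measures are standard in market basket analysis but are not named on the slide, so treat their use here as an assumption. The shopping baskets below are invented for illustration.

```python
# Hypothetical shopping baskets, one set of products per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent: support(A and C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)


bread_butter_support = support({"bread", "butter"})
bread_to_butter = confidence({"bread"}, {"butter"})
butter_to_bread = confidence({"butter"}, {"bread"})

print("support(bread, butter):", bread_butter_support)
print("confidence(bread -> butter):", bread_to_butter)
print("confidence(butter -> bread):", butter_to_bread)
```

A rule such as butter -> bread with high confidence suggests keeping the two products near each other; the rule is directional, so bread -> butter can score lower.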
  • 23. Visualisation: The need to tell a story
  • 24. Visualisation: The need to tell a story
  • 25. Definitions
      Data Mining
      ● Is the process of extracting unknown, valid and actionable information from large databases and using this to make business decisions
      ● Non trivial process of identifying valid, novel, potentially useful and understandable / explainable patterns in data
      Data Science is a rare combination of multiple skills that include
      ● Technology: obviously!
      but also
      ● Curiosity - a desire to go below the surface and discover a hypothesis that can be tested
      ● Storytelling - create a business story around the data
      ● Cleverness - again obviously, to look at the problem from different angles
  • 27. R: Your first step into Data Science
      Try out this free interactive tutorial just now
  • 28. Statistical Tools
      http://r4stats.com/articles/popularity/
  • 30. Map Reduce
      ● Input: a set of (key, value) pairs
      ● User supplies two functions
          ○ Map (k,v) => List(k1,v1)
          ○ Reduce (k1, list(v1)) => v2
      ● Output is the set of (k1,v2) pairs
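The Map and Reduce signatures above can be illustrated with a word count, run here in a single Python process. This is only a sketch of the programming model: `run_map_reduce` is a hypothetical helper that stands in for the framework's grouping ("shuffle") step, and it does none of the distribution, fault tolerance or storage that a real MapReduce system provides.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map (k, v) => List(k1, v1): emit (word, 1) for each word in a line.
    The input key (a line number here) is not used."""
    return [(word, 1) for word in value.lower().split()]


def reduce_fn(key, values):
    """Reduce (k1, list(v1)) => v2: add up the counts for one word."""
    return sum(values)


def run_map_reduce(pairs, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for k, v in pairs:
        for k1, v1 in mapper(k, v):
            grouped[k1].append(v1)  # group all values emitted for the same key
    # Output is the set of (k1, v2) pairs, returned here as a dict.
    return {k1: reducer(k1, v1s) for k1, v1s in grouped.items()}


# Input: a set of (key, value) pairs, here (line number, line text).
lines = [(0, "big data is big"), (1, "data science")]
counts = run_map_reduce(lines, map_fn, reduce_fn)
print(counts)
```

Because each call to `map_fn` and `reduce_fn` depends only on its own arguments, a framework is free to run those calls on different machines.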
  • 31. Hadoop
      A programming framework that allows you to run Map-Reduce jobs on a distributed cluster of low cost machines without having to bother about anything except
      ● the Map and Reduce functions
      ● loading data into HDFS
      1. HIVE
          a. A plug-in that allows one to use SQL-like queries that are converted into map-reduce jobs
      2. PIG
          a. A scripting language for writing long queries
      3. HBASE
          a. A non-relational DBMS
      4. SQOOP
          a. Moves data to and from HDFS
  • 33. JavaScript for Data Visualisation
  • 34. Business Domain
      ● Financial Sector
          ○ Risk Management, Credit Scoring
          ○ Predict Customer Spend
          ○ Stock and Investment Analysis
          ○ Loan approval
      ● Telecom Sector
          ○ Fraud Detection
          ○ Churn Prediction
      ● Retail and Marketing
          ○ Market segmentation
          ○ Promotional strategy
          ○ Market Basket Analysis
          ○ Trend Analysis
      ● Healthcare & Insurance
          ○ Fraud Detection
          ○ Drug Development
          ○ Medical Diagnostic Tools
  • 35. Conclusion
      ● Why data science?
      ● Techniques
          ○ Statistics
          ○ Data Mining
          ○ Visualisation
      ● Tools & Platforms
          ○ R
          ○ Hadoop / MapReduce
          ○ Real Time Systems
      ● Business Domains
      Data Science is a rare combination of multiple skills that include
      ● Technology: obviously!
      but also
      ● Curiosity - a desire to go below the surface and discover a hypothesis that can be tested
      ● Storytelling - create a business story around the data
      ● Cleverness - again obviously, to look at the problem from different angles
  • 37. Thank You
      Contact: Prithwis Mukerjee, Professor, Praxis Business School, prithwis@praxis.ac.in
      This presentation is accessible at the blog http://blog.yantrajaal.com and at the following URL: http://bit.ly/pm-datascience