An Introduction to Data Science
Anoop V.S
Ph.D Research Scholar
Data Engineering Lab
Indian Institute of Information Technology and Management - Kerala (IIITM-K)
Thiruvananthapuram, India
anoop.res15@iiitmk.ac.in
March 10, 2017
Anoop V.S Introduction to Data Science March 10, 2017 1 / 48
Why you should attend this talk ?
Companies have recognized the immense business value that can be delivered using data. This has created a huge demand for skilled professionals in data-related jobs around the world.
Job profiles such as Data Scientist, Data Analyst, Big Data Engineer and Statistician are heavily sought after by companies. Not only are they handsomely paid, but a career in analytics has much more to promise.
After the U.S., India has the largest demand for analytics / big data / data science professionals. Amid such demand, people find themselves confused about which job profile to select for the best future.
How much can a Data Science Professional earn ?
Which cities offer high salaries ?
Data Scientist - the SEXIEST JOB OF 21st CENTURY !
Requires a mixture of multidisciplinary skills at the intersection of mathematics, statistics, computer science, communication and business.
Finding a Data Scientist is hard !
Finding people who understand who a Data Scientist is, is equally
hard !!
The trend is expected to accelerate in the coming years as data from
mobile sensors, sophisticated instruments, the web, and more, grows
It is predicted that in 2020, the world will generate 50 times as much data as it did in 2011
What skills are needed ?
So, what really is Data Science ?
1. Asking questions (formulating hypotheses) whose answers solve known problems or unearth unknown solutions that in turn drive business value
2. Defining the data needed, or working with an existing data set, and employing tools (computer science based) to collect, store and explore such data, generally in huge volume & variety
3. Identifying the type of analysis to be done to get to the answers, and performing that analysis by implementing various algorithms/tools, often in a distributed and parallel architecture
4. Communicating the insights gathered from the analysis in the form of simple stories/visualizations/dashboards that a non-data scientist can understand and build a conversation around
5. Building a higher-level abstraction that performs steps 2-3-4 autonomously, analyzing & taking actions on new data as it is fed to the system
Summing up in an image
Leading by an example
Two of the most famous companies in the world, Amazon and Facebook, use analytics and Big Data to shape their products, services and delivery.
Amazon uses analytics to curate products on its customers' homepages based on their previous purchases and browsing habits.
Facebook uses analytics to fill your news feed with updates from people you interact with the most, content from sites you frequent and products you have checked out on other sites.
Types of analytics
Descriptive Analytics use data aggregation and data mining to provide insight into the past and answer: "What has happened?"
Predictive Analytics use statistical models and forecasting techniques to understand the future and answer: "What could happen?"
Prescriptive Analytics use optimization and simulation algorithms to advise on possible outcomes and answer: "What should we do?"
Descriptive Analytics: Insight into the past
Descriptive analysis or statistics does exactly what the name implies: it describes, or summarizes, raw data and makes it interpretable by humans
These analytics describe the past. The past refers to any point in time at which an event has occurred, whether one minute ago or one year ago
Descriptive analytics are useful because they allow us to learn from past behaviors and understand how they might influence future outcomes.
Common examples of descriptive analytics are reports that provide historical insights regarding the company's production, financials, operations, sales, finance, inventory and customers
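The Python sketch below shows, in miniature, the kind of aggregation that sits behind such reports. It summarizes a few raw sales records into a total, mean and count per group. The records and the `summarize_by` helper are hypothetical and exist only for this example. A real report would read from the company's own data store.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw sales records: (month, region, amount). Illustrative only.
sales = [
    ("2017-01", "North", 120.0),
    ("2017-01", "South", 80.0),
    ("2017-02", "North", 150.0),
    ("2017-02", "South", 95.0),
    ("2017-03", "North", 130.0),
]

def summarize_by(records, key_index, value_index=2):
    """Describe the past: aggregate raw records into total, mean and count per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[value_index])
    return {
        key: {"total": sum(values), "mean": mean(values), "count": len(values)}
        for key, values in groups.items()
    }

by_region = summarize_by(sales, key_index=1)
for region, stats in sorted(by_region.items()):
    print(region, stats)
```

Calling `summarize_by(sales, key_index=0)` groups the same records by month instead of by region.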
Predictive Analytics: Understanding the future
Predictive analytics has its roots in the ability to "predict" what might happen
Predictive analytics provides companies with actionable insights based on data.
It is important to remember that no statistical algorithm can predict the future with 100% certainty, because the foundation of predictive analytics is based on probabilities. Companies use these statistics to forecast what might happen in the future.
Predictive analytics can be used throughout the organization, from forecasting customer behavior and purchasing patterns to identifying trends in sales activities
Prescriptive Analytics: Advise on possible outcomes
The relatively new field of prescriptive analytics allows users to prescribe a number of different possible actions and guides them towards a solution
At its best, prescriptive analytics predicts not only what will happen but also why it will happen, providing recommendations regarding actions that will take advantage of the predictions.
Prescriptive analytics uses a combination of techniques and tools such as business rules, algorithms, machine learning and computational modelling procedures. These techniques are applied against input from many different data sets, including historical and transactional data, real-time data feeds, and big data
Now into some basics - What is Data / Information /
Knowledge ?
Data is unprocessed facts and figures without any added interpretation or analysis. "The price of crude oil is $80 per barrel."
Information is data that has been interpreted so that it has meaning for the user. "The price of crude oil has risen from $70 to $80 per barrel" gives meaning to the data and so is said to be information to someone who tracks oil prices.
Knowledge is a combination of information, experience and insight that may benefit the individual or the organisation. "When crude oil prices go up by $10 per barrel, it's likely that petrol prices will rise by Rs. 20 per litre" is knowledge.
Relationship of Data, Information and Intelligence
Categories of Data - A quick view
Structured Data concerns all data that can be stored in an SQL database, in tables with rows and columns. It has relational keys and can easily be mapped into pre-designed fields. Today, such data is the most processed in development and the simplest way to manage information.
Semistructured Data doesn't reside in a relational database, but it does have some organizational properties that make it easier to analyze. With some processing, you can store it in a relational database.
Unstructured Data represents around 80% of data. It often includes text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents.
Unstructured data is everywhere. In fact, most individuals and organizations conduct their lives around unstructured data
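Here is a minimal sketch of the "with some processing" step for semi-structured data, assuming the data arrives as JSON. The Python snippet loads records whose fields vary and maps them into a fixed relational schema in an in-memory SQLite database. The records, table names and columns are hypothetical. Missing fields become NULL, and list-valued fields go into a separate child table.

```python
import json
import sqlite3

# Hypothetical semi-structured records: JSON objects whose fields vary.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "tags": ["student", "kerala"]},
  {"id": 3, "name": "Meera", "email": "meera@example.com", "tags": []}
]
"""
records = json.loads(raw)

# Structured target: a fixed schema of tables with rows and columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE TABLE person_tag (person_id INTEGER REFERENCES person(id), tag TEXT)")

for rec in records:
    # A missing key becomes NULL instead of breaking the insert.
    conn.execute(
        "INSERT INTO person (id, name, email) VALUES (?, ?, ?)",
        (rec["id"], rec["name"], rec.get("email")),
    )
    # Repeated values (lists) are normalized into the child table.
    for tag in rec.get("tags", []):
        conn.execute("INSERT INTO person_tag (person_id, tag) VALUES (?, ?)", (rec["id"], tag))
conn.commit()

rows = conn.execute("SELECT id, name, email FROM person ORDER BY id").fetchall()
tags = conn.execute("SELECT person_id, tag FROM person_tag ORDER BY person_id, tag").fetchall()
print(rows)
print(tags)
```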
Big Data - in recent News
Do you know "90% of the world's data was generated in the last few years." !!!
Big data really does mean big data: a collection of datasets so large that they cannot be processed using traditional computing techniques
Big data is not merely data; rather, it has become a complete subject, which involves various tools, techniques and frameworks.
What comes under Big Data ?
Black Box Data
Social Media Data
Stock Exchange Data
Power Grid Data
Transport Data
Search Engine Data etc.
3Vs of Big Data
Volume - Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would have been a problem, but new technologies (such as Hadoop) have eased the burden.
Velocity - Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time.
Variety - Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
Who uses Big Data ?
Banking - It's important to understand customers and boost their satisfaction, but it's equally important to minimize risk and fraud while maintaining regulatory compliance. Big data brings big insights, but it also requires financial institutions to stay one step ahead of the game with advanced analytics
Education - Educators armed with data-driven insight can make a
significant impact on school systems, students and curriculums. By
analyzing big data, they can identify at-risk students, make sure
students are making adequate progress, and can implement a better
system for evaluation and support
Government - When government agencies are able to harness and
apply analytics to their big data, they gain significant ground when it
comes to managing utilities, running agencies, dealing with traffic
congestion or preventing crime.
Health care - Patient records. Treatment plans. Prescription
information. When it comes to health care, everything needs to be
done quickly, accurately and, in some cases, with enough
transparency to satisfy stringent industry regulations. When big data
is managed effectively, health care providers can uncover hidden
insights that improve patient care.
Manufacturing - More and more manufacturers are working in an
analytics-based culture, which means they can solve problems faster
and make more agile business decisions.
Retail - Retailers need to know the best way to market to customers,
the most effective way to handle transactions, and the most strategic
way to bring back lapsed business
Operational Vs. Analytical Big Data
Operational Big Data - Operational Big Data systems provide operational features to run real-time, interactive workloads that ingest and store data.
MongoDB is a top technology for operational Big Data applications, with over 10 million downloads of its open source software.
Analytical Big Data - Analytical Big Data technologies, on the other hand, are useful for retrospective, sophisticated analytics of your data.
Hadoop is the most popular example of an Analytical Big Data technology.
But picking an operational vs. analytical Big Data solution isn't the right way to think about the challenge. They are complementary technologies, and you will likely need both to develop a complete Big Data solution.
Traditional Vs. Google’s solution
In the traditional approach, an enterprise has a computer to store and process big data. Here, data is stored in an RDBMS like Oracle Database, MS SQL Server or DB2, and sophisticated software can be written to interact with the database, process the required data and present it to the users for analysis purposes.
Limitations - This approach works for applications whose data volume can be handled by a standard database server and the processor running it. With huge, ever-growing volumes of data, processing everything through a single database server becomes a bottleneck.
Google’s solution
Google solved this problem using an algorithm called MapReduce.
This algorithm divides the task into small parts and assigns those
parts to many computers connected over the network, and collects
the results to form the final result dataset.
Doug Cutting, Mike Cafarella and team took the solution provided by
Google and started an Open Source Project called HADOOP in 2005.
Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different CPU nodes. In short, the Hadoop framework makes it possible to develop applications that run on clusters of computers and perform complete statistical analysis on huge amounts of data.
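The divide / process / collect idea can be simulated in a single Python process with the classic word-count example. This sketch covers only the MapReduce programming model and does not use Hadoop's API. The input splits are hypothetical strings. On a real cluster, each `map_phase` call would run on a different node.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a final result."""
    return key, sum(values)

# Each string stands in for a chunk of input that would live on a different node.
splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]

mapped = chain.from_iterable(map_phase(s) for s in splits)  # independent, so parallelizable
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
print(counts)
```

Each map call sees only its own split, so the map calls are independent of one another. The shuffle then groups the intermediate pairs by key, which lets each reduce call run independently too. That independence is what allows a framework like Hadoop to spread the work across many machines.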
How MapReduce works ?
Machine Learning - Learning from DATA !
Machine learning is a method of data analysis that automates
analytical model building. Using algorithms that iteratively learn from
data, machine learning allows computers to find hidden insights
without being explicitly programmed where to look.
The iterative aspect of machine learning is important because as
models are exposed to new data, they are able to independently adapt.
They learn from previous computations to produce reliable, repeatable
decisions and results
While many machine learning algorithms have been around for a long
time, the ability to automatically apply complex mathematical
calculations to big data over and over, faster and faster is a recent
development.
Here are a few widely publicized examples of machine
learning applications you may be familiar with
The heavily hyped, self-driving Google car? The essence of machine
learning.
Online recommendation offers such as those from Amazon and
Netflix? Machine learning applications for everyday life.
Knowing what customers are saying about you on Twitter? Machine
learning combined with linguistic rule creation.
Fraud detection? One of the more obvious, important uses in our
world today.
How to learn from DATA ?
1. Supervised Learning
   1. we have training data with correct answers
   2. use the training data to prepare the algorithm
   3. then apply it to data without correct answers
2. Unsupervised Learning
   1. no training data with correct answers
   2. throw the data into the algorithm
   3. hope it makes some kind of sense out of the data
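The supervised recipe can be made concrete with a minimal k-nearest-neighbours classifier in Python, using the "Nearest Neighbours" algorithm listed later in this talk. The labelled points are hypothetical. For k-NN, "preparing the algorithm" simply means storing the training data. Each new point is then labelled by a majority vote of its k closest training points. The sketch uses `math.dist`, which requires Python 3.8 or later.

```python
import math
from collections import Counter

# Step 1: hypothetical training data with correct answers (features, label).
training = [
    ((1.0, 1.2), "small"),
    ((1.5, 1.8), "small"),
    ((5.0, 5.5), "large"),
    ((6.0, 5.0), "large"),
]

def predict(point, data, k=3):
    """Label a new point by majority vote among its k nearest training points."""
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Step 3: apply the prepared model to data without a correct answer.
print(predict((1.2, 1.0), training))
print(predict((5.5, 6.0), training))
```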
Some types of learning algorithms
Prediction - Predicting a variable from data
Classification - Assigning records to predefined groups
Clustering - Splitting records into groups based on similarity
Association Learning - Seeing what often appears together with what
Issues with learning - Data is usually noisy in some way. Inductive bias - the shape of the model we choose may not fit the data at all, which may induce under-fitting or over-fitting.
Testing our model and treating missing values
When a model is applied to real problems, testing it is crucial.
Testing means splitting your dataset into training data (used as input to the algorithm) and test data (used for evaluation only)
You need to compute some measure of performance, such as precision / recall or root mean square error
Datasets usually contain missing values, and these cause problems for many Machine Learning algorithms. They can be handled by:
Removing all records with NULL values
Using a default value
Estimating a replacement value etc.
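The plain-Python sketch below illustrates these steps. It performs a seeded train/test split and shows two of the missing-value strategies listed above, dropping records and mean imputation. It also computes precision / recall for classification and RMSE for regression. The function names and sample values are hypothetical.

```python
import math
import random
from statistics import mean

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def drop_missing(rows):
    """Remove every record that contains a missing (None) value."""
    return [row for row in rows if None not in row]

def impute_mean(values):
    """Estimate a replacement value: fill gaps with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def precision_recall(actual, predicted, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def rmse(actual, predicted):
    """Root mean square error between actual and predicted numeric values."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))

train, test = train_test_split(list(range(8)))
print(len(train), len(test))
print(drop_missing([(1, 2), (None, 3), (4, 5)]))
print(impute_mean([1.0, None, 3.0]))
print(precision_recall([1, 0, 1, 1], [1, 1, 0, 1]))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

The test set must be used only for evaluation. Any imputation value (such as the mean) should therefore be computed from the training portion alone, so that test data does not leak into training.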
Top 10 Machine Learning Algorithms
Machine Learning algorithms are expected to replace 25% of the jobs
across the world in the next 10 years !!!
Naïve Bayes Classifier Algorithm
K Means Clustering Algorithm
Support Vector Machine Algorithm
Apriori Algorithm
Linear Regression
Logistic Regression
Artificial Neural Networks
Random Forests
Decision Trees
Nearest Neighbours
Naïve Bayes Classifier Algorithm
When to use Naïve Bayes Classifier Algorithm ?
If you have a moderate or large training data set.
If the instances have several attributes.
Given the class, the attributes that describe the instances should be conditionally independent.
Applications of Naïve Bayes Classifier Algorithm
Sentiment Analysis - It is used at Facebook to analyse status updates expressing positive or negative emotions.
Document Categorization - Google uses document classification to index documents and find relevancy scores
Google Mail uses the Naïve Bayes algorithm to classify your emails as Spam or Not Spam
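A minimal multinomial Naïve Bayes spam filter can be sketched in plain Python. The four training e-mails are hypothetical. This is a textbook illustration of the technique, not the model Google Mail actually uses. The sketch also adds add-one (Laplace) smoothing, an assumption not in the talk, so that a word never seen in a class does not drive that class's probability to zero.

```python
import math
from collections import Counter

# Hypothetical labelled training e-mails.
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

def fit(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocabulary

def classify(text, model):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Naive assumption: words are conditionally independent given the class.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = fit(training)
print(classify("cheap money", model))
print(classify("agenda for the meeting", model))
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on long messages.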
K Means Clustering Algorithm
K-means is a widely used unsupervised machine learning algorithm for cluster analysis
The algorithm operates on a given data set with a pre-defined number of clusters, k.
The output of the K Means algorithm is k clusters, with the input data partitioned among them.
Applications of K Means Clustering Algorithm
K Means Clustering is used by search engines like Yahoo and Google to cluster web pages by similarity and identify the relevance rate of search results
This helps search engines reduce the computational time for the users.
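Below is a minimal sketch of K Means (Lloyd's algorithm) in plain Python, assuming 2-D numeric points and Euclidean distance. The points are hypothetical. The algorithm alternates two steps: it assigns each point to its nearest centroid, then moves each centroid to the mean of its assigned points. It stops when the centroids no longer change.

```python
import math
import random

def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: repeat assignment and update steps until centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(max_iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the partition is stable
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.5), (8.0, 9.0)]
centroids, clusters = kmeans(points, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The result can depend on the initial centroids, which are chosen here with a fixed `seed`. This is why practical implementations usually run several restarts and keep the best partition.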
Support Vector Machine Learning Algorithm
Support Vector Machine is a supervised machine learning algorithm for classification or regression problems
The training dataset teaches the SVM about the classes so that it can classify any new data
It works by finding a line (hyperplane) that separates the training data set into classes; new data is classified by which side of the hyperplane it falls on
Among all separating hyperplanes, SVM chooses the one with the largest margin between the classes, which typically gives high classification accuracy.
Applications of Support Vector Machine Learning Algorithm
SVM is commonly used for stock market forecasting by various financial institutions.
It can be used to compare the relative performance of a stock against other stocks in the same sector
This relative comparison helps in making investment decisions based on the classifications made by the SVM learning algorithm.
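To illustrate the hyperplane idea, the Python sketch below trains a linear soft-margin SVM by stochastic subgradient descent on the regularized hinge loss. This training method is a simplified, swapped-in choice; libraries typically use dedicated solvers and also support kernels. The two features and the +1 / -1 labels are hypothetical stand-ins, for example for whether a stock outperformed its sector. The learning rate and regularization strength are likewise illustrative assumptions.

```python
import random

def train_linear_svm(data, epochs=1000, lr=0.01, lam=0.01, seed=0):
    """Fit w, b minimizing lam/2 * ||w||^2 + average hinge loss max(0, 1 - y * (w.x + b))."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:  # labels must be +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: hinge-loss subgradient step.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Safely classified: only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict_svm(w, b, x):
    """Classify x by which side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical two-feature examples labelled +1 or -1.
data = [
    ((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.0, 3.0), 1),
    ((-2.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1),
]
w, b = train_linear_svm(data)
print(w, b)
print(predict_svm(w, b, (4.0, 4.0)), predict_svm(w, b, (-4.0, -4.0)))
```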
Apriori Machine Learning Algorithm
Apriori algorithm is an unsupervised machine learning algorithm that
generates association rules from a given data set
An association rule states that if item A occurs, then item B also occurs with a certain probability
Most of the association rules generated are in the IF THEN format.
For example, IF people buy an iPad THEN they also buy an iPad
Case to protect it
It is easy to implement and can be parallelized easily.
Applications of Apriori Machine Learning Algorithm
Detecting Adverse Drug Reactions
Market Basket Analysis
Auto-Complete Applications
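The Apriori idea of extending only those itemsets whose subsets are already frequent can be sketched in plain Python, together with the IF THEN rules it produces. The baskets, thresholds (`min_support=0.4`, `min_confidence=0.7`) and helper names are hypothetical. Support and confidence follow their usual definitions. For example, support({iPad, iPad Case}) = 3/5 and support({iPad}) = 4/5, so the rule IF iPad THEN iPad Case has confidence 0.75.

```python
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"iPad", "iPad Case", "Charger"},
    {"iPad", "iPad Case"},
    {"iPad", "Stylus"},
    {"Laptop", "Charger"},
    {"iPad", "iPad Case", "Stylus"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    items = {item for t in transactions for item in t}
    level = [frozenset([i]) for i in items if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a frequent itemset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent

def association_rules(frequent, min_confidence):
    """IF antecedent THEN consequent, with confidence = support(A and B) / support(A)."""
    rules = []
    for itemset, itemset_support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = itemset_support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

frequent = apriori(transactions, min_support=0.4)
for antecedent, consequent, confidence in association_rules(frequent, min_confidence=0.7):
    print("IF", sorted(antecedent), "THEN", sorted(consequent), round(confidence, 2))
```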
Linear Regression Machine Learning Algorithm
The Linear Regression algorithm shows the relationship between 2 variables and how a change in one variable impacts the other
The algorithm shows the impact on the dependent variable of changing the independent variable
It is one of the most interpretable machine learning algorithms, making it easy to explain to others.
It is one of the most widely used machine learning techniques, and it runs fast.
Applications of Linear Regression Machine Learning Algorithm
Estimating Sales - Linear Regression finds great use in business, for
sales forecasting based on the trends
Risk Assessment - Linear Regression helps assess risk involved in
insurance or financial domain. A health insurance company can do a
linear regression analysis on the number of claims per customer
against age
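The insurance example can be sketched with ordinary least squares for a single independent variable. In that case the slope is Sxy / Sxx and the line passes through the point of means. The ages and claim counts below are hypothetical and are not taken from any real insurer.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx  # change in the dependent variable per unit of the independent one
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: customer age vs. average number of claims per year.
ages = [25, 35, 45, 55, 65]
claims = [1.1, 1.4, 2.1, 2.4, 3.0]

intercept, slope = fit_line(ages, claims)
print(f"claims = {intercept:.2f} + {slope:.3f} * age")
print("predicted claims at age 50:", round(intercept + slope * 50, 2))
```

The fitted slope is directly interpretable: here it is the estimated change in claims per additional year of age. This interpretability is the property highlighted above.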
Decision Tree Machine Learning Algorithm
A decision tree is a graphical representation that makes use of
branching methodology to exemplify all possible outcomes of a
decision, based on certain conditions
In a decision tree, each internal node represents a test on an attribute, each branch of the tree represents an outcome of the test, and each leaf node represents a particular class label
The classification rules are represented by the paths from the root to the leaf nodes.
Applications of Decision Tree Machine Learning Algorithm
Decision trees are among the most popular machine learning algorithms and find great use in finance for option pricing.
Decision tree algorithms are used by banks to classify loan applicants
by their probability of defaulting payments.
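Below is a compact sketch of tree induction in the ID3 style, which chooses splits by information gain on categorical attributes, written in plain Python. It is applied to hypothetical loan-applicant records. The attributes, values and labels are invented for illustration, and real credit models use far richer data. As described above, internal nodes test an attribute, leaves hold class labels, and `classify` follows the path from the root to a leaf.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split yields the largest information gain."""
    base = entropy(labels)

    def gain(attr):
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or (attribute, {value: subtree}) (internal node)."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority label
    attr = best_attribute(rows, labels, attributes)
    branches = {}
    for value in {row[attr] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attributes if a != attr],
        )
    return (attr, branches)

def classify(tree, row):
    """Follow the path from the root to a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

# Hypothetical loan applicants and whether they repaid.
rows = [
    {"income": "high", "history": "good"},
    {"income": "high", "history": "bad"},
    {"income": "low", "history": "good"},
    {"income": "low", "history": "bad"},
    {"income": "medium", "history": "bad"},
    {"income": "medium", "history": "good"},
]
labels = ["repay", "repay", "repay", "default", "default", "repay"]

tree = build_tree(rows, labels, ["income", "history"])
print(tree)
print(classify(tree, {"income": "low", "history": "bad"}))
```

This sketch raises `KeyError` for an attribute value that never appeared in training. A fuller implementation would fall back to the majority label at that node.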
The Best Machine Learning Libraries in Python
Python is one of the best languages you can use to learn (and implement)
machine learning techniques for a few reasons:
It’s simple - Python is now becoming the language of choice among
new programmers thanks to its simple syntax and huge community
It’s powerful - Just because something is simple doesn’t mean it
isn’t capable. Python is also one of the most popular languages
among data scientists and web programmers. Its community has
created libraries to do just about anything you want, including
machine learning
Lots of ML libraries - There are tons of machine learning libraries already written for Python. You can choose one of the hundreds of libraries based on your use-case, skill, and need for customization.
The Best Machine Learning Libraries in Python - contd..
Tensorflow - an open source library for numerical computation using data-flow graphs, widely used to build and train neural networks
scikit-learn - The scikit-learn library is definitely one of the most popular ML libraries out there, if not the most popular, across all languages. It has a huge number of features for data mining and data analysis, making it a top choice for researchers and developers alike.
Theano - a machine learning library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays, a task that can be a point of frustration for developers using other libraries
The Best Machine Learning Libraries in Python - contd..
Pylearn2 - Most of Pylearn2’s functionality is actually built on top of
Theano, so it has a pretty solid base.
Pyevolve - Pyevolve provides a great framework to build and execute
genetic algorithms and neural networks.
Pattern - This is more of a ’full suite’ library as it provides not only
some ML algorithms but also tools to help you collect and analyze
data. The data mining portion helps you collect data from web
services like Google, Twitter, and Wikipedia. The nice thing about
including these tools is how easy it makes it to both collect and train
on data in the same program.
Machine Learning & Big Data Analytics - The perfect
marriage
TWO Orthogonal Aspects !!
Big Data - Handling massive data volumes !
Analytics / Machine Learning - Learning insights from data !
Combined, they give accurate, effective analysis !!!
Books I recommend for Machine Learning
Books I recommend for Big Data, Machine Learning
Thank you for not yawning !
Questions ?
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dcBob Ward
 
Comparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsComparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsLucas Jellema
 
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)Amazon Web Services Korea
 
Startup & VC Tech Trends
Startup & VC Tech Trends Startup & VC Tech Trends
Startup & VC Tech Trends Dave McClure
 

Andere mochten auch (20)

Tracxn Research - Mobile Advertising Landscape, February 2017
Tracxn Research - Mobile Advertising Landscape, February 2017Tracxn Research - Mobile Advertising Landscape, February 2017
Tracxn Research - Mobile Advertising Landscape, February 2017
 
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...
 
2017 iosco research report on financial technologies (fintech)
2017 iosco research report on  financial technologies (fintech)2017 iosco research report on  financial technologies (fintech)
2017 iosco research report on financial technologies (fintech)
 
Tracxn Research - Finance & Accounting Landscape, February 2017
Tracxn Research - Finance & Accounting Landscape, February 2017Tracxn Research - Finance & Accounting Landscape, February 2017
Tracxn Research - Finance & Accounting Landscape, February 2017
 
Tugas4 1412510602 dewi_apriliani
Tugas4 1412510602 dewi_aprilianiTugas4 1412510602 dewi_apriliani
Tugas4 1412510602 dewi_apriliani
 
2015 Internet Trends Report
2015 Internet Trends Report2015 Internet Trends Report
2015 Internet Trends Report
 
Tracxn Research - Healthcare Analytics Landscape, February 2017
Tracxn Research - Healthcare Analytics Landscape, February 2017Tracxn Research - Healthcare Analytics Landscape, February 2017
Tracxn Research - Healthcare Analytics Landscape, February 2017
 
Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017Tracxn Research - Insurance Tech Landscape, February 2017
Tracxn Research - Insurance Tech Landscape, February 2017
 
Tracxn Research - Chatbots Landscape, February 2017
Tracxn Research - Chatbots Landscape, February 2017Tracxn Research - Chatbots Landscape, February 2017
Tracxn Research - Chatbots Landscape, February 2017
 
Tracxn Research - Industrial Robotics Landscape, February 2017
Tracxn Research - Industrial Robotics Landscape, February 2017Tracxn Research - Industrial Robotics Landscape, February 2017
Tracxn Research - Industrial Robotics Landscape, February 2017
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
 
Introduction to Data Science by Datalent Team @Data Science Clinic #9
Introduction to Data Science by Datalent Team @Data Science Clinic #9Introduction to Data Science by Datalent Team @Data Science Clinic #9
Introduction to Data Science by Datalent Team @Data Science Clinic #9
 
Migrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration ServiceMigrating to Amazon RDS with Database Migration Service
Migrating to Amazon RDS with Database Migration Service
 
Europa AI startup scaleups report 2016
Europa AI startup scaleups report 2016 Europa AI startup scaleups report 2016
Europa AI startup scaleups report 2016
 
MongoDB NoSQL database a deep dive -MyWhitePaper
MongoDB  NoSQL database a deep dive -MyWhitePaperMongoDB  NoSQL database a deep dive -MyWhitePaper
MongoDB NoSQL database a deep dive -MyWhitePaper
 
Salesforce Marketing Cloud Training | Salesforce Training For Beginners - Mar...
Salesforce Marketing Cloud Training | Salesforce Training For Beginners - Mar...Salesforce Marketing Cloud Training | Salesforce Training For Beginners - Mar...
Salesforce Marketing Cloud Training | Salesforce Training For Beginners - Mar...
 
Gs08 modernize your data platform with sql technologies wash dc
Gs08 modernize your data platform with sql technologies   wash dcGs08 modernize your data platform with sql technologies   wash dc
Gs08 modernize your data platform with sql technologies wash dc
 
Comparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statementsComparing 30 MongoDB operations with Oracle SQL statements
Comparing 30 MongoDB operations with Oracle SQL statements
 
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)
Cross-regional Application Deplolyment on AWS - Channy Yun (JAWS Days 2017)
 
Startup & VC Tech Trends
Startup & VC Tech Trends Startup & VC Tech Trends
Startup & VC Tech Trends
 

Ähnlich wie Introduction to Data Science

Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)Shahbaz Anjam
 
How Analytics Has Changed in the Last 10 Years (and How It’s Staye.docx
How Analytics Has Changed in the Last 10 Years (and How It’s Staye.docxHow Analytics Has Changed in the Last 10 Years (and How It’s Staye.docx
How Analytics Has Changed in the Last 10 Years (and How It’s Staye.docxpooleavelina
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data scienceVipul Kalamkar
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Data Science Innovations
Data Science InnovationsData Science Innovations
Data Science Innovationssuresh sood
 
Big Data Analytics Research Report
Big Data Analytics Research ReportBig Data Analytics Research Report
Big Data Analytics Research ReportIla Group
 
BigData Analytics_1.7
BigData Analytics_1.7BigData Analytics_1.7
BigData Analytics_1.7Rohit Mittal
 
DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxDATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxOTA13NayabNakhwa
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleDr. Radhey Shyam
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfDr. Radhey Shyam
 

Ähnlich wie Introduction to Data Science (20)

What is business analytics
What is business analyticsWhat is business analytics
What is business analytics
 
What is data science ?
What is data science ?What is data science ?
What is data science ?
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
Untitled document.pdf
Untitled document.pdfUntitled document.pdf
Untitled document.pdf
 
How Analytics Has Changed in the Last 10 Years (and How It’s Staye.docx
How Analytics Has Changed in the Last 10 Years (and How It’s Staye.docxHow Analytics Has Changed in the Last 10 Years (and How It’s Staye.docx
How Analytics Has Changed in the Last 10 Years (and How It’s Staye.docx
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data science
 
365 Data Science
365 Data Science365 Data Science
365 Data Science
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 
KOHN.ppt
KOHN.pptKOHN.ppt
KOHN.ppt
 
KOHN.ppt
KOHN.pptKOHN.ppt
KOHN.ppt
 
Data Science Innovations
Data Science InnovationsData Science Innovations
Data Science Innovations
 
Big Data Analytics Research Report
Big Data Analytics Research ReportBig Data Analytics Research Report
Big Data Analytics Research Report
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
BigData Analytics_1.7
BigData Analytics_1.7BigData Analytics_1.7
BigData Analytics_1.7
 
DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxDATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
 
Difference b/w DataScience, Data Analyst
Difference b/w DataScience, Data AnalystDifference b/w DataScience, Data Analyst
Difference b/w DataScience, Data Analyst
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycle
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdf
 
Data Science for Finance Interview.
Data Science for Finance Interview. Data Science for Finance Interview.
Data Science for Finance Interview.
 

Kürzlich hochgeladen

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 

Kürzlich hochgeladen (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Introduction to Data Science

  • 1. An Introduction to Data Science Anoop V.S Ph.D Research Scholar Data Engineering Lab Indian Institute of Information Technology and Management - Kerala (IIITM-K) Thiruvananthapuram, India anoop.res15@iiitmk.ac.in March 10, 2017 Anoop V.S Introduction to Data Science March 10, 2017 1 / 48
  • 3. Why should you attend this talk? Companies have recognized the immense business value that data can deliver. This has created a huge demand for skilled professionals in data-related jobs around the world. Companies are actively hiring for job profiles such as Data Scientist, Data Analyst, Big Data Engineer and Statistician. Not only are these roles handsomely paid, but a career in analytics has much more to offer. After the U.S., India has the largest demand for analytics / big data / data science professionals. Amid such demand, people are often confused about which job profile to choose for the best future.
  • 4. How much can a Data Science professional earn?
  • 5. Which cities offer high salaries?
  • 6. Data Scientist - the SEXIEST JOB OF THE 21st CENTURY! The role requires a mix of multidisciplinary skills at the intersection of mathematics, statistics, computer science, communication and business. Finding a Data Scientist is hard! Finding people who understand what a Data Scientist is, is equally hard!! The trend is expected to accelerate in the coming years as data from mobile sensors, sophisticated instruments, the web, and more keeps growing. It is predicted that in 2020 the world will generate 50 times the amount of data it generated in 2011.
  • 7. What skills are needed?
  • 8. So, what really is Data Science? (1) Asking questions (formulating hypotheses) whose answers solve known problems or unearth unknown solutions that in turn drive business value. (2) Defining the data needed, or working with an existing data set, and employing computer-science-based tools to collect, store and explore such data, generally in huge volume and variety. (3) Identifying the type of analysis needed to get to the answers and performing that analysis by implementing various algorithms/tools, often on a distributed and parallel architecture. (4) Communicating the insights gathered from the analysis as simple stories/visualizations/dashboards that a non-data scientist can understand and build a conversation around. (5) Building a higher-level abstraction that performs steps 2-3-4 autonomously, analyzing and taking actions on new data as it is fed to the system.
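To make the idea of chaining steps 2-3-4 (step 5) concrete, here is a minimal Python sketch. The sample records, the field names and the functions `collect`, `analyze`, `communicate` and `run_pipeline` are all hypothetical. A real system would read from actual data sources and run far richer analysis; this only shows the shape of an automated pipeline.

```python
from statistics import mean


def collect():
    # Step 2 (hypothetical): stand-in for data gathered from real sources.
    return [
        {"region": "north", "sales": 120.0},
        {"region": "south", "sales": 80.0},
        {"region": "north", "sales": 100.0},
    ]


def analyze(records):
    # Step 3: a deliberately simple analysis, average sales per region.
    by_region = {}
    for record in records:
        by_region.setdefault(record["region"], []).append(record["sales"])
    return {region: mean(values) for region, values in by_region.items()}


def communicate(result):
    # Step 4: turn numbers into short sentences a non-data scientist can read.
    return [f"{region}: average sales {value:.1f}" for region, value in sorted(result.items())]


def run_pipeline(source=collect):
    # Step 5: chain steps 2-3-4 so new data flows through without manual work.
    return communicate(analyze(source()))


for line in run_pipeline():
    print(line)
```

Passing a different `source` function lets you run the same analysis and reporting on new data without changing the pipeline itself.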
  • 9. Summing up in an image
  • 10. Leading by example: Two of the most famous companies in the world, Amazon and Facebook, use analytics and Big Data to shape their products, services and delivery. Amazon uses analytics to curate products on its customers' homepages based on their previous purchases and browsing habits. Facebook uses analytics to fill your news feed with updates from the people you interact with most, content from sites you frequent, and products you have checked out on other sites.
  • 11. Types of analytics: Descriptive Analytics, which uses data aggregation and data mining to provide insight into the past and answer: "What has happened?" Predictive Analytics, which uses statistical models and forecasting techniques to understand the future and answer: "What could happen?" Prescriptive Analytics, which uses optimization and simulation algorithms to advise on possible outcomes and answer: "What should we do?"
  • 12. Descriptive Analytics: Insight into the past. Descriptive analysis, or descriptive statistics, does exactly what the name implies: it describes, or summarizes, raw data and makes it interpretable by humans. These analytics describe the past, where the past means any point in time at which an event occurred, whether one minute ago or one year ago. Descriptive analytics are useful because they allow us to learn from past behaviors and understand how they might influence future outcomes. Common examples of descriptive analytics are reports that provide historical insights into a company's production, financials, operations, sales, finance, inventory and customers.
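Here is a small Python sketch of descriptive analytics: it takes raw past figures and summarizes them into numbers a person can read at a glance. The monthly sales values are invented for illustration, and the summary uses only the standard-library `statistics` module.

```python
from statistics import mean, median

# Hypothetical monthly sales figures (past data) for illustration only.
monthly_sales = {"Jan": 200, "Feb": 240, "Mar": 180, "Apr": 260}


def describe(values):
    """Summarize raw numbers into a few human-readable statistics."""
    values = list(values)
    return {
        "count": len(values),
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


summary = describe(monthly_sales.values())
best_month = max(monthly_sales, key=monthly_sales.get)  # "What has happened?"
print(summary, "best month:", best_month)
```

A real report would aggregate over many more records and dimensions (region, product, quarter), but the principle is the same: turn past data into a summary.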
  • 13. Predictive Analytics: Understanding the future. Predictive analytics has its roots in the ability to "predict" what might happen, and it provides companies with actionable insights based on data. Companies use these statistics to forecast what might happen in the future. It is important to remember that no statistical algorithm can predict the future with 100% certainty, because the foundation of predictive analytics is based on probabilities. Predictive analytics can be used throughout the organization, from forecasting customer behavior and purchasing patterns to identifying trends in sales activities.
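One of the simplest predictive techniques fits a straight-line trend to past observations with ordinary least squares and extends that line forward. The sketch below shows this in Python. The quarterly history is hypothetical and happens to fit a line exactly. Real data scatters around the trend, so a forecast is an estimate, not a certainty.

```python
# Hypothetical quarterly sales history: (quarter index, units sold).
history = [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0)]


def fit_line(points):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(points, x_new):
    """Extend the fitted trend line to a future x value."""
    intercept, slope = fit_line(points)
    return intercept + slope * x_new


print(f"Forecast for quarter 5: {forecast(history, 5):.1f}")
```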
  • 14. Prescriptive Analytics: Advise on possible outcomes. The relatively new field of prescriptive analytics allows users to "prescribe" a number of different possible actions and guides them towards a solution. At their best, prescriptive analytics predict not only what will happen, but also why it will happen, and provide recommendations for actions that take advantage of those predictions. Prescriptive analytics uses a combination of techniques and tools such as business rules, algorithms, machine learning and computational modelling procedures. These techniques are applied to input from many different data sets, including historical and transactional data, real-time data feeds, and big data.
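This Python sketch shows the prescriptive step in its simplest form. It assumes the predictions already exist, applies a business rule (a budget limit), and recommends the action with the highest predicted profit. The action names, costs and predicted revenues are hypothetical. Real prescriptive systems typically use proper optimization or simulation over many variables.

```python
# Hypothetical candidate actions with costs and predicted revenue (from a predictive model).
actions = {
    "discount_10pct": {"cost": 50.0, "predicted_revenue": 120.0},
    "email_campaign": {"cost": 20.0, "predicted_revenue": 70.0},
    "do_nothing": {"cost": 0.0, "predicted_revenue": 30.0},
    "tv_advert": {"cost": 200.0, "predicted_revenue": 260.0},
}


def recommend(candidates, budget):
    """Return the affordable action with the highest predicted profit, or None."""
    # Business rule: only consider actions that fit within the budget.
    feasible = {name: a for name, a in candidates.items() if a["cost"] <= budget}
    if not feasible:
        return None
    # Objective: maximize predicted profit (revenue minus cost).
    return max(feasible, key=lambda name: feasible[name]["predicted_revenue"] - feasible[name]["cost"])


print(recommend(actions, budget=100.0))
```

Note that the expensive `tv_advert` has the highest revenue but not the highest profit, so it is not recommended even when the budget allows it.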
  • 15. Now into some basics: What are Data / Information / Knowledge? Data is unprocessed facts and figures without any added interpretation or analysis. "The price of crude oil is $80 per barrel." Information is data that has been interpreted so that it has meaning for the user. "The price of crude oil has risen from $70 to $80 per barrel" gives meaning to the data and so is said to be information to someone who tracks oil prices. Knowledge is a combination of information, experience and insight that may benefit the individual or the organisation. "When crude oil prices go up by $10 per barrel, it's likely that petrol prices will rise by Rs. 20 per litre" is knowledge.
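The same crude-oil example can be written as Python, which makes the three levels easy to tell apart. The raw numbers are data. A sentence that compares them is information. The rule of thumb "$10 per barrel means Rs. 20 per litre" is knowledge that can be applied to new situations. That rule is the slide's illustrative figure, not a real pricing model.

```python
# Data: raw facts with no interpretation.
old_price_usd = 70.0  # crude oil, USD per barrel
new_price_usd = 80.0

# Knowledge: the slide's illustrative rule of thumb (not a real pricing model).
RS_PER_LITRE_PER_10_USD = 20.0


def to_information(old_usd, new_usd):
    """Interpret two prices as a change over time."""
    if new_usd == old_usd:
        return f"The price of crude oil is unchanged at ${new_usd:.0f} per barrel."
    direction = "risen" if new_usd > old_usd else "fallen"
    return f"The price of crude oil has {direction} from ${old_usd:.0f} to ${new_usd:.0f} per barrel."


def expected_petrol_change_rs(old_usd, new_usd):
    """Apply the rule of thumb: Rs. 20 per litre for every $10 per barrel."""
    return (new_usd - old_usd) / 10.0 * RS_PER_LITRE_PER_10_USD


print(to_information(old_price_usd, new_price_usd))
print(expected_petrol_change_rs(old_price_usd, new_price_usd))
```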
  • 16. Relationship of Data, Information and Intelligence
  • 17. Categories of Data - A quick view. Structured data covers all data that can be stored in a SQL database in tables with rows and columns. It has relational keys and can easily be mapped into pre-designed fields. Today, such data is the most processed in development and the simplest way to manage information. Semi-structured data does not reside in a relational database, but it does have some organizational properties that make it easier to analyze. With some processing, it can be stored in a relational database. Unstructured data represents around 80% of data. It often includes text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. Unstructured data is everywhere. In fact, most individuals and organizations conduct their lives around unstructured data.
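Here is a short Python sketch that shows the three categories side by side using only the standard library. The CSV rows, the JSON record and the e-mail text are made-up samples. Structured data parses straight into fixed columns. Semi-structured JSON has self-describing keys, and it can be flattened into columns with a little processing. Unstructured text has no schema, so even a simple word count needs parsing first.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: every row has the same named columns (hypothetical sample).
structured_text = "id,name,city\n1,Asha,Kochi\n2,Ravi,Chennai\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing keys, with nesting (hypothetical record).
semi = json.loads('{"id": 3, "name": "Meera", "contact": {"email": "meera@example.com"}}')


def flatten(record, prefix=""):
    """Flatten nested keys so the record fits the columns of a relational table."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            flat.update(flatten(value, f"{prefix}{key}_"))
        else:
            flat[f"{prefix}{key}"] = value
    return flat


# Unstructured: free text with no schema; even a word count needs parsing first.
email_body = "Hi team, the Kochi meetup moved to Friday. Please confirm by Friday noon."
word_counts = Counter(re.findall(r"[a-z]+", email_body.lower()))

print(rows[0], flatten(semi), word_counts.most_common(1))
```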
  • 18. Big Data - in recent News
  • 19. Big Data - in recent News
  • 20. Big Data - in recent News
  • 21. Big Data - in recent News
  • 22. Did you know? "90% of the world's data was generated in the last few years." Big data really means big data: it is a collection of datasets so large that they cannot be processed using traditional computing techniques. Big data is not merely data; it has become a complete subject involving various tools, techniques and frameworks. What comes under Big Data? Black box data, social media data, stock exchange data, power grid data, transport data, search engine data, etc.
  • 23. 3Vs of Big Data. Volume: Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. In the past, storing it would have been a problem, but new technologies (such as Hadoop) have eased the burden. Velocity: Data streams in at an unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
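Velocity often means processing each reading as it arrives instead of storing everything and analyzing it later. A common building block for this is a sliding-window statistic. The Python sketch below keeps a running average of the last few sensor readings and updates it in constant time per event. The class name and window size are hypothetical choices. Production stream processors handle this at far larger scale, across many machines.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the most recent `size` readings, updated per incoming event."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def add(self, value):
        # Add the new reading and evict the oldest once the window is full.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


meter = SlidingWindowAverage(size=3)
for reading in [10, 20, 30, 40]:  # hypothetical smart-meter readings
    print(meter.add(reading))
```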
  • 24. Who uses Big Data? Banking: It is important to understand customers and boost their satisfaction, and equally important to minimize risk and fraud while maintaining regulatory compliance. Big data brings big insights, but it also requires financial institutions to stay one step ahead of the game with advanced analytics. Education: Educators armed with data-driven insight can make a significant impact on school systems, students and curriculums. By analyzing big data, they can identify at-risk students, make sure students are making adequate progress, and implement a better system for evaluation and support. Government: When government agencies are able to harness and apply analytics to their big data, they gain significant ground in managing utilities, running agencies, dealing with traffic congestion or preventing crime.
  • 25. Who uses Big Data? Health care: Patient records. Treatment plans. Prescription information. In health care, everything needs to be done quickly, accurately and, in some cases, with enough transparency to satisfy stringent industry regulations. When big data is managed effectively, health care providers can uncover hidden insights that improve patient care. Manufacturing: More and more manufacturers work in an analytics-based culture, which means they can solve problems faster and make more agile business decisions. Retail: Retailers need to know the best way to market to customers, the most effective way to handle transactions, and the most strategic way to bring back lapsed business.
  • 26. Operational vs. Analytical Big Data. Operational Big Data: these technologies provide operational features to run real-time, interactive workloads that ingest and store data. MongoDB is a top technology for operational Big Data applications, with over 10 million downloads of its open source software. Analytical Big Data: these technologies, on the other hand, are useful for retrospective, sophisticated analytics of your data. Hadoop is the most popular example of an analytical Big Data technology. But picking an operational vs. analytical Big Data solution isn't the right way to think about the challenge. They are complementary technologies, and you will likely need both to develop a complete Big Data solution.
  • 27. Traditional vs. Google's solution. Traditional approach: a single computer stores and processes the big data. The data is stored in an RDBMS such as Oracle Database, MS SQL Server or DB2, and sophisticated software can be written to interact with the database, process the required data and present it to users for analysis. Limitations: this works while the data fits on a standard database server and within the limits of one processor, but at big-data scale the single computer and its database become a bottleneck.
  • 28. Google's solution. Google solved this problem using a programming model called MapReduce, which divides a task into small parts, assigns those parts to many computers connected over a network, and collects their results to form the final result dataset. Doug Cutting, Mike Cafarella and team took the solution provided by Google and started an open source project called Hadoop in 2005. Hadoop runs applications using MapReduce, where the data is processed in parallel on different CPU nodes. In short, the Hadoop framework makes it possible to develop applications that run on clusters of computers and perform complete statistical analysis on huge amounts of data.
  • 29. How MapReduce works?
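The diagram on this slide did not survive extraction. As a rough, single-process sketch (plain Python, not Hadoop's actual API), the classic word-count job below walks through the three stages: map each input split to (key, value) pairs, shuffle the pairs so values are grouped by key, and reduce each group to one result. The function names and input strings are hypothetical; in Hadoop the splits would be mapped on different nodes in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    # Each split stands in for a chunk of input that one node would map in parallel.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data is big", "data science uses big data"]))
# {'big': 3, 'data': 3, 'is': 1, 'science': 1, 'uses': 1}
```

Because each map call only sees its own split and each reduce call only sees one key, both stages can be distributed; only the shuffle needs to move data between machines.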
  • 30. Machine Learning - Learning from DATA! Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look. The iterative aspect of machine learning is important because, as models are exposed to new data, they are able to adapt independently. They learn from previous computations to produce reliable, repeatable decisions and results. While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data, over and over and ever faster, is a recent development.
  • 31. Here are a few widely publicized examples of machine learning applications you may be familiar with: The heavily hyped, self-driving Google car? The essence of machine learning. Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life. Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation. Fraud detection? One of the more obvious, important uses in our world today.
  • 32. How to learn from DATA? 1. Supervised Learning: (1) we have training data with correct answers; (2) we use the training data to prepare the algorithm; (3) we then apply it to data without a correct answer. 2. Unsupervised Learning: (1) there is no training data with correct answers; (2) we feed the data into the algorithm; (3) we hope it makes some kind of sense out of the data.
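A minimal sketch of the supervised workflow above, assuming a nearest-centroid classifier (chosen only for brevity) and made-up two-feature data. Training uses examples that carry the correct answer; prediction is then applied to a point without one. An unsupervised method would receive the same feature vectors without the labels.

```python
from statistics import mean


def train_nearest_centroid(examples):
    """Supervised training: each example is (features, correct_label)."""
    rows_by_label = {}
    for features, label in examples:
        rows_by_label.setdefault(label, []).append(features)
    # Summarise each class by the mean of its feature vectors (its centroid).
    return {label: [mean(column) for column in zip(*rows)]
            for label, rows in rows_by_label.items()}


def predict(centroids, features):
    """Apply the trained model to new data: return the label of the closest centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


training_data = [([1.0, 1.2], "small"), ([0.8, 1.0], "small"),
                 ([5.0, 5.5], "large"), ([5.2, 4.8], "large")]
model = train_nearest_centroid(training_data)
print(predict(model, [4.9, 5.1]))  # large
```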
  • 33. Some types of learning algorithms. Prediction: predicting a variable from data. Classification: assigning records to predefined groups. Clustering: splitting records into groups based on similarity. Association learning: seeing what often appears together with what. Issues with learning: data is usually noisy in some way; inductive bias means the shape of the model we choose may not fit the data at all, which may cause under-fitting or over-fitting.
  • 34. Testing our model and treating missing values. When a model is used for real problems, testing it is crucial. Testing means splitting your dataset into training data (used as input to the algorithm) and test data (used for evaluation only). We need to compute some measure of performance, such as precision / recall or root mean square error. Datasets usually contain missing values, which cause problems for many machine learning algorithms. These can be handled by removing all records with NULL values, using a default value, or estimating a replacement value.
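The steps on this slide can be sketched in plain Python. The helper names below are hypothetical, the split uses a fixed random seed so it is reproducible, and `None` stands in for a NULL value. The mean-imputation helper assumes every column has at least one observed value.

```python
import math
import random
from statistics import mean


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records, then hold out a fraction for evaluation only."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def rmse(y_true, y_pred):
    """Root mean square error for numeric predictions."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Three ways of treating missing values (None plays the role of NULL).
def drop_missing(rows):
    """Remove every record that has a missing value."""
    return [row for row in rows if None not in row]


def fill_default(rows, default=0):
    """Replace each missing value with a fixed default."""
    return [[default if v is None else v for v in row] for row in rows]


def fill_column_mean(rows):
    """Estimate a replacement: use each column's mean over its observed values."""
    means = [mean(v for v in column if v is not None) for column in zip(*rows)]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


train, test = train_test_split(list(range(8)))
print(len(train), len(test))                          # 6 2
print(precision_recall([1, 0, 1, 1], [1, 1, 0, 1]))   # both values are 2/3
print(fill_column_mean([[1, 2.0], [None, 4.0], [3, None]]))
```

Which missing-value strategy is appropriate depends on the data: dropping rows is simplest but loses information, while imputation keeps every record at the cost of an estimated value.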
  • 35. Top 10 Machine Learning Algorithms. It has been predicted that machine learning algorithms will replace 25% of the jobs across the world in the next 10 years!!! Naïve Bayes Classifier Algorithm, K Means Clustering Algorithm, Support Vector Machine Algorithm, Apriori Algorithm, Linear Regression, Logistic Regression, Artificial Neural Networks, Random Forests, Decision Trees, Nearest Neighbours.
  • 36. Naïve Bayes Classifier Algorithm. When to use the Naïve Bayes Classifier Algorithm? If you have a moderate or large training data set; if the instances have several attributes; and if, given the class label, the attributes that describe the instances are conditionally independent. Applications of the Naïve Bayes Classifier Algorithm: Sentiment Analysis - it is used at Facebook to analyse status updates expressing positive or negative emotions. Document Categorization - Google uses document classification to index documents and find relevancy scores. Google Mail uses the Naïve Bayes algorithm to classify your emails as Spam or Not Spam.
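The slide names the applications but not the mechanics. As a hedged sketch, assuming a multinomial model over word counts with add-one (Laplace) smoothing, which is one common variant and not necessarily what Gmail uses, a toy spam filter could look like this. The class name and training sentences are invented. Probabilities are added in log space so that long documents do not underflow to zero.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_posterior(self, words, label):
        # log P(label) + sum of log P(word | label). The "naive" assumption is that
        # words are conditionally independent given the label, so their probabilities multiply.
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocab)
        for w in words:
            score += math.log((counts[w] + 1) / denominator)
        return score

    def predict(self, document):
        words = document.lower().split()
        return max(self.class_counts, key=lambda label: self._log_posterior(words, label))


docs = ["win money now", "cheap money offer", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesText().fit(docs, labels)
print(model.predict("cheap money"))  # spam
```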
  • 37. K Means Clustering Algorithm. K-means is a popular unsupervised machine learning algorithm for cluster analysis. The algorithm operates on a given data set using a pre-defined number of clusters, k. Its output is k clusters, with the input data partitioned among them. Applications of K Means Clustering Algorithm: K Means Clustering is used by most search engines, such as Yahoo and Google, to cluster web pages by similarity and identify the relevance rate of search results. This helps search engines reduce the computational time for users.
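K-means is usually computed with an iterative procedure known as Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. A minimal sketch, assuming 2-D points given as tuples, Euclidean distance, and initial centroids picked at random from the input (the function name and data are hypothetical):

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm for K-means clustering."""
    centroids = random.Random(seed).sample(points, k)  # start from k of the input points
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean (keep it if the cluster is empty).
        new_centroids = [tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The result can depend on the initial centroids, which is why practical implementations often run several random starts and keep the best one.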
  • 38. Support Vector Machine Learning Algorithm. Support Vector Machine (SVM) is a supervised machine learning algorithm for classification or regression problems. The dataset teaches the SVM about the classes so that it can classify any new data. It works by finding a line (in general, a hyperplane) that separates the training data set into classes, choosing the one with the widest margin between the classes. SVM often offers very good classification performance (accuracy). Applications of Support Vector Machine Learning Algorithm: SVM is commonly used for stock market forecasting by various financial institutions. It can be used to compare the relative performance of a stock against other stocks in the same sector. This relative comparison helps in making investment decisions based on the classifications made by the SVM learning algorithm.
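The stock-market use case cannot be reproduced here, but the core idea, finding a separating hyperplane with a wide margin, can be sketched. This assumes a linear kernel and trains by plain full-batch subgradient descent on the soft-margin (hinge loss) objective; that is a simple teaching choice, whereas libraries such as scikit-learn use dedicated solvers. The function names, hyperparameters and toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, C=1.0, learning_rate=0.001, epochs=5000):
    """Full-batch subgradient descent on the soft-margin linear SVM objective
    0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w . x_i + b))), labels y_i in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of 0.5 * ||w||^2, the term that favours a wide margin
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # point is inside the margin or misclassified
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - learning_rate * g for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x):
    """Classify by which side of the hyperplane w . x + b = 0 the point falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


X = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [-1, -1, -1, 1, 1, 1]
```

Only the points that end up on or inside the margin (the support vectors) keep contributing to the gradient, which is where the method gets its name.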
  • 39. Apriori Machine Learning Algorithm. The Apriori algorithm is an unsupervised machine learning algorithm that generates association rules from a given data set. An association rule implies that if an item A occurs, then item B also occurs with a certain probability. Most of the association rules generated are in the IF-THEN format; for example, IF people buy an iPad THEN they also buy an iPad Case to protect it. It is easy to implement and can be parallelized easily. Applications of Apriori Machine Learning Algorithm: detecting adverse drug reactions, market basket analysis, auto-complete applications.
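A minimal sketch of how Apriori finds frequent itemsets level by level, using the slide's iPad example on a made-up set of shopping baskets. It relies on the Apriori property: any subset of a frequent itemset must itself be frequent, so candidates with an infrequent subset are pruned before their support is counted. The function names and the 0.4 support threshold are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of IF antecedent THEN consequent = support(A and B) / support(A).
    Both itemsets must already be in `frequent`, otherwise a KeyError is raised."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


baskets = [{"iPad", "iPad Case"}, {"iPad", "iPad Case", "charger"},
           {"iPad", "charger"}, {"iPad Case"}, {"iPad", "iPad Case"}]
freq = apriori(baskets, min_support=0.4)
print(confidence(freq, {"iPad"}, {"iPad Case"}))  # about 0.75
```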
  • 40. Linear Regression Machine Learning Algorithm. The Linear Regression algorithm shows the relationship between 2 variables and how a change in one variable impacts the other: it shows the impact on the dependent variable of changing the independent variable. It is one of the most interpretable machine learning algorithms, making it easy to explain to others, and it is one of the most widely used machine learning techniques because it runs fast. Applications of Linear Regression Machine Learning Algorithm: Estimating Sales - Linear Regression finds great use in business for sales forecasting based on trends. Risk Assessment - Linear Regression helps assess risk in the insurance or financial domain. For example, a health insurance company can do a linear regression analysis on the number of claims per customer against age.
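For the two-variable case on this slide, ordinary least squares has a closed form: slope = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)² and intercept = ȳ - slope · x̄. A minimal sketch applying it to the insurance example; the ages and claim counts below are invented for illustration, not real data.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx  # expected change in y for a one-unit change in x
    intercept = y_bar - slope * x_bar
    return intercept, slope


ages = [25, 35, 45, 55, 65]   # hypothetical customer ages (independent variable)
claims = [1, 2, 2, 3, 4]      # hypothetical claims per customer (dependent variable)
intercept, slope = fit_simple_linear_regression(ages, claims)
print(f"claims = {intercept:.2f} + {slope:.3f} * age")
```

The slope is what makes the model easy to explain: here it reads as "each extra year of age is associated with this many more claims".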
  • 41. Decision Tree Machine Learning Algorithm. A decision tree is a graphical representation that uses a branching methodology to exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree, each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a particular class label. The classification rules are represented by the paths from the root to the leaf nodes. Applications of Decision Tree Machine Learning Algorithm: decision trees are among the popular machine learning algorithms that find great use in finance for option pricing. Decision tree algorithms are used by banks to classify loan applicants by their probability of defaulting on payments.
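A minimal sketch of growing a decision tree, assuming a CART-style approach that picks, at each internal node, the attribute test (feature <= threshold) that most reduces Gini impurity. ID3-style trees use entropy instead. The loan-applicant features (income in thousands, number of existing debts) and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) test that most lowers weighted impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= threshold]
            right = [lab for r, lab in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, max_depth=3):
    """Internal nodes are (feature, threshold, left, right); leaves are class labels."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [lab for _, lab in left], max_depth - 1),
            build_tree([r for r, _ in right], [lab for _, lab in right], max_depth - 1))


def classify(tree, row):
    """Follow the path from the root to a leaf (leaves are assumed not to be tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


rows = [[20, 3], [25, 2], [60, 1], [80, 0], [30, 4], [90, 2]]
labels = ["default", "default", "repay", "repay", "default", "repay"]
tree = build_tree(rows, labels)
print(tree, classify(tree, [40, 1]))
```

Each root-to-leaf path reads directly as a rule, for example "IF income <= 30 THEN default", which is why trees are considered easy to interpret.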
  • 42. The Best Machine Learning Libraries in Python. Python is one of the best languages you can use to learn (and implement) machine learning techniques, for a few reasons: It's simple - Python is now becoming the language of choice among new programmers thanks to its simple syntax and huge community. It's powerful - just because something is simple doesn't mean it isn't capable. Python is also one of the most popular languages among data scientists and web programmers, and its community has created libraries to do just about anything you want, including machine learning. Lots of ML libraries - there are tons of machine learning libraries already written for Python, so you can choose from hundreds of libraries based on your use-case, skill, and need for customization.
  • 43. The Best Machine Learning Libraries in Python - contd.. TensorFlow - an open source library from Google for numerical computation using data flow graphs, widely used to build and train neural networks. scikit-learn - definitely one of the most popular ML libraries, if not the most popular, across all languages. It has a huge number of features for data mining and data analysis, making it a top choice for researchers and developers alike. Theano - a machine learning library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays, which can be a point of frustration for some developers in other libraries.
  • 44. The Best Machine Learning Libraries in Python - contd.. Pylearn2 - most of Pylearn2's functionality is actually built on top of Theano, so it has a pretty solid base. Pyevolve - provides a great framework to build and execute genetic algorithms and neural networks. Pattern - this is more of a 'full suite' library, as it provides not only some ML algorithms but also tools to help you collect and analyze data. Its data mining portion helps you collect data from web services like Google, Twitter, and Wikipedia. The nice thing about including these tools is how easy they make it to both collect and train on data in the same program.
  • 45. Machine Learning & Big Data Analytics - The perfect marriage. TWO orthogonal aspects!! Big Data - handling massive data volumes! Analytics / Machine Learning - learning insights from data! The two can be combined to give accurate, effective analysis!!!
  • 46. Books I recommend for Machine Learning
  • 47. Books I recommend for Big Data, Machine Learning
  • 48. Thank you for not yawning! Questions?