BIG DATA AND MACHINE LEARNING
Big Data & IoT
Lecture #3
Umair Shafique (03246441789)
Scholar MS Information Technology - University of Gujrat
Table of contents
• Define big data
• Big data as 10 V’s
• Some pros and cons of big data
• Perceived challenges of big data
• Define machine learning
• Real-world examples
• Working flow of ML
• Types of ML
• Challenges of ML
• Relate big data with ML
• Features of ML with big data
• Framework based on ML for big data processing
• Tools and technologies for big data and ML
• Difference between ML and big data
• Research challenges and open issues
• Summary
• References
What is Big Data?
Big Data is a collection of data that is huge in volume and keeps growing exponentially with time. Its size and complexity are so large that traditional data management tools cannot store or process it efficiently. In short, big data is still data, just at a far larger scale.
Who’s Generating Big Data?
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely and scalable manner.
Big data as 10 V’s:
Some Pros of Big Data:
Better decision-making
Increased productivity
Reduced costs
Improved customer service
Fraud detection
Greater innovation
Cons of Big Data:
Need for talent
Data quality
Need for cultural change
Rapid change
Hardware needs
Costs
Perceived Challenges of Big Data
What is Machine Learning?
Machine learning is an application of AI that gives systems the ability to learn and improve from experience on their own, without being explicitly programmed. If your computer had machine learning, it might be able to play difficult parts of a game or solve a complicated mathematical equation for you.
Real-world examples of machine learning
Machine learning is relevant in many fields and industries, and its capabilities keep growing over time. Here are five real-life examples of how machine learning is being used.
1. Image recognition
Image recognition is a well-known and widespread example of machine learning in the real world. It can identify objects in a digital image based on the intensity of the pixels in black-and-white or colour images.
For example:
• Label an x-ray as cancerous or not
• Assign a name to a photographed face (aka “tagging” on social media)
• Recognise handwriting by segmenting a single letter into smaller images
• Machine learning is also frequently used for facial recognition within an image. Using a database of
people, the system can identify commonalities and match them to faces. This is often used in law
enforcement.
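To make pixel-intensity recognition concrete, here is a minimal sketch, not a production image recogniser. It assumes each image is a hypothetical 3×3 grayscale grid flattened into nine intensities between 0 and 1, and labels a new image by a majority vote among its k nearest training images (k-nearest neighbours, one of the algorithms listed later in this lecture). The images, labels and the `classify` function are invented for illustration.

```python
from collections import Counter
import math

# Hypothetical training set: each image is 9 pixel intensities (0 = black, 1 = white).
TRAINING_IMAGES = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "vertical line"),
    ([0, 0.9, 0, 0, 1, 0, 0, 0.8, 0], "vertical line"),
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], "horizontal line"),
    ([0, 0, 0, 0.9, 1, 0.8, 0, 0, 0], "horizontal line"),
]

def distance(a, b):
    """Euclidean distance between two images of equal size."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(image, k=3):
    """Label an image by majority vote among its k nearest training images."""
    neighbours = sorted(TRAINING_IMAGES, key=lambda item: distance(image, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# A slightly noisy vertical line is still recognised from its pixel intensities.
print(classify([0.1, 0.9, 0, 0, 0.9, 0.1, 0, 1, 0]))  # vertical line
```

Real systems work on thousands of pixels and usually learn features with neural networks, but the principle of comparing intensity patterns is the same.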
2. Speech recognition
Machine learning can translate speech into text. Certain software applications can convert live voice and recorded speech into a text file. The speech can also be segmented by its intensity in time-frequency bands.
• Voice search
• Voice dialling
• Appliance control
• Some of the most common uses of speech recognition software are devices like Google Home or Amazon Alexa.
3. Medical diagnosis
Machine learning can help with the diagnosis of diseases. Some physicians use chatbots with speech recognition capabilities to discern patterns in symptoms.
• Assisting in formulating a diagnosis or recommending a treatment option
• Oncology and pathology use machine learning to recognise cancerous tissue
• Analyse bodily fluids
• In the case of rare diseases, the joint use of facial recognition software and machine learning helps scan patient
photos and identify phenotypes that correlate with rare genetic diseases.
4. Predictive analytics
Machine learning can classify available data into groups, which are then defined by rules set by analysts. When the
classification is complete, the analysts can calculate the probability of a fault.
• Predicting whether a transaction is fraudulent or legitimate
• Improve prediction systems to calculate the possibility of fault
• Predictive analytics is one of the most promising examples of machine learning. It is applicable to everything from product development to real estate pricing.
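The fraud example can be sketched with logistic regression, which turns features into a probability between 0 and 1. This is a minimal sketch under stated assumptions: the two features (transaction amount in thousands and a foreign-transaction flag), the eight labeled transactions, and the learning rate and epoch count are all hypothetical, and the model is trained with plain batch gradient descent rather than a library.

```python
import math

# Hypothetical labeled data: ((amount_in_thousands, is_foreign), is_fraud)
TRANSACTIONS = [
    ((0.1, 0), 0), ((0.3, 0), 0), ((0.2, 1), 0), ((0.5, 0), 0),
    ((2.5, 1), 1), ((3.0, 1), 1), ((2.0, 0), 1), ((4.0, 1), 1),
]

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(data, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in data:
            prediction = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = prediction - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias

def fraud_probability(model, features):
    """Estimated probability that a transaction is fraudulent."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

model = train_logistic_regression(TRANSACTIONS)
print(f"large foreign transaction: {fraud_probability(model, (3.5, 1)):.2f}")
print(f"small domestic transaction: {fraud_probability(model, (0.2, 0)):.2f}")
```

An analyst would then pick a probability threshold (for example, flag anything above 0.5 for review), which corresponds to the rule-setting step described above.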
5. Extraction
Machine learning can extract structured information from unstructured data. Organizations amass huge volumes of
data from customers. A machine learning algorithm automates the process of annotating datasets for predictive
analytics tools.
• Generate a model to predict vocal cord disorders
• Develop methods to prevent, diagnose, and treat the disorders
• Help physicians diagnose and treat problems quickly
• Typically, these processes are tedious, but machine learning can track and extract information from billions of data samples.
How Does Machine Learning Work?
Consider a system whose input data contains photos of various kinds of fruit. You want the system to group the data according to the different types of fruit.
First, the system analyzes the input data. Next, it tries to find patterns such as shape, size, and color. Based on these patterns, the system tries to predict the different types of fruit and segregate them. Finally, it keeps track of all the decisions it made during the process so that it learns from them. The next time you ask the same system to predict and segregate the different types of fruit, it won't have to go through the entire process again. That’s how machine learning works.
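The fruit workflow above (find patterns in features, then keep what was learned so it need not be relearned) can be illustrated with a nearest-centroid classifier. This sketch assumes each photo has already been reduced to three hypothetical numeric features (size in cm, roundness and redness on a 0 to 1 scale) and that a few labeled examples exist; the learned centroids are serialized to JSON to mimic a stored model that is reused on the next request.

```python
import json
import math

# Hypothetical labeled examples: (size_cm, roundness 0-1, redness 0-1) -> fruit type
EXAMPLES = [
    ((8.0, 0.90, 0.80), "apple"),
    ((7.5, 0.95, 0.70), "apple"),
    ((18.0, 0.20, 0.10), "banana"),
    ((20.0, 0.15, 0.05), "banana"),
    ((2.0, 0.95, 0.60), "grape"),
    ((1.8, 0.90, 0.50), "grape"),
]

def learn_centroids(examples):
    """Average the feature vectors of each fruit type into one centroid."""
    sums, counts = {}, {}
    for features, fruit in examples:
        totals = sums.setdefault(fruit, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[fruit] = counts.get(fruit, 0) + 1
    return {fruit: [t / counts[fruit] for t in totals] for fruit, totals in sums.items()}

def predict(centroids, features):
    """Assign the fruit type whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda fruit: math.dist(features, centroids[fruit]))

# Learn once and store the result ...
centroids = learn_centroids(EXAMPLES)
saved_model = json.dumps(centroids)

# ... then reuse the stored model later without repeating the learning step.
restored = json.loads(saved_model)
print(predict(restored, (7.8, 0.9, 0.75)))  # apple
```

In practice the features would be extracted from the images automatically, and size would usually be rescaled so that it does not dominate the distance.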
Types of Machine Learning
• Supervised learning: You supervise the machine while training it to work on its own. This requires labeled training data.
• Unsupervised learning: There is training data, but it is not labeled.
• Reinforcement learning: The system learns on its own by trial and error, guided by rewards or penalties for its actions.
Supervised Learning
To understand how supervised learning works, consider the example below, where you have to train a model or system to recognize an apple.
• First, provide a data set that contains pictures of a kind of fruit, e.g., apples.
• Then, provide labels that tell the model these are pictures of apples. This completes the training phase.
• Next, provide a new set of data that contains only pictures of apples. At this point, the system can recognize what the fruit is and will remember it.
• That is how supervised learning works: you train the model to perform a specific operation on its own. This kind of model is often used to filter spam from email accounts.
Supervised learning includes:
Classification: Classification is a typical supervised learning task. The spam filter mentioned above is one such example: it is trained on many example emails, each labeled with its class (Spam, Not-Spam), and then classifies new emails automatically.
Used for:
• Spam filtering
• Sentiment analysis
• Recognition of handwritten characters and numbers
• Fraud detection
Popular algorithms: Naive Bayes, Decision Tree, Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Neural Networks
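A spam filter of the kind described above can be sketched with multinomial Naive Bayes, one of the algorithms in the list. The six training emails are invented, tokenisation is a simple lowercase whitespace split, and add-one (Laplace) smoothing is assumed so that unseen words do not force a probability to zero; real filters use far larger corpora and richer features.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(emails):
    """Count words per class and emails per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in emails:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary

def classify_email(model, text):
    """Return the class with the highest log posterior score."""
    word_counts, class_counts, vocabulary = model
    total_emails = sum(class_counts.values())
    scores = {}
    for label, n_emails in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_emails / total_emails)  # log prior P(class)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training emails.
training_emails = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "not-spam"),
    ("lunch with the project team", "not-spam"),
    ("please review the project report", "not-spam"),
]
model = train_naive_bayes(training_emails)
print(classify_email(model, "free prize money"))        # spam
print(classify_email(model, "project meeting report"))  # not-spam
```

Working with log probabilities rather than multiplying raw probabilities avoids numerical underflow on long emails.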
Regression: Regression is similar to classification, except that it predicts a number instead of a category. Examples are a car's price based on its mileage, traffic by time of day, or demand volume based on the company's growth. Regression is well suited to cases where a numeric value depends on another variable, such as time.
Used for:
• Stock price forecasts
• Demand and sales volume analysis
• Medical diagnosis
• Any number-time correlations
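The car-price example can be sketched with simple linear regression fitted by ordinary least squares. The mileage and price figures are hypothetical and chosen to lie exactly on a line (price = 20 - 0.1 × mileage, both in thousands) so the fitted result is easy to check.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: (mileage in thousand km, price in thousands)
cars = [(0, 20.0), (50, 15.0), (100, 10.0), (150, 5.0)]
slope, intercept = fit_line(cars)

# Predict the price of a car with 80 thousand km on the clock.
print(round(slope * 80 + intercept, 1))  # 12.0
```

With real, noisy data the points will not lie on the line, and the fit simply minimises the sum of squared vertical distances to it.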
Unsupervised Learning
Consider a cluttered dataset: a collection of pictures of different fruit. You feed this data to the model, and the model analyzes it to recognize patterns. In the end, the machine categorizes the photos into three types, as shown in the image, based on their similarities. Flipkart uses this kind of model to find and recommend products that are well suited to you.
It includes:
• Clustering: A clustering algorithm tries to find objects that are similar (by some features) and merge them into a cluster. Objects that share many features are grouped into the same class. With some algorithms, you can even specify the exact number of clusters you want.
Used for:
• Market segmentation (types of customers, loyalty)
• Image compression
• Analyzing and labeling new data
• Detecting abnormal behavior
Popular clustering algorithms are:
• K-Means
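K-Means can be sketched in a few lines: alternately assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until the assignments stop changing. The customer points (for example, rescaled annual spend and number of visits) are hypothetical, and the centroids are naively initialised from the first k points; practical implementations usually use smarter initialisation such as k-means++.

```python
import math

def k_means(points, k, max_iterations=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation: first k points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Hypothetical customers: (rescaled annual spend, rescaled number of visits)
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

Here k (the number of clusters) is chosen up front, which matches the remark above that some algorithms let you specify the exact number of clusters.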
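Reinforcement Learning
In reinforcement learning, the system learns by interacting with an environment: it tries actions, receives rewards or penalties, and gradually improves its strategy. Deep neural networks, which are also widely combined with reinforcement learning, are used today for:
• Replacing many of the algorithms above
• Object identification in photos and videos
• Speech recognition and synthesis
• Image processing, style transfer
• Machine translation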
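The trial-and-error idea behind reinforcement learning can be illustrated with tabular Q-learning, one common reinforcement learning algorithm (not necessarily the one the lecture has in mind). The environment is a hypothetical five-cell corridor: the agent starts in cell 0, can move left or right, and receives a reward of 1 only when it reaches cell 4. The hyperparameters (alpha, gamma, epsilon, episode count) are illustrative assumptions.

```python
import random

N_STATES = 5  # cells 0..4; cell 4 is the goal
ACTIONS = ["left", "right"]

def step(state, action):
    """Move one cell; reward 1 only on reaching the goal, which ends the episode."""
    next_state = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q(state, action) by epsilon-greedy trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])  # exploit, ties at random
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

No labeled answers are given: the agent discovers that moving right is best purely from the rewards it receives.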
Main Challenges of Machine Learning
• Poor-Quality Data
• Irrelevant Features
• Testing and Validating
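Testing and validating usually means holding out data that the model never saw during training and measuring performance on it. The sketch below assumes a hypothetical one-feature data set (label 1 when x is at least 10) and a deliberately simple threshold classifier; the `train_test_split` helper here is written from scratch and is not the scikit-learn function of the same name.

```python
import random

def train_test_split(data, test_fraction=0.25, seed=42):
    """Shuffle a copy of the data and hold out a fraction of it for testing."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Place the decision threshold midway between the two class means."""
    positives = [x for x, y in train if y == 1]
    negatives = [x for x, y in train if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2

def accuracy(threshold, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum((x >= threshold) == (y == 1) for x, y in examples)
    return correct / len(examples)

data = [(x, 1 if x >= 10 else 0) for x in range(20)]
train, test = train_test_split(data)
threshold = fit_threshold(train)
print(f"threshold={threshold:.2f} "
      f"train_acc={accuracy(threshold, train):.2f} test_acc={accuracy(threshold, test):.2f}")
```

A model that scores much better on the training set than on the held-out test set is a warning sign of overfitting.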
Big Data & Machine Learning (How Do They Relate?)
To recap, big data refers to vast amounts of data that traditional storage methods cannot handle. Machine learning is the ability of computer systems to learn to make predictions from observations and data. Machine learning can use the information provided by the study of big data to generate valuable business insights.
Machine learning tools use data-driven algorithms and statistical models to analyze data
sets and then draw inferences from identified patterns or make predictions based on them.
The algorithms learn from the data as they run against it, as opposed to traditional rules-
based analytics systems that follow explicit instructions.
Big data provides ample amounts of raw material from which machine learning systems
can derive insights. By combining them, organizations are producing significant analytics
findings and results.
Features of Machine Learning with Big Data
• Sparse Representation
• Mining Structured Relations
• High Scalability and High Speed
Reference Framework Based on Machine Learning for Big Data Processing
Big data processing procedure with machine learning:
We assume that the big data processing procedure mainly consists of the following four phases:
• pre-processing phase
• analysis phase
• model establishment phase
• model updating phase
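The four phases can be sketched end to end as below. The function and class names are hypothetical, the "model" is just a straight line fitted by least squares, and the model updating phase is implemented by keeping running sums (sufficient statistics), so a new batch can be folded in without reprocessing earlier data. This is one simple way to realise incremental updating, not necessarily the framework's own method.

```python
from dataclasses import dataclass

def preprocess(raw_records):
    """Pre-processing phase: drop incomplete records and cast values to float."""
    return [(float(r["x"]), float(r["y"])) for r in raw_records
            if r.get("x") is not None and r.get("y") is not None]

def analyze(records):
    """Analysis phase: simple descriptive statistics."""
    xs = [x for x, _ in records]
    return {"count": len(records), "mean_x": sum(xs) / len(xs)}

@dataclass
class LinearModelStats:
    """Running sums that are sufficient to fit y = slope * x + intercept."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, batch):
        """Fold a batch of (x, y) pairs into the running sums."""
        for x, y in batch:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self):
        """Least-squares slope and intercept from the current sums."""
        denominator = self.n * self.sum_xx - self.sum_x ** 2
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denominator
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept

# Pre-processing and analysis of the first batch (one record is incomplete).
first_batch = preprocess([{"x": 1, "y": 3}, {"x": 2, "y": 5}, {"x": None, "y": 7}, {"x": 3, "y": 7}])
print(analyze(first_batch))  # {'count': 3, 'mean_x': 2.0}

# Model establishment phase.
model = LinearModelStats()
model.update(first_batch)
print(model.coefficients())  # (2.0, 1.0)

# Model updating phase: a new batch arrives; old data is not revisited.
model.update(preprocess([{"x": 4, "y": 9}, {"x": 5, "y": 11}]))
print(model.coefficients())  # (2.0, 1.0)
```

Because the statistics are plain sums, statistics computed on separate partitions of the data could also be added together, which is one way such a model can be computed in parallel.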
Tools and technologies for big data and ML:
• Snowflake Data Science
• Matplotlib
• TensorFlow
• BigML
• Apache Spark
• KNIME
• Cloudera
Key differences between big data and ML:
Summary of lecture
• In this lecture, we first provided an overview of big data and summarized its characteristics.
• We then gave an overview of machine learning. To highlight how machine learning techniques differ in the context of big data, we analyzed the new features of machine learning with big data.
• Next, we related big data and machine learning.
• We also proposed a reference framework for processing big data based on machine learning techniques with the power of distributed storage and parallel computing. Finally, we presented several research challenges and open issues.
• We hope that this lecture stimulates more interest in research and development of machine learning techniques for big data processing.
References
• https://towardsdatascience.com/machine-learning-and-big-data-real-world-applications-3ba3a3345cf5
• https://www.salesforce.com/eu/blog/2020/06/real-world-examples-of-machine-learning.html
• https://www.google.com/amp/s/www.techtarget.com/searchbusinessanalytics/tip/Big-data-vs-machine-learning-How-they-differ-and-relate%3famp=1
• https://geekflare.com/big-data-tools-for-data-scientist/

More Related Content

What's hot

Predictive model based on Supervised ML
Predictive model based on Supervised MLPredictive model based on Supervised ML
Predictive model based on Supervised MLUmeshchandraYadav5
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsYONG ZHENG
 
Data Visualisation for Data Science
Data Visualisation for Data ScienceData Visualisation for Data Science
Data Visualisation for Data ScienceChristophe Bontemps
 
Machine Learning
Machine LearningMachine Learning
Machine LearningShrey Malik
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfDr. Radhey Shyam
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsMichał Łopuszyński
 
Data science and Artificial Intelligence
Data science and Artificial IntelligenceData science and Artificial Intelligence
Data science and Artificial IntelligenceSuman Srinivasan
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introductionBasma Gamal
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceMahir Haque
 
Python libraries for data science
Python libraries for data sciencePython libraries for data science
Python libraries for data sciencenilashri2
 
Introduction to basic data analytics tools
Introduction to basic data analytics toolsIntroduction to basic data analytics tools
Introduction to basic data analytics toolsNascenia IT
 
Introduction to Machine Learning
Introduction to Machine Learning   Introduction to Machine Learning
Introduction to Machine Learning snehal_152
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansStat Analytica
 

What's hot (20)

Predictive model based on Supervised ML
Predictive model based on Supervised MLPredictive model based on Supervised ML
Predictive model based on Supervised ML
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
Data Visualisation for Data Science
Data Visualisation for Data ScienceData Visualisation for Data Science
Data Visualisation for Data Science
 
Data mining
Data miningData mining
Data mining
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdf
 
Machine learning
Machine learningMachine learning
Machine learning
 
Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Data science and Artificial Intelligence
Data science and Artificial IntelligenceData science and Artificial Intelligence
Data science and Artificial Intelligence
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
 
Python libraries for data science
Python libraries for data sciencePython libraries for data science
Python libraries for data science
 
Introduction to basic data analytics tools
Introduction to basic data analytics toolsIntroduction to basic data analytics tools
Introduction to basic data analytics tools
 
Introduction to Machine Learning
Introduction to Machine Learning   Introduction to Machine Learning
Introduction to Machine Learning
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
The Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By StatisticiansThe Basics of Statistics for Data Science By Statisticians
The Basics of Statistics for Data Science By Statisticians
 

Similar to BIG DATA AND MACHINE LEARNING

Machine learning applications nurturing growth of various business domains
Machine learning applications nurturing growth of various business domainsMachine learning applications nurturing growth of various business domains
Machine learning applications nurturing growth of various business domainsShrutika Oswal
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxChitrachitrap
 
Machine Learning SPPU Unit 1
Machine Learning SPPU Unit 1Machine Learning SPPU Unit 1
Machine Learning SPPU Unit 1Amruta Aphale
 
what-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfwhat-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfTemok IT Services
 
machine learning.docx
machine learning.docxmachine learning.docx
machine learning.docxJadhavArjun2
 
INTERNSHIP ON MAcHINE LEARNING.pptx
INTERNSHIP ON MAcHINE LEARNING.pptxINTERNSHIP ON MAcHINE LEARNING.pptx
INTERNSHIP ON MAcHINE LEARNING.pptxsrikanthkallem1
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learningJohnson Ubah
 
Understanding The Pattern Of Recognition
Understanding The Pattern Of RecognitionUnderstanding The Pattern Of Recognition
Understanding The Pattern Of RecognitionRahul Bedi
 
UNIT III SUPERVISED LEARNING.pptx
UNIT III SUPERVISED LEARNING.pptxUNIT III SUPERVISED LEARNING.pptx
UNIT III SUPERVISED LEARNING.pptxKowsalyaG17
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxcloudserviceuit
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognitionMinigranth
 
Introduction To Machine Learning
Introduction To Machine LearningIntroduction To Machine Learning
Introduction To Machine LearningKnoldus Inc.
 
Lecture-1-Introduction to Deep learning.pptx
Lecture-1-Introduction to Deep learning.pptxLecture-1-Introduction to Deep learning.pptx
Lecture-1-Introduction to Deep learning.pptxJayChauhan100
 
unit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxunit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxDr.Shweta
 
Machine Learning Contents.pptx
Machine Learning Contents.pptxMachine Learning Contents.pptx
Machine Learning Contents.pptxNaveenkushwaha18
 

Similar to BIG DATA AND MACHINE LEARNING (20)

Machine learning applications nurturing growth of various business domains
Machine learning applications nurturing growth of various business domainsMachine learning applications nurturing growth of various business domains
Machine learning applications nurturing growth of various business domains
 
Machine learning in Banks
Machine learning in BanksMachine learning in Banks
Machine learning in Banks
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptx
 
Machine Learning SPPU Unit 1
Machine Learning SPPU Unit 1Machine Learning SPPU Unit 1
Machine Learning SPPU Unit 1
 
what-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfwhat-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdf
 
machine learning.docx
machine learning.docxmachine learning.docx
machine learning.docx
 
INTERNSHIP ON MAcHINE LEARNING.pptx
INTERNSHIP ON MAcHINE LEARNING.pptxINTERNSHIP ON MAcHINE LEARNING.pptx
INTERNSHIP ON MAcHINE LEARNING.pptx
 
Machine learning
Machine learningMachine learning
Machine learning
 
Lab 7.pptx
Lab 7.pptxLab 7.pptx
Lab 7.pptx
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
 
Understanding The Pattern Of Recognition
Understanding The Pattern Of RecognitionUnderstanding The Pattern Of Recognition
Understanding The Pattern Of Recognition
 
UNIT III SUPERVISED LEARNING.pptx
UNIT III SUPERVISED LEARNING.pptxUNIT III SUPERVISED LEARNING.pptx
UNIT III SUPERVISED LEARNING.pptx
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Introduction To Machine Learning
Introduction To Machine LearningIntroduction To Machine Learning
Introduction To Machine Learning
 
ML_Module_1.pdf
ML_Module_1.pdfML_Module_1.pdf
ML_Module_1.pdf
 
Lecture-1-Introduction to Deep learning.pptx
Lecture-1-Introduction to Deep learning.pptxLecture-1-Introduction to Deep learning.pptx
Lecture-1-Introduction to Deep learning.pptx
 
unit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxunit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptx
 
Machine Learning Contents.pptx
Machine Learning Contents.pptxMachine Learning Contents.pptx
Machine Learning Contents.pptx
 

More from Umair Shafique

Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data AnalysisUmair Shafique
 
Pre-Processing and Data Preparation
Pre-Processing and Data PreparationPre-Processing and Data Preparation
Pre-Processing and Data PreparationUmair Shafique
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With HadoopUmair Shafique
 
BIG DATA ANALYTICS USING R
BIG DATA ANALYTICS USING  RBIG DATA ANALYTICS USING  R
BIG DATA ANALYTICS USING RUmair Shafique
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataUmair Shafique
 
Handling and Processing Big Data
Handling and Processing Big DataHandling and Processing Big Data
Handling and Processing Big DataUmair Shafique
 

More from Umair Shafique (6)

Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
Pre-Processing and Data Preparation
Pre-Processing and Data PreparationPre-Processing and Data Preparation
Pre-Processing and Data Preparation
 
Big Data Analytics With Hadoop
Big Data Analytics With HadoopBig Data Analytics With Hadoop
Big Data Analytics With Hadoop
 
BIG DATA ANALYTICS USING R
BIG DATA ANALYTICS USING  RBIG DATA ANALYTICS USING  R
BIG DATA ANALYTICS USING R
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Handling and Processing Big Data
Handling and Processing Big DataHandling and Processing Big Data
Handling and Processing Big Data
 

Recently uploaded

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood

BIG DATA AND MACHINE LEARNING

  • 1. BIG DATA AND MACHINE LEARNING Big Data & IoT Lecture #3 Umair Shafique (03246441789) Scholar MS Information Technology - University of Gujrat
  • 2. Table of contents: Define big data; Big data as 10V’s; Some pros and cons of big data; Perceived challenges of big data; Define machine learning; Real-world examples; Working flow of ML; Types of ML; Challenges of ML; Relate big data with ML; Features of ML with big data; Framework based on ML for big data processing; Tools and technologies for big data and ML; Difference b/w ML and big data; Research challenges and open issues; Summary; References
  • 3. What is Big Data? Big data is a collection of data that is huge in volume and keeps growing exponentially over time. Its size and complexity are so great that traditional data management tools cannot store or process it efficiently. In short, big data is still data, just at a much larger scale.
  • 4. Who’s Generating Big Data? Progress and innovation are no longer limited by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely and scalable manner.
  • 6. Some Pros of Big Data: Better decision-making; Increased productivity; Reduced costs; Improved customer service; Fraud detection; Greater innovation
  • 7. Cons of Big Data: Need for talent; Data quality; Need for cultural change; Rapid change; Hardware needs; Costs
  • 9. What is Machine Learning? Machine learning is an application of AI that gives systems the ability to learn and improve from experience without being explicitly programmed. If your computer had machine learning, it might be able to play difficult parts of a game or solve a complicated mathematical equation for you.
  • 10. Real-world examples of machine learning: Machine learning is relevant in many fields and industries, and its capabilities keep growing over time. Here are five real-life examples of how machine learning is being used. 1. Image recognition: Image recognition is a well-known and widespread example of machine learning in the real world. It can identify an object in a digital image based on the intensity of the pixels in black-and-white or colour images. For example: • Label an X-ray as cancerous or not • Assign a name to a photographed face (aka “tagging” on social media) • Recognise handwriting by segmenting a single letter into smaller images • Machine learning is also frequently used for facial recognition within an image. Using a database of people, the system can identify commonalities and match them to faces. This is often used in law enforcement.
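As a rough illustration of recognising an object from pixel intensities, here is a minimal sketch that compares a new image with a few labelled example images and returns the label of the closest one (1-nearest-neighbour on raw grey-scale intensities). The tiny 3x3 images, their labels and the function names are hypothetical; real image recognition works on far larger images and usually relies on convolutional neural networks.

```python
import math

# Hypothetical 3x3 grey-scale images, flattened row by row.
# Intensities run from 0.0 (black) to 1.0 (white).
TRAINING_IMAGES = [
    ([0, 1, 0,
      0, 1, 0,
      0, 1, 0], "vertical line"),
    ([0, 0, 0,
      1, 1, 1,
      0, 0, 0], "horizontal line"),
]

def distance(a, b):
    """Euclidean distance between two equally sized pixel vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(pixels):
    """Return the label of the training image whose pixels are closest to `pixels`."""
    _, label = min(
        ((distance(pixels, image), label) for image, label in TRAINING_IMAGES),
        key=lambda pair: pair[0],
    )
    return label

# A slightly noisy vertical stroke is still closest to the vertical example.
print(classify([0.1, 0.9, 0.0,
                0.0, 0.8, 0.2,
                0.0, 1.0, 0.1]))  # vertical line
```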
  • 11. 2. Speech recognition: Machine learning can translate speech into text. Certain software applications can convert live voice and recorded speech into a text file. The speech can also be segmented by intensities on time-frequency bands. • Voice search • Voice dialling • Appliance control • Some of the most common uses of speech recognition software are devices like Google Home or Amazon Alexa. 3. Medical diagnosis: Machine learning can help with the diagnosis of diseases. Many physicians use chatbots with speech recognition capabilities to discern patterns in symptoms. • Assisting in formulating a diagnosis or recommending a treatment option • Oncology and pathology use machine learning to recognise cancerous tissue • Analysing bodily fluids • In the case of rare diseases, the joint use of facial recognition software and machine learning helps scan patient photos and identify phenotypes that correlate with rare genetic diseases.
  • 12. 4. Predictive analytics: Machine learning can classify available data into groups, which are then defined by rules set by analysts. When the classification is complete, the analysts can calculate the probability of a fault. • Predicting whether a transaction is fraudulent or legitimate • Improving prediction systems to calculate the probability of a fault • Predictive analytics is one of the most promising examples of machine learning. It is applicable to everything from product development to real estate pricing. 5. Extraction: Machine learning can extract structured information from unstructured data. Organizations amass huge volumes of data from customers, and a machine learning algorithm can automate the process of annotating these datasets for predictive analytics tools. • Generating a model to predict vocal cord disorders • Developing methods to prevent, diagnose, and treat the disorders • Helping physicians diagnose and treat problems quickly • Typically, these processes are tedious, but machine learning can track and extract information to obtain billions of data samples.
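The idea of classifying transactions and attaching a probability to each prediction can be sketched with logistic regression trained by batch gradient descent. This is a minimal sketch, not a production fraud model: the two features (amount in thousands and a 0/1 "foreign country" flag), the eight labelled transactions, the learning rate and the epoch count are all hypothetical.

```python
import math

# Hypothetical labelled transactions: ((amount in thousands, foreign flag), is_fraud)
DATA = [
    ((0.2, 0), 0), ((0.5, 0), 0), ((1.0, 0), 0), ((0.3, 1), 0),
    ((4.0, 1), 1), ((5.0, 1), 1), ((3.5, 0), 1), ((6.0, 1), 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in data:
            error = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        w = [w[i] - lr * grad_w[i] / n for i in range(2)]
        b -= lr * grad_b / n
    return w, b

def fraud_probability(x, w, b):
    """Estimated probability that transaction x is fraudulent."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

w, b = train(DATA)
print(round(fraud_probability((5.5, 1), w, b), 2))  # expected to be above 0.5
print(round(fraud_probability((0.4, 0), w, b), 2))  # expected to be below 0.5
```

The output is a probability rather than a hard label, so an analyst can choose the cut-off that fits the cost of missing a fraud versus flagging a legitimate purchase.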
  • 13. How Does Machine Learning Work? Consider a system whose input data contains photos of various kinds of fruit. You want the system to group the data according to the different types of fruit. First, the system analyzes the input data. Next, it tries to find patterns, such as shape, size, and color. Based on these patterns, the system tries to predict the different types of fruit and segregate them. Finally, it keeps track of the decisions it made during the process so that it learns from them. The next time you ask the same system to predict and segregate the different types of fruit, it won't have to go through the entire process again. That’s how machine learning works.
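The fruit example can be sketched, under simplifying assumptions, as a nearest-centroid classifier: training summarises each fruit type once by the average of its features, and later predictions reuse those stored averages instead of re-analysing the whole training set. The two features (diameter in cm and a 0 to 1 redness score), the example values and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical labelled examples: ((diameter_cm, redness 0..1), fruit type)
TRAINING = [
    ((7.0, 0.9), "apple"), ((7.5, 0.8), "apple"),
    ((3.0, 0.1), "grape"), ((2.5, 0.2), "grape"),
    ((20.0, 0.3), "melon"), ((22.0, 0.2), "melon"),
]

def fit_centroids(examples):
    """Learn one average feature vector (centroid) per fruit type."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (size, red), label in examples:
        sums[label][0] += size
        sums[label][1] += red
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}

def predict(centroids, features):
    """Assign the fruit type whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((f - c) ** 2
                                     for f, c in zip(features, centroids[label])))

centroids = fit_centroids(TRAINING)     # the "learned" knowledge, computed once
print(predict(centroids, (6.8, 0.85)))  # apple
print(predict(centroids, (2.8, 0.15)))  # grape
```

In practice features are usually rescaled first so that a large-valued feature such as size does not dominate the distance.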
  • 14. Types of Machine Learning • Supervised learning: You supervise the machine while training it to work on its own. This requires labeled training data. • Unsupervised learning: There is training data, but it is not labeled. • Reinforcement learning: The system learns on its own by trial and error, guided by rewards or penalties for its actions.
  • 15. Supervised Learning: To understand how supervised learning works, consider the example below, where you have to train a model to recognize an apple. • First, provide a data set that contains pictures of a kind of fruit, e.g., apples. • Then, provide labels that tell the model these are pictures of apples. This completes the training phase. • Next, provide a new set of data that contains only pictures of apples. At this point, the system can recognize what the fruit is, because it remembers what it learned. • That's how supervised learning works: you train the model to perform a specific operation on its own. This kind of model is often used to filter spam from your email accounts.
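The spam-filtering use of supervised learning can be sketched with a multinomial Naive Bayes classifier, one of the algorithms the deck lists for classification. This is a minimal sketch under stated assumptions: the four training emails, the whitespace tokeniser and add-one (Laplace) smoothing are choices made for illustration only.

```python
import math
from collections import Counter

# Hypothetical labelled training emails
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team monday", "ham"),
]

def train(examples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        words = text.split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing the probability
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("claim your free prize", model))   # spam
print(classify("team meeting on monday", model))  # ham
```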
  • 16. Supervised learning includes: Classification: Classification is a typical supervised learning task. The spam filter mentioned above is one example: it is trained on many example emails, each labeled with its class (Spam, Not-Spam), and then classifies new emails automatically. Used for: • Spam filtering • Sentiment analysis • Recognition of handwritten characters and numbers • Fraud detection. Popular algorithms: Naive Bayes, Decision Tree, Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Neural Networks. Regression: Regression is essentially classification in which we forecast a number instead of a category. Examples are predicting a car's price from its mileage, traffic by time of day, or demand volume from the growth of the company. Linear Regression is a popular algorithm for this. Regression is a natural fit when a quantity depends on time. Used for: • Stock price forecasts • Demand and sales volume analysis • Medical diagnosis • Any number-time correlations
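The car-price-by-mileage example can be sketched with ordinary least squares for a single feature, using the closed-form slope and intercept. The mileage and price figures below are invented for illustration and lie exactly on a line, so the fit is easy to check by hand.

```python
# Hypothetical data: mileage in thousands of km, price in thousands
mileage = [10, 40, 70, 100, 130]
price = [20.0, 17.0, 14.0, 11.0, 8.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(mileage, price)
print(slope, intercept)        # slope -0.1, intercept 21.0
print(slope * 80 + intercept)  # predicted price at 80k km: 13.0
```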
  • 17. Unsupervised Learning: Consider a cluttered dataset: a collection of pictures of different fruit. You feed this data to the model, and the model analyzes it to recognize any patterns. In the end, the machine groups the photos into three types based on their similarities, as shown in the image. Flipkart uses this kind of model to find and recommend products that are well suited to you. It includes: • Clustering: A clustering algorithm tries to find objects that are similar (by some features) and merge them into a cluster; objects that share many features are joined in one class. With some algorithms, you can even specify the exact number of clusters you want. Used: • For market segmentation (types of customers, loyalty) • For image compression • To analyze and label new data • To detect abnormal behavior. Popular clustering algorithms are: • K-Means
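K-Means, the clustering algorithm named on this slide, can be sketched as a plain loop: assign each point to its nearest centre, move each centre to the mean of its points, and repeat until the assignments stop changing. The 2-D points (standing in for photo features), k = 3 and the naive choice of the first k points as starting centres are assumptions; library implementations typically use smarter initialisation such as k-means++.

```python
# Hypothetical 2-D feature vectors for nine photos (three natural groups)
POINTS = [(1.0, 1.0), (8.0, 8.0), (1.0, 9.0),
          (1.5, 1.2), (8.3, 7.9), (0.8, 9.2),
          (1.2, 0.8), (7.8, 8.4), (1.1, 8.7)]

def kmeans(points, k, max_iter=100):
    """Cluster 2-D points into k groups; returns (centres, labels)."""
    centres = [tuple(p) for p in points[:k]]  # naive initialisation
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centre by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centres[c][0]) ** 2
                                        + (p[1] - centres[c][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centres, labels

centres, labels = kmeans(POINTS, 3)
print(labels)  # [0, 1, 2, 0, 1, 2, 0, 1, 2]
```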
  • 18. Reinforcement Learning Used today for: • Replacing many of the algorithms above • Object identification in photos and videos • Speech recognition and synthesis • Image processing, style transfer • Machine translation
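Reinforcement learning can be sketched with tabular Q-learning on a tiny made-up corridor: the agent starts in cell 0, can step left or right, and receives a reward of 1 only when it reaches the last cell. Through trial and error it learns that stepping right is best. The environment, the reward and the hyperparameters (ALPHA, GAMMA, EPSILON) are all hypothetical choices for this sketch.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)   # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Hypothetical environment: move (clamped to the corridor); reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, rng):
    """Epsilon-greedy: explore at random sometimes, otherwise pick the best known action."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])  # random tie-break

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move Q(s, a) towards reward + discounted best future value
            q[(state, action)] += ALPHA * (reward + GAMMA * future - q[(state, action)])
            state = next_state
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # expected [1, 1, 1, 1]: always step right
```

No labelled answers are given to the agent; the preference for stepping right emerges only from the delayed reward, which is what distinguishes this from the supervised examples.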
  • 19. Main Challenges of Machine Learning • Poor-Quality Data • Irrelevant Features • Testing and Validating
  • 20. Big Data & Machine Learning (How Do They Relate?) To recap, big data refers to vast amounts of data that traditional storage methods cannot handle. Machine learning is the ability of computer systems to learn to make predictions from observations and data. Machine learning can use the information provided by the study of big data to generate valuable business insights. Machine learning tools use data-driven algorithms and statistical models to analyze data sets and then draw inferences from identified patterns or make predictions based on them. The algorithms learn from the data as they run against it, as opposed to traditional rules-based analytics systems that follow explicit instructions. Big data provides ample raw material from which machine learning systems can derive insights. By combining the two, organizations are producing significant analytics findings and results.
  • 21. Features of Machine Learning with Big Data • Sparse Representation • Mining Structured Relations • High Scalability and High Speed
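One simple reading of "sparse representation" can be made concrete: when most entries of a high-dimensional feature vector are zero, storing only the non-zero entries saves memory and lets operations such as a dot product skip the zeros. The dictionary-of-index-to-value format and the example vectors are assumptions for illustration; libraries such as SciPy provide dedicated sparse matrix formats for real workloads.

```python
def to_sparse(dense):
    """Keep only non-zero entries as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}

def sparse_dot(a, b):
    """Dot product of two sparse vectors, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

# A hypothetical 10,000-dimensional bag-of-words vector with three non-zero counts
dense_doc = [0] * 10_000
dense_doc[3], dense_doc[42], dense_doc[9_999] = 2, 1, 5

# Hypothetical model weights: non-zero only where index % 1000 == 3
weights = to_sparse([0.5 if i % 1000 == 3 else 0 for i in range(10_000)])

doc = to_sparse(dense_doc)
print(len(doc))                  # 3 stored entries instead of 10,000
print(sparse_dot(doc, weights))  # 1.0 (only index 3 overlaps: 2 * 0.5)
```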
  • 22. Reference Framework Based on Machine Learning for Big Data Processing
  • 23. Big data processing procedure with machine learning: We assume the big data processing procedure mainly consists of the following four phases: • Pre-processing phase • Analysis phase • Model establishment phase • Model updating phase
  • 24. Tools and technologies for big data and ML: Snowflake Data Science Matplotlib TensorFlow BigML Apache Spark KNIME Cloudera
  • 25. Key difference b/w Big data and ML:
  • 27. Summary of lecture • In this lecture, we first provided an overview of big data and summarized its characteristics. • We then gave an overview of machine learning. To highlight how machine learning techniques differ in the context of big data, we analyzed the new features of machine learning with big data. • Next, we related big data and machine learning. • We also proposed a reference framework for processing big data based on machine learning techniques, using the power of distributed storage and parallel computing. Finally, we presented several research challenges and open issues. • We hope that this lecture can stimulate more interest in research and development of machine-learning-based techniques for big data processing.
  • 28. References • https://towardsdatascience.com/machine-learning-and-big-data-real-world-applications-3ba3a3345cf5 • https://www.salesforce.com/eu/blog/2020/06/real-world-examples-of-machine-learning.html • https://www.google.com/amp/s/www.techtarget.com/searchbusinessanalytics/tip/Big-data-vs-machine-learning-How-they-differ-and-relate%3famp=1 • https://geekflare.com/big-data-tools-for-data-scientist/

Editor's Notes

  1. Better decision-making: In the NewVantage Partners survey, 36.2 percent of respondents said that better decision-making was the number one goal of their big data analytics efforts. In addition, 84.1 percent had started working toward that goal, and 59.0 percent had experienced some measurable success, for an overall success rate of 69.0 percent. Analytics can give business decision-makers the data-driven insights they need to help their companies compete and grow.
Increased productivity: A separate survey from vendor Syncsort found that 59.9 percent of respondents were using big data tools like Hadoop and Spark to increase business user productivity. Modern big data tools allow analysts to analyze more data more quickly, which increases their personal productivity. In addition, the insights gained from those analytics often allow organizations to increase productivity more broadly throughout the company.
Reduced costs: Both the Syncsort and the NewVantage surveys found that big data analytics were helping companies decrease their expenses. Nearly six out of ten (59.4 percent) respondents told Syncsort that big data tools had helped them increase operational efficiency and reduce costs, and about two thirds (66.7 percent) of respondents to the NewVantage survey said they had started using big data to decrease expenses. Interestingly, however, only 13.0 percent of respondents selected cost reduction as their primary goal for big data analytics, suggesting that for many this is merely a very welcome side benefit.
Improved customer service: Among respondents to the NewVantage survey, improving customer service was the second most common primary goal for big data analytics projects, and 53.4 percent of companies had experienced some success in this regard. Social media, customer relationship management (CRM) systems and other points of customer contact give today’s enterprises a wealth of information about their customers, and it is only natural that they would use this data to serve those customers better.
Fraud detection: Another common use for big data analytics, particularly in the financial services industry, is fraud detection. One of the big advantages of big data analytics systems that rely on machine learning is that they are excellent at detecting patterns and anomalies. These abilities let banks and credit card companies spot stolen credit cards or fraudulent purchases, often before the cardholder even knows that something is wrong.
Greater innovation: Innovation is another common benefit of big data, and the NewVantage survey found that 11.6 percent of executives are investing in analytics primarily as a means to innovate and disrupt their markets. They reason that if they can glean insights that their competitors don’t have, they may be able to get ahead of the rest of the market with new products and services.
  2. Need for talent: Data scientists and big data experts are among the most highly coveted (and highly paid) workers in the IT field. The AtScale survey found that the lack of a big data skill set has been the number one big data challenge for the past three years. And in the Syncsort survey, respondents ranked skills and staff as the second biggest challenge when creating a data lake. Hiring or training staff can increase costs considerably, and acquiring big data skills can take considerable time.
Data quality: In the Syncsort survey, the number one disadvantage of working with big data was the need to address data quality issues. Before they can use big data for analytics efforts, data scientists and analysts need to ensure that the information they are using is accurate, relevant and in the proper format for analysis. That slows the reporting process considerably, but if enterprises don’t address data quality issues, they may find that the insights generated by their analytics are worthless, or even harmful if acted upon.
Need for cultural change: Many of the organizations that are utilizing big data analytics don’t just want to get a little bit better at reporting; they want to use analytics to create a data-driven culture throughout the company. In fact, in the NewVantage survey, a full 98.6 percent of executives said that their firms were in the process of creating this new type of corporate culture. However, changing culture is a tall order. So far, only 32.4 percent were reporting success on this front.
Rapid change: Another potential drawback of big data analytics is that the technology is changing rapidly. Organizations face the very real possibility that they will invest in a particular technology only to have something much better come along a few months later. Syncsort respondents ranked this disadvantage of big data fourth among all the potential challenges they faced.
Hardware needs: Another significant issue for organizations is the IT infrastructure necessary to support big data analytics initiatives. Storage space to house the data, networking bandwidth to transfer it to and from analytics systems, and compute resources to perform those analytics are all expensive to purchase and maintain. Some organizations can offset this problem by using cloud-based analytics, but that usually doesn’t eliminate the infrastructure problems entirely.
Costs: Many of today’s big data tools rely on open source technology, which dramatically reduces software costs, but enterprises still face significant expenses related to staffing, hardware, maintenance and related services. It’s not uncommon for big data analytics initiatives to run significantly over budget and to take more time to deploy than IT managers had originally anticipated.
  3. Main Challenges of Machine Learning: In short, since our main task is to select a learning algorithm and train it on some data, the two things that can go wrong are “bad algorithm” and “bad data.” Machine learning is not quite there yet; most machine learning algorithms need a lot of data to work properly.
Poor-Quality Data: Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will be harder for the system to detect the underlying patterns, so your system is less likely to perform well.
Irrelevant Features: Your system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones.
Testing and Validating: The only way to know how well a model will generalize to new cases is to try it out on new cases. The recommended option is to split your data into two sets: the training set and the test set. As these names imply, you train the model using the training set and test it using the test set. The error rate on new cases is called the generalization error, and by evaluating your model on the test set, you get an estimate of this error. This value tells you how well your model will perform on instances it has never seen before.
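The train/test procedure in this note can be sketched as follows: shuffle the labelled data, hold out a fraction as the test set, fit on the rest, and report the error rate on the held-out part as an estimate of the generalization error. The 20% test fraction, the fixed random seed, the toy data and the single-threshold "model" are assumptions chosen to keep the sketch short.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data and split it into training and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def fit_threshold(train):
    """Toy model: predict 1 when x exceeds the midpoint of the two class means."""
    ones = [x for x, y in train if y == 1]
    zeros = [x for x, y in train if y == 0]
    return (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2

def error_rate(threshold, test):
    """Fraction of test examples the model gets wrong."""
    wrong = sum(1 for x, y in test if (1 if x > threshold else 0) != y)
    return wrong / len(test)

# Hypothetical data: x in 0..99, label 1 when x >= 50
data = [(x, 1 if x >= 50 else 0) for x in range(100)]
train, test = train_test_split(data)
threshold = fit_threshold(train)
print(len(train), len(test))        # 80 20
print(error_rate(threshold, test))  # estimate of the generalization error
```

The test examples never influence `fit_threshold`, which is what makes the reported error an honest estimate for unseen cases.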
  4. In this section, we highlight in detail three abilities of machine learning techniques that are useful for dealing with big data problems: sparse representation and feature selection, mining structured relations, and high scalability and high speed.
Sparse Representation: High-dimensional data is difficult to handle with traditional data processing methods. Therefore, effective dimension reduction is increasingly viewed as a necessary step in dealing with such problems. For high-dimensional big data, we highlight feature selection and sparse representation, two commonly adopted machine learning approaches to high-dimensional data. Feature selection is a key step in building robust data processing models: it selects a subset of meaningful features. It can help visualize the data, construct better statistical models, and improve prediction accuracy by mapping the high-dimensional data onto the underlying low-dimensional manifold. And for high-dimensional big data, a sparse data representation is increasingly important for many algorithms.
Mining Structured Relations: Big data generally comes from different sources with heterogeneous types, including structured, unstructured and semi-structured forms. Dealing with such a heterogeneous dataset is a clear challenge, so a machine learning system needs to infer the structure behind the data when it is not known beforehand. One way of structuring data is to discover relevance based on inherent data properties through structured learning and structured prediction. The main purpose of mining structured relations from a set of data is to aggregate massive amounts of data and divide it into smaller chunks that can be easily handled by machine learning systems.
High Scalability and High Speed: The unprecedented volumes of big data require very high scalability from data mining and processing tools. In current research, the techniques used to improve the scalability of machine learning algorithms focus mainly on two aspects: i) the scalability of cloud computing makes it possible to analyze enormous datasets by aggregating multiple workloads with varying performance goals into multi-tenant computing clusters, so machine learning with cloud computing is more efficient and performs better when processing and analyzing big data; ii) distributed storage and parallel computing have helped solve the scalability problems of machine learning algorithms. A useful approach to boosting the speed of big data processing is to identify and exploit as much of the potential parallelism in machine learning algorithms as possible. High scalability and high speed give machine learning the power to handle big data.
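Exploiting parallelism by dividing data into chunks can be sketched for a common case: the gradient of a squared-error loss is a sum over examples, so each chunk's partial gradient can be computed independently and the partial results added up, map-reduce style. The chunk count, the toy data and the use of Python threads are assumptions; threads show the structure, while real speed-ups for CPU-bound work come from separate processes or cluster frameworks such as Apache Spark.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_gradient(chunk, w):
    """Map step: gradient of sum((w*x - y)**2) over one chunk, for the model y ≈ w*x."""
    return sum(2 * (w * x - y) * x for x, y in chunk)

def parallel_gradient(data, w, n_chunks=4):
    """Split the data into chunks, compute partial gradients concurrently, then add them (reduce step)."""
    size = -(-len(data) // n_chunks)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(partial_gradient, chunks, [w] * len(chunks)))

# Hypothetical data generated from y = 3x
data = [(float(x), 3.0 * x) for x in range(1, 101)]

w = 0.0
for _ in range(50):
    w -= 1e-6 * parallel_gradient(data, w)  # one gradient descent step
print(round(w, 3))  # converges to 3.0
```

Because the per-chunk sums are simply added, the parallel result matches a single sequential pass over all the data.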
  5. Pre-processing phase: Because data sources cover almost every kind of domain, raw big data collected from the environment is highly complex and contains tremendous redundancy. Therefore, in the pre-processing phase we first need to delete invalid and dirty data. In addition, we frequently face massive amounts of uncertain and incomplete data in real life, and we need to append some important attributes to make the data more practical to process.
Analysis phase: After the pre-processing phase, we analyze the valid and useful data to find out, through trial and error, how to utilize it. Data visualization is a fundamental problem in the analysis of big data, and we can adopt sparse representation to achieve effective dimension reduction for high-dimensional data.
Model establishment phase: Through analysis of the essential parameters, we should be able to select some important features to establish a feasible model for dealing with real problems. In this phase, we first try to mine the structured relations between data to obtain statistical information and trends, and then split the data into training and testing sets.
Model updating phase: Finally, we decide what kind of model should be generated for use and build the corresponding model. Once the model is established, we configure its parameters and apply the model obtained from the model establishment phase to actual operations to test the performance of the big data processing model. In this phase, we emphasize that the input data is real-time, and we should dynamically adjust and update the model based on the effects of applying it.
Of the four phases in the big data processing procedure, the first three are offline processing. In these phases, we can adopt offline learning methods, which fall into two categories: supervised learning and unsupervised learning. In the model testing and updating phase, we mainly focus on the real-time nature of the input data. To deal with real-time processing, online learning methods are necessary, and reinforcement learning is preferred.
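The four phases can be sketched end to end under heavy simplifying assumptions: pre-processing drops invalid records, analysis computes simple statistics, model establishment fits an offline linear model on a training split and scores it on a test split, and model updating adjusts the model online, one real-time record at a time. The record format, the linear model, the learning rate and all function names are hypothetical, and the online step uses plain stochastic gradient descent rather than the reinforcement learning the note prefers.

```python
def preprocess(records):
    """Phase 1: delete invalid or incomplete records (anything that is not a pair of numbers)."""
    return [(x, y) for x, y in records
            if isinstance(x, (int, float)) and isinstance(y, (int, float))]

def analyse(records):
    """Phase 2: simple summary statistics to understand the valid data."""
    xs = [x for x, _ in records]
    return {"count": len(records), "min_x": min(xs), "max_x": max(xs)}

def establish_model(records, test_fraction=0.25):
    """Phase 3 (offline): split the data, fit y = a*x + b by least squares, score on the test split."""
    n_test = int(len(records) * test_fraction)  # no shuffling, to keep the sketch short
    test, train = records[:n_test], records[n_test:]
    mx = sum(x for x, _ in train) / len(train)
    my = sum(y for _, y in train) / len(train)
    a = (sum((x - mx) * (y - my) for x, y in train)
         / sum((x - mx) ** 2 for x, _ in train))
    b = my - a * mx
    test_mse = sum((a * x + b - y) ** 2 for x, y in test) / max(len(test), 1)
    return [a, b], test_mse

def update_model(model, x, y, lr=0.01):
    """Phase 4 (online): one stochastic gradient step on a newly arrived real-time record."""
    error = model[0] * x + model[1] - y
    model[0] -= lr * error * x
    model[1] -= lr * error
    return model

# Hypothetical raw data containing a missing value and a malformed record
raw = [(1, 2.0), (2, None), (3, 6.0), ("bad", 1.0), (4, 8.0), (5, 10.0), (6, 12.0)]
clean = preprocess(raw)
stats = analyse(clean)
model, test_mse = establish_model(clean)
for x, y in [(7, 14.5), (8, 16.5)]:  # hypothetical real-time records
    model = update_model(model, x, y)
print(stats, test_mse, model)
```

The first three functions run once on stored data (offline), while `update_model` is called per incoming record, mirroring the offline/online split described in the note.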