SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Downloaden Sie, um offline zu lesen
BIG DATA ANALYTICS FOR
REAL TIME SYSTEMS
Kamalika Dutta Manasi Jayapal
Overview
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
2 Big Data Analytics for Real Time Systems
Overview
3 Big Data Analytics for Real Time Systems
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
Where does Big Data come from?
4 Big Data Analytics for Real Time Systems
Courtesy: http://goo.gl/JWswfj
What makes it Big Data?
5 Big Data Analytics for Real Time Systems
Courtesy: Oracle
VARIABILITY
Evolution of Big Data
6 Big Data Analytics for Real Time Systems
1960s 1967
Automatic Data
Compression
1997
Information Explosion
Our Literature Survey!
Overview
7 Big Data Analytics for Real Time Systems
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
Big Data Analytics
“Big data analytics is the process of examining large data sets to
uncover hidden patterns, unknown correlations, market trends,
customer preferences and other useful business information.“
8 Big Data Analytics for Real Time Systems
 Predictive Analysis
 Text Analysis
 Data Mining
 Statistical Analysis
Courtesy: smartdatacollective.com
Sample Systems
9 Big Data Analytics for Real Time Systems
Analytics & 3 V‘s
10 Big Data Analytics for Real Time Systems
Courtesy: watalon.com
Overview
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
11 Big Data Analytics for Real Time Systems
Real Time Systems
“A real-time system is one that processes information and produces a
response within a specified time, else risk severe consequences,
sometimes including failure.“
12 Big Data Analytics for Real Time Systems
 Telecommunication
Systems
 Anti-Lock Brakes in a Car
 Air Traffic Control System
 Weather Forecasting
System
Courtesy: yourdon.com
Real-Time Analytics of Big Data
13 Big Data Analytics for Real Time Systems
What is Happening?
Kilobytes/
Sec
Megabytes/
Sec
Gigabytes 
Terabytes
Petabytes 
Exabytes
Seconds Milliseconds Minutes
Minutes 
Hours
Big Data
Real Time
Courtesy: infochimps.com
Overview
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
14 Big Data Analytics for Real Time Systems
Challenges of Real Time Analytics
15 Big Data Analytics for Real Time Systems
Expensive
Complex Architecture, Batch Processing
Semi and Unstructured Data: New Sources are unpredictable; Relational
databases are not capable, leaving us hamstrung
Market too Dynamic to Predict: Subscribers preferences change; competition
adds acceleration to it
Scalability: Requires sub-second response times; more than a single server can
handle
Thinking Beyond Hadoop!
16 Big Data Analytics for Real Time Systems
Manage & store huge
volume of any data
Hadoop File System
MapReduce
Manage streaming data Stream Computing
Analyze unstructured data Text Analytics Engine
Data WarehousingStructure and control data
Integrate and govern all
data sources
Integration, Data Quality, Security,
Lifecycle Management, MDM
Understand and navigate
federated big data sources
Federated Discovery and Navigation
Courtesy: IBM
Our Solution
 Do the impossible: Incorporate any kind
of data
 Scale Big: Scale without any complexity
 Not Time Consuming: Seconds to
Minutes
 Real Time: Try to analyze data without
expensive data warehouse loads
17 Big Data Analytics for Real Time Systems
Powerful Analytics, In Place, In Real Time.
Courtesy: slideshare.com
Overview
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
18 Big Data Analytics for Real Time Systems
In-Memory Computing
In-memory computing primarily relies on keeping data in a server's RAM as a
means of processing at faster speeds. It uses a type of middleware software that
allows one to store data in RAM, across a cluster of computers, and process it in
parallel.
19 Big Data Analytics for Real Time Systems
Courtesy: Stratecast
Stream Processing
20 Big Data Analytics for Real Time Systems
Courtesy: EMC
 Stream-processing systems operate on continuous data streams e.g., click
streams on web pages, user request/query streams, monitoring events,
notifications, etc.
 Stream processing delivers real-time analytic processing on constantly changing
data in motion.
 Analyse first store later!
Complex Event Processing
Complex Event Processing (CEP) processes multiple event streams generated
within the enterprise to construct data abstraction and identify meaningful
patterns among those streams.
21 Big Data Analytics for Real Time Systems
 Analytics across both real-time and historical data.
 Real-time event capture, filtering, pattern detection, matching, and
aggregation.
Overview
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
22 Big Data Analytics for Real Time Systems
Tools for Real Time Analytics
Big Data is NOT new, the Tools ARE!
23 Big Data Analytics for Real Time Systems
IBM InfoSphere Streams
Kafka
 A high performance distributed publish-subscribe messaging system.
 Designed for processing of real time activity stream data.
 Initially developed at LinkedIn, now part of Apache.
 Kafka works in combination with Apache Storm, Apache HBase and Apache
Spark for real-time analysis and rendering of streaming data.
24 Big Data Analytics for Real Time Systems
 Fast
 Scalable
 Durable
 Fault-tolerant
Storm
 A highly distributed real-time computation system.
 Acquired by Twitter.
 Twitter claims, “Over a million tuples processed per second per node.”
 Fast, Scalable, Reliable and Fault-tolerant.
25 Big Data Analytics for Real Time Systems
 Stream: Unbounded
sequence of tuples
 Primitives
 Spouts: Pull messages
 Bolts: Perform core
functions of stream
computing
Stream
Spark Streaming
 Was developed in the AMPLab at
UC Berkeley.
 In-memory computing
capabilities deliver speed.
 Low latency
 High throughput
 Fault tolerant
 New programing model:
 Discretized streams (Dstreams)
 Resilient Distributed Datasets
26 Big Data Analytics for Real Time Systems
Spark Streaming uses micro-batching to support continuous stream processing. It is
an extension of Spark which is a batch-processing system.
Courtesy: Apache Spark
Spring XD (XD=eXtreme Data)
 Spring XD is a unified, distributed, and extensible system for data ingestion, real
time analytics, batch processing, and data export.
 Spring XD framework supports streams for the ingestion of event driven data
from a source to a sink that passes through any number of processors.
27 Big Data Analytics for Real Time Systems
Courtesy: Infoq
Comparison of Tools (1)
Spark Streaming Apache Storm Spring XD
Definition
A fast and general purpose
cluster computing system.
A distributed real-time
computation system.
A unified, distributed, and
extensible system for data
ingestion, real time analytics,
batch processing, and data
export.
Implemented in Scala Clojure Java
Programming API Scala, Java, Python
Java API and usable with any
programing language.
Java
Development A full top level Apache project. Undergoing Apache project. Spring project by Pivotal.
Processing Model
Batch processing framework
that also does micro-batching.
Stream Processing Framework
that processes and dispatches
messages as soon as they
arrive.
Unified platform for stream
processing.
Fault Tolerance
Recovery of lost work and
restart of workers via the
resource manager.
Restart of Workers,
Supervisors like nothing ever
happened.
Reassignment of work to
container working.
28 Big Data Analytics for Real Time Systems
Comparison of Tools (2)
Spark Streaming Apache Storm Spring XD
Data processing
Messages are not lost and
delivered once. (Small-scale
batching)
Keeps track of each and every
record.
Unacknowledged messages
are retried until the
container comes back.
Use Cases
• Combines batch and
stream processing
(Lambda Architecture).
• Machine Learning:
Improve performance of
iterative algorithms
• Power Real-time
Dashboards.
Prevention of:
• securities fraud
• compliance violations
• security breaches
• network outage
• Stream tweets to
Hadoop for sentiment
analysis.
• High throughput
distributed data
ingestion into HDFS from
a variety of input
sources.
• Real-time analytics at
ingestion time, e.g.
gathering metrics and
counting values.
29 Big Data Analytics for Real Time Systems
Which tools are right for you?
30 Big Data Analytics for Real Time Systems
Lambda Architecture
31 Big Data Analytics for Real Time Systems
 In 2013, Nathan Marz and James Warren proposed the Lambda Architecture
that attempts to provide a methodology to build a Big Data system.
 Such a system would balance latency, throughput, and fault-tolerance by
using batch processing to provide comprehensive and accurate pre-computed
views, while simultaneously using real-time stream processing to provide
dynamic views.
Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. O'Reilly Media, 2013.
Courtesy: Trivadis
Lambda Architecture Example
32 Big Data Analytics for Real Time Systems
Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. O'Reilly Media, 2013.
Courtesy: Trivadis
Overview
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
33 Big Data Analytics for Real Time Systems
Use Cases
34 Big Data Analytics for Real Time Systems
 Healthcare
 Capture and analyze real-time data from medical monitors,
alerting hospital staff to potential health problems before patients
manifest clinical signs of infection or other issues.
 Analyze privacy-protected streams of medical device data to
detect early signs of disease, identify correlations among multiple
patients.
 Finance
 Analyze ticks, tweets, satellite imagery, weather trends, and any
other type of data to inform trading algorithms in real time.
 Apply fraud insights to take action in real time. Use analytics on
streaming data to confidently differentiate legitimate actions,
while preventing or interrupting suspicious actions and respond
immediately to criminal patterns and activities.
Use Cases
35 Big Data Analytics for Real Time Systems
 Government
 Identify social program fraud within seconds based on program
history, citizen profile, and geospatial data.
 Identify items or patterns for deeper investigation in Cyber-
security.
 Transport
 Traffic managers can now respond quickly and accurately to
relevant insights from real-time analytics drawn from data feeds
and reports.
 Telematics can provide data-in-motion such as vehicle speed, data
relating to the transmission control system, braking, air bags, tire
pressure and wiper speed as well as geospatial and current
environmental conditions data. Hence, automotive companies can
strengthen customer relationships
Use Cases
36 Big Data Analytics for Real Time Systems
 Telecommunication
 Improve customer profitability analysis, end-to-end visibility for
new product rollouts and real-time analysis for better the network
customers.
 Perform capacity planning for mobile networks as new high-
bandwidth services are introduced. Improve customer experience.
 Retail
 See a product recurring in abandoned shopping carts. Run a
promotion to close more sales of that product.
 Evaluate sales performance in real time. Take measures now to
achieve sales quotas.
 An electric coupon delivery service sends e-mails to customers
with recommendations matched to their interests derived from
their location information, membership information, and
information on nearby stores.
37 Big Data Analytics for Real Time Systems
Courtesy: SAP
Overview
 Introduction
 Big Data Analytics
 Real Time Systems
 Challenges of Real Time Analytics
 Technologies
 Tools
 Use Cases
 Future Work and Conclusion
38 Big Data Analytics for Real Time Systems
Future Work
 Increased Level of Merging
 Application of Social and Digital Media
 New Technologies
 Further Development of Telemetric Data
 Self Learning Systems
 Complex Statistical Methods
39 Big Data Analytics for Real Time Systems
Conclusion
40 Big Data Analytics for Real Time Systems
Resources
Privacy Security
TimeCost
“Consumer Data will be the biggest differentiator in the next two to three years.
Whoever unlocks the reams of data and uses it strategically, will win”
-Angela Ahrendts, CEO, Burberry
?
41 Big Data Analytics for Real Time Systems

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

2. visualization in data mining
2. visualization in data mining2. visualization in data mining
2. visualization in data mining
 
Pagerank and hits
Pagerank and hitsPagerank and hits
Pagerank and hits
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Data Visualization - A Brief Overview
Data Visualization - A Brief OverviewData Visualization - A Brief Overview
Data Visualization - A Brief Overview
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 
Clustering paradigms and Partitioning Algorithms
Clustering paradigms and Partitioning AlgorithmsClustering paradigms and Partitioning Algorithms
Clustering paradigms and Partitioning Algorithms
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networks
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 

Andere mochten auch

Liberating medical device data for clinical research
Liberating medical device data for clinical researchLiberating medical device data for clinical research
Liberating medical device data for clinical research
John Zaleski
 
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analyticsAst 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
Accenture
 
Studies in application of Augmented Reality in E Learning - Design Project 3
Studies in application of Augmented Reality in E Learning - Design Project 3Studies in application of Augmented Reality in E Learning - Design Project 3
Studies in application of Augmented Reality in E Learning - Design Project 3
Mannu Amrit
 
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
Stefan Schwarz
 

Andere mochten auch (20)

2016 IBM Interconnect - medical devices transformation
2016 IBM Interconnect  - medical devices transformation2016 IBM Interconnect  - medical devices transformation
2016 IBM Interconnect - medical devices transformation
 
Augmented Reality for E-Learning
Augmented Reality for E-LearningAugmented Reality for E-Learning
Augmented Reality for E-Learning
 
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...
Business Procedure Modelling and Digitization Toolbox - Master Thesis - Kamal...
 
Augmented Reality in Education
Augmented Reality in Education Augmented Reality in Education
Augmented Reality in Education
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Curved Arrows Flat
Curved Arrows FlatCurved Arrows Flat
Curved Arrows Flat
 
Liberating medical device data for clinical research
Liberating medical device data for clinical researchLiberating medical device data for clinical research
Liberating medical device data for clinical research
 
Web, gaming, mobile : quel développeur serez-vous demain ?
Web, gaming, mobile : quel développeur serez-vous demain ?Web, gaming, mobile : quel développeur serez-vous demain ?
Web, gaming, mobile : quel développeur serez-vous demain ?
 
Rd big data & analytics v1.0
Rd big data & analytics v1.0Rd big data & analytics v1.0
Rd big data & analytics v1.0
 
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analyticsAst 0060878 wayne-eckerson_research_report_big_data_analytics
Ast 0060878 wayne-eckerson_research_report_big_data_analytics
 
Big Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveBig Data Analytics: Architectural Perspective
Big Data Analytics: Architectural Perspective
 
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
Innovation Diffusion: a (Big) Data-driven approach to the study of the geogra...
 
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...Use of Chemical Characterization to Assess the Equivalency of Medical Devices...
Use of Chemical Characterization to Assess the Equivalency of Medical Devices...
 
Meta Analysis of Medical Device Data Applications for Designing Studies and R...
Meta Analysis of Medical Device Data Applications for Designing Studies and R...Meta Analysis of Medical Device Data Applications for Designing Studies and R...
Meta Analysis of Medical Device Data Applications for Designing Studies and R...
 
The concept of Datalake with Hadoop
The concept of Datalake with HadoopThe concept of Datalake with Hadoop
The concept of Datalake with Hadoop
 
A big-data architecture for real-time analytics
A big-data architecture for real-time analyticsA big-data architecture for real-time analytics
A big-data architecture for real-time analytics
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Studies in application of Augmented Reality in E Learning - Design Project 3
Studies in application of Augmented Reality in E Learning - Design Project 3Studies in application of Augmented Reality in E Learning - Design Project 3
Studies in application of Augmented Reality in E Learning - Design Project 3
 
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
PARTNERS 2013 - Dr. Stefan Schwarz - Big Data Analytics as a Service
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 

Ähnlich wie Big Data Analytics for Real Time Systems

Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 

Ähnlich wie Big Data Analytics for Real Time Systems (20)

Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analytics
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data : Risks and Opportunities
Big Data : Risks and OpportunitiesBig Data : Risks and Opportunities
Big Data : Risks and Opportunities
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and Analytics
 
Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016
 
Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
Stream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White PaperStream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White Paper
 
Streaming analytics
Streaming analyticsStreaming analytics
Streaming analytics
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
 
A Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in ActionA Data Culture with Embedded Analytics in Action
A Data Culture with Embedded Analytics in Action
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
SQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsightSQL Server 2008 R2 StreamInsight
SQL Server 2008 R2 StreamInsight
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Big Data Analytics for Real Time Systems

  • 1. BIG DATA ANALYTICS FOR REAL TIME SYSTEMS Kamalika Dutta Manasi Jayapal
  • 2. Overview  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion 2 Big Data Analytics for Real Time Systems
  • 3. Overview 3 Big Data Analytics for Real Time Systems  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion
  • 4. Where does Big Data come from? 4 Big Data Analytics for Real Time Systems Courtesy: http://goo.gl/JWswfj
  • 5. What makes it Big Data? 5 Big Data Analytics for Real Time Systems Courtesy: Oracle VARIABILITY
  • 6. Evolution of Big Data 6 Big Data Analytics for Real Time Systems 1960s 1967 Automatic Data Compression 1997 Information Explosion Our Literature Survey!
  • 7. Overview 7 Big Data Analytics for Real Time Systems  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion
  • 8. Big Data Analytics “Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.“ 8 Big Data Analytics for Real Time Systems  Predictive Analysis  Text Analysis  Data Mining  Statistical Analysis Courtesy: smartdatacollective.com
  • 9. Sample Systems 9 Big Data Analytics for Real Time Systems
  • 10. Analytics & 3 V‘s 10 Big Data Analytics for Real Time Systems Courtesy: watalon.com
  • 11. Overview  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion 11 Big Data Analytics for Real Time Systems
  • 12. Real Time Systems “A real-time system is one that processes information and produces a response within a specified time, else risk severe consequences, sometimes including failure.“ 12 Big Data Analytics for Real Time Systems  Telecommunication Systems  Anti-Lock Brakes in a Car  Air Traffic Control System  Weather Forecasting System Courtesy: yourdon.com
  • 13. Real-Time Analytics of Big Data 13 Big Data Analytics for Real Time Systems What is Happening? Kilobytes/ Sec Megabytes/ Sec Gigabytes  Terabytes Petabytes  Exabytes Seconds Milliseconds Minutes Minutes  Hours Big Data Real Time Courtesy: infochimps.com
  • 14. Overview  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion 14 Big Data Analytics for Real Time Systems
  • 15. Challenges of Real Time Analytics 15 Big Data Analytics for Real Time Systems Expensive Complex Architecture, Batch Processing Semi and Unstructured Data: New Sources are unpredictable; Relational databases are not capable, leaving us hamstrung Market too Dynamic to Predict: Subscribers preferences change; competition adds acceleration to it Scalability: Requires sub-second response times; more than a single server can handle
  • 16. Thinking Beyond Hadoop! 16 Big Data Analytics for Real Time Systems Manage & store huge volume of any data Hadoop File System MapReduce Manage streaming data Stream Computing Analyze unstructured data Text Analytics Engine Data WarehousingStructure and control data Integrate and govern all data sources Integration, Data Quality, Security, Lifecycle Management, MDM Understand and navigate federated big data sources Federated Discovery and Navigation Courtesy: IBM
  • 17. Our Solution  Do the impossible: Incorporate any kind of data  Scale Big: Scale without any complexity  Not Time Consuming: Seconds to Minutes  Real Time: Try to analyze data without expensive data warehouse loads 17 Big Data Analytics for Real Time Systems Powerful Analytics, In Place, In Real Time. Courtesy: slideshare.com
  • 18. Overview  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion 18 Big Data Analytics for Real Time Systems
  • 19. In-Memory Computing In-memory computing primarily relies on keeping data in a server's RAM as a means of processing at faster speeds. It uses a type of middleware software that allows one to store data in RAM, across a cluster of computers, and process it in parallel. 19 Big Data Analytics for Real Time Systems Courtesy: Stratecast
  • 20. Stream Processing 20 Big Data Analytics for Real Time Systems Courtesy: EMC  Stream-processing systems operate on continuous data streams e.g., click streams on web pages, user request/query streams, monitoring events, notifications, etc.  Stream processing delivers real-time analytic processing on constantly changing data in motion.  Analyse first store later!
  • 21. Complex Event Processing Complex Event Processing (CEP) processes multiple event streams generated within the enterprise to construct data abstraction and identify meaningful patterns among those streams. 21 Big Data Analytics for Real Time Systems  Analytics across both real-time and historical data.  Real-time event capture, filtering, pattern detection, matching, and aggregation.
  • 22. Overview  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion 22 Big Data Analytics for Real Time Systems
  • 23. Tools for Real Time Analytics Big Data is NOT new, the Tools ARE! 23 Big Data Analytics for Real Time Systems IBM InfoSphere Streams
  • 24. Kafka  A high performance distributed publish-subscribe messaging system.  Designed for processing of real time activity stream data.  Initially developed at LinkedIn, now part of Apache.  Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data. 24 Big Data Analytics for Real Time Systems  Fast  Scalable  Durable  Fault-tolerant
  • 25. Storm  A highly distributed real-time computation system.  Acquired by Twitter.  Twitter claims, “Over a million tuples processed per second per node.”  Fast, Scalable, Reliable and Fault-tolerant. 25 Big Data Analytics for Real Time Systems  Stream: Unbounded sequence of tuples  Primitives  Spouts: Pull messages  Bolts: Perform core functions of stream computing Stream
  • 26. Spark Streaming  Was developed in the AMPLab at UC Berkeley.  In-memory computing capabilities deliver speed.  Low latency  High throughput  Fault tolerant  New programing model:  Discretized streams (Dstreams)  Resilient Distributed Datasets 26 Big Data Analytics for Real Time Systems Spark Streaming uses micro-batching to support continuous stream processing. It is an extension of Spark which is a batch-processing system. Courtesy: Apache Spark
  • 27. Spring XD (XD=eXtreme Data)  Spring XD is a unified, distributed, and extensible system for data ingestion, real time analytics, batch processing, and data export.  Spring XD framework supports streams for the ingestion of event driven data from a source to a sink that passes through any number of processors. 27 Big Data Analytics for Real Time Systems Courtesy: Infoq
  • 28. Comparison of Tools (1) Spark Streaming Apache Storm Spring XD Definition A fast and general purpose cluster computing system. A distributed real-time computation system. A unified, distributed, and extensible system for data ingestion, real time analytics, batch processing, and data export. Implemented in Scala Clojure Java Programming API Scala, Java, Python Java API and usable with any programing language. Java Development A full top level Apache project. Undergoing Apache project. Spring project by Pivotal. Processing Model Batch processing framework that also does micro-batching. Stream Processing Framework that processes and dispatches messages as soon as they arrive. Unified platform for stream processing. Fault Tolerance Recovery of lost work and restart of workers via the resource manager. Restart of Workers, Supervisors like nothing ever happened. Reassignment of work to container working. 28 Big Data Analytics for Real Time Systems
  • 29. Comparison of Tools (2) Spark Streaming Apache Storm Spring XD Data processing Messages are not lost and delivered once. (Small-scale batching) Keeps track of each and every record. Unacknowledged messages are retried until the container comes back. Use Cases • Combines batch and stream processing (Lambda Architecture). • Machine Learning: Improve performance of iterative algorithms • Power Real-time Dashboards. Prevention of: • securities fraud • compliance violations • security breaches • network outage • Stream tweets to Hadoop for sentiment analysis. • High throughput distributed data ingestion into HDFS from a variety of input sources. • Real-time analytics at ingestion time, e.g. gathering metrics and counting values. 29 Big Data Analytics for Real Time Systems
  • 30. Which tools are right for you? 30 Big Data Analytics for Real Time Systems
  • 31. Lambda Architecture 31 Big Data Analytics for Real Time Systems  In 2013, Nathan Marz and James Warren proposed the Lambda Architecture that attempts to provide a methodology to build a Big Data system.  Such a system would balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate pre-computed views, while simultaneously using real-time stream processing to provide dynamic views. Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. O'Reilly Media, 2013. Courtesy: Trivadis
  • 32. Lambda Architecture Example 32 Big Data Analytics for Real Time Systems Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. O'Reilly Media, 2013. Courtesy: Trivadis
  • 33. Overview  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion 33 Big Data Analytics for Real Time Systems
  • 34. Use Cases 34 Big Data Analytics for Real Time Systems  Healthcare  Capture and analyze real-time data from medical monitors, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues.  Analyze privacy-protected streams of medical device data to detect early signs of disease, identify correlations among multiple patients.  Finance  Analyze ticks, tweets, satellite imagery, weather trends, and any other type of data to inform trading algorithms in real time.  Apply fraud insights to take action in real time. Use analytics on streaming data to confidently differentiate legitimate actions, while preventing or interrupting suspicious actions and respond immediately to criminal patterns and activities.
  • 35. Use Cases 35 Big Data Analytics for Real Time Systems  Government  Identify social program fraud within seconds based on program history, citizen profile, and geospatial data.  Identify items or patterns for deeper investigation in Cyber- security.  Transport  Traffic managers can now respond quickly and accurately to relevant insights from real-time analytics drawn from data feeds and reports.  Telematics can provide data-in-motion such as vehicle speed, data relating to the transmission control system, braking, air bags, tire pressure and wiper speed as well as geospatial and current environmental conditions data. Hence, automotive companies can strengthen customer relationships
  • 36. Use Cases 36 Big Data Analytics for Real Time Systems  Telecommunication  Improve customer profitability analysis, end-to-end visibility for new product rollouts and real-time analysis for better the network customers.  Perform capacity planning for mobile networks as new high- bandwidth services are introduced. Improve customer experience.  Retail  See a product recurring in abandoned shopping carts. Run a promotion to close more sales of that product.  Evaluate sales performance in real time. Take measures now to achieve sales quotas.  An electric coupon delivery service sends e-mails to customers with recommendations matched to their interests derived from their location information, membership information, and information on nearby stores.
  • 37. 37 Big Data Analytics for Real Time Systems Courtesy: SAP
  • 38. Overview  Introduction  Big Data Analytics  Real Time Systems  Challenges of Real Time Analytics  Technologies  Tools  Use Cases  Future Work and Conclusion 38 Big Data Analytics for Real Time Systems
  • 39. Future Work  Increased Level of Merging  Application of Social and Digital Media  New Technologies  Further Development of Telemetric Data  Self Learning Systems  Complex Statistical Methods 39 Big Data Analytics for Real Time Systems
  • 40. Conclusion 40 Big Data Analytics for Real Time Systems Resources Privacy Security TimeCost “Consumer Data will be the biggest differentiator in the next two to three years. Whoever unlocks the reams of data and uses it strategically, will win” -Angela Ahrendts, CEO, Burberry ?
  • 41. 41 Big Data Analytics for Real Time Systems