Analysing data analytics use cases
to understand the purpose of big data ecosystem components
by dataeaze

The purpose of any data platform (big or not)
is to enable analytics on data.
Why?
Different analytics use cases expect different sets of
features from a data platform.
The components of the big data ecosystem are built
to provide the features those use cases need.
So, to understand a data platform and its components,
it is necessary to know their purpose:
the needs of the analytics use cases
that the data platform serves.
Here we take a look at all categories of analytics use
cases on a data platform.
What?
Analytics data processing use case categories
We analyse each use case in terms of:
Nature of data processing: the processing needed to serve the use case.
Expectations from data platform: what the platform must provide to enable that processing.
Static Reports
Static reports are summary reports prepared to give a status update to decision makers.
Example: an end-of-day report for top management showing
daily sales, transactions, revenue, and total traffic.
Nature of data processing
Static reports are scheduled to execute at a fixed time interval,
generate analysis for a given time period,
and can run either directly on raw data or on an intermediate store.
Expectations from data platform
Scheduled data processing: static reports are executed repeatedly on a predefined schedule.
Timely arrival of data: a generated report should represent the complete picture of its
timeframe and be ready before its deadline, so the data must arrive on time.
Processing of raw data: the platform must be able to generate a report from raw data
when it cannot be derived from an intermediate data form.
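
The job behind a static report can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the deck's implementation: `RAW_SALES` and `daily_report` are hypothetical names, the records are made-up sample data, and the scheduling itself is assumed to be handled outside the function by a workflow scheduler.

```python
from datetime import date

# Hypothetical raw sales records: (day of transaction, amount).
RAW_SALES = [
    (date(2024, 1, 1), 120.0),
    (date(2024, 1, 1), 80.0),
    (date(2024, 1, 2), 50.0),
]


def daily_report(records, day):
    """Summarise transaction count and revenue for one day, straight from raw records."""
    amounts = [amount for record_day, amount in records if record_day == day]
    return {
        "day": day.isoformat(),
        "transactions": len(amounts),
        "revenue": sum(amounts),
    }


if __name__ == "__main__":
    print(daily_report(RAW_SALES, date(2024, 1, 1)))
```

A scheduler would call `daily_report` once per period, after all data for that day has arrived.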
Dashboard Reports
A dashboard is a reporting user interface where users interactively
choose their own view of the data through a limited set of filters.
Example: an e-commerce company provides sellers with a dashboard showing
how much inventory was sold across demographics,
product categories, and time ranges.
Nature of data processing
Raw data is processed periodically to
bring it into the form required by the dashboards.
The transformed data is loaded into the interactive
store that backs the dashboards.
Expectations from data platform
ETL: to convert raw data into the format required by the dashboard.
Scheduled data processing: timely, repeated execution of ETL jobs to populate
dashboards with the latest updates.
Interactive data store: dashboard reports are interactive, so the backend store
is expected to return results in near real time.
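
As a rough sketch of the dashboard pattern, the ETL step can pre-aggregate raw data so that each filter selection becomes a cheap lookup. Everything here is an assumption made for illustration: the order fields, the function names `run_etl` and `query`, and the in-memory `Counter` standing in for a real interactive data store.

```python
from collections import Counter

# Hypothetical raw order records.
RAW_ORDERS = [
    {"category": "books", "region": "north", "units": 3},
    {"category": "books", "region": "south", "units": 2},
    {"category": "toys", "region": "north", "units": 5},
    {"category": "books", "region": "north", "units": 1},
]


def run_etl(orders):
    """ETL step: pre-aggregate units sold per (category, region)."""
    store = Counter()
    for order in orders:
        store[(order["category"], order["region"])] += order["units"]
    return store


def query(store, category=None, region=None):
    """Dashboard read: sum the pre-aggregated cells that match the chosen filters."""
    return sum(
        units
        for (cat, reg), units in store.items()
        if category in (None, cat) and region in (None, reg)
    )


if __name__ == "__main__":
    store = run_etl(RAW_ORDERS)
    print(query(store, category="books"))
```

Because the dashboard offers only a limited set of filters, the store needs to hold only the aggregates those filters can reach, which is what keeps reads fast.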
Ad Hoc data analysis
Ad hoc analysis answers business queries that are raised as the need arises.
It is not scheduled and is executed once, whenever necessary.
Example: a product manager wants a detailed analysis of
customer behaviour on a navigation panel in order to optimise
ad placements.
Nature of data processing
Steps to serve an ad hoc report:
1. Identify the data sources that can satisfy the request.
2. Execute data processing (preferably a SQL-like query) on the identified sources.
3. Load the results into a data representation tool.
Expectations from data platform
SQL data processing engine: a SQL query engine makes it easy to express the required analysis
as a SQL query, which saves the analyst's time.
Complex data processing: the platform should support writing custom, complex data
analysis that cannot be expressed in SQL.
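
An ad hoc question of this kind can often be answered with a single SQL query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the platform's SQL engine. The `nav_clicks` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical click log for a navigation panel: (user_id, clicked item).
CLICKS = [
    (1, "home"), (2, "deals"), (3, "deals"),
    (1, "cart"), (2, "deals"), (3, "home"),
]


def top_clicked_items(rows, limit=2):
    """Load the identified source into a SQL engine and run a one-off query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE nav_clicks (user_id INTEGER, item TEXT)")
        conn.executemany("INSERT INTO nav_clicks VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT item, COUNT(*) AS clicks FROM nav_clicks "
            "GROUP BY item ORDER BY clicks DESC, item ASC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_clicked_items(CLICKS))
```

The returned rows are what would then be loaded into a data representation tool.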
BI Reporting
Business Intelligence (BI) tools provide advanced, general-purpose
dashboards backed by a data store that hosts a wide array of dimensions.
Users can define and save transformations and analysis queries
in the BI tool and get reports back in tabular or graphical form.
Example: a BI report showing weekly sales statistics across multiple regions
for the previous 6 months. The report is created and saved once; users
run the saved report whenever they want.
Nature of data processing
Scheduled ETL jobs convert raw data into the
required intermediate form.
The data is loaded into interactive SQL data stores.
BI tools connect to the SQL data store as their backend.
Expectations from data platform
ETL: raw data should be transformed into the required format and
loaded into a SQL data warehouse.
Scheduling of ETL: defined ETL jobs should be scheduled to execute at a fixed time
interval.
SQL data processing engine: a SQL query engine makes it easy to extract data and saves
time. BI tools can connect to this SQL data store.
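
One way to picture a saved BI report is as a stored query over the warehouse table that users re-run on demand. The sketch below models this with a SQL view in an in-memory `sqlite3` database. The `sales` table, the `weekly_sales` view, and the sample rows are hypothetical, and a real BI tool would manage saved reports itself.

```python
import sqlite3

# Hypothetical ETL output: (region, week number, sale amount).
SALES = [
    ("east", 1, 100.0),
    ("east", 1, 50.0),
    ("east", 2, 70.0),
    ("west", 1, 30.0),
]


def build_warehouse(rows):
    """Load transformed rows and save the report definition as a view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, week INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE VIEW weekly_sales AS "
        "SELECT region, week, SUM(amount) AS total "
        "FROM sales GROUP BY region, week"
    )
    return conn


def run_saved_report(conn):
    """Re-run the saved report against whatever data is currently loaded."""
    return conn.execute(
        "SELECT region, week, total FROM weekly_sales ORDER BY region, week"
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse(SALES)
    print(run_saved_report(warehouse))
```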
Data Processing for Applications
This is data processing done to feed results back into business
applications, which make better decisions based on
the latest data.
Example: ad servers are periodically updated with the latest minimum
eCPM to expect for an ad placement that is filled dynamically.
Nature of data processing
Complex data processing (such as machine learning) on raw data.
Scheduled data processing.
Results are written to an interactive key-value store, from which
applications fetch them directly.
Expectations from data platform
Custom complex data processing: users should be able to easily define custom, complex
data processing algorithms (such as machine learning).
Scheduled data processing: required for periodic execution of data processing jobs.
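
The flow from scheduled processing to application lookup might look roughly like this. The sketch replaces the machine learning step with a deliberately simple stand-in (a fixed fraction of the mean observed eCPM) and uses a plain dictionary in place of a key-value store. All names, the `factor` parameter, and the sample bids are assumptions.

```python
from statistics import mean

# Stand-in for an interactive key-value store read by the application.
KV_STORE = {}


def compute_floor_ecpm(bids_by_placement, factor=0.8):
    """Stand-in for the complex processing step: floor = factor * mean observed eCPM."""
    return {
        placement: round(factor * mean(bids), 2)
        for placement, bids in bids_by_placement.items()
    }


def publish(results):
    """Scheduled job output: overwrite the latest values in the key-value store."""
    KV_STORE.update(results)


def lookup_floor(placement, default=0.0):
    """Application-side read: a single key lookup at request time."""
    return KV_STORE.get(placement, default)


if __name__ == "__main__":
    publish(compute_floor_ecpm({"top_banner": [2.0, 3.0], "sidebar": [1.0]}))
    print(lookup_floor("top_banner"))
```

Keeping the heavy computation in the scheduled job and the application path to one key lookup is what lets the application use fresh results without paying the processing cost per request.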
Real time stream data processing
This means analysing an event as soon as it happens. The sooner the analysis,
the greater the value obtained from it.
Example: the stock ticker displayed on Yahoo Finance.
Nature of data processing
As soon as an event happens, its log entry is collected.
All log entries are buffered and made available
to the processing layer.
The processing layer pulls records from the message buffer and
processes them.
Expectations from data platform
Scalable message buffer: a buffer that holds received messages until they are pulled
for processing.
Real time stream processing engine: pulls and processes records in real time, and lets users
define custom data processing.
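
The collect, buffer, and pull steps can be sketched in a few lines. Here the standard-library `queue.Queue` stands in for a scalable message buffer, and the processing step simply keeps the latest price per symbol. The function names and sample events are hypothetical, and a real stream engine would run continuously rather than drain the buffer once.

```python
import queue


def collect(buffer, events):
    """Collection layer: append each event to the message buffer as it happens."""
    for event in events:
        buffer.put(event)


def process(buffer):
    """Processing layer: pull records from the buffer and keep the latest price per symbol."""
    latest = {}
    while True:
        try:
            symbol, price = buffer.get_nowait()
        except queue.Empty:
            break
        latest[symbol] = price
    return latest


if __name__ == "__main__":
    message_buffer = queue.Queue()
    collect(message_buffer, [("AAA", 10.0), ("BBB", 5.0), ("AAA", 10.5)])
    print(process(message_buffer))
```

The buffer decouples the two layers: events can keep arriving while the processing layer pulls records at its own pace.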
Let us take a look at the superset of expectations across
all use cases.
Expectations from data platform
across all use cases
Summarise all
Superset of expectations

Expectation / capability:
Complex data analysis using query language
Scheduled ETL data processing
Data store for interactive data analysis
Data ingestion with timely arrival of data
Scalable message buffer to be consumed by stream data processing
Streaming data processing platform

Needed by (use case):
Static reports
Ad hoc data analysis
BI reporting
Dashboard reports
App-specific data processing
Real time stream data processing
Let’s conclude
We have identified a common set of features that most
analytics use cases expect from a data platform.
Let us map these to data platform components.
Capabilities provided by data platform components

Expectation / capability:
Complex data analysis using query language
Scheduled ETL data processing
Data store for interactive data analysis
Data ingestion with timely arrival of data
Scalable message buffer to be consumed by stream data processing
Streaming data processing platform

Supported by (data platform component: tools):
Data ingestion: Flume, Kafka, Scribe
Batch data processing: Hive, MapReduce
Workflow scheduler: Oozie
Interactive data stores: HBase, Spark, ..
Message buffers: Kafka
Real time stream engine: Storm, Spark
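
The component-to-tool listing on this slide can also be held as a small lookup table, which makes it easy to see which tools serve more than one role. This is only an illustrative sketch: `COMPONENT_TOOLS` and `components_using` are hypothetical names, and the entries are limited to the tools the slide names.

```python
# Component -> tools, as listed on the slide.
# The slide's interactive data stores entry ends with "..", so that list is not exhaustive.
COMPONENT_TOOLS = {
    "Data ingestion": ["Flume", "Kafka", "Scribe"],
    "Batch data processing": ["Hive", "MapReduce"],
    "Workflow scheduler": ["Oozie"],
    "Interactive data stores": ["HBase", "Spark"],
    "Message buffers": ["Kafka"],
    "Real time stream engine": ["Storm", "Spark"],
}


def components_using(tool):
    """Return the components, in slide order, that list the given tool."""
    return [component for component, tools in COMPONENT_TOOLS.items() if tool in tools]


if __name__ == "__main__":
    print(components_using("Kafka"))
```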
Data platform components satisfying expectations
Going backwards
Now you know about:
data platform components,
the capabilities they support,
and the analytics use case features those capabilities satisfy.
Thank You