Big Data in Practice: A Pragmatic Approach to Adoption and Value Creation
Raj Nair
Data Practitioner and Consultant
Application Services
• Enterprise Resource Planning (ERP)
• eCommerce / eBusiness
• Enterprise App Dev and ECM
• Legacy Support, Systems Integration and Conversion
Info Management
• Business Intelligence and Analytics
• Dashboards, Scorecards, Reporting
• MDM & Data Modeling
• Data Marts, ODS, ETL, Data Mining
IT Infrastructure
• IT Professional Services
• Network Administration & Support
• DB Admin & Maintenance
• Hosting and Application Support
Process & Governance
• SDLC – Agile, TDD, TFD, Iterative
• Requirements Analysis, PMP, Change Management and Automated QA
• Training & Knowledge Transition and Technical Documentation
Content NOT FOR DISTRIBUTION: Property of Raj Nair
Object Technology Solutions Inc. (OTSI) is a leading Information Technology (IT) services and solutions company founded in 1999. Its clientele includes Fortune 500 companies, and it provides IT solutions in the areas of SDLC, Information Management, Business Intelligence, ERP, eCommerce (B2B, B2C), Mobile, Enterprise Solutions, Middleware and Infrastructure.
Technology Expertise and Experience
• SAP: Business Objects, ERP
• Microsoft: SharePoint, .Net, SQL Server, Project Server
• IBM: WebSphere, Cognos, Rational Suite
• HP: Testing tools, PPM
• Data: Oracle, DB2, SQL Server, Teradata
• OS: Windows, Unix (AIX, Linux, HP-UX) etc.
• Open Source, Java
Certified Diversity Supplier in KS, MO and IL
1. Big Data – The Original Use Case
2. Mainstream Big Data
3. Real World Use Cases and Applications
4. Practical Adoption: Opportunity Identification
5. Big Data 2.0 – What’s on the Horizon?
6. Conclusion
An Open Source Engine
The year was 2002 ...
Doug Cutting, Mike Cafarella
Already Somebody’s Biz Problem
• Problem of Capacity & Scale
http://
The Perfect Storm
Google File System, MapReduce, BigTable
MapReduce + Google File System = [result shown as an image on the slide]
Yes, But… We are not Google
• Sears: Dynamic pricing
• AT&T: Quantifying customer impact from failed cell towers
• Nokia: Holistic view of how users interact with apps across the world
• Zions Bancorp: Analyze 130 data sources for fraud
• Cerner: Detecting health risks
Every Day Big Data
Reaching scale-up limits on your server
Represents tools, technologies, frameworks
for storage and processing at scale
Represents Opportunity
Big Data 1.0 – The Hadoop Ecosystem
Software library
Framework for large scale distributed processing
Ability to scale to thousands of computers
Classic Hadoop MapReduce – Batch Processing
Design Principles
- Large data sets
- Moving computation is cheaper than moving data
- Hardware failure is expected and handled through redundancy
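The batch model named above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle and reduce steps, not Hadoop's API; the function names and the sample lines are made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all values by key between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input; on a real cluster each line would live in an HDFS block
# and the map step would run on the node that stores that block.
sample_lines = ["big data in practice", "Big Data is big"]
counts = word_count(sample_lines)
print(counts)
```

On a cluster the map step runs where the data sits, which is the "moving computation is cheaper than moving data" principle; only the much smaller (key, value) pairs travel over the network.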
This not “That”
Is                                            Is Not
A software framework (storage/compute)        A database management system; an appliance
Batch processing                              For real-time or interaction
Write once, read many                         Delete and update, or “ACID”
Unassuming of data formats                    Imposing any schemas
Open source                                   Lock-in
Made for commodity servers with local disks   Meant to be run in virtualized environments
What is this you call data?
Unlearn current notion of “Data”
Native Data Source
HDFS: Storage and archival
Data Processing
• MapReduce: Programming library
• Crunch: Data pipeline processing
• Pig: M/R abstraction
Workload Management
• HBase: Real-time access (low latency)
• Hive: Data warehouse (high latency)
Data Movement
• Sqoop: Data transfer
• Flume: Data streaming
Component   Purpose                 Use it for
HDFS        Distributed Storage     Raw data storage and archival
Flume       Data Movement           Continuous streaming into HDFS
Sqoop       Data Movement           Data transfer from RDBMS to HDFS/HBase
HBase       Workload Mgmt           Near real-time read/write access to large data sets
Hive        Workload Mgmt           Analytical queries; data warehouse
MapReduce   Data Processing         Low-level custom code for data processing
Crunch      Data Processing (Java)  Coding M/R pipelines, aggregations
Pig         Data Processing         Scripting language; similar to Crunch
A Powerful Paradigm
Hadoop – separate layers: Storage Layer, Processing Engine, Query Engine, Metadata
• Multiple query engines
• Data in native format
Oracle, SQL Server, DB2 – storage and query bundled in each product
• Tightly integrated proprietary stacks; cannot free your data
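One way to picture "separate layers" and "data in native format" is a toy sketch in which the raw bytes are stored once and two unrelated readers interpret them at read time. Everything here (the `RAW_STORE` string, the column names, the two "engine" functions) is hypothetical and stands in for real components such as HDFS, Hive or a search index.

```python
import csv
import io

# Storage layer: raw records kept exactly as they arrived (hypothetical sample).
RAW_STORE = "2013-05-01,web,42\n2013-05-01,mobile,17\n2013-05-02,web,58\n"


def sql_like_engine(raw):
    """Query engine 1: applies a schema at read time and aggregates by channel."""
    reader = csv.DictReader(io.StringIO(raw), fieldnames=["day", "channel", "visits"])
    totals = {}
    for row in reader:
        totals[row["channel"]] = totals.get(row["channel"], 0) + int(row["visits"])
    return totals


def search_like_engine(raw, term):
    """Query engine 2: ignores the schema and scans the same bytes as text."""
    return [line for line in raw.splitlines() if term in line]


visits_by_channel = sql_like_engine(RAW_STORE)
mobile_lines = search_like_engine(RAW_STORE, "mobile")
print(visits_by_channel)
print(mobile_lines)
```

Neither reader owns the data, so adding a third engine does not require reloading or converting anything, which is the contrast the slide draws with tightly integrated stacks.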
1Big Data – The Original Use Case
2Mainstream Big Data
3Real World Use Cases and Applications
4Practical Adoption : Opportunity Identification
5Big Data 2.0 – What’s on the Horizon ?
6Conclusion
Opportunity…
Transform Data Processing
Exploration
Information Enrichment
Data Archival
Data Processing Pipeline
Several sources
Varying Frequencies
Varying Formats
Quality check
Validations, Scrubbing
Transformations/Rules
Prune app data sources
Discard/Archive
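The pipeline steps listed above (quality check, validations, scrubbing, transformations, discard/archive) can be sketched as small composable functions. The record fields, the rules and the sample data are assumptions chosen for illustration, not the rules of any particular system.

```python
def validate(record):
    """Quality check: require a non-empty id and an amount that parses as a number."""
    if not record.get("id"):
        return False
    try:
        float(record.get("amount", ""))
    except (TypeError, ValueError):
        return False
    return True


def scrub(record):
    """Scrubbing: normalise the region, defaulting to 'unknown' when missing."""
    cleaned = dict(record)
    cleaned["region"] = (record.get("region") or "unknown").strip().lower()
    return cleaned


def transform(record):
    """Transformation rule: store the amount as a float rounded to two decimals."""
    out = dict(record)
    out["amount"] = round(float(record["amount"]), 2)
    return out


def run_pipeline(records):
    """Return (accepted, rejected); rejected records are kept for archival."""
    accepted, rejected = [], []
    for record in records:
        if validate(record):
            accepted.append(transform(scrub(record)))
        else:
            rejected.append(record)
    return accepted, rejected


# Hypothetical records arriving from several sources in slightly different shapes.
raw = [
    {"id": "a1", "amount": "19.999", "region": " East "},
    {"id": "", "amount": "5.00", "region": "west"},
    {"id": "a3", "amount": "n/a", "region": "west"},
    {"id": "a4", "amount": "7", "region": None},
]
accepted, rejected = run_pipeline(raw)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping rejected records instead of dropping them matches the discard/archive step: cheap storage makes it practical to keep the raw input for later inspection.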
Three variants, each with Data Storage and a Data Warehouse: Data Processing Engine, ETL Engine, ELT
From Source to Business Value
Pipeline labels: Sourcing; Staging; Biz Rules, Validations, Scrubbing, Mapping, Transforms; Shoe-horning, Relational fit; Loading; Archiving / Purging; Distribution Prep; Tuning; Data stores
Annotations: Minutes/Hours; Hours; Subset of Data; Reliability
Missed SLAs = Biz Frustration
From Source to Business Value
Significantly more data sources
Highly scalable, significantly performant data processing
New business value, faster time to value
Data Exploration
Large reservoir of data
Descriptive Statistics
Central Tendencies
Dispersion
Visualization
Surprise Me!
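The descriptive statistics named above (central tendencies, dispersion) and a simple "surprise me" check can be sketched with Python's standard `statistics` module. The response-time sample and the two-standard-deviation threshold are arbitrary assumptions for the example.

```python
import statistics

# Hypothetical sample: response times in milliseconds pulled from raw logs.
response_times_ms = [120, 135, 128, 410, 131, 125, 122]

# Central tendencies.
mean_ms = statistics.mean(response_times_ms)
median_ms = statistics.median(response_times_ms)

# Dispersion.
stdev_ms = statistics.stdev(response_times_ms)
range_ms = max(response_times_ms) - min(response_times_ms)

# "Surprise me": flag values far from the median (threshold is an assumption).
outliers = [x for x in response_times_ms if abs(x - median_ms) > 2 * stdev_ms]

print("mean:", round(mean_ms, 1), "median:", median_ms)
print("stdev:", round(stdev_ms, 1), "range:", range_ms)
print("outliers:", outliers)
```

The gap between mean and median is itself a finding here: one slow request pulls the mean well above the typical value, which is the kind of surprise exploration is meant to surface.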
Data Exploration
Courtesy: Data Science Central
http://www.datasciencecentral.com/profiles/blogs/r-hadoop-data-analytics-heaven
Information Enrichment
Information Enrichment
Data Archival
Recycle Policy
Data Archival
Storage in Native Format
Redundancy , Replication
Easily accessible, inexpensive
Practical Adoption
Big Data technologies don’t solve all problems
Leveraging existing investments
Complexities of existing systems
Proof of Concept
Use your own data – realistic results
Focus on very specific pain points
Know what you are going to measure
Opportunity Identification
Pipeline labels: Sourcing; Staging; Biz Rules, Validations, Scrubbing, Mapping; Shoe-horning, Relational fit; Loading; Archiving / Purging; Distribution Prep; Tuning; Data stores
Annotations: Minutes/Hours; Hours; Subset of Data; Reliability
Data Processing
Engine
Data Warehouse
Data
Storage
Keep all your raw data
Cheaper Hardware
Low cost per byte $$
High value per byte
Offload from RDBMS
Improve scale, performance
Leverage existing tools
Hardware on a budget
Master: $5000
- 12 cores
- 32 GB RAM
- 2 TB SATA drives, 7.2K RPM
Workers: $5000 each
- 4 nodes
- 12 cores
- 16 GB RAM
- 4 TB SATA drives each, 7.2K RPM
4-port 10 Gig switch: $1500
Grand total: < $30,000
Software costs: $0
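The budget above can be checked with a few lines of arithmetic. The prices and node counts come from the slide; treating "4 TB" as the capacity per worker node and applying HDFS's default replication factor of 3 are assumptions added here to estimate usable space.

```python
# Figures taken from the slide.
MASTER_COST = 5000
WORKER_COST = 5000
WORKER_COUNT = 4
SWITCH_COST = 1500
SOFTWARE_COST = 0
BUDGET_CEILING = 30000

total_cost = MASTER_COST + WORKER_COUNT * WORKER_COST + SWITCH_COST + SOFTWARE_COST

# Assumption: 4 TB of disk per worker node, all of it given to HDFS.
TB_PER_WORKER = 4
# HDFS replicates each block 3 times by default.
REPLICATION_FACTOR = 3

raw_storage_tb = WORKER_COUNT * TB_PER_WORKER
usable_storage_tb = raw_storage_tb / REPLICATION_FACTOR

print("total cost: $", total_cost)
print("under budget:", total_cost < BUDGET_CEILING)
print("raw storage (TB):", raw_storage_tb)
print("usable storage after replication (TB):", round(usable_storage_tb, 2))
```

Under these assumptions the cluster comes to $26,500, which leaves some headroom below the $30,000 ceiling for cabling, racks or spare drives.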
NoSQL
Data Processing
Engine
Data Warehouse
Data
Storage
Keep all your raw data
Cheaper Hardware
NoSQL
Low cost per byte $$
High value per byte
Exploratory BI / Analysis
Data
Storage
Makes Data exploration practically cheaper and faster
Use existing visualization tools (Tableau or other)
Check for integration with R
Data Architecture
• Single most important factor
• Don’t miss technology trends
But ….
It’s more about the battle plan
What about that RDBMS?
Too many new data types
Extreme demands for loading & query access
Dynamic / just in time schemas
SQL is great, but why limit to relational?
Still great for transactional workloads
What’s Next?
• Multi-tenant Hadoop
• SQL on Hadoop
• Security
• In-memory, Real Time
Multi-tenant Hadoop
HDFS 2: Storage and archival
YARN: Yet Another Resource Negotiator
Workloads on YARN: MapReduce (batch), HBase (online), Hive (interactive), In-memory, Search
Application container: scales resource management
MapReduce becomes “one type of application workload”
SQL on Hadoop
Impala
• Cloudera
• MPP engine
Tez
• Hortonworks
• SQL on Hive
Phoenix
• Apache
• SQL on HBase
In-memory and Real Time
Spark
• Claimed to be up to 100x faster than M/R for in-memory workloads
Storm
• Event processing
Apache Drill
• Low-latency ad hoc queries
• Interactive queries at scale
Honorable (Proprietary) mentions
• RDBMS on Hadoop
• Complete package
• MPP, SMP, DataFlow
• Hortonworks underneath
• Manage, analyze machine-generated data
Where can I get Hadoop?
Distributors
Open Source Apache Project
And these guys…
Cloud
Conclusion
The Power & Paradigm of Distributed Computing
“Nativity” of Data – Unlearn old notions
Identify, understand your data processing pipeline
POC with a measurable, specific use case
Data Architecture – key to sustainable scalability
Stay informed
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 

The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios

  • 1. Big Data in Practice: A Pragmatic approach to Adoption and Value creation Raj Nair Data Practitioner and Consultant
  • 2. Application Services • Enterprise Resource Planning (ERP) • eCommerce / eBusiness • Enterprise App Dev and ECM • Legacy Support, Systems Integration and Conversion Info Management • Business Intelligence and Analytics • Dashboards, Scorecards, Reporting • MDM & Data Modeling • Data Marts, ODS, ETL, Data Mining IT Infrastructure • IT Professional Services • Network Administration & Support • dB Admin & Maintenance • Hosting and Application Support Process & Governance • SDLC – Agile, TDD, TFD Iterative • Requirements Analysis, PMP, Change Management and Automated QA • Training & Knowledge Transition and Technical Documentation
  • 3. Content NOT FOR DISTRIBUTION: Property of Raj Nair Object Technology Solutions Inc. (OTSI) is a leading Information Technology (IT) Services and Solutions company founded in 1999. Clientele of Fortune 500 companies providing IT Solutions in the areas of SDLC, Information Management, Business Intelligence, ERP, eCommerce (B2B, B2C), Mobile, Enterprise Solutions, Middleware and Infrastructure. Technology Expertise and Experience SAP - Business Objects, ERP, Microsoft - SharePoint, .Net, SQL Server, Project Server, IBM - WebSphere, Cognos, Rational Suite, HP - Testing tools, PPM Data - Oracle, DB2, SQLServer, Teradata, OS – Windows, Unix (AIX, Linux, HP-UX) etc., Open Source, Java Certified Diversity Supplier in KS, MO and IL
  • 4. 1) Big Data – The Original Use Case 2) Mainstream Big Data 3) Real World Use Cases and Applications 4) Practical Adoption: Opportunity Identification 5) Big Data 2.0 – What’s on the Horizon? 6) Conclusion
  • 5. An Open Source Engine. The year was 2002… Doug Cutting, Mike Cafarella
  • 6. Already Somebody’s Biz Problem • Problem of Capacity & Scale http://
  • 7. The Perfect Storm MapReduce Google File System BigTable
  • 10. Yes, But… We are not Google. Sears: dynamic pricing. AT&T: quantifying customer impact from failed cell towers. Nokia: holistic view of how users interact with apps across the world. Zions Bancorp: analyze 130 data sources for fraud. Cerner: detecting health risks.
  • 11. Every Day Big Data: Reaching scale-up limits on your server. Represents tools, technologies, and frameworks for storage and processing at scale. Represents opportunity.
  • 14. Big Data 1.0 – The Hadoop Ecosystem: Software library. Framework for large-scale distributed processing. Ability to scale to thousands of computers.
  • 15. Design Principles: Large data sets. Classic Hadoop MapReduce: batch processing. Moving computation is cheaper than moving data. Hardware failure, redundancy.
  • 16. This, not “That”. Is: a software framework (storage/compute); batch processing; write once, read many; unassuming of data formats; open source; made for commodity servers with local disks. Is not: a database management system or an appliance; for real-time or interaction; delete and update or “ACID”; imposing any schemas; lock-in; meant to be run in virtualized environments.
  • 17. What is this you call data? Unlearn current notion of “Data” Native Data Source
  • 18. HDFS: storage and archival. MapReduce: programming library. Crunch: data pipeline processing. HBase: real-time access (low latency). Pig: M/R abstraction. Hive: data warehouse. Sqoop: data transfer. Flume: data streaming (high latency). Categories: Data Processing, Workload Management, Data Movement.
  • 19. Component (purpose): use it for. HDFS (distributed storage): raw data storage and archival. Flume (data movement): continuous streaming into HDFS. Sqoop (data movement): data transfer from RDBMS to HDFS/HBase. HBase (workload mgmt): near real-time read/write access to large data sets. Hive (workload mgmt): analytical queries; data warehouse. MapReduce (data processing): low-level custom code for data processing. Crunch (data processing, Java): coding M/R pipelines, aggregations. Pig (data processing): scripting language; similar to Crunch.
  • 20. A Powerful Paradigm. Hadoop: separate layers (storage layer, query engine, processing engine, metadata); multiple query engines; data in native format. Oracle, SQL Server, DB2: storage and query tightly integrated; proprietary stacks, cannot free your data.
  • 23. Data Processing Pipeline: Several sources. Varying frequencies. Varying formats. Quality check. Validations, scrubbing. Transformations/rules. Prune app data sources. Discard/archive.
  • 27. From Source to Business Value Shoe-horning Relational fit Loading Archiving / Purging Biz Rules Validations Scrubbing Mapping Transforms Staging Distribution Prep Tuning Data stores Minutes/Hours Subset of Data Hours Reliability Sourcing Missed SLAs = Biz Frustration
  • 28. From Source to Business Value Significantly more data sources Highly scalable, significantly performant data processing New business value, Faster time to value
  • 29. Data Exploration: Large reservoir of data. Descriptive statistics: central tendencies, dispersion. Visualization. Surprise me!
  • 30. Data Exploration Courtesy: Data Science Central http://www.datasciencecentral.com/profiles/blogs/r-hadoop-data-analytics-heaven
  • 34. Data Archival Storage in Native Format Redundancy , Replication Easily accessible, inexpensive
  • 36. Practical Adoption Big Data Technologies don’t solve all problems Leveraging existing investments Complexities of existing systems
  • 37. Proof of Concept Use your own data – realistic results Focus on very specific pain points Know what you are going to measure
  • 38. Opportunity Identification Shoe-horning Relational fit Loading Archiving / Purging Biz Rules Validations Scrubbing Mapping Staging Distribution Prep Tuning Data stores Minutes/Hours Subset of Data Hours Reliability Sourcing
  • 40. Data Processing Engine Data Warehouse Data Storage Keep all your raw data Cheaper Hardware Low cost per byte $$ High value per byte Offload from RDBMS Improve scale, performance Leverage existing tools
  • 41. Hardware on a budget. Master: 12 cores, 32 GB RAM, 2 TB SATA drives (7.2K RPM), $5000. Workers: 4 nodes, each with 12 cores, 16 GB RAM, 4 TB SATA drives (7.2K RPM), $5000 each. 4-port 10 Gig switch: $1500. Grand total < $30,000. Software costs? 0.
  • 42. NoSQL Data Processing Engine Data Warehouse Data Storage Keep all your raw data Cheaper Hardware NoSQL Low cost per byte $$ High value per byte
  • 43. Exploratory BI / Analysis Data Storage Makes Data exploration practically cheaper and faster Use existing visualization tools (Tableau or other) Check for integration with R
  • 44. Data Architecture: the single most important factor. Don’t miss technology trends, but… it’s more about the battle plan.
  • 46. What about that RDBMS? Too many new data types. Extreme demands for loading and query access. Dynamic / just-in-time schemas. SQL is great, but why limit to relational? Still great for transactional workloads.
  • 47. What’s Next? Multi-tenant Hadoop SQL on Hadoop Security In-memory Real Time
  • 48. Multi-tenant Hadoop. HDFS 2: storage and archival. YARN (Yet Another Resource Negotiator): application container, scale resource management. Workloads: MapReduce (batch), HBase (online), Hive (interactive), in-memory, search. MapReduce becomes “one type of application workload”.
  • 49. SQL on Hadoop. Impala: Cloudera; MPP engine. Tez: HortonWorks; SQL on Hive. Phoenix: Apache; SQL on HBase.
  • 50. In-memory and Real Time. Spark: 100x faster than M/R. Storm: event processing. Apache Drill: low-latency ad hoc queries; interactive queries at scale.
  • 51. Honorable (Proprietary) mentions RDBMS on Hadoop Complete Package MPP, SMP, DataFlow HortonWorks underneath Manage, Analyze machine generated data
  • 53. Where can I get Hadoop? Distributors Open Source Apache Project And these guys… Cloud
  • 54. Conclusion: The power and paradigm of distributed computing. “Nativity” of data: unlearn old notions. Identify and understand your data processing pipeline. POC with a measurable, specific use case. Data architecture: key to sustainable scalability. Stay informed.
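
Slide 15 names classic Hadoop MapReduce as the batch processing model. As a rough illustration only, the map, shuffle, and reduce phases can be sketched in a single Python process. This is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are hypothetical, and a real cluster would run mappers and reducers on the nodes that hold the data rather than in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["big data in practice", "Big Data at scale"]
    print(word_count(sample))
```

Because each mapper call sees only one line and each reducer call sees only one key, the same logic can be spread across many machines, which is the property the "moving computation is cheaper than moving data" principle relies on.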