Big Data Adoption Status

Xpand IT

Presentation about Big Data Adoption Status by Nuno Barreto - Partner & Big Data Lead @Xpand IT

Big Data Adoption Status
Nuno Barreto
Partner | Big Data Lead

AGENDA
• STATUS CHECK
• TECHNOLOGY HIGHLIGHTS
• COOL STUFF WE’RE DOING
• LOOKING AHEAD

from Batch to Near-Real-Time
from Analytics to Operational

ACTIVE INDUSTRIES
• RETAIL
• UTILITIES
• TELCO
• MOBILITY
• FINANCIAL SERVICES
• E-BUSINESS

CLUSTER SIZES & TOPOLOGIES
• 5 TO TENS OF NODES
• A COUPLE HUNDRED GiB TO A DOZEN TiB OF RAM
• A COUPLE TiB TO A HUNDRED TiB OF RAW SPACE
• ON-PREM AND CLOUD

HBASE AND HDFS/PARQUET
• HDFS/PARQUET IS GREAT FOR LARGE SCANS, BUT…
• HBASE IS GREAT FOR INDEXED READS/WRITES, BUT…

COMPLEX ARCHITECTURES
source: http://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/

THE “WORST” OF BOTH WORLDS
• KUDU IS SLIGHTLY WORSE THAN HBASE FOR INDEXED OPS
• KUDU SHOULD BE NO MORE THAN 2x WORSE THAN PARQUET FOR LARGE SCANS
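
The two access patterns contrasted on this slide, large scans and indexed operations, can be illustrated with a toy in-memory model. This is only a sketch: `ToyColumnStore` and `ToyRowStore` are hypothetical names, the sample readings are made up, and nothing here models how Kudu, HBase, or Parquet actually store data or how fast they are.

```python
from bisect import bisect_left


class ToyColumnStore:
    """Column-oriented layout: one list per column, rows sorted by key."""

    def __init__(self, rows):
        ordered = sorted(rows, key=lambda r: r["key"])
        self.keys = [r["key"] for r in ordered]
        names = [n for n in ordered[0] if n != "key"] if ordered else []
        self.columns = {n: [r[n] for r in ordered] for n in names}

    def scan_sum(self, column):
        # A large scan touches only the one column it needs.
        return sum(self.columns[column])

    def lookup(self, key):
        # A point read needs a binary search plus one access per column.
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return {n: col[i] for n, col in self.columns.items()}
        return None


class ToyRowStore:
    """Key-indexed layout: whole rows stored under their key."""

    def __init__(self, rows):
        self.rows = {
            r["key"]: {n: v for n, v in r.items() if n != "key"} for r in rows
        }

    def scan_sum(self, column):
        # A scan has to walk every full row, even for a single column.
        return sum(row[column] for row in self.rows.values())

    def lookup(self, key):
        # Indexed read: a single hash lookup returns the whole row.
        return self.rows.get(key)

    def upsert(self, key, **values):
        # Indexed write: update one row in place.
        self.rows.setdefault(key, {}).update(values)


# Made-up sample data, for illustration only.
readings = [
    {"key": "meter-3", "kwh": 7, "volts": 229},
    {"key": "meter-1", "kwh": 5, "volts": 231},
    {"key": "meter-2", "kwh": 9, "volts": 230},
]
columnar = ToyColumnStore(readings)
keyed = ToyRowStore(readings)
```

Both toy stores answer both kinds of query. The difference is how much data each one has to touch to do so, which is the trade-off the slide is pointing at.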

KUDU BENCHMARKS – INGEST RATE
HIGHER IS BETTER
source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

KUDU BENCHMARKS – RANDOM LOOKUP
LOWER IS BETTER
source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

KUDU BENCHMARKS – SCAN RATE
HIGHER IS BETTER
source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

KUDU BENCHMARKS – SUMMARY
source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

ZWOOX - INGESTION FRAMEWORK
• LOW LATENCY, HIGH THROUGHPUT, HIGHLY AVAILABLE
• NEAR-REAL-TIME FOR KUDU & BATCH FOR HIVE/IMPALA
• BATCH & STREAMING REPLICATIONS
• AUTOMATIC CONSOLIDATION INTO HDFS-BASED TABLES
• MULTIPLE TABLES WITH SPECIFIC PARTITIONING SCHEME
• IN-LINE PROCESSING
• AUTOMATIC AUDIT DATA

MESSAGE BUS
• JMS SEMANTICS ARE LIMITED
• JMS SCALING IS HARD
• JMS PERFORMANCE IS POOR COMPARED TO KAFKA
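
As a rough illustration of why a Kafka-style bus is easier to scale, the sketch below models a topic as a set of append-only partitions with one read offset per consumer group. It is an in-memory toy, not the Kafka client API: `ToyPartitionedLog` and its methods are hypothetical, and CRC32 stands in for the real partitioner purely to keep the example deterministic.

```python
import zlib
from collections import defaultdict


class ToyPartitionedLog:
    """A topic as N append-only partitions with per-group read offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # One offset per (consumer group, partition); messages are never
        # deleted on read, so groups progress independently.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def partition_for(self, key):
        # Same key -> same partition, so per-key ordering is preserved.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group, partition, max_records=10):
        start = self.offsets[group][partition]
        records = self.partitions[partition][start:start + max_records]
        self.offsets[group][partition] = start + len(records)
        return records

    def seek(self, group, partition, offset):
        # Rewinding an offset replays messages that were already consumed.
        self.offsets[group][partition] = offset


log = ToyPartitionedLog(num_partitions=3)
partition, _ = log.send("device-42", "reading-1")
log.send("device-42", "reading-2")
print(log.poll("billing", partition))  # the billing group reads both records
print(log.poll("audit", partition))    # the audit group reads them again
```

Because each partition is an independent log, adding partitions and consumers spreads the load, and a consumer group can rewind with `seek` to reprocess data.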

IOT – VALUE IS IN THE CORE
DEVICE CENTRIC VS GATEWAY CENTRIC VS CENTRALIZED PLATFORM CENTRIC

Big Data Adoption Status

2. Big Data Adoption Status Nuno Barreto Partner | Big Data Lead

3. AGENDA • STATUS CHECK • TECHNOLOGY HIGHLIGHTS • COOL STUFF WE’RE DOING • LOOKING AHEAD

4. STATUS CHECK

5. HADOOP – THE ULTIMATE DATA TOOLKIT

6. HADOOP – THE ECO-SYSTEM

7. from Batch to Near-Real-Time from Analytics to Operational

8. ACTIVE INDUSTRIES • RETAIL • UTILITIES • TELCO • MOBILITY • FINANCIAL SERVICES • E-BUSINESS

9. CLUSTER SIZES & TOPOLOGIES • 5 TO TENS OF NODES • A COUPLE HUNDRED GiB TO A DOZEN TiB OF RAM • A COUPLE TiB TO A HUNDRED TiB OF RAW SPACE • ON-PREM AND CLOUD

10.

11. TECHNOLOGY HIGHLIGHTS

12.

13. HBASE AND HDFS/PARQUET • HDFS/PARQUET IS GREAT FOR LARGE SCANS, BUT… • HBASE IS GREAT FOR INDEXED READS/WRITES, BUT…

14. COMPLEX ARCHITECTURES source: http://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/

15. THE “WORST” OF BOTH WORLDS • KUDU IS SLIGHTLY WORSE THAN HBASE FOR INDEXED OPS • KUDU SHOULD BE NO MORE THAN 2x WORSE THAN PARQUET FOR LARGE SCANS

16. KUDU BENCHMARKS – INGEST RATE HIGHER IS BETTER source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

17. KUDU BENCHMARKS – RANDOM LOOKUP LOWER IS BETTER source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

18. KUDU BENCHMARKS – SCAN RATE HIGHER IS BETTER source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

19. KUDU BENCHMARKS – SUMMARY source: http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/

20. COOL STUFF WE’RE DOING

21. NEAR–REAL–TIME DATA LAKE

22. BY

23. ZWOOX - INGESTION FRAMEWORK • LOW LATENCY, HIGH THROUGHPUT, HIGHLY AVAILABLE • NEAR-REAL-TIME FOR KUDU & BATCH FOR HIVE/IMPALA • BATCH & STREAMING REPLICATIONS • AUTOMATIC CONSOLIDATION INTO HDFS-BASED TABLES • MULTIPLE TABLES WITH SPECIFIC PARTITIONING SCHEME • IN-LINE PROCESSING • AUTOMATIC AUDIT DATA

24. MESSAGE BUS • JMS SEMANTICS ARE LIMITED • JMS SCALING IS HARD • JMS PERFORMANCE IS POOR COMPARED TO KAFKA

25. LOOKING AHEAD (two trends)

26. IOT – VALUE IS IN THE CORE DEVICE CENTRIC VS GATEWAY CENTRIC VS CENTRALIZED PLATFORM CENTRIC

27. DATA SCIENCE

28. CLOUDERA WORKBENCH

29. THANK YOU