© 2014 MapR Technologies 1
© MapR Technologies, confidential
Big Data Everywhere
Tel Aviv, June 2014
Building HBase Applications
Me, Us
• Ted Dunning, Chief Application Architect, MapR
Committer, PMC member: Mahout, ZooKeeper, Drill
Bought the beer at the first HUG
• MapR
Distributes more open source components for Hadoop
Adds major technology for performance, HA, industry-standard APIs
• Info
Hash tag - #mapr #DataIsreal
See also - @ApacheMahout @ApacheDrill
@ted_dunning and @mapR
Topics For Today
• What is special about HBase applications?
• Example: Time Series Database
• Example: Web-fronted Dashboard
• Questions and Discussion
Disks have gotten slower
then: Fujitsu Eagle
380MB / 1.8MB / s = 211 s
now: WD4001FAEX
4TB / 154MB / s ≈ 26 k s ≈ 7.2 hours
Memory has gotten smaller
then:
64MB / 1 x Fujitsu Eagle = 0.168
now:
128GB / 12 x WD4001FAEX = 2.7 x 10^-3
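The two slides above are simple ratios. A small Python sketch reproduces them, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which is what the 7.2-hour figure implies; the helper names are illustrative only.

```python
# Slide arithmetic, assuming decimal units (binary units would give ~7.6 hours).
MB, GB, TB = 10**6, 10**9, 10**12

def full_scan_seconds(capacity_bytes: float, bytes_per_second: float) -> float:
    """Time to read an entire disk sequentially at its sustained transfer rate."""
    return capacity_bytes / bytes_per_second

def memory_to_disk_ratio(memory_bytes: float, disk_bytes: float, disks: int) -> float:
    """Fraction of the attached disk capacity that fits in RAM."""
    return memory_bytes / (disks * disk_bytes)

eagle_scan = full_scan_seconds(380 * MB, 1.8 * MB)      # Fujitsu Eagle
wd_scan = full_scan_seconds(4 * TB, 154 * MB)           # WD4001FAEX
then_ratio = memory_to_disk_ratio(64 * MB, 380 * MB, 1)
now_ratio = memory_to_disk_ratio(128 * GB, 4 * TB, 12)

print(f"scan then: {eagle_scan:.0f} s, now: {wd_scan:.0f} s = {wd_scan / 3600:.1f} h")
print(f"memory/disk then: {then_ratio:.3f}, now: {now_ratio:.1e}")
```

Both trends point the same way: a full pass over a disk now takes hours, and memory covers a far smaller fraction of the data, which is why the methods below lean on large sequential transfers.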
The Task Has Changed
The primary job for databases
is to discard data
(speaking historically)
Modern Database Goals
• Use modern disks fully
• Work around lack of memory
• Retain all the data
Modern Database Methods
• Use large sequential I/O transfers
• Use many machines
• Handle write-mostly work load
• Store related data elements together
• Relax constraints (ACID? Schema? Indexes?)
How Does This Work?
• Split data into tablets
– Store tablets on many computers
• Allow many columns
– Only store data for live columns
– Allows for innovative data arrangement
• Allow applications to encode data
• Buffer data to allow updates to be organized before writing
• Previously written data may be merged periodically to improve organization, while avoiding rewrite storms
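The buffer-then-write and periodic-merge steps above follow the general log-structured merge idea. The toy store below sketches it in Python under stated assumptions: everything lives in memory, a "run" stands in for a file written with one large sequential transfer, and the class and method names are hypothetical rather than HBase or MapR internals.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Run = List[Tuple[bytes, bytes]]  # one immutable, key-sorted run of (key, value)

class BufferedStore:
    """Toy write-buffered store (hypothetical, not HBase or MapR internals).

    Updates accumulate in memory, are sorted once, and are written out as a
    single ordered run; merge() periodically folds runs together."""

    def __init__(self, flush_threshold: int = 4) -> None:
        self.flush_threshold = flush_threshold
        self.buffer: Dict[bytes, bytes] = {}
        self.runs: List[Run] = []  # oldest first

    def put(self, key: bytes, value: bytes) -> None:
        self.buffer[key] = value  # repeated updates to one key collapse in memory
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            # Sorting before writing turns scattered updates into one ordered write.
            self.runs.append(sorted(self.buffer.items()))
            self.buffer = {}

    def get(self, key: bytes) -> Optional[bytes]:
        if key in self.buffer:
            return self.buffer[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def merge(self) -> None:
        """Fold all runs into one, keeping only the newest value for each key."""
        newest: Dict[bytes, bytes] = {}
        for run in self.runs:  # oldest first, so newer runs overwrite older values
            newest.update(run)
        self.runs = [sorted(newest.items())] if newest else []
```

Merging is what keeps reads from having to consult an ever-growing number of runs; how often to merge is the trade-off against rewrite storms.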
MapR, HBase Table Architecture
• Tables are divided into key ranges (tablets or regions)
• Tablets are served automatically by MapR FS (MapR tables) or by a region server (HBase)
• Columns are divided into access groups (column families)
[Diagram: a table drawn as a grid of rows R1 to R4 by column families CF1 to CF5]
MapR, HBase Tables are Divided into Regions
• A table is divided into one or more regions
• Each region is 1-5 GB in size and is bounded by a start key and an end key
• Each region is contained within a single container (MapR)
• Initially there is one region per table
– Pre-split tables are supported via the HBase APIs and HBase shell
– You can also pre-split based on known access patterns
• It is important to spread regions across all available nodes
• A region splits into two regions when it becomes too large
– Split is very quick
• Uses MapR FS to spread data, manage space (MapR)
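Because each region is a contiguous key range, finding the region for a key is a binary search over start keys. The Python sketch below models that, along with pre-splitting and splitting; `RegionMap` and its methods are hypothetical, not the HBase or MapR API, and the model says nothing about how real splits move data.

```python
import bisect
from typing import List, Sequence

class RegionMap:
    """Toy model of a table divided into contiguous key ranges (regions).

    Region i covers [starts[i], starts[i + 1]); the last region is unbounded.
    Hypothetical helper, not the HBase or MapR API."""

    def __init__(self, split_keys: Sequence[bytes] = ()) -> None:
        # A new table is one region; pre-splitting supplies boundaries up front.
        self.starts: List[bytes] = [b""] + sorted(split_keys)

    def region_for(self, row_key: bytes) -> int:
        # The owning region is the rightmost one whose start key is <= row_key.
        return bisect.bisect_right(self.starts, row_key) - 1

    def split(self, region: int, split_key: bytes) -> None:
        # In this model a split only inserts a new boundary key.
        start = self.starts[region]
        end = self.starts[region + 1] if region + 1 < len(self.starts) else None
        if not (start < split_key and (end is None or split_key < end)):
            raise ValueError("split key must fall strictly inside the region")
        self.starts.insert(region + 1, split_key)
```

Pre-splitting with keys chosen from known access patterns (for example `RegionMap([b"g", b"p"])`) spreads load across nodes from the start instead of waiting for a single region to fill and split.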
RDBMS versus MapR Tables
RDBMS tables | MapR tables
ACID | Row-based ACID
Sharding/partitioning | Distributed regions
SQL built in | Key lookup/key range scans
No Unix file metadata operations on tables | Unix file metadata operations on tables
Indexes (B+Tree, R-Tree) | Row key only; no built-in secondary indexes
Primitive data types | Byte arrays
In-place updates | Cell versioning
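Two rows of this comparison, byte-array values and cell versioning instead of in-place update, can be illustrated with a toy in-memory model. In the Python sketch below a write adds a timestamped version, reads return the newest version at or before a given time, and `max_versions` loosely mimics the per-column-family limit on retained versions; the class and method names are hypothetical, not a client API.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

CellKey = Tuple[bytes, bytes, bytes]  # (row, column family, qualifier)

class VersionedCells:
    """Toy cell store: writes add versions instead of updating in place.
    Hypothetical names; values are opaque byte arrays the application encodes."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self.cells: Dict[CellKey, List[Tuple[int, bytes]]] = defaultdict(list)

    def put(self, row: bytes, family: bytes, qualifier: bytes,
            ts: int, value: bytes) -> None:
        versions = self.cells[(row, family, qualifier)]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # retain only the newest N versions

    def get(self, row: bytes, family: bytes, qualifier: bytes,
            as_of: Optional[int] = None) -> Optional[bytes]:
        # .get() avoids creating empty entries in the defaultdict on reads.
        for ts, value in self.cells.get((row, family, qualifier), []):
            if as_of is None or ts <= as_of:
                return value
        return None
```

Because values are just bytes, the application owns the encoding (a float might be stored with `struct.pack(">d", 1.3)`), which is part of the "low-level access" trade-off summarized later.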
HBase versus MapR Tables
HBase tables | MapR tables
Table/region/column family | Table/region/column family
Distributed regions | Distributed regions
Wide variation of latency | Consistent latency
No Unix file metadata operations on tables | Unix file metadata operations on tables
Limited column family count | 64 column families
Fuzzy snapshots | Precise snapshots
Replication API | Not supported
Column Families
• Columns are defined per row
• Columns in HBase and MapR tables are grouped into column families
– MapR supports up to 64 column families
– In-memory column families are supported
• Grouping should facilitate common access patterns, not just reflect logical connection
– Columns written or read together make good column families
– Rarely needed columns should probably be in their own column family
– Radically different cardinality may suggest separate column families
• Physically, all members of a column family are stored together on the file system
– This makes access fast
Technical Summary
• Tables are split into tablets or regions
• Regions contain column families, stored together
• Columns are only stored where needed
• Many, many, many columns are allowed
• Rows are accessed by a single key; filters are allowed on scans
• Values are byte arrays
• You get low-level access to control speed, allow scaling
• This is not your father’s RDBMS!
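Access by a single sorted key means the basic read operations are a point lookup and a key-range scan, optionally with a filter applied to each row. The Python sketch below models a scan over a key-sorted list; `scan` and `prefix_stop` are hypothetical helpers, not a client API, and computing a stop key by incrementing the last byte is one common way to turn a prefix into a range.

```python
import bisect
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[bytes, bytes]  # (row key, encoded value)

def prefix_stop(prefix: bytes) -> Optional[bytes]:
    """Smallest key greater than every key that starts with prefix (None = no bound)."""
    p = bytearray(prefix)
    while p and p[-1] == 0xFF:  # 0xFF cannot be incremented; drop it and carry left
        p.pop()
    if not p:
        return None
    p[-1] += 1
    return bytes(p)

def scan(rows: List[Row], start: bytes, stop: Optional[bytes] = None,
         keep: Optional[Callable[[bytes, bytes], bool]] = None) -> Iterator[Row]:
    """Yield rows with start <= key < stop from a key-sorted list, applying an optional filter."""
    i = bisect.bisect_left(rows, (start,))  # seek directly to the first candidate key
    while i < len(rows):
        key, value = rows[i]
        if stop is not None and key >= stop:
            break  # keys are sorted, so nothing later can match
        if keep is None or keep(key, value):
            yield key, value
        i += 1
```

A range scan only touches the rows between its bounds, while a filter still has to examine every row in the range, so key design decides how much work a query does.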
Cost/Benefits Summary
• Pro
– Predictable disk layout
– Flexibility in key design, data format
– Allows nested, document or relational models
– Superb scalability and speed are possible
• Con
– Technically more demanding than a small Postgres instance
– Hotspot risk requires proper design
– Latency can be highly variable (for vanilla HBase, not MapR)
Let’s build something!
Time Series Database Example
• Let’s build a time series database
• See http://opentsdb.net/
The Problem
• We have about 100,000 metrics with an average of about 10,000
distinct measurements per second
• Some things change over seconds, some over hours
• We want to query over time ranges from seconds to months to
produce plots of time window aggregates
– What is max latency per hour for last three weeks on all web tier
machines?
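A query such as "max latency per hour" reduces to aligning each sample to the start of its window and keeping a running maximum per window. The Python sketch below does this over in-memory `(unix_seconds, value)` pairs; `window_max` is a hypothetical helper, and in a real system the input would come from a key-range scan rather than a list.

```python
from typing import Dict, Iterable, Tuple

def window_max(samples: Iterable[Tuple[int, float]], window: int = 3600) -> Dict[int, float]:
    """Max value per time window; keys are window start times in seconds."""
    out: Dict[int, float] = {}
    for ts, value in samples:
        start = ts - ts % window  # align the timestamp to its window
        if start not in out or value > out[start]:
            out[start] = value
    return dict(sorted(out.items()))
```

The same function covers the "seconds to months" range by changing `window`; what changes with the range is how many raw points must be read, which is the difficulty quantified next.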
Non-solution
• Munin, RRD, Ganglia, Graphite all discard data
– Remember the primary job of a classic database?
• We want full resolution for historical comparisons
• Size is no longer an issue: big has gotten quite small
– 10^12 data points << 10 nodes @ 12 x 4 TB per node
– We can piggyback on another cluster
Why is This Hard?
• 10,000 points per second x 86,400 seconds/day x 1000 days
• That is nearly a trillion data points! (0.86 x 10^12)
• Queries require summarizing hundreds of thousands of points in 200 ms
• We want the solution to be low impact and inexpensive
– And be ready to scale by several orders of magnitude
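The sizing claims on the last two slides can be checked with a few lines of arithmetic. The Python sketch below uses the slide's inputs; the 200,000-point query size is an assumed example of "hundreds of thousands", not a figure from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
points = 10_000 * SECONDS_PER_DAY * 1000  # samples/s x s/day x days
cluster_bytes = 10 * 12 * 4 * 10**12      # 10 nodes x 12 disks x 4 TB (decimal)
bytes_per_point = cluster_bytes / points  # raw disk available per stored point

# "Hundreds of thousands of points in 200 ms": 200,000 is an assumed example.
points_per_query = 200_000
latency_ms = 200
points_per_second_needed = points_per_query * 1000 // latency_ms

print(f"{points:.2e} points, {bytes_per_point:.0f} raw bytes/point, "
      f"{points_per_second_needed:,} points/s per query")
```

Storage is the easy part (hundreds of raw bytes per point on a small cluster); reading on the order of a million points per second for a single query is what forces the key and layout design in the following steps.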
Step 1: Compound keys give control over layout
Key Composition #1
Time Metric Node Value
10667 load1m n1 1.3
10667 load5m n1 1.0
10668 load1m n2 0.1
10668 load5m n2 0.1
10727 load1m n1 0.9
10727 load5m n1 0.9
All samples go to a single machine for a long time
Key Composition #2
Metric Time Node Value
load1m 10667 n1 1.3
load1m 10668 n2 0.1
load1m 10727 n1 0.9
load5m 10667 n1 1.0
load5m 10668 n2 0.1
load5m 10727 n1 0.9
All samples for the same metric go to a single machine
Queries commonly focus on one or a few metrics at a time
Key Composition #3
Node Metric Time Value
n1 load1m 10667 1.3
n1 load1m 10727 0.9
n1 load5m 10667 1.0
n1 load5m 10727 0.9
n2 load1m 10668 0.1
n2 load5m 10668 0.1
All samples for the same node go to a single machine
Unfortunately, queries commonly require data for a single metric across many machines
Lesson: Pick door #2
Maximize density of desired data
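Door #2 only works if the stored bytes sort in metric, then time, then node order. A minimal Python sketch of such a key, assuming metric and node names are first mapped to fixed-width integer IDs (the `METRIC_IDS` and `NODE_IDS` dictionaries are hypothetical): big-endian unsigned integers of fixed width compare bytewise in numeric order, so sorting the raw keys reproduces the Key Composition #2 table.

```python
import struct
from typing import Dict, Tuple

# Hypothetical dictionaries mapping names to fixed-width integer IDs.
METRIC_IDS: Dict[str, int] = {"load1m": 1, "load5m": 2}
NODE_IDS: Dict[str, int] = {"n1": 1, "n2": 2}

def row_key(metric: str, ts: int, node: str) -> bytes:
    # Fixed-width big-endian fields sort bytewise in numeric order,
    # so key order is metric, then time, then node (Key Composition #2).
    return struct.pack(">III", METRIC_IDS[metric], ts, NODE_IDS[node])

def decode(key: bytes) -> Tuple[int, int, int]:
    return struct.unpack(">III", key)

# Samples from the slides, as (time, metric, node).
samples = [(10667, "load1m", "n1"), (10667, "load5m", "n1"), (10668, "load1m", "n2"),
           (10668, "load5m", "n2"), (10727, "load1m", "n1"), (10727, "load5m", "n1")]
ordered = sorted(row_key(m, t, n) for t, m, n in samples)
```

Encoding times as decimal strings would break this: `"9999"` sorts after `"10667"` as text, which is why fixed-width binary fields are the usual choice for composite keys.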
Protip: Add key-value pairs to the end of the key for additional tags
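Appending tags after the metric and time keeps the density of door #2 while still distinguishing series. A minimal Python sketch, assuming a hypothetical `uid` dictionary that assigns fixed-width IDs to names on first use: tags are sorted by name so the same tag set always produces the same key bytes.

```python
import struct
from typing import Dict, Mapping

_ids: Dict[str, int] = {}

def uid(name: str) -> int:
    """Hypothetical name-to-ID assignment: the first use of a name gets the next integer."""
    return _ids.setdefault(name, len(_ids) + 1)

def row_key(metric: str, ts: int, tags: Mapping[str, str]) -> bytes:
    key = struct.pack(">II", uid(metric), ts)  # metric and time lead the key
    # Sort tags by name so the same tag set always yields the same key bytes.
    for tag_name, tag_value in sorted(tags.items()):
        key += struct.pack(">II", uid(tag_name), uid(tag_value))
    return key
```

Since every series for one metric and time share the same key prefix, a scan over that prefix returns all tagged series together, and tag values can then be matched with a filter.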
Step 2: Relational not required
Tall and Skinny? Or Wide and Fat?
Metric Time Node Value
Tall and Skinny? Or Wide and Fat?
Metric Window Node +17 +18 +77 +78 +137
load1m 13:00 n1 1.3 - 0.9 - -
load1m 13:00 n2 - 0.1 - 0.1 -
load5m 13:00 n1 1.0 - 0.9 - -
load5m 13:00 n2 - 0.1 - - -
Filtering overhead is non-trivial; wide and fat has to filter fewer rows
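The move from tall and skinny to wide and fat is a fold: the row key keeps the metric, the window start, and the node, and each sample becomes a column named by its offset within the window. The Python sketch below uses hypothetical timestamps in seconds since midnight (so 13:00 is 46,800) chosen to reproduce the offsets in the table; `to_wide` is an illustrative helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

TallRow = Tuple[str, int, str, float]  # (metric, ts, node, value)
WideKey = Tuple[str, int, str]         # (metric, window start, node)

def to_wide(samples: Iterable[TallRow], window: int = 3600) -> Dict[WideKey, Dict[int, float]]:
    """Fold tall rows into wide rows with one column per second offset within the window."""
    rows: Dict[WideKey, Dict[int, float]] = defaultdict(dict)
    for metric, ts, node, value in samples:
        start = ts - ts % window                         # window the sample falls in
        rows[(metric, start, node)][ts - start] = value  # column qualifier = offset
    return dict(rows)

# Hypothetical timestamps in seconds since midnight, so 13:00 is 46,800.
tall = [("load1m", 46817, "n1", 1.3), ("load1m", 46877, "n1", 0.9),
        ("load1m", 46818, "n2", 0.1), ("load1m", 46878, "n2", 0.1),
        ("load5m", 46817, "n1", 1.0), ("load5m", 46877, "n1", 0.9),
        ("load5m", 46818, "n2", 0.1)]
wide = to_wide(tall)
```

Seven tall rows become four wide rows here; at one sample per second an hour-long window turns up to 3,600 rows into one, which is where the filtering savings come from.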
Or non-relational?
Metric Window Node Compressed
load1m 13:00 n1 {t:[17,77],v:[1.3,0.9]}
load1m 13:00 n2 {t:[18,78],v:[0.1,0.1]}
load5m 13:00 n1 {t:[17,77],v:[1.0,0.9]}
load5m 13:00 n2 {t:[18,78],v:[0.1,0.1]}
A cleanup process can sweep up old values after the hour is finished.
Blob data can be compressed using fancy tricks.
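One way to produce the compressed column is to pack the offsets and values of a finished window into a single binary blob. The Python sketch below assumes a simple layout (a count, then uint16 offsets, then float64 values) and uses general-purpose `zlib` in place of the slide's unspecified "fancy tricks"; specialized time-series encodings would typically do better.

```python
import struct
import zlib
from typing import List, Tuple

def pack_window(offsets: List[int], values: List[float]) -> bytes:
    """Encode one window as a compressed blob: count, uint16 offsets, float64 values."""
    if len(offsets) != len(values):
        raise ValueError("offsets and values must have the same length")
    n = len(offsets)
    raw = struct.pack(f">I{n}H{n}d", n, *offsets, *values)  # '>' means no padding
    return zlib.compress(raw)

def unpack_window(blob: bytes) -> Tuple[List[int], List[float]]:
    raw = zlib.decompress(blob)
    (n,) = struct.unpack_from(">I", raw)
    fields = struct.unpack_from(f">{n}H{n}d", raw, 4)  # skip the 4-byte count
    return list(fields[:n]), list(fields[n:])
```

For example, `pack_window([17, 77], [1.3, 0.9])` holds the first row of the table above. A cleanup process would build such blobs once the hour is finished and then delete the individual columns they replace.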
Lesson: Schemas can be very flexible and can even change on the fly
Step 3: Sequential reads hide many sins if density is high
Which Queries? Which Data?
• The most common query is 1-3 metrics for 5-100% of nodes, selected by tags
– Which nodes have unusual load?
– Do any nodes stand out for response latency?
– Alarm bots
• It is also common to fetch 5-20 metrics for a single node
– Render a dashboard for a particular machine
• Result density should be high for all common queries
• Most data is never read but is retained as an insurance policy
– You can’t predict what you will need to diagnose future failure modes
Lesson: You have to know the queries to design in performance
Step 4: Time to level up!
What About the Major Leagues?
• Industrial sensors can dwarf current TSDB loads
– Assume 100 (drill rigs | generators | heating systems | turbines)
– Each has 10,000 sensors
– Each is sampled once per second
– Total sample rate is 10^6 samples / s (100x faster than before)
• Industrial applications require extensive testing at scale
– Want to load years of test data in a few days
– Sample rate for testing is 100 x 10^6 samples / s (10,000x faster)
• And you thought the first example was extreme
Rough Design Outline
• Want to record and query 100M samples / s at full resolution
• Each MapR node serving tables can do ~20-40k inserts per second @ 1 kB/record, ~60k inserts/s @ 100 B/record
• Each MapR node serving files can insert at ~1 GB/s
• We can buffer data in the file system until we get >1000 samples per metric
• Once data is sufficient, we do one insert per metric per hour
– 3600x fewer inserts
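The consolidation step can be sketched as grouping buffered samples by metric and hour, and emitting one row per group only after its hour has ended. The Python below is a hypothetical helper illustrating that design under stated assumptions (samples already read back from the flat files as `(metric, unix_seconds, value)`), not the prototype's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[str, int, float]  # (metric, unix seconds, value)

def consolidate(buffered: Iterable[Sample], now: int, window: int = 3600
                ) -> Tuple[Dict[Tuple[str, int], List[Tuple[int, float]]], List[Sample]]:
    """Group buffered samples into one row per (metric, window start).

    Only windows that have fully ended by `now` are emitted; the rest stay buffered.
    Hypothetical helper, not the prototype's code."""
    rows: Dict[Tuple[str, int], List[Tuple[int, float]]] = defaultdict(list)
    pending: List[Sample] = []
    for metric, ts, value in buffered:
        start = ts - ts % window
        if start + window <= now:
            rows[(metric, start)].append((ts - start, value))  # (offset, value)
        else:
            pending.append((metric, ts, value))  # hour still open: keep buffering
    for points in rows.values():
        points.sort()
    return dict(rows), pending
```

With one sample per second, each emitted row carries 3,600 points, so the table sees one insert where a naive design would issue 3,600.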
Data Flow – High Speed TSDB
[Figure: Measurement Systems → Data catchers (x3) → Flat files → Consolidators (x5) → TSDB tables → Web tier → Browser]
Quick Results
• Estimated data volumes
– 100 M p / s / (3600 p/row) = 28 k row / s
• Estimated potential throughput
– 4 nodes @ 10 k row / s = 40 k row / s = 144 M p / s
• Observed throughput for 2 day prototype
– 1 feeder node, 4 table nodes, 10 M p / s
– Feeder node CPU bound, table nodes < 5% CPU, disk ~ 0
• Simple prototype is limited by generator
• Compare to observed max 100 k p / s on SQL Server
– “Only” 100x faster
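The estimates on this slide follow directly from its inputs. The Python sketch below recomputes them; "p" means points, and the only assumption is the slide's own figure of 3,600 points per row.

```python
# Inputs taken from the slide; "p" means points (samples).
target_points_per_s = 100 * 10**6  # 100 M p/s
points_per_row = 3600              # one row per metric per hour at 1 sample/s
table_nodes = 4
rows_per_node_per_s = 10_000       # 10 k row/s per table node

rows_needed_per_s = target_points_per_s / points_per_row    # about 28 k row/s
row_capacity_per_s = table_nodes * rows_per_node_per_s      # 40 k row/s
point_capacity_per_s = row_capacity_per_s * points_per_row  # 144 M p/s
headroom = point_capacity_per_s / target_points_per_s

observed_prototype_p_per_s = 10 * 10**6  # 2-day prototype, limited by the generator
sql_server_p_per_s = 100_000             # observed max on SQL Server
speedup = observed_prototype_p_per_s // sql_server_p_per_s

print(f"need {rows_needed_per_s:,.0f} rows/s, capacity {row_capacity_per_s:,} rows/s "
      f"= {point_capacity_per_s / 1e6:.0f} M p/s, headroom {headroom:.2f}x, speedup {speedup}x")
```

Four table nodes leave roughly 1.4x headroom over the 100 M p/s target, which fits the observation that the table nodes were nearly idle while the single feeder node was CPU bound.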
Lesson: Very high rates look plausible with hybrid design
Quick Example: Xactly Dashboard
Xactly: Sales Performance Management
Xactly Insights: Delivering incentive compensation data for sales operations
OBJECTIVES
• Provide cloud-based performance management solutions to sales ops teams to help them design and manage optimal incentive compensation plans
CHALLENGES
• The RDBMS-based platform was unable to scale in a cost-effective way
• Users of a SaaS application have stringent performance and responsiveness expectations
SOLUTION
• A highly responsive application that scaled to a growing customer base
• Multi-tenancy capabilities in MapR helped ensure each customer’s data was isolated from other customers in the SaaS application
• MapR delivered on Xactly’s need for scale and low operational overhead
Business Impact
• MapR’s higher performance solution is far more efficient and cost-effective.
• “I can do something on a 10-node cluster that might require a 20-node cluster from a different Hadoop vendor.” (CTO & SVP of Engineering)
Dashboard Problem
• Hundreds to thousands of customers have hundreds to
thousands of sales team members
• Want to be able to compare stats for each team member, team,
company against relevant roll-ups
• Prototyped system in RDB, Mongo, MapR tables
– Natural fit to relational cubes
– Easy extension to Mongo documents with indexes
– HBase application architecture has only one index
• Production solution used special key design in MapR tables
– Disk-based speed matched in-memory speed of Mongo
Lesson: Obviously relational problems often have effective non-relational solutions
Summary
• HBase and MapR tables are conceptually very simple
– But require careful design
– Composite key design crucial
– Non-relational column usage often important
• Practical systems can exceed relational throughput by many
orders of magnitude with very small clusters
• Composite file/table designs can be very powerful
– The world is not a database
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0
 

Kürzlich hochgeladen

UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 

Kürzlich hochgeladen (20)

UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 

Building HBase Applications - Ted Dunning

  • 1. © 2014 MapR Technologies 1 © MapR Technologies, confidential Big Data Everywhere Tel Aviv, June 2014 Building HBase Applications
  • 2. Me, Us
      • Ted Dunning, Chief Application Architect, MapR
          – Committer and PMC member: Mahout, ZooKeeper, Drill
          – Bought the beer at the first HUG
      • MapR
          – Distributes more open source components for Hadoop
          – Adds major technology for performance, HA, and industry-standard APIs
      • Info
          – Hashtags: #mapr #DataIsreal
          – See also: @ApacheMahout @ApacheDrill @ted_dunning and @mapR
  • 3. Topics For Today
      • What is special about HBase applications?
      • Example: Time Series Database
      • Example: Web-fronted Dashboard
      • Questions and Discussion
  • 4. Disks have gotten slower (relative to their capacity)
      then: Fujitsu Eagle, 380 MB / 1.8 MB/s ≈ 211 s to read the whole disk
      now: WD4001FAEX, 4 TB / 154 MB/s ≈ 26 k s ≈ 7.2 hours
  • 5. Memory has gotten smaller (relative to disk)
      then: 64 MB RAM / (1 × 380 MB Fujitsu Eagle) ≈ 0.168
      now: 128 GB RAM / (12 × 4 TB WD4001FAEX) ≈ 2.7 × 10^-3
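The ratios on the two slides above can be re-checked in a few lines. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), which is a guess about how the slide figures were computed; the helper names are illustrative.

```python
# Re-check the disk-scan and memory-to-disk figures from the two slides above.
# Assumption: decimal units (1 MB = 1e6 bytes, 1 GB = 1e9, 1 TB = 1e12).
MB, GB, TB = 1e6, 1e9, 1e12

def full_scan_seconds(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to read an entire disk sequentially at its sustained transfer rate."""
    return capacity_bytes / rate_bytes_per_s

def memory_to_disk_ratio(ram_bytes: float, disks: int, disk_bytes: float) -> float:
    """Fraction of the attached disk capacity that fits in RAM."""
    return ram_bytes / (disks * disk_bytes)

eagle_scan = full_scan_seconds(380 * MB, 1.8 * MB)   # Fujitsu Eagle
wd_scan = full_scan_seconds(4 * TB, 154 * MB)        # WD4001FAEX

then_ratio = memory_to_disk_ratio(64 * MB, 1, 380 * MB)
now_ratio = memory_to_disk_ratio(128 * GB, 12, 4 * TB)

print(f"Eagle full scan: {eagle_scan:.0f} s")
print(f"WD4001FAEX full scan: {wd_scan:.0f} s = {wd_scan / 3600:.1f} h")
print(f"RAM/disk then: {then_ratio:.3f}, now: {now_ratio:.1e}")
```

It prints about 211 s, 7.2 h, 0.168 and 2.7e-03: full-disk scan time grew by two orders of magnitude while the fraction of data that fits in memory shrank by roughly 60x.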
  • 6. The Task Has Changed
      The primary job for databases is to discard data (speaking historically)
  • 7. Modern Database Goals
      • Use modern disks fully
      • Work around lack of memory
      • Retain all the data
  • 8. Modern Database Methods
      • Use large sequential I/O transfers
      • Use many machines
      • Handle a write-mostly workload
      • Store related data elements together
      • Relax constraints (ACID? Schema? Indexes?)
  • 9. How Does This Work?
      • Split data into tablets
          – Store tablets on many computers
      • Allow many columns
          – Only store data for live columns
          – Allows for innovative data arrangement
      • Allow applications to encode data
      • Buffer data to allow updates to be organized before writing
      • Previously written data may be merged periodically to improve organization, but avoid rewrite storms
  • 10. MapR, HBase Table Architecture
      • Tables are divided into key ranges (tablets or regions)
      • Tablets are served automatically by MapR FS (MapR tables) or by a region server (HBase)
      • Columns are divided into access groups (column families)
      [Diagram: a table laid out as column families CF1 to CF5 by regions R1 to R4]
  • 11. MapR, HBase Tables are Divided into Regions
      • A table is divided into one or more regions
      • Each region is 1-5 GB in size and is defined by its start and end keys
      • Regions are contained within a single container (MapR)
      • Initially there is one region per table
          – Pre-split tables are supported through the HBase APIs and HBase shell
          – You can also pre-split based on known access patterns
      • It is important to spread regions across all available nodes
      • A region splits into two regions when it becomes too large
          – The split is very quick
      • MapR FS is used to spread data and manage space (MapR)
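To make the key-range idea concrete, here is a toy in-memory model of a table divided into regions by sorted start keys, with pre-splitting and splitting. It is a hypothetical illustration (`RegionMap` and its methods are invented names), not the HBase or MapR client API, and it stores no data.

```python
from bisect import bisect_right, insort

class RegionMap:
    """Toy model of a table divided into key-range regions.

    Region i covers [starts[i], starts[i + 1]); the first region starts at b""
    so every row key belongs to exactly one region. Illustration only.
    """

    def __init__(self, split_keys=()):
        # Pre-splitting: supplying split keys up front creates several regions at once.
        self.starts = [b""] + sorted(split_keys)

    def region_for(self, row_key: bytes) -> int:
        # The owning region is the last one whose start key is <= row_key.
        return bisect_right(self.starts, row_key) - 1

    def split(self, split_key: bytes) -> None:
        # In this model a split only inserts a new boundary, which is why it can be cheap.
        if split_key not in self.starts:
            insort(self.starts, split_key)

    def key_range(self, index: int):
        # Returns (start, end); end is None for the last region ("end of table").
        end = self.starts[index + 1] if index + 1 < len(self.starts) else None
        return self.starts[index], end

table = RegionMap(split_keys=[b"load5m", b"net"])
print(table.region_for(b"cpu.user"))   # -> 0
print(table.region_for(b"load5m|n1"))  # -> 1
table.split(b"mem")
print(table.region_for(b"mem.free"))   # -> 2
```

Because regions are contiguous key ranges, the row-key design alone decides which region (and so which machine) receives each write, which is why the following slides spend so much time on key composition.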
  • 12. RDBMS versus MapR Tables
      RDBMS tables | MapR tables
      ACID | Row-based ACID
      Sharding/partitioning | Distributed regions
      SQL built in | Key lookup/key range scans
      No Unix file metadata operations on tables | Unix file metadata operations on tables
      Indexes (B+Tree, R-Tree) | Row key only, no built-in secondary index
      Primitive data types | Byte arrays
      In-place update | Cell versioning
  • 13. HBase versus MapR Tables
      HBase tables | MapR tables
      Table/region/column family | Table/region/column family
      Distributed regions | Distributed regions
      Wide variation of latency | Consistent latency
      No Unix file metadata operations on tables | Unix file metadata operations on tables
      Limited column family count | 64 column families
      Fuzzy snapshots | Precise snapshots
      Replication API | Not supported
  • 14. Column Families
      • Columns are defined per row
      • Columns in HBase and MapR tables are grouped into column families
          – MapR supports up to 64 column families
          – In-memory column families are supported
      • Grouping should facilitate common access patterns, not just reflect logical connection
          – Columns written or read together make good column families
          – Rarely needed columns should probably be in their own column family
          – Radically different cardinality may suggest separate column families
      • Physically, all column family members are stored together on the file system
          – This makes access fast
  • 16. Technical Summary
      • Tables are split into tablets or regions
      • Regions contain column families, stored together
      • Columns are only stored where needed
      • Many, many, many columns are allowed
      • Rows are accessed by a single key; filters are allowed on scans
      • Values are byte arrays
      • You get low-level access to control speed and allow scaling
      • This is not your father’s RDBMS!
  • 17. Cost/Benefits Summary
      • Pro
          – Predictable disk layout
          – Flexibility in key design, data format
          – Allows nested, document or relational models
          – Superb scalability and speed are possible
      • Con
          – Technically more demanding than a small Postgres instance
          – Hotspot risk requires proper design
          – Latency can be highly variable (for vanilla HBase, not MapR)
  • 19. Let’s build something!
  • 21. Time Series Database Example
      • Let’s build a time series database
      • See http://opentsdb.net/
  • 22. The Problem
      • We have about 100,000 metrics with an average of about 10,000 distinct measurements per second in total
      • Some things change over seconds, some over hours
      • We want to query over time ranges from seconds to months to produce plots of time-window aggregates
          – What is the maximum latency per hour over the last three weeks on all web-tier machines?
  • 23. Non-solution
      • Munin, RRD, Ganglia, Graphite all discard data
          – Remember the primary job of a classic database?
      • We want full resolution for historical comparisons
      • Size is no longer an issue; big has gotten quite small
          – 10^12 data points << 10 nodes @ 12 × 4 TB per node
          – We can piggyback on another cluster
  • 24. Why is This Hard?
      • 10,000 points per second × 86,400 seconds/day × 1000 days
      • That is nearly a trillion data points! (0.86 × 10^12)
      • Queries require summarizing hundreds of thousands of points in 200 ms
      • We want the solution to be low impact and inexpensive
          – And be ready to scale by several orders of magnitude
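The volume estimate can be redone with 86,400 seconds per day, together with a rough raw-size figure. The 16 bytes per point (an 8-byte timestamp plus an 8-byte double, before compression or key overhead) is an assumption for illustration, not a number from the talk.

```python
# Rough size of the raw data set described on the slide above.
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
POINTS_PER_SECOND = 10_000            # aggregate rate across all metrics
DAYS = 1000
BYTES_PER_POINT = 16                  # assumption: 8-byte timestamp + 8-byte double, uncompressed

total_points = POINTS_PER_SECOND * SECONDS_PER_DAY * DAYS
raw_bytes = total_points * BYTES_PER_POINT
cluster_bytes = 10 * 12 * 4 * 10**12  # 10 nodes x 12 disks x 4 TB (from the previous slide)

print(f"{total_points:,} points = {total_points / 1e12:.2f} x 10^12")
print(f"raw size ~{raw_bytes / 1e12:.1f} TB, {raw_bytes / cluster_bytes:.1%} of the cluster")
```

Under that assumption the full history is about 13.8 TB, under 3% of the 480 TB cluster, which supports the point that storage volume is not the hard part; query latency is.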
  • 25. Step 1: Compound keys give control over layout
  • 26. Key Composition #1
      Time | Metric | Node | Value
      10667 | load1m | n1 | 1.3
      10667 | load5m | n1 | 1.0
      10668 | load1m | n2 | 0.1
      10668 | load5m | n2 | 0.1
      10727 | load1m | n1 | 0.9
      10727 | load5m | n1 | 0.9
      All samples go to a single machine for a long time (new writes always land in the region holding the latest times)
  • 27. Key Composition #2
      Metric | Time | Node | Value
      load1m | 10667 | n1 | 1.3
      load1m | 10668 | n2 | 0.1
      load1m | 10727 | n1 | 0.9
      load5m | 10667 | n1 | 1.0
      load5m | 10668 | n2 | 0.1
      load5m | 10727 | n1 | 0.9
      All samples for the same metric go to a single machine
      Queries commonly focus on one or a few metrics at a time
  • 28. Key Composition #3
      Node | Metric | Time | Value
      n1 | load1m | 10667 | 1.3
      n1 | load1m | 10727 | 0.9
      n1 | load5m | 10667 | 1.0
      n1 | load5m | 10727 | 0.9
      n2 | load1m | 10668 | 0.1
      n2 | load5m | 10668 | 0.1
      All samples for the same node go to a single machine
      Unfortunately, queries commonly require data for a single metric, but many machines
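A toy sort over the six samples from the three key-composition slides shows why door #2 wins for per-metric queries. It counts how many separate key ranges a scan for `load1m` must touch under each field order. Python tuples stand in for the byte-encoded composite keys a real table would use, and the names `LAYOUTS`, `sorted_rows` and `contiguous_runs` are illustrative.

```python
from itertools import groupby

# Samples from the slides: (time, metric, node, value).
samples = [
    (10667, "load1m", "n1", 1.3), (10667, "load5m", "n1", 1.0),
    (10668, "load1m", "n2", 0.1), (10668, "load5m", "n2", 0.1),
    (10727, "load1m", "n1", 0.9), (10727, "load5m", "n1", 0.9),
]

# The three candidate row-key orders (field order inside the composite key).
LAYOUTS = {
    "#1 time-metric-node": lambda t, m, n: (t, m, n),
    "#2 metric-time-node": lambda t, m, n: (m, t, n),
    "#3 node-metric-time": lambda t, m, n: (n, m, t),
}

def sorted_rows(layout):
    """Rows in the order the table would store them (sorted by row key)."""
    make_key = LAYOUTS[layout]
    return sorted((make_key(t, m, n), (t, m, n, v)) for t, m, n, v in samples)

def contiguous_runs(layout, metric):
    """Number of separate key ranges that must be scanned to fetch one metric.

    Fewer runs means the wanted rows sit next to each other (high density).
    """
    flags = [row[1] == metric for _, row in sorted_rows(layout)]
    return sum(1 for wanted, _ in groupby(flags) if wanted)

for name in LAYOUTS:
    print(name, contiguous_runs(name, "load1m"))
```

This prints 3, 1 and 2 respectively. With many nodes the gap widens: layout #3 needs one range per node, and layout #1 also sends every new write to the tail of the key space, which is the hotspot the first slide warns about.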
  • 29. Lesson: Pick door #2. Maximize the density of desired data.
  • 30. Pro tip: Append key-value pairs to the end of the key for additional tags
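A minimal sketch of such a key, assuming UTF-8 names, a NUL byte after the metric, an 8-byte big-endian timestamp, and tags appended as sorted `name=value` pairs. The byte layout and the `encode_key` name are illustrative, not OpenTSDB's format; OpenTSDB, for instance, maps metric and tag names to fixed-width numeric IDs before building its keys.

```python
import struct

def encode_key(metric: str, timestamp: int, tags: dict) -> bytes:
    """Composite row key: metric name, then time, then tag pairs.

    - a 0x00 byte ends the metric name (assumes names never contain NUL)
    - the timestamp is an unsigned 64-bit big-endian integer, so byte order
      matches numeric order
    - tags are sorted by name so one tag set always produces one key
    """
    parts = [metric.encode("utf-8"), b"\x00", struct.pack(">Q", timestamp)]
    for name, value in sorted(tags.items()):
        parts.append(f"{name}={value}".encode("utf-8") + b"\x00")
    return b"".join(parts)

key = encode_key("load1m", 10667, {"node": "n1", "dc": "tlv"})
print(key)
# Decimal text would sort "9" after "10"; fixed-width big-endian does not.
print(b"9" < b"10", encode_key("m", 9, {}) < encode_key("m", 10, {}))
```

The last line prints `False True`. Keeping the metric first and the tags last preserves the door #2 layout: all keys for one metric stay contiguous, and tags only refine the order within a metric and time.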
  • 32. Step 2: Relational not required
  • 33. Tall and Skinny? Or Wide and Fat?
      Metric | Time | Node | Value
  • 34. Tall and Skinny? Or Wide and Fat?
      Metric | Window | Node | +17 | +18 | +77 | +78 | +137
      load1m | 13:00 | n1 | 1.3 | | 0.9 | |
      load1m | 13:00 | n2 | | 0.1 | | 0.1 |
      load5m | 13:00 | n1 | 1.0 | | 0.9 | |
      load5m | 13:00 | n2 | | 0.1 | | |
      Filtering overhead is non-trivial … wide and fat has to filter fewer rows
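The pivot from tall-and-skinny rows to the wide-and-fat layout can be sketched as follows. The one-hour window and the use of seconds since midnight (so 13:00 is 46,800 and the slide's +17 column is 13:00:17) are assumptions chosen to match the slide's example. In a real table the tuple row key would be byte-encoded and the offset would become the column qualifier.

```python
from collections import defaultdict

WINDOW = 3600  # assumed window: one row per metric, node and hour

def to_wide(samples):
    """Pivot tall (time, metric, node, value) samples into wide rows.

    Row key: (metric, window_start, node); column: seconds into the window.
    """
    rows = defaultdict(dict)
    for t, metric, node, value in samples:
        window_start = t - t % WINDOW
        rows[(metric, window_start, node)][t - window_start] = value
    return dict(rows)

tall = [
    (46817, "load1m", "n1", 1.3), (46877, "load1m", "n1", 0.9),
    (46818, "load1m", "n2", 0.1), (46878, "load1m", "n2", 0.1),
]
for row_key, columns in to_wide(tall).items():
    print(row_key, columns)
```

Four tall rows for `load1m` collapse into two wide rows, which is the "fewer rows to filter" effect the slide describes.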
  • 35. Or non-relational?
      Metric | Window | Node | Compressed
      load1m | 13:00 | n1 | {t:[17,77],v:[1.3,0.9]}
      load1m | 13:00 | n2 | {t:[18,78],v:[0.1,0.1]}
      load5m | 13:00 | n1 | {t:[17,77],v:[1.0,0.9]}
      load5m | 13:00 | n2 | {t:[18,78],v:[0.1,0.1]}
      A cleanup process can sweep up old values after the hour is finished.
      Blob data can be compressed using fancy tricks.
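One possible "fancy trick" for the blob column is sketched below under stated assumptions: offsets are delta-encoded as 16-bit integers, values are stored as 64-bit floats, and the result is zlib-compressed. The byte layout and the `pack_row`/`unpack_row` names are hypothetical, not the talk's format; a cleanup job could apply something like this once an hour is complete.

```python
import struct
import zlib

def pack_row(offsets, values) -> bytes:
    """Pack one wide row's samples into a compressed blob.

    Layout (an illustrative choice): uint32 count, delta-encoded offsets as
    uint16, then values as float64, all big-endian, then zlib-compressed.
    Assumes offsets are sorted seconds within a window shorter than 65,536 s.
    """
    if len(offsets) != len(values):
        raise ValueError("offsets and values must have the same length")
    deltas = [b - a for a, b in zip([0] + list(offsets[:-1]), offsets)]
    raw = struct.pack(f">I{len(deltas)}H{len(values)}d", len(deltas), *deltas, *values)
    return zlib.compress(raw)

def unpack_row(blob: bytes):
    """Inverse of pack_row: returns (offsets, values) as lists."""
    raw = zlib.decompress(blob)
    (count,) = struct.unpack_from(">I", raw, 0)
    deltas = struct.unpack_from(f">{count}H", raw, 4)
    values = struct.unpack_from(f">{count}d", raw, 4 + 2 * count)
    offsets, total = [], 0
    for d in deltas:  # undo the delta encoding with a running sum
        total += d
        offsets.append(total)
    return offsets, list(values)

blob = pack_row([17, 77], [1.3, 0.9])
print(len(blob), unpack_row(blob))
```

Delta encoding helps because regularly sampled series produce long runs of identical small deltas, which a general-purpose compressor shrinks well; more specialised float encodings could do better.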
  • 36. Lesson: Schemas can be very flexible and can even change on the fly
  • 38. Step 3: Sequential reads hide many sins if density is high
  • 39. Which Queries? Which Data?
      • Most common is 1-3 metrics for 5-100% of nodes based on tags
          – Which nodes have unusual load?
          – Do any nodes stand out for response latency?
          – Alarm bots
      • Also common to get 5-20 metrics for a single node
          – Render dashboard for a particular machine
      • Result density should be high for all common queries
      • Most data is never read but is retained as an insurance policy
          – Can’t predict what you will need to diagnose future failure modes
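With a metric-first key, the most common query above (a few metrics across many nodes, selected by tags) becomes one contiguous range scan plus a filter. This sketch uses a sorted Python list in place of a table; `ROWS`, `NODE_TAGS` and `scan_metric` are hypothetical names, and keeping tags in a side dictionary rather than in the key is an assumption made to keep the example short.

```python
from bisect import bisect_left

# Toy table: row keys (metric, window_start, node) kept in sorted order,
# as a metric-first key design would store them.
ROWS = sorted([
    ("load1m", 46800, "n1"), ("load1m", 46800, "n2"), ("load1m", 50400, "n1"),
    ("load5m", 46800, "n1"), ("net.rx", 46800, "n1"),
])
# Hypothetical tag lookup: node -> tags.
NODE_TAGS = {"n1": {"tier": "web"}, "n2": {"tier": "db"}}

def scan_metric(metric, start, end, tier=None):
    """Row keys for one metric with start <= window_start < end.

    The scan touches only the contiguous key range for that metric;
    the tag test is a filter applied to rows inside that range.
    """
    lo = bisect_left(ROWS, (metric, start))
    hi = bisect_left(ROWS, (metric, end))
    return [key for key in ROWS[lo:hi]
            if tier is None or NODE_TAGS.get(key[2], {}).get("tier") == tier]

print(scan_metric("load1m", 46800, 50400))
print(scan_metric("load1m", 46800, 50400, tier="web"))
```

The rows for `load5m` and `net.rx` are never read, which is the "high result density" the slide asks for. A per-node dashboard query, by contrast, needs one short range per metric.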
  • 40. Lesson: You have to know the queries to design in performance
  • 42. Step 4: Time to level up!
  • 43. What About the Major Leagues?
      • Industrial sensors can dwarf current TSDB loads
          – Assume 100 (drill rigs | generators | heating systems | turbines)
          – Each has 10,000 sensors
          – Each is sampled once per second
          – Total sample rate is 10^6 samples/s (100x faster than before)
      • Industrial applications require extensive testing at scale
          – Want to load years of test data in a few days
          – Sample rate for testing is 100 × 10^6 samples/s (10,000x faster)
      • And you thought the first example was extreme
  • 45. Rough Design Outline
      • Want to record and query 100 M samples/s at full resolution
      • Each MapR node serving tables can do ~20-40 k inserts/s @ 1 kB/record, ~60 k inserts/s @ 100 B/record
      • Each MapR node serving files can insert at ~1 GB/s
      • We can buffer data in the file system until we get >1000 samples per metric
      • Once data is sufficient, we do one insert per metric per hour
          – 3600x fewer inserts (at one sample per second per metric)
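The buffering idea above can be sketched as a consolidator that reads buffered samples and emits one row per metric per hour. The CSV record format, the `consolidate` name and the alignment of rows to clock hours are assumptions for illustration; the talk does not specify the flat-file format.

```python
import csv
import io
from collections import defaultdict

HOUR = 3600

def consolidate(flat_file):
    """Group buffered samples into one row per metric per hour.

    `flat_file` is any iterable of CSV lines "metric,epoch_seconds,value"
    (a hypothetical stand-in for the data catchers' flat files).
    Returns {(metric, hour_start): [(offset_seconds, value), ...]}, so each
    key can be written with a single insert instead of one per sample.
    """
    rows = defaultdict(list)
    for metric, t, value in csv.reader(flat_file):
        t = int(t)
        hour_start = t - t % HOUR
        rows[(metric, hour_start)].append((t - hour_start, float(value)))
    for samples in rows.values():
        samples.sort()  # files may hold samples slightly out of order
    return dict(rows)

# One metric sampled once per second for an hour: 3600 samples become 1 row.
buffered = io.StringIO("".join(f"sensor1,{7200 + s},20.5\n" for s in range(HOUR)))
rows = consolidate(buffered)
print(f"{HOUR} samples -> {len(rows)} row insert(s)")
```

Appending to flat files is cheap at ~1 GB/s per node, so the expensive table inserts happen only once per metric-hour; the trade-off is that the newest hour lives in files until the consolidator runs.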
  • 46. Data Flow – High Speed TSDB
      [Diagram: measurement systems → data catchers → flat files → consolidators → TSDB tables → web tier → browser]
  • 47. Quick Results
      • Estimated data volumes
          – 100 M points/s / (3600 points/row) ≈ 28 k rows/s
      • Estimated potential throughput
          – 4 nodes @ 10 k rows/s = 40 k rows/s = 144 M points/s
      • Observed throughput for a 2-day prototype
          – 1 feeder node, 4 table nodes, 10 M points/s
          – Feeder node CPU bound, table nodes < 5% CPU, disk ~ 0
      • Simple prototype is limited by the generator
      • Compare to an observed max of 100 k points/s on SQL Server
          – “Only” 100x faster
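The estimate can be reproduced directly. Every input comes from the slide except the 3600 points per row, which assumes one sample per second per metric, as in the one-insert-per-metric-per-hour design.

```python
points_per_second = 100e6      # target: 100 M samples/s
points_per_row = 3600          # assumption: one row per metric per hour at 1 sample/s

rows_needed = points_per_second / points_per_row   # rows/s the tables must absorb
table_nodes, rows_per_node = 4, 10_000              # figures from the slide
row_capacity = table_nodes * rows_per_node          # rows/s the tables can absorb
point_capacity = row_capacity * points_per_row      # equivalent samples/s

prototype_rate, sql_server_max = 10e6, 100e3        # observed points/s
print(f"rows needed: {rows_needed:,.0f}/s, capacity: {row_capacity:,}/s")
print(f"point capacity: {point_capacity / 1e6:.0f} M/s, "
      f"prototype vs SQL Server: {prototype_rate / sql_server_max:.0f}x")
```

About 27,778 rows/s are needed against 40,000 rows/s of capacity, so four table nodes have headroom for the 100 M points/s target; the prototype's bottleneck was the load generator, not the tables.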
  • 48. Lesson: Very high rates look plausible with a hybrid design
  • 50. Quick Example: Xactly Dashboard
  • 51. Xactly: Sales Performance Management
      Xactly Insights: Delivering incentive compensation data for sales operations
      OBJECTIVES
      • Provide cloud-based performance management solutions to sales ops teams to help them design/manage optimal incentive compensation plans.
      CHALLENGES
      • RDBMS-based platform was unable to scale in a cost-effective way
      • Stringent performance and responsiveness expectations of users in a SaaS application
      SOLUTION
      • Highly responsive application that scaled to a growing customer base
      • Multi-tenancy capabilities in MapR helped ensure each customer’s data was isolated and separate from other customers in the SaaS application
      Business Impact
      • MapR delivered on Xactly’s need for scale and low operational overhead
      • MapR’s higher performance solution is far more efficient and cost-effective. “I can do something on a 10-node cluster that might require a 20-node cluster from a different Hadoop vendor.” (CTO & SVP of Engineering, Xactly)
  • 52. Dashboard Problem
      • Hundreds to thousands of customers have hundreds to thousands of sales team members
      • Want to be able to compare stats for each team member, team, company against relevant roll-ups
      • Prototyped system in RDB, Mongo, MapR tables
          – Natural fit to relational cubes
          – Easy extension to Mongo documents with indexes
          – HBase application architecture has only one index
      • Production solution used special key design in MapR tables
          – Disk-based speed matched in-memory speed of Mongo
  • 53. Lesson: Obviously relational problems often have effective non-relational solutions
  • 54. Summary
      • HBase and MapR tables are conceptually very simple
          – But require careful design
          – Composite key design crucial
          – Non-relational column usage often important
      • Practical systems can exceed relational throughput by many orders of magnitude with very small clusters
      • Composite file/table designs can be very powerful
          – The world is not a database

Speaker notes

  1. See http://hbase.apache.org/book.html#regions.arch. In-memory column families: their data gets high priority for staying in the memory cache, which evicts with a least-recently-used (LRU) algorithm.
  2. See http://outerthought.org/blog/417-ot.html. While I don’t agree with the suggestion of using versioning to add a dimension, because I believe it encourages a bad design pattern, the blog post is otherwise spot on.