Philly DB MapR M7 - March 2013
MapR Technologies
1.
HBase and M7 Technical Overview
Keys Botzum, Senior Principal Technologist, MapR Technologies
March 2013
©MapR Technologies
2.
Agenda
§ HBase
§ MapR
§ M7
§ Containers
3.
HBase
A sparse, distributed, persistent, indexed, and sorted map
OR a NoSQL database
OR a columnar data store
4.
Key-Value Store
§ Row key
– Binary sortable value
§ Row content key (analogous to a column)
– Column family (string)
– Column qualifier (binary)
– Version/timestamp (number)
§ A row key, column family, column qualifier, and version uniquely identify a particular cell
– A cell contains a single binary value
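The addressing scheme on this slide can be sketched as a plain Python mapping. This is only an illustration of the data model, not the HBase client API; the `CellKey` alias and the `put` and `get_cell` helpers are hypothetical names introduced here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory model: a cell is addressed by
# (row key, column family, column qualifier, version).
CellKey = Tuple[bytes, str, bytes, int]

table: Dict[CellKey, bytes] = {}

def put(row: bytes, family: str, qualifier: bytes, version: int, value: bytes) -> None:
    """Store a single binary value at the given cell coordinates."""
    table[(row, family, qualifier, version)] = value

def get_cell(row: bytes, family: str, qualifier: bytes, version: int) -> Optional[bytes]:
    """Return the value of exactly one cell, or None if it does not exist."""
    return table.get((row, family, qualifier, version))

put(b"user#42", "info", b"name", 1, b"Ada")
put(b"user#42", "info", b"name", 2, b"Ada L.")   # a newer version of the same column
put(b"user#42", "info", b"email", 1, b"ada@example.com")

# Tuples compare element by element, so sorting the keys yields the
# row-then-column ordering that a sorted map would maintain.
for key in sorted(table):
    print(key, table[key])
```

All four coordinates are needed to name a cell: the two `name` entries differ only in their version and are stored as separate cells.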
5.
A Row
Logical view:
C0 | C1 | C2 | C3 | C4 | … | CN
Row Key | Value1 | Value2 | Value3 | Value4 | … | ValueN
Stored as one entry per cell:
Row Key | Column Family | Column Qualifier | Version | Value1
Row Key | Column Family | Column Qualifier | Version | Value2
…
Row Key | Column Family | Column Qualifier | Version | ValueN
6.
Not A Traditional RDBMS
§ Weakly typed and schema-less (unstructured or perhaps semi-structured)
– Almost everything is binary
§ No constraints
– You can put any binary value in any cell
– You can even put incompatible types in two different instances of the same column family:column qualifier
§ Columns (qualifiers) are created implicitly
§ Different rows can have different columns
§ No transactions/no ACID
– Only unit of atomic operation is a single row
7.
API
§ APIs for querying (get), scanning, and updating (put)
– Operate on row key, column family, qualifier, version, and values
– Can partially specify and will retrieve the union of results
• If you specify just the row key, you get all values for it (with column family, qualifier)
– By default only the largest version (most recent if timestamp) is returned
• Specifying row key and column family in a get retrieves all values for that row and column family
– Scanning is just a get over a range of row keys
§ Version
– While it defaults to a timestamp, any integer is acceptable
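The get and scan semantics described here can be sketched over an in-memory dictionary. This is a minimal illustration, not the HBase client API: the `cells` data, the `get` and `scan` helpers, and the half-open `[start, stop)` scan range are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

CellKey = Tuple[bytes, str, bytes, int]  # (row, family, qualifier, version)

# Hypothetical sample data.
cells: Dict[CellKey, bytes] = {
    (b"row1", "cf1", b"a", 1): b"old",
    (b"row1", "cf1", b"a", 2): b"new",
    (b"row1", "cf2", b"b", 1): b"x",
    (b"row2", "cf1", b"a", 5): b"y",
}

def get(row: bytes, family: Optional[str] = None,
        qualifier: Optional[bytes] = None) -> Dict[Tuple[str, bytes], bytes]:
    """Return the largest-version value of every column matching the
    partially specified coordinates (unspecified parts match anything)."""
    latest: Dict[Tuple[str, bytes], Tuple[int, bytes]] = {}
    for (r, f, q, v), value in cells.items():
        if r != row:
            continue
        if family is not None and f != family:
            continue
        if qualifier is not None and q != qualifier:
            continue
        if (f, q) not in latest or v > latest[(f, q)][0]:
            latest[(f, q)] = (v, value)
    return {col: value for col, (_, value) in latest.items()}

def scan(start: bytes, stop: bytes) -> List[Tuple[bytes, Dict[Tuple[str, bytes], bytes]]]:
    """A scan is a get over every row key in [start, stop), in sorted order."""
    rows = sorted({r for (r, _, _, _) in cells if start <= r < stop})
    return [(r, get(r)) for r in rows]

print(get(b"row1"))                 # all columns of the row, newest versions only
print(get(b"row1", family="cf1"))   # narrowed to one column family
print(scan(b"row1", b"row3"))       # both rows, in key order
```

Because the version is compared as a plain integer, the sketch works whether versions are timestamps or arbitrary counters.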
8.
Columnar
§ Rather than storing table rows linearly on disk, with each row as a single byte range of fixed-size fields, store the columns of a row separately
– Very efficient storage for sparse data sets (NULL is free)
– Compression works better on similar data
– Fetches of only subsets of a row are very efficient (less disk I/O)
– No fixed size on column values
– No requirement to even define columns
§ Columns are grouped together into column families
– Basically a file on disk
– A unit of optimization
– In HBase, adding a column is implicit, adding a column family is explicit
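A small sketch of why sparse data is cheap in this layout, assuming a toy model in which each column family is one dictionary standing in for "a file on disk". The sample rows and the `family_stores` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical sparse rows: each row only lists the columns it actually has.
rows = {
    b"r1": {("profile", b"name"): b"Ada", ("stats", b"logins"): b"7"},
    b"r2": {("profile", b"name"): b"Bob"},
    b"r3": {("stats", b"logins"): b"2", ("stats", b"last_ip"): b"10.0.0.1"},
}

# One store per column family, standing in for "basically a file on disk".
family_stores: Dict[str, Dict[Tuple[bytes, bytes], bytes]] = defaultdict(dict)
for row_key, columns in rows.items():
    for (family, qualifier), value in columns.items():
        family_stores[family][(row_key, qualifier)] = value

# Missing columns are simply never written, so a NULL takes no space.
stored_cells = sum(len(store) for store in family_stores.values())

# Reading one family touches only that family's store.
stats_only = sorted(family_stores["stats"].items())
print(stored_cells, stats_only)
```

Three rows with three distinct columns would need nine slots in a fixed-width row layout; here only the five cells that exist are stored.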
9.
HBase Table Architecture
§ Tables are divided into key ranges (regions)
§ Regions are served by nodes (RegionServers)
§ Columns are divided into access groups (column families)
[Diagram: grid of column families CF1–CF5 (columns) by regions R1–R4 (rows)]
10.
Storage Model Highlights
§ Data is stored in sorted order
– A table contains rows
– A sequence of rows is grouped together into a region
• A region consists of various files related to those rows and is loaded into a region server
• Regions are stored in HDFS for high availability
– A single region server manages multiple regions
• Region assignment can change: load balancing, failures, etc.
§ Clients connect to tables
– The HBase runtime transparently determines the region (based on key ranges) and contacts the appropriate region server
§ At any given time exactly one region server provides access to a region
– Master region servers (with Zookeeper) manage that
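Because regions are contiguous key ranges kept in sorted order, finding the region for a row key is a binary search over the region start keys. The sketch below assumes a made-up layout; `region_start_keys`, `region_servers`, `locate`, and `server_for` are hypothetical names, not part of HBase.

```python
import bisect
from typing import List

# Hypothetical region layout: each region starts at a key and runs up to
# (but not including) the next region's start key. b"" means "from the beginning".
region_start_keys: List[bytes] = [b"", b"g", b"n", b"t"]
region_servers: List[str] = ["rs1", "rs2", "rs1", "rs3"]  # assumed assignment

def locate(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1

def server_for(row_key: bytes) -> str:
    """Exactly one server is responsible for a region at any given time."""
    return region_servers[locate(row_key)]

for key in (b"apple", b"grape", b"zebra"):
    print(key, "-> region", locate(key), "on", server_for(key))
```

Moving a region for load balancing or after a failure only changes an entry in `region_servers`; the key ranges, and therefore the lookup, stay the same.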
11.
What’s Great About This?
§ Very scalable
§ Easy to add region servers
§ Easy to move regions around
§ Scans are efficient
– Unlike hashing based models
§ Access via row key is very efficient
– Note: there are no secondary indexes
§ No schema, can store whatever you want when you want
§ Strong consistency
§ Integrated with Hadoop
– Map-reduce on HBase is straightforward
– HDFS/MapR-FS provides data replication
12.
Data Storage Architecture
§ Data from a region column family is stored in an HFile
– An HFile contains row key:column qualifier:version:value entries
– Index at the end into the data
– 64KB “blocks” by default
§ Update
– New value is written persistently to the Write Ahead Log (WAL)
– Cached in memory
– When memory fills, write out a new HFile
§ Read
– Checks in memory, then all of the HFiles
– Read data is cached in memory
§ Delete
– Create a tombstone record (purged at major compaction)
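The update, read, and delete paths on this slide can be sketched as a toy log-structured store. This is a simplified model under stated assumptions, not HBase code: the `RegionStore` class, the tiny `memstore_limit`, and the use of `None` as the tombstone are all hypothetical, and versions and the read cache are left out.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = None  # marker for a deleted key (hypothetical representation)

class RegionStore:
    """Toy model of one column family in one region."""

    def __init__(self, memstore_limit: int = 2) -> None:
        self.wal: List[Tuple[bytes, Optional[bytes]]] = []  # append-only log
        self.memstore: Dict[bytes, Optional[bytes]] = {}
        self.hfiles: List[Dict[bytes, Optional[bytes]]] = []  # newest last
        self.memstore_limit = memstore_limit

    def put(self, key: bytes, value: Optional[bytes]) -> None:
        self.wal.append((key, value))         # 1. persist to the WAL first
        self.memstore[key] = value            # 2. cache in memory
        if len(self.memstore) >= self.memstore_limit:
            self.flush()                      # 3. memory full: write a new HFile

    def delete(self, key: bytes) -> None:
        self.put(key, TOMBSTONE)              # a delete is just a tombstone write

    def flush(self) -> None:
        # An HFile is immutable and sorted by key.
        ordered = sorted(self.memstore.items(), key=lambda kv: kv[0])
        self.hfiles.append(dict(ordered))
        self.memstore = {}

    def get(self, key: bytes) -> Optional[bytes]:
        if key in self.memstore:              # check memory first
            return self.memstore[key]
        for hfile in reversed(self.hfiles):   # then HFiles, newest to oldest
            if key in hfile:
                return hfile[key]             # may be a tombstone (None)
        return None

store = RegionStore()
store.put(b"a", b"1")
store.put(b"b", b"2")      # memstore reaches its limit and is flushed
store.put(b"a", b"3")      # newer value shadows the one in the first HFile
store.delete(b"b")         # tombstone; triggers a second flush
print(store.get(b"a"), store.get(b"b"), len(store.hfiles), len(store.wal))
```

Nothing on disk is rewritten by an update or delete: the old values of `a` and `b` still sit in the first HFile and are only hidden by newer entries.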
13.
Apache HBase HFile Structure
Each cell is an individual key + value; a row repeats the key for each column
64 KB blocks are compressed
Key-value pairs are laid out in increasing order
An index into the compressed blocks is created as a btree
14.
HBase Region Operation
§ Typical region size is a few GB, sometimes even 10G or 20G
§ RS holds data in memory until full, then writes a new HFile
– Logical view of the database is constructed by layering these files, with the latest on top
[Diagram: HFiles layered from newest to oldest across the key range represented by this region]
15.
HBase Read Amplification
§ When a get/scan comes in, all the files have to be examined
– Schema-less, so where is the column?
– Done in memory and does not change what's on disk
• Bloom filters do not help in scans
[Diagram: HFiles layered from newest to oldest]
With 7 files, a 1K-record get() potentially takes about 30 seeks, 7 block fetches and decompressions, from HDFS. Even with the index in memory, 7 seeks and 7 block fetches are required.
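A minimal sketch of why every file has to be examined: with no schema, any HFile may hold some column of the requested row. The seven single-cell `hfiles` and the `get_row` helper are hypothetical, and the probe counter only counts files visited; it does not model seeks or block decompression.

```python
from typing import Dict, List, Tuple

# Seven hypothetical HFiles, oldest first. Each happens to hold a different
# column of the same row, which the reader cannot know in advance.
hfiles: List[Dict[Tuple[bytes, bytes], bytes]] = [
    {(b"r1", b"c%d" % i): b"v%d" % i} for i in range(7)
]

def get_row(row: bytes) -> Tuple[Dict[bytes, bytes], int]:
    """Assemble a row from all files and count how many files were probed."""
    result: Dict[bytes, bytes] = {}
    probes = 0
    for hfile in hfiles:                 # oldest to newest: newer values win
        probes += 1
        for (r, qualifier), value in hfile.items():
            if r == row:
                result[qualifier] = value
    return result, probes

columns, probes = get_row(b"r1")
print(len(columns), "columns assembled from", probes, "files")
```

The number of probes equals the number of files regardless of how many of them actually contain the row, which is the cost that compaction later tries to reduce.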
16.
HBase Write Amplification
§ To reduce the read amplification, HBase merges the HFiles periodically
– A process called compaction
– Runs automatically when there are too many files
– Usually turned off due to I/O storms which interfere with client access
– And kicked off manually on weekends
Major compaction reads all files and merges them into a single HFile
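Major compaction is essentially a k-way merge of sorted files. The sketch below uses Python's `heapq.merge`; the `Entry` layout, the `major_compact` function, and the choice to keep only the newest version of each key are simplifying assumptions for illustration, not HBase's actual retention policy.

```python
import heapq
from typing import List, Optional, Tuple

Entry = Tuple[bytes, int, Optional[bytes]]  # (key, version, value); None = tombstone

def major_compact(hfiles: List[List[Entry]]) -> List[Entry]:
    """Merge sorted HFiles into one, keeping only the newest version of each
    key and purging keys whose newest version is a tombstone."""
    # Order by key ascending, then version descending, so the first entry
    # seen for a key is its newest version. Each input must already be
    # sorted this way.
    merged = heapq.merge(*hfiles, key=lambda e: (e[0], -e[1]))
    out: List[Entry] = []
    last_key = None
    for key, version, value in merged:
        if key == last_key:
            continue                  # older version of a key already handled
        last_key = key
        if value is not None:         # drop tombstones and what they hide
            out.append((key, version, value))
    return out

old = [(b"a", 1, b"1"), (b"b", 1, b"2"), (b"c", 1, b"3")]
new = [(b"a", 2, b"9"), (b"b", 2, None)]   # update a, delete b
compacted = major_compact([old, new])
print(compacted)
```

Every entry of every input file is read and the survivors are written again, which is where the write amplification and the I/O storms mentioned on the slide come from.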
17.
HBase Server Architecture
[Diagram: Client, Zookeeper (lookup), HBase Master (coordinates), HBase Region Server (data), HFiles and WAL, HDFS Server, Linux Filesystem]
18.
WAL File
§ A persistent record of every update/insert in sequence order
– Shared by all regions on one region server
– WAL files are periodically rolled to limit size, but older WALs are still needed
– A WAL file is no longer needed once every region with updates in that WAL file has flushed those from memory to an HFile
• Remember that more HFiles slow the read path!
§ Must be replayed as part of the recovery process since in-memory updates are “lost”
– This is very expensive and delays bringing a region back online
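The retention rule for rolled WAL files can be sketched as simple bookkeeping. The file names, region names, and the `on_flush` helper are hypothetical; the sketch only tracks which regions still have unflushed updates in each WAL file.

```python
from typing import Dict, List, Set

# Hypothetical bookkeeping: for each rolled WAL file, the set of regions that
# still hold unflushed in-memory updates recorded in that file.
wal_pending: Dict[str, Set[str]] = {
    "wal.001": {"regionA", "regionB"},
    "wal.002": {"regionB"},
    "wal.003": {"regionC"},
}

def on_flush(region: str) -> List[str]:
    """Record that a region flushed its memory to an HFile and return the
    WAL files that are no longer needed as a result."""
    removable = []
    for wal_name, regions in wal_pending.items():
        regions.discard(region)
        if not regions:
            removable.append(wal_name)
    for wal_name in removable:
        del wal_pending[wal_name]
    return removable

after_a = on_flush("regionA")   # wal.001 is still needed by regionB
after_b = on_flush("regionB")   # now wal.001 and wal.002 can go
print(after_a, after_b, sorted(wal_pending))
```

Since the log is shared by all regions on a server, one slow-to-flush region keeps old WAL files alive, and flushing early to release them creates more HFiles for the read path.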
19.
What’s Not So Good
Reliability
• Complex coordination between ZK, HDFS, HBase Master, and Region Server during region movement
• Compactions disrupt operations
• Very slow crash recovery because of
– Coordination complexity
– WAL log reading (one log/server)
Business continuity
• Many administrative actions require downtime
• Not well integrated into MapR-FS mirroring and snapshot functionality
20.
What’s Not So Good
Performance
• Very long read/write path
• Significant read and write amplification
• Multiple JVMs in read/write path: GC delays!
Manageability
• Compactions, splits and merges must be done manually (in reality)
• Lots of “well known” problems maintaining a reliable cluster: splitting, compactions, region assignment, etc.
• Practical limits on the number of regions per region server and the size of regions, which can make it hard to fully utilize hardware
21.
Region Assignment in Apache HBase
22.
Apache HBase on MapR
Limited data management, data protection and disaster recovery for tables.
23.
Agenda
§ HBase
§ MapR
§ M7
§ Containers
24.
MapR
A provider of enterprise grade Hadoop with uniquely differentiated features
25.
MapR: The Enterprise Grade Distribution
26.
One Platform for Big Data
Broad range of applications: Recommendation Engines, Fraud Detection, Billing, Logistics, Risk Modeling, Market Segmentation, Inventory Forecasting, …
Batch, Interactive, Real-time
Map Reduce, File-Based Applications, SQL, Database, Search, Stream Processing, …
99.999% HA, Data Protection, Disaster Recovery, Scalability & Performance, Enterprise Integration, Multi-tenancy
27.
Dependable: Lights Out Data Center Ready
Reliable Compute
§ Automated stateful failover
§ Automated re-replication
§ Self-healing from HW and SW failures
§ Load balancing
§ No lost jobs or data
§ 99999’s of uptime
Dependable Storage
§ Business continuity with snapshots and mirrors
§ Recover to a point in time
§ End-to-end check summing
§ Strong consistency
§ Data safe
§ Mirror across sites to meet Recovery Time Objectives
28.
Fast: World Record Performance
Benchmark | MapR 2.1.1 | CDH 4.1.1 | MapR Speed Increase
Terasort (1x replication, compression disabled)
  Total | 13m 35s | 26m 6s | 2X
  Map | 7m 58s | 21m 8s | 3X
  Reduce | 13m 32s | 23m 37s | 1.8X
DFSIO throughput/node
  Read | 1003 MB/s | 656 MB/s | 1.5X
  Write | 924 MB/s | 654 MB/s | 1.4X
YCSB (50% read, 50% update)
  Throughput | 36,584.4 op/s | 12,500.5 op/s | 2.9X
  Runtime | 3.80 hr | 11.11 hr | 2.9X
YCSB (95% read, 5% update)
  Throughput | 24,704.3 op/s | 10,776.4 op/s | 2.3X
  Runtime | 0.56 hr | 1.29 hr | 2.3X
MinuteSort Record: 1.5 TB in 60 seconds, 2103 nodes
Benchmark hardware configuration: 10 servers, 12 x 2 cores (2.4 GHz), 12 x 2TB, 48 GB, 1 x 10GbE
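The "speed increase" column can be recomputed from the raw figures on this slide. The sketch below is plain arithmetic: for runtimes the ratio is CDH divided by MapR, for throughput it is MapR divided by CDH. The dictionary labels are made up for the example, and the slide's own values appear to be rounded, so the computed ratios are close to them rather than identical.

```python
def seconds(minutes: int, secs: int) -> int:
    """Convert a 'Xm Ys' duration to seconds."""
    return minutes * 60 + secs

# (MapR 2.1.1, CDH 4.1.1) figures copied from the slide.
lower_is_better = {
    "Terasort total": (seconds(13, 35), seconds(26, 6)),
    "Terasort map": (seconds(7, 58), seconds(21, 8)),
    "Terasort reduce": (seconds(13, 32), seconds(23, 37)),
    "YCSB 50/50 runtime (hr)": (3.80, 11.11),
    "YCSB 95/5 runtime (hr)": (0.56, 1.29),
}
higher_is_better = {
    "DFSIO read (MB/s)": (1003, 656),
    "DFSIO write (MB/s)": (924, 654),
    "YCSB 50/50 throughput (op/s)": (36584.4, 12500.5),
    "YCSB 95/5 throughput (op/s)": (24704.3, 10776.4),
}

speedups = {name: cdh / mapr for name, (mapr, cdh) in lower_is_better.items()}
speedups.update({name: mapr / cdh for name, (mapr, cdh) in higher_is_better.items()})

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```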
29.
The Cloud Leaders Pick MapR
Amazon EMR is the largest Hadoop provider in revenue and # of clusters
Google chose MapR to provide Hadoop on Google Compute Engine
30.
MapR Supports Broad Set of Customers
Customers include: Global Credit Card Issuer, Leading Retailer
§ Recommendation Engine
§ Customer Behavior Analysis
§ Customer targeting
§ Fraud detection and Prevention
§ Brand Monitoring
§ Viewer Behavioral analytics
§ Global threat analytics
§ Intrusion detection & prevention
§ Recommendation Engine
§ Virus analysis
§ Forensic analysis
§ Family tree connections
§ Clickstream Analysis
§ Patient care
§ Log analysis
§ Quality profiling/field monitoring
§ HBase failure analysis
§ Fraud Detection
§ Advertising exchange analysis and optimization
§ Monitoring and measuring online behavior
§ Channel analytics
§ Customer Revenue Analytics
§ Enterprise Grade Platform
§ Customer targeting
§ ETL Offload
§ Social media analysis
§ COOP features
31.
MapR Editions
M3
§ Control System
§ NFS Access
§ Performance
§ Unlimited Nodes
§ Free
M5
§ Control System
§ NFS Access
§ Performance
§ High Availability
§ Snapshots & Mirroring
§ 24 X 7 Support
§ Annual Subscription
M7
§ All the Features of M5
§ Simplified Administration for HBase
§ Increased Performance
§ Consistent Low Latency
§ Unified Snapshots, Mirroring
Also Available through: Compute Engine
32.
Agenda
HBase
MapR
M7
Containers
33.
M7
An integrated system for unstructured and structured data
34.
Introducing MapR M7
§ An integrated system
– Unified namespace for files and tables
– Built-in data management & protection
– No extra administration
§ Architected for reliability and performance
– Fewer layers
– Single hop to data
– No compactions, low i/o amplification
– Seamless splits, automatic merges
– Instant recovery
35.
Binary Compatible with HBase APIs
§ HBase applications work "as is" with M7
– No need to recompile (binary compatible)
§ Can run M7 and HBase side-by-side on the same cluster
– e.g., during a migration
– can access both M7 table and HBase table in same program
§ Use standard Apache HBase CopyTable tool to copy a table from HBase to M7 or vice-versa
% hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=/user/srivas/mytable oldtable
36.
M7: Remove Layers, Simplify
Take note! No JVM!
[Diagram label: MapR M7]
37.
M7: No Master and No RegionServers
No JVM problems
One hop to data
Unified cache
No extra daemons to manage
38.
Region Assignment in Apache HBase
None of this complexity is present in MapR M7
39.
Unified Namespace for Files and Tables
$ pwd
/mapr/default/user/dave
$ ls
file1 file2 table1 table2
$ hbase shell
hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds
$ ls
file1 file2 table1 table2 table3
$ hadoop fs -ls /user/dave
Found 5 items
-rw-r--r-- 3 mapr mapr 16 2012-09-28 08:34 /user/dave/file1
-rw-r--r-- 3 mapr mapr 22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:38 /user/dave/table3
40.
Tables for End Users
§ Users can create and manage their own tables
– Unlimited # of tables
§ Tables can be created in any directory
– Tables count towards volume and user quotas
§ No admin intervention needed
– I can create a file or a directory without opening a ticket with admin team, why not a table?
– Do stuff on the fly, no stop/restart servers
§ Automatic data protection and disaster recovery
– Users can recover from snapshots/mirrors on their own
41.
M7 – An Integrated System
42.
M7
Comparative Analysis with Apache HBase, Level-DB and a BTree
43.
HBase Write Amplification Analysis
§ Assume 10G per region, write 10% per day, grow 10% per week
– 1G of writes per day
– after 7 days, 7 files of 1G and 1 file of 10G (only 1G is growth)
§ IO Cost
– Wrote 7G to WAL + 7G to HFiles
– Compaction adds still more
• read: 17G (= 7 x 1G + 1 x 10G)
• write: 11G to the new HFile
– WAF: wrote 7G "for real", but actual disk IO after compaction is read 17G + write 25G, and that's assuming no application reads!
§ IO Cost of 1000 regions similar to above
– read 17T, write 25T => major impact on node
§ Best practice: limit # of regions/node => can't fully utilize storage
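The arithmetic on this slide can be replayed in a few lines. This is only a sketch of the slide's own example: the variable names are illustrative, and the split of weekly writes into 1G of growth plus overwrites is taken directly from the slide's assumptions.

```python
# Worked arithmetic for the slide's example region; all quantities are in GB.
REGION_GB = 10         # existing large HFile
DAILY_WRITE_GB = 1     # 10% of the region written per day
DAYS = 7
WEEKLY_GROWTH_GB = 1   # 10% growth per week; the remaining writes are overwrites

app_writes = DAILY_WRITE_GB * DAYS        # data written "for real"
wal_writes = app_writes                   # every write is logged to the WAL
flush_writes = app_writes                 # and later flushed to small HFiles

# Major compaction reads every file and writes one merged HFile.
compaction_read = DAYS * DAILY_WRITE_GB + REGION_GB
compaction_write = REGION_GB + WEEKLY_GROWTH_GB

total_write = wal_writes + flush_writes + compaction_write
total_read = compaction_read
write_amplification = total_write / app_writes
total_io_ratio = (total_read + total_write) / app_writes

print(f"read {total_read}G, write {total_write}G for {app_writes}G of application writes")
print(f"write amplification {write_amplification:.2f}x, total disk IO {total_io_ratio:.1f}x")
```

Scaling the same figures by 1000 regions gives the slide's 17T read and 25T written per node.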
44.
Alternative: Level-DB
§ Tiered, logarithmic increase
– L1: 2 x 1M files
– L2: 10 x 1M
– L3: 100 x 1M
– L4: 1,000 x 1M, etc
§ Compaction overhead
– avoids IO storms (i/o done in smaller increments of ~10M)
– but significantly extra bandwidth compared to HBase
§ Read overhead is still high
– 10-15 seeks, perhaps more if the lowest level is very large
– 40K - 60K read from disk to retrieve a 1K record
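To see why read overhead grows with data size, here is a rough sketch of the level progression listed above. It assumes the "etc" means each further level is 10x the previous one; the function names are hypothetical and this is not LevelDB code.

```python
FILE_MB = 1  # the slide uses 1M files at every level

def level_capacity_mb(level):
    """Capacity of a 1-based level in MB, following the slide's progression."""
    if level == 1:
        return 2 * FILE_MB                   # L1: 2 x 1M
    return 10 ** (level - 1) * FILE_MB       # L2: 10, L3: 100, L4: 1,000, ...

def levels_needed(data_mb):
    """Smallest number of levels whose combined capacity holds data_mb."""
    total, level = 0, 0
    while total < data_mb:
        level += 1
        total += level_capacity_mb(level)
    return level

print(levels_needed(1_000))       # about 1 GB of data
print(levels_needed(1_000_000))   # about 1 TB of data
```

A read may have to probe every level, so each extra level adds seeks, which is consistent with the slide's note that the count rises when the lowest level is very large.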
45.
BTree analysis
§ Read finds data directly, proven to be fastest
– interior nodes only hold keys
– very large branching factor
– values only at leaves
– thus index caches work
– R = logN seeks, if no caching
– 1K record read will transfer about logN blocks from disk
§ Writes are slow on inserts
– inserted into correct place right away
– otherwise read will not find it
– requires btree to be continuously rebalanced
– causes extreme random i/o in insert path
– W = 2.5x + logN seeks if no caching
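The R = logN claim depends heavily on the branching factor. A small sketch, assuming a perfectly balanced tree and no caching (the function name and the sample sizes are illustrative, not from the slides):

```python
def btree_seeks(num_keys, branching_factor):
    """Levels traversed to reach a leaf in a balanced tree when nothing is cached."""
    if num_keys < 1 or branching_factor < 2:
        raise ValueError("need num_keys >= 1 and branching_factor >= 2")
    levels, reach = 1, branching_factor
    while reach < num_keys:          # each extra level multiplies the reachable keys
        reach *= branching_factor
        levels += 1
    return levels

# One billion keys: a wide tree needs far fewer seeks than a binary one.
print(btree_seeks(10**9, 1000))
print(btree_seeks(10**9, 2))
```

With interior nodes holding only keys, the upper levels are small enough to cache, which is the point of the "index caches work" bullet.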
46.
Log-Structured Merge Trees
§ LSM Trees reduce insert cost by deferring and batching index changes
– If you don't compact often, read perf is impacted
– If you compact too often, write perf is impacted
§ B-Trees are great for reads
– but expensive to update in real-time
Can we combine both ideas?
Writes cannot be done better than W = 2.5x: write to log + write data to somewhere + update meta-data
[Diagram labels: Memory, Disk, Index, Log, Write, Read]
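The trade-off between compacting too rarely and too often can be seen in a toy LSM tree. This is a minimal in-memory sketch, not HBase or M7 code; `TinyLSM` and its methods are hypothetical names, and "runs" stand in for on-disk files.

```python
import bisect

class TinyLSM:
    """Toy LSM tree: writes land in a memtable and are flushed as sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []   # newest first; each run is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value            # cheap: no index update on "disk"
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Return (value, runs_probed); more runs means more seeks per read."""
        if key in self.memtable:
            return self.memtable[key], 0
        probed = 0
        for run in self.runs:
            probed += 1
            i = bisect.bisect_left(run, (key,))   # (key,) sorts just before (key, value)
            if i < len(run) and run[i][0] == key:
                return run[i][1], probed
        return None, probed

    def compact(self):
        """Merge all runs into one; costs I/O now, makes later reads cheaper."""
        merged = {}
        for run in reversed(self.runs):           # oldest first so newer values win
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

db = TinyLSM(memtable_limit=2)
for i in range(6):
    db.put(f"k{i}", i)
before = db.get("k0")    # oldest key: every run is probed
db.compact()
after = db.get("k0")     # a single run remains
print(before, after)
```

Without compaction the oldest key costs one probe per run; after compaction it costs one, at the price of rewriting all the data.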
47.
M7 from MapR
§ Twisting BTree's
– leaves are variable size (8K - 8M or larger)
– can stay unbalanced for long periods of time
• more inserts will balance it eventually
• automatically throttles updates to interior btree nodes
– M7 inserts "close to" where the data is supposed to go
§ Reads
– Uses BTree structure to get "close" very fast
• very high branching with key-prefix-compression
– Utilizes a separate lower-level index to find it exactly
• updated "in-place"; bloom-filters for gets, range-maps for scans
§ Overhead
– 1K record read will transfer about 32K from disk in logN seeks
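The slide mentions bloom filters for gets without describing them. As a generic illustration of the technique only (this is not M7's implementation, and the sizes and hashing scheme are assumptions), a Bloom filter lets a get skip a leaf segment that definitely does not hold the key:

```python
import hashlib

class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

segment_filter = BloomFilter()
for row in ("row-17", "row-42"):
    segment_filter.add(row)

hit = segment_filter.might_contain("row-17")   # always True for an added key
print(hit)
```

A negative answer is definitive, so the segment need not be read; a positive answer still requires the read, because false positives are possible.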
48.
M7 provides Instant Recovery
§ Instead of having one WAL/region server or even one/region, we have many micro-WALs/region
§ 0-40 microWALs per region
– idle WALs "compacted", so most are empty
– region is up before all microWALs are recovered
– recovers region in background in parallel
– when a key is accessed, that microWAL is recovered inline
– 1000-10000x faster recovery
§ Never perform equivalent of HBase major or minor compaction
§ Why doesn't HBase do this? M7 uses MapR-FS, not HDFS
– No limit to # of files on disk
– No limit to # open files
– I/O path translates random writes to sequential writes on disk
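The idea of serving reads before recovery finishes can be sketched with a toy model. Everything here is an assumption made for illustration: the slides do not say how keys map to micro-WALs, so the sketch routes integer keys with `key % num_wals`, and `RecoveringRegion` is a hypothetical name.

```python
class RecoveringRegion:
    """Toy region that is readable before all of its micro-WALs are replayed."""

    def __init__(self, micro_wals):
        self.num_wals = len(micro_wals)
        # Empty (idle) micro-WALs need no recovery at all.
        self.pending = {i: wal for i, wal in enumerate(micro_wals) if wal}
        self.data = {}
        self.replayed = 0

    def _replay(self, wal_id):
        for key, value in self.pending.pop(wal_id, []):
            self.data[key] = value
            self.replayed += 1

    def get(self, key):
        # Inline recovery: replay only the micro-WAL that covers this key.
        self._replay(key % self.num_wals)      # assumed routing, see lead-in
        return self.data.get(key)

    def recover_in_background(self):
        for wal_id in list(self.pending):
            self._replay(wal_id)

NUM_WALS = 8
wals = [[] for _ in range(NUM_WALS)]
for key in range(100):                         # 100 logged writes before the "crash"
    wals[key % NUM_WALS].append((key, f"v{key}"))

region = RecoveringRegion(wals)                # region is "up" immediately
value = region.get(5)
replayed_after_first_get = region.replayed
region.recover_in_background()
print(value, replayed_after_first_get, region.replayed)
```

The first read replays only a fraction of the log, which is the effect the slide attributes to many small WALs instead of one large one.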
49.
Summary
                        1K record-read amplification   Compaction                        Recovery
HBase with 7 hfiles     30 seeks, 130K xfer            IO Storms, good bandwidth         Huge WAL to recover
HBase with 3 hfiles     15 seeks, 70K xfer             IO Storms, high bandwidth         Huge WAL to recover
LevelDB with 5 levels   13 seeks, 48K xfer             No i/o storms, very high b/w      WAL is tiny
BTree                   logN seeks, logN xfer          No i/o storms, but 100% random    WAL is proportional to concurrency + cache
MapR M7                 logN seeks, 32K xfer           No i/o storms, low bandwidth      microWALs allow recovery < 100ms
50.
M7: Fileservers Serve Regions
§ Region lives entirely inside a container
– Does not coordinate through ZooKeeper
§ Containers support distributed transactions
– with replication built-in
§ Only coordination in the system is for splits
– Between region-map and data-container
– already solved this problem for files and their chunks
51.
Agenda
HBase
MapR
M7
Containers
52.
What's a MapR container?
53.
MapR's Containers
Files/directories are sharded into blocks, and placed in containers on disks
Each container contains
• Directories & files
• Data blocks
• BTrees
• 100% random writes
Containers are ~32 GB segments of disk, placed on nodes
Patent Pending
54.
M7 Containers
§ Container holds many files
– regular, dir, symlink, btree, chunk-map, region-map, …
– all random-write capable
§ Container is replicated to servers
– unit of resynchronization
§ Region lives entirely inside 1 container
– all files + WALs + btree's + bloom-filters + range-maps
55.
Read-write Replication
§ Writes are synchronous
– All copies have same data
§ Data is replicated in a "chain" fashion
– better bandwidth, utilizes full-duplex network links well
§ Meta-data is replicated in a "star" manner
– response time better, bandwidth not of concern
– data can also be done this way
[Diagram labels: client1, client2, clientN]
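The bandwidth versus response-time trade-off between the two topologies can be made concrete with a small model. This is a sketch under simple assumptions (one payload, equal links, the write arriving at replica 0); the function names are illustrative.

```python
def bytes_sent_per_node(payload, replicas, topology):
    """Bytes each replica transmits so that all replicas hold `payload` bytes."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    if topology == "chain":
        # Each node forwards once to the next; the tail forwards nothing.
        return [payload] * (replicas - 1) + [0]
    if topology == "star":
        # The master sends a copy to every other replica itself.
        return [payload * (replicas - 1)] + [0] * (replicas - 1)
    raise ValueError(f"unknown topology: {topology}")

def sequential_hops(replicas, topology):
    """Network hops that must happen one after another before the last copy lands."""
    if topology == "chain":
        return replicas - 1
    return min(1, replicas - 1)    # star: all copies go out in parallel

for topology in ("chain", "star"):
    sent = bytes_sent_per_node(64, 3, topology)
    print(topology, sent, "busiest sender:", max(sent),
          "hops:", sequential_hops(3, topology))
```

In the chain no node sends more than one copy, so large data transfers use each link evenly; in the star the master's outbound link carries every copy but the last replica is only one hop away, which suits small meta-data updates.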
56.
Random Writing in MapR
[Diagram: a client writing data asks the CLDB for a 64M block. CLDB labels: "Create cont.", "attach", "Picks master and 2 replica slaves". Client label: "Write next chunk to S2". Container replica lists: S1, S2, S4; S1, S3; S1, S4, S5; S2, S4, S5; S2, S3, S5. Servers: S1, S2, S3, S4, S5.]
57.
Container Balancing
• Servers keep a bunch of containers "ready to go".
• Writes get distributed around the cluster.
• As data size increases, writes spread more, like dropping a pebble in a pond
• Larger pebbles spread the ripples farther
• Space balanced by moving idle containers
58.
Failure Handling
Containers managed at CLDB (HB, container-reports).
• HB loss + upstream entity reports failure => server dead
• Incr epoch at CLDB
• Rearrange repl path
• Exact same code for files and M7 tables
• No ZK
[Diagram label: Container Location DataBase (CLDB)]
59.
Architectural Params
§ Unit of I/O
– 4K/8K (8K in MapR)
§ Unit of Chunking (a map-reduce split)
– 10-100's of megabytes
§ Unit of Resync (a replica)
– 10-100's of gigabytes
– container in MapR
§ Unit of Administration (snap, repl, mirror, quota, backup)
– 1 gigabyte - 1000's of terabytes
– volume in MapR
– what data is affected by my missing blocks?
[Diagram: size scale 10^3, 10^6, 10^9 with markers i/o, map-red, resync, admin; label: HDFS 'block']
60.
Other M7 Features
§ Smaller disk footprint
– M7 never repeats the key or column name
§ Columnar layout
– M7 supports 64 column families
– in-memory column-families
§ Online admin
– M7 schema changes on the fly
– delete/rename/redistribute tables
61.
Thank you!
Questions?
62.
Examples: Reliability Issues
§ Compactions disrupt HBase operations: I/O bursts overwhelm nodes (http://hbase.apache.org/book.html#compaction)
§ Very slow crash recovery: RegionServer crash can cause data to be unavailable for up to 30 minutes while WALs are replayed for impacted regions. (HBASE-1111)
§ Unreliable splitting: Region splitting may cause data to be inconsistent and unavailable. (http://chilinglam.blogspot.com/2011/12/my-experience-with-hbase-dynamic.html)
§ No client throttling: HBase client can easily overwhelm RegionServers and cause downtime. (HBASE-5161, HBASE-5162)
63.
Examples: Business Continuity Issues
§ No Snapshots: MapR provides all-or-nothing snapshots for HBase. The WALs are shared among tables so single-table and selective multi-table snapshots are not possible. (HDFS-2802, HDFS-3370, HBASE-50, HBASE-6055)
§ Complex Backup Process: complex, unreliable and inefficient. (http://bruteforcedata.blogspot.com/2012/08/hbase-disaster-recovery-and-whisky.html)
§ Administration Requires Downtime: The entire cluster must be taken down in order to merge regions. Tables must be disabled to change schema, replication and other properties. (HBASE-420, HBASE-1621, HBASE-5504, HBASE-5335, HBASE-3909)
64.
Examples: Performance Issues
§ Limited support for multiple column families: HBase has issues handling multiple column families due to compactions. The standard HBase documentation recommends no more than 2-3 column families. (HBASE-3149)
§ Limited data locality: HBase does not take into account block locations when assigning regions. After a reboot, RegionServers are often reading data over the network rather than the local drives. (HBASE-4755, HBASE-4491)
§ Cannot utilize disk space: HBase RegionServers struggle with more than 50-150 regions per RegionServer so a commodity server can only handle about 1TB of HBase data, wasting disk space. (http://hbase.apache.org/book/important_configurations.html, http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/)
§ Limited # of tables: A single cluster can only handle several tens of tables effectively. (http://hbase.apache.org/book/important_configurations.html)
65.
Examples: Manageability Issues
§ Manual major compactions: HBase major compactions are disruptive so production clusters keep them disabled and rely on the administrator to manually trigger compactions. (http://hbase.apache.org/book.html#compaction)
§ Manual splitting: HBase auto-splitting does not work properly in a busy cluster so users must pre-split a table based on their estimate of data size/growth. (http://chilinglam.blogspot.com/2011/12/my-experience-with-hbase-dynamic.html)
§ Manual merging: HBase does not automatically merge regions that are too small. The administrator must take down the cluster and trigger the merges manually.
§ Basic administration is complex: Renaming a table requires copying all the data. Backing up a cluster is a complex process. (HBASE-643)