SlideShare ist ein Scribd-Unternehmen logo
1 von 11
Downloaden Sie, um offline zu lesen
BigTable & Hbase
A Distributed Storage System for Structured Data


             Edward J. Yoon
              edwardyoon@apache.org
Three Major Component
• Master Server
- Responsible for assigning tablets to tablet servers, detecting the addition and
   expiration of tablet servers, balancing tablet-server load, and garbage collection of
   files in HDFS
- Handles schema changed such as table and CF creations


• Tablet Server
- Manages a set tablets(10~1000 per tablet server)
- Handles read write requests to the tablets
- Splits tablets that have grown too large (100-200 MB)


• Client Library
- Communicate directly with tablet servers for reads and writes
Architecture

-   Assigning tablets
-   Detecting the addition and expiration of tablet
                                                                       Client
-   Balancing tablet-server load
-   Handle schema changed



                        Master Server
                                                                                               Chubby
                         GFS master
                         server, CMS
                                                                                     -   Handle master election
                            client
                                                                                     -   Store the bootstrap location of Hbase data
                                                                                     -   Discover region server
                                                                                     -   Store access control lists



                                                                                CMS server
        Tablet Server            Tablet Server         Tablet Server
                                                                            - Scheduling jobs
                                                                            - Managing resources on the cluster
          GFS chunk               GFS chunk             GFS chunk           - dealing with machine failures
         server, CMS             server, CMS           server, CMS
            client                  client                client
Data Model
• Doesn’t support a full relational data model
• Multi-dimensional sorted map
• Indexed by a row, column, timestamp
  (row: string, column : string, time : int 64)  string




• Column-oriented storage
- Most queries only involve a few columns out of many, so greatly reduces I/O.
Tablet Location
• Use three-level hierarchy analogous to that of a B+
  tree
 - Location is ip : port of relevant server
   - 1st level: Bootstrapped from lock server, points to location of root tablet
   - 2nd level: Uses META 0 data to find owner of appropriate META 1 tablet
   - 3rd level: META1 table holds locations of tablets of all other tables
Tablet Assignment

                                                                                       Tablet servers
                              Master keeps track of the set of live tablet
                              servers the current assignment of tablets to
    Cluster                   region servers, including which tablets are
    manager                   unassigned.



                                               Chubby
1) Start
a server
           2) Create a lock
                                                                             8) Acquire and
                                                                                               9) Reassign unassigned
                                                                             Delete the lock
                                                                                                       tablets
                              3) Acquire the lock
                                                                4) Monitor


                                           5) Assign tablets
  Region Server                                                                       Master Server
                                         6) Check lock status
Tablet Serving
• To recover a tablet

- reads its metadata from the METADATA table
 - metadata contains
    - the list of SS-Tables that comprise a tablet
    - a set of a redo points, which are pointers into any commit logs that may   contain data for
   the tablet.
- reads the indices of the SSTables into memory
- reconstructs the memtable by applying all of the updates that have committed
   since the redo points
Compaction
                     V5.0
Create new
memtable
                                                 Read op
                  memtable                                        Deleted data are
                                                                  removed
 Frozen                                                           Storage can be re-
memtable                                                          used

                                   Memory                         Major compaction


                                    DFS                           Memtable + all
        Tablet log                                                SSTables
                                                                   -> to one SSTable

   Write op                 V4.0    V3.0         V2.0      V1.0
                                                                  Merging compaction
                                                                  Memtable + a few SSTables
                                                                  -> A new SSTable
                                           SSTable files

                                                                  Periodically done.
    Minor compaction
                                                                  Deleted data are still alive.
    Memtable -> a new
                                              V6.0
         SSTable
Compression
• Clients can control whether or not SSTables for a
  locality group are compressed

• Tow-pass custom compression scheme
  -   First-pass: long common strings across a large window (BMDiff)
  -   Second-pass: looks for repetitions in a small 16KB window (zippy)
  -   Both compression passes are very fast
  -   Space reduction

• Allow to identify large amounts of shared boilerplate in
  pages from same host
  - Choose their row names so that similar data ends up clustered and therefore
  achieve very good performance
Caching for read performance
• Use two level of caching to improve read performance

• Scan cache
  - Higher-level cache
  - Most useful for applications that tend to read the same data

• Block cache
  - Lower-level cache
  - Useful for applications or random read of different columns in same locality
  group within a hot row
Hbase : BigTable clone project
• http://hadoop.apache.org/hbase/

• Written in java

• we do not have chubby or a CMS server, we have Job-
  Tracker and zookeeper coming soon.

• Since Hadoop (GFS) doesn't provide file-append
  function, Current Hbase have a problem of data loss
  when Hbase crashed.
 - Hadoop 0.19.x provides file append function

Weitere ähnliche Inhalte

Was ist angesagt?

Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremGrisha Weintraub
 
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...Altinity Ltd
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL DatabaseHeman Hosainpana
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Thinking BIG
Thinking BIGThinking BIG
Thinking BIGLilia Sfaxi
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)alexbaranau
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkDongwon Kim
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)SANG WON PARK
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Cloudera, Inc.
 
Spark tunning in Apache Kylin
Spark tunning in Apache KylinSpark tunning in Apache Kylin
Spark tunning in Apache KylinShi Shao Feng
 

Was ist angesagt? (20)

Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
 
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Thinking BIG
Thinking BIGThinking BIG
Thinking BIG
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Hive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmarkHive, Presto, and Spark on TPC-DS benchmark
Hive, Presto, and Spark on TPC-DS benchmark
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Spark tunning in Apache Kylin
Spark tunning in Apache KylinSpark tunning in Apache Kylin
Spark tunning in Apache Kylin
 

Andere mochten auch

Google Big Table
Google Big TableGoogle Big Table
Google Big TableOmar Al-Sabek
 
The Google Bigtable
The Google BigtableThe Google Bigtable
The Google BigtableRomain Jacotin
 
Temadeinvestigacion 130402203353-phpapp02
Temadeinvestigacion 130402203353-phpapp02Temadeinvestigacion 130402203353-phpapp02
Temadeinvestigacion 130402203353-phpapp02Camilo LĂłpez
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)Sri Prasanna
 
Big table
Big tableBig table
Big tablePSIT
 
Google Bigtable paper presentation
Google Bigtable paper presentationGoogle Bigtable paper presentation
Google Bigtable paper presentationvanjakom
 
Aprendizaje de Maquina y Aplicaciones
Aprendizaje de Maquina y AplicacionesAprendizaje de Maquina y Aplicaciones
Aprendizaje de Maquina y AplicacionesEdgar Marca
 
Dynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonDynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonGrisha Weintraub
 
HBase Coprocessor Introduction
HBase Coprocessor IntroductionHBase Coprocessor Introduction
HBase Coprocessor IntroductionSchubert Zhang
 
Big table por Matias tesoriero
Big table por Matias tesorieroBig table por Matias tesoriero
Big table por Matias tesorieromtesoriero
 
Mallorca MUG: MapReduce y Aggregation Framework
Mallorca MUG: MapReduce y Aggregation FrameworkMallorca MUG: MapReduce y Aggregation Framework
Mallorca MUG: MapReduce y Aggregation FrameworkEmilio Torrens
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduceRomain Jacotin
 

Andere mochten auch (20)

GOOGLE BIGTABLE
GOOGLE BIGTABLEGOOGLE BIGTABLE
GOOGLE BIGTABLE
 
Google BigTable
Google BigTableGoogle BigTable
Google BigTable
 
Google Big Table
Google Big TableGoogle Big Table
Google Big Table
 
The Google Bigtable
The Google BigtableThe Google Bigtable
The Google Bigtable
 
Temadeinvestigacion 130402203353-phpapp02
Temadeinvestigacion 130402203353-phpapp02Temadeinvestigacion 130402203353-phpapp02
Temadeinvestigacion 130402203353-phpapp02
 
Bigtable
BigtableBigtable
Bigtable
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)
 
Big table
Big tableBig table
Big table
 
Google Bigtable paper presentation
Google Bigtable paper presentationGoogle Bigtable paper presentation
Google Bigtable paper presentation
 
Aprendizaje de Maquina y Aplicaciones
Aprendizaje de Maquina y AplicacionesAprendizaje de Maquina y Aplicaciones
Aprendizaje de Maquina y Aplicaciones
 
Cloud Computing y MapReduce
Cloud Computing y MapReduceCloud Computing y MapReduce
Cloud Computing y MapReduce
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
 
Dynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonDynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and Comparison
 
Gfs vs hdfs
Gfs vs hdfsGfs vs hdfs
Gfs vs hdfs
 
HBase Coprocessor Introduction
HBase Coprocessor IntroductionHBase Coprocessor Introduction
HBase Coprocessor Introduction
 
Big table por Matias tesoriero
Big table por Matias tesorieroBig table por Matias tesoriero
Big table por Matias tesoriero
 
Mallorca MUG: MapReduce y Aggregation Framework
Mallorca MUG: MapReduce y Aggregation FrameworkMallorca MUG: MapReduce y Aggregation Framework
Mallorca MUG: MapReduce y Aggregation Framework
 
MapReduce en Hadoop
MapReduce en HadoopMapReduce en Hadoop
MapReduce en Hadoop
 
HDFS
HDFSHDFS
HDFS
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
 

Ähnlich wie BigTable And Hbase

Membase Meetup Chicago - january 2011
Membase Meetup Chicago - january 2011Membase Meetup Chicago - january 2011
Membase Meetup Chicago - january 2011Membase
 
CloudStack Architecture Future
CloudStack Architecture FutureCloudStack Architecture Future
CloudStack Architecture FutureKimihiko Kitase
 
Operating CloudStack: Sharing My Tool Box @ApacheCon NA'15
Operating CloudStack: Sharing My Tool Box @ApacheCon NA'15Operating CloudStack: Sharing My Tool Box @ApacheCon NA'15
Operating CloudStack: Sharing My Tool Box @ApacheCon NA'15Remi Bergsma
 
Oracle rac 10g best practices
Oracle rac 10g best practicesOracle rac 10g best practices
Oracle rac 10g best practicesHaseeb Alam
 
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Ontico
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...netvis
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache AccumuloJared Winick
 
Evaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for BenchmarkingEvaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for BenchmarkingSergey Bushik
 
Migration of a computation cluster to Debian
Migration of a computation cluster to DebianMigration of a computation cluster to Debian
Migration of a computation cluster to DebianLogilab
 
TSM 6.4 update 22.11.2012
TSM 6.4 update 22.11.2012TSM 6.4 update 22.11.2012
TSM 6.4 update 22.11.2012Solv AS
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataRyan Bosshart
 
Veritas Software Foundations
Veritas Software FoundationsVeritas Software Foundations
Veritas Software Foundations.GastĂłn. .Bx.
 
Highly available (ha) kubernetes
Highly available (ha) kubernetesHighly available (ha) kubernetes
Highly available (ha) kubernetesTarek Ali
 
Advanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtopAdvanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtopAlan Renouf
 
VMware Performance Troubleshooting
VMware Performance TroubleshootingVMware Performance Troubleshooting
VMware Performance Troubleshootingglbsolutions
 
Varrow madness 2013 virtualizing sql presentation
Varrow madness 2013 virtualizing sql presentationVarrow madness 2013 virtualizing sql presentation
Varrow madness 2013 virtualizing sql presentationpittmantony
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbaseiammutex
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage InternalsDataWorks Summit
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 

Ähnlich wie BigTable And Hbase (20)

Membase Meetup Chicago - january 2011
Membase Meetup Chicago - january 2011Membase Meetup Chicago - january 2011
Membase Meetup Chicago - january 2011
 
cosbench-openstack.pdf
cosbench-openstack.pdfcosbench-openstack.pdf
cosbench-openstack.pdf
 
CloudStack Architecture Future
CloudStack Architecture FutureCloudStack Architecture Future
CloudStack Architecture Future
 
Operating CloudStack: Sharing My Tool Box @ApacheCon NA'15
Operating CloudStack: Sharing My Tool Box @ApacheCon NA'15Operating CloudStack: Sharing My Tool Box @ApacheCon NA'15
Operating CloudStack: Sharing My Tool Box @ApacheCon NA'15
 
Oracle rac 10g best practices
Oracle rac 10g best practicesOracle rac 10g best practices
Oracle rac 10g best practices
 
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
 
Evaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for BenchmarkingEvaluating NoSQL Performance: Time for Benchmarking
Evaluating NoSQL Performance: Time for Benchmarking
 
Migration of a computation cluster to Debian
Migration of a computation cluster to DebianMigration of a computation cluster to Debian
Migration of a computation cluster to Debian
 
TSM 6.4 update 22.11.2012
TSM 6.4 update 22.11.2012TSM 6.4 update 22.11.2012
TSM 6.4 update 22.11.2012
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 
Veritas Software Foundations
Veritas Software FoundationsVeritas Software Foundations
Veritas Software Foundations
 
Highly available (ha) kubernetes
Highly available (ha) kubernetesHighly available (ha) kubernetes
Highly available (ha) kubernetes
 
Advanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtopAdvanced performance troubleshooting using esxtop
Advanced performance troubleshooting using esxtop
 
VMware Performance Troubleshooting
VMware Performance TroubleshootingVMware Performance Troubleshooting
VMware Performance Troubleshooting
 
Varrow madness 2013 virtualizing sql presentation
Varrow madness 2013 virtualizing sql presentationVarrow madness 2013 virtualizing sql presentation
Varrow madness 2013 virtualizing sql presentation
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbase
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 

Mehr von Edward Yoon

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learningEdward Yoon
 
Introduction to apache horn (incubating)
Introduction to apache horn (incubating)Introduction to apache horn (incubating)
Introduction to apache horn (incubating)Edward Yoon
 
Apache Hama at Samsung Open Source Conference
Apache Hama at Samsung Open Source ConferenceApache Hama at Samsung Open Source Conference
Apache Hama at Samsung Open Source ConferenceEdward Yoon
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링Edward Yoon
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스Edward Yoon
 
Quick Understanding of NoSQL
Quick Understanding of NoSQLQuick Understanding of NoSQL
Quick Understanding of NoSQLEdward Yoon
 
The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big dataEdward Yoon
 
Apache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyApache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyEdward Yoon
 
Apache Hama 0.4
Apache Hama 0.4Apache Hama 0.4
Apache Hama 0.4Edward Yoon
 
Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Edward Yoon
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introductionEdward Yoon
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsEdward Yoon
 
Apache hama 0.2-userguide
Apache hama 0.2-userguideApache hama 0.2-userguide
Apache hama 0.2-userguideEdward Yoon
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time applicationEdward Yoon
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopEdward Yoon
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear AlgebraEdward Yoon
 
Heart Proposal
Heart ProposalHeart Proposal
Heart ProposalEdward Yoon
 

Mehr von Edward Yoon (17)

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
 
Introduction to apache horn (incubating)
Introduction to apache horn (incubating)Introduction to apache horn (incubating)
Introduction to apache horn (incubating)
 
Apache Hama at Samsung Open Source Conference
Apache Hama at Samsung Open Source ConferenceApache Hama at Samsung Open Source Conference
Apache Hama at Samsung Open Source Conference
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스
 
Quick Understanding of NoSQL
Quick Understanding of NoSQLQuick Understanding of NoSQL
Quick Understanding of NoSQL
 
The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big data
 
Apache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyApache hama @ Samsung SW Academy
Apache hama @ Samsung SW Academy
 
Apache Hama 0.4
Apache Hama 0.4Apache Hama 0.4
Apache Hama 0.4
 
Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introduction
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in clouds
 
Apache hama 0.2-userguide
Apache hama 0.2-userguideApache hama 0.2-userguide
Apache hama 0.2-userguide
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time application
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear Algebra
 
Heart Proposal
Heart ProposalHeart Proposal
Heart Proposal
 

KĂźrzlich hochgeladen

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervĂŠ Boutemy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

KĂźrzlich hochgeladen (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

BigTable And Hbase

  • 1. BigTable & Hbase A Distributed Storage System for Structured Data Edward J. Yoon edwardyoon@apache.org
  • 2. Three Major Component • Master Server - Responsible for assigning tablets to tablet servers, detecting the addition and expiration of tablet servers, balancing tablet-server load, and garbage collection of files in HDFS - Handles schema changed such as table and CF creations • Tablet Server - Manages a set tablets(10~1000 per tablet server) - Handles read write requests to the tablets - Splits tablets that have grown too large (100-200 MB) • Client Library - Communicate directly with tablet servers for reads and writes
  • 3. Architecture - Assigning tablets - Detecting the addition and expiration of tablet Client - Balancing tablet-server load - Handle schema changed Master Server Chubby GFS master server, CMS - Handle master election client - Store the bootstrap location of Hbase data - Discover region server - Store access control lists CMS server Tablet Server Tablet Server Tablet Server - Scheduling jobs - Managing resources on the cluster GFS chunk GFS chunk GFS chunk - dealing with machine failures server, CMS server, CMS server, CMS client client client
  • 4. Data Model • Doesn’t support a full relational data model • Multi-dimensional sorted map • Indexed by a row, column, timestamp (row: string, column : string, time : int 64)  string • Column-oriented storage - Most queries only involve a few columns out of many, so greatly reduces I/O.
  • 5. Tablet Location • Use three-level hierarchy analogous to that of a B+ tree - Location is ip : port of relevant server - 1st level: Bootstrapped from lock server, points to location of root tablet - 2nd level: Uses META 0 data to find owner of appropriate META 1 tablet - 3rd level: META1 table holds locations of tablets of all other tables
  • 6. Tablet Assignment Tablet servers Master keeps track of the set of live tablet servers the current assignment of tablets to Cluster region servers, including which tablets are manager unassigned. Chubby 1) Start a server 2) Create a lock 8) Acquire and 9) Reassign unassigned Delete the lock tablets 3) Acquire the lock 4) Monitor 5) Assign tablets Region Server Master Server 6) Check lock status
  • 7. Tablet Serving • To recover a tablet - reads its metadata from the METADATA table - metadata contains - the list of SS-Tables that comprise a tablet - a set of a redo points, which are pointers into any commit logs that may contain data for the tablet. - reads the indices of the SSTables into memory - reconstructs the memtable by applying all of the updates that have committed since the redo points
  • 8. Compaction V5.0 Create new memtable Read op memtable Deleted data are removed Frozen Storage can be re- memtable used Memory Major compaction DFS Memtable + all Tablet log SSTables -> to one SSTable Write op V4.0 V3.0 V2.0 V1.0 Merging compaction Memtable + a few SSTables -> A new SSTable SSTable files Periodically done. Minor compaction Deleted data are still alive. Memtable -> a new V6.0 SSTable
  • 9. Compression • Clients can control whether or not SSTables for a locality group are compressed • Tow-pass custom compression scheme - First-pass: long common strings across a large window (BMDiff) - Second-pass: looks for repetitions in a small 16KB window (zippy) - Both compression passes are very fast - Space reduction • Allow to identify large amounts of shared boilerplate in pages from same host - Choose their row names so that similar data ends up clustered and therefore achieve very good performance
  • 10. Caching for read performance • Use two level of caching to improve read performance • Scan cache - Higher-level cache - Most useful for applications that tend to read the same data • Block cache - Lower-level cache - Useful for applications or random read of different columns in same locality group within a hot row
  • 11. Hbase : BigTable clone project • http://hadoop.apache.org/hbase/ • Written in java • we do not have chubby or a CMS server, we have Job- Tracker and zookeeper coming soon. • Since Hadoop (GFS) doesn't provide file-append function, Current Hbase have a problem of data loss when Hbase crashed. - Hadoop 0.19.x provides file append function