Introduction to Maxtable


Xue Yingfei
http://code.google.com/p/maxtable/
Agenda

  Architecture Overview
  Key Features
  Maxtable Query Language (MQL)
  Operation and Maintenance
  Future Work




5 Mar 2012                         2
Architecture Overview ( 1 )

     Maxtable consists of three components:
     1.      Metadata server: Provides the global namespace for all tables in the
             system. It keeps the B-tree structure in memory.
     2.      Ranger server: Holds some ranges of the data. The default size of one range
             is about 100 GB.
     3.      Client library: Linked with applications, it enables them to read and
             write data stored in Maxtable.
     [Figure: the components of the system and how they relate to one another.]




Architecture Overview ( 2 )

     How is a table stored on disk?

      One SSTable = 4 MB of data.
      One tablet = 25K SSTables = one range = 100 GB.
      One table = 42K tablets.
      So one table can hold more than 4 PB of data. To hold more, we can increase
      the block size or use two levels of tablets to store the index data.
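
The capacity figures above can be checked with a little arithmetic. This is a minimal sketch that assumes "K" means 1,000 and that sizes are decimal (1 GB = 1,000 MB); the slide does not say which convention it uses, and the function names are illustrative only.

```python
# Assumption: "K" means 1,000 and sizes are decimal (1 GB = 1,000 MB).
SSTABLE_MB = 4                 # one SSTable holds 4 MB
SSTABLES_PER_TABLET = 25_000   # 25K SSTables per tablet
TABLETS_PER_TABLE = 42_000     # 42K tablets per table


def tablet_size_gb() -> float:
    """Size of one tablet (one range) in GB."""
    return SSTABLE_MB * SSTABLES_PER_TABLET / 1_000


def table_size_pb() -> float:
    """Maximum size of one table in PB (1 PB = 1,000,000 GB)."""
    return tablet_size_gb() * TABLETS_PER_TABLE / 1_000_000


print(f"tablet: {tablet_size_gb():.0f} GB, table: {table_size_pb():.1f} PB")
```

Under these assumptions a tablet is 100 GB and a table tops out at about 4.2 PB, which matches the "more than 4 PB" claim.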




Architecture Overview ( 3 )

        How Maxtable works
     •       Maxtable stores data in tables, sorted by the primary key (the first column).
     •       There are two data types: varchar (string) and int (number).
     •       Scaling is achieved by automatically splitting tables into contiguous ranges and
             assigning them to different physical machines.
     •       A Maxtable cluster has two types of servers: Ranger Servers, which hold
             some ranges of the data, and Meta Servers, which handle metadata management
             and oversee the Ranger Servers.
     •       A single Ranger Server may hold many contiguous ranges; the Meta Server is
             responsible for distributing them intelligently.
     •       When a range fills up, it is split in half (middle split). The top half
             stays in the current range, and a new range is allocated for the lower half.
             Both ranges stay on the current Ranger Server until that server becomes
             overloaded. The Rebalancer then triggers the Meta Server to reassign
             some ranges from the overloaded Ranger Servers to other Ranger
             Servers that have enough space.
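
The middle split described above can be sketched in a few lines. This is an illustration, not Maxtable's implementation: the `Range` class and the `MAX_ROWS` limit are hypothetical, a range is reduced to a sorted list of keys, and "top half" is read as the half with the smaller keys.

```python
from bisect import insort

MAX_ROWS = 4  # hypothetical capacity; the slides size a real range at about 100 GB


class Range:
    """A contiguous, sorted run of primary keys (illustrative only)."""

    def __init__(self, keys=None):
        self.keys = list(keys or [])

    def insert(self, key):
        """Insert a key in sorted order; return the new Range if a split happened."""
        insort(self.keys, key)
        if len(self.keys) <= MAX_ROWS:
            return None
        mid = len(self.keys) // 2
        new_range = Range(self.keys[mid:])  # second half moves to a new range
        self.keys = self.keys[:mid]         # first half stays in the current range
        return new_range


current = Range()
created = None
for key in ["nike", "adidas", "puma", "lining", "reebok"]:
    created = current.insert(key) or created

print(current.keys)  # keys that stayed in the original range
print(created.keys)  # keys that moved to the newly allocated range
```

Both ranges would initially stay on the same Ranger Server; moving one of them elsewhere is left to the rebalance step.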


Key Features ( 1 )

  Scalability:
     • New ranger nodes can be added as storage needs grow; the system
       adapts to the new nodes automatically when the rebalance runs.
  Data writes:
     • When an application inserts data, the write can be cached at the Ranger Server
       and the cache is flushed periodically. For consistency, the application forces
       the data log to be flushed to disk.
  SSTable Map:
     • This feature reduces the amount of data consistency control and improves write
       performance. It uses an innovative method that resolves conflicts between
       concurrent writes without any locking.
  Cache All Data:
     • Maxtable can cache all the metadata in the Meta Server and the hot data in the
       Ranger Server.
  Re-balancing:
     • A tool rebalances the tablets among Ranger Servers to help
       balance the workload across nodes.
Key Features ( 2 )

  Index:
     • Maxtable automatically builds one unique index per table on the first column.
  Recovery:
     • Maxtable implements write-ahead logging (WAL) to make writes
       safe. It can recover a crashed server by replaying its log.
  Failover:
     • The Meta Server maintains a heartbeat with each Ranger Server. When it
       detects that a Ranger Server is unreachable, it fails over the data service
       hosted on the crashed Ranger Server to another Ranger Server, which continues
       serving those ranges.
  Metadata Consistency Checking (MCC):
     • Data checking tools that ensure the data on the Meta Server is consistent with
       the data on the Ranger Servers.
  Backend Storage:
     • Maxtable's backend storage can be a distributed file system; currently it can use
       KFS as its backend.
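
The recovery idea above (log first, then replay after a crash) can be illustrated with a toy key-value store. This is a generic write-ahead-logging sketch, not Maxtable's log format: the `TinyStore` class, the JSON-lines log, and the file name are all assumptions, and a partially written last record is not handled.

```python
import json
import os
import tempfile


class TinyStore:
    """In-memory rows backed by an append-only log (illustrative only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.rows = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state by re-applying every logged write."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                record = json.loads(line)
                self.rows[record["key"]] = record["value"]

    def put(self, key, value):
        """Append the write to the log and force it to disk before applying it."""
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.rows[key] = value


log_file = os.path.join(tempfile.mkdtemp(), "ranger.log")
store = TinyStore(log_file)
store.put("adidas", 1000)

recovered = TinyStore(log_file)  # simulates a restart after a crash
print(recovered.rows)
```

Because the log record reaches disk before the in-memory update, a restarted server can rebuild the same rows from the log alone.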

Key Features ( 3 )

  Range Query
     • Supports range queries on the index column or on non-index columns.
     • Supports AND and OR in the WHERE clause.
     • Splits the work across all the ranger nodes in a cluster.
  Sharding
     • Automatic sharding: tablets are distributed over Ranger Servers.
     • Manual sharding: scans all tablets and splits those that have
       at least two blocks containing data. Customers who want better scaling can
       shard tablets manually.
     • Manual sharding is generally followed by a rebalance operation,
       because sharding may create new tablets.
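
To show why a rebalance usually follows sharding, here is a minimal greedy sketch that evens out tablet counts across rangers. The slides do not describe Maxtable's actual rebalancing policy, so the `rebalance` function, the "count of tablets" load metric, and the server names are assumptions; the input must contain at least one ranger.

```python
def rebalance(assignment):
    """Move tablets from the busiest ranger to the idlest until counts differ by at most 1.

    `assignment` maps a ranger name to its list of tablets and is modified in place.
    Returns the list of (tablet, source, target) moves that were made.
    """
    moves = []
    while True:
        busiest = max(assignment, key=lambda r: len(assignment[r]))
        idlest = min(assignment, key=lambda r: len(assignment[r]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return moves
        tablet = assignment[busiest].pop()
        assignment[idlest].append(tablet)
        moves.append((tablet, busiest, idlest))


# After sharding, ranger1 holds most of the tablets.
cluster = {"ranger1": ["t0", "t1", "t2", "t3", "t4"], "ranger2": ["t5"], "ranger3": []}
moves = rebalance(cluster)
print(cluster)
print(moves)
```

A real rebalancer would more likely weigh data size and available space than raw tablet counts, as the architecture slides suggest.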




Maxtable Query Language ( 1 )

  CREATE TABLE
     • Create a table.
         – create table table_name (column1 type1, ..., columnx typex)
         – create table blogdata (key varchar, num int, createtime varchar, comment varchar)
  INSERT
     • Insert one data row.
         – insert into table_name (column1_value, ..., columnx_value)
         – insert into blogdata (adidas, 1000, 2011-10-11, good)
  SELECT
     • Select one row by the default key column.
         – select table_name (column1_value)
         – select blogdata (adidas)
  SELECTRANGE
     • Select the rows in a user-specified key range.
         – selectrange table_name (column1_value1, column1_value2)
         – selectrange blogdata (adidas, lining)

Maxtable Query Language ( 2 )

  SELECTWHERE
     • Select rows that match the WHERE clause.
         – selectwhere table_name where columnX_name(columnX_value1, columnX_value2) and
           columnY_name(columnY_value1, columnY_value2)
  SELECTCOUNT
     • Get the number of rows that match the WHERE clause.
         – selectcount table_name where columnX_name(columnX_value1, columnX_value2) and
           columnY_name(columnY_value1, columnY_value2)
  SELECTSUM
     • Get the sum of one column over the rows that match the WHERE clause.
         – selectsum (column_name) table_name where columnX_name(columnX_value1, columnX_value2)
           and columnY_name(columnY_value1, columnY_value2)
  DELETE
     • Delete one row.
         – delete table_name (column1_value)
  DROP TABLE
     • Drop a table.
         – drop table_name
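
The statement shapes above are regular enough to build from small helpers. The sketch below only formats MQL strings following the syntax shown on the slides; the helper names are hypothetical, it does not talk to a Maxtable server, and it does not use the real client API (see the client sample code linked under Operation and Maintenance).

```python
def create_table(table, columns):
    """columns is a list of (name, type) pairs, e.g. ("num", "int")."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return f"create table {table} ({cols})"


def insert(table, values):
    """values are given in column order, starting with the key column."""
    return f"insert into {table} ({', '.join(str(v) for v in values)})"


def select(table, key):
    return f"select {table} ({key})"


def selectrange(table, low, high):
    return f"selectrange {table} ({low}, {high})"


def selectwhere(table, conditions, op="and"):
    """conditions is a list of (column, value1, value2); op is "and" or "or"."""
    clause = f" {op} ".join(f"{col}({v1}, {v2})" for col, v1, v2 in conditions)
    return f"selectwhere {table} where {clause}"


print(create_table("blogdata", [("key", "varchar"), ("num", "int")]))
print(insert("blogdata", ["adidas", 1000, "2011-10-11", "good"]))
print(selectwhere("blogdata", [("key", "adidas", "lining"), ("num", 1, 2000)]))
```

The two values in each WHERE condition are passed through unchanged; the slides do not state how Maxtable interprets them.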
Maxtable Query Language ( 3 )

 The following commands are for administrators.
  SHARDING
     • Shard one table.
         – sharding table_name
  MCC CHECKRANGER
     • Check the state of the rangers.
         – mcc checkranger
  MCC CHECKTABLE
     • Check the data of a table.
         – mcc checktable table_name
  REBALANCE
     • Rebalance the data load across the rangers.
         – rebalance table_name




Operation and Maintenance

  Platform requirement
     • http://code.google.com/p/maxtable/wiki/Platform
  How to build
     • http://code.google.com/p/maxtable/wiki/03HowToInstall
     • http://code.google.com/p/maxtable/wiki/05HowToBuildWithKFSFacer
  How to deploy
     • http://code.google.com/p/maxtable/wiki/04HowToDeploy
  How to use the client API
     • http://code.google.com/p/maxtable/wiki/08ClientSampleCode




Future Work

  Implement master-slave in the Meta Server.
  Support secondary indexes.
  Support the join operation.
  Compaction and compression.




Contact Information

  yingfei.xue@gmail.com




                  Thanks





Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Bigtable
BigtableBigtable
Bigtable
 
Bigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured DataBigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured Data
 
Bigtable
BigtableBigtable
Bigtable
 
Less06 Storage
Less06 StorageLess06 Storage
Less06 Storage
 
Db2 Important questions to read
Db2 Important questions to readDb2 Important questions to read
Db2 Important questions to read
 
Google Big Table
Google Big TableGoogle Big Table
Google Big Table
 
Oracle 19c initialization parameters
Oracle 19c initialization parametersOracle 19c initialization parameters
Oracle 19c initialization parameters
 
HadoopDB a major step towards a dead end
HadoopDB a major step towards a dead endHadoopDB a major step towards a dead end
HadoopDB a major step towards a dead end
 
Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"Maryna Popova "Deep dive AWS Redshift"
Maryna Popova "Deep dive AWS Redshift"
 
GOOGLE BIGTABLE
GOOGLE BIGTABLEGOOGLE BIGTABLE
GOOGLE BIGTABLE
 
SKILLWISE-DB2 DBA
SKILLWISE-DB2 DBASKILLWISE-DB2 DBA
SKILLWISE-DB2 DBA
 
Vertica
VerticaVertica
Vertica
 
DB2 and storage management
DB2 and storage managementDB2 and storage management
DB2 and storage management
 
Presentation db2 best practices for optimal performance
Presentation   db2 best practices for optimal performancePresentation   db2 best practices for optimal performance
Presentation db2 best practices for optimal performance
 
Big table
Big tableBig table
Big table
 
Ycsb benchmarking
Ycsb benchmarkingYcsb benchmarking
Ycsb benchmarking
 
Solving the DB2 LUW Administration Dilemma
Solving the DB2 LUW Administration DilemmaSolving the DB2 LUW Administration Dilemma
Solving the DB2 LUW Administration Dilemma
 
Hadoop DB
Hadoop DBHadoop DB
Hadoop DB
 
DB2 LUW - Backup and Recovery
DB2 LUW - Backup and RecoveryDB2 LUW - Backup and Recovery
DB2 LUW - Backup and Recovery
 
Big table
Big tableBig table
Big table
 

Andere mochten auch

Federmanager Bologna - Personal Branding 8 marzo - Presidente Andrea Molza
Federmanager Bologna - Personal Branding 8 marzo - Presidente Andrea MolzaFedermanager Bologna - Personal Branding 8 marzo - Presidente Andrea Molza
Federmanager Bologna - Personal Branding 8 marzo - Presidente Andrea MolzaMarco Frullanti
 
Formulario de identificación
Formulario de identificaciónFormulario de identificación
Formulario de identificaciónNathalia Sanchez
 
April Webinar: Sample Balancing in 2012
April Webinar: Sample Balancing in 2012April Webinar: Sample Balancing in 2012
April Webinar: Sample Balancing in 2012Research Now
 
Electrophoresis and blotting techniques by asheesh pandey
Electrophoresis and blotting techniques by asheesh pandeyElectrophoresis and blotting techniques by asheesh pandey
Electrophoresis and blotting techniques by asheesh pandeyAsheesh Pandey
 
Opportunities for students in the New World of Cloud and Big Data
Opportunities for students in the New World of Cloud and Big DataOpportunities for students in the New World of Cloud and Big Data
Opportunities for students in the New World of Cloud and Big DataEMC
 
KNOWLEDGE MANAGEMENT - WHERE THEY ARE GONE WRONG?
KNOWLEDGE MANAGEMENT - WHERE  THEY ARE GONE WRONG?KNOWLEDGE MANAGEMENT - WHERE  THEY ARE GONE WRONG?
KNOWLEDGE MANAGEMENT - WHERE THEY ARE GONE WRONG?Dr. Raju M. Mathew
 
A Day of Social Media Insights
A Day of Social Media InsightsA Day of Social Media Insights
A Day of Social Media InsightsResearch Now
 
Tues wed reformation plays
Tues wed reformation playsTues wed reformation plays
Tues wed reformation playsTravis Klein
 
4 steps in Business Strategy for Start-ups
4 steps in Business Strategy for Start-ups4 steps in Business Strategy for Start-ups
4 steps in Business Strategy for Start-upsCostin Ciora
 
A Long Day Second Draft Script by Sophie McAvoy
A Long Day Second Draft Script by Sophie McAvoyA Long Day Second Draft Script by Sophie McAvoy
A Long Day Second Draft Script by Sophie McAvoysophiemcavoy1
 
Media Evaluation
Media EvaluationMedia Evaluation
Media Evaluationloousmith
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereEMC
 
Fotonovel·la tutorial adrià, roger i gerard
Fotonovel·la tutorial adrià, roger i gerardFotonovel·la tutorial adrià, roger i gerard
Fotonovel·la tutorial adrià, roger i gerardmgonellgomez
 
EMC Hybrid Cloud for SAP - Enhanced Security and Compliance
EMC Hybrid Cloud for SAP - Enhanced Security and ComplianceEMC Hybrid Cloud for SAP - Enhanced Security and Compliance
EMC Hybrid Cloud for SAP - Enhanced Security and ComplianceEMC
 

Andere mochten auch (20)

Federmanager Bologna - Personal Branding 8 marzo - Presidente Andrea Molza
Federmanager Bologna - Personal Branding 8 marzo - Presidente Andrea MolzaFedermanager Bologna - Personal Branding 8 marzo - Presidente Andrea Molza
Federmanager Bologna - Personal Branding 8 marzo - Presidente Andrea Molza
 
Informe consulta general
Informe consulta generalInforme consulta general
Informe consulta general
 
Formulario de identificación
Formulario de identificaciónFormulario de identificación
Formulario de identificación
 
Formulario clientes
Formulario clientesFormulario clientes
Formulario clientes
 
April Webinar: Sample Balancing in 2012
April Webinar: Sample Balancing in 2012April Webinar: Sample Balancing in 2012
April Webinar: Sample Balancing in 2012
 
Electrophoresis and blotting techniques by asheesh pandey
Electrophoresis and blotting techniques by asheesh pandeyElectrophoresis and blotting techniques by asheesh pandey
Electrophoresis and blotting techniques by asheesh pandey
 
Opportunities for students in the New World of Cloud and Big Data
Opportunities for students in the New World of Cloud and Big DataOpportunities for students in the New World of Cloud and Big Data
Opportunities for students in the New World of Cloud and Big Data
 
Leadership
Leadership Leadership
Leadership
 
KNOWLEDGE MANAGEMENT - WHERE THEY ARE GONE WRONG?
KNOWLEDGE MANAGEMENT - WHERE  THEY ARE GONE WRONG?KNOWLEDGE MANAGEMENT - WHERE  THEY ARE GONE WRONG?
KNOWLEDGE MANAGEMENT - WHERE THEY ARE GONE WRONG?
 
A Day of Social Media Insights
A Day of Social Media InsightsA Day of Social Media Insights
A Day of Social Media Insights
 
Tues wed reformation plays
Tues wed reformation playsTues wed reformation plays
Tues wed reformation plays
 
4 steps in Business Strategy for Start-ups
4 steps in Business Strategy for Start-ups4 steps in Business Strategy for Start-ups
4 steps in Business Strategy for Start-ups
 
A Long Day Second Draft Script by Sophie McAvoy
A Long Day Second Draft Script by Sophie McAvoyA Long Day Second Draft Script by Sophie McAvoy
A Long Day Second Draft Script by Sophie McAvoy
 
Changes to SRAD
Changes to SRADChanges to SRAD
Changes to SRAD
 
Media Evaluation
Media EvaluationMedia Evaluation
Media Evaluation
 
Finance
FinanceFinance
Finance
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Fotonovel·la tutorial adrià, roger i gerard
Fotonovel·la tutorial adrià, roger i gerardFotonovel·la tutorial adrià, roger i gerard
Fotonovel·la tutorial adrià, roger i gerard
 
EMC Hybrid Cloud for SAP - Enhanced Security and Compliance
EMC Hybrid Cloud for SAP - Enhanced Security and ComplianceEMC Hybrid Cloud for SAP - Enhanced Security and Compliance
EMC Hybrid Cloud for SAP - Enhanced Security and Compliance
 
Glossary
GlossaryGlossary
Glossary
 

Ähnlich wie Introduction to Maxtable - A Scalable Distributed Database

A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon RedshiftKel Graham
 
Cassandra Tutorial
Cassandra Tutorial Cassandra Tutorial
Cassandra Tutorial Na Zhu
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationVolodymyr Rovetskiy
 
MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStoreMariaDB plc
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastSingleStore
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
 
BDAS Shark study report 03 v1.1
BDAS Shark study report  03 v1.1BDAS Shark study report  03 v1.1
BDAS Shark study report 03 v1.1Stefanie Zhao
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Implementing the Databese Server session 02
Implementing the Databese Server session 02Implementing the Databese Server session 02
Implementing the Databese Server session 02Guillermo Julca
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Web Services
 
database-stucture-and-space-managment.ppt
database-stucture-and-space-managment.pptdatabase-stucture-and-space-managment.ppt
database-stucture-and-space-managment.pptIftikhar70
 
database-stucture-and-space-managment.ppt
database-stucture-and-space-managment.pptdatabase-stucture-and-space-managment.ppt
database-stucture-and-space-managment.pptsubbu998029
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best PracticesAmazon Web Services
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architectureAjeet Singh
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Antonios Chatzipavlis
 

Ähnlich wie Introduction to Maxtable - A Scalable Distributed Database (20)

A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
Cassandra Tutorial
Cassandra Tutorial Cassandra Tutorial
Cassandra Tutorial
 
AWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentationAWS (Amazon Redshift) presentation
AWS (Amazon Redshift) presentation
 
MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStore
 
MemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks WebcastMemSQL 201: Advanced Tips and Tricks Webcast
MemSQL 201: Advanced Tips and Tricks Webcast
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
Voldemort
VoldemortVoldemort
Voldemort
 
BDAS Shark study report 03 v1.1
BDAS Shark study report  03 v1.1BDAS Shark study report  03 v1.1
BDAS Shark study report 03 v1.1
 
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Implementing the Databese Server session 02
Implementing the Databese Server session 02Implementing the Databese Server session 02
Implementing the Databese Server session 02
 
Amazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and OptimizationAmazon Redshift: Performance Tuning and Optimization
Amazon Redshift: Performance Tuning and Optimization
 
database-stucture-and-space-managment.ppt
database-stucture-and-space-managment.pptdatabase-stucture-and-space-managment.ppt
database-stucture-and-space-managment.ppt
 
database-stucture-and-space-managment.ppt
database-stucture-and-space-managment.pptdatabase-stucture-and-space-managment.ppt
database-stucture-and-space-managment.ppt
 
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
(BDT401) Amazon Redshift Deep Dive: Tuning and Best Practices
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Fudcon talk.ppt
Fudcon talk.pptFudcon talk.ppt
Fudcon talk.ppt
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architecture
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019Modernizing your database with SQL Server 2019
Modernizing your database with SQL Server 2019
 

Kürzlich hochgeladen

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

Introduction to Maxtable - A Scalable Distributed Database

  • 1. Introduction to Maxtable Xue Yingfei http://code.google.com/p/maxtable/
  • 2. Agenda (5 Mar 2012)
      • Architecture Overview
      • Key Features
      • Maxtable Query Language (MQL)
      • Operation and Maintenance
      • Future Works
  • 3. Architecture Overview ( 1 )
      Maxtable consists of three components:
      1. Metadata server: provides the global namespace for all the tables in the system. It keeps the B-tree structure in memory.
      2. Ranger server: holds some ranges of the data; the default size of one range is about 100 GB.
      3. Client library: linked with applications, it enables them to read and write data stored in Maxtable.
      Which components are in the system and how they relate to one another.
  • 4. Architecture Overview ( 2 )
      How is a table stored on disk?
      • One SSTable = 4 MB of data.
      • One tablet = 25K SSTables = one range = 100 GB.
      • One table = 42K tablets.
      • So one table can hold more than 4 PB of data. To hold more, we can increase the block size or use two tablet levels to store the index data.
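The capacity figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 K = 1,000 and 1 GB = 1,000 MB) and reads "4M" as 4 MB; the slide does not say which convention it uses, and the constant names are made up for illustration.

```python
# Assumption: decimal units (1 K = 1,000; 1 GB = 1,000 MB); the slide does not specify.
SSTABLE_MB = 4                  # one SSTable holds 4 MB of data
SSTABLES_PER_TABLET = 25_000    # "25K SSTable" per tablet
TABLETS_PER_TABLE = 42_000      # "42K Tablet" per table

tablet_mb = SSTABLE_MB * SSTABLES_PER_TABLET          # size of one tablet (one range) in MB
tablet_gb = tablet_mb // 1_000                        # same size in GB
table_tb = tablet_gb * TABLETS_PER_TABLE // 1_000     # size of one full table in TB

print(f"tablet: {tablet_gb} GB, table: {table_tb} TB ({table_tb / 1_000} PB)")
```

Under these assumptions one tablet comes to 100 GB and one table to 4,200 TB, which is consistent with the slide's "more than 4PB".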
  • 5. Architecture Overview ( 3 )
      How does Maxtable work?
      • Maxtable stores data in a table, sorted by a primary key (the first column).
      • There are two data types in a table: varchar (string) and int (number).
      • Scaling is achieved by automatically splitting tables into contiguous ranges and assigning them to different physical machines.
      • There are two types of servers in a Maxtable cluster: Ranger servers, which hold some ranges of the data, and Meta servers, which handle metadata management and oversee the Ranger servers.
      • A single Ranger server may hold many contiguous ranges; the Meta server is responsible for farming them out in an intelligent way.
      • If a single range fills up, it is split in half (middle split). The top half remains in the current range and a new range is allocated for the lower half. Both ranges stay on the current Ranger server until that server becomes overloaded; the Rebalancer then triggers the Meta server to reassign some ranges from the overloaded Ranger servers to other Ranger servers that have enough space.
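The middle split described on this slide can be illustrated with a small in-memory sketch. It is not Maxtable's implementation: the names `middle_split` and `insert_row`, the row-count `capacity`, and the list-of-lists layout are assumptions made for the example (real ranges are sized in bytes and live on disk).

```python
import bisect


def middle_split(rows):
    """Split a sorted list of (key, value) rows at its midpoint."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


def insert_row(ranges, key, value, capacity):
    """Insert a row into the range covering `key`; split that range if it overflows.

    `ranges` is a non-empty list of sorted row lists, ordered by key.
    `capacity` is a hypothetical row limit standing in for the range size limit.
    """
    # The first key of every range except the first acts as a lower boundary.
    boundaries = [rows[0][0] for rows in ranges[1:]]
    idx = bisect.bisect_right(boundaries, key)
    rows = ranges[idx]
    bisect.insort(rows, (key, value))
    if len(rows) > capacity:
        top, lower = middle_split(rows)
        # Both halves stay side by side, i.e. on the same server, until a rebalance.
        ranges[idx:idx + 1] = [top, lower]
    return ranges


ranges = [[]]
for n, key in enumerate(["adidas", "lining", "nike", "puma", "asics"]):
    insert_row(ranges, key, n, capacity=4)
print(ranges)
```

Moving ranges to another server is left out on purpose; in the slide's description that step belongs to the Rebalancer and the Meta server.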
  • 6. Key Features ( 1 )
      • Scalability: new ranger nodes can be added as storage needs increase; the system automatically adapts to the new nodes when the rebalance runs.
      • Data writes: when an application inserts data, the write can be cached at the Ranger server and the cache is flushed periodically. For consistency, the application forces one data log to be flushed to disk.
      • SSTable Map: this feature reduces data consistency control and improves write performance. It uses an innovative method that resolves conflicts between concurrent writes without needing any lock.
      • Cache All Data: Maxtable can cache all the metadata in the Metaserver and the hot data in the Ranger server.
      • Re-balancing: a tool rebalances the tablets among the Ranger servers, which helps balance the workload among nodes.
  • 7. Key Features ( 2 )
      • Index: Maxtable automatically builds one unique index for each table on the first column.
      • Recovery: Maxtable implements write-ahead logging (WAL) to make sure writes are safe. A crashed server can be recovered by replaying its log.
      • Failover: the Metaserver maintains a heartbeat with each Ranger server. When it detects that a Ranger server is unreachable, it fails over the data service on the crashed Ranger server to another Ranger server, which continues serving those ranges.
      • Metadata Consistency Checking (MCC): data-checking tools that ensure the data is consistent between the Metaserver and the Ranger servers.
      • Backend Storage: Maxtable can use a distributed file system as its backend storage; currently it can use KFS.
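The recovery idea on this slide (log the write first, replay the log after a crash) can be sketched as follows. This is a generic write-ahead-log illustration, not Maxtable's log format: the class name `LoggedStore`, the JSON-lines file, and the in-memory `cache` dictionary are all assumptions for the example.

```python
import json
import os


class LoggedStore:
    """Toy key-value store that logs every insert before caching it."""

    def __init__(self, log_path):
        self.log_path = log_path   # hypothetical location of the write-ahead log
        self.cache = {}            # in-memory data, lost on a crash

    def insert(self, key, value):
        # Write-ahead: append the record and force it to disk before touching the cache.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.cache[key] = value

    @classmethod
    def recover(cls, log_path):
        """Rebuild the in-memory state by replaying the log from the start."""
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    store.cache[record["key"]] = record["value"]
        return store
```

After a simulated crash, calling `LoggedStore.recover(path)` on the same log file yields a store whose cache matches the inserts that were logged. Log truncation and checkpoints are omitted to keep the sketch short.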
  • 8. Key Features ( 3 )
      • Range Query
          – It will support range queries on the index column or on a non-index column.
          – Supports AND and OR in the WHERE clause.
          – Splits the work across all the ranger nodes in a cluster.
      • Sharding
          – Automatic sharding: tablets are distributed over the Ranger servers.
          – Manual sharding: scans all the tablets and splits those that have at least two blocks containing data. Customers who want better scaling can shard tablets manually.
          – Manual sharding is generally followed by one rebalance operation, because sharding may create new tablets.
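Splitting a range query across the ranger nodes amounts to a scatter-gather: each node filters its own rows and the results are merged. The sketch below shows that pattern with threads standing in for nodes. The function names, the dict-shaped rows, and the inclusive bounds in the example predicate are assumptions; the slides do not describe how Maxtable distributes the scan internally.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_node(rows, predicate):
    """Filter the rows held by one (simulated) ranger node."""
    return [row for row in rows if predicate(row)]


def range_query(nodes, predicate):
    """Run the same predicate on every node in parallel and merge the matches by key."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda rows: scan_node(rows, predicate), nodes))
    merged = [row for part in parts for row in part]
    return sorted(merged, key=lambda row: row["key"])


# Two simulated nodes, each holding a contiguous key range.
nodes = [
    [{"key": "adidas", "num": 1000}, {"key": "asics", "num": 50}],
    [{"key": "lining", "num": 300}, {"key": "nike", "num": 700}],
]

# A WHERE clause with AND: key within [adidas, lining] and num within [100, 2000].
matches = range_query(
    nodes,
    lambda r: "adidas" <= r["key"] <= "lining" and 100 <= r["num"] <= 2000,
)
print([row["key"] for row in matches])
```

An OR condition would only change the predicate (`or` instead of `and`); the scatter-gather step stays the same.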
  • 9. Maxtable Query Language ( 1 )
      • CREATE TABLE: create one table.
          – create table table_name (column1 type1, ..., columnx typex)
          – create table blogdata (key varchar, num int, createtime varchar, comment varchar)
      • INSERT: insert one data row.
          – insert into table_name (column1_value, ..., columnx_value)
          – insert into blogdata (adidas, 1000, 2011-10-11, good)
      • SELECT: select one data row by the default key column.
          – select table_name (column1_value)
          – select blogdata (adidas)
      • SELECTRANGE: select a range of data between the keys the user specifies.
          – selectrange table_name (column1_value1, column1_value2)
          – selectrange blogdata (adidas, lining)
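For applications that assemble MQL text themselves, the statement shapes on this slide can be produced by small string builders. The helper names below are hypothetical and are not part of the Maxtable client library (see the client sample code linked under Operation and Maintenance for the real API); the sketch only reproduces the syntax shown on the slide and does no quoting or validation.

```python
def create_table_stmt(table, columns):
    """Build a CREATE TABLE statement from (name, type) pairs."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return f"create table {table} ({cols})"


def insert_stmt(table, values):
    """Build an INSERT statement; values are listed in column order."""
    return f"insert into {table} ({', '.join(str(v) for v in values)})"


def select_stmt(table, key):
    """Build a SELECT on the default key column."""
    return f"select {table} ({key})"


def selectrange_stmt(table, low, high):
    """Build a SELECTRANGE between two key values."""
    return f"selectrange {table} ({low}, {high})"


print(create_table_stmt("blogdata", [("key", "varchar"), ("num", "int")]))
print(insert_stmt("blogdata", ["adidas", 1000, "2011-10-11", "good"]))
print(select_stmt("blogdata", "adidas"))
print(selectrange_stmt("blogdata", "adidas", "lining"))
```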
  • 10. Maxtable Query Language ( 2 )
      • SELECTWHERE: select data by the WHERE clause.
          – selectwhere table_name where columnX_name(columnX_value1, columnX_value2) and columnY_name(columnY_value1, columnY_value2)
      • SELECTCOUNT: get the number of rows that match the WHERE clause.
          – selectcount table_name where columnX_name(columnX_value1, columnX_value2) and columnY_name(columnY_value1, columnY_value2)
      • SELECTSUM: get the total of one column over the rows that match the WHERE clause.
          – selectsum (column_name) table_name where columnX_name(columnX_value1, columnX_value2) and columnY_name(columnY_value1, columnY_value2)
      • DELETE: delete one data row.
          – delete table_name (column1_value)
      • DROP TABLE: drop one table.
          – drop table_name
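What SELECTWHERE, SELECTCOUNT and SELECTSUM compute can be modelled on an in-memory list of rows. In this sketch each `column(value1, value2)` term is read as an inclusive range on that column and the terms are combined with AND; both readings are assumptions, since the slide does not state whether the bounds are inclusive. The function names and the dict-based WHERE clause are made up for illustration.

```python
def row_matches(row, where):
    """True when every column lies within its (low, high) range; terms are ANDed."""
    return all(low <= row[col] <= high for col, (low, high) in where.items())


def select_where(rows, where):
    """Rows that satisfy the WHERE clause."""
    return [row for row in rows if row_matches(row, where)]


def select_count(rows, where):
    """Number of rows that satisfy the WHERE clause."""
    return len(select_where(rows, where))


def select_sum(rows, column, where):
    """Total of one column over the rows that satisfy the WHERE clause."""
    return sum(row[column] for row in select_where(rows, where))


blogdata = [
    {"key": "adidas", "num": 1000},
    {"key": "lining", "num": 300},
    {"key": "nike", "num": 700},
]
where = {"key": ("adidas", "lining"), "num": (100, 2000)}
print(select_count(blogdata, where), select_sum(blogdata, "num", where))
```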
  • 11. Maxtable Query Language ( 3 )
      The following commands are for administrators.
      • SHARDING: shard one table.
          – sharding table_name
      • MCC CHECKRANGER: check the state of the rangers.
          – mcc checkranger
      • MCC CHECKTABLE: check the data of the table.
          – mcc checktable table_name
      • REBALANCE: rebalance the data load over the rangers.
          – rebalance table_name
  • 12. Operation and Maintenance
      • Platform requirement
          – http://code.google.com/p/maxtable/wiki/Platform
      • How to build
          – http://code.google.com/p/maxtable/wiki/03HowToInstall
          – http://code.google.com/p/maxtable/wiki/05HowToBuildWithKFSFacer
      • How to deploy
          – http://code.google.com/p/maxtable/wiki/04HowToDeploy
      • How to use the client API
          – http://code.google.com/p/maxtable/wiki/08ClientSampleCode
  • 13. Future Works
      • Implement master-slave in the metaserver.
      • Support secondary indexes.
      • Support the Join operation.
      • Compaction and compression.
  • 14. Contact Information
      • yingfei.xue@gmail.com
      Thanks