SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Downloaden Sie, um offline zu lesen
HBase schema design
    case studies

    Organized by Evan/Qingyan Liu
     qingyan123 (AT) gmail.com
              2009.7.13
The Tao is ...




De-normalization
Case 1: locations
●
    China
    ●
        Beijing
    ●
        Shanghai
    ●
        Guangzhou
    ●
        Shandong
        –   Jinan
        –   Qingdao
    ●
        Sichuan
        –   Chengdu
In RDBMS
loc_id PK   loc_name      parent_id   child_id
1           China                     2,3,4,5
2           Beijing       1
3           Shanghai      1
4           Guangzhou     1
5           Shandong      1           7,8
6           Sichuan       1           9
7           Jinan         1,5
8           Qingdao       1,5
9           Chengdu       1,6
In HBase
row                      column families
           name:         parent:           child:
<loc_id>                 parent:<loc_id>   child:<loc_id>
1          China                           child:1=state
                                           child:2=state
                                           child:3=state
                                           child:4=state
                                           child:5=state
                                           child:6=state
5          Shangdong     parent:1=nation child:7=city
                                         child:8=city
8          Qingdao       parent:1=nation
                         parent:5=state
Case 2: student-course
●
    Student
    ●
        1 S ~ many C
●
    Course
    ●
        1 C ~ many S
In RDBMS


Students                Courses
id PK      SCs          id PK
name       student_id   title
sex        course_id    introduction
age        type         teacher_id
In HBase
row                           column families
               info:               course:
<student_id>   info:name           course:<course_id>=type
               info:sex
               info:age

row                           column families
               info:               student:
<course_id>    info:title          student:<student_id>=type
               info:introduction
               info:teacher_id
Case 3: user-action
●
    users performs actions now and then
    ●
        store every events
    ●
        query recent events of a user
In RDBMS
                      Actions
                      id PK
                      user_id IDX
                      name
                      time

●   For fast SELECT id, user_id, name, time FROM Action
    WHERE user_id=XXX ORDER BY time DESC LIMIT 10
    OFFSET 20, we must create index on user_id.
    However, indices will greatly decrease insert speed
    for index-rebuild.
In HBase
row                                   column families
                              name:
<user><Long.MAX_VALUE -
System.currentTimeMillis()>
<event id>
Case 4: user-friends
●
    1 user has 1+ friends
●
    will lookup all friends of a user
In RDBMS

        Users
                           Friendships
        id IDX
                           user_id IDX
        name
                           friend_id
        sex
                           type
        age
●
    SELECT * FROM friendships WHERE
    user_id='XXX';
In HBase

row                           column families
                 info:            friend:
<user_id>        info:name        friend:<user_id>=type
                 info:sex
                 info:age

 ●
      actually, it is a graph can be represented by a
      sparse matrix.
 ●
      then you can use M/R to find sth interesting.
      e.g. the shortest path from user A to user B.
Case 5: access log
●
    each log line contains time, ip, domain, url,
    referer, browser_cookie, login_id, etc
●
    will be analyzed every 5 minutes, every hour,
    daily, weekly, and monthly
In RDBMS

Accesslog
time
ip IDX
domain
url
referer
browser_cookie IDX
login_id IDX
In HBase

row                                                  column families
                                          http:                     user
<time><INC_COUNTER>                       http:ip                   user:browser_
                                          http:domain               cookie
                                          http:url                  user:login_id
                                          http:referer




INC_COUNTER is used to distinguish the adjacent same time values.

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersDataWorks Summit
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereEugene Hanikblum
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j InternalsTobias Lindaaker
 
The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)Myungjin Lee
 
How Kafka Powers the World's Most Popular Vector Database System with Charles...
How Kafka Powers the World's Most Popular Vector Database System with Charles...How Kafka Powers the World's Most Popular Vector Database System with Charles...
How Kafka Powers the World's Most Popular Vector Database System with Charles...HostedbyConfluent
 
Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Philip Fisher-Ogden
 
Graph Database Meetup in Korea #4. 그래프 이론을 적용한 그래프 데이터베이스 활용 사례
Graph Database Meetup in Korea #4. 그래프 이론을 적용한 그래프 데이터베이스 활용 사례 Graph Database Meetup in Korea #4. 그래프 이론을 적용한 그래프 데이터베이스 활용 사례
Graph Database Meetup in Korea #4. 그래프 이론을 적용한 그래프 데이터베이스 활용 사례 bitnineglobal
 
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)Emil Eifrem
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQLRTigger
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureVARUN SAXENA
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesDataStax
 

Was ist angesagt? (20)

Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
NoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and WhereNoSQL Graph Databases - Why, When and Where
NoSQL Graph Databases - Why, When and Where
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
An overview of Neo4j Internals
An overview of Neo4j InternalsAn overview of Neo4j Internals
An overview of Neo4j Internals
 
The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)The Semantic Web #9 - Web Ontology Language (OWL)
The Semantic Web #9 - Web Ontology Language (OWL)
 
How Kafka Powers the World's Most Popular Vector Database System with Charles...
How Kafka Powers the World's Most Popular Vector Database System with Charles...How Kafka Powers the World's Most Popular Vector Database System with Charles...
How Kafka Powers the World's Most Popular Vector Database System with Charles...
 
RDF and OWL
RDF and OWLRDF and OWL
RDF and OWL
 
JSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge GraphsJSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge Graphs
 
Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014Netflix viewing data architecture evolution - QCon 2014
Netflix viewing data architecture evolution - QCon 2014
 
Graph Database Meetup in Korea #4. 그래프 이론을 적용한 그래프 데이터베이스 활용 사례
Graph Database Meetup in Korea #4. 그래프 이론을 적용한 그래프 데이터베이스 활용 사례 Graph Database Meetup in Korea #4. 그래프 이론을 적용한 그래프 데이터베이스 활용 사례
Graph Database Meetup in Korea #4. 그래프 이론을 적용한 그래프 데이터베이스 활용 사례
 
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009)
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQL
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and Future
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 

Andere mochten auch

HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseCloudera, Inc.
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real WorldCloudera, Inc.
 
A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915Dan Han
 
Osc2012 spring HBase Report
Osc2012 spring HBase ReportOsc2012 spring HBase Report
Osc2012 spring HBase ReportSeiichiro Ishida
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesData Con LA
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
Unlocking Data for Analysts & Developers
Unlocking Data for Analysts & DevelopersUnlocking Data for Analysts & Developers
Unlocking Data for Analysts & DevelopersSimone Stanich
 
HBaseCon 2013: General Session
HBaseCon 2013: General SessionHBaseCon 2013: General Session
HBaseCon 2013: General SessionCloudera, Inc.
 
20分でわかるHBase
20分でわかるHBase20分でわかるHBase
20分でわかるHBaseSho Shimauchi
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseDataWorks Summit
 

Andere mochten auch (20)

Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBase
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real World
 
A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915A 3 dimensional data model in hbase for large time-series dataset-20120915
A 3 dimensional data model in hbase for large time-series dataset-20120915
 
Osc2012 spring HBase Report
Osc2012 spring HBase ReportOsc2012 spring HBase Report
Osc2012 spring HBase Report
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use Cases
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
Cassandra v0.6-siryou
Cassandra v0.6-siryouCassandra v0.6-siryou
Cassandra v0.6-siryou
 
Hog user manual v3
Hog user manual v3Hog user manual v3
Hog user manual v3
 
Unlocking Data for Analysts & Developers
Unlocking Data for Analysts & DevelopersUnlocking Data for Analysts & Developers
Unlocking Data for Analysts & Developers
 
Spark!
Spark!Spark!
Spark!
 
HBaseCon 2013: General Session
HBaseCon 2013: General SessionHBaseCon 2013: General Session
HBaseCon 2013: General Session
 
20分でわかるHBase
20分でわかるHBase20分でわかるHBase
20分でわかるHBase
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
 

Kürzlich hochgeladen

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Kürzlich hochgeladen (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

20090713 Hbase Schema Design Case Studies

  • 1. HBase schema design case studies Organized by Evan/Qingyan Liu qingyan123 (AT) gmail.com 2009.7.13
  • 2. The Tao is ... De-normalization
  • 3. Case 1: locations ● China ● Beijing ● Shanghai ● Guangzhou ● Shandong – Jinan – Qingdao ● Sichuan – Chengdu
  • 4. In RDBMS loc_id PK loc_name parent_id child_id 1 China 2,3,4,5 2 Beijing 1 3 Shanghai 1 4 Guangzhou 1 5 Shandong 1 7,8 6 Sichuan 1 9 7 Jinan 1,5 8 Qingdao 1,5 9 Chengdu 1,6
  • 5. In HBase row column families name: parent: child: <loc_id> parent:<loc_id> child:<loc_id> 1 China child:1=state child:2=state child:3=state child:4=state child:5=state child:6=state 5 Shangdong parent:1=nation child:7=city child:8=city 8 Qingdao parent:1=nation parent:5=state
  • 6. Case 2: student-course ● Student ● 1 S ~ many C ● Course ● 1 C ~ many S
  • 7. In RDBMS Students Courses id PK SCs id PK name student_id title sex course_id introduction age type teacher_id
  • 8. In HBase row column families info: course: <student_id> info:name course:<course_id>=type info:sex info:age row column families info: student: <course_id> info:title student:<student_id>=type info:introduction info:teacher_id
  • 9. Case 3: user-action ● users performs actions now and then ● store every events ● query recent events of a user
  • 10. In RDBMS Actions id PK user_id IDX name time ● For fast SELECT id, user_id, name, time FROM Action WHERE user_id=XXX ORDER BY time DESC LIMIT 10 OFFSET 20, we must create index on user_id. However, indices will greatly decrease insert speed for index-rebuild.
  • 11. In HBase row column families name: <user><Long.MAX_VALUE - System.currentTimeMillis()> <event id>
  • 12. Case 4: user-friends ● 1 user has 1+ friends ● will lookup all friends of a user
  • 13. In RDBMS Users Friendships id IDX user_id IDX name friend_id sex type age ● SELECT * FROM friendships WHERE user_id='XXX';
  • 14. In HBase row column families info: friend: <user_id> info:name friend:<user_id>=type info:sex info:age ● actually, it is a graph can be represented by a sparse matrix. ● then you can use M/R to find sth interesting. e.g. the shortest path from user A to user B.
  • 15. Case 5: access log ● each log line contains time, ip, domain, url, referer, browser_cookie, login_id, etc ● will be analyzed every 5 minutes, every hour, daily, weekly, and monthly
  • 17. In HBase row column families http: user <time><INC_COUNTER> http:ip user:browser_ http:domain cookie http:url user:login_id http:referer INC_COUNTER is used to distinguish the adjacent same time values.