MySql to HBase in 5 Steps

Converting MySql or Oracle databases to Apache HBase with on-line examples using the popular Wordnet dictionary

  1. CloudGraph® MySql to HBase in 5 Steps
     Converting MySql or Oracle databases to Apache HBase™ with on-line examples using the popular Wordnet® dictionary
     Scott Cinnamond – TerraMeta Software Inc.
     http://cloudgraph.org
  2. What is Wordnet®?
     • Large, complex lexical (MySql) database of English.
     • Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
     • Synsets are interlinked by means of conceptual-semantic and lexical relations.
  3. HBase Conversion Steps (http://wordnet.cloudgraph.org)
     1) Model Creation: reverse engineer Wordnet DB into UML®
     2) Code Generation: provision persistence and query-DSL Java code
     3) HBase™ Table Mapping: map data graphs and row keys to table(s)
     4) Data Migration: MySql to HBase
     5) Services / App Creation: build services, web app
  4. 1.) Model Creation: reverse engineer Wordnet DB into PlasmaSDO™ UML® Model
     • Capture entities, properties, data types, associations, enumerations and comments as UML
     • Why UML? Popular standards-based format. Editable and viewable using standard tools. Supports enterprise governance processes
     • How? Maven build with plasma-maven-plugin RDB tool (goal:RDB, action:reverse, dialect:mysql)
     • Download working example at https://github.com/cloudgraph/wordnet
  5. Generated Wordnet Model (core subset of 30 total entities and enumerations)
  6. 2.) Code Generation: provision SDO persistence and query DSL Java code
     • Generate Java API based on the Wordnet UML Model
     • Why? Use across RDB, HBase and other CloudGraph services. Compile-time checking for queries and all persistence logic
     • How? Maven build with plasma-maven-plugin SDO and DSL tools
     • See generated API Javadocs on-line at http://wordnet.cloudgraph.org
  7. 3.) HBase™ Table Mapping: map data graphs and row keys to HBase™ table(s)
     • Configure delimited, hashed, salted, formatted, composite row keys with (xpath) paths into target data graphs
     • Map data graph roots to HBase tables
     • Why? Automates row-key creation via data extraction processing from anywhere in your data graphs
     • How? CloudGraph Configuration XML. See https://github.com/cloudgraph/wordnet
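To make the row-key terms on this slide concrete, here is a minimal sketch of a delimited, salted, composite key. It is not CloudGraph code: the class `RowKeySketch`, the `|` delimiter, the 16 salt buckets and the use of `String.hashCode` are all assumptions chosen for illustration, and in CloudGraph the equivalent choices would presumably be declared in the configuration XML rather than written by hand.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative only: names and settings below are hypothetical, not CloudGraph API.
final class RowKeySketch {
    static final String DELIMITER = "|";   // assumed field delimiter
    static final int SALT_BUCKETS = 16;    // assumed number of salt buckets

    /** Returns a bucket in [0, SALT_BUCKETS) derived from the key text. */
    static int saltBucket(String text) {
        return Math.floorMod(text.hashCode(), SALT_BUCKETS);
    }

    /** Joins the field values with the delimiter and prefixes a two-digit salt. */
    static String compositeKey(List<String> fields) {
        String joined = String.join(DELIMITER, fields);
        return String.format("%02d%s%s", saltBucket(joined), DELIMITER, joined);
    }

    /** HBase row keys are byte arrays, so the text key is encoded as UTF-8. */
    static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Field values that would be extracted from a data graph via configured paths.
        String key = compositeKey(Arrays.asList("dog", "noun", "1"));
        System.out.println(key + " (" + toBytes(key).length + " bytes)");
    }
}
```

The salt prefix spreads otherwise adjacent keys across regions, at the cost of needing one scan per bucket when reading a key range.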
  8. 4.) Data Migration: MySql to HBase
     • Create a standalone RDB-to-HBase migration app that uses the generated persistence and DSL query API to incrementally call the CloudGraph HBase and RDB services
     • Why? Wordnet data is large and highly connected, so it must be incrementally extracted, inserted and linked
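The incremental extract-and-insert idea on this slide can be sketched as a simple paging loop. Everything here is hypothetical: `PagedSource` stands in for a query against the RDB service, the `Consumer` stands in for a commit to the HBase service, and the generated Wordnet API is not used. The linking pass the slide mentions is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: these types are stand-ins, not CloudGraph or generated API.
final class IncrementalMigrationSketch {

    /** Hypothetical source: returns up to `limit` rows starting at `offset`. */
    interface PagedSource<T> {
        List<T> fetch(int offset, int limit);
    }

    /** Copies rows from source to sink one batch at a time; returns rows copied. */
    static <T> int migrate(PagedSource<T> source, Consumer<List<T>> sink, int batchSize) {
        int offset = 0;
        while (true) {
            List<T> batch = source.fetch(offset, batchSize);
            if (batch.isEmpty()) {
                break;                      // nothing left to extract
            }
            sink.accept(batch);             // insert this batch into the target
            offset += batch.size();
            if (batch.size() < batchSize) {
                break;                      // short batch means the source is exhausted
            }
        }
        return offset;
    }

    /** In-memory stand-in for a relational query with offset and limit. */
    static <T> PagedSource<T> inMemory(List<T> rows) {
        return (offset, limit) -> {
            int end = Math.min(rows.size(), offset + limit);
            return offset >= end ? new ArrayList<T>() : new ArrayList<T>(rows.subList(offset, end));
        };
    }

    public static void main(String[] args) {
        PagedSource<String> words = inMemory(Arrays.asList("dog", "cat", "bird", "fish", "frog"));
        int copied = migrate(words, batch -> System.out.println("insert " + batch), 2);
        System.out.println("rows copied: " + copied);
    }
}
```

Keeping each batch small bounds memory use on both sides, which matters when the source graph is as densely connected as the slide describes.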
  9. 5.) Services / App Creation: build services, web app
     • Build simple POJO services using the persistence and DSL query API
     • Encapsulate Wordnet business logic
     • Add adapter/wrapper structures
     • Call the services from the web app
  10. Web App (http://wordnet.cloudgraph.org)
     • Auto-complete field triggers CloudGraph HBase to use the HBase fuzzy row filter API
     • Find button returns all semantic and lexical relations for the selected word, including descriptions and example sentences
     • Resulting relation graphs typically contain more than 100 nodes and return in less than 200 milliseconds
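For readers unfamiliar with fuzzy row matching, the sketch below shows the masking idea in plain Java. It is a simplification and does not call HBase: the class `FuzzyMatchSketch` is hypothetical, and the key layout (a two-character salt, a delimiter, then the word) is an assumption, not the layout used by the Wordnet example. It follows the mask convention of HBase's `FuzzyRowFilter`, where a mask byte of 0 marks a fixed position and 1 marks a position that may hold any value.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a plain-Java model of fuzzy row-key matching, not HBase code.
final class FuzzyMatchSketch {

    /**
     * Returns true when every fixed position (mask byte 0) of the pattern equals
     * the corresponding row byte. Positions with mask byte 1 match anything.
     * Rows shorter than the pattern never match; longer rows match on their prefix.
     */
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (row.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed layout: 2 salt characters (unknown to the caller), '|', then the word.
        byte[] pattern = bytes("??|do");
        byte[] mask = {1, 1, 0, 0, 0};      // salt positions are free, the rest fixed
        System.out.println(matches(bytes("07|dog"), pattern, mask));
        System.out.println(matches(bytes("07|cat"), pattern, mask));
    }
}
```

A filter of this kind lets an auto-complete query match on the typed prefix even when leading key bytes, such as a salt, are not known in advance.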
  11. Conclusions
     • Complex, highly recursive RDB models can be easily converted and leveraged in HBase and future CloudGraph services
     • Large lexical data graphs can be returned in a single query
     • Data migration is difficult given the complex recursive model
  12. Resources
     • Download the complete CloudGraph Wordnet example: https://github.com/cloudgraph/wordnet
     • Run the example online: http://wordnet.cloudgraph.org
     • Project details, contact information: http://cloudgraph.org
     • Beta Source Repo: https://github.com/terrameta/cloudgraph
     • Production Source Repo (under construction): https://github.com/cloudgraph
  13. Status / Legal
     • Project Status
       – CloudGraph® is currently under private beta testing
     • Licensing
       – CloudGraph® 0.5.5 Community Edition (CE) is open source licensed under version 2 of the GNU General Public License
     • Trademarks
       – WordNet® is a registered trademark of Princeton University
       – Apache HBase™ is a trademark of Apache Software Foundation
       – CloudGraph® is a trademark of TerraMeta Software LLC, TerraMeta Software Inc.