MySql to HBase in 5 Steps

2,922 views

Converting MySql or Oracle databases to Apache HBase with on-line examples using the popular Wordnet dictionary

Published in: Technology

  1. CloudGraph® MySql to HBase in 5 Steps
     Converting MySql or Oracle databases to Apache HBase™ with on-line examples using the popular Wordnet® dictionary
     Scott Cinnamond, TerraMeta Software Inc.
     http://cloudgraph.org
  2. What is Wordnet®?
     • Large, complex lexical (MySql) database of English.
     • Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
     • Synsets are interlinked by means of conceptual-semantic and lexical relations.
  3. HBase Conversion Steps (http://wordnet.cloudgraph.org)
     1) Model Creation: reverse engineer the Wordnet DB into UML®
     2) Code Generation: provision persistence and query-DSL Java code
     3) HBase™ Table Mapping: map data graphs and row keys to table(s)
     4) Data Migration: MySql to HBase
     5) Services / App Creation: build services, web app
  4. Step 1: Model Creation. Reverse engineer the Wordnet DB into a PlasmaSDO™ UML® model
     • Capture entities, properties, data types, associations, enumerations and comments as UML
     • Why UML? It is a popular, standards-based format that is editable and viewable using standard tools, and it supports enterprise governance processes
     • How? Maven build with the plasma-maven-plugin RDB tool (goal:RDB, action:reverse, dialect:mysql)
     • Download a working example at https://github.com/cloudgraph/wordnet
  5. Generated Wordnet Model (core subset of 30 total entities and enumerations)
  6. Step 2: Code Generation. Provision SDO persistence and query DSL Java code
     • Generate a Java API based on the Wordnet UML model
     • Why? The API can be used across RDB, HBase and other CloudGraph services, with compile-time checking for queries and all persistence logic
     • How? Maven build with the plasma-maven-plugin SDO and DSL tools
     • See the generated API Javadocs on-line at http://wordnet.cloudgraph.org
  7. Step 3: HBase™ Table Mapping. Map data graphs and row keys to HBase™ table(s)
     • Configure delimited, hashed, salted, formatted, composite row keys with (xpath) paths into target data graphs
     • Map data graph roots to HBase tables
     • Why? Automates row-key creation by extracting the key values from anywhere in your data graphs
     • How? CloudGraph configuration XML. See https://github.com/cloudgraph/wordnet
  8. Step 4: Data Migration. MySql to HBase
     • Create a standalone RDB-to-HBase migration app that uses the generated persistence and DSL query API to incrementally call the CloudGraph HBase and RDB services
     • Why? Wordnet data is large and highly connected, so it must be incrementally extracted, inserted and linked
  9. Step 5: Services / App Creation. Build services, web app
     • Build simple POJO services using the persistence and DSL query API
     • Encapsulate Wordnet business logic
     • Add adapter/wrapper structures
     • Call the services from the web app
  10. Web App (http://wordnet.cloudgraph.org)
     • The auto-complete field triggers CloudGraph HBase to use the HBase fuzzy row filter API
     • The Find button returns all semantic and lexical relations for the selected word, including descriptions and example sentences
     • Resulting relation graphs typically contain more than 100 nodes and return in less than 200 milliseconds
  11. Conclusions
     • Complex, highly recursive RDB models can be easily converted and leveraged in HBase and future CloudGraph services
     • Large lexical data graphs can be returned in a single query
     • Data migration is difficult given the complex recursive model
  12. Resources
     • Download the complete CloudGraph Wordnet example: https://github.com/cloudgraph/wordnet
     • Run the example online: http://wordnet.cloudgraph.org
     • Project details, contact information: http://cloudgraph.org
     • Beta source repo: https://github.com/terrameta/cloudgraph
     • Production source repo (under construction): https://github.com/cloudgraph
  13. Status / Legal
     • Project Status: CloudGraph® is currently under private beta testing
     • Licensing: CloudGraph® 0.5.5 Community Edition (CE) is open source, licensed under version 2 of the GNU General Public License
     • Trademarks:
       – WordNet® is a registered trademark of Princeton University
       – Apache HBase™ is a trademark of Apache Software Foundation
       – CloudGraph® is a trademark of TerraMeta Software LLC, TerraMeta Software Inc.
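
The table-mapping slide lists delimited, hashed and salted composite row keys. The slides do not show how CloudGraph builds them, so the following is only a minimal plain-Java sketch of what those three terms commonly mean. The class RowKeySketch, its method names, the MD5 prefix length and the "NN:" salt format are all assumptions made for illustration and are not part of the CloudGraph API or its configuration XML. The field values are passed in directly; extracting them from a data graph via xpath is out of scope here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical helper for illustration only; not part of CloudGraph. */
final class RowKeySketch {
    private RowKeySketch() {}

    /** Delimited composite key: field values joined by a delimiter. */
    static String delimited(String delimiter, List<String> fields) {
        return String.join(delimiter, fields);
    }

    /** Hashed field: the first 4 bytes of the value's MD5 digest as 8 hex characters. */
    static String hashed(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                sb.append(String.format("%02x", digest[i] & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Salted key: a two-digit bucket number derived from the key, then ':' and the key. */
    static String salted(String key, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be between 1 and 100");
        }
        int bucket = Math.floorMod(key.hashCode(), buckets);
        return String.format("%02d", bucket) + ":" + key;
    }
}
```

In this sketch a key for a word sense could be built as RowKeySketch.salted(RowKeySketch.delimited("|", List.of("noun", "dog", "1")), 16). The salt prefix spreads otherwise adjacent keys over a fixed number of buckets, at the cost of having to scan every bucket for range queries.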
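
The data-migration slide says the Wordnet data must be extracted and inserted incrementally. The migration app itself is not shown in the slides, so the sketch below only illustrates the general paging pattern under stated assumptions: IncrementalMigrationSketch, fetchPage and writeBatch are hypothetical names, standing in for a paged read from the RDB service and a batched write to the HBase service. Linking the inserted graphs, which the slide also mentions, is not covered.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

/** Hypothetical paging loop for illustration only; not the CloudGraph migration app. */
final class IncrementalMigrationSketch {
    private IncrementalMigrationSketch() {}

    /**
     * Reads pages with fetchPage(offset, limit) and hands each non-empty page
     * to writeBatch, stopping at the first empty or short page.
     * Returns the total number of items passed to writeBatch.
     */
    static <T> int migrate(BiFunction<Integer, Integer, List<T>> fetchPage,
                           Consumer<List<T>> writeBatch,
                           int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, batchSize);
            if (page.isEmpty()) {
                break; // nothing left to migrate
            }
            writeBatch.accept(page);
            offset += page.size();
            if (page.size() < batchSize) {
                break; // a short page means the source is exhausted
            }
        }
        return offset;
    }
}
```

Keeping each batch small bounds the memory held at any one time, which matters when a single root entity can pull in a large connected graph.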
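
The web-app slide says the auto-complete field relies on the HBase fuzzy row filter. As a rough illustration of the matching idea only, the plain-Java sketch below checks a row key against a template and a mask, where a mask byte of 0 marks a position that must match and 1 marks a position that may hold anything. It does not call HBase, and the class FuzzyMatchSketch, the key layout in the usage note and the handling of short keys are assumptions rather than a description of how CloudGraph or HBase implement it.

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java illustration of fuzzy row matching; it does not use the HBase API. */
final class FuzzyMatchSketch {
    private FuzzyMatchSketch() {}

    /**
     * Returns true when every fixed position (mask byte 0) of the template
     * equals the byte at the same position in the row key. Positions with any
     * other mask value are ignored. Row keys shorter than the template are
     * rejected in this sketch.
     */
    static boolean matches(byte[] rowKey, byte[] template, byte[] mask) {
        if (template.length != mask.length) {
            throw new IllegalArgumentException("template and mask must have equal length");
        }
        if (rowKey.length < template.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (mask[i] == 0 && rowKey[i] != template[i]) {
                return false;
            }
        }
        return true;
    }

    /** Convenience conversion for ASCII-only example keys. */
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

With a hypothetical key layout of a two-character salt, a colon and the word, the typed prefix "do" could be expressed as the template "??:do" with mask {1, 1, 0, 0, 0}. Keys such as "07:dog" and "12:dogma" would then match regardless of their salt, while "07:cat" would not.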