Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever increasing demand for domain model complexity
Managing "Big Data" Application Complexity with CloudGraph
1. Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever increasing demand for domain model complexity
Managing “Big Data” Application Complexity using CloudGraph®
Scott Cinnamond, TerraMeta Software Inc.
http://cloudgraph.org
2. Complexity Increases With Added Data Model Entities (for columnar data store client applications)
Chart axes: Complexity versus # Model Entities / Classes
3. Why More App Complexity? (with Added Data Model Entities)
1. Column Mapping Difficult
2. Composite Row Key Mapping, Hashing, Salting and Formatting
3. Persistence Code Development, Refactoring and Maintenance
4. Typical Column Mapping Strategies
• Hard Coded Names Embedded in Source Code
– Not good
• Column Names in Java Constants File(s)
– Better, but still really hard coded
– Feasible with 5-10 entities, 50 attributes
– With 500-1000 entities and 5000+ attributes? Not maintainable
• Custom XML Configuration
– Create a “meta model” using, say, XML Schema and JAXB
– Construct unique names and refer to them in source
– Better, but application specific “one off”
– Does not solve “state” management challenges
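The metamodel strategy in the last bullet can be sketched in plain Java. This is only an illustration under stated assumptions: the class `ColumnNameRegistry`, its methods, and the sample names are hypothetical and are not CloudGraph or HBase API, and the entries that would normally be loaded from the XML metamodel are registered by hand.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not part of CloudGraph or HBase.
final class ColumnNameRegistry {
    // Logical "Entity.property" key mapped to a physical column qualifier.
    private final Map<String, String> physicalNames = new HashMap<>();

    // In a real application these entries would come from the XML metamodel.
    void register(String entity, String property, String physicalName) {
        if (physicalNames.containsValue(physicalName)) {
            throw new IllegalArgumentException("duplicate column name: " + physicalName);
        }
        physicalNames.put(entity + "." + property, physicalName);
    }

    // Returns the qualifier bytes that a columnar store client would write.
    byte[] qualifier(String entity, String property) {
        String name = physicalNames.get(entity + "." + property);
        if (name == null) {
            throw new IllegalArgumentException("unmapped: " + entity + "." + property);
        }
        return name.getBytes(StandardCharsets.UTF_8);
    }
}
```

Persistence code then asks for `qualifier("Molecule", "formula")` instead of embedding a physical name. As the slide notes, a registry like this is still an application specific one-off and does nothing for state management.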
5. CloudGraph Column Mapping
A Standards Based Approach Using SDO and UML
Diagram labels (CloudGraph): Stateful Column Key Factories; Marshalling; Row Key Mapping; Entity ID Mapping; Sequence Management; Data Graph “State”
6. Great, But How Do We Keep Column Names Entirely Out Of CRUD Source Code?
Create | Update | Delete: CloudGraph SDO API (Service Data Objects)
Read (Query): CloudGraph Query DSL (Domain Specific Language)
7. CloudGraph SDO
Your complex domain model as a (create | update | delete) API
• Drives all Column Mapping Transparently
• Granular Control over Data Graph Edits
• Convenient “Create Entity” Factory Methods
• Change Tracking Including History
• Rich Built In Data Types
• 100% Compile Time Checking
• Supports Multiple Inheritance Models
• Currently Uses PlasmaSDO™
– See http://plasma-sdo.org
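To make the "Change Tracking Including History" bullet concrete, here is a minimal plain-Java sketch of a data object that records previous property values. It is an assumption-laden illustration only: `TrackedObject` and its methods are hypothetical and do not reproduce the PlasmaSDO or SDO API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of per-object change tracking; not the PlasmaSDO API.
final class TrackedObject {
    private final Map<String, Object> values = new LinkedHashMap<>();
    // Property name mapped to its previous values, oldest first.
    private final Map<String, List<Object>> history = new LinkedHashMap<>();

    void set(String property, Object newValue) {
        if (values.containsKey(property)) {
            Object old = values.get(property);
            if (Objects.equals(old, newValue)) {
                return; // setting the same value is not recorded as a change
            }
            history.computeIfAbsent(property, k -> new ArrayList<>()).add(old);
        }
        values.put(property, newValue);
    }

    Object get(String property) {
        return values.get(property);
    }

    // True once a property has been overwritten with a different value.
    boolean isModified(String property) {
        return history.containsKey(property);
    }

    List<Object> oldValues(String property) {
        return history.getOrDefault(property, new ArrayList<>());
    }
}
```

A persistence layer built on something like this could, for example, write only the columns for which `isModified` returns true.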
8. CloudGraph SDO API Example
Uses Chemical Modelling Language (CML) 2.4
https://github.com/cloudgraph/cml
9. CloudGraph Query DSL
Your complex domain model as a query API
• Drives all Column Mapping Transparently
• Intuitive, Almost “Fluent” English Appearance
• Logical Entity, Attribute Names Generated into API
• 100% Compile Time Checking
• Currently Uses PlasmaQuery®
– See http://plasma-query.org
10. CloudGraph Query DSL Example
Uses Chemical Modelling Language (CML) 2.4
https://github.com/cloudgraph/cml
11. Why More Complexity? 2.) Composite Row Key Mapping, Hashing and Formatting
• More Model Entities:
– Larger data graphs
– More composite row key fields needed to find graphs
– How to reliably map “deep” into graphs
• Row Key Field Hashing and Formatting
– Critical for HBase partial-key scan API
– Many data type specific idiosyncrasies
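The hashing and formatting concerns above can be illustrated with a small plain-JDK sketch. Everything here is an assumption made for illustration: the class `RowKeySketch`, the three key fields, the `|` delimiter, and the choice of a truncated MD5 prefix are hypothetical and do not describe CloudGraph's configuration-driven mechanism.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical sketch of composite row key assembly; not CloudGraph API.
final class RowKeySketch {
    static final char DELIM = '|';

    // Fixed-width hex hash of a variable-length field (first 4 bytes of MD5).
    static String hash(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                sb.append(String.format(Locale.ROOT, "%02x", digest[i]));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on every Java platform
        }
    }

    // Zero padding makes lexicographic order agree with numeric order
    // for non-negative values.
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    // Leading fields only: usable as the start of a partial-key scan.
    static String partialKey(String org, String type) {
        return hash(org) + DELIM + type + DELIM;
    }

    static String fullKey(String org, String type, long seq) {
        return partialKey(org, type) + pad(seq);
    }
}
```

Because every full key begins with its partial key, a scan that starts at `partialKey(org, type)` reaches all rows for that organization and type in sequence order. Negative numbers, dates, and floating point fields each need their own formatting rule, which is the kind of data type specific idiosyncrasy the slide refers to.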
12. CloudGraph HBase Composite Row Keys
A Configuration Driven Approach using SDO XPath
Diagram labels (CloudGraph): Composite Row Keys; Hierarchical Row Filters; Fuzzy Row Filter; Partial Key Assembly; Scan Support
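For readers unfamiliar with fuzzy row filtering: HBase's FuzzyRowFilter takes a row key pattern together with a mask in which 0 marks a byte that must match and 1 marks a byte that may hold any value. The sketch below reimplements only that matching rule in plain Java, without calling HBase. The class `FuzzyMatchSketch` is hypothetical, and treating rows shorter than the pattern as non-matching is a simplification assumed here.

```java
// Hypothetical, simplified illustration of fuzzy row key matching; not HBase code.
final class FuzzyMatchSketch {
    // mask[i] == 0: row byte i must equal pattern byte i.
    // mask[i] == 1: any byte is accepted at position i.
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (row.length < pattern.length) {
            return false; // simplification assumed in this sketch
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a key laid out as a 4-byte hash followed by a fixed suffix, a mask of four 1s followed by 0s selects every row with that suffix regardless of the hash, which is why fixed-width key fields matter for this kind of filter.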
13. Why More Complexity? 3.) Persistence Code Development, Refactoring and Maintenance
Small Domain Model (e.g. CML, 164 Entities): 95,000 Lines
“Average” Custom Domain Model (e.g. 300 Entities): 174,000 Lines
*Example from UML conversion from XML Schema of BIOXSD - see http://bioxsd.org/
**Example from UML adaptation of HL7 POCD/HD000040 Clinical Document
***Example from UML conversion from XML Schema of Chemical Markup Language 2.4 – see http://xmlcml.org
14. CloudGraph Code Generation
A contract-first approach in 4 steps
1. Leverage Existing or Create UML Model(s)
– Can be automatically reverse engineered from existing RDBMS Schema
2. Map Repository Namespaces to Service Configurations
3. Define and Map Row Keys To Data Graphs
4. Add CloudGraph and Plasma Maven Artifacts and Generate Code
15. Resources
• Exchange Model Examples
– https://github.com/cloudgraph/cml
– https://github.com/cloudgraph/bioxsd
– https://github.com/cloudgraph/hl7
• End To End Examples
– https://github.com/cloudgraph/wordnet
– http://wordnet.cloudgraph.org