Managing “Big Data” Application Complexity using CloudGraph®
(for columnar data store client applications)

Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever-increasing demand for domain model complexity

Scott Cinnamond, TerraMeta Software Inc.
http://cloudgraph.org

Complexity Increases With Added Data Model Entities
[Chart: complexity (vertical axis) rising with the number of model entities / classes (horizontal axis)]

Why More App Complexity?
(with Added Data Model
Entities)

1. Column Mapping Difficult
2. Composite Row Key Mapping, Hashing,
Salting and Formatting
3. Persistence Code Development,
Refactoring and Maintenance
Typical Column Mapping
Strategies
• Hard Coded Names Embedded in Source Code
– Not good 

• Column Names in Java Constants File(s)
– Better, but still really hard coded
– Feasible with 5-10 entities, 50 attributes
– With 500-1000 entities and 5000+ attributes? Not
maintainable

• Custom XML Configuration
– Create a “meta model” using, say, XML Schema and JAXB
– Construct unique names and refer to them in source
– Better, but an application-specific “one-off”
– Does not solve “state” management challenges
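
As a rough illustration of the "Java constants file" strategy above, the sketch below keeps column names in one constants class. Every name in it (the `f1` family, the `mol_*` qualifiers, the `Molecule`-style entity) is hypothetical, and a plain map stands in for the cells of an HBase row so that the example has no dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical constants file: every persisted attribute needs an entry here,
// and every entry has to be kept in sync with the CRUD code by hand.
final class MoleculeColumns {
    static final String FAMILY = "f1";
    static final String ID = "mol_id";
    static final String TITLE = "mol_title";
    static final String FORMULA = "mol_formula";

    private MoleculeColumns() {}
}

final class ColumnConstantsSketch {
    // Builds a "family:qualifier" -> value map standing in for one row's cells.
    static Map<String, byte[]> toCells(String id, String title, String formula) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        cells.put(MoleculeColumns.FAMILY + ":" + MoleculeColumns.ID, bytes(id));
        cells.put(MoleculeColumns.FAMILY + ":" + MoleculeColumns.TITLE, bytes(title));
        cells.put(MoleculeColumns.FAMILY + ":" + MoleculeColumns.FORMULA, bytes(formula));
        return cells;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, byte[]> e : toCells("m1", "water", "H2O").entrySet()) {
            System.out.println(e.getKey() + " = "
                    + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

With three attributes this is manageable; the same pattern repeated across thousands of attributes is the maintenance problem the slide describes.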
CloudGraph Column Mapping
A Standards Based Approach Using SDO and UML

CloudGraph Stateful Column Key Factories
• Marshalling
• Row Key Mapping
• Entity ID Mapping
• Sequence Management
• Data Graph “State”
Great, But How Do We Keep Column Names Entirely Out of CRUD Source Code?
• Create | Update | Delete: CloudGraph SDO API (Service Data Objects)
• Read (Query): CloudGraph Query DSL (Domain Specific Language)
CloudGraph SDO
Your complex domain model as a (create | update | delete) API
• Drives all Column Mapping Transparently
• Granular Control over Data Graph Edits
• Convenient “Create Entity” Factory Methods
• Change Tracking Including History
• Rich Built In Data Types
• 100% Compile Time Checking
• Supports Multiple Inheritance Models
• Currently Uses PlasmaSDO™
– See http://plasma-sdo.org
CloudGraph SDO API Example
Uses Chemical Markup Language (CML) 2.4
https://github.com/cloudgraph/cml
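
To make the "change tracking including history" idea concrete, here is a minimal plain-Java sketch. It is not the CloudGraph or PlasmaSDO API: `TrackedObject`, `set`, `get`, and `changes` are hypothetical names, and properties are keyed by string here, whereas a generated API would presumably expose typed setters to get compile-time checking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical data object that records every edit made through set().
final class TrackedObject {
    // One recorded edit: which property changed, from what, to what.
    static final class Change {
        final String property;
        final Object oldValue;
        final Object newValue;

        Change(String property, Object oldValue, Object newValue) {
            this.property = property;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    private final Map<String, Object> values = new LinkedHashMap<>();
    private final List<Change> history = new ArrayList<>();

    // Stores the value and appends to the history only if it actually changed.
    void set(String property, Object value) {
        Object old = values.put(property, value);
        if (!Objects.equals(old, value)) {
            history.add(new Change(property, old, value));
        }
    }

    Object get(String property) {
        return values.get(property);
    }

    // Read-only view of the edits, oldest first.
    List<Change> changes() {
        return Collections.unmodifiableList(history);
    }

    public static void main(String[] args) {
        TrackedObject molecule = new TrackedObject();
        molecule.set("title", "water");
        molecule.set("title", "Water");
        for (Change c : molecule.changes()) {
            System.out.println(c.property + ": " + c.oldValue + " -> " + c.newValue);
        }
    }
}
```

A persistence layer can walk such a change list at commit time and write only the modified cells, which is one way the application code can stay free of column names.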
CloudGraph Query DSL
Your complex domain model as a query API
• Drives all Column Mapping Transparently
• Intuitive Almost “Fluent” English
Appearance
• Logical Entity, Attribute Names Generated
into API
• 100% Compile Time Checking
• Currently Uses PlasmaQuery®
– See http://plasma-query.org
CloudGraph Query DSL Example
Uses Chemical Markup Language (CML) 2.4

https://github.com/cloudgraph/cml
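
The sketch below illustrates, under heavy assumptions, what "logical entity and attribute names generated into the API" can look like. It is not PlasmaQuery or CloudGraph code: `Molecule`, `QMolecule`, and its methods are hypothetical, and the query runs against an in-memory list rather than a data store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical entity with a few logical attributes.
final class Molecule {
    final String id;
    final String title;
    final int atomCount;

    Molecule(String id, String title, int atomCount) {
        this.id = id;
        this.title = title;
        this.atomCount = atomCount;
    }
}

// Hypothetical stand-in for a generated query class. Attribute names are
// methods, so a misspelled attribute is a compile error, not a runtime one.
final class QMolecule {
    private final List<Predicate<Molecule>> filters = new ArrayList<>();

    static QMolecule newQuery() {
        return new QMolecule();
    }

    QMolecule titleLike(String fragment) {
        filters.add(m -> m.title.contains(fragment));
        return this;
    }

    QMolecule atomCountGreaterThan(int n) {
        filters.add(m -> m.atomCount > n);
        return this;
    }

    // Applies all filters; a real implementation would translate them to scans.
    List<Molecule> run(List<Molecule> source) {
        return source.stream()
                .filter(m -> filters.stream().allMatch(f -> f.test(m)))
                .collect(Collectors.toList());
    }
}
```

A call then reads close to English, for example `QMolecule.newQuery().titleLike("water").atomCountGreaterThan(2).run(molecules)`, and contains no physical column names.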
Why More Complexity?
2.) Composite Row Key Mapping,
Hashing and Formatting
• More Model Entities:
– Larger data graphs
– More composite row key fields are needed to find graphs
– How to reliably map “deep” into graphs

• Row Key Field Hashing and Formatting
– Critical for HBase partial-key scan API
– Many data type specific idiosyncrasies
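
A minimal sketch of why field formatting matters for partial-key scans, assuming a string row key with a `|` delimiter, a two-digit salt, and fixed-width numeric fields. The layout, the bucket count, and all names are illustrative assumptions, not CloudGraph's key format.

```java
import java.util.Locale;

final class RowKeySketch {
    static final char DELIM = '|';
    static final int SALT_BUCKETS = 16;

    // Zero-pad so that lexicographic order matches numeric order
    // (valid for non-negative values that fit in the given width).
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need another encoding");
        }
        if (Long.toString(value).length() > width) {
            throw new IllegalArgumentException("value wider than field");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Salt derived from the leading field: spreads owners across buckets
    // while keeping all rows of one owner under the same prefix.
    static String salt(String leadingField) {
        int bucket = Math.floorMod(leadingField.hashCode(), SALT_BUCKETS);
        return String.format(Locale.ROOT, "%02d", bucket);
    }

    static String rowKey(String owner, long sequence) {
        return scanPrefix(owner) + pad(sequence, 10);
    }

    // Leading part of the key, usable as the start of a partial-key scan.
    static String scanPrefix(String owner) {
        return salt(owner) + DELIM + owner + DELIM;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("cml", 9));
        System.out.println(rowKey("cml", 10));
    }
}
```

Without padding, the key for sequence 10 would sort before the key for sequence 9, which is one of the data-type idiosyncrasies that makes hand-written key code error-prone.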
CloudGraph HBase Composite Row Keys
A Configuration Driven Approach using SDO XPath

CloudGraph Composite Row Keys
• Hierarchical Row Filters
• Fuzzy Row Filter
• Partial Key Assembly
• Scan Support
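
One common way to turn an assembled partial key into scan bounds is to use the prefix as the start row and the "next" prefix as the exclusive stop row. The sketch below shows that computation in plain Java; it is a general technique, not taken from CloudGraph, and `stopRowFor` and `inScan` are hypothetical helper names.

```java
import java.util.Arrays;

final class PrefixScanSketch {
    // Smallest key strictly greater than every key that starts with prefix.
    // An empty result means "no upper bound" (prefix was empty or all 0xFF).
    static byte[] stopRowFor(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--; // trailing 0xFF bytes cannot be incremented, drop them
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    // Unsigned lexicographic comparison, the order row keys are sorted in.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if key falls in [prefix, stopRowFor(prefix)).
    static boolean inScan(byte[] key, byte[] prefix) {
        byte[] stop = stopRowFor(prefix);
        return compareUnsigned(key, prefix) >= 0
                && (stop.length == 0 || compareUnsigned(key, stop) < 0);
    }
}
```

Every key that begins with the prefix falls inside these bounds and every other key falls outside, so the scan touches only the rows for the requested partial key.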
Why More Complexity?
3.) Persistence Code Development,
Refactoring and Maintenance

Small Domain Model (e.g. CML, 164 Entities): 95,000 Lines
“Average” Custom Domain Model (e.g. 300 Entities): 174,000 Lines
*Example from UML conversion from XML Schema of BIOXSD - see http://bioxsd.org/
**Example from UML adaptation of HL7 POCD/HD000040 Clinical Document
***Example from UML conversion from XML Schema of Chemical Markup Language 2.4 – see http://xmlcml.org
CloudGraph Code Generation
A contract-first approach in 4 steps
1. Leverage Existing or Create UML Model(s)
– Can be automatically reverse engineered from existing RDBMS Schema
2. Map Repository Namespaces to Service Configurations
3. Define and Map Row Keys to Data Graphs
4. Add CloudGraph and Plasma Maven Artifacts and Generate Code
Resources
• Exchange Model Examples
– https://github.com/cloudgraph/cml
– https://github.com/cloudgraph/bioxsd
– https://github.com/cloudgraph/hl7

• End To End Examples
– https://github.com/cloudgraph/wordnet
– http://wordnet.cloudgraph.org
Status/Legal
• Project Status
– CloudGraph® is currently in private beta testing
– Other services for Cassandra, MongoDB and others are
under analysis
– See http://cloudgraph.org for contact info and other details

• Licensing
– CloudGraph® 0.5.5 Community Edition (CE) is open source
licensed under version 2 of the GNU General Public License

• Trademarks
– CloudGraph® is a registered trademark of TerraMeta
Software LLC
– Java™ is a trademark of Oracle Corporation
– HBase™ is a trademark of Apache Software Foundation
Copyright © TerraMeta Software, Inc – 2012,2013 – All Rights Reserved
References
• BIOXSD – http://bioxsd.org
• Chemical Markup Language (CML) – http://xmlcml.org
• Health Level 7 (HL7) – http://hl7.org
• Apache HBase™ – http://hbase.apache.org
• Apache Cassandra – http://cassandra.apache.org
• MongoDB – http://www.mongodb.org
• PlasmaSDO™ – http://plasma-sdo.org, http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22plasma-sdo%22

Weitere ähnliche Inhalte

Was ist angesagt?

Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Storesconfluent
 
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...HostedbyConfluent
 
gobblin-meetup-yarn
gobblin-meetup-yarngobblin-meetup-yarn
gobblin-meetup-yarnYinan Li
 
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowContinuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowDatabricks
 
How web works and browser works ? (behind the scenes)
How web works and browser works ? (behind the scenes)How web works and browser works ? (behind the scenes)
How web works and browser works ? (behind the scenes)Vibhor Grover
 
Object- Relational Persistence in Smalltalk
Object- Relational Persistence in SmalltalkObject- Relational Persistence in Smalltalk
Object- Relational Persistence in SmalltalkESUG
 
Data Science Salon: A Journey of Deploying a Data Science Engine to Production
Data Science Salon: A Journey of Deploying a Data Science Engine to ProductionData Science Salon: A Journey of Deploying a Data Science Engine to Production
Data Science Salon: A Journey of Deploying a Data Science Engine to ProductionFormulatedby
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking VN
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBArangoDB Database
 
Big Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformBig Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformNavneet Gupta
 
GraphQL - A love story
GraphQL -  A love storyGraphQL -  A love story
GraphQL - A love storybwullems
 
An E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseAn E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseArangoDB Database
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streamingdatamantra
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...HostedbyConfluent
 
Complex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseComplex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseMax Neunhöffer
 
TypeSafe NoSQL @ TopConf 2012
TypeSafe NoSQL @ TopConf 2012TypeSafe NoSQL @ TopConf 2012
TypeSafe NoSQL @ TopConf 2012Maciek Próchniak
 
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...Databricks
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streamsconfluent
 

Was ist angesagt? (20)

Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
Developing a custom Kafka connector? Make it shine! | Igor Buzatović, Porsche...
 
gobblin-meetup-yarn
gobblin-meetup-yarngobblin-meetup-yarn
gobblin-meetup-yarn
 
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflowContinuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
 
How web works and browser works ? (behind the scenes)
How web works and browser works ? (behind the scenes)How web works and browser works ? (behind the scenes)
How web works and browser works ? (behind the scenes)
 
Object- Relational Persistence in Smalltalk
Object- Relational Persistence in SmalltalkObject- Relational Persistence in Smalltalk
Object- Relational Persistence in Smalltalk
 
Data Science Salon: A Journey of Deploying a Data Science Engine to Production
Data Science Salon: A Journey of Deploying a Data Science Engine to ProductionData Science Salon: A Journey of Deploying a Data Science Engine to Production
Data Science Salon: A Journey of Deploying a Data Science Engine to Production
 
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big...
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDB
 
Big Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data PlatformBig Data Ingestion @ Flipkart Data Platform
Big Data Ingestion @ Flipkart Data Platform
 
GraphQL - A love story
GraphQL -  A love storyGraphQL -  A love story
GraphQL - A love story
 
An E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseAn E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model Database
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Complex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseComplex queries in a distributed multi-model database
Complex queries in a distributed multi-model database
 
TypeSafe NoSQL @ TopConf 2012
TypeSafe NoSQL @ TopConf 2012TypeSafe NoSQL @ TopConf 2012
TypeSafe NoSQL @ TopConf 2012
 
MediaWiki for ALM
MediaWiki for ALMMediaWiki for ALM
MediaWiki for ALM
 
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...High Performance Transfer Learning for Classifying Intent of Sales Engagement...
High Performance Transfer Learning for Classifying Intent of Sales Engagement...
 
HTAP Queries
HTAP QueriesHTAP Queries
HTAP Queries
 
Streaming all over the world Real life use cases with Kafka Streams
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streams
 

Andere mochten auch

Rich Data Graphs for MapReduce
Rich Data Graphs for MapReduceRich Data Graphs for MapReduce
Rich Data Graphs for MapReduceScott Cinnamond
 
EA Talk on Managing Complexity
EA Talk on Managing ComplexityEA Talk on Managing Complexity
EA Talk on Managing ComplexityRichard Veryard
 
Managing Complexity Across Today’s Application Delivery Chain:Six key indicat...
Managing Complexity Across Today’s Application Delivery Chain:Six key indicat...Managing Complexity Across Today’s Application Delivery Chain:Six key indicat...
Managing Complexity Across Today’s Application Delivery Chain:Six key indicat...Compuware APM
 
Balancing model performance and complexity in real-world analytics applications
Balancing model performance and complexity in real-world analytics applicationsBalancing model performance and complexity in real-world analytics applications
Balancing model performance and complexity in real-world analytics applicationsState Street
 
Data Management Dilemma - SIZE vs COMPLEXITY
Data Management Dilemma - SIZE vs COMPLEXITYData Management Dilemma - SIZE vs COMPLEXITY
Data Management Dilemma - SIZE vs COMPLEXITYReinhold Thurner
 
Complex User Interfaces Don't Need to Be...Complex
Complex User Interfaces Don't Need to Be...ComplexComplex User Interfaces Don't Need to Be...Complex
Complex User Interfaces Don't Need to Be...ComplexGfK User Centric
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 StepsScott Cinnamond
 
Effort estimation for web applications
Effort estimation for web applicationsEffort estimation for web applications
Effort estimation for web applicationsNagaraja Gundappa
 
Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8Traian Rebedea
 
The Future of Applications: Three Strategies for the High-velocity, Software-...
The Future of Applications: Three Strategies for the High-velocity, Software-...The Future of Applications: Three Strategies for the High-velocity, Software-...
The Future of Applications: Three Strategies for the High-velocity, Software-...Accenture Technology
 
Developing applications with a microservice architecture (SVforum, microservi...
Developing applications with a microservice architecture (SVforum, microservi...Developing applications with a microservice architecture (SVforum, microservi...
Developing applications with a microservice architecture (SVforum, microservi...Chris Richardson
 

Andere mochten auch (12)

Rich Data Graphs for MapReduce
Rich Data Graphs for MapReduceRich Data Graphs for MapReduce
Rich Data Graphs for MapReduce
 
EA Talk on Managing Complexity
EA Talk on Managing ComplexityEA Talk on Managing Complexity
EA Talk on Managing Complexity
 
Managing Complexity Across Today’s Application Delivery Chain:Six key indicat...
Managing Complexity Across Today’s Application Delivery Chain:Six key indicat...Managing Complexity Across Today’s Application Delivery Chain:Six key indicat...
Managing Complexity Across Today’s Application Delivery Chain:Six key indicat...
 
Balancing model performance and complexity in real-world analytics applications
Balancing model performance and complexity in real-world analytics applicationsBalancing model performance and complexity in real-world analytics applications
Balancing model performance and complexity in real-world analytics applications
 
Preview of guideline
Preview of guidelinePreview of guideline
Preview of guideline
 
Data Management Dilemma - SIZE vs COMPLEXITY
Data Management Dilemma - SIZE vs COMPLEXITYData Management Dilemma - SIZE vs COMPLEXITY
Data Management Dilemma - SIZE vs COMPLEXITY
 
Complex User Interfaces Don't Need to Be...Complex
Complex User Interfaces Don't Need to Be...ComplexComplex User Interfaces Don't Need to Be...Complex
Complex User Interfaces Don't Need to Be...Complex
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 Steps
 
Effort estimation for web applications
Effort estimation for web applicationsEffort estimation for web applications
Effort estimation for web applications
 
Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8
 
The Future of Applications: Three Strategies for the High-velocity, Software-...
The Future of Applications: Three Strategies for the High-velocity, Software-...The Future of Applications: Three Strategies for the High-velocity, Software-...
The Future of Applications: Three Strategies for the High-velocity, Software-...
 
Developing applications with a microservice architecture (SVforum, microservi...
Developing applications with a microservice architecture (SVforum, microservi...Developing applications with a microservice architecture (SVforum, microservi...
Developing applications with a microservice architecture (SVforum, microservi...
 

Ähnlich wie Managing "Big Data" Application Complexity with CloudGraph

Spring data presentation
Spring data presentationSpring data presentation
Spring data presentationOleksii Usyk
 
Onion Architecture with S#arp
Onion Architecture with S#arpOnion Architecture with S#arp
Onion Architecture with S#arpGary Pedretti
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Databricks
 
With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?Dan Sullivan, Ph.D.
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDataWorks Summit
 
Tutorial Expert How-To - Docker-based automation
Tutorial Expert How-To - Docker-based automationTutorial Expert How-To - Docker-based automation
Tutorial Expert How-To - Docker-based automationPascalDesmarets1
 
Tutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaborationTutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaborationPascalDesmarets1
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...Insight Technology, Inc.
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDatabricks
 
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...Rustem Feyzkhanov
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformChester Chen
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Cloud Migration and Portability Best Practices
Cloud Migration and Portability Best PracticesCloud Migration and Portability Best Practices
Cloud Migration and Portability Best PracticesRightScale
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Jonathan Seidman
 
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...Flink Forward
 
gRPC, GraphQL, REST - Which API Tech to use - API Conference Berlin oct 20
gRPC, GraphQL, REST - Which API Tech to use - API Conference Berlin oct 20gRPC, GraphQL, REST - Which API Tech to use - API Conference Berlin oct 20
gRPC, GraphQL, REST - Which API Tech to use - API Conference Berlin oct 20Phil Wilkins
 
Tutorial Expert How-To - Command Line Interface (CLI)
Tutorial Expert How-To - Command Line Interface (CLI)Tutorial Expert How-To - Command Line Interface (CLI)
Tutorial Expert How-To - Command Line Interface (CLI)PascalDesmarets1
 

Ähnlich wie Managing "Big Data" Application Complexity with CloudGraph (20)

Spring data presentation
Spring data presentationSpring data presentation
Spring data presentation
 
Onion Architecture with S#arp
Onion Architecture with S#arpOnion Architecture with S#arp
Onion Architecture with S#arp
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 
With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the fly
 
Tutorial Expert How-To - Docker-based automation
Tutorial Expert How-To - Docker-based automationTutorial Expert How-To - Docker-based automation
Tutorial Expert How-To - Docker-based automation
 
Tutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaborationTutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaboration
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
 
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
DataTalks.Club - Building Scalable End-to-End Deep Learning Pipelines in the ...
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Cloud Migration and Portability Best Practices
Cloud Migration and Portability Best PracticesCloud Migration and Portability Best Practices
Cloud Migration and Portability Best Practices
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
No sql
No sqlNo sql
No sql
 
Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017
 
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...
 
gRPC, GraphQL, REST - Which API Tech to use - API Conference Berlin oct 20
gRPC, GraphQL, REST - Which API Tech to use - API Conference Berlin oct 20gRPC, GraphQL, REST - Which API Tech to use - API Conference Berlin oct 20
gRPC, GraphQL, REST - Which API Tech to use - API Conference Berlin oct 20
 
Tutorial Expert How-To - Command Line Interface (CLI)
Tutorial Expert How-To - Command Line Interface (CLI)Tutorial Expert How-To - Command Line Interface (CLI)
Tutorial Expert How-To - Command Line Interface (CLI)
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Managing "Big Data" Application Complexity with CloudGraph

  • 1. Managing “Big Data” Application Complexity using CloudGraph®
      Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever-increasing demand for domain model complexity
      Scott Cinnamond, TerraMeta Software Inc.
      http://cloudgraph.org
  • 2. Complexity Increases With Added Data Model Entities (for columnar data store client applications)
      [Chart: Complexity plotted against #Model Entities / Classes]
  • 3. Why More App Complexity? (with Added Data Model Entities)
      1. Column Mapping Difficult
      2. Composite Row Key Mapping, Hashing, Salting and Formatting
      3. Persistence Code Development, Refactoring and Maintenance
  • 4. Typical Column Mapping Strategies
      • Hard Coded Names Embedded in Source Code
        – Not good
      • Column Names in Java Constants File(s)
        – Better, but still really hard coded
        – Feasible with 5-10 entities, 50 attributes
        – With 500-1000 entities and 5000+ attributes? Not maintainable
      • Custom XML Configuration
        – Create a “meta model” using, say, XML Schema and JAXB
        – Construct unique names and refer to them in source
        – Better, but an application-specific “one off”
        – Does not solve “state” management challenges
  • 5. CloudGraph Column Mapping: A Standards Based Approach Using SDO and UML
      [Diagram labels: CloudGraph Stateful Column Key Factories; Marshalling; Row Key Mapping; Entity ID Mapping; Sequence Management; Data Graph “State”]
  • 6. Great, But How Do We Keep Column Names Entirely Out Of CRUD Source Code?
      • Create | Update | Delete: CloudGraph SDO API (Service Data Objects)
      • Read (Query): CloudGraph Query DSL (Domain Specific Language)
  • 7. CloudGraph SDO: Your complex domain model as a (create | update | delete) API
      • Drives all Column Mapping Transparently
      • Granular Control over Data Graph Edits
      • Convenient “Create Entity” Factory Methods
      • Change Tracking Including History
      • Rich Built In Data Types
      • 100% Compile Time Checking
      • Supports Multiple Inheritance Models
      • Currently Uses PlasmaSDO™
        – See http://plasma-sdo.org
  • 8. CloudGraph SDO API Example
      Uses Chemical Markup Language (CML) 2.4: https://github.com/cloudgraph/cml
  • 9. CloudGraph Query DSL: Your complex domain model as a query API
      • Drives all Column Mapping Transparently
      • Intuitive, Almost “Fluent” English Appearance
      • Logical Entity, Attribute Names Generated into API
      • 100% Compile Time Checking
      • Currently Uses PlasmaQuery®
        – See http://plasma-query.org
  • 10. CloudGraph Query DSL Example
      Uses Chemical Markup Language (CML) 2.4: https://github.com/cloudgraph/cml
  • 11. Why More Complexity? 2.) Composite Row Key Mapping, Hashing and Formatting
      • More Model Entities:
        – Larger data graphs
        – More composite row key fields needed to find graphs
        – How to reliably map “deep” into graphs
      • Row Key Field Hashing and Formatting
        – Critical for HBase partial-key scan API
        – Many data type specific idiosyncrasies
  • 12. CloudGraph HBase Composite Row Keys: A Configuration Driven Approach using SDO XPath
      [Diagram labels: CloudGraph Composite Row Keys; Hierarchical Row Filters; Fuzzy Row Filter Assembly; Partial Key Scan Support]
  • 13. Why More Complexity? 3.) Persistence Code Development, Refactoring and Maintenance
      • Small Domain Model (e.g. CML, 164 Entities): 95,000 Lines
      • “Average” Custom Domain Model (e.g. 300 Entities): 174,000 Lines
      *Example from UML conversion from XML Schema of BIOXSD, see http://bioxsd.org/
      **Example from UML adaptation of HL7 POCD/HD000040 Clinical Document
      ***Example from UML conversion from XML Schema of Chemical Markup Language 2.4, see http://xmlcml.org
  • 14. CloudGraph Code Generation: A contract-first approach in 4 steps
      1. Leverage Existing or Create UML Model(s)
         – Can be automatically reverse engineered from existing RDBMS Schema
      2. Map Repository Namespaces to Service Configurations
      3. Define and Map Row Keys To Data Graphs
      4. Add CloudGraph and Plasma Maven Artifacts and Generate Code
  • 15. Resources
      • Exchange Model Examples
        – https://github.com/cloudgraph/cml
        – https://github.com/cloudgraph/bioxsd
        – https://github.com/cloudgraph/hl7
      • End To End Examples
        – https://github.com/cloudgraph/wordnet
        – http://wordnet.cloudgraph.org
  • 16. Status/Legal
      • Project Status
        – CloudGraph® is currently in private beta testing
        – Other services for Cassandra, MongoDB and others are under analysis
        – See http://cloudgraph.org for contact info and other details
      • Licensing
        – CloudGraph® 0.5.5 Community Edition (CE) is open source licensed under version 2 of the GNU General Public License
      • Trademarks
        – CloudGraph® is a registered trademark of TerraMeta Software LLC
        – Java™ is a trademark of Oracle Corporation
        – HBase™ is a trademark of Apache Software Foundation
      Copyright © TerraMeta Software, Inc – 2012, 2013 – All Rights Reserved
  • 17. References
      • BIOXSD – http://bioxsd.org
      • Chemical Markup Language (CML) – http://xmlcml.org
      • Health Level 7 (HL7) – http://hl7.org
      • Apache HBase™ – http://hbase.apache.org
      • Apache Cassandra – http://cassandra.apache.org
      • MongoDB – http://www.mongodb.org
      • PlasmaSDO™ – http://plasma-sdo.org, http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22plasma-sdo%22
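
The composite row key concerns raised on slides 11 and 12 (salting, formatting and partial-key scans) can be illustrated with a minimal hand-written sketch. It is not CloudGraph code and does not use the HBase client API; the class `RowKeySketch`, the 16-bucket salt, the `|` delimiter and the `org`/`id` fields are all assumptions made for illustration. It shows the kind of per-field logic an application has to maintain by hand when row keys are not configuration driven.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of CloudGraph or HBase. */
final class RowKeySketch {
    private static final int SALT_BUCKETS = 16; // assumed number of salt buckets
    private static final char DELIM = '|';      // assumed field delimiter

    private RowKeySketch() {}

    /** Two-digit salt derived from the leading field, so keys spread across buckets. */
    public static String salt(String leadingField) {
        int bucket = Math.floorMod(leadingField.hashCode(), SALT_BUCKETS);
        return String.format(Locale.ROOT, "%02d", bucket);
    }

    /** Zero-pads a non-negative number so that byte order matches numeric order. */
    public static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    /** Full composite key: salt | org | zero-padded id. */
    public static byte[] rowKey(String org, long id) {
        String key = salt(org) + DELIM + org + DELIM + pad(id, 10);
        return key.getBytes(StandardCharsets.UTF_8);
    }

    /** Inclusive start row of a partial-key scan over every id under one org. */
    public static byte[] scanStart(String org) {
        return (salt(org) + DELIM + org + DELIM).getBytes(StandardCharsets.UTF_8);
    }

    /** Exclusive stop row: the start prefix with its final byte incremented. */
    public static byte[] scanStop(String org) {
        byte[] stop = scanStart(org);
        stop[stop.length - 1]++; // '|' (0x7C) becomes '}' (0x7D); cannot overflow for this delimiter
        return stop;
    }
}
```

In a real client the two byte arrays from `scanStart` and `scanStop` would be supplied as the start and stop rows of a scan. Because the salt is derived from the leading field, a reader can recompute it from the query value alone. Every additional key field or data type adds another formatting rule of this kind, which is the maintenance burden the slides describe.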