The Ontology2 platform squares the circle between big data, low latency, and semantics by combining expressive reasoning, information retrieval, and machine learning with Hadoop, Apache Spark, and a cloud backed by solid-state drives.
2. OUR PLATFORM
For organizations handling complex, heterogeneous, and big data from
a large number of sources: structured, unstructured, and
semistructured.
We rapidly (in terms of computer time and configuration time)
combine, curate, and index your data, both in batch and in real time.
Based on our experience with Freebase (the basis for the Google
Knowledge Graph), we combine Hadoop technology with SQL and
NoSQL databases on next-generation cloud technology.
Focus: quality, usability, cross-domain integration and inference,
standards-driven interoperability, open-source components
3. Current State as we understand it
Technical: Need for extreme agility
• High-quality, curated data is important
• Limited by MySQL speed/scalability (and slow schema changes because of row store)
• Difficulty of handling taxonomy/ontology/schema changes
• Dealing with data loss and broken inter-concept links caused by changes
• Difficulty of linking entities between silos; inability to infer accurate, high-quality relationships
between collections
• Need for clean, normalized data for input to machine learning algorithms
• Need ability to manage spatial and temporal data
• To keep up with the competition, changes must be easy to make and fast to implement
• Need for data typing beyond SQL (currency, length, time interval, etc.) to support inference and
user interfaces
• Infrastructure built ad-hoc is difficult to document, maintain, expand
Business Challenges
• To be discussed
4. Benefits from cloud-native Infovore™ platform
• Index construction does not interfere with user-facing real-time services
• Development, Test and Staging do not interfere with production
• Batch jobs do not interfere with interactive services
5. Next Generation Cloud
Hardware Virtualization:
• Near Bare Metal Performance
SSD Drives:
• Incredible Speed
• Predictable Response Time
Hybrid cloud:
• Take advantage of competition between cloud providers
• Use existing on-premises capacity; control physical security, flexible options
7. Index Construction in Hybrid Cloud
New Index Construction Never Conflicts With Production
[Diagram along a time axis. Labels: Old index (multiple copies for throughput & availability); Source data; New Index; Test; Clone; Terminate and recover resources.]
8. Batch Index plus Real-Time Index
Effortless and efficient scalability
[Diagram. Labels: Message Queue; Bulk Data; time-stamped master data; small real-time index; large bulk index; merger; RESULTS.]
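The merger step that combines a small real-time index with a large bulk index can be sketched as follows. This is only an illustration under stated assumptions, not the platform's implementation: the `Record` type, the `merge_results` function, and the rule that the newest timestamp wins per key are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical index hit: a key, a payload, and a timestamp."""
    key: str
    value: str
    timestamp: int  # assumed to be comparable, e.g. seconds since epoch


def merge_results(bulk_hits, realtime_hits):
    """Merge hits from both indexes, keeping the newest record per key.

    Real-time hits are scanned last, so on a timestamp tie the
    real-time record replaces the bulk record.
    """
    newest = {}
    for record in list(bulk_hits) + list(realtime_hits):
        current = newest.get(record.key)
        if current is None or record.timestamp >= current.timestamp:
            newest[record.key] = record
    # Sort by key so the merged result has a stable order.
    return sorted(newest.values(), key=lambda r: r.key)


bulk = [Record("a", "old", 1), Record("b", "bulk", 5)]
realtime = [Record("a", "new", 2)]
merged = merge_results(bulk, realtime)
```

In this sketch the bulk index can be rebuilt at leisure while recent changes arriving through the message queue stay visible through the small index.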
9. New approach to data management
A FRAMEWORK FOR DATA QUALITY
• Multiple sources of instance data: Facts, classifications, Reference data…
• Examples: Test Data, Training Data
• Requirements: Quality metrics
10. WE DELIVER FAST CYCLE TIME
HYBRID CLOUD: No waiting for hardware
PARALLEL DATA PROCESSING: Handle large data sets quickly
DEVOPS AUTOMATION: Little system administration overhead
EFFICIENT DATA REPRESENTATION: Rapid turnaround, low hardware cost
COMPETITIVE ADVANTAGE
• MINIMIZE WASTED CYCLES: automation eliminates errors
• MINIMIZE TIME AROUND CYCLE
11. Ontology2 Spatial Hierarchy
Freebase data enriched for Language+Contextual Performance
Global coverage
30+ languages
250 countries
36,000 regions
1.5M names
400,000 cities & towns
8M names
Large alternative name bank + hierarchical constraint =
• Resolution of jurisdictions in international business listings
• Resolution of place names in free text
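How an alternative name bank and a hierarchical constraint work together can be sketched as follows. The tiny gazetteer, the identifiers, and the `resolve` function are hypothetical and invented for illustration; they are not the Ontology2 data or API.

```python
# Hypothetical miniature gazetteer: place id -> (canonical name, parent id).
PLACES = {
    "us": ("United States", None),
    "fr": ("France", None),
    "us-tx": ("Texas", "us"),
    "fr-idf": ("Île-de-France", "fr"),
    "paris-tx": ("Paris", "us-tx"),
    "paris-fr": ("Paris", "fr-idf"),
}

# Hypothetical alternative name bank: lowercased name -> candidate place ids.
ALT_NAMES = {
    "paris": {"paris-tx", "paris-fr"},
    "texas": {"us-tx"},
    "tx": {"us-tx"},
    "france": {"fr"},
    "usa": {"us"},
}


def ancestors(place_id):
    """Return the ids of all places that contain place_id, nearest first."""
    chain = []
    parent = PLACES[place_id][1]
    while parent is not None:
        chain.append(parent)
        parent = PLACES[parent][1]
    return chain


def resolve(name, context_names=()):
    """Look a name up in the name bank, then keep only candidates that
    lie inside at least one place named in the context."""
    candidates = ALT_NAMES.get(name.strip().lower(), set())
    context_ids = set()
    for context in context_names:
        context_ids |= ALT_NAMES.get(context.strip().lower(), set())
    if context_ids:
        candidates = {c for c in candidates if context_ids & set(ancestors(c))}
    return sorted(candidates)
```

With no context, `resolve("Paris")` returns both candidates; adding a containing place such as `"TX"` or `"France"` narrows the result to one, which is the effect the hierarchical constraint is meant to have on business listings and free text.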
12. Extensive Graph-Based Schema
META-MODEL SYSTEMATICALLY DESCRIBES PROCESSES AND THINGS
• RDFS: types + properties
• XML SCHEMA: Data types
• EXTENDED: Data types
• DECLARATIVE MAPPINGS: CSV, RDBMS, XML, …
• DECLARATIVE HINTS: formatting, editing, …
• LINGUISTIC + CONTEXTUAL: Knowledge Representation
SOLVES ISSUES, SEE SLIDE 3!
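A declarative mapping from CSV to typed, graph-style statements might look like the following sketch. Everything here is an assumption made for illustration: the `MAPPING` layout, the `ex:` prefix, the column names, and the `rows_to_triples` function are hypothetical and do not reproduce the platform's actual mapping language.

```python
import csv
import io
from decimal import Decimal

# Hypothetical declarative mapping: which column names the subject, which
# type each row gets, and how each column maps to a property and data type.
MAPPING = {
    "subject_column": "id",
    "type": "ex:Product",
    "columns": {
        "name": ("ex:name", str),
        "price_usd": ("ex:price", Decimal),  # exact decimal instead of float
    },
}


def rows_to_triples(csv_text, mapping):
    """Turn CSV rows into (subject, predicate, object) tuples as described
    by the mapping, converting each value to its declared Python type."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "ex:" + row[mapping["subject_column"]]
        triples.append((subject, "rdf:type", mapping["type"]))
        for column, (predicate, convert) in mapping["columns"].items():
            triples.append((subject, predicate, convert(row[column])))
    return triples


sample = "id,name,price_usd\np1,Widget,9.99\n"
triples = rows_to_triples(sample, MAPPING)
```

Because the mapping is data rather than code, a schema change in this sketch means editing `MAPPING`, not rewriting the loader.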
14. MODEL-DRIVEN ARCHITECTURE
HANDLING CONTENT AND DATA WITH CONTEXTUAL UNDERSTANDING
[Diagram. Labels: RAW DATA; Event-driven real-time pipeline; batch pipeline; MERGED PRODUCTION INDEX; applications.]
15. SUMMARY
For organizations handling complex, heterogeneous, and big data from
a large number of sources: structured, unstructured, and
semistructured.
We rapidly (in terms of computer time and configuration time)
combine, curate, and index your data, both in batch and in real time.
Based on our experience with Freebase (the basis for the Google
Knowledge Graph), we combine Hadoop technology with SQL and
NoSQL databases on next-generation cloud technology.
Focus: quality, usability, cross-domain integration and inference,
standards-driven interoperability, open-source components
Bill Freeman, President KMSolutions
william.freeman3@outlook.com (774) 301-1301