A current challenge in implementing NoSQL databases is how to run advanced rule-based analytics on large tables and key-value stores, where metadata is often sparse. Graph databases and triple stores make good use of metadata, but are often computationally inefficient compared to NoSQL stores. To address this problem, Modus Operandi will showcase a Predicate Store inside its MOVIA product that can run advanced first-order logical rule sets and queries directly against large tables or column stores, providing scalable, rapid and advanced data analytics for cloud applications. This provides graph-level complexity of content with the performance and scalability of NoSQL approaches. The system also allows statistical algorithms and logic-based rule sets to run concurrently, so that a host of parallel analytics can run at once, providing deep analysis over many important pattern types.
Reasoning over big data
1. Reasoning Over Big Data Stores
Eric Little, PhD
VP Data Science
Polytechnic School of Engineering - NYU
eric.little@osthus.com
2. Slide 2
Who We Are & What We Do
OSTHUS, Inc. is the U.S.
subsidiary of OSTHUS GmbH
Global presence - offices in
Germany, U.S. & China
Provide advanced solutions,
consulting and technology
services for Pharmaceutical and
Biotech R&D
Technology provider for the
Allotrope effort, globally aligning
several pharma and biotech
companies
3. Slide 3
Semantic Technologies – Smart Data Piece
Semantic Technologies
Provide several important features for emerging new technologies
• Controlled vocabularies
• Taxonomies
• Metadata structures
• Ontology models
• Logical inference
Data today continues to evolve and grow in both size and complexity.
We need hybrid solutions that can provide real insights
Analytics is growing into a new kind of field – Data Science
Is data science about interacting with machines or humans?
Must be able to strike a balance between complexity of the data and
simplicity of the presentation to the user
4. Slide 4
Metadata, Reference Data & Master Data
• While often lumped together, these are distinct kinds of data
• Semantic Technologies can help with the organization of these
kinds of data – but should not be done in isolation
• Scalability is achieved using complementary approaches
Increased conceptual complexity
Increased scalability issues
5. Slide 5
Graphs are good for information –
not so good for high-bandwidth
applications where speed and
scalability are the primary drivers.
Can require highly specialized
hardware, software techniques or
engineers
Semantics should be confined to
the metadata aspects of the
problem – use other tech for the
rest
Where Semantics Can Fall Short
6. Slide 6
Big Data is a real challenge –
but is starting to become a
buzzword
Many “Big Data Problems”
can be reduced to smaller
data problems
Applications exist that require
complex inferencing over very
large data sets
A current client has lab
readings from 40,000+
devices
How to do this effectively?
The Big Data Problem
7. Slide 7
Why Not Just Build the Data Lake?
Data lakes are fine when you
are gathering and storing the
data
What happens later on when
a lot of data is in there?
The benefits are that data can
stay in its original form – no
real ETL
But running analytics across
disparate stores is very
challenging
“Without metadata, every
subsequent use of data means
analysts start from scratch.”
(Gartner 2014)
8. Slide 8
Reasoning Over Big Data Is A Growing Topic
There has been an inordinate amount of time and energy spent on
just queries.
This is not reasoning though – it is just retrieval
What is Reasoning?
More than just automated query sets run in sequence or parallel
Reasoning is about inferring new information that isn’t in the raw data.
It is a heuristic – where one discovers or learns something new for
themselves
Deductive, Inductive, Abductive
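The difference between retrieval and reasoning can be illustrated with a minimal sketch. The triples, the `is_a` relation and the function names below are hypothetical, chosen only for illustration; the example covers the deductive case only.

```python
# Hypothetical stored facts: (subject, relation, object)
STORED = {
    ("HPLC", "is_a", "Chromatograph"),
    ("Chromatograph", "is_a", "LabDevice"),
}

def retrieve(triples, subject):
    """Retrieval: return only the parents that are explicitly stored."""
    return {o for s, rel, o in triples if s == subject and rel == "is_a"}

def infer_is_a(triples, subject):
    """Deductive reasoning: follow is_a transitively to derive
    classifications that are not stored as raw facts."""
    found = set()
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        for parent in retrieve(triples, current):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found

stored_parents = retrieve(STORED, "HPLC")
inferred_parents = infer_is_a(STORED, "HPLC")
```

Here `retrieve` only reports that an HPLC is a Chromatograph, while `infer_is_a` also derives that it is a LabDevice, a fact that appears nowhere in the stored data.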
9. Slide 9
Logical Reasoning (does
not always assume set
theory)
Mathematical Reasoning
(which is logical
reasoning, but assumes
set theory as the basis)
Types of Reasoning One Can Use
11. Slide 11
Types of Semantic Inference (Forward and
Backward Chaining)
Backward chaining: uses Modus Ponens
Finds a true consequent and
affirms the related antecedent
(verifies connection)
Forward chaining: uses Modus Ponens
Finds a true antecedent & affirms a
related consequent (new
knowledge)
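The two chaining directions can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (propositional facts as strings, rules as antecedent-set/consequent pairs); the rule and fact names are hypothetical and are not taken from any product mentioned in this talk.

```python
# Each rule: (set of antecedents, consequent). Names are illustrative only.
RULES = [
    ({"reading_above_limit", "device_calibrated"}, "sample_out_of_spec"),
    ({"sample_out_of_spec"}, "batch_needs_review"),
]

def forward_chain(facts, rules):
    """Start from known facts and apply modus ponens until nothing new
    can be derived. Returns the original facts plus all derived ones."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= known and consequent not in known:
                known.add(consequent)
                changed = True
    return known

def backward_chain(goal, facts, rules, seen=frozenset()):
    """Start from a goal and work back through rules whose consequent
    matches it, checking whether every antecedent can be established."""
    if goal in facts:
        return True
    if goal in seen:  # guard against cyclic rule sets
        return False
    seen = seen | {goal}
    return any(
        consequent == goal
        and all(backward_chain(a, facts, rules, seen) for a in antecedents)
        for antecedents, consequent in rules
    )

facts = {"reading_above_limit", "device_calibrated"}
derived = forward_chain(facts, RULES)
supported = backward_chain("batch_needs_review", facts, RULES)
```

Forward chaining is data-driven and produces every conclusion the facts support; backward chaining is goal-driven and only explores the rules relevant to the question asked.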
12. Slide 12
Ontology Layering Is Important for Scale
Data Source Models
Multi- & Single-Source Data
Integration Models
Domain Models (Objs, Attributes,
Process & Relations)
System Lvl Models (Rules)
Data Traceability (Provenance)
User Driven Ontologies
Upper-Lvl Models
Metadata Levels (Human Concepts)
Data-centric Levels (Machine Language)
Metaphysics – not just data models
Data Sources connected directly to higher classifications
Federation allows for improved scale
13. Slide 13
Get your semantics experts and your big data scientists on the same
page
Utilize tables where possible – avoid multi-node graph hops
Use graphs for metadata – leave instance data in place when possible
Large graphs should be avoided
Lots of columns and rows are fine – joins across tables are not
Break graph information into other formats wherever possible
Pre-compute phases are important
Pre-compute multi-table joins based on SME input, known semantic
patterns, business rules/logics, etc.
Use statistical methods to cluster data (e.g., normalcy calcs)
Use the tech that is right for the job
Combining Semantics and NoSQL
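The pre-compute and statistical steps above can be sketched as follows. This is an illustration only: the device and reading tables are made-up sample data, and a per-type z-score is assumed here as one possible reading of a "normalcy calc".

```python
from statistics import mean, pstdev

# Hypothetical metadata table (small) and instance data table (large).
DEVICES = {
    "d1": {"type": "HPLC", "site": "A"},
    "d2": {"type": "HPLC", "site": "B"},
    "d3": {"type": "Balance", "site": "A"},
}
READINGS = [
    ("d1", 10.0), ("d1", 10.1), ("d1", 9.9), ("d2", 10.0),
    ("d2", 10.2), ("d2", 9.8), ("d2", 15.0),
    ("d3", 5.0), ("d3", 5.0),
]

def precompute_wide_rows(devices, readings):
    """Join metadata onto each reading once, up front, so that later
    analytics scan a single wide table instead of joining at query time."""
    return [{"device": d, "value": v, **devices[d]} for d, v in readings]

def flag_outliers(rows, threshold=2.0):
    """Flag rows whose value lies more than `threshold` population
    standard deviations from the mean of their device type."""
    by_type = {}
    for row in rows:
        by_type.setdefault(row["type"], []).append(row["value"])
    stats = {t: (mean(vs), pstdev(vs)) for t, vs in by_type.items()}
    flagged = []
    for row in rows:
        mu, sigma = stats[row["type"]]
        if sigma > 0 and abs(row["value"] - mu) / sigma > threshold:
            flagged.append(row)
    return flagged

wide_rows = precompute_wide_rows(DEVICES, READINGS)
suspect_rows = flag_outliers(wide_rows)
```

The graph-like metadata (device type, site) is consulted once during the pre-compute phase; the statistical pass then works on plain rows and columns with no joins.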
14. Slide 14
One Example of Using RDF in Cloud-scalable
Applications
Example of a current approach being used – there are others
Can scale across multiple cloud nodes (where triple stores have issues)
Triples are indexed items
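One way to treat triples as indexed items is to store each triple under several key orderings, so that a pattern lookup becomes a key lookup rather than a graph traversal. The sketch below is an assumption made for illustration, not the specific design shown on this slide; in-memory dictionaries stand in for a distributed key-value store, and the class and sample names are hypothetical.

```python
from collections import defaultdict

class TripleIndex:
    """Stores each (subject, predicate, object) triple under three keys."""

    def __init__(self):
        self.spo = defaultdict(set)  # subject   -> {(predicate, object)}
        self.pos = defaultdict(set)  # predicate -> {(object, subject)}
        self.osp = defaultdict(set)  # object    -> {(subject, predicate)}

    def add(self, s, p, o):
        self.spo[s].add((p, o))
        self.pos[p].add((o, s))
        self.osp[o].add((s, p))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard.
        The first bound term selects the index, the rest are filtered."""
        if s is not None:
            candidates = ((s, p2, o2) for p2, o2 in self.spo.get(s, ()))
        elif p is not None:
            candidates = ((s2, p, o2) for o2, s2 in self.pos.get(p, ()))
        elif o is not None:
            candidates = ((s2, p2, o) for s2, p2 in self.osp.get(o, ()))
        else:
            candidates = (
                (s2, p2, o2)
                for s2, pairs in self.spo.items()
                for p2, o2 in pairs
            )
        return sorted(
            t for t in candidates
            if (p is None or t[1] == p) and (o is None or t[2] == o)
        )

index = TripleIndex()
index.add("reading42", "measuredBy", "device7")
index.add("reading43", "measuredBy", "device7")
index.add("device7", "locatedIn", "labA")
readings_for_device = index.match(p="measuredBy", o="device7")
```

Because every lookup starts from a single key, the three maps could in principle be partitioned across nodes by key, which is the property that lets this style of storage spread across a cluster.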
15. THANK YOU – QUESTIONS?
Eric Little, PhD
VP Data Science
OSTHUS, Inc.
eric.little@osthus.com
(M) 321-480-4818
www.linkedin.com/pub/eric-little