HORIZONTAL INTEGRATION OF
      BIG INTELLIGENCE DATA
      The Role of Ontology in the Era of Big Data


T. Malyuta, Ph. D
New York City College of Technology, NY, NY
B. Smith, Ph. D
University at Buffalo, Buffalo, NY
R. Rudnicki
CUBRC, Buffalo, NY
2


Big Data Problem
• Wikipedia defines Big Data as “…a collection of
  data sets so large and complex that it becomes
  difficult to process using on-hand database
  management tools.”
• Gartner defines Big Data with three 'V's:
 • Volume
 • Velocity (of production and analysis)
 • Variety
• This means that Big Data are beyond our control
 (as opposed to those complex and big systems with
 diverse and changing data where the complexity is
 known)
3



Big Data Solution – Agility
• Dimensions of agility
  • Storage paradigms that accommodate massive volumes of
    heterogeneous data
  • Data processing paradigms that can deal with the massive volumes
    of heterogeneous data coming onstream
  • Dynamic data stores that can easily accommodate diverse and a
    priori unknown data types and semantics
  • Methods and tools that leverage dynamic and diverse content
4



Agile Integration and Interoperability
• Today, the main problem with Big Data is using it
• Utilization of 'Variety' (diverse types and semantics)
  requires data integration and interoperability
• Traditional integration approaches fail
• Agile integration paradigms are needed
5


The Problem of Horizontal Integration of
Big Intelligence Data
  • HI =Def. the ability to exploit multiple data sources as
    if they are one
  • Recognized issues for HI with existing approaches
     • Data silos
     • Lexicon/semantics silos
  • Requirement for HI of Big Intelligence Data – Agile
    Semantic Interoperability
     • A strategy for HI must be agile in the sense that it can be quickly
       extended to new zones of emerging data according to need
     • Ontology allows an incremental approach, delivering bang for the buck
       from the very first investment (as we showed in the I2WD work)
     • Ontology can provide the needed agility
6



Agile Semantic Interoperability
• A good solution has to be
  • Able to grow incrementally
  • Able to be developed in a distributed manner
  • Without losing consistency
  • Independent of particular implementations, and data producers and
    consumers
  • Applicable to data in an agile manner
• We call our solution 'semantic enhancement' (SE) of data
7


 SE
• SE is realized with the help of ontologies that are used
 to annotate (tag) data
  • Vocabulary of ontologies used for annotations provides agile
    horizontal integration
  • Ontologies, by virtue of their nature and organization, provide
    semantic enhancement of data
[Figure: ontology fragment used to annotate a source table]
  Skill
    ComputerSkill
      ProgrammingSkill
        SQL, Java, C++
  Education
    Technical Education

 PersonID   Name   Description
 111        Java   Programming
 222        SQL    Database
8


The Meaning of 'Enhancement'
• Semantic enhancement/enrichment of data = arm's
 length approach (no change to data): through
 simple annotation we associate an entire knowledge
 system with a database field
 • Enables analytics to process data, e.g. about computer
   skills, "vertically" along the Skill hierarchy, as well as
   "horizontally" via relations between Skill and Education
 • And further: while data in the database does not change, its
   analysis can become richer and richer as our understanding of
   reality changes
• For this richness to be leveraged by different
 communities, persons, and applications, it needs to
 have the properties mentioned above and be
 constructed in accordance with the SE principles
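The "vertical" and "horizontal" processing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the is-a links follow the Skill/Education fragment shown on the SE slide, while the relation name `acquired_through` and the helper names are hypothetical and do not come from the slides.

```python
# Toy is-a hierarchy taken from the SE slide (child -> parent).
IS_A = {
    "ComputerSkill": "Skill",
    "ProgrammingSkill": "ComputerSkill",
    "SQL": "ProgrammingSkill",
    "Java": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "TechnicalEducation": "Education",
}

# Hypothetical "horizontal" relation between Skill and Education.
RELATED = {("Skill", "acquired_through"): "Education"}


def ancestors(term):
    """Walk 'vertically' up the is-a hierarchy, nearest parent first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def related(term, relation):
    """Follow a 'horizontal' relation, inherited from any ancestor."""
    for candidate in [term] + ancestors(term):
        if (candidate, relation) in RELATED:
            return RELATED[(candidate, relation)]
    return None


print(ancestors("Java"))
print(related("Java", "acquired_through"))
```

The database field itself is never touched: annotating a value such as Java with a term in the hierarchy is enough to make both kinds of traversal available.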
9



SE Principles
• Create a Shared Semantic Resource (SSR) of
  ontologies to be used for annotation
• Establish an agile strategy for building ontologies
  within this SSR, and apply and extend these
  ontologies to annotate new source data as they
  come onstream
  • Strategy pioneered in biomedical and other scientific fields:
    leaves data as they are, and incrementally tags data
    sources with terms from a growing, consistent,
    non-redundant set of ontologies
• Problem: Given the immense and growing variety of
  data sources, the development methodology must
  be applied by multiple different groups
10


 Achieving the Goal
• Methodology of incremental distributed ontology
  development
• A common ontology architecture incorporating a
  common, domain-neutral, upper-level ontology
  (BFO)
• A shared governance and change management
  process
• A simple, repeatable process for ontology
  development
• An ontology registry
• A process of intelligence data capture through
  'annotation' or 'tagging' of source data artifacts
11


  Main Methodological Points
• Ontological realism
   • Based on Doctrine;
   • Involves SMEs in label selection and definition
   • Thoroughly tested*
• Arms-length process, with minimal disturbance to existing
  data and data semantics
• Reference ontologies – capture generic content and are
  designed for aggressive reuse in multiple different types of
  context
   • Single reference ontology for each domain of interest
• Application ontologies – are tied to specific local
  applications
   • An application ontology is created by combining local content with generic
     content taken over from relevant reference ontologies
   • Remain interoperable because they are based on a common set of reference
     ontologies


* Barry Smith and Werner Ceusters, “Ontological Realism as a Methodology for Coordinated Evolution of
Scientific Ontologies”, Applied Ontology, 5 (2010), 139–188.
12

    Arms-length Process
• Focuses on the terms (labels, acronyms, codes) used in our
  source data.
• Where multiple distinct terms {t1, …, tn} are used in separate
  data sources with one and the same meaning, they are
  associated with a single preferred label drawn from a standard
  set of such labels
• All the separate data items associated with {t1, …, tn} are thereby
  linked together through the corresponding preferred labels.
• Preferred labels form the basis for the ontologies we build
[Figure: SE ontology labels (XYZ) attached to heterogeneous contents (ABC, KLM)]
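A minimal sketch of the arm's-length linking step, assuming the three person-identifier fields that appear later in the deck (Db1.PersonID, Db2.ID, Db3.EmplID) all share the preferred label PersonID. The function name and the record layout are hypothetical; the source records are read but never modified.

```python
from collections import defaultdict

# Distinct source terms with one and the same meaning -> preferred label.
PREFERRED = {
    ("Db1", "PersonID"): "PersonID",
    ("Db2", "ID"): "PersonID",
    ("Db3", "EmplID"): "PersonID",
}

# Source data items as (source, term, value); left exactly as they are.
RECORDS = [
    ("Db1", "PersonID", 111),
    ("Db2", "ID", 333),
    ("Db3", "EmplID", 444),
]


def link_by_preferred_label(records, preferred):
    """Group data items under the preferred label of their source term."""
    linked = defaultdict(list)
    for source, term, value in records:
        label = preferred.get((source, term))
        if label is not None:  # unannotated terms are simply skipped
            linked[label].append((source, term, value))
    return dict(linked)


linked = link_by_preferred_label(RECORDS, PREFERRED)
print(linked["PersonID"])
```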
13


Reference and Application Ontologies
Reference Ontology
  vehicle =def. an object used for transporting people or goods
    tractor =def. a vehicle that is used for towing
    crane =def. a vehicle that is used for lifting and moving heavy objects
  vehicle platform =def. means of providing mobility to a vehicle
    wheeled platform =def. a vehicle platform that provides mobility through
      the use of wheels
    tracked platform =def. a vehicle platform that provides mobility through
      the use of continuous tracks

Application Definitions
  artillery vehicle =def. vehicle designed for the transport of one or more
    artillery weapons
  wheeled tractor =def. a tractor that has a wheeled platform
  tracked tractor =def. a tractor that has a tracked platform
  artillery tractor =def. an artillery vehicle that is a tractor
  wheeled artillery tractor =def. an artillery tractor that has a wheeled
    platform
14



Illustration of Ontology Types (Toy Example)

[Figure: is-a hierarchy; black = reference ontologies, red = application ontologies]
  Level 1: Vehicle
  Level 2: Tractor, Artillery Vehicle
  Level 3: Wheeled Tractor, Artillery Tractor
  Level 4: Wheeled Artillery Tractor
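The toy example can be sketched as a small graph in which application terms are defined on top of reference terms. The parent links below are read off the definitions on the previous slide (for example, an artillery tractor is both an artillery vehicle and a tractor); the function names are hypothetical.

```python
# Parent links derived from the definitions slide (term -> list of parents).
PARENTS = {
    "tractor": ["vehicle"],
    "crane": ["vehicle"],
    "artillery vehicle": ["vehicle"],
    "wheeled tractor": ["tractor"],
    "tracked tractor": ["tractor"],
    "artillery tractor": ["artillery vehicle", "tractor"],
    "wheeled artillery tractor": ["artillery tractor"],
}

# Terms that the definitions slide places in the reference ontology.
REFERENCE_TERMS = {"vehicle", "tractor", "crane"}


def ancestors(term):
    """Breadth-first walk over all parents; each ancestor is listed once."""
    seen = []
    queue = list(PARENTS.get(term, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(PARENTS.get(current, []))
    return seen


def reference_ancestors(term):
    """Reference-ontology terms that an application term is built on."""
    return {t for t in ancestors(term) if t in REFERENCE_TERMS}


print(ancestors("wheeled artillery tractor"))
print(reference_ancestors("wheeled artillery tractor"))
```

Because every application term bottoms out in shared reference terms, two application ontologies built this way remain comparable through those shared ancestors.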
15



 Role of Reference Ontologies
• Normalized
  • Maintains a set of consistent ontologies
  • Eliminates redundancy
• Modular
  • A set of plug-and-play ontology modules
  • Enables distributed consistent development
• Surveyable
16



SE Architecture
• The Upper Level Ontology (ULO) in the SE hierarchy
  must be maximally general (no overlap with domain
  ontologies)
• The Mid-Level Ontologies (MLOs) introduce successively
  less general and more detailed representations of types
  which arise in successively narrower domains until we
  reach the Lowest Level Ontologies (LLOs).
• The LLOs are maximally specific representations of the
  entities in a particular one-dimensional domain
17


Architecture Illustration
18



Current State
• Completed
  • Data Representation and Integration Framework (DRIF):
    architectural solution and implementation to create Dataspace
    (cloud of intelligence data)
    • Lossless representation of sources with their native semantics
    • Semantic Enhancement (SE): suite of prototype ontologies with
      coverage allowing annotation of these native semantics
    • Index exposing the content of the Dataspace via SE with proven
      benefits
  • Methodology and architecture for ontology development
• In progress
   • Assembling the Shared Semantic Resource (SSR) as a separate
     store and enabling its use outside the Dataspace; in discussions
     with various agencies
19



The SSR
[Figure: the SSR and its users]
  Used by: DoD, Air Force, Navy, NSA
  Reference Ontologies (Shared Semantic Resource): Agent, Organization,
    Geospatial, Weapon, Information Artifact, …
  Application Ontologies: Agent-related, Weapon-related, …
  For purposes of: Event Reporting, Intelligence Analysis, Video Analysis, NLP
20


Challenges to HI
• Too many lexicons
• The scope of the domain: signal, sensor, image, …
  intelligence about … the whole world
• Difficult to conduct governance and management of
  ontology development to ensure consistent evolution
• Lack of expertise
21



Preventing Failure
• The method we use offers solutions to some of the common
  reasons for failure
• Lack of Consensus
  • Realism offers an objective standard for settling disputes over
    terminology. Ontology development becomes an empirical science
    instead of an exercise in the publication of dialects
  • Governance helps to resolve conflicts and achieve consensus
• High Maintenance
  • Arm's length implementation places no additional overhead onto
    applications
• Parochialism
  • Architecture and methodology prevent development of vocabularies
    that apply only to a single perspective
• Poor Quality
  • Experience prevents common mistakes in vocabularies that cause
    downstream problems with search and analytics
Distributed Common Ground System – Army (DCGS-A)


Semantic Enhancement of the
  Dataspace on the Cloud
23



Integrated Store of Intelligence Data
• Lossless integration without heavy pre-processing
• Ability to:
  • Incorporate multiple integration models / approaches /
    points of view of data and data-semantics
  • Perform continuous semantic enrichment of the
    integrated store
• Scalability
24


Solution Components
• Cloud implementation
  • Cloudbase (Accumulo)
• Data Representation and Integration Framework
  • Comprehensive unified representation of data, data semantics, and
    metadata
• This work was funded by US Army CERDEC Intelligence
 and Information Warfare Directorate (I2WD)
25


   Dealing with Semantic Heterogeneity
• Physical integration: a separate data store homogenizing semantics in a
  particular data-model. Works only for special cases, entails loss and
  distortion of data and semantics, and creates a new data silo.
• Virtual integration: a projection onto a homogeneous data-model exposed to
  users. More flexible, but may have the problem of data availability
  (e.g. military, intelligence). Also, a particular homogeneous model has
  limited usage, does not expose all content, and does not support enrichment.
26


Pursuit of the Holy Grail of Intelligence
Data Integration
 • In a highly dynamic semantic environment evolving in ad
  hoc ways
   • how to have it all and have it available immediately and at any time?
     • Traditional physical and virtual integration approaches fail to respond to
       these requirements
   • how to use these data resources efficiently (integrate, query, and
     analyze)?
27


    Workable Solution
• A physical store incorporating heterogeneous contents. The Data
  Representation and Integration Framework (DRIF) is based on a decomposed
  representation of structured data (RDF-style); it allows collection of
  data resources without loss or distortion and thereby achieves
  representational integration.
• Light-weight Semantic Enhancement (SE) supports semantic integration and
  provides a decent utilization capability without adding storage and
  processing weight to the already storage- and processing-heavy Dataspace.
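A rough sketch of what a decomposed, RDF-style representation of a structured row could look like. The statement shape (value plus native label on each side, with a `has<Column>` relation) mirrors the toy example later in the deck; the function names and the round-trip check are assumptions made for illustration, not the DRIF implementation.

```python
def decompose(db, key_column, row):
    """Split one table row into (subject, relation, object) statements.

    Subject and object are (value, native label) pairs, so the source
    vocabulary is carried along unchanged.
    """
    subject = (row[key_column], f"{db}.{key_column}")
    return [
        (subject, f"has{column}", (value, f"{db}.{column}"))
        for column, value in row.items()
        if column != key_column
    ]


def recompose(key_column, statements):
    """Rebuild the original row, showing that nothing was lost."""
    row = {}
    for (subject_value, _), _, (value, label) in statements:
        row[key_column] = subject_value
        row[label.split(".", 1)[1]] = value
    return row


source_row = {"EmplID": 444, "SkillName": "Java"}
statements = decompose("Db3", "EmplID", source_row)
print(statements)
print(recompose("EmplID", statements) == source_row)
```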
28


   DRIF Dataspace
• Integration without heavy pre-processing (ad-hoc rapid
 integration):
  • Of any data artifact regardless of the model (or absence of it)
    and modality
  • Without loss or distortion of data and data-semantics
• Continuous evolution and enrichment
• Pay-as-you-go solution
  • While data and data-semantics are expected to be enriched
    and refined, they can be efficiently utilized immediately after
    entering the DataSpace through querying, navigation, and
    drilling
Organization of the DRIF Dataspace




  Registration
  Ingestion
  Extraction [Transformation] / Enrichment
30

Semantic Enhancement of the Dataspace
• Simple yet efficient harmonization strategy
 • Takes place not by changing the data semantics to which it is applied,
   but rather by adding an extra semantic layer to it
 • Long-lasting solution that can be applied consistently and in cumulative
   fashion to new models entering the Dataspace
• Strategy compliant with and complementing the
 DRIF
 • Source data models are not changed

• Can be used efficiently, and in a unified fashion, in
 search, reasoning, and analytics
 • Provides views of the Dataspace at different levels of detail

• Mapping to a particular Über-model or choosing a
 single comprehensive model for harmonization does
 not provide the benefits described
31




Illustration
• DRIF Dataspace accommodates lots of data models and
    is a microcosm of a collection of systems with diverse and
    heterogeneous data
•   Incremental annotations of these data models through SE
    ontologies
•   Preserving the native content of data resources
•   Presenting the native content via the SE annotations
•   Benefits of the approach
32


 Sources
• Source database Db1, with tables Person and
 Skill, containing person data and data pertaining to skills of
 different kinds, respectively.
   Person:  PersonID   SkillID
            111        222
   Skill:   SkillID    Name     Description
            222        Java     Programming

• Source database Db2, with the table Person, containing data
 about IT personnel and their skills:
   ID         SkillDescr
   333        SQL

• Source database Db3, with the table ProgrSkill, containing
 data about programmers' skills:
   EmplID      SkillName
   444         Java
33


   Representation in the Dataspace
Representation of data-models, SE, and SE annotations as Concepts and
ConceptAssociations (blue: SE annotations; red: SE hierarchies)

Label                 Relation  SE Label
SE annotations:
Db1.Name              Is-a      SE.Skill
Db2.SkillDescr        Is-a      SE.ComputerSkill
Db3.SkillName         Is-a      SE.ProgrammingSkill
Db1.PersonID          Is-a      SE.PersonID
Db2.ID                Is-a      SE.PersonID
Db3.EmplID            Is-a      SE.PersonID
SE hierarchies:
SE.ComputerSkill      Is-a      SE.Skill
SE.ProgrammingSkill   Is-a      SE.ComputerSkill

Native representation of structured data

Value and Associated Label   Relation         Value and Associated Label
111, Db1.PersonID            hasSkillID       222, Db1.SkillID
222, Db1.SkillID             hasName          Java, Db1.Name
222, Db1.SkillID             hasDescription   Programming, Db1.Description
333, Db2.ID                  hasSkillDescr    SQL, Db2.SkillDescr
444, Db3.EmplID              hasSkillName     Java, Db3.SkillName
34


  Indexed Contents Based on the SE
Index entries based on the SE and native (blue) vocabularies

    Index Entry      Associated Field-Value
    111, PersonID    Type: Person
                     Skill: Java
                     Db1.Description: Programming
    333, PersonID    Type: Person
                     ComputerSkill: SQL
    444, PersonID    Type: Person
                     ProgrammingSkill: Java
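One way such index entries could be derived from the native statements and the SE annotations is sketched below. The data are copied from the two preceding slides; the `build_index` function, its traversal rule (follow an object when it is itself the key of further statements), and the assumption that the toy data contain no cycles are illustrative choices, not the DRIF indexing algorithm.

```python
# Native label -> SE label, as listed on the representation slide.
ANNOTATIONS = {
    "Db1.Name": "Skill",
    "Db2.SkillDescr": "ComputerSkill",
    "Db3.SkillName": "ProgrammingSkill",
    "Db1.PersonID": "PersonID",
    "Db2.ID": "PersonID",
    "Db3.EmplID": "PersonID",
}

# Native statements: ((value, label), relation, (value, label)).
STATEMENTS = [
    ((111, "Db1.PersonID"), "hasSkillID", (222, "Db1.SkillID")),
    ((222, "Db1.SkillID"), "hasName", ("Java", "Db1.Name")),
    ((222, "Db1.SkillID"), "hasDescription", ("Programming", "Db1.Description")),
    ((333, "Db2.ID"), "hasSkillDescr", ("SQL", "Db2.SkillDescr")),
    ((444, "Db3.EmplID"), "hasSkillName", ("Java", "Db3.SkillName")),
]


def build_index(statements, annotations):
    """Index each person under SE field names, falling back to native ones."""
    by_subject = {}
    for subject, _relation, obj in statements:
        by_subject.setdefault(subject, []).append(obj)

    def collect(node, fields):
        for obj in by_subject.get(node, []):
            if obj in by_subject:
                collect(obj, fields)  # obj is a key: follow the link
            else:
                value, label = obj
                fields.append((annotations.get(label, label), value))
        return fields

    index = {}
    for subject in by_subject:
        value, label = subject
        if annotations.get(label) == "PersonID":
            index[value] = [("Type", "Person")] + collect(subject, [])
    return index


index = build_index(STATEMENTS, ANNOTATIONS)
for person, fields in index.items():
    print(person, fields)
```

Fields without an annotation, such as Db1.Description, keep their native label, which matches the mixed SE and native vocabulary shown in the table.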
35


 Benefits of DRIF + SE
• Leverages syntactic integration provided by DRIF, semantic
 integration provided by the SE vocabulary and annotations of
 native sources, and rich semantics provided by ontologies in
 general
  • Entering Skill = Java (which will be re-written at run time as: Skill =
    Java OR ComputerSkill = Java OR ProgrammingSkill = Java OR
    NetworkSkill = Java) will return: persons 111 and 444
  • Entering ComputerSkill = Java OR ComputerSkill = SQL will
    return: persons 333 and 444
  • Entering ProgrammingSkill = Java will return: person 444
  • Entering Description = Programming will return: person 111
• Allows querying, searching, and manipulating native
  representations
• Light-weight non-intrusive approach that can be improved
  and refined without impacting the Dataspace
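The run-time query rewriting described above can be sketched as a simple expansion of a field name to all of its subtypes. The index entries are those from the SE index slide. Placing NetworkSkill under ComputerSkill is an assumption (the slides mention it only inside the rewritten query), and the function names are hypothetical.

```python
# SE hierarchy (parent -> children); NetworkSkill placement is assumed.
SUBTYPES = {
    "Skill": ["ComputerSkill"],
    "ComputerSkill": ["ProgrammingSkill", "NetworkSkill"],
}

# Index entries from the SE index slide.
INDEX = {
    111: {"Type": "Person", "Skill": "Java", "Db1.Description": "Programming"},
    333: {"Type": "Person", "ComputerSkill": "SQL"},
    444: {"Type": "Person", "ProgrammingSkill": "Java"},
}


def expand(field):
    """Rewrite a field as itself plus all of its subtypes, depth first."""
    fields = [field]
    for child in SUBTYPES.get(field, []):
        fields.extend(expand(child))
    return fields


def search(field, value):
    """Return the persons matching field = value after query rewriting."""
    fields = expand(field)
    return sorted(
        person
        for person, entry in INDEX.items()
        if any(entry.get(f) == value for f in fields)
    )


print(expand("Skill"))
print(search("Skill", "Java"))
print(search("ProgrammingSkill", "Java"))
```

In this sketch the un-annotated field is addressed by its native name, so the last query on the slide corresponds to `search("Db1.Description", "Programming")`.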
36


Index Contents without the SE
   Index entries based on native vocabularies
     Index Entry     Associated Field-Value
   111, PersonID   Type: Person
                   Name: Java
                   Description: Programming
   333, ID         Type: Person
                   SkillDescr: SQL
   444, EmplID     Type: Person
                   SkillName: Java
37


 Problems
• Even for our toy example we can see how much manual
 effort the analyst needs to apply in performing search
 without SE – and even then the information he will gain will
 be meager in comparison with what is made available
 through the Index with SE.
  • For example, if an analyst is familiar with the labels used in Db1
   and is thus in a position to enter Name = Java, his query will still
   return only: person 111. Directly salient Db3 information (person 444)
   will thus be missed.
38


  Additional Notes on the SE process
• Original data and data-semantics are included in the Dataspace
  without loss or distortion; thus there is no need to cover all
  semantics of the Dataspace – what is unlikely to be used in
  search or is not important for integration will still be available
  when needed
• A complex ontology is not needed – a common and shared
  vocabulary is sufficient for virtual semantic integration and
  search/analytics
• The approach is very flexible, and investments can be made in
  specific areas according to need (pay-as-you-go)
• The approach is tunable – if the chosen annotations of a
  particular subset of a source data-model are too general for data
  analyses, the respective ontologies can be further developed
  and source models re-annotated
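The tunability point can be illustrated with a small before-and-after sketch. Everything specific here is hypothetical: the DatabaseSkill term is invented for the example, and the data are flattened to (person, native label, value) for brevity. The point is only that the ontology and the annotations change while the source data stay the same.

```python
# Source data, flattened for brevity; never modified below.
DATA = [
    (111, "Db1.Name", "Java"),
    (333, "Db2.SkillDescr", "SQL"),
    (444, "Db3.SkillName", "Java"),
]

# Version 1: hierarchy and annotations as in the toy example.
SUBTYPES_V1 = {"Skill": ["ComputerSkill"], "ComputerSkill": ["ProgrammingSkill"]}
ANNOTATIONS_V1 = {
    "Db1.Name": "Skill",
    "Db2.SkillDescr": "ComputerSkill",
    "Db3.SkillName": "ProgrammingSkill",
}

# Version 2: ontology developed further (hypothetical DatabaseSkill term)
# and one source field re-annotated with the more specific label.
SUBTYPES_V2 = {
    "Skill": ["ComputerSkill"],
    "ComputerSkill": ["ProgrammingSkill", "DatabaseSkill"],
}
ANNOTATIONS_V2 = {**ANNOTATIONS_V1, "Db2.SkillDescr": "DatabaseSkill"}


def expand(label, subtypes):
    """A label plus all of its subtypes."""
    labels = [label]
    for child in subtypes.get(label, []):
        labels.extend(expand(child, subtypes))
    return labels


def search(label, value, annotations, subtypes):
    """Persons whose annotated field falls under label and equals value."""
    labels = set(expand(label, subtypes))
    return sorted(
        person
        for person, native, v in DATA
        if v == value and annotations.get(native) in labels
    )


print(search("DatabaseSkill", "SQL", ANNOTATIONS_V1, SUBTYPES_V1))
print(search("DatabaseSkill", "SQL", ANNOTATIONS_V2, SUBTYPES_V2))
print(search("ComputerSkill", "SQL", ANNOTATIONS_V2, SUBTYPES_V2))
```

Queries written against the more general label keep working after the refinement, because the new term sits below the old one in the hierarchy.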
39


     Benefits of the Approach
• Does not interfere with the source content
• Enhancement enables this content to evolve in a cumulative
    fashion as it accommodates new kinds of data
•   Does not depend on the data resources and can be developed
    independently from them in an incremental and distributed fashion
•   Provides a more consistent, homogeneous, and well-articulated
    presentation of the content which originates in multiple internally
    inconsistent and heterogeneous systems
•   Makes management and exploitation of the content more cost-
    effective
•   The use of the selected ontologies brings integration with other
    government initiatives and brings the system closer to the federally
    mandated net-centric data strategy
•   Creates an integrated content that is effectively searchable and
    that provides content to which more powerful analytics can be
    applied
40


 Towards Globalization and Sharing
• Using the SE approach to
  create a Shared Semantic
  Resource for the Intelligence
  Community to enable
  interoperability across
  systems
• Applying it directly to or
  projecting its contents on a
  particular integration
  solution
41




References
• Smith B. et al., "Horizontal Integration of Warfighter
    Intelligence Data: A Shared Semantic Resource for the
    Intelligence Community", STIDS Conference, 2012.
• Smith B. et al., "Ontology for the Intelligence
    Analyst", Crosstalk: The Journal of Defense Software
    Engineering, 2012.
• Salmen D. et al., "Integration of Intelligence Data through
    Semantic Enhancement", STIDS Conference, 2011.
Follow Us




 Data Tactics Corporation
  7901 Jones Branch Dr.
        Suite 700
    McLean, VA 22102
www.data-tactics-corp.com

Weitere ähnliche Inhalte

Andere mochten auch

A Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationA Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationRich Heimann
 
Ontology and Reports
Ontology and ReportsOntology and Reports
Ontology and ReportsDataTactics
 
Data Tactics and Nervve Integrated Big Data v3
Data Tactics and Nervve Integrated Big Data v3Data Tactics and Nervve Integrated Big Data v3
Data Tactics and Nervve Integrated Big Data v3DataTactics
 
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATANETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATADataTactics
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 
ODSC_Cherven_20160518
ODSC_Cherven_20160518ODSC_Cherven_20160518
ODSC_Cherven_20160518Ken Cherven
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Rich Heimann
 
X Som Graduation Presentation
X Som   Graduation PresentationX Som   Graduation Presentation
X Som Graduation PresentationGiorgio Orsi
 
Ontology For Data Integration
Ontology For Data IntegrationOntology For Data Integration
Ontology For Data Integrationjuanesteva
 
Enterprise and Data Mining Ontology Integration to Extract Actionable Knowled...
Enterprise and Data Mining Ontology Integration to Extract Actionable Knowled...Enterprise and Data Mining Ontology Integration to Extract Actionable Knowled...
Enterprise and Data Mining Ontology Integration to Extract Actionable Knowled...hamidnazary2002
 
Ontology integration - Heterogeneity, Techniques and more
Ontology integration - Heterogeneity, Techniques and moreOntology integration - Heterogeneity, Techniques and more
Ontology integration - Heterogeneity, Techniques and moreAdriel Café
 
8 ontology integration and interoperability (onto i op)
8 ontology integration and interoperability (onto i op)8 ontology integration and interoperability (onto i op)
8 ontology integration and interoperability (onto i op)AEGIS-ACCESSIBLE Projects
 
Ontology-based Data Integration
Ontology-based Data IntegrationOntology-based Data Integration
Ontology-based Data IntegrationJanna Hastings
 
ontology based- data_integration.
ontology based- data_integration.ontology based- data_integration.
ontology based- data_integration.AliAlJadaa
 
Management Gurus
Management GurusManagement Gurus
Management GurusMarcus9000
 

Andere mochten auch (15)

A Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics CorporationA Blended Approach to Analytics at Data Tactics Corporation
A Blended Approach to Analytics at Data Tactics Corporation
 
Ontology and Reports
Ontology and ReportsOntology and Reports
Ontology and Reports
 
Data Tactics and Nervve Integrated Big Data v3
Data Tactics and Nervve Integrated Big Data v3Data Tactics and Nervve Integrated Big Data v3
Data Tactics and Nervve Integrated Big Data v3
 
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATANETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
ODSC_Cherven_20160518
ODSC_Cherven_20160518ODSC_Cherven_20160518
ODSC_Cherven_20160518
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?
 
X Som Graduation Presentation
X Som   Graduation PresentationX Som   Graduation Presentation
X Som Graduation Presentation
 
Ontology For Data Integration
Ontology For Data IntegrationOntology For Data Integration
Ontology For Data Integration
 
Enterprise and Data Mining Ontology Integration to Extract Actionable Knowled...
Enterprise and Data Mining Ontology Integration to Extract Actionable Knowled...Enterprise and Data Mining Ontology Integration to Extract Actionable Knowled...
Enterprise and Data Mining Ontology Integration to Extract Actionable Knowled...
 
Ontology integration - Heterogeneity, Techniques and more
Ontology integration - Heterogeneity, Techniques and moreOntology integration - Heterogeneity, Techniques and more
Ontology integration - Heterogeneity, Techniques and more
 
8 ontology integration and interoperability (onto i op)
8 ontology integration and interoperability (onto i op)8 ontology integration and interoperability (onto i op)
8 ontology integration and interoperability (onto i op)
 
Ontology-based Data Integration
Ontology-based Data IntegrationOntology-based Data Integration
Ontology-based Data Integration
 
ontology based- data_integration.
ontology based- data_integration.ontology based- data_integration.
ontology based- data_integration.
 
Management Gurus
Management GurusManagement Gurus
Management Gurus
 

Ähnlich wie Horizontal Integration of Big Intelligence Data

Horizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence dataHorizontal integration of warfighter intelligence data
Horizontal integration of warfighter intelligence dataBarry Smith
 
ONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEW
ONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEWONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEW
ONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEWijait
 
ONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEW
ONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEW ONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEW
ONTOLOGY VISUALIZATION PROTÉGÉ TOOLS – A REVIEW ijait
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisJamshaid Ashraf
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 
ontology based- data_integration.ali_aljadaa.1125048
ontology based- data_integration.ali_aljadaa.1125048ontology based- data_integration.ali_aljadaa.1125048
ontology based- data_integration.ali_aljadaa.1125048AliAlJadaa
 
Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Heimo Hänninen
 
ONTOLOGY BASED DATA ACCESS
ONTOLOGY BASED DATA ACCESSONTOLOGY BASED DATA ACCESS
ONTOLOGY BASED DATA ACCESSKishan Patel
 
Ontology Tutorial: Semantic Technology for Intelligence, Defense and Security
Ontology Tutorial: Semantic Technology for Intelligence, Defense and SecurityOntology Tutorial: Semantic Technology for Intelligence, Defense and Security
Ontology Tutorial: Semantic Technology for Intelligence, Defense and SecurityBarry Smith
 
Semantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISemantics as a service at EMBL-EBI
Semantics as a service at EMBL-EBISimon Jupp
 
Vivo ontology overviewanddirections.2013-04-25
Vivo ontology overviewanddirections.2013-04-25Vivo ontology overviewanddirections.2013-04-25
Vivo ontology overviewanddirections.2013-04-25joncr
 
ArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management ToolArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management ToolMark Matienzo
 
Semantic Sensor Network Ontology: Description et usage
Semantic Sensor Network Ontology: Description et usageSemantic Sensor Network Ontology: Description et usage
Semantic Sensor Network Ontology: Description et usagecatherine roussey
 
20111120 warsaw learning curve by b hyland notes
20111120 warsaw   learning curve by b hyland notes20111120 warsaw   learning curve by b hyland notes
20111120 warsaw learning curve by b hyland notesBernadette Hyland-Wood
 
OOR--Open-Ontology-Repository--jun2010
OOR--Open-Ontology-Repository--jun2010OOR--Open-Ontology-Repository--jun2010
OOR--Open-Ontology-Repository--jun2010Peter Yim
 
Text mining in CORE (OR2012)
Text mining in CORE (OR2012)Text mining in CORE (OR2012)
Text mining in CORE (OR2012)petrknoth
 
Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1Semantic IoT Semantic Inter-Operability Practices - Part 1
Semantic IoT Semantic Inter-Operability Practices - Part 1iotest
 
from local/regional OER Silos towards an OER Global Dataspace
from local/regional OER Silos towards an OER Global Dataspacefrom local/regional OER Silos towards an OER Global Dataspace
from local/regional OER Silos towards an OER Global DataspaceOpen Education Consortium
 

Horizontal Integration of Big Intelligence Data

  • 1. HORIZONTAL INTEGRATION OF BIG INTELLIGENCE DATA The Role of Ontology in the Era of Big Data T. Malyuta, Ph. D New York City College of Technology, NY, NY B. Smith, Ph. D University at Buffalo, Buffalo, NY R. Rudnicki CUBRC, Buffalo, NY
  • 2. 2 Big Data Problem • Wikipedia defines Big Data as “…a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.” • Gartner defines Big Data with three 'V's: • Volume • Velocity (of production and analysis) • Variety • This means that Big Data are beyond our control (as opposed to those complex and big systems with diverse and changing data where the complexity is known)
  • 3. 3 Big Data Solution – Agility • Dimensions of agility • Storage paradigms that accommodate massive volumes of heterogeneous data • Data processing paradigms that can deal with the massive volumes of heterogeneous data coming onstream • Dynamic data stores that can easily accommodate diverse and a priori unknown data types and semantics • Methods and tools that leverage dynamic and diverse content
  • 4. 4 Agile Integration and Interoperability • Today, the main problem with Big Data is using it • Utilization of 'Variety' – diverse types and semantics – requires data integration and interoperability • Traditional integration approaches fail • Agile integration paradigms are needed
  • 5. 5 The Problem of Horizontal Integration of Big Intelligence Data • HI =Def. the ability to exploit multiple data sources as if they were one • Recognized issues for HI with existing approaches • Data silos • Lexicon/semantics silos • Requirement for HI of Big Intelligence Data – Agile Semantic Interoperability • A strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need • Ontology allows an incremental approach, with a big return from the very first buck (as we showed in I2WD) • Ontology can provide the needed agility
  • 6. 6 Agile Semantic Interoperability • A good solution has to be • Able to grow incrementally • Able to be developed in a distributed manner • Without losing consistency • Independent of particular implementations, and data producers and consumers • Applicable to data in an agile manner • We call our solution: 'semantic enhancement' (SE) of data
  • 7. 7 SE • SE is realized with the help of ontologies that are used to annotate (tag) data • Vocabulary of ontologies used for annotations provides agile horizontal integration • Ontologies, by virtue of their nature and organization, provide semantic enhancement of data
      [Figure: ontology fragment with the nodes Skill, ComputerSkill, ProgrammingSkill (SQL, Java, C++), Education, and Technical Education, annotating the data table below]
      PersonID   Name   Description
      111        Java   Programming
      222        SQL    Database
  • 8. 8 The Meaning of 'Enhancement' • Semantic enhancement/enrichment of data = arm's length approach (no change to data) – through simple annotation we associate an entire knowledge system with a database field • Enables analytics to process data, e.g. about computer skills, “vertically” along the Skill hierarchy, as well as “horizontally” via relations between Skill and Education • And further: while data in the database does not change, its analysis can be richer and richer as our understanding of the reality changes • For this richness to be leveraged by different communities, persons, and applications it needs to have the properties mentioned above and be constructed in accordance with the principles of the SE
  • 9. 9 SE Principles • Create a Shared Semantic Resource (SSR) of ontologies to be used for annotation • Establish an agile strategy for building ontologies within this SSR, and apply and extend these ontologies to annotate new source data as they come onstream • Strategy pioneered in biomedical and other scientific fields: leaves data as they are, and incrementally tags data sources with terms from a growing, consistent, non-redundant set of ontologies • Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
  • 10. 10 Achieving the Goal • Methodology of incremental distributed ontology development • A common ontology architecture incorporating a common, domain-neutral, upper-level ontology (BFO) • A shared governance and change management process • A simple, repeatable process for ontology development • An ontology registry • A process of intelligence data capture through 'annotation' or 'tagging' of source data artifacts
  • 11. 11 Main Methodological Points • Ontological realism • Based on Doctrine • Involves SMEs in label selection and definition • Thoroughly tested* • Arm's-length process, with minimal disturbance to existing data and data semantics • Reference ontologies – capture generic content and are designed for aggressive reuse in multiple different types of context • Single reference ontology for each domain of interest • Application ontologies – are tied to specific local applications • An application ontology is created by combining local content with generic content taken over from relevant reference ontologies • Application ontologies remain interoperable because they are based on a common set of reference ontologies
      * Barry Smith and Werner Ceusters, “Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies”, Applied Ontology, 5 (2010), 139–188.
  • 12. 12 Arm's-length Process • Focusing on the terms (labels, acronyms, codes) used in our source data. • Where multiple distinct terms {t1, …, tn} are used in separate data sources with one and the same meaning, they are associated with a single preferred label drawn from a standard set of such labels • All the separate data items associated with the {t1, …, tn} are thereby linked together through the corresponding preferred labels. • Preferred labels form the basis for the ontologies we build
      [Figure: heterogeneous contents (source labels such as XYZ, ABC, KLM) linked to SE ontology labels]
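The linking step described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three field names and the preferred label `SE.PersonID` are taken from the toy sources later in the deck, while the dictionary layout and the function name `link_by_preferred_label` are hypothetical and not part of the authors' implementation.

```python
from collections import defaultdict

# Source terms that carry one and the same meaning, keyed to a preferred label.
# The field names come from the deck's toy example; storing the annotation in
# a plain dictionary is an assumption made for illustration.
PREFERRED_LABEL = {
    "Db1.PersonID": "SE.PersonID",
    "Db2.ID": "SE.PersonID",
    "Db3.EmplID": "SE.PersonID",
}


def link_by_preferred_label(items):
    """Group (source_term, value) pairs under the preferred label of their term.

    The source data are not modified (arm's length). Terms that have no
    annotation yet are simply left out of the linkage.
    """
    linked = defaultdict(list)
    for term, value in items:
        label = PREFERRED_LABEL.get(term)
        if label is not None:
            linked[label].append((term, value))
    return dict(linked)


items = [
    ("Db1.PersonID", 111),
    ("Db2.ID", 333),
    ("Db3.EmplID", 444),
    ("Db1.Description", "Programming"),  # not annotated in this sketch
]
linked = link_by_preferred_label(items)
```

Here the three person identifiers end up under one label even though each source names the field differently, and the unannotated field stays untouched in its source.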
  • 13. 13 Reference and Application Ontologies
      Reference Ontology
      • vehicle =def. an object used for transporting people or goods
      • tractor =def. a vehicle that is used for towing
      • crane =def. a vehicle that is used for lifting and moving heavy objects
      • vehicle platform =def. means of providing mobility to a vehicle
      • wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
      • tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks
      Application Definitions
      • artillery vehicle =def. vehicle designed for the transport of one or more artillery weapons
      • wheeled tractor =def. a tractor that has a wheeled platform
      • tracked tractor =def. a tractor that has a tracked platform
      • artillery tractor =def. an artillery vehicle that is a tractor
      • wheeled artillery tractor =def. an artillery tractor that has a wheeled platform
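One way to see why application terms stay interoperable with the reference ontology is to follow the parent term named in each definition. The sketch below encodes only those stated parents (for example, "an artillery vehicle that is a tractor" gives two parents). The `IS_A` dictionary and the `ancestors` helper are hypothetical names chosen for illustration, and the platform conditions in the definitions are deliberately not modeled.

```python
# Parent terms read off the definitions on this slide. Reference terms and
# application terms share one graph, which is what keeps them interoperable.
IS_A = {
    "tractor": {"vehicle"},
    "crane": {"vehicle"},
    "artillery vehicle": {"vehicle"},
    "wheeled tractor": {"tractor"},
    "tracked tractor": {"tractor"},
    "artillery tractor": {"artillery vehicle", "tractor"},
    "wheeled artillery tractor": {"artillery tractor"},
}


def ancestors(term):
    """Return every term reachable from `term` by following is-a links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


result = ancestors("wheeled artillery tractor")
```

A query phrased with the reference term `vehicle` can therefore reach data tagged with the application term `wheeled artillery tractor`, because the application term resolves to reference terms through its definition.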
  • 14. 14 Illustration of Ontology Types (Toy Example) [Figure: hierarchy with the nodes Vehicle, Tractor, Artillery Vehicle, Wheeled Tractor, Artillery Tractor, and Wheeled Artillery Tractor. Black – reference ontologies; red – application ontologies]
  • 15. 15 Role of Reference Ontologies • Normalized • Maintains a set of consistent ontologies • Eliminates redundancy • Modular • A set of plug-and-play ontology modules • Enables distributed consistent development • Surveyable
  • 16. 16 SE Architecture • The Upper Level Ontology (ULO) in the SE hierarchy must be maximally general (no overlap with domain ontologies) • The Mid-Level Ontologies (MLOs) introduce successively less general and more detailed representations of types which arise in successively narrower domains until we reach the Lowest Level Ontologies (LLOs). • The LLOs are maximally specific representations of the entities in a particular one-dimensional domain
  • 18. 18 Current State • Completed • Data Representation and Integration Framework (DRIF): architectural solution and implementation to create Dataspace (cloud of intelligence data) • Lossless representation of sources with their native semantics • Semantic Enhancement (SE): suite of prototype ontologies with coverage allowing annotation of these native semantics • Index exposing the content of the Dataspace via SE with proven benefits • Methodology and architecture for ontology development • In progress • Assembling the Shared Semantic Resource (SSR) as a separate store and enabling its use outside the Dataspace; in discussions with various agencies
  • 19. 19 The SSR [Figure: DoD, Air Force, Navy, NSA, … use reference ontologies (the Shared Semantic Resource) and application ontologies, shown as boxes labeled Geospatial, Weapon, Agent-related, Agent, Organization, Information, Weapon-related, Artifact, Event, …, for purposes of intelligence reporting, video analysis, and NLP analysis]
  • 20. 20 Challenges to HI • Too many lexicons • The scope of the domain: signal, sensor, image, … intelligence about … the whole world • Difficult to conduct governance and management of ontology development to ensure consistent evolution • Lack of expertise
  • 21. 21 Preventing Failure • The method we use offers solutions to some of the common reasons for failure • Lack of Consensus • Realism offers an objective standard for settling disputes over terminology. Ontology development becomes an empirical science instead of an exercise in the publication of dialects • Governance helps to resolve conflicts and achieve consensus • High Maintenance • Arm's length implementation places no additional overhead onto applications • Parochialism • Architecture and methodology prevent development of vocabularies that apply only to a single perspective • Poor Quality • Experience prevents common mistakes in vocabularies that cause downstream problems with search and analytics
  • 22. Distributed Common Ground System – Army (DCGS-A) Semantic Enhancement of the Dataspace on the Cloud
  • 23. 23 Integrated Store of Intelligence Data • Lossless integration without heavy pre-processing • Ability to: • Incorporate multiple integration models / approaches / points of view of data and data-semantics • Perform continuous semantic enrichment of the integrated store • Scalability
  • 24. 24 Solution Components • Cloud implementation • Cloudbase (Accumulo) • Data Representation and Integration Framework • Comprehensive unified representation of data, data semantics, and metadata • This work was funded by US Army CERDEC Intelligence and Information Warfare Directorate (I2WD)
  • 25. 25 Dealing with Semantic Heterogeneity
      • Physical integration. A separate data store homogenizing semantics in a particular data-model – works only for special cases, entails loss and distortion of data and semantics, creates a new data silo.
      • Virtual integration. A projection onto a homogeneous data-model exposed to users – is more flexible, but may have the problem of data availability (e.g. military, intelligence). Also, a particular homogeneous model has limited usage, does not expose all content, and does not support enrichment
  • 26. 26 Pursuit of the Holy Grail of Intelligence Data Integration • In a highly dynamic semantic environment evolving in ad hoc ways • how to have it all and have it available immediately and at any time? • Traditional physical and virtual integration approaches fail to respond to these requirements • how to use these data resources efficiently (integrate, query, and analyze)?
  • 27. 27 Workable Solution
      • A physical store incorporating heterogeneous contents. Data Representation and Integration Framework (DRIF) – is based on a decomposed representation of structured data (RDF-style) and allows collection of data resources without loss or distortion, and thereby achieves representational integration
      • Light Weight Semantic Enhancement (SE) supports semantic integration and provides a decent utilization capability without adding storage and processing weight to the already storage- and processing-heavy Dataspace
  • 28. 28 DRIF Dataspace • Integration without heavy pre-processing (ad-hoc rapid integration): • Of any data artifact regardless of the model (or absence of it) and modality • Without loss or distortion of data and data-semantics • Continuous evolution and enrichment • Pay-as-you-go solution • While data and data-semantics are expected to be enriched and refined, they can be efficiently utilized immediately after entering the Dataspace through querying, navigation, and drilling
  • 29. Organization of the DRIF Dataspace Registration Ingestion Extraction [Transformation] / Enrichment
  • 30. 30 Semantic Enhancement of the Dataspace • Simple yet efficient harmonization strategy • Takes place not by changing the data semantics to which it is applied, but rather by adding an extra semantic layer to it • Long-lasting solution that can be applied consistently and in cumulative fashion to new models entering the Dataspace • Strategy compliant with and complementing the DRIF • Source data models are not changed • Can be used efficiently, and in a unified fashion, in search, reasoning, and analytics • Provides views of the Dataspace at different levels of detail • Mapping to a particular Über-model or choosing a single comprehensive model for harmonization does not provide the benefits described
  • 31. 31 Illustration • DRIF Dataspace accommodates lots of data models and is a microcosm of a collection of systems with diverse and heterogeneous data • Incremental annotations of these data models through SE ontologies • Preserving the native content of data resources • Presenting the native content via the SE annotations • Benefits of the approach
  • 32. 32 Sources
      • Source database Db1, with tables Person and Skill, containing person data and data pertaining to skills of different kinds, respectively.
          Person:   PersonID   SkillID
                    111        222
          Skill:    SkillID    Name   Description
                    222        Java   Programming
      • Source database Db2, with the table Person, containing data about IT personnel and their skills:
          Person:   ID    SkillDescr
                    333   SQL
      • Source database Db3, with the table ProgrSkill, containing data about programmers' skills:
          ProgrSkill:   EmplID   SkillName
                        444      Java
  • 33. 33 Representation in the Dataspace
      Representation of data-models, SE and SE annotations as Concepts and ConceptAssociations (blue – SE annotations; red – SE hierarchies)
          Label                 Relation   Label
          Db1.Name              Is-a       SE.Skill
          Db2.SkillDescr        Is-a       SE.ComputerSkill
          Db3.SkillName         Is-a       SE.ProgrammingSkill
          Db1.PersonID          Is-a       SE.PersonID
          Db2.ID                Is-a       SE.PersonID
          Db3.EmplID            Is-a       SE.PersonID
          SE.ComputerSkill      Is-a       SE.Skill
          SE.ProgrammingSkill   Is-a       SE.ComputerSkill
      Native representation of structured data
          Value and Associated Label   Relation         Value and Associated Label
          111, Db1.PersonID            hasSkillID       222, Db1.SkillID
          222, Db1.SkillID             hasName          Java, Db1.Name
          222, Db1.SkillID             hasDescription   Programming, Db1.Description
          333, Db2.ID                  hasSkillDescr    SQL, Db2.SkillDescr
          444, Db3.EmplID              hasSkillName     Java, Db3.SkillName
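The Concepts and ConceptAssociations on this slide can be held as plain subject–relation–object triples. The following is a minimal sketch, assuming an in-memory Python list in place of the actual Dataspace store; the triples are copied from the slide, while the names `CONCEPT_ASSOCIATIONS` and `se_labels` are hypothetical.

```python
# (subject, relation, object) triples copied from the slide: the first three
# are SE annotations of native labels, the last two are the SE hierarchy.
CONCEPT_ASSOCIATIONS = [
    ("Db1.Name", "Is-a", "SE.Skill"),
    ("Db2.SkillDescr", "Is-a", "SE.ComputerSkill"),
    ("Db3.SkillName", "Is-a", "SE.ProgrammingSkill"),
    ("SE.ComputerSkill", "Is-a", "SE.Skill"),
    ("SE.ProgrammingSkill", "Is-a", "SE.ComputerSkill"),
]


def se_labels(label):
    """Follow Is-a links transitively and list every label reached, in order."""
    reached = []
    frontier = [label]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in CONCEPT_ASSOCIATIONS:
            if subject == current and relation == "Is-a" and obj not in reached:
                reached.append(obj)
                frontier.append(obj)
    return reached


programming_path = se_labels("Db3.SkillName")
```

Because annotations and hierarchy live in the same triple form, a single traversal takes a native field such as `Db3.SkillName` up to the most general SE term without touching the native data triples.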
  • 34. 34 Indexed Contents Based on the SE
      Index entries based on the SE and native (blue) vocabularies
          Index Entry     Associated Field-Value
          111, PersonID   Type: Person; Skill: Java; Db1.Description: Programming
          333, PersonID   Type: Person; ComputerSkill: SQL
          444, PersonID   Type: Person; ProgrammingSkill: Java
  • 35. 35 Benefits of DRIF + SE • Leverages syntactic integration provided by DRIF, semantic integration provided by the SE vocabulary and annotations of native sources, and rich semantics provided by ontologies in general • Entering Skill = Java (which will be re-written at run time as: Skill = Java OR ComputerSkill = Java OR ProgrammingSkill = Java OR NetworkSkill = Java) will return: persons 111 and 444 • Entering ComputerSkill = Java OR ComputerSkill = SQL will return: persons 333 and 444 • Entering ProgrammingSkill = Java will return: person 444 • Entering Description = Programming will return: person 111 • Allows querying, searching, and manipulating native representations • Light-weight non-intrusive approach that can be improved and refined without impacting the Dataspace
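The run-time query rewriting described on this slide can be sketched as follows. This is a minimal illustration, not the DRIF implementation: the index entries are copied from the slide on indexed contents, the placement of `NetworkSkill` under `ComputerSkill` is an assumption (the deck only shows that it is some subtype of `Skill`), and matching the native field `Db1.Description` by its unqualified name is likewise assumed. The names `SUBTYPES`, `INDEX`, `expand`, and `search` are hypothetical.

```python
# SE hierarchy, parent -> direct subtypes. NetworkSkill's position is assumed.
SUBTYPES = {
    "Skill": ["ComputerSkill"],
    "ComputerSkill": ["ProgrammingSkill", "NetworkSkill"],
}

# Index entries with SE fields plus one native field, as on the index slide.
INDEX = {
    111: {"Skill": "Java", "Db1.Description": "Programming"},
    333: {"ComputerSkill": "SQL"},
    444: {"ProgrammingSkill": "Java"},
}


def expand(field):
    """Rewrite a field into itself plus all of its subtypes in the hierarchy."""
    fields = [field]
    for subtype in SUBTYPES.get(field, []):
        fields.extend(expand(subtype))
    return fields


def search(*conditions):
    """Return sorted person IDs matching any (field, value) condition (OR)."""
    hits = set()
    for field, value in conditions:
        wanted = expand(field)
        for person, entry in INDEX.items():
            for indexed_field, indexed_value in entry.items():
                # Native fields are matched by their unqualified name (assumed).
                name = indexed_field.split(".")[-1]
                if name in wanted and indexed_value == value:
                    hits.add(person)
    return sorted(hits)


skill_java = search(("Skill", "Java"))
computer_java_or_sql = search(("ComputerSkill", "Java"), ("ComputerSkill", "SQL"))
```

Under these assumptions the sketch reproduces the four results listed on the slide: the general term `Skill` reaches persons 111 and 444, while the narrower `ProgrammingSkill` reaches only 444.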
  • 36. 36 Index Contents without the SE
      Index entries based on native vocabularies
          Index Entry     Associated Field-Value
          111, PersonID   Type: Person; Name: Java; Description: Programming
          333, ID         Type: Person; SkillDescr: SQL
          444, EmplID     Type: Person; SkillName: Java
  • 37. 37 Problems • Even for our toy example we can see how much manual effort the analyst needs to apply in performing search without SE – and even then the information he will gain will be meager in comparison with what is made available through the Index with SE. • For example, if an analyst is familiar with the labels used in Db1 and is thus in a position to enter Name = Java, his query will still return only: person 111. Directly salient Db3 information (person 444) will thus be missed.
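For contrast, the same lookup against the native-vocabulary index can be sketched as below. The entries are copied from the slide on index contents without the SE; the dictionary layout and the name `search_native` are hypothetical and only illustrate the point made here.

```python
# Index entries keyed by person, using each source's own field names.
NATIVE_INDEX = {
    111: {"Name": "Java", "Description": "Programming"},
    333: {"SkillDescr": "SQL"},
    444: {"SkillName": "Java"},
}


def search_native(field, value):
    """Exact match on one native field name; no rewriting is possible."""
    return sorted(
        person
        for person, entry in NATIVE_INDEX.items()
        if entry.get(field) == value
    )


db1_only = search_native("Name", "Java")
db3_only = search_native("SkillName", "Java")
```

An analyst who knows only the Db1 label finds person 111 and nothing else. Reaching person 444 requires already knowing that Db3 calls the same thing `SkillName`, which is the manual effort the slide refers to.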
  • 38. 38 Additional Notes on the SE process • Original data and data-semantics are included in the Dataspace without loss or distortion; thus there is no need to cover all semantics of the Dataspace – what is unlikely to be used in search or is not important for integration will still be available when needed • A complex ontology is not needed – a common and shared vocabulary is sufficient for virtual semantic integration and search/analytics • The approach is very flexible, and investments can be made in specific areas according to need (pay-as-you-go) • The approach is tunable – if the chosen annotations of a particular subset of a source data-model are too general for data analyses, the respective ontologies can be further developed and source models re-annotated
  • 39. 39 Benefits of the Approach • Does not interfere with the source content • Enhancement enables this content to evolve in a cumulative fashion as it accommodates new kinds of data • Does not depend on the data resources and can be developed independently from them in an incremental and distributed fashion • Provides a more consistent, homogeneous, and well-articulated presentation of the content which originates in multiple internally inconsistent and heterogeneous systems • Makes management and exploitation of the content more cost-effective • The use of the selected ontologies brings integration with other government initiatives and brings the system closer to the federally mandated net-centric data strategy • Creates an integrated content that is effectively searchable and that provides content to which more powerful analytics can be applied
  • 40. 40 Towards Globalization and Sharing • Using the SE approach to create a Shared Semantic Resource for the Intelligence Community to enable interoperability across systems • Applying it directly to or projecting its contents on a particular integration solution
  • 41. 41 References • Smith B. et al., “Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community”, STIDS Conference, 2012. • Smith B. et al., “Ontology for the Intelligence Analyst”, Crosstalk: The Journal of Defense Software Engineering, 2012. • Salmen D. et al., “Integration of Intelligence Data through Semantic Enhancement”, STIDS Conference, 2011.
  • 42. Follow Us Data Tactics Corporation 7901 Jones Branch Dr. Suite 700 McLean, VA 22102 www.data-tactics-corp.com