       Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment

       This paper has been selected and presented in the Industry track at ISWC 2012, Boston.

       Fabrice Lacroix – Antidot – lacroix@antidot.net




       The context
       Business information systems (IS) have developed incrementally. Each new operating need has
       generated an ad hoc application: ERP, CRM, EDM, directories, messaging, extranet and so on. IS
       development has been driven by applications and processes, each new application creating another
       data silo. Organizations are now facing a new challenge: how to manage and extract value from this
       disparate, isolated data. Companies need an agile information system to deliver new applications at an
       ever-increasing pace, developed from existing data, without creating a new warehouse or adding
       complexity.
       Over the past twenty years, various solutions attempting to tackle the problems raised by data
       proliferation have appeared: BI, MDM, SOA. While these tools undoubtedly provide benefits, in most
       cases they entail a long and costly deployment process and make the overall system even more
       complex. What’s more, none of them is able to address the challenges of an ever-faster-changing
       technological environment. A versatile IS should:
       •    Pool data to create information that provides new operational services,
       •    Integrate and distribute data between applications, both internally and externally with the company’s ecosystem,
       •    Provide an information infrastructure that emphasizes agility and ease of use.
       Therefore, we need to look beyond the technological issues and change the paradigm. Instead of
       focusing on applications, we must place the data at the heart of the approach. And for that, the recent
       evolution of the Web blazes the trail.


       Why we use the Semantic Web stack
       Originally designed to serve as a universal document publication system, the Web has radically evolved
       over the past 15 years. The Web of Data, also known as the Semantic Web, is the latest iteration of
       the Web, in which computers can process and exchange information automatically and unambiguously.
       It goes well beyond simple access to raw data by providing a way of interweaving semantized data.
       This process, known as Linked Data, creates a decentralized knowledge base in which the value of
       each piece of information is enhanced by its links to complementary data.
       As a software vendor in the realm of information access solutions (enterprise search engines, data
       management and enrichment), Antidot has long worked on solutions that create a unified information
       space drawing on all of a company’s documents and data, meshing unstructured and structured
       information. In 2003 Antidot foresaw in Semantic Web technologies an elegant way to tackle the
       challenge of enterprise data integration, and in 2006 we started evaluating and integrating them into
       our solutions. Four years of development and several major projects with various customers and
       business domains allowed us to work out how to use those technologies efficiently. We strongly
       support Linked Enterprise Data (LED), the application of Linked Data principles to the corporate IS1.
       However, the way we use the Semantic Web may seem heretical with respect to its conventional
       principles. Hereafter we report the key aspects of our approach.
       1 For more information on our LED approach, read our white paper “Linked Enterprise Data – Principles, Uses and Benefits” – http://bit.ly/LED-EN (PDF, 24 pages, 5.6 MB).



       © Antidot – Linked Enterprise Data – ISWC 2012
How we use the Semantic Web stack
The classical Semantic Web architecture for integrating data from various silos relies on a federation
principle, where a query is synchronously distributed over the sources through SPARQL endpoints
exposed by each of them.
This approach presents many scientific and technological challenges, but considering the rationale
behind the Web of Data and the need to work in the gigantic open Web space, it seems to be the only
reasonable way to make it work.

Though theoretically correct, this approach is not applicable to the corporate IS, for a variety of
reasons:
    •    The corporate information system is built from numerous legacy or closed applications that
         cannot be adapted or extended with SPARQL endpoints.
    •    The enterprise information realm is made up of roughly 80% unstructured or semi-structured
         data that cannot fit the model as such.
    •    Enterprises do not want access to raw data in RDF format. They want to reap valuable
         information derived from the data, which requires large and complex computations to create
         these new informational objects.
    •    The bottom-up approach of mapping silos and their data to RDF to fit the model requires an
         enormous amount of work to define vocabularies or ontologies for each source, which is too
         heavy an investment.
    •    Companies dream of seamlessly integrating external data to leverage their internal
         information. But this external data is mostly available in XML or JSON through Web services,
         and not yet in RDF, so using SPARQL as a way to query and integrate it does not make sense.
    •    IT departments have invested heavily in their “relational database for storing / XML for
         exchanging / Web apps for accessing” infrastructure. Their staff are trained for this paradigm
         and lack the in-house skills to integrate the graph way of thinking.
    •    Stability matters most, and Semantic Web technology is unknown, considered new and
         immature: CIOs are not ready to take the risk of adding load and technological uncertainty to
         systems that are critical to the company’s daily business operations.
For all these reasons, the RDF/SPARQL paradigm as described above is not ready to enter the
corporate IS, and it may even face some resistance.
However, we think the Semantic Web is the solution for creating an agile information system. The way
we use it at Antidot is tightly related to the architecture of the data processing workflow we set up in
projects. As a long-time software vendor of information access solutions, we quickly came to the
conclusion that there is no good search engine, whatever the technology, if the data quality is not
good enough.
To meet this need, we have developed Antidot Information Factory (AIF), a software solution
designed specifically to enrich and leverage structured and unstructured data. Antidot Information
Factory is an "information generator" that orchestrates large-scale processing of existing data and
automates the publishing of enriched or newly created information.




The data processing workflows, named dataflows, always follow the same pattern: Harvest –
Normalize – Semantize – Enrich – Build – Expose.
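The stages of such a dataflow can be pictured as a simple chained pipeline. The sketch below is purely illustrative: the stage bodies, field names, and URIs are hypothetical placeholders, not the Antidot Information Factory API.

```python
# Minimal sketch of a Harvest -> Normalize -> Semantize -> Enrich -> Build
# -> Expose dataflow. Stage names mirror the pattern in the text; the
# bodies are hypothetical placeholders, not AIF code.

def harvest():
    # Pull raw records from a source silo (here: a hard-coded CRM extract).
    return [{"source": "crm", "id": "42", "name": " ACME Corp "}]

def normalize(records):
    # Clean and align field contents so sources can be meshed later.
    return [{**r, "name": r["name"].strip()} for r in records]

def semantize(records):
    # Turn selected fields into (subject, predicate, object) triples.
    return [(f"http://data.mycompany.com/{r['source']}/{r['id']}",
             "foaf:name", r["name"]) for r in records]

def enrich(triples):
    # Decorate the graph with derived information (placeholder rule).
    return triples + [(s, "rdf:type", "org:Organization")
                      for s, p, o in triples if p == "foaf:name"]

def build(triples):
    # Re-assemble triples into knowledge objects keyed by subject URI.
    objects = {}
    for s, p, o in triples:
        objects.setdefault(s, {})[p] = o
    return objects

def expose(objects):
    # Hand the objects over to the IS (here: simply return them).
    return objects

result = expose(build(enrich(semantize(normalize(harvest())))))
```

Each stage consumes the previous stage's output, so new sources or enrichment rules can be added without touching the rest of the chain.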




© Antidot – Linked Enterprise Data – ISWC 2012                                                       2/4
Harvest and Normalize – These are regular functions as seen in ETL systems: extract the data from
the sources, clean it and transform it. We tailor the Normalize process by aligning field contents in
order to mesh data coming from different sources (such as records from a CRM and an ERP). For
extracting records from relational databases and transforming selected records into RDF, we have
developed an R2RML- and Direct Mapping-compliant module named db2triples2.
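For intuition, the core idea of the W3C Direct Mapping can be sketched in a few lines: each row becomes a subject IRI built from the table name and primary key, and each column becomes a predicate. This is a deliberate simplification for illustration (untyped literals, single-column keys, made-up base IRI), not the db2triples implementation.

```python
# Illustrative sketch of the W3C Direct Mapping idea: a relational row
# becomes triples whose subject IRI is derived from the table name and
# primary key, and whose predicates are derived from the column names.
# Simplified for intuition (untyped literals, single-column primary key).

BASE = "http://example.com/"  # base IRI, an assumption for the example

def direct_map(table, pk_col, rows):
    triples = []
    for row in rows:
        # Subject IRI: base + table + primary-key column and value.
        subject = f"{BASE}{table}/{pk_col}={row[pk_col]}"
        for col, value in row.items():
            # Predicate IRI: base + table + '#' + column name.
            triples.append((subject, f"{BASE}{table}#{col}", value))
    return triples

triples = direct_map("client", "id",
                     [{"id": 7, "name": "ACME", "city": "Boston"}])
```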


Semantize – This critical step is a cornerstone of our approach. We cherry-pick a subset of
interesting fields of each object and create their RDF triple counterparts.
Generating the triples requires two actions:
    •    URI generation – The URIs are generated according to a few principles. We chose the form of
         a URL even though they are not directly dereferenceable: since the sources are not Semantic
         Web compliant, our solution is in charge of maintaining a mapping allowing access to the real
         server. The path contains the information necessary to access the record in the source system.
         Example: a record extracted from the CRM will have a URI of the form
         http://data.mycompany.com/crm/expr_id, where data.mycompany.com points to our solution,
         crm is a nickname for the CRM source chosen during setup, and expr_id is an expression
         and/or identifier that unambiguously points to the record and allows backtracking to the
         original data.
    •    Choosing the predicates – Experience has led us to the conclusion that “big ontologies” and
         “upfront ontology design” must be avoided in enterprise projects. The idea is not to model or
         describe each and every aspect of the processes and data inside the company, but to build the
         necessary information incrementally. Not to mention the fact that in our approach, the graph
         is a means and not an end. Therefore, we foster the use of existing vocabularies (like DC,
         FOAF, Organization, …) and mesh them as needed. When the enterprise has defined internal
         XML formats, we reuse them by transforming tag and attribute names into triple predicates.
         Pragmatism is the rule.
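The URI scheme described above can be illustrated with a small helper; the function names and the mapping table below are hypothetical, standing in for the mapping our solution maintains between data URIs and the real source systems.

```python
# Sketch of the URI scheme described above: URL-shaped URIs that are not
# directly dereferenceable, plus a mapping back to the real source system.
# Function names, the source nickname and the internal address are
# hypothetical illustrations.

SOURCE_MAP = {"crm": "https://crm.internal.mycompany.com/api/records/"}

def make_uri(source_nick, record_id):
    # URL form: the path carries what is needed to reach the source record.
    return f"http://data.mycompany.com/{source_nick}/{record_id}"

def backtrack(uri):
    # Resolve a data URI to the real (non Semantic Web) source address.
    _, _, rest = uri.partition("http://data.mycompany.com/")
    nick, _, record_id = rest.partition("/")
    return SOURCE_MAP[nick] + record_id

uri = make_uri("crm", "1337")
```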
Unstructured documents like office files, PDF files or email content don’t fit the RDF formalism and
cannot be linked to the graph as such. Extra work is necessary:
    •    First, we transform the available metadata (document name, author, creation date, sender
         and receivers for an email, subject and so forth) into RDF.
    •    Then, we use text-mining technology to extract named entities like people, organizations,
         products, etc. from the documents. The entity lists are generated from different sources in
         the enterprise: directories, the CRM or the ERP provide people and company names, while
         products are listed in ERPs or taxonomies. Each annotation generates a triple where the
         subject is the document URI, the object is the entity URI, and the predicate depends on the
         entity type but mostly means “quotes” (doc_URI quotes entity_URI).
    •    And last, we run various specific algorithms designed to do document-versus-document
         comparison to detect duplicates, different versions of the same document, inclusions,
         semantically related documents, etc. Each of these relations is inserted into the graph with an
         appropriate predicate.
By doing so, and thanks to the syntactic alignment done at the Normalize step, we start linking data
together, mostly based on shared field values. This creates a first sparse graph.
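The annotation step can be pictured with a toy entity extractor: a gazetteer built from the enterprise sources maps surface forms to entity URIs, and each match yields a "quotes"-style triple. The gazetteer content and predicate names below are hypothetical; real text mining is far richer than substring matching.

```python
# Toy illustration of the annotation step: named entities found in a
# document generate "quotes" triples (doc_URI quotes entity_URI).
# Gazetteer entries and predicate names are hypothetical; real text
# mining goes well beyond naive substring matching.

GAZETTEER = {  # surface form -> (entity URI, typed "quotes" predicate)
    "ACME": ("http://data.mycompany.com/crm/org/acme", "quotesOrganization"),
    "Alice": ("http://data.mycompany.com/dir/person/alice", "quotesPerson"),
}

def annotate(doc_uri, text):
    triples = []
    for surface, (entity_uri, predicate) in GAZETTEER.items():
        if surface in text:  # naive matching, for illustration only
            triples.append((doc_uri, predicate, entity_uri))
    return triples

doc = "http://data.mycompany.com/edm/contract-2012-08"
triples = annotate(doc, "Contract between ACME and Alice ...")
```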
But the key question here is: why do we transform only a subpart of the harvested data into RDF,
and what do we do with the rest of it? Indeed, besides the fact that text documents are not graph
friendly, as stated above we only transform a selected part of the structured data into RDF:
    •    From a technical standpoint, we don’t feel the technology is mature and stable enough to
         proceed differently. In industrial projects, millions of seed objects are regularly extracted
         from the sources (invoices, clients, files, etc.), each having tens of fields. And graphs of
         billions of triples do not scale well in available triplestores.
    •    Transforming only a subpart of the data largely simplifies the task of choosing the predicates,
         and hence reinforces the choice of using many small existing vocabularies instead of big
         ontologies.
    •    The data that is not transformed into RDF is stored by Information Factory for later use
         during the Build step.
The very diversity of enterprise data calls for a flexible and pragmatic strategy: the graph is only a
part of it.
2 db2triples is compatible with the R2RML and Direct Mapping Recommendations of May 29th, 2012 and has successfully passed the validation tests (http://www.w3.org/2001/sw/rdb2rdf/implementation-report/). We have open-sourced it and made it available at http://github.com/antidot/db2triples.



Enrich – The next step is dedicated to enriching both the objects (the records captured in the
sources) and the graph. Depending on the content type and the project needs, we run various
algorithms such as text mining, classification, topic detection, etc. This complementary information is
included in the graph. We also integrate external information, either by importing data sets into the
graph or by querying external sources and mapping the results to RDF triples in order to link them to
the graph.
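Mapping an external Web-service result to triples can be as simple as the sketch below; the JSON payload, the vocabulary prefixes and the field-to-predicate table are made up for illustration.

```python
# Sketch of integrating external data: a JSON record, as returned by a
# typical Web service, is mapped to RDF triples linked to an internal
# subject URI. Payload and vocabulary terms are illustrative only.
import json

PREDICATES = {  # external JSON field -> predicate (hypothetical mapping)
    "legal_name": "gr:legalName",
    "country": "vcard:country-name",
}

def json_to_triples(subject_uri, payload):
    record = json.loads(payload)
    # Keep only the fields we know how to map; ignore the rest.
    return [(subject_uri, PREDICATES[k], v)
            for k, v in record.items() if k in PREDICATES]

payload = '{"legal_name": "ACME Corporation", "country": "US", "extra": 1}'
triples = json_to_triples("http://data.mycompany.com/crm/42", payload)
```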


Build – Once the graph is decorated, the key step is to build the knowledge objects that are the real
target of the project. We start by executing inference rules (mostly SPARQL CONSTRUCT queries) in
order to saturate the graph. Then we extract those objects from the graph: this requires a mix of
SELECT queries plus dedicated graph traversal and sub-graph selection algorithms. Moreover, since
we have not transformed all the data into RDF nor transferred it to the graph, the objects we extract
are like skeletons that need to be complemented with the original data that was left outside the
graph. This task is completed through specific algorithms and tools embedded in Information Factory,
designed to merge the RDF objects extracted from the graph with the structured data and documents
previously harvested.
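The saturation step, implemented in practice with SPARQL CONSTRUCT queries, amounts to applying if-then rules until no new triple appears. The plain-Python mimic below makes the fixpoint explicit; the single rule and its predicates are hypothetical examples, not from the paper.

```python
# Plain-Python mimic of graph saturation: inference rules (implemented in
# practice as SPARQL CONSTRUCT queries) are applied until no new triple
# appears. The rule and its "ex:" predicates are hypothetical examples.

def rule_supplier(graph):
    # If ?d ex:quotes ?org and ?org is a Supplier,
    # infer ?d ex:mentionsSupplier ?org.
    suppliers = {s for s, p, o in graph
                 if p == "rdf:type" and o == "ex:Supplier"}
    return {(s, "ex:mentionsSupplier", o)
            for s, p, o in graph if p == "ex:quotes" and o in suppliers}

def saturate(graph, rules):
    graph = set(graph)
    changed = True
    while changed:  # fixpoint: stop when no rule yields anything new
        changed = False
        for rule in rules:
            new = rule(graph) - graph
            if new:
                graph |= new
                changed = True
    return graph

g = saturate({("ex:doc1", "ex:quotes", "ex:acme"),
              ("ex:acme", "rdf:type", "ex:Supplier")}, [rule_supplier])
```

The same fixpoint structure works with any number of rules, which is why new inference rules can be added project by project without reworking the saturation machinery.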


Expose – Finally, the knowledge objects created are made available to the IS and the users in various
ways, depending on the environment and the needs. They can be dumped into XML files, injected into
a database, or indexed and made available through a semantic search engine following the
Search-Based Application (SBA) paradigm. Of course, these objects can also be loaded into a
triplestore and made available in native RDF format through a dedicated SPARQL endpoint.


Hence, we have chosen Semantic Web technology for very pragmatic reasons:
    •    The RDF/OWL formalism is perfectly suited for modeling. Its graph nature fits our needs for
         agility and flexibility. It fosters bottom-up, small-scale projects that offer a quick,
         inexpensive response to an isolated business need. Each new project gradually enlarges the
         information graph, without the need to revise or overhaul the initial models.
    •    The Semantic Web benefits from an ecosystem of existing solutions and tools (triplestores,
         inference engines, SPARQL endpoints, modeling tools, etc.), as well as partners and skills.
         The open and standard nature of its formats and protocols guarantees investment
         sustainability: the created data is always accessible and reusable, independently of the
         technology providers.
    •    The momentum around Open Data and Linked Data strengthens the credibility of the
         approach. Though these concepts are not yet mainstream, CIOs are interested in evaluating
         the technology and benefits behind them, and they welcome the opportunity to later extend
         this first investment toward a real Semantic Web project.




Conclusion
The Linked Enterprise Data strategy and the underlying Semantic Web standards represent a
comprehensive response to the challenge of creating an agile, high-performance information system.
Our approach has proven to be pragmatic and efficient, delivering the project agility and data
versatility expected. Our value proposition is not the technology itself: we offer to create valuable
information in an agile way for business needs.
CIOs do not yet express a need for the Semantic Web or Linked Data, and they have not planned to
set up a triplestore in their infrastructure. But we think the Semantic Web stack is the right tool. The
Linked Enterprise Data approach will prove its value even where we might not be able to convince
our customers to dive directly into a global Web of Data approach. And it might allow later projects
to use Semantic Web technologies more directly and openly.




© Antidot – Linked Enterprise Data – ISWC 2012                                                      4/4

More Related Content

What's hot

The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInSam Shah
 
The opportunity of the business data lake
The opportunity of the business data lakeThe opportunity of the business data lake
The opportunity of the business data lakeCapgemini
 
BP Data Modelling as a Service (DMaaS)
BP Data Modelling as a Service (DMaaS)BP Data Modelling as a Service (DMaaS)
BP Data Modelling as a Service (DMaaS)Christopher Bradley
 
ER/Studio Enterprise Team Edition Datasheet
ER/Studio Enterprise Team Edition DatasheetER/Studio Enterprise Team Edition Datasheet
ER/Studio Enterprise Team Edition DatasheetEmbarcadero Technologies
 
Big Data and Data Virtualization
Big Data and Data VirtualizationBig Data and Data Virtualization
Big Data and Data VirtualizationKenneth Peeples
 
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Kun Le
 
Enterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsEnterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsBoris Otto
 
Database Architecture Proposal
Database Architecture ProposalDatabase Architecture Proposal
Database Architecture ProposalDATANYWARE.com
 
CXAIR for Data Migration
CXAIR for Data MigrationCXAIR for Data Migration
CXAIR for Data MigrationConnexica
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Kun Le
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsJane Roberts
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DMichael Swanson
 
Teradata Overview
Teradata OverviewTeradata Overview
Teradata OverviewTeradata
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Mark Hewitt
 
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented MiddlewareADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented MiddlewareDATAVERSITY
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleVasu S
 
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...Findwise
 

What's hot (20)

The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
The opportunity of the business data lake
The opportunity of the business data lakeThe opportunity of the business data lake
The opportunity of the business data lake
 
BP Data Modelling as a Service (DMaaS)
BP Data Modelling as a Service (DMaaS)BP Data Modelling as a Service (DMaaS)
BP Data Modelling as a Service (DMaaS)
 
ER/Studio Enterprise Team Edition Datasheet
ER/Studio Enterprise Team Edition DatasheetER/Studio Enterprise Team Edition Datasheet
ER/Studio Enterprise Team Edition Datasheet
 
Big Data and Data Virtualization
Big Data and Data VirtualizationBig Data and Data Virtualization
Big Data and Data Virtualization
 
Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569Architecting a-big-data-platform-for-analytics 24606569
Architecting a-big-data-platform-for-analytics 24606569
 
Enterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsEnterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and Options
 
Database Architecture Proposal
Database Architecture ProposalDatabase Architecture Proposal
Database Architecture Proposal
 
Data lakes
Data lakesData lakes
Data lakes
 
CXAIR for Data Migration
CXAIR for Data MigrationCXAIR for Data Migration
CXAIR for Data Migration
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
Syngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&DSyngenta's Predictive Analytics Platform for Seeds R&D
Syngenta's Predictive Analytics Platform for Seeds R&D
 
Teradata Overview
Teradata OverviewTeradata Overview
Teradata Overview
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...
 
Chapter 11
Chapter 11Chapter 11
Chapter 11
 
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented MiddlewareADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
ADV Slides: Trends in Streaming Analytics and Message-oriented Middleware
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | Qubole
 
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...
 

Viewers also liked

Les Roumains et les Bulgares apportent plus qu'ils ne coûtent
Les Roumains et les Bulgares apportent plus qu'ils ne coûtentLes Roumains et les Bulgares apportent plus qu'ils ne coûtent
Les Roumains et les Bulgares apportent plus qu'ils ne coûtentThierry Labro
 
Projet socialiste 30 mesures phares
Projet socialiste 30 mesures pharesProjet socialiste 30 mesures phares
Projet socialiste 30 mesures pharesJuanico
 
Réforme du ststatut ue
Réforme du ststatut ueRéforme du ststatut ue
Réforme du ststatut ueThierry Labro
 
Pwc real-estate-2020-building-the-future
Pwc real-estate-2020-building-the-futurePwc real-estate-2020-building-the-future
Pwc real-estate-2020-building-the-futureThierry Labro
 
Le rapport de la Cour des comptes sur les finances publiques 2013
Le rapport de la Cour des comptes sur les finances publiques 2013Le rapport de la Cour des comptes sur les finances publiques 2013
Le rapport de la Cour des comptes sur les finances publiques 2013Thierry Labro
 
TER convention lorraine 2007-2016
TER convention lorraine 2007-2016TER convention lorraine 2007-2016
TER convention lorraine 2007-2016Thierry Labro
 

Viewers also liked (7)

Les Roumains et les Bulgares apportent plus qu'ils ne coûtent
Les Roumains et les Bulgares apportent plus qu'ils ne coûtentLes Roumains et les Bulgares apportent plus qu'ils ne coûtent
Les Roumains et les Bulgares apportent plus qu'ils ne coûtent
 
Projet socialiste 30 mesures phares
Projet socialiste 30 mesures pharesProjet socialiste 30 mesures phares
Projet socialiste 30 mesures phares
 
Réforme du ststatut ue
Réforme du ststatut ueRéforme du ststatut ue
Réforme du ststatut ue
 
Pwc real-estate-2020-building-the-future
Pwc real-estate-2020-building-the-futurePwc real-estate-2020-building-the-future
Pwc real-estate-2020-building-the-future
 
Gfci15 15 march2014
Gfci15 15 march2014Gfci15 15 march2014
Gfci15 15 march2014
 
Le rapport de la Cour des comptes sur les finances publiques 2013
Le rapport de la Cour des comptes sur les finances publiques 2013Le rapport de la Cour des comptes sur les finances publiques 2013
Le rapport de la Cour des comptes sur les finances publiques 2013
 
TER convention lorraine 2007-2016
TER convention lorraine 2007-2016TER convention lorraine 2007-2016
TER convention lorraine 2007-2016
 

Similar to ISWC 2012 - Industry Track: "Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment."

bigdatasqloverview21jan2015-2408000
bigdatasqloverview21jan2015-2408000bigdatasqloverview21jan2015-2408000
bigdatasqloverview21jan2015-2408000Kartik Padmanabhan
 
Go from data to decision in one unified platform.pdf
Go from data to decision in one unified platform.pdfGo from data to decision in one unified platform.pdf
Go from data to decision in one unified platform.pdfwebmaster553228
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
ISWC 2012 - Industry Track: "Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment."

What's more, none of them can address the challenges of an ever faster changing technological environment. A versatile IS should:
• Pool data to create information that will provide new operational services,
• Integrate and distribute data between applications, both internally and externally with its ecosystem,
• Provide an information infrastructure that emphasizes agility and ease of use.

Therefore, we need to look beyond the technological issues and change the paradigm: instead of focusing on applications, we must place the data at the heart of the approach. And for that, the recent evolution of the Web blazes the trail.

Why we use the Semantic Web stack

Originally designed to serve as a universal document publication system, the Web has radically evolved over the past 15 years.
The Web of Data, also known as the Semantic Web, is the latest iteration of the Web, one in which computers can process and exchange information automatically and unambiguously. It goes well beyond simple access to raw data by providing a way of interweaving semantized data. This process, known as Linked Data, creates a decentralized knowledge base in which the value of each piece of information is enhanced by its links to complementary data.

As a software vendor in the realm of information access solutions (enterprise search engines, data management and enrichment), Antidot has long worked on solutions that create a unified informational space drawing on all of a company's documents and data, meshing unstructured and structured information. In 2003, Antidot saw in Semantic Web technologies an elegant way to tackle the challenge of enterprise data integration, and in 2006 we started evaluating and integrating them into our solutions. Four years of development and several major projects across various customers and business domains allowed us to work out how to use these technologies efficiently.

We strongly support Linked Enterprise Data (LED), the application of the Linked Data principles to the corporate IS [1]. However, the way we use the Semantic Web may seem heretical with respect to conventional principles. Hereafter we report key aspects of our approach.

[1] For more information on our LED approach, read our white paper "Linked Enterprise Data – Principles, Uses and Benefits" – http://bit.ly/LED-EN (PDF, 24 pages, 5.6 MB)

© Antidot – Linked Enterprise Data – ISWC 2012
How we use the Semantic Web stack

The classical Semantic Web architecture for integrating data from various silos relies on a federated principle: a query is synchronously distributed over the sources through SPARQL endpoints exposed by each of them. This approach presents many scientific and technological challenges, but given the rationale behind the Web of Data and the need to work at the gigantic scale of the open Web, it seems to be the only reasonable way to make it work.

Though theoretically correct, this approach is not applicable to the corporate IS, for a wide variety of reasons:
• The corporate information system is built on numerous legacy or closed applications that cannot be adapted or extended with SPARQL endpoints.
• Roughly 80% of the enterprise information realm consists of unstructured or semi-structured data that cannot fit the model as such.
• Enterprises do not want access to raw data in RDF format. They want to reap valuable information derived from the data, which requires large and complex computations to create these new informational objects.
• The bottom-up approach of mapping silos and their data to RDF to fit the model requires enormous work to define vocabularies or ontologies for each source – too heavy an investment.
• Companies dream of seamlessly integrating external data to leverage their internal information. But this external data is mostly available in XML or JSON through Web services, and not yet in RDF, so using SPARQL as a way to query and integrate it does not make sense.
• IT departments have invested heavily in their "relational database for storing / XML for exchanging / Web apps for accessing" infrastructure. Their staff are trained for this paradigm and lack the in-house skills to adopt the graph way of thinking.
• Stability matters most, and Semantic Web technology is unknown, perceived as new and immature: CIOs are not ready to take the risk of adding load and technological uncertainty to systems that are critical to the company's daily business operations.

For all these reasons, the RDF/SPARQL paradigm as described above is not ready to enter the corporate IS, and it may even face some resistance. However, we think the Semantic Web is the solution for creating an agile information system. The way we use it at Antidot is tightly related to the architecture of the data processing workflow we set up in projects.

As a long-time software vendor of information access solutions, we quickly came to the conclusion that no search engine, whatever its technology, can be good if the quality of the underlying data is not good enough. To meet this need, we developed Antidot Information Factory (AIF), a software solution designed specifically to enrich and leverage structured and unstructured data. Antidot Information Factory is an "information generator" that orchestrates large-scale processing of existing data and automates the publishing of enriched or newly created information.

The data processing workflows, named dataflows, always follow the same pattern: Capture – Normalize – Semantize – Enrich – Build – Expose.
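As a rough illustration of this pattern, the first stages of such a dataflow can be sketched as a chain of functions. This is a minimal sketch under our own assumptions: the function names, field names and URI scheme are illustrative, not AIF's actual API.

```python
# Hypothetical sketch of the Capture - Normalize - Semantize stages of a
# dataflow. Field names, sources and the URI scheme are illustrative only.

def capture(sources):
    # Pull raw records out of each silo (CRM, ERP, file shares, ...).
    return [rec for src in sources for rec in src]

def normalize(records):
    # Clean and align field values so records from different silos can mesh.
    return [{k: v.strip().lower() if isinstance(v, str) else v
             for k, v in rec.items()} for rec in records]

def semantize(records):
    # Turn a cherry-picked subset of fields into (subject, predicate, object)
    # triples; the rest of each record stays outside the graph.
    triples = []
    for rec in records:
        uri = f"http://data.mycompany.com/{rec['source']}/{rec['id']}"
        if 'name' in rec:
            triples.append((uri, 'foaf:name', rec['name']))
    return triples

def run_dataflow(sources):
    # Enrich, Build and Expose would follow the same chaining pattern.
    return semantize(normalize(capture(sources)))

crm = [{'source': 'crm', 'id': '42', 'name': '  ACME Corp '}]
erp = [{'source': 'erp', 'id': '7', 'name': 'Acme Corp'}]
print(run_dataflow([crm, erp]))
```

Note how the Normalize stage aligns the two spellings of the company name, so the CRM and ERP records end up sharing a field value and can later be linked in the graph.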
Harvest and Normalize – These are regular functions as found in ETL systems: extract the data from the sources, clean it and transform it. We tailor the Normalize step by aligning field contents in order to mesh data coming from different sources (such as records from a CRM and an ERP). For extracting records from relational databases and transforming selected records into RDF, we have developed an R2RML and Direct Mapping compliant module named db2triples [2].

Semantize – This critical step is a cornerstone of our approach. We cherry-pick a subset of interesting fields of each object and create their RDF triple counterparts. Generating the triples requires two actions:

URI generation – The URIs are generated according to a few principles. We chose the form of a URL even though the URIs are not directly dereferenceable: since the sources are not Semantic Web compliant, our solution maintains a mapping that allows access to the real server. The path contains the information necessary to access the record in the source system. For example, a record extracted from the CRM will have a URI of the form http://data.mycompany.com/crm/expr_id, where data.mycompany.com points to our solution, crm is a nickname for the CRM source chosen during setup, and expr_id is an expression and/or identifier that unambiguously points to the record and allows backtracking to the original data.

Choosing the predicates – Experience has led us to the conclusion that "big ontologies" and "upfront ontology design" must be avoided in enterprise projects. The idea is not to model or describe every aspect of the processes and data inside the company, but to build up the necessary information incrementally. Not to mention that in our approach, the graph is a means and not an end. We therefore foster the reuse of existing vocabularies (such as DC, FOAF, Organization, ...) and mesh them as needed.
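The URI scheme above, together with the mapping back to the source system, can be sketched as a small registry. This is an illustrative sketch, not Antidot's implementation; the class name and registry structure are our own assumptions.

```python
# Hypothetical sketch of URI minting with a reverse mapping back to the
# source system. Class and method names are illustrative assumptions.

class UriRegistry:
    def __init__(self, base="http://data.mycompany.com"):
        self.base = base
        self._back = {}  # URI -> (source nickname, native record locator)

    def mint(self, source, record_id):
        # The URI is URL-shaped but not directly dereferenceable: the legacy
        # source exposes no SPARQL endpoint, so the backtracking information
        # is kept here instead.
        uri = f"{self.base}/{source}/{record_id}"
        self._back[uri] = (source, record_id)
        return uri

    def resolve(self, uri):
        # Backtrack from a URI to the original record in the source system.
        return self._back[uri]

reg = UriRegistry()
uri = reg.mint("crm", "contact-1138")
print(uri)               # http://data.mycompany.com/crm/contact-1138
print(reg.resolve(uri))  # ('crm', 'contact-1138')
```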
When the enterprise has defined internal XML formats, we reuse them by transforming tag and attribute names into triple predicates. Pragmatism is the rule.

Unstructured documents, such as office files, PDF files or email contents, do not fit the RDF formalism and cannot be linked to the graph as such. Extra work is necessary:

First, we transform the available metadata, such as document name, author, creation date, sender and recipients for an email, subject and so forth, into RDF.

Then, we use text-mining technology to extract named entities such as people, organizations and products from the documents. The entity lists are generated from different sources within the enterprise: directories, the CRM and the ERP provide people and company names, while products are listed in ERPs or taxonomies. Each annotation generates a triple in which the subject is the document URI, the object is the entity URI, and the predicate depends on the entity type but mostly means "quotes" (doc_URI quotes entity_URI).

Last, we run various specific algorithms that compare documents against each other to detect duplicates, different versions of the same document, inclusions, semantically related documents, and so on. Each of these relations is inserted into the graph with an appropriate predicate.

By doing so, and thanks to the syntactic alignment done at the Normalize step, we start linking data together, mostly based on shared field values. This creates a first sparse graph.

But the key question here is: why do we transform only a subset of the harvested data into RDF, and what do we do with the rest? Indeed, besides the fact that text documents are not graph friendly, as stated above we transform only a selected part of the structured data into RDF:

From a technical standpoint, we do not feel the technology is mature and stable enough to proceed differently.
In industrial projects, millions of seed objects are regularly extracted from the sources (invoices, clients, files, etc.), each having tens of fields, and graphs of billions of triples do not scale well in the triplestores available today. Transforming only a subset of the data also greatly simplifies the task of choosing the predicates, which reinforces the choice of using many small existing vocabularies instead of big ontologies.

The data that is not transformed to RDF is stored by Information Factory for later use during the Build step. The very diversity of enterprise data calls for a flexible and pragmatic strategy: the graph is only a part of it.

[2] db2triples is compatible with the R2RML and Direct Mapping Recommendations of May 29th, 2012, and has successfully passed the validation tests (http://www.w3.org/2001/sw/rdb2rdf/implementation-report/). We have open-sourced it and made it available at http://github.com/antidot/db2triples.
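Returning to the Semantize step: each named-entity annotation yields one triple linking a document URI to an entity URI. The sketch below illustrates the idea; the entity lists, predicate names and URIs are hypothetical, not Antidot's actual vocabulary, and real extraction relies on text-mining technology rather than substring matching.

```python
# Illustrative sketch of the annotation step: entity lists come from the
# silos (directory, ERP), and each match in a document yields one triple
# whose predicate roughly means "quotes". All names here are hypothetical.

people   = {"alice martin": "http://data.mycompany.com/dir/alice.martin"}
products = {"widget x":     "http://data.mycompany.com/erp/widget-x"}

def annotate(doc_uri, text):
    # Naive matching as a stand-in for real named-entity extraction.
    triples = []
    lowered = text.lower()
    for name, uri in people.items():
        if name in lowered:
            triples.append((doc_uri, "ex:quotesPerson", uri))
    for name, uri in products.items():
        if name in lowered:
            triples.append((doc_uri, "ex:quotesProduct", uri))
    return triples

doc = "http://data.mycompany.com/edm/report-2012-03"
print(annotate(doc, "Alice Martin reviewed the Widget X launch plan."))
```

Because the entity URIs are the same ones minted from the CRM, ERP and directory records, these annotation triples are exactly what stitches unstructured documents into the sparse graph.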
Enrich – The next step is dedicated to enriching both the objects (the records captured from the sources) and the graph. Depending on the content type and the project needs, we run various algorithms such as text mining, classification and topic detection, and the complementary information they produce is added to the graph. We also integrate external information, either by importing data sets into the graph or by querying external sources and mapping the results to RDF triples so as to link them to the graph.

Build – Once the graph is decorated, the key step is to build the knowledge objects that are the real target of the project. We start by executing inference rules (mostly SPARQL CONSTRUCT queries) in order to saturate the graph. Then we extract those objects from the graph: this requires a mix of SELECT queries plus dedicated graph traversal and sub-graph selection algorithms. Moreover, since we have not transformed all the data into RDF nor transferred it to the graph, the objects we extract are like skeletons that need to be complemented with the original data that was left outside the graph. This task is completed through specific algorithms and tools embedded in Information Factory, designed to merge the RDF objects extracted from the graph with the structured data and documents previously harvested.

Expose – Finally, the knowledge objects created are made available to the IS and to users in various ways, depending on the environment and the needs. They can be dumped as XML files, injected into a database, or indexed and made available through a semantic search engine following the Search-Based Application (SBA) paradigm. Of course, these objects can also be loaded into a triplestore and made available in native RDF format through a dedicated SPARQL endpoint.

Hence, we have chosen Semantic Web technology for very pragmatic reasons:

• The RDF/OWL formalism is perfectly suited for modeling.
Its graph nature fits our needs for agility and flexibility. It fosters bottom-up, small-scale projects that offer a quick, inexpensive response to an isolated business need. Each new project gradually enlarges the information graph, without the need to revise or overhaul the initial models.

• The Semantic Web benefits from an ecosystem of existing solutions and tools (triplestores, inference engines, SPARQL endpoints, modeling tools, etc.), as well as partners and skills. The open and standard nature of its formats and protocols guarantees investment sustainability: the created data remains accessible and reusable, independently of the technology providers.

• The momentum around Open Data and Linked Data strengthens the credibility of the approach. Though these concepts are not yet mainstream, CIOs are interested in evaluating the technology and benefits behind them, and they welcome the opportunity to extend this first investment toward a full Semantic Web project in the future.

Conclusion

The Linked Enterprise Data strategy and the underlying Semantic Web standards represent a comprehensive response to the challenge of creating an agile, high-performance information system. Our approach has proven pragmatic and efficient, delivering the project agility and the data versatility expected. Our value proposition is not the technology itself: we offer to create valuable information in an agile way for business needs. CIOs do not yet express a need for the Semantic Web or Linked Data, and they have not planned to set up a triplestore in their infrastructure. But we think that the Semantic Web stack is the right tool. The Linked Enterprise Data approach will prove its value even where we cannot convince our customers to dive directly into a global Web of Data approach, and it may allow later projects to use Semantic Web technologies more directly and openly.