Enterprises are drowning in data that they can't find, access, or use. For many years, they have wrestled with how best to combine all that data into actionable information without building systems that break as schemas evolve. Approaches like warehousing and ETL can be expensive to create and brittle in the face of changing data sources. Data integration at the application level is common, but it adds significant complexity to the code. Data-oriented web services attempt to provide reusable sources of integrated data; however, these have just added another layer of data access that constrains query and access patterns.
This talk will look at how semantic web technologies can be used to make existing data visible and actionable using standards like RDF (data), R2RML (data translation), OWL (schema definition and integration), SPARQL (federated query), and RIF (rules). The semantic web approach takes the data you already have and makes that data available for query and use across your existing data sources. This base capability is an excellent platform for building federated analytics.
2. Data Integration Problems
1. Discovery and description
2. Internal integration
3. External integration
4. Nomadic data
5. Inflexible interfaces
3. 1. Discovery and description
• What data do we have?
• What does it mean?
• Who is creating it?
• Who is using it?
4. 2. Internal integration
• Does your order entity have the same fields as my entity?
• Are your codes for order status the same as my codes for order status?
5. 3. External integration
• Does a public source of information exist?
• How do the entities in the public source relate to the entities in my data?
6. 4. Nomadic data
• Where does your data come from?
• Which version of the data are you using?
• Why does your data not match my data?
7. 5. Inflexible interfaces
• Why can't I see all of my data?
• Why does it take months to expose a new data element in my application?
18. Music Database
Musicians:
MID  First  Last       Inst_ID
1    Eddie  Van Halen  10
2    Yo Yo  Ma         20
3    Kenny  G          30
Instruments:
IID  Instrument  Type
10   Guitar      String
20   Cello       String
30   Saxophone   Woodwind
19. Musician Schema
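music:Musician rdf:type rdfs:Class
music:Instrument rdf:type rdfs:Class
music:firstName rdf:type rdf:Property
music:lastName rdf:type rdf:Property
music:plays rdf:type rdf:Property
music:instName rdf:type rdf:Property
music:instType rdf:type rdf:Property
music:firstName rdfs:domain music:Musician
music:lastName rdfs:domain music:Musician
music:plays rdfs:domain music:Musician
music:plays rdfs:range music:Instrument
music:instName rdfs:domain music:Instrument
music:instType rdfs:domain music:Instrument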
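One consequence of declaring rdfs:domain and rdfs:range is that a resource's type can be inferred from the properties it is used with. The following is a minimal sketch of that inference in plain Python, assuming the schema edges as read from the diagram on this slide. The function name infer_types and the tuple representation of triples are illustrative choices, not part of any RDF library, and the sketch simplifies by assuming each property has at most one domain and one range.

```python
# Schema edges as read from the slide's diagram (an assumption of this sketch).
schema = [
    ("music:firstName", "rdfs:domain", "music:Musician"),
    ("music:lastName", "rdfs:domain", "music:Musician"),
    ("music:plays", "rdfs:domain", "music:Musician"),
    ("music:plays", "rdfs:range", "music:Instrument"),
    ("music:instName", "rdfs:domain", "music:Instrument"),
    ("music:instType", "rdfs:domain", "music:Instrument"),
]


def infer_types(data, schema):
    """Derive rdf:type triples from rdfs:domain and rdfs:range declarations.

    Simplification: one domain and one range per property.
    """
    domains = {s: o for s, p, o in schema if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in schema if p == "rdfs:range"}
    inferred = set()
    for subject, predicate, obj in data:
        if predicate in domains:
            # The subject of the property belongs to the property's domain.
            inferred.add((subject, "rdf:type", domains[predicate]))
        if predicate in ranges:
            # The object of the property belongs to the property's range.
            inferred.add((obj, "rdf:type", ranges[predicate]))
    return inferred


data = [("artist:1", "music:plays", "instrument:10")]
for triple in sorted(infer_types(data, schema)):
    print(*triple)
```

Note that in RDFS these declarations drive inference rather than validation: a triple that uses music:plays is not rejected if its subject lacks a type, the type is simply entailed.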
20. Triples From Tables
Musicians:                        Instruments:
MID  First  Last       Inst_ID    IID  Instrument  Type
1    Eddie  Van Halen  10         10   Guitar      String
2    Yo Yo  Ma         20         20   Cello       String
3    Kenny  G          30         30   Saxophone   Woodwind
Turn each key into a resource and specify the proper type of each resource:
artist:1 rdf:type music:Musician    instrument:10 rdf:type music:Instrument
artist:2 rdf:type music:Musician    instrument:20 rdf:type music:Instrument
artist:3 rdf:type music:Musician    instrument:30 rdf:type music:Instrument
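The key-to-resource step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tooling used in the talk: the tables are held as hypothetical in-memory lists of dicts, triples are plain tuples, and type_triples is a made-up helper name.

```python
# Hypothetical in-memory copies of the two tables from the slide.
musicians = [
    {"MID": 1, "First": "Eddie", "Last": "Van Halen", "Inst_ID": 10},
    {"MID": 2, "First": "Yo Yo", "Last": "Ma", "Inst_ID": 20},
    {"MID": 3, "First": "Kenny", "Last": "G", "Inst_ID": 30},
]
instruments = [
    {"IID": 10, "Instrument": "Guitar", "Type": "String"},
    {"IID": 20, "Instrument": "Cello", "Type": "String"},
    {"IID": 30, "Instrument": "Saxophone", "Type": "Woodwind"},
]


def type_triples(rows, key_column, prefix, rdf_class):
    """Mint one resource per primary key and assert its class."""
    return [
        (f"{prefix}:{row[key_column]}", "rdf:type", rdf_class)
        for row in rows
    ]


typed = (
    type_triples(musicians, "MID", "artist", "music:Musician")
    + type_triples(instruments, "IID", "instrument", "music:Instrument")
)
for triple in typed:
    print(*triple)
```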
21. Triples From Tables
Musicians:                        Instruments:
MID  First  Last       Inst_ID    IID  Instrument  Type
1    Eddie  Van Halen  10         10   Guitar      String
2    Yo Yo  Ma         20         20   Cello       String
3    Kenny  G          30         30   Saxophone   Woodwind
Turn each cell into a triple based on the key, property (mapped per column), and value:
artist:1 music:firstName "Eddie"       instrument:10 music:instName "Guitar"
artist:1 music:lastName "Van Halen"    instrument:10 music:instType "String"
artist:2 music:firstName "Yo Yo"       instrument:20 music:instName "Cello"
artist:2 music:lastName "Ma"           instrument:20 music:instType "String"
artist:3 music:firstName "Kenny"       instrument:30 music:instName "Saxophone"
artist:3 music:lastName "G"            instrument:30 music:instType "Woodwind"
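As a rough sketch of the cell-to-triple step, the per-column mapping can be held in a dictionary from column name to property. The names cell_triples and column_to_property are hypothetical, and the rows are a hand-copied subset of the Musicians table.

```python
# Hand-copied rows from the Musicians table on the slide.
musicians = [
    {"MID": 1, "First": "Eddie", "Last": "Van Halen"},
    {"MID": 2, "First": "Yo Yo", "Last": "Ma"},
    {"MID": 3, "First": "Kenny", "Last": "G"},
]


def cell_triples(rows, key_column, prefix, column_to_property):
    """Emit one (subject, property, literal) triple per mapped cell."""
    triples = []
    for row in rows:
        subject = f"{prefix}:{row[key_column]}"
        for column, prop in column_to_property.items():
            triples.append((subject, prop, row[column]))
    return triples


name_triples = cell_triples(
    musicians,
    key_column="MID",
    prefix="artist",
    column_to_property={"First": "music:firstName", "Last": "music:lastName"},
)
for subject, prop, value in name_triples:
    print(subject, prop, f'"{value}"')
```

Columns left out of the mapping (the key itself, or a foreign key handled separately) simply produce no literal triples.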
22. Triples From Tables
Musicians:                        Instruments:
MID  First  Last       Inst_ID    IID  Instrument  Type
1    Eddie  Van Halen  10         10   Guitar      String
2    Yo Yo  Ma         20         20   Cello       String
3    Kenny  G          30         30   Saxophone   Woodwind
Turn each foreign key reference into a relationship between the foreign and primary resources:
artist:1 music:plays instrument:10
artist:2 music:plays instrument:20
artist:3 music:plays instrument:30
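The foreign-key step differs from the cell step only in that the object is another resource rather than a literal. A minimal sketch, assuming the Inst_ID values shown in the Musicians table and using the hypothetical helper name fk_triples:

```python
# Key and foreign-key columns copied from the Musicians table on the slide.
musicians = [
    {"MID": 1, "Inst_ID": 10},
    {"MID": 2, "Inst_ID": 20},
    {"MID": 3, "Inst_ID": 30},
]


def fk_triples(rows, key_column, subject_prefix, fk_column, object_prefix, prop):
    """Link each row's resource to the resource its foreign key points at."""
    return [
        (
            f"{subject_prefix}:{row[key_column]}",
            prop,
            f"{object_prefix}:{row[fk_column]}",
        )
        for row in rows
        if row[fk_column] is not None  # a NULL foreign key yields no triple
    ]


plays = fk_triples(musicians, "MID", "artist", "Inst_ID", "instrument", "music:plays")
for triple in plays:
    print(*triple)
```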
23. R2RML Triple Mapping
music:instName rdfs:domain music:Instrument
music:instType rdfs:domain music:Instrument
Instruments:
IID  Instrument  Type
10   Guitar      String
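The idea of a declarative table-to-triples mapping can be approximated in Python. The sketch below is a loose analogue only and is not R2RML syntax: the dictionary keys (subject_template, subject_class, predicate_object) are hypothetical names loosely echoing R2RML's subject map and predicate-object map concepts, and apply_map is a made-up helper.

```python
# A hypothetical, R2RML-inspired mapping for the Instruments table.
instrument_map = {
    "table": "Instruments",
    "subject_template": "instrument:{IID}",  # how to mint the subject
    "subject_class": "music:Instrument",     # rdf:type of each subject
    "predicate_object": [                    # (property, source column)
        ("music:instName", "Instrument"),
        ("music:instType", "Type"),
    ],
}


def apply_map(mapping, row):
    """Turn one table row into triples according to the mapping."""
    subject = mapping["subject_template"].format(**row)
    triples = [(subject, "rdf:type", mapping["subject_class"])]
    for predicate, column in mapping["predicate_object"]:
        triples.append((subject, predicate, row[column]))
    return triples


row = {"IID": 10, "Instrument": "Guitar", "Type": "String"}
for triple in apply_map(instrument_map, row):
    print(*triple)
```

Keeping the mapping as data rather than code means a new column can be exposed by adding one entry to predicate_object, without touching the function that generates the triples.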
33. Registry
• Semantic data sources are self-describing and use a common protocol
• Easy to build into a registry w/ additional metadata (also described with RDFS/OWL)
34. Benefits of semantic technology stack
1. Common data model
2. Precise description
3. Uniform access
4. Federation
35. 1. Common data model
• RDF provides a common model for both data and descriptions of all kinds
• Very flexible (but also very fine-grained)