Talk given at the SSSW 2013 Semantic Web Summer School.
Part 1: What is "Semantic Web" (in 4 principles and 1 movie)
Part 2: What questions can we ask now that we couldn't ask 10 years ago?
Part 3: Treat Computer Science as a *science*, not just as engineering!
(this part is a short version of http://slidesha.re/SaUhS4)
Semantic Web questions we couldn't ask 10 years ago
1. Frank van Harmelen
All the questions
we couldn’t ask
10 years ago
Creative Commons License:
allowed to share & remix,
but must attribute & non-commercial
2. The bad news:
you’re going to get 3 talks
1. Where are we now?
– The Semantic Web in 4 principles & a movie
– Did we get anywhere?
2. Now what?
– Questions we couldn’t ask 10 years ago
3. Methodological hobby horse
– Science or engineering?
4. a web page
in English
about
Frank
And this
page is
about
LarKC
and another
web page
about
Frank
And this
page is
about
Stefano
This page
is about
the Vrije
Universiteit
“The Semantic Web” a.k.a. “The Web of Data”
10. P4. explicit & formal semantics
• assign types to things
• assign types to relations
• organise types in a hierarchy
• impose constraints on
possible interpretations
11. Examples of “semantics”
Frank Lynda
married-to
• Frank is male
• married-to relates
males to females
• married-to relates
1 male to 1 female
• Lynda = Hazel
lowerbound upperbound
Hazel
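The married-to example can be sketched in a few lines of Python. This is only an illustration, not the talk's own formalisation: the triple layout, the `DOMAIN`, `RANGE` and `FUNCTIONAL` tables and both function names are hypothetical, and the second married-to edge from Frank to Hazel is an assumption read off the slide's conclusion that Lynda = Hazel.

```python
# Toy facts as (subject, predicate, object) triples.
# Assumption: the slide's diagram also links Frank to Hazel via married-to.
facts = {
    ("Frank", "type", "Male"),
    ("Frank", "married-to", "Lynda"),
    ("Frank", "married-to", "Hazel"),
}

# "married-to relates males to females"
DOMAIN = {"married-to": "Male"}
RANGE = {"married-to": "Female"}
# "married-to relates 1 male to 1 female"
FUNCTIONAL = {"married-to"}


def infer_types(triples):
    """Add the type facts that follow from the domain and range of each relation."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "type", DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, "type", RANGE[p]))
    return inferred


def infer_same_as(triples):
    """If a functional relation has two objects for one subject, they must be equal."""
    objects = {}
    for s, p, o in triples:
        if p in FUNCTIONAL:
            objects.setdefault((s, p), set()).add(o)
    same = set()
    for group in objects.values():
        ordered = sorted(group)
        for other in ordered[1:]:
            same.add((ordered[0], other))
    return same


print(sorted(infer_types(facts) - facts))
print(infer_same_as(facts))
```

The first print lists the new type facts (Lynda and Hazel are female), the second the identity forced by the one-to-one constraint.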
13. Did we get anywhere?
• Google = meaningful search
• NXP = data integration
• BBC = content re-use
• Walmart = SEO (RDF-a)
• data.gov = data-publishing
14.
15. NXP: data integration
about 26.000 products
Triple store
Triple store
Departments
Customers
Notice the 3-layer architecture
21. Did we get anywhere?
• Google = meaningful search
• NXP = data integration
• BBC = content re-use
• BestBuy = SEO (RDF-a)
• data.gov = data-publishing
Oracle DB, IBM DB2
Reuters,
New York Times, Guardian
Sears, Kmart, OverStock,
Volkswagen, Renault
GoodRelations ontology,
schema.org
42. Law: |T| << |A|
T = terminological knowledge
A = assertional knowledge
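One way to check this law on a dataset is to count both kinds of triples. The sketch below is an assumption-laden illustration: it treats a triple as terminological when its predicate is one of four RDFS schema predicates, which is a simplification chosen here and not a rule from the talk, and the `ex:` triples are made-up toy data.

```python
# Assumption: these four RDFS predicates mark terminological (schema) triples.
TERMINOLOGICAL_PREDICATES = {
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
}


def split_t_a(triples):
    """Split triples into terminological (T) and assertional (A) parts."""
    t = [tr for tr in triples if tr[1] in TERMINOLOGICAL_PREDICATES]
    a = [tr for tr in triples if tr[1] not in TERMINOLOGICAL_PREDICATES]
    return t, a


# Hypothetical toy graph; real datasets are many orders of magnitude larger.
triples = [
    ("ex:Professor", "rdfs:subClassOf", "ex:Person"),
    ("ex:marriedTo", "rdfs:domain", "ex:Person"),
    ("ex:Frank", "rdf:type", "ex:Professor"),
    ("ex:Lynda", "rdf:type", "ex:Person"),
    ("ex:Stefano", "rdf:type", "ex:Person"),
    ("ex:Frank", "ex:marriedTo", "ex:Lynda"),
    ("ex:Frank", "ex:worksAt", "ex:VU"),
]

t, a = split_t_a(triples)
print(f"|T| = {len(t)}, |A| = {len(a)}")
```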
43.
Dataset            Closure of T   Closure of T + A   Ratio
LUBM               8sec           1h15min            562
Linked Life Data   332sec         1h05min            11
FactForge          89sec          2h45min            111
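The Ratio column can be reproduced from the two timing columns. The snippet below is a small worked check, assuming the ratio is the closure time of T + A divided by the closure time of T and truncated to a whole number; the helper name `to_seconds` is made up for this illustration.

```python
import re


def to_seconds(text):
    """Convert strings such as '8sec' or '1h15min' to a number of seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)min)?(?:(\d+)sec)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Timings copied from the slide: (closure of T, closure of T + A).
timings = {
    "LUBM": ("8sec", "1h15min"),
    "Linked Life Data": ("332sec", "1h05min"),
    "FactForge": ("89sec", "2h45min"),
}

# Integer division truncates, which matches the whole numbers on the slide.
ratios = {
    name: to_seconds(full) // to_seconds(t_only)
    for name, (t_only, full) in timings.items()
}
print(ratios)
```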
We don’t have any good laws on complexity
Editor's Notes
@TODO@: do a slide on data-integration at NXP
@TODO@: find a slide on RDF-a in Walmart etc.
@TODO@: replace company names with logos?
@@Add: trust
@@Add: noisy data (inconsistency, misleading, incomplete)
Suggests letting a thousand ontologies blossom, with lots of connections between lots of datasets.
Some known information laws already apply:
Zipf's law / long-tail distributions are everywhere
= the vast majority of occurrences are caused by a small minority of items
This phenomenon is sometimes a blessing, sometimes a curse:
nice for compression
awful for load balancing
Knowing the law helps us deal with the phenomenon; that's why it's worth trying to discover these laws.
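The long-tail effect is easy to see numerically. The following is a rough sketch under an idealised assumption, namely that the item at rank r occurs with frequency proportional to 1/r; the function name and the choice of 1000 items are made up for illustration and do not come from any dataset in the talk.

```python
def top_share(n_items, top_fraction):
    """Share of all occurrences taken by the top-ranked fraction of items,
    assuming an idealised Zipf distribution (frequency proportional to 1/rank)."""
    weights = [1.0 / rank for rank in range(1, n_items + 1)]
    top_n = max(1, int(n_items * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


# Share of occurrences caused by the 10% most frequent of 1000 items.
share = top_share(1000, 0.10)
print(f"top 10% of items cause {share:.0%} of occurrences")
```

Under this assumption the top tenth of the items accounts for well over half of all occurrences, which is why the skew helps compression and hurts load balancing.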
@add another long-tail example@ (e.g. in-degree?)
Physical distribution doesn’t work
the web is not a database (and never will be)
@@ADD: even worse for long tail
- Compare to physics laws:
gravity: F = G m_1 m_2 / r^2
conservation of energy: dE/dt = 0
increase of entropy: dS/dt \geq 0
We cannot yet hope for such beautifully mathematised laws, in such a concise language that fits in a very compact space.
Computer science is like alchemy, a "protoscience".
This only works because terminologies are in general only simple hierarchies. (It’s easy to build examples where this doesn’t hold, but in practice it turns out to hold.)
So this law depends on the previous law.
As an aside: the graph is now big enough to do statistics on it.
Use “complexity” as a measure, not just “size”.
Spell out LLD; don’t break FactForge.
- Semantic Web = engineering enterprise.
- This talk = what are the scientific observations/facts/theories after 10 years?
What are the big CS (or: KR?) lessons we can learn from a decade of SemWeb? (= regard SemWeb adoption as a giant laboratory for CS laws)
Did we learn any science? (And of course: will the laws be specific to SemWeb? Hopefully not. Hopefully they are generic laws about the structure and behaviour of information!)
A gazillion new open questions.
Don’t just try to build things, also try to understand things.
Don’t just ask how, also ask why.