SlideShare ist ein Scribd-Unternehmen logo
1 von 32
DSNo%fy:	
  Handling	
  Broken	
  Links	
  	
  
     in	
  the	
  Web	
  of	
  Data	
  
         Niko	
  Popitsch,	
  University	
  of	
  Vienna	
  /	
  Austria	
  
                    niko.popitsch@univie.ac.at	
  

               Joint	
  work	
  with	
  Bernhard	
  Haslhofer	
  
                bernhard.haslhofer@univie.ac.at	
  

                          April	
  30,	
  2010	
  	
  
                     WWW	
  2010	
  Conference	
  	
  
                  Raleigh, North Carolina, USA
Outline	
  

  IntroducIon	
  and	
  problem	
  definiIon	
  

  Related	
  work	
  and	
  soluIon	
  strategies	
  

  DSNoIfy	
  
      Usage	
  scenarios	
  and	
  design	
  
      Core	
  algorithm	
  
      EvaluaIon	
  

  Summary	
  &	
  Discussion	
  

  References	
                                          image: www.freeimages.co.uk




                                                                                       2
image by TBL / Hans Rosling


Linked	
  Data	
  Principles	
  (short	
  version):	
  	
  
(1)  	
  use	
  HTTP	
  URIs	
  to	
  idenIfy	
  resources,	
  	
  
(2)  	
  deliver	
  meaningful	
  representa%ons	
  (e.g.,	
  RDF,	
  XHTML)	
  when	
  these	
  are	
  
     dereferenced	
  
(3)  	
  link	
  to	
  other	
  resources	
  




                                                                                                           3
A	
  Linked	
  Data	
  Example	
  




                                     4
$ curl -H "Accept: application/rdf+xml" http://www.bbc.co.uk/music/artists/
084308bd-1654-436f-ba03-df6697104e19
Links within the data source
 [...]
 <mo:member rdf:resource="/music/artists/5d06fe54-485a-4a07-b506-5f6f719448cb#artist" />
 <mo:member rdf:resource="/music/artists/f332a312-e95b-4413-b6cc-1762a5a6a083#artist" />
 <mo:member rdf:resource="/music/artists/0dcee02c-5d2c-4f5c-9d60-d58a4df32d9e#artist" />
 [...]


RDF links between data sources
 [...]
 <owl:sameAs rdf:resource="http://dbpedia.org/resource/Green_Day" />
 <mo:musicbrainz rdf:resource="http://musicbrainz.org/artist/084308bd-1654-436f-ba03-
 df6697104e19.html" />
 [...]



 [...]
 <mo:MusicArtist rdf:about="/music/artists/084308bd-1654-436f-ba03-df6697104e19#artist">
   <rdf:type rdf:resource="http://purl.org/ontology/mo/MusicGroup" />
   <foaf:name>Green Day</foaf:name>
 [...]



                                                                                           5
Problem:	
  links	
  can	
  break	
  




                                        6
Ignore	
  broken	
  links	
  ?	
  Not	
  a	
  good	
  idea	
  !	
  


 Broken	
  links	
  on	
  the	
  Web	
  are	
  annoying	
  for	
  humans	
  
   but	
  alternaIve	
  paths	
  may	
  be	
  used:	
  	
  
       search	
  engines,	
  URL	
  manipulaIon,	
  alternaIve	
  	
  
           informaIon	
  providers,	
  etc.	
  

 Much	
  harder	
  for	
  machines	
  in	
  a	
  Web	
  of	
  Data	
  !	
  
   reduced	
  data	
  accessibility	
  
   data	
  inconsistencies	
  




                                                                               7
Avoid	
  broken	
  links	
  ?	
  Great!	
  	
  	
  
But	
  hard	
  to	
  achieve	
  in	
  the	
  Web	
  environment…	
  

  SoluIon	
  strategies	
  that	
  solve	
  problem	
  only	
  parIally:	
  
      RelaIve	
  references	
  
      embedded	
  links	
  
      redundancy	
  

  SoluIon	
  strategies	
  that	
  are	
  not	
  commonly	
  applicable:	
  
      Versioned/staIc	
  collecIons	
  
      regular	
  (predictable)	
  updates	
  
      dynamic	
  links	
  
      indirecIon	
  services	
  (PURLs,	
  DOIs)	
  


                                                                                8
Solve	
  the	
  problem	
  (1/2)	
  :	
  No%fica%on	
  
No%fica%on	
  strategy:	
  
  Data	
  source	
  “knows“	
  about	
  the	
  events	
  that	
  are	
  taking	
  place	
  
  NoIfies	
  clients	
  
  Client	
  may	
  then	
  check	
  their	
  links	
  and	
  fix	
  the	
  broken	
  ones	
  




Current	
  AcIviIes:	
  
  WOD-­‐LMP	
  [Volz	
  et	
  al.	
  2009]	
  
  Triplify	
  Linked	
  Data	
  Update	
  Log	
  [Auer	
  et	
  al.	
  2009]	
  
  PubSubHubbub	
  /	
  sparqlPuSH	
                                                            1
  h^p://groups.google.com/group/dataset-­‐dynamics	
  
  …	
  

                                                                                                    9
Solve	
  the	
  problem	
  (2/2):	
  Detect	
  and	
  correct	
  
Detect	
  and	
  correct	
  
  If	
  noIficaIon	
  is	
  not	
  applicable	
  
  Clients	
  detect	
  broken	
  links	
  and	
  try	
  to	
  fix	
  them	
  
                                                                                                  2



Current	
  acIviIes:	
  
  Robust	
  hyperlinks	
  [Phelps	
  &	
  Wilensky	
  2000]	
  –	
  Web	
  documents	
  
  PageChaser	
  [Morishima	
  et	
  al.	
  2009]	
  –	
  Web	
  documents	
  
  DSNo%fy	
  –	
  aims	
  at	
  becoming	
  a	
  general	
  framework	
  for	
  fixing	
  broken	
  links	
  
  …	
  


                                                                                                                10
What	
  events	
  cause	
  the	
  problem	
  ?	
  

                       ?




                                                     11
Events	
  that	
  poten%ally	
  lead	
  to	
  broken	
  links	
  

Broken	
  links	
  due	
  to	
  dele%on	
  events	
  




  A	
  dele%on	
  event	
  takes	
  place	
  at	
  Ime	
  t	
  when	
  a	
  resource	
  had	
  (dereferencable)	
  
   representaIons	
  at	
  t-­‐Δ	
  but	
  has	
  none	
  at	
  Ime	
  t	
  

  Vice	
  versa:	
  create	
  event	
  

  Easy	
  to	
  detect	
  


                                                                                                                  12
Events	
  that	
  poten%ally	
  lead	
  to	
  broken	
  links	
  

Broken	
  links	
  due	
  to	
  update	
  events	
  




  An	
  update	
  events	
  takes	
  place	
  at	
  Ime	
  t	
  when	
  a	
  resource	
  had	
  different	
  
   representaIons	
  at	
  t-­‐Δ	
  compared	
  to	
  the	
  ones	
  at	
  Ime	
  t	
  

  Resource	
  updates	
  resulIng	
  in	
  representaIons	
  with	
  different	
  meaning	
  
   (seman%c	
  dri_)	
  may	
  lead	
  to	
  seman%cally	
  broken	
  links	
  

  Hard	
  to	
  detect,	
  open	
  problem	
  	
  

                                                                                                                13
Events	
  that	
  poten%ally	
  lead	
  to	
  broken	
  links	
  

What	
  about	
  move	
  events	
  ?	
  




                                                                    14
Events	
  that	
  poten%ally	
  lead	
  to	
  broken	
  links	
  

What	
  about	
  move	
  events	
  ?	
  

                               a                                                  b



  A	
  move	
  event	
  from	
  a	
  to	
  b	
  takes	
  place	
  at	
  Ime	
  t	
  when	
  
      There	
  were	
  no	
  representaIons	
  of	
  b	
  at	
  Ime	
  t-­‐Δ	
  	
  
      There	
  are	
  no	
  representaIons	
  of	
  a	
  at	
  Ime	
  t	
  
      The	
  representaIons	
  of	
  at-­‐Δ	
  are	
  more	
  similar	
  to	
  the	
  ones	
  of	
  bt	
  than	
  to	
  the	
  
         ones	
  of	
  any	
  other	
  considered	
  resource	
  at	
  Ime	
  t	
  
      The	
  calculated	
  similarity	
  between	
  them	
  is	
  >	
  than	
  a	
  threshold	
  

         Instance	
  matching	
  problem!	
  
                                                                                                                              15
Events	
  that	
  poten%ally	
  lead	
  to	
  broken	
  links	
  

The	
  core	
  algorithm	
  of	
  DSNoIfy	
  detects	
  move	
  events	
  based	
  on	
  
  resource	
  similari%es	
  




                                                                                            16
Changes	
  in	
  DBpedia	
  

 Class           Snapshot 3.2 Snapshot 3.3 Moved Removed Created

 Person
                    213,016    244,621     2,841    20,561    49,325

 Place              247,508    318,017     2,209     2,430    70,730
 Organisation        76,343    105,827     2,020     1,242    28,706

 Work               189,725    213,231     4,097     6,558    25,967

  Resources that were moved/removed/created between the DBpedia
  snapshots 3.2 (October 2008) and 3.3 (May 2009)
PART	
  3	
  :	
  DSNo%fy	
  




                                18
Usage	
  Scenario	
  




                        19
Usage	
  Scenario	
  




  ApplicaIon	
  that	
  consumes	
  various	
  LD	
  sources	
  and	
  may	
  	
  
   update	
  a	
  “source	
  dataset”	
  
                                                                                      20
Usage	
  Scenario	
  




  DSnoIfy	
  is	
  an	
  add-­‐on	
  for	
  applicaIons	
  that	
  want	
  to	
  preserve	
  high	
  link	
  
   integrity	
  in	
  their	
  data	
  
                                                                                                                 21
Usage	
  Scenario	
  




  Other	
  actors	
  (applicaIons)	
  might	
  also	
  be	
  interested	
  in	
  these	
  events	
  	
  
                                                                                                            22
General	
  Approach	
  
  Periodically	
  access	
  linked	
  data	
  sources	
  

  Extract	
  features	
  from	
  resource	
  representa%ons	
  

  Combine	
  them	
  to	
  comparable	
  feature	
  vectors	
  (FV)	
  

  Store	
  them	
  in	
  3	
  indices	
  
      1st	
  index	
  represents	
  the	
  current	
  state	
  of	
  the	
  monitored	
  data	
  
      2nd	
  index	
  stores	
  items	
  that	
  became	
  recently	
  unavailable	
  
      3rd	
  index	
  stores	
  archived	
  feature	
  vectors	
  

  Periodically	
  access	
  index	
  1+2	
  and	
  log	
  detect	
  events	
  

  Periodically	
  update	
  indices	
  1-­‐3	
  
                                                                                                     23
From	
  Resource	
  to	
  Feature	
  Vector	
  

  Both,	
  data	
  type	
  and	
  object	
  	
  
   proper%es	
  supported	
  

  Feature	
  influence	
  is	
  	
  
   weighted	
  

  Some	
  are	
  used	
  in	
  
   plausibility	
  checks	
  
        RDFHash	
  over	
  all	
  
         features	
  




                                                    24
Move	
  Event	
  Detec%on	
  
  Pair	
  wise	
  comparison	
  using	
  a	
  vector	
  space	
  model	
  	
  
      Feature	
  comparison	
  e.g.,	
  using	
  Levenshtein	
  similarity.	
  

  It	
  is	
  sufficient	
  to	
  compare	
  recently	
  added	
  and	
  recently	
  removed	
  feature	
  
   vectors	
  !	
  

  Two	
  thresholds	
  for	
  comparing	
  the	
  similarity	
  between	
  FVs	
  represenIng	
  
   created	
  and	
  removed	
  items:	
  
      lower	
  threshold:	
  select	
  predecessor	
  candidates	
  
              consider	
  URI	
  of	
  added	
  FV	
  as	
  possible	
  new	
  URI	
  of	
  resource	
  represented	
  by	
  
               removed	
  FV	
  
        upper	
  threshold:	
  decidable	
  by	
  DSNoIfy?	
  
              decide	
  whether	
  such	
  a	
  candidate	
  can	
  be	
  automaIcally	
  selected	
  or	
  whether	
  
               human	
  user	
  has	
  to	
  be	
  asked	
  for	
  assistance.	
  

                                                                                                                                 25
Core	
  Housekeeping	
  Algorithm	
  




Ci,Ri and Mi,j denote create, remove and move events of items i and j. mx   26

and hx denote monitoring and housekeeping operations respectively.
Resul%ng	
  Data	
  Structures	
  
  DSNoIfy	
  constructs	
  three	
  data	
  structures:	
  

       An	
  event	
  log	
  containing	
  all	
  events	
  detected	
  by	
  the	
  system	
  

       A	
  log	
  containing	
  all	
  “event	
  choices”	
  DSNoIfy	
  cannot	
  decide	
  on	
  and	
  	
  

       a	
  linked	
  structure	
  of	
  feature	
  vectors	
  consItuIng	
  a	
  history	
  of	
  the	
  
        respecIve	
  items.	
  

  Accessible	
  via	
  




                                                                                                                       image: www.freeimages.co.uk
      Linked	
  data	
  interface	
  
      Java	
  interface	
  
      XML-­‐RPC	
  

                                                                                                                  27
Evalua%on	
  

  Core	
  quesIons:	
  
      Does	
  DSNoIfy	
  work	
  with	
  real	
  data	
  ?	
  
      How	
  does	
  housekeeping	
  frequency	
  affect	
  its	
  effec%veness	
  ?	
  

  Used	
  data:	
  
      Data	
  from	
  DBpedia	
  (8380	
  events)	
  and	
  IIMB	
  (10	
  x	
  222	
  events)	
  were	
  used	
  
      Hand-­‐picked	
  features	
  based	
  on	
  coverage	
  and	
  entropy	
  in	
  the	
  data	
  sets	
  	
  

  Results:	
  
      Housekeeping	
  frequency	
  and	
  data	
  source	
  dynamics	
  determine	
  the	
  
       number	
  of	
  FV-­‐pairs	
  that	
  have	
  to	
  be	
  compared	
  (scalability)	
  
      Number	
  of	
  FV	
  comparisons	
  as	
  well	
  as	
  coverage	
  and	
  entropy	
  of	
  indexed	
  
       features	
  influence	
  accuracy	
  of	
  method	
  
                                                                                                                  28
Evalua%on	
  -­‐	
  Results	
  




  Influence	
  of	
  data	
  source	
  agility	
  and	
  housekeeping	
  frequency	
  on	
  the	
  accuracy	
  
   of	
  the	
  DSNoIfy	
  algorithm	
                                                                    29
Discussion	
  

  Broken	
  links	
  are	
  a	
  considerable	
  problem	
  in	
  a	
  Web	
  of	
  Data	
  

  The	
  broken	
  link	
  problem	
  is	
  partly	
  a	
  special	
  case	
  of	
  the	
  instance	
  matching	
  
   problem	
  

  DSNo%fy	
  is	
  an	
  event-­‐based	
  approach	
  to	
  this	
  problem:	
  
      DSNoIfy	
  can	
  be	
  used	
  as	
  an	
  add-­‐on	
  for	
  data	
  sources	
  that	
  want	
  to	
  preserve	
  
       link	
  integrity	
  in	
  their	
  data	
  

  We	
  cannot	
  “cure”	
  the	
  Web	
  of	
  Data	
  from	
  broken	
  links	
  (but	
  at	
  least	
  alleviate	
  
   the	
  pain	
  a	
  bit	
  :)	
  



                                                                                                                       30
Current	
  and	
  Future	
  Work	
  

  Scalability	
  issues,	
  evaluaIon	
  
  AutomaIc	
  feature	
  selecIon	
  (parameter	
  esImaIon)	
  
  Event	
  algebra:	
  high-­‐level	
  composite	
  events	
  
  EvaluaIon	
  with	
  other	
  data	
  sources	
  (e.g.,	
  file	
  system)	
  
  Dataset	
  dynamics:	
  	
  
   vocabularies,	
  protocols,	
  formats	
  
  …	
  



      Thank	
  You	
  !	
  
      niko.popitsch@univie.ac.at	
  




                                                                                        images: NASA / NSSDC
      h^p://www.dsnoIfy.org	
  
                                                                                   31
References	
  and	
  Related	
  Work	
  
    H.	
  Ashman.	
  Electronic	
  document	
  addressing:	
  dealing	
  with	
  change.	
  ACM	
  Comput.	
  Surv.,	
  32(3),	
  2000.	
  
    F.	
  Kappe.	
  A	
  scalable	
  architecture	
  for	
  maintaining	
  referenIal	
  integrity	
  in	
  distributed	
  informaIon	
  
     systems.	
  Journal	
  of	
  Universal	
  Computer	
  Science,	
  1(2):84–104,	
  1995.	
  
    A.	
  Morishima,	
  A.	
  Nakamizo,	
  T.	
  Iida,	
  S.	
  Sugimoto,	
  and	
  H.	
  Kitagawa.	
  Bringing	
  your	
  dead	
  links	
  back	
  to	
  life:	
  a	
  
     comprehensive	
  approach	
  and	
  lessons	
  learned.	
  In	
  HT	
  ’09:	
  Proceedings	
  of	
  the	
  20th	
  ACM	
  conference	
  on	
  
     Hypertext	
  and	
  hypermedia,	
  pages	
  15–24,	
  2009.	
  	
  
    T.	
  A.	
  Phelps	
  and	
  R.	
  Wilensky.	
  Robust	
  hyperlinks	
  cost	
  just	
  five	
  words	
  each.	
  Technical	
  Report	
  UCB/
     CSD-­‐00-­‐1091,	
  EECS	
  Department,	
  University	
  of	
  California,	
  Berkeley,	
  2000	
  
    J.	
  Volz,	
  C.	
  Bizer,	
  M.	
  Gaedke,	
  and	
  G.	
  Kobilarov.	
  Discovering	
  and	
  maintaining	
  links	
  on	
  the	
  web	
  of	
  data.	
  In	
  
     8th	
  InternaGonal	
  SemanGc	
  Web	
  Conference,	
  2009.	
  
    A.	
  Hogan,	
  A.	
  Harth,	
  and	
  S.	
  Decker.	
  Performing	
  object	
  consolidaIon	
  on	
  the	
  semanIc	
  web	
  data	
  graph.	
  In	
  
     Proceedings	
  of	
  the	
  1st	
  I3:	
  IdenGty,	
  IdenGfiers,	
  IdenGficaGon	
  Workshop,	
  2007	
  
    A.	
  Ferrara,	
  D.	
  Lorusso,	
  S.	
  Montanelli,	
  and	
  G.	
  Varese.	
  Towards	
  a	
  benchmark	
  for	
  instance	
  matching.	
  In	
  
     Ontology	
  Matching	
  (OM	
  2008),	
  volume	
  431	
  of	
  CEUR	
  Workshop	
  Proceedings.	
  CEUR-­‐WS.org,	
  2008	
  
    C.	
  Bizer,	
  T.	
  Heath,	
  and	
  T.	
  Berners-­‐Lee.	
  Linked	
  data	
  -­‐	
  the	
  story	
  so	
  far.	
  InternaGonal	
  Journal	
  on	
  SemanGc	
  
     Web	
  and	
  InformaGon	
  Systems	
  (IJSWIS),	
  5(3),	
  2009	
  
    S.	
  Auer,	
  S.	
  Dietzold,	
  J.	
  Lehmann,	
  S.	
  Hellmann,	
  and	
  D.	
  Aumüller.	
  Triplify:	
  light-­‐weight	
  linked	
  data	
  
     publicaIon	
  from	
  relaIonal	
  databases.	
  In	
  WWW	
  ’09,	
  New	
  York,	
  NY,	
  USA,	
  2009.	
  ACM	
  
    W.	
  Y.	
  Arms.	
  Uniform	
  resource	
  names:	
  handles,	
  purls,	
  and	
  digital	
  object	
  idenIfiers.	
  Commun.	
  ACM,	
  44
     (5):68,	
  2001.	
  


                                                                                                                                                                    32

Weitere ähnliche Inhalte

Was ist angesagt?

Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataMatthew Rowe
 
Semantic Search keynote at CORIA 2015
Semantic Search keynote at CORIA 2015Semantic Search keynote at CORIA 2015
Semantic Search keynote at CORIA 2015Peter Mika
 
Linked Data: opportunities and challenges
Linked Data: opportunities and challengesLinked Data: opportunities and challenges
Linked Data: opportunities and challengesMichael Hausenblas
 
CSTalks - Named Data Networks - 9 Feb
CSTalks - Named Data Networks - 9 FebCSTalks - Named Data Networks - 9 Feb
CSTalks - Named Data Networks - 9 Febcstalks
 
Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...John Breslin
 
Methods for Intrinsic Evaluation of Links in the Web of Data
Methods for Intrinsic Evaluation of Links in the Web of DataMethods for Intrinsic Evaluation of Links in the Web of Data
Methods for Intrinsic Evaluation of Links in the Web of DataCristina Sarasua
 
Big Data on the Web – What We Will Do
Big Data on the Web – What We Will Do Big Data on the Web – What We Will Do
Big Data on the Web – What We Will Do Haklae Kim
 
2009 December NodeXL Overview
2009 December NodeXL Overview2009 December NodeXL Overview
2009 December NodeXL OverviewMarc Smith
 
Making the Web Searchable - Keynote ICWE 2015
Making the Web Searchable - Keynote ICWE 2015Making the Web Searchable - Keynote ICWE 2015
Making the Web Searchable - Keynote ICWE 2015Peter Mika
 
Tags, Networks, Narrative
Tags, Networks, NarrativeTags, Networks, Narrative
Tags, Networks, NarrativeBruce Mason
 
Open Government Data, Linked Data, and the Missing Blocks in Korea
Open Government Data, Linked Data, and the Missing Blocks in Korea Open Government Data, Linked Data, and the Missing Blocks in Korea
Open Government Data, Linked Data, and the Missing Blocks in Korea Haklae Kim
 
Social Semantic Web (Social Activity and Facebook)
Social Semantic Web (Social Activity and Facebook)Social Semantic Web (Social Activity and Facebook)
Social Semantic Web (Social Activity and Facebook)Myungjin Lee
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesASIS&T
 
Freddy Limpens: From folksonomies to ontologies: a socio-technical solution.
Freddy Limpens: From folksonomies to ontologies: a socio-technical solution.Freddy Limpens: From folksonomies to ontologies: a socio-technical solution.
Freddy Limpens: From folksonomies to ontologies: a socio-technical solution.PhiloWeb
 

Was ist angesagt? (16)

20111101 b hyland-w3-c-tpac-egov
20111101 b hyland-w3-c-tpac-egov20111101 b hyland-w3-c-tpac-egov
20111101 b hyland-w3-c-tpac-egov
 
Open Data: a brief introduction
Open Data:  a brief introductionOpen Data:  a brief introduction
Open Data: a brief introduction
 
Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked Data
 
Semantic Search keynote at CORIA 2015
Semantic Search keynote at CORIA 2015Semantic Search keynote at CORIA 2015
Semantic Search keynote at CORIA 2015
 
Linked Data: opportunities and challenges
Linked Data: opportunities and challengesLinked Data: opportunities and challenges
Linked Data: opportunities and challenges
 
CSTalks - Named Data Networks - 9 Feb
CSTalks - Named Data Networks - 9 FebCSTalks - Named Data Networks - 9 Feb
CSTalks - Named Data Networks - 9 Feb
 
Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...
 
Methods for Intrinsic Evaluation of Links in the Web of Data
Methods for Intrinsic Evaluation of Links in the Web of DataMethods for Intrinsic Evaluation of Links in the Web of Data
Methods for Intrinsic Evaluation of Links in the Web of Data
 
Big Data on the Web – What We Will Do
Big Data on the Web – What We Will Do Big Data on the Web – What We Will Do
Big Data on the Web – What We Will Do
 
2009 December NodeXL Overview
2009 December NodeXL Overview2009 December NodeXL Overview
2009 December NodeXL Overview
 
Making the Web Searchable - Keynote ICWE 2015
Making the Web Searchable - Keynote ICWE 2015Making the Web Searchable - Keynote ICWE 2015
Making the Web Searchable - Keynote ICWE 2015
 
Tags, Networks, Narrative
Tags, Networks, NarrativeTags, Networks, Narrative
Tags, Networks, Narrative
 
Open Government Data, Linked Data, and the Missing Blocks in Korea
Open Government Data, Linked Data, and the Missing Blocks in Korea Open Government Data, Linked Data, and the Missing Blocks in Korea
Open Government Data, Linked Data, and the Missing Blocks in Korea
 
Social Semantic Web (Social Activity and Facebook)
Social Semantic Web (Social Activity and Facebook)Social Semantic Web (Social Activity and Facebook)
Social Semantic Web (Social Activity and Facebook)
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data services
 
Freddy Limpens: From folksonomies to ontologies: a socio-technical solution.
Freddy Limpens: From folksonomies to ontologies: a socio-technical solution.Freddy Limpens: From folksonomies to ontologies: a socio-technical solution.
Freddy Limpens: From folksonomies to ontologies: a socio-technical solution.
 

Andere mochten auch

Análisis crítico literario del poema de David Auris Villegas
Análisis crítico literario del poema de David Auris VillegasAnálisis crítico literario del poema de David Auris Villegas
Análisis crítico literario del poema de David Auris VillegasMiguel Ramos
 
Apontamentos Acerca do Parcelamento Tributário
Apontamentos Acerca do Parcelamento TributárioApontamentos Acerca do Parcelamento Tributário
Apontamentos Acerca do Parcelamento TributárioFabiano Desidério
 
cropped_Page 1 - Kochi - Malayalai Rich List June 4 2015
cropped_Page 1 - Kochi - Malayalai Rich List June 4 2015cropped_Page 1 - Kochi - Malayalai Rich List June 4 2015
cropped_Page 1 - Kochi - Malayalai Rich List June 4 2015Shenoy Karun
 
A SÚMULA IMPEDITIVA DE RECURSOS E A GARANTIA DE ACESSO A JUSTIÇA
A SÚMULA IMPEDITIVA DE RECURSOS E A GARANTIA DE ACESSO A JUSTIÇAA SÚMULA IMPEDITIVA DE RECURSOS E A GARANTIA DE ACESSO A JUSTIÇA
A SÚMULA IMPEDITIVA DE RECURSOS E A GARANTIA DE ACESSO A JUSTIÇAFabiano Desidério
 
Rancangan dan pengemb surv
Rancangan dan pengemb survRancangan dan pengemb surv
Rancangan dan pengemb survraysa hasdi
 
Akron puede dar en el estacionamiento de la coleccion de cuadros de una patad...
Akron puede dar en el estacionamiento de la coleccion de cuadros de una patad...Akron puede dar en el estacionamiento de la coleccion de cuadros de una patad...
Akron puede dar en el estacionamiento de la coleccion de cuadros de una patad...friendlycoward279
 
Technical management
Technical managementTechnical management
Technical managementZoltan Kun
 
Lessons learnt from Alberta - May 2012
Lessons learnt from Alberta - May 2012Lessons learnt from Alberta - May 2012
Lessons learnt from Alberta - May 2012Global CCS Institute
 
поиск, отбор и оценка Hr директора 13.05.2015
поиск, отбор и оценка Hr директора 13.05.2015поиск, отбор и оценка Hr директора 13.05.2015
поиск, отбор и оценка Hr директора 13.05.2015Elina Polukhina
 
Psychology
PsychologyPsychology
Psychologyucobi
 
Why online advertising is not a dirty word - Echelon 2014
Why online advertising is not a dirty word - Echelon 2014Why online advertising is not a dirty word - Echelon 2014
Why online advertising is not a dirty word - Echelon 2014e27
 

Andere mochten auch (20)

Tenoxicam 59804-37-4-api
Tenoxicam 59804-37-4-apiTenoxicam 59804-37-4-api
Tenoxicam 59804-37-4-api
 
Presentation1 Telcomnicacion
Presentation1 TelcomnicacionPresentation1 Telcomnicacion
Presentation1 Telcomnicacion
 
Análisis crítico literario del poema de David Auris Villegas
Análisis crítico literario del poema de David Auris VillegasAnálisis crítico literario del poema de David Auris Villegas
Análisis crítico literario del poema de David Auris Villegas
 
Austral consulting sl 2015
Austral consulting sl 2015Austral consulting sl 2015
Austral consulting sl 2015
 
Apontamentos Acerca do Parcelamento Tributário
Apontamentos Acerca do Parcelamento TributárioApontamentos Acerca do Parcelamento Tributário
Apontamentos Acerca do Parcelamento Tributário
 
cropped_Page 1 - Kochi - Malayalai Rich List June 4 2015
cropped_Page 1 - Kochi - Malayalai Rich List June 4 2015cropped_Page 1 - Kochi - Malayalai Rich List June 4 2015
cropped_Page 1 - Kochi - Malayalai Rich List June 4 2015
 
A SÚMULA IMPEDITIVA DE RECURSOS E A GARANTIA DE ACESSO A JUSTIÇA
A SÚMULA IMPEDITIVA DE RECURSOS E A GARANTIA DE ACESSO A JUSTIÇAA SÚMULA IMPEDITIVA DE RECURSOS E A GARANTIA DE ACESSO A JUSTIÇA
A SÚMULA IMPEDITIVA DE RECURSOS E A GARANTIA DE ACESSO A JUSTIÇA
 
EnIT goes layar
EnIT goes layarEnIT goes layar
EnIT goes layar
 
Mother
MotherMother
Mother
 
Rancangan dan pengemb surv
Rancangan dan pengemb survRancangan dan pengemb surv
Rancangan dan pengemb surv
 
Akron puede dar en el estacionamiento de la coleccion de cuadros de una patad...
Akron puede dar en el estacionamiento de la coleccion de cuadros de una patad...Akron puede dar en el estacionamiento de la coleccion de cuadros de una patad...
Akron puede dar en el estacionamiento de la coleccion de cuadros de una patad...
 
Technical management
Technical managementTechnical management
Technical management
 
Lessons learnt from Alberta - May 2012
Lessons learnt from Alberta - May 2012Lessons learnt from Alberta - May 2012
Lessons learnt from Alberta - May 2012
 
Creative commons licenses
Creative commons licensesCreative commons licenses
Creative commons licenses
 
20161118141037 4
20161118141037 420161118141037 4
20161118141037 4
 
поиск, отбор и оценка Hr директора 13.05.2015
поиск, отбор и оценка Hr директора 13.05.2015поиск, отбор и оценка Hr директора 13.05.2015
поиск, отбор и оценка Hr директора 13.05.2015
 
Internship Summary Pper
Internship Summary PperInternship Summary Pper
Internship Summary Pper
 
Biopori
BioporiBiopori
Biopori
 
Psychology
PsychologyPsychology
Psychology
 
Why online advertising is not a dirty word - Echelon 2014
Why online advertising is not a dirty word - Echelon 2014Why online advertising is not a dirty word - Echelon 2014
Why online advertising is not a dirty word - Echelon 2014
 

Ähnlich wie dsnotify presentation at www2010

Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open DataIvan Herman
 
Linked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for EntrepreneursLinked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for Entrepreneurs3 Round Stones
 
Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Menzo Windhouwer
 
Linked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, MuseumsLinked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, Museumsljsmart
 
Semantic Search: We're Living in a Golden Age for Information
Semantic Search: We're Living in a Golden Age for InformationSemantic Search: We're Living in a Golden Age for Information
Semantic Search: We're Living in a Golden Age for Information3 Round Stones
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityunivTope Omitola
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentHostedbyConfluent
 
Why should you trust my data code4lib 2016
Why should you trust my data code4lib 2016Why should you trust my data code4lib 2016
Why should you trust my data code4lib 2016flyingzumwalt
 
Linked Building (Energy) Data
Linked Building (Energy) DataLinked Building (Energy) Data
Linked Building (Energy) DataEdward Curry
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareIMC Technologies
 
Soeren okfn greece meetup
Soeren okfn greece meetupSoeren okfn greece meetup
Soeren okfn greece meetupOKFN-GR
 
Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database  Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database dannyijwest
 
NoSQL Object DB & NewSQL Columnar DB, A Tale of Two Databases
NoSQL Object DB & NewSQL Columnar DB, A Tale of Two DatabasesNoSQL Object DB & NewSQL Columnar DB, A Tale of Two Databases
NoSQL Object DB & NewSQL Columnar DB, A Tale of Two Databases✔ Eric David Benari, PMP
 
Jarrar: Introduction to Linked Data
Jarrar: Introduction to Linked DataJarrar: Introduction to Linked Data
Jarrar: Introduction to Linked DataMustafa Jarrar
 
Environmental Linked Data - Semtech Biz London
Environmental Linked Data - Semtech Biz LondonEnvironmental Linked Data - Semtech Biz London
Environmental Linked Data - Semtech Biz LondonAlex Coley
 

Ähnlich wie dsnotify presentation at www2010 (20)

090626cc tech-summit
090626cc tech-summit090626cc tech-summit
090626cc tech-summit
 
Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open Data
 
Linked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for EntrepreneursLinked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for Entrepreneurs
 
Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.Sustainable operability: Keeping complex linguistic resources alive.
Sustainable operability: Keeping complex linguistic resources alive.
 
Linked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, MuseumsLinked data for Libraries, Archives, Museums
Linked data for Libraries, Archives, Museums
 
Semantic Search: We're Living in a Golden Age for Information
Semantic Search: We're Living in a Golden Age for InformationSemantic Search: We're Living in a Golden Age for Information
Semantic Search: We're Living in a Golden Age for Information
 
ECCS 2010
ECCS 2010ECCS 2010
ECCS 2010
 
Omitola birmingham cityuniv
Omitola birmingham cityunivOmitola birmingham cityuniv
Omitola birmingham cityuniv
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, Confluent
 
Why should you trust my data code4lib 2016
Why should you trust my data code4lib 2016Why should you trust my data code4lib 2016
Why should you trust my data code4lib 2016
 
LOD2: State of Play WP1: Requirements, Design & LOD2 Stack Prototype
LOD2: State of Play WP1: Requirements, Design & LOD2 Stack PrototypeLOD2: State of Play WP1: Requirements, Design & LOD2 Stack Prototype
LOD2: State of Play WP1: Requirements, Design & LOD2 Stack Prototype
 
Een oceaan van data
Een oceaan van dataEen oceaan van data
Een oceaan van data
 
Linked Building (Energy) Data
Linked Building (Energy) DataLinked Building (Energy) Data
Linked Building (Energy) Data
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the Software
 
Soeren okfn greece meetup
Soeren okfn greece meetupSoeren okfn greece meetup
Soeren okfn greece meetup
 
Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database  Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
 
NoSQL Object DB & NewSQL Columnar DB, A Tale of Two Databases
NoSQL Object DB & NewSQL Columnar DB, A Tale of Two DatabasesNoSQL Object DB & NewSQL Columnar DB, A Tale of Two Databases
NoSQL Object DB & NewSQL Columnar DB, A Tale of Two Databases
 
Jarrar: Introduction to Linked Data
Jarrar: Introduction to Linked DataJarrar: Introduction to Linked Data
Jarrar: Introduction to Linked Data
 
Environmental Linked Data - Semtech Biz London
Environmental Linked Data - Semtech Biz LondonEnvironmental Linked Data - Semtech Biz London
Environmental Linked Data - Semtech Biz London
 

dsnotify presentation at www2010

  • 1. DSNo%fy:  Handling  Broken  Links     in  the  Web  of  Data   Niko  Popitsch,  University  of  Vienna  /  Austria   niko.popitsch@univie.ac.at   Joint  work  with  Bernhard  Haslhofer   bernhard.haslhofer@univie.ac.at   April  30,  2010     WWW  2010  Conference     Raleigh, North Carolina, USA
  • 2. Outline     IntroducIon  and  problem  definiIon     Related  work  and  soluIon  strategies     DSNoIfy     Usage  scenarios  and  design     Core  algorithm     EvaluaIon     Summary  &  Discussion     References   image: www.freeimages.co.uk 2
  • 3. image by TBL / Hans Rosling Linked  Data  Principles  (short  version):     (1)   use  HTTP  URIs  to  idenIfy  resources,     (2)   deliver  meaningful  representa%ons  (e.g.,  RDF,  XHTML)  when  these  are   dereferenced   (3)   link  to  other  resources   3
  • 4. A  Linked  Data  Example   4
  • 5. $ curl -H "Accept: application/rdf+xml" http://www.bbc.co.uk/music/artists/ 084308bd-1654-436f-ba03-df6697104e19 Links within the data source [...] <mo:member rdf:resource="/music/artists/5d06fe54-485a-4a07-b506-5f6f719448cb#artist" /> <mo:member rdf:resource="/music/artists/f332a312-e95b-4413-b6cc-1762a5a6a083#artist" /> <mo:member rdf:resource="/music/artists/0dcee02c-5d2c-4f5c-9d60-d58a4df32d9e#artist" /> [...] RDF links between data sources [...] <owl:sameAs rdf:resource="http://dbpedia.org/resource/Green_Day" /> <mo:musicbrainz rdf:resource="http://musicbrainz.org/artist/084308bd-1654-436f-ba03- df6697104e19.html" /> [...] [...] <mo:MusicArtist rdf:about="/music/artists/084308bd-1654-436f-ba03-df6697104e19#artist"> <rdf:type rdf:resource="http://purl.org/ontology/mo/MusicGroup" /> <foaf:name>Green Day</foaf:name> [...] 5
  • 6. Problem:  links  can  break   6
  • 7. Ignore  broken  links  ?  Not  a  good  idea  !   Broken  links  on  the  Web  are  annoying  for  humans     but  alternaIve  paths  may  be  used:       search  engines,  URL  manipulaIon,  alternaIve     informaIon  providers,  etc.   Much  harder  for  machines  in  a  Web  of  Data  !     reduced  data  accessibility     data  inconsistencies   7
  • 8. Avoid  broken  links  ?  Great!       But  hard  to  achieve  in  the  Web  environment…     SoluIon  strategies  that  solve  problem  only  parIally:     RelaIve  references     embedded  links     redundancy     SoluIon  strategies  that  are  not  commonly  applicable:     Versioned/staIc  collecIons     regular  (predictable)  updates     dynamic  links     indirecIon  services  (PURLs,  DOIs)   8
  • 9. Solve  the  problem  (1/2)  :  No%fica%on   No%fica%on  strategy:     Data  source  “knows“  about  the  events  that  are  taking  place     NoIfies  clients     Client  may  then  check  their  links  and  fix  the  broken  ones   Current  AcIviIes:     WOD-­‐LMP  [Volz  et  al.  2009]     Triplify  Linked  Data  Update  Log  [Auer  et  al.  2009]     PubSubHubbub  /  sparqlPuSH   1   h^p://groups.google.com/group/dataset-­‐dynamics     …   9
  • 10. Solve  the  problem  (2/2):  Detect  and  correct   Detect  and  correct     If  noIficaIon  is  not  applicable     Clients  detect  broken  links  and  try  to  fix  them   2 Current  acIviIes:     Robust  hyperlinks  [Phelps  &  Wilensky  2000]  –  Web  documents     PageChaser  [Morishima  et  al.  2009]  –  Web  documents     DSNo%fy  –  aims  at  becoming  a  general  framework  for  fixing  broken  links     …   10
  • 11. What  events  cause  the  problem  ?   ? 11
  • 12. Events  that  poten%ally  lead  to  broken  links   Broken  links  due  to  dele%on  events     A  dele%on  event  takes  place  at  Ime  t  when  a  resource  had  (dereferencable)   representaIons  at  t-­‐Δ  but  has  none  at  Ime  t     Vice  versa:  create  event     Easy  to  detect   12
  • 13. Events  that  poten%ally  lead  to  broken  links   Broken  links  due  to  update  events     An  update  events  takes  place  at  Ime  t  when  a  resource  had  different   representaIons  at  t-­‐Δ  compared  to  the  ones  at  Ime  t     Resource  updates  resulIng  in  representaIons  with  different  meaning   (seman%c  dri_)  may  lead  to  seman%cally  broken  links     Hard  to  detect,  open  problem     13
  • 14. Events  that  poten%ally  lead  to  broken  links   What  about  move  events  ?   14
  • 15. Events  that  poten%ally  lead  to  broken  links   What  about  move  events  ?   a b   A  move  event  from  a  to  b  takes  place  at  Ime  t  when     There  were  no  representaIons  of  b  at  Ime  t-­‐Δ       There  are  no  representaIons  of  a  at  Ime  t     The  representaIons  of  at-­‐Δ  are  more  similar  to  the  ones  of  bt  than  to  the   ones  of  any  other  considered  resource  at  Ime  t     The  calculated  similarity  between  them  is  >  than  a  threshold     Instance  matching  problem!   15
  • 16. Events  that  poten%ally  lead  to  broken  links   The  core  algorithm  of  DSNoIfy  detects  move  events  based  on   resource  similari%es   16
  • 17. Changes  in  DBpedia   Class Snapshot 3.2 Snapshot 3.3 Moved Removed Created Person 213,016 244,621 2,841 20,561 49,325 Place 247,508 318,017 2,209 2,430 70,730 Organisation 76,343 105,827 2,020 1,242 28,706 Work 189,725 213,231 4,097 6,558 25,967 Resources that were moved/removed/created between the DBpedia snapshots 3.2 (October 2008) and 3.3 (May 2009)
  • 18. PART  3  :  DSNo%fy   18
  • 20. Usage  Scenario     ApplicaIon  that  consumes  various  LD  sources  and  may     update  a  “source  dataset”   20
  • 21. Usage  Scenario     DSnoIfy  is  an  add-­‐on  for  applicaIons  that  want  to  preserve  high  link   integrity  in  their  data   21
  • 22. Usage  Scenario     Other  actors  (applicaIons)  might  also  be  interested  in  these  events     22
  • 23. General  Approach     Periodically  access  linked  data  sources     Extract  features  from  resource  representa%ons     Combine  them  to  comparable  feature  vectors  (FV)     Store  them  in  3  indices     1st  index  represents  the  current  state  of  the  monitored  data     2nd  index  stores  items  that  became  recently  unavailable     3rd  index  stores  archived  feature  vectors     Periodically  access  index  1+2  and  log  detect  events     Periodically  update  indices  1-­‐3   23
  • 24. From  Resource  to  Feature  Vector     Both,  data  type  and  object     proper%es  supported     Feature  influence  is     weighted     Some  are  used  in   plausibility  checks     RDFHash  over  all   features   24
  • 25. Move  Event  Detec%on     Pair  wise  comparison  using  a  vector  space  model       Feature  comparison  e.g.,  using  Levenshtein  similarity.     It  is  sufficient  to  compare  recently  added  and  recently  removed  feature   vectors  !     Two  thresholds  for  comparing  the  similarity  between  FVs  represenIng   created  and  removed  items:     lower  threshold:  select  predecessor  candidates     consider  URI  of  added  FV  as  possible  new  URI  of  resource  represented  by   removed  FV     upper  threshold:  decidable  by  DSNoIfy?     decide  whether  such  a  candidate  can  be  automaIcally  selected  or  whether   human  user  has  to  be  asked  for  assistance.   25
  • 26. Core  Housekeeping  Algorithm   Ci,Ri and Mi,j denote create, remove and move events of items i and j. mx 26 and hx denote monitoring and housekeeping operations respectively.
  • 27. Resul%ng  Data  Structures     DSNoIfy  constructs  three  data  structures:     An  event  log  containing  all  events  detected  by  the  system     A  log  containing  all  “event  choices”  DSNoIfy  cannot  decide  on  and       a  linked  structure  of  feature  vectors  consItuIng  a  history  of  the   respecIve  items.     Accessible  via   image: www.freeimages.co.uk   Linked  data  interface     Java  interface     XML-­‐RPC   27
  • 28. Evalua%on     Core  quesIons:     Does  DSNoIfy  work  with  real  data  ?     How  does  housekeeping  frequency  affect  its  effec%veness  ?     Used  data:     Data  from  DBpedia  (8380  events)  and  IIMB  (10  x  222  events)  were  used     Hand-­‐picked  features  based  on  coverage  and  entropy  in  the  data  sets       Results:     Housekeeping  frequency  and  data  source  dynamics  determine  the   number  of  FV-­‐pairs  that  have  to  be  compared  (scalability)     Number  of  FV  comparisons  as  well  as  coverage  and  entropy  of  indexed   features  influence  accuracy  of  method   28
  • 29. Evalua%on  -­‐  Results     Influence  of  data  source  agility  and  housekeeping  frequency  on  the  accuracy   of  the  DSNoIfy  algorithm   29
  • 30. Discussion     Broken  links  are  a  considerable  problem  in  a  Web  of  Data     The  broken  link  problem  is  partly  a  special  case  of  the  instance  matching   problem     DSNo%fy  is  an  event-­‐based  approach  to  this  problem:     DSNoIfy  can  be  used  as  an  add-­‐on  for  data  sources  that  want  to  preserve   link  integrity  in  their  data     We  cannot  “cure”  the  Web  of  Data  from  broken  links  (but  at  least  alleviate   the  pain  a  bit  :)   30
  • 31. Current  and  Future  Work     Scalability  issues,  evaluaIon     AutomaIc  feature  selecIon  (parameter  esImaIon)     Event  algebra:  high-­‐level  composite  events     EvaluaIon  with  other  data  sources  (e.g.,  file  system)     Dataset  dynamics:     vocabularies,  protocols,  formats     …   Thank  You  !   niko.popitsch@univie.ac.at   images: NASA / NSSDC h^p://www.dsnoIfy.org   31
  • 32. References  and  Related  Work     H.  Ashman.  Electronic  document  addressing:  dealing  with  change.  ACM  Comput.  Surv.,  32(3),  2000.     F.  Kappe.  A  scalable  architecture  for  maintaining  referenIal  integrity  in  distributed  informaIon   systems.  Journal  of  Universal  Computer  Science,  1(2):84–104,  1995.     A.  Morishima,  A.  Nakamizo,  T.  Iida,  S.  Sugimoto,  and  H.  Kitagawa.  Bringing  your  dead  links  back  to  life:  a   comprehensive  approach  and  lessons  learned.  In  HT  ’09:  Proceedings  of  the  20th  ACM  conference  on   Hypertext  and  hypermedia,  pages  15–24,  2009.       T.  A.  Phelps  and  R.  Wilensky.  Robust  hyperlinks  cost  just  five  words  each.  Technical  Report  UCB/ CSD-­‐00-­‐1091,  EECS  Department,  University  of  California,  Berkeley,  2000     J.  Volz,  C.  Bizer,  M.  Gaedke,  and  G.  Kobilarov.  Discovering  and  maintaining  links  on  the  web  of  data.  In   8th  InternaGonal  SemanGc  Web  Conference,  2009.     A.  Hogan,  A.  Harth,  and  S.  Decker.  Performing  object  consolidaIon  on  the  semanIc  web  data  graph.  In   Proceedings  of  the  1st  I3:  IdenGty,  IdenGfiers,  IdenGficaGon  Workshop,  2007     A.  Ferrara,  D.  Lorusso,  S.  Montanelli,  and  G.  Varese.  Towards  a  benchmark  for  instance  matching.  In   Ontology  Matching  (OM  2008),  volume  431  of  CEUR  Workshop  Proceedings.  CEUR-­‐WS.org,  2008     C.  Bizer,  T.  Heath,  and  T.  Berners-­‐Lee.  Linked  data  -­‐  the  story  so  far.  InternaGonal  Journal  on  SemanGc   Web  and  InformaGon  Systems  (IJSWIS),  5(3),  2009     S.  Auer,  S.  Dietzold,  J.  Lehmann,  S.  Hellmann,  and  D.  Aumüller.  Triplify:  light-­‐weight  linked  data   publicaIon  from  relaIonal  databases.  In  WWW  ’09,  New  York,  NY,  USA,  2009.  ACM     W.  Y.  Arms.  Uniform  resource  names:  handles,  purls,  and  digital  object  idenIfiers.  Commun.  ACM,  44 (5):68,  2001.   32