Infovore: An Open Source MapReduce Framework For Processing Graph Data

This talk describes Infovore, a tool that uses the Map/Reduce approach to clean up, filter, and combine RDF data sets, delivering purpose-built data sets for practical consumers of linked data.

Infovore: An Open Source MapReduce Framework For Processing Graph Data Presentation Transcript

  • 1. Infovore, an Open-Source Map/Reduce Framework For Processing Graph Data. Paul Houle, Ontology2
  • 2. 2+ billion facts, 20+ GB!
  • 3. the data your project needs
  • 4. Why handle complete data sets? (Diagram labels: Quality Perimeter, Infovore)
  • 5. RDF Tools vs. Invalid Triples (Image cc-by from arj03)
  • 6. Scaling Limits of Triple Stores (Diagram labels: six CPUs, Main Memory, Random-access bottleneck, Hard Drive or Flash Storage)
  • 7. Map/Reduce conserves memory! (Image cc-by-sa from Anua22a)
  • 8. Partitioning Data: md5(“http://dbpedia.org/resource/Tree”) = b78f8f508982ceb4e8dd3510fac75f62 (Partitions shown: … 330, 331, 332, 333, 334, 335 …)
  • 9. If you really try it… (Partitions shown: … 330, 331, 332, 333, 334, 335 …)
  • 10. Preprocessing Freebase
      • Expand prefixes
      • Remove
          • fbase:type.type.instance
          • fbase:type.type.expected_by
          • rdfs:type w/ fbase:* subject
      • Reverse
          • fbase:type.permission.controls
          • fbase:dataworld_gardening_hint.replaced_by
      • Rewrite
          • fbase:type.object.type to rdfs:type
  • 11. Parallel Super Eyeball
  • 12. sort | uniq
      :Surgeon a :Occupation .
      :Surgeon rdfs:label "Surgeon"@en .
      :Surgeon :mustHave :Md .
      :Tree a :Plant .
      :Tree rdfs:label "Tree"@en .
      :Tree :has :Leaves .
      :Victory a :AbstractConcept .
      :Victory rdfs:label "Victory" .
      :Victory :emotionalTone :Positive .
  • 13. Huge scalability… (Diagram labels: :Tree, :Victory, :Surgeon, Main memory)
  • 14. Pig, Hadoop and All That… (Source: http://www.dbis.informatik.hu-berlin.de/forschung/projekte/query-optimization-in-rdf-databases.html)
  • 15. Monitoring for Quality Control (Diagram labels: Operational Statistics (rdf); stages Preprocess, Partition, Clean, Sort, Classify, Filter)
  • 16. :basekb
  • 17. Parallel Loading into Triple Stores (Partitions shown: … 330, 331, 332, 333, 334, 335 …; OpenLink Virtuoso; 4x Speedup)
  • 18. :basekb lite (Diagram labels: :Freebase, :Chosenfacts, :Rulebox, :Chosentopics)
  • 19. rdf diff
  • 20. See for yourself: https://github.com/paulhoule/infovore/wiki
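
Slide 8 shows a subject URI being hashed with MD5 and the data falling into numbered partitions. The sketch below is one plausible way to do that, not Infovore's actual code: the function names, the choice of 1024 partitions, and reducing the digest with a modulo are all assumptions made for illustration.

```python
import hashlib

def partition_for(key: str, num_partitions: int = 1024) -> int:
    """Map a key (here, a subject URI) to a partition number via its MD5 digest.

    num_partitions=1024 is a hypothetical value; the slides only show
    bins numbered around 330-335.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Treat the 128-bit digest as an integer and reduce it to a bin number.
    return int(digest, 16) % num_partitions

def partition_triples(triples, num_partitions: int = 1024):
    """Group (subject, predicate, object) tuples into bins keyed by partition.

    Assumption: triples are partitioned on their subject, so every fact
    about one subject ends up in the same bin.
    """
    bins = {}
    for s, p, o in triples:
        bins.setdefault(partition_for(s, num_partitions), []).append((s, p, o))
    return bins

if __name__ == "__main__":
    uri = "http://dbpedia.org/resource/Tree"
    print(uri, "->", partition_for(uri))
```

Because the hash depends only on the key, the same subject always lands in the same bin, which lets each bin be processed independently.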
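
The remove, reverse, and rewrite rules on slide 10 can be sketched as a single pass over a stream of triples. This is an illustrative reading of the slide, not Infovore's implementation: the predicate names are copied from the slide (with the prefix written in lowercase), prefix expansion is left out, the function name is hypothetical, and two points are assumptions: that "reverse" swaps subject and object while keeping the predicate, and that the "rdfs:type w/ fbase:* subject" rule drops incoming rdfs:type triples before any rewriting happens.

```python
# Predicate names as written on the slide; the rule semantics are assumptions.
REMOVE = {"fbase:type.type.instance", "fbase:type.type.expected_by"}
REVERSE = {
    "fbase:type.permission.controls",
    "fbase:dataworld_gardening_hint.replaced_by",
}
REWRITE = {"fbase:type.object.type": "rdfs:type"}

def preprocess(triples):
    """Yield cleaned (subject, predicate, object) tuples."""
    for s, p, o in triples:
        if p in REMOVE:
            continue  # drop the triple entirely
        if p == "rdfs:type" and s.startswith("fbase:"):
            continue  # assumed reading of "rdfs:type w/ fbase:* subject"
        if p in REVERSE:
            yield (o, p, s)  # assumed: swap subject and object, keep predicate
        elif p in REWRITE:
            yield (s, REWRITE[p], o)  # rename the predicate
        else:
            yield (s, p, o)  # pass everything else through unchanged
```

Writing the rules as a generator means each triple is handled on its own, so the same function could serve as the body of a map step.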
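
Slides 12 and 13 suggest that once triples are sorted and deduplicated, all facts about one subject sit next to each other and can be handled one subject at a time. Here is a small in-memory sketch of that idea; the function names are hypothetical, and Python's code-point ordering may differ from a locale-aware shell `sort`.

```python
from itertools import groupby

def sort_uniq(lines):
    """In-memory equivalent of `sort | uniq`: sorted order, duplicates dropped."""
    return sorted(set(lines))

def groups_by_subject(sorted_lines):
    """Yield (subject, lines) for each run of lines sharing a first token.

    Assumes the input is already sorted and every line is non-empty, so
    only one subject's triples need to be held in memory at a time.
    """
    first_token = lambda line: line.split(None, 1)[0]
    for subject, group in groupby(sorted_lines, key=first_token):
        yield subject, list(group)

if __name__ == "__main__":
    lines = [
        ":Tree a :Plant .",
        ":Surgeon a :Occupation .",
        ":Tree a :Plant .",
        ":Tree :has :Leaves .",
    ]
    for subject, facts in groups_by_subject(sort_uniq(lines)):
        print(subject, len(facts))
```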