
Infovore: An Open Source MapReduce Framework For Processing Graph Data


This talk describes Infovore, a tool that uses the Map/Reduce approach to clean up, filter, and combine RDF data sets, delivering purpose-built data sets for practical consumers of linked data.

Published in: Technology, Education

Transcript of "Infovore: An Open Source MapReduce Framework For Processing Graph Data"

  1. Infovore, an Open-Source Map/Reduce Framework for Processing Graph Data. Paul Houle, Ontology2
  2. 2+ billion facts, 20+ GB!
  3. The data your project needs
  4. Why handle complete data sets? [Diagram labels: Quality Perimeter, Infovore]
  5. RDF Tools vs. Invalid Triples [Image: CC BY, from arj03]
  6. Scaling Limits of Triple Stores [Diagram labels: CPU (×6), Main Memory, Random-access bottleneck, Hard Drive or Flash Storage]
  7. Map/Reduce conserves memory! [Image: CC BY-SA, from Anua22a]
  8. Partitioning Data: md5("http://dbpedia.org/resource/Tree") = b78f8f508982ceb4e8dd3510fac75f62 [Diagram: partitions … 330, 331, 332, 333, 334, 335 …]
  9. If you really try it… [Diagram: partitions … 330, 331, 332, 333, 334, 335 …]
  10. Preprocessing Freebase
      • Expand prefixes
      • Remove
          • fbase:type.type.instance
          • fbase:type.type.expected_by
          • rdf:type w/ fbase:* subject
      • Reverse
          • fbase:type.permission.controls
          • fbase:dataworld_gardening_hint.replaced_by
      • Rewrite
          • fbase:type.object.type to rdf:type
  11. Parallel Super Eyeball
  12. sort | uniq
      :Surgeon a :Occupation .
      :Surgeon rdfs:label "Surgeon"@en .
      :Surgeon :mustHave :Md .
      :Tree a :Plant .
      :Tree rdfs:label "Tree"@en .
      :Tree :has :Leaves .
      :Victory a :AbstractConcept .
      :Victory rdfs:label "Victory" .
      :Victory :emotionalTone :Positive .
  13. Huge scalability… [Diagram labels: :Tree, :Victory, :Surgeon, Main memory]
  14. Pig, Hadoop and All That… Source: http://www.dbis.informatik.hu-berlin.de/forschung/projekte/query-optimization-in-rdf-databases.html
  15. Monitoring for Quality Control [Diagram labels: Operational Statistics (rdf); Preprocess, Partition, Clean, Sort, Classify, Filter]
  16. :basekb
  17. Parallel Loading into Triple Stores [Diagram: partitions … 330, 331, 332, 333, 334, 335 … loaded into OpenLink Virtuoso; 4x speedup]
  18. :basekb lite [Diagram labels: :Freebase, :Chosenfacts, :Rulebox, :Chosentopics]
  19. rdf diff
  20. See for yourself: https://github.com/paulhoule/infovore/wiki
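
The partitioning idea on slide 8 (hash the subject URI with MD5, then route the triple to a numbered partition) can be sketched in Python. This is a minimal illustration, not Infovore's actual code: the slides do not say how the hash is mapped to a partition number or how many partitions exist, so the modulo step, `NUM_PARTITIONS`, and the function names are assumptions.

```python
import hashlib

# Hypothetical partition count; the slides only show partitions numbered
# around 330-335 and do not state the total.
NUM_PARTITIONS = 1024


def partition_for(subject_uri, num_partitions=NUM_PARTITIONS):
    """Map a subject URI to a partition number via its MD5 digest.

    Assumption: the digest is read as an integer and reduced modulo the
    partition count. The slides do not confirm this mapping.
    """
    digest = hashlib.md5(subject_uri.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partition_triples(triples, num_partitions=NUM_PARTITIONS):
    """Group (subject, predicate, object) tuples by the subject's partition."""
    buckets = {}
    for s, p, o in triples:
        buckets.setdefault(partition_for(s, num_partitions), []).append((s, p, o))
    return buckets
```

Because the partition depends only on the subject, every fact about one topic lands in the same partition, so each partition can be processed independently.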
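
The remove, reverse, and rewrite rules listed on slide 10 can be read as a per-triple function, which is the kind of step that fits a map phase. The sketch below is an interpretation under stated assumptions, not Infovore's implementation: predicate names are copied from the slide, the rewrite target is written as the standard `rdf:type` property, "reverse" is assumed to swap subject and object while keeping the predicate, and the prefix-expansion and subject-based removal rules are left out because the slide gives no detail on them.

```python
# Predicate names copied from slide 10; the rule semantics are assumptions.
REMOVE = {
    "fbase:type.type.instance",
    "fbase:type.type.expected_by",
}
REVERSE = {
    "fbase:type.permission.controls",
    "fbase:dataworld_gardening_hint.replaced_by",
}
REWRITE = {
    "fbase:type.object.type": "rdf:type",
}


def preprocess(triple):
    """Apply the slide-10 rules to one (subject, predicate, object) tuple.

    Returns None when the triple is dropped, otherwise the (possibly
    modified) triple.
    """
    s, p, o = triple
    if p in REMOVE:
        return None
    if p in REVERSE:
        # Assumption: reversing swaps subject and object, predicate unchanged.
        return (o, p, s)
    return (s, REWRITE.get(p, p), o)


def preprocess_all(triples):
    """Run preprocess over an iterable, skipping dropped triples."""
    for triple in triples:
        result = preprocess(triple)
        if result is not None:
            yield result
```

Since each triple is handled on its own, the function needs no shared state and can run in parallel over any split of the input.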
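
Slides 12 and 13 suggest that once triples are sorted and de-duplicated, all facts about one subject sit next to each other and can be handled one group at a time. The Python sketch below illustrates that reading on a single machine; it assumes one triple per line with whitespace-separated terms, and the function names are hypothetical.

```python
from itertools import groupby


def sort_uniq(lines):
    """Equivalent of `sort | uniq`: drop blank and duplicate lines, then sort."""
    return sorted({line.strip() for line in lines if line.strip()})


def subject_of(line):
    """Return the first whitespace-separated term, taken to be the subject."""
    return line.split(None, 1)[0]


def group_by_subject(sorted_lines):
    """Yield (subject, [lines]) pairs from lines already sorted by subject.

    Only one subject's facts are materialized at a time.
    """
    for subject, group in groupby(sorted_lines, key=subject_of):
        yield subject, list(group)
```

With the sample triples from slide 12, the groups come out keyed by `:Surgeon`, `:Tree`, and `:Victory`, each holding that subject's three facts.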