Infovore: An Open Source MapReduce Framework For Processing Graph Data
This talk describes Infovore, a tool that uses the Map/Reduce approach to clean up, filter, and combine RDF data sets, delivering purpose-built data sets for practical consumers of Linked Data.

Published in: Technology, Education

Transcript

  • 1. Infovore, an Open-Source Map/Reduce Framework for Processing Graph Data. Paul Houle, Ontology2
  • 2. 2+ billion facts, 20+ GB!
  • 3. the data your project needs
  • 4. Why handle complete data sets? (diagram: Quality Perimeter, Infovore)
  • 5. RDF Tools vs. Invalid Triples (image CC BY from arj03)
  • 6. Scaling Limits of Triple Stores (diagram: CPUs, main memory, random-access bottleneck, hard drive or flash storage)
  • 7. Map/Reduce conserves memory! (image CC BY-SA from Anua22a)
  • 8. Partitioning Data: md5("http://dbpedia.org/resource/Tree") = b78f8f508982ceb4e8dd3510fac75f62 (diagram: numbered partitions … 330, 331, 332, 333, 334, 335 …)
  • 9. If you really try it… (diagram: partitions … 330 to 335 …)
  • 10. Preprocessing Freebase
      • Expand prefixes
      • Remove
          • fbase:type.type.instance
          • fbase:type.type.expected_by
          • rdf:type with fbase:* subject
      • Reverse
          • fbase:type.permission.controls
          • fbase:dataworld_gardening_hint.replaced_by
      • Rewrite
          • fbase:type.object.type to rdf:type
  • 11. Parallel Super Eyeball
  • 12. sort | uniq
      :Surgeon a :Occupation .
      :Surgeon rdfs:label "Surgeon"@en .
      :Surgeon :mustHave :Md .
      :Tree a :Plant .
      :Tree rdfs:label "Tree"@en .
      :Tree :has :Leaves .
      :Victory a :AbstractConcept .
      :Victory rdfs:label "Victory" .
      :Victory :emotionalTone :Positive .
  • 13. Huge scalability… (diagram: :Tree, :Victory, :Surgeon; main memory)
  • 14. Pig, Hadoop and All That… (source: http://www.dbis.informatik.hu-berlin.de/forschung/projekte/query-optimization-in-rdf-databases.html)
  • 15. Monitoring for Quality Control (diagram: operational statistics (RDF) from the stages Preprocess, Partition, Clean, Sort, Classify, Filter)
  • 16. :basekb
  • 17. Parallel Loading into Triple Stores (diagram: partitions … 330 to 335 …, OpenLink Virtuoso, 4x speedup)
  • 18. :basekb lite (diagram: :Freebase, :Chosenfacts, :Rulebox, :Chosentopics)
  • 19. rdf diff
  • 20. See for yourself: https://github.com/paulhoule/infovore/wiki
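Slide 5 sets RDF tools against invalid triples. A minimal sketch of one such check, assuming a cleaning step that tests IRIs against the IRIREF production of the N-Triples grammar and sets failing triples aside; the function names `valid_iri` and `split_valid` are hypothetical and not Infovore's API.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]

# IRIREF production from the N-Triples grammar: between < and >, any character
# except U+0000-U+0020 and <>"{}|^`\ , or a \uXXXX / \UXXXXXXXX escape.
IRIREF = re.compile(r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>')


def valid_iri(term: str) -> bool:
    """True if `term` is a syntactically valid N-Triples IRI reference."""
    return IRIREF.fullmatch(term) is not None


def split_valid(triples: List[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Separate triples into (good, bad).

    Simplification (hypothetical policy): subject and predicate must be IRIs,
    so blank-node subjects are rejected; only objects written as <...> are
    checked, and literal syntax is not validated at all.
    """
    good, bad = [], []
    for s, p, o in triples:
        ok = valid_iri(s) and valid_iri(p) and (not o.startswith("<") or valid_iri(o))
        (good if ok else bad).append((s, p, o))
    return good, bad
```

Quarantining bad triples instead of failing the whole load lets a job over billions of facts finish and report what it rejected.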
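Slide 8 partitions data by the MD5 hash of a resource IRI. A minimal sketch, assuming the partition number is the MD5 digest read as an unsigned integer modulo a chosen partition count. Infovore's exact scheme may differ, and `partition_for` and `partition_triples` are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def partition_for(subject: str, num_partitions: int) -> int:
    """Map a subject IRI to a partition via its MD5 digest.

    Assumption: partition = int(md5 hex digest, 16) % num_partitions.
    """
    digest = hashlib.md5(subject.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partition_triples(triples: Iterable[Triple], num_partitions: int) -> Dict[int, List[Triple]]:
    """Bucket triples so that all triples about one subject share a partition."""
    buckets: Dict[int, List[Triple]] = {}
    for s, p, o in triples:
        buckets.setdefault(partition_for(s, num_partitions), []).append((s, p, o))
    return buckets
```

Because the hash depends only on the subject, every fact about `http://dbpedia.org/resource/Tree` lands in the same partition, so each partition can be cleaned or loaded independently.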
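Slide 10 lists rule-based Freebase preprocessing: expand prefixes, remove some predicates, reverse others, and rewrite `fbase:type.object.type` to the RDF typing predicate. A minimal per-triple sketch follows. It assumes `fbase:` expands to `http://rdf.freebase.com/ns/`, that "reverse" means swapping subject and object while keeping the predicate, and that the type predicate is `rdf:type`; all three are assumptions, and `preprocess` is a hypothetical name.

```python
from typing import Optional, Tuple

Triple = Tuple[str, str, str]

FBASE = "http://rdf.freebase.com/ns/"  # assumed expansion of fbase:
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF + "type"

PREFIXES = {"fbase:": FBASE, "rdf:": RDF}

REMOVE = {FBASE + "type.type.instance", FBASE + "type.type.expected_by"}
REVERSE = {FBASE + "type.permission.controls", FBASE + "dataworld_gardening_hint.replaced_by"}
REWRITE = {FBASE + "type.object.type": RDF_TYPE}


def expand(term: str) -> str:
    """Expand a prefixed name to a full IRI; other terms pass through."""
    for prefix, namespace in PREFIXES.items():
        if term.startswith(prefix):
            return namespace + term[len(prefix):]
    return term


def preprocess(triple: Triple) -> Optional[Triple]:
    """Apply the slide's rules to one triple; None means drop it."""
    s, p, o = (expand(t) for t in triple)
    if p in REMOVE:
        return None
    # Drop the dump's own rdf:type triples on Freebase subjects; typing is
    # rebuilt from type.object.type by the rewrite rule below.
    if p == RDF_TYPE and s.startswith(FBASE):
        return None
    if p in REVERSE:
        return (o, p, s)  # assumption: keep predicate, swap ends
    return (s, REWRITE.get(p, p), o)
```

Each rule looks at one triple in isolation, which is what makes this step trivially parallel in a map phase.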
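Slides 12 and 13 use `sort | uniq` so that duplicates disappear and all triples about one subject become adjacent, after which only one subject needs to sit in main memory at a time. A minimal sketch of that streaming step, assuming the input is already sorted (in practice by an external or Hadoop shuffle sort, not in memory); `unique_sorted` and `subjects` are hypothetical names.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def unique_sorted(triples: Iterable[Triple]) -> Iterator[Triple]:
    """The `uniq` step: drop adjacent duplicates from a sorted stream."""
    previous = None
    for t in triples:
        if t != previous:
            yield t
        previous = t


def subjects(sorted_triples: Iterable[Triple]) -> Iterator[Tuple[str, List[Triple]]]:
    """Yield (subject, triples) one subject at a time.

    Only the current subject's triples are materialized, so memory use
    is bounded by the largest subject, not by the whole data set.
    """
    for subject, group in groupby(unique_sorted(sorted_triples), key=lambda t: t[0]):
        yield subject, list(group)
```

Python compares strings by code point, which matches `LC_ALL=C sort` rather than a locale-aware `sort`; either works as long as the grouping key sorts consistently.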
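Slide 15 shows operational statistics, themselves expressed as RDF, collected across the Preprocess, Partition, Clean, Sort, Classify and Filter stages. A minimal sketch, assuming each stage is a function that returns a triple or None (drop) and that counts are published as N-Triples under a hypothetical `http://example.org/stats#` vocabulary; `run_pipeline` and `stats_as_ntriples` are illustrative names, not Infovore's.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Stage = Tuple[str, Callable[[Triple], Optional[Triple]]]

STATS_NS = "http://example.org/stats#"  # hypothetical vocabulary
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def run_pipeline(triples: Iterable[Triple], stages: List[Stage]) -> Tuple[List[Triple], Counter]:
    """Push each triple through the stages, counting inputs and drops per stage."""
    counts: Counter = Counter()
    kept: List[Triple] = []
    for triple in triples:
        current = triple
        for name, stage in stages:
            counts[(name, "in")] += 1
            current = stage(current)
            if current is None:
                counts[(name, "dropped")] += 1
                break
        else:  # no stage dropped the triple
            kept.append(current)
    return kept, counts


def stats_as_ntriples(counts: Counter) -> List[str]:
    """Render the counters as N-Triples lines with typed integer literals."""
    return [
        f'<{STATS_NS}{stage}> <{STATS_NS}{kind}> "{n}"^^<{XSD_INTEGER}> .'
        for (stage, kind), n in sorted(counts.items())
    ]
```

Publishing the counts as RDF means the same tools that inspect the data can inspect the run, for example to spot a stage that suddenly drops far more triples than usual.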
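Slide 19 names an `rdf diff`. A minimal sketch of one way to compute it, assuming both inputs are sorted and duplicate-free (as `sort | uniq` leaves them), so a single merge pass finds removed (`-`) and added (`+`) triples without building a set of either file. `rdf_diff` is a hypothetical name.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


def rdf_diff(old: List[Triple], new: List[Triple]) -> Iterator[Tuple[str, Triple]]:
    """Merge two sorted, duplicate-free triple lists.

    Yields ("-", t) for triples only in `old` and ("+", t) for triples
    only in `new`; shared triples are skipped.
    """
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            i += 1
            j += 1
        elif old[i] < new[j]:
            yield "-", old[i]
            i += 1
        else:
            yield "+", new[j]
            j += 1
    for t in old[i:]:
        yield "-", t
    for t in new[j:]:
        yield "+", t
```

The merge relies on both sides using the same sort order; feeding it unsorted input silently produces a wrong diff.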
