Infovore: An Open Source MapReduce Framework For Processing Graph Data
 


This talk describes Infovore, a tool that uses the Map/Reduce approach to clean up, filter, and combine RDF data sets to deliver purpose-built data sets for practical consumers of linked data.


    Presentation Transcript

    • Infovore, an Open-Source Map/Reduce Framework For Processing Graph Data. Paul Houle, Ontology2
    • 2+ billion facts, 20+ GB!
    • the data your project needs
    • Why handle complete data sets? (Diagram labels: Quality Perimeter Infovore)
    • RDF Tools vs. Invalid Triples (Image: cc-by, from arj03)
    • Scaling Limits of Triple Stores (Diagram labels: CPU ×6, Main Memory, Random-access bottleneck, Hard Drive or Flash Storage)
    • Map/Reduce conserves memory! (Image: cc-by-sa, from Anua22a)
    • Partitioning Data: md5(“http://dbpedia.org/resource/Tree”) = b78f8f508982ceb4e8dd3510fac75f62 (Diagram: partitions … 330, 331, 332, 333, 334, 335 …)
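The partitioning idea on this slide can be sketched in Python as follows. This is a minimal illustration, not Infovore's actual code: the partition count of 1024, the name `partition_for`, and the choice to take the digest modulo the partition count are all assumptions made for the example.

```python
import hashlib

NUM_PARTITIONS = 1024  # hypothetical; the slide does not state the partition count


def partition_for(subject_uri: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a subject URI to a partition number via its MD5 digest."""
    digest = hashlib.md5(subject_uri.encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer and fold it into the partition range.
    return int(digest, 16) % num_partitions


if __name__ == "__main__":
    print(partition_for("http://dbpedia.org/resource/Tree"))
```

Because the partition depends only on the subject URI, every triple about the same subject lands in the same partition, so each partition can be processed independently.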
    • If you really try it… (Diagram: partitions … 330, 331, 332, 333, 334, 335 …)
    • Preprocessing Freebase
        • Expand prefixes
        • Remove
            • fbase:type.type.instance
            • fbase:type.type.expected_by
            • rdf:type w/ fbase:* subject
        • Reverse
            • fbase:type.permission.controls
            • fbase:dataworld_gardening_hint.replaced_by
        • Rewrite
            • fbase:type.object.type to rdf:type
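The remove, reverse, and rewrite rules on this slide could look roughly like the Python sketch below. It is not taken from Infovore: the function name `preprocess`, the tuple representation of triples, and the decision to keep the predicate unchanged when reversing are assumptions. Prefix expansion and the "type with fbase:* subject" rule are left out because the slide does not say exactly how they apply. The rewrite target is the standard `rdf:type` predicate.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), prefixed names

# Predicates whose triples are dropped entirely.
REMOVE = {"fbase:type.type.instance", "fbase:type.type.expected_by"}
# Predicates whose subject and object are swapped.
REVERSE = {
    "fbase:type.permission.controls",
    "fbase:dataworld_gardening_hint.replaced_by",
}
# Predicates that are renamed.
REWRITE = {"fbase:type.object.type": "rdf:type"}


def preprocess(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Apply the remove / reverse / rewrite rules to a stream of triples."""
    for s, p, o in triples:
        if p in REMOVE:
            continue
        if p in REVERSE:
            s, o = o, s  # assumption: the predicate name is kept as-is
        p = REWRITE.get(p, p)
        yield (s, p, o)
```

Each rule looks at one triple at a time, which is what makes this step easy to run as a map-only job over partitioned input.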
    • Parallel Super Eyeball
    • sort | uniq
        :Surgeon a :Occupation .
        :Surgeon rdfs:label “Surgeon”@en .
        :Surgeon :mustHave :Md .
        :Tree a :Plant .
        :Tree rdfs:label “Tree”@en .
        :Tree :has :Leaves .
        :Victory a :AbstractConcept .
        :Victory rdfs:label “Victory” .
        :Victory :emotionalTone :Positive .
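A small Python sketch of what the sort | uniq step buys: once the lines are sorted and de-duplicated, all facts about one subject are adjacent and can be handled one group at a time. The function names are hypothetical, and the in-memory `sorted(set(...))` stands in for what would be an external sort or a Hadoop shuffle on data of the size discussed in this talk.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def sort_uniq(lines: Iterable[str]) -> List[str]:
    """In-memory stand-in for `sort | uniq` over one-triple-per-line text."""
    return sorted(set(lines))


def subject_of(line: str) -> str:
    """The subject is the first whitespace-delimited token on the line."""
    return line.split(None, 1)[0]


def group_by_subject(sorted_lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (subject, facts) pairs; only one subject's facts are held at a time."""
    for subject, group in groupby(sorted_lines, key=subject_of):
        yield subject, list(group)


if __name__ == "__main__":
    lines = [
        ":Tree a :Plant .",
        ":Surgeon a :Occupation .",
        ":Tree a :Plant .",
        ":Tree :has :Leaves .",
    ]
    for subject, facts in group_by_subject(sort_uniq(lines)):
        print(subject, len(facts))
```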
    • Huge scalability… (Diagram labels: :Tree, :Victory, :Surgeon, Main memory)
    • Pig, Hadoop and All That… Source: http://www.dbis.informatik.hu-berlin.de/forschung/projekte/query-optimization-in-rdf-databases.html
    • Monitoring for Quality Control (Diagram labels: Operational Statistics (rdf); Preprocess, Partition, Clean, Sort, Classify, Filter)
    • :basekb
    • Parallel Loading into Triple Stores (Diagram: partitions … 330, 331, 332, 333, 334, 335 … loaded into OpenLink Virtuoso; 4x speedup)
    • :basekb lite (Diagram labels: :Freebase, :Chosenfacts, :Rulebox, :Chosentopics)
    • rdf diff
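The slide gives no detail on how the diff works, so the following Python sketch shows only one plausible reading: treating each data set as a set of triples and reporting what was added and what was removed. It assumes one triple per line in a consistent serialization and no blank nodes; the name `rdf_diff` is hypothetical and is not claimed to match Infovore's implementation.

```python
from typing import Iterable, List, Tuple


def rdf_diff(old_lines: Iterable[str], new_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) triples between two one-triple-per-line data sets."""
    old, new = set(old_lines), set(new_lines)
    added = sorted(new - old)    # present only in the new data set
    removed = sorted(old - new)  # present only in the old data set
    return added, removed


if __name__ == "__main__":
    added, removed = rdf_diff(
        [":Tree a :Plant .", ":Tree :has :Leaves ."],
        [":Tree a :Plant .", ":Tree :has :Bark ."],
    )
    print("added:", added)
    print("removed:", removed)
```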
    • See for yourself: https://github.com/paulhoule/infovore/wiki