Demolitions and Dalí: Web Dev and Data in a Graph Database

My presentation at GitHub CodeConf 2016

Published in: Software

  1. 1. • 
 
 — TO —>
  2. 2. Also
  3. 3. OK, graph databases • Instead of tables and SQL • Nodes and relationships • Specialized queries • Not everything is a graph (and this is not sponsored)
  4. 4. Install / Update Neo4j • Neo4j • http://localhost:7474 • Community Edition 3.0.3 • Python, pip, and py2neo • py2neo.__version__ = '3b1'
  5. 5. Step 0 - installing • Install Neo4j - neo4j.com/install • brew on Mac • DigitalOcean has Linux instructions • change default password • Trouble installing locally? • heroku addons:add graphene
  6. 6. Who uses graphs? • Panama Papers • IMDb / Six Degrees of Kevin Bacon • Especially: • social networks, research data, maps • anywhere the number of joins is large, indefinite, or unlimited
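The "Six Degrees" case is a useful picture of why an unbounded number of joins is awkward in tables: each extra hop is another self-join, while a graph traversal just keeps walking. The following is a minimal sketch in plain Python (no database), assuming a made-up cast list; the names `APPEARS_IN`, `build_costar_graph`, and `degrees_of_separation` are hypothetical and not from the talk.

```python
from collections import deque

# Hypothetical toy data (actor -> films). Placeholder names, not real credits.
APPEARS_IN = {
    "Actor A": {"Film 1"},
    "Actor B": {"Film 1", "Film 2"},
    "Actor C": {"Film 2", "Film 3"},
    "Actor D": {"Film 3"},
    "Actor E": {"Film 4"},
}


def build_costar_graph(appears_in):
    """Return adjacency sets: two actors are linked if they share a film."""
    cast = {}
    for actor, films in appears_in.items():
        for film in films:
            cast.setdefault(film, set()).add(actor)
    graph = {actor: set() for actor in appears_in}
    for actors in cast.values():
        for actor in actors:
            graph[actor] |= actors - {actor}
    return graph


def degrees_of_separation(graph, start, goal):
    """Breadth-first search. Returns the hop count, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == goal:
            return hops
        for neighbor in graph[person]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


COSTARS = build_costar_graph(APPEARS_IN)
```

Calling `degrees_of_separation(COSTARS, "Actor A", "Actor D")` walks as many hops as it needs without the caller deciding the depth in advance, which is the property the slide is pointing at.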
  7. 7. Cypher
  8. 8. MoMA.org • PostgreSQL sync to “The Museum System” CMS outside our control
  9. 9. Who uses MoMA.org? • Tourists • Researchers • Distant art fans • Members
  10. 10. The trouble with tables • Many joins to get people, titles, photos, additional relationship info
 • Speed of query • Difficult to write new queries
  11. 11. Art Graph DB • Did Picasso collaborate with other artists in his lifetime? • Are any artists credited as painter, director, sculptor, etc.? (maybe an art EGOT)
  12. 12. Let’s build that graph • Artists and artworks • Basic bio data, MoMA ID -> Artist node • Future DB: all people connected • Title, date, MoMA ID -> Artwork node • ARTIST_OF relationship (include order)
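The data model on this slide (Artist nodes, Artwork nodes, and an ordered ARTIST_OF relationship) can be sketched in plain Python to show what a collaboration query has to do. This is only an in-memory illustration, not the talk's Neo4j import script; the class `ArtGraph`, its methods, and all IDs, names, and titles below are placeholders invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ArtGraph:
    """Tiny in-memory stand-in for the Artist / Artwork graph."""
    artists: dict = field(default_factory=dict)    # MoMA ID -> {"name": ...}
    artworks: dict = field(default_factory=dict)   # MoMA ID -> {"title": ..., "date": ...}
    artist_of: list = field(default_factory=list)  # (artist_id, artwork_id, order)

    def add_artist(self, moma_id, name):
        self.artists[moma_id] = {"name": name}

    def add_artwork(self, moma_id, title, date):
        self.artworks[moma_id] = {"title": title, "date": date}

    def link(self, artist_id, artwork_id, order):
        """Record an ARTIST_OF relationship, keeping the credit order."""
        if artist_id not in self.artists or artwork_id not in self.artworks:
            raise KeyError("both nodes must exist before linking")
        self.artist_of.append((artist_id, artwork_id, order))

    def credited_artists(self, artwork_id):
        """Names credited on one artwork, in credit order."""
        rows = sorted((order, a) for a, w, order in self.artist_of if w == artwork_id)
        return [self.artists[a]["name"] for _, a in rows]

    def collaborators(self, artist_id):
        """Names of other artists who share at least one artwork."""
        works = {w for a, w, _ in self.artist_of if a == artist_id}
        return sorted({self.artists[a]["name"]
                       for a, w, _ in self.artist_of
                       if w in works and a != artist_id})


# Placeholder records, invented for illustration only.
g = ArtGraph()
g.add_artist(1, "Artist One")
g.add_artist(2, "Artist Two")
g.add_artist(3, "Artist Three")
g.add_artwork(100, "Joint Work", "1930")
g.add_artwork(101, "Solo Work", "1931")
g.link(2, 100, order=2)
g.link(1, 100, order=1)
g.link(3, 101, order=1)
```

Here `g.collaborators(1)` answers the "did this artist collaborate with others?" question by following ARTIST_OF to shared artworks and back out again, which is the two-hop pattern a graph query language expresses directly.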
  13. 13. Let’s build that graph • git clone
 https://github.com/mapmeld/graph ! • Building a scraper for MoMA
  14. 14. Demolitions and Dalí in a Graph Database Nick Doiron - @mapmeld
  15. 15. Cypher
  16. 16. Cypher
  17. 17. On to OSM
  18. 18. If you’re interested • Google: Mapzen Extracts • download a city • for this script, download the OSM XML file • if you like PostGIS, there is a download (no import script)
  19. 19. Benefits of OSM • Open to use / full data • Open to edit / choose tags • HOT community • Civil e-mail lists (Crimea)
  20. 20. Benefits of OSM
  21. 21. Google on OSM • "Our maps represent what you or I need to do on a day-to-day basis in the developed part of the world" • Google Maps Geospatial Technologist (quoted in Fast Company)
  22. 22. In Haiti and worldwide
  23. 23. In Haiti and worldwide
  24. 24. XML data
  25. 25. XML data • Nodes, ways, and relations • Ways made up of multiple nodes • Relations contain nodes and ways • Practically: • Multiple ways connect / combine • Tags are a community construct
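To make the node / way / relation structure concrete, here is a small sketch that reads an OSM XML document with Python's standard library. The sample document is hand-written for illustration (the coordinates, IDs, and tags are made up), and `parse_osm` is a hypothetical helper, not the script used in the talk.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the OSM XML layout; all values are made up.
SAMPLE_OSM = """
<osm version="0.6">
  <node id="1" lat="42.33" lon="-83.05"/>
  <node id="2" lat="42.34" lon="-83.05"/>
  <node id="3" lat="42.35" lon="-83.05"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Example Street"/>
  </way>
  <relation id="100">
    <member type="way" ref="10" role=""/>
    <member type="node" ref="1" role="stop"/>
    <tag k="type" v="route"/>
  </relation>
</osm>
"""


def read_tags(element):
    """Collect <tag k=... v=...> children into a plain dict."""
    return {tag.get("k"): tag.get("v") for tag in element.findall("tag")}


def parse_osm(xml_text):
    """Return (nodes, ways, relations) dictionaries keyed by integer ID."""
    root = ET.fromstring(xml_text.strip())
    nodes = {
        int(n.get("id")): (float(n.get("lat")), float(n.get("lon")))
        for n in root.findall("node")
    }
    ways = {
        int(w.get("id")): {
            "nodes": [int(nd.get("ref")) for nd in w.findall("nd")],
            "tags": read_tags(w),
        }
        for w in root.findall("way")
    }
    relations = {
        int(r.get("id")): {
            "members": [(m.get("type"), int(m.get("ref")), m.get("role"))
                        for m in r.findall("member")],
            "tags": read_tags(r),
        }
        for r in root.findall("relation")
    }
    return nodes, ways, relations


nodes, ways, relations = parse_osm(SAMPLE_OSM)
```

A way is just an ordered list of node references plus free-form tags, and a relation is a list of typed members, so everything beyond that structure is community convention carried in the tags.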
  26. 26. Smart Renderer • When is a <way> a line (cul-de-sac) or a polygon (river, lake, parking lot)? • Has to support world’s fonts • Tag for real life, not for the renderer
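The line-or-polygon question can be sketched as a small heuristic: a way is only a polygon candidate if it is closed (first and last node are the same), and then the tags decide. The tag lists below are a deliberately short, assumed subset chosen for this example; real renderers use much longer rules, and `way_geometry` is a hypothetical helper rather than any renderer's actual logic.

```python
# Assumed, simplified tag lists for illustration only.
AREA_KEYS = ("building", "landuse", "natural", "amenity", "leisure")
LINEAR_KEYS = ("highway", "barrier")


def way_geometry(node_ids, tags):
    """Guess whether a way should be drawn as a 'line' or a 'polygon'."""
    # A closed way repeats its first node at the end (at least a triangle).
    closed = len(node_ids) >= 4 and node_ids[0] == node_ids[-1]
    if not closed:
        return "line"
    # An explicit area tag overrides the other hints.
    if tags.get("area") == "yes":
        return "polygon"
    if tags.get("area") == "no":
        return "line"
    # A closed road (a loop or cul-de-sac) is still a line.
    if any(key in tags for key in LINEAR_KEYS):
        return "line"
    if any(key in tags for key in AREA_KEYS):
        return "polygon"
    return "line"
```

For example, a closed loop tagged `highway=residential` comes back as a line, while the same node list tagged `amenity=parking` comes back as a polygon.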
  27. 27. Building graph data • Script adds all roads to Neo4j • Includes an array of node ids (can mix content types, similar to a document database) • If two ways share a node with the same ID, link them both ways <->
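The linking rule on this slide (two ways that share a node ID get connected in both directions) can be sketched without a database. This is an in-memory illustration under assumed inputs, not the talk's import script; `link_ways` and the sample street names and node IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_ways(ways):
    """ways: dict of way ID -> list of node IDs.

    Returns a dict of way ID -> set of way IDs that share at least one node.
    Links are stored in both directions.
    """
    # Invert the data: for each node, which ways pass through it?
    ways_by_node = defaultdict(set)
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            ways_by_node[node_id].add(way_id)

    links = {way_id: set() for way_id in ways}
    for sharing in ways_by_node.values():
        for a, b in combinations(sorted(sharing), 2):
            links[a].add(b)
            links[b].add(a)
    return links


# Made-up sample: Main St and Oak Ave meet at node 3; Elm St is isolated.
SAMPLE_WAYS = {
    "Main St": [1, 2, 3],
    "Oak Ave": [3, 4],
    "Elm St": [5, 6],
}
STREET_LINKS = link_ways(SAMPLE_WAYS)
```

Indexing by node first avoids comparing every pair of ways, which matters once a whole city's roads are loaded.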
  28. 28. Cypher + OSM * you can put an index on schema fields now
  29. 29. Problem
  30. 30. Google Prediction API • Prediction based on a CSV • Categorization or numerical • Google generates a model and estimates accuracy • Not allowed in Myanmar
  31. 31. Predicting Houses • Format 60,000+ rows of database export • Choose categories to predict 2-3 years • Competing models determine how important each column is • Can it parse dates? Find patterns • Edging up to ~74 percent accuracy
  32. 32. Network effect • Adding network of streets • Now tokens include not just my street and neighbors, but shared streets
  33. 33. Network effect • Most demolitions have one house on their street demolished (it’s them)
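One way to read the point above: a per-street demolition count has to exclude the house being predicted, or every demolished house trivially sees "one demolition on my street". A minimal sketch of such a feature follows, assuming a simple record layout; `demolition_features`, the field names, and the sample rows are all hypothetical and not the talk's actual export format.

```python
def demolition_features(houses, street_links):
    """Count nearby demolitions for each house, excluding the house itself.

    houses: list of dicts with 'id', 'street', and 'demolished' (bool).
    street_links: dict of street -> set of directly connected streets.
    Returns a dict of house id -> {'same_street': n, 'connected_streets': m}.
    """
    per_street = {}
    for house in houses:
        if house["demolished"]:
            per_street[house["street"]] = per_street.get(house["street"], 0) + 1

    features = {}
    for house in houses:
        own = 1 if house["demolished"] else 0
        same_street = per_street.get(house["street"], 0) - own
        connected = sum(per_street.get(street, 0)
                        for street in street_links.get(house["street"], ()))
        features[house["id"]] = {
            "same_street": same_street,
            "connected_streets": connected,
        }
    return features


# Made-up rows for illustration only.
SAMPLE_HOUSES = [
    {"id": 1, "street": "Main St", "demolished": True},
    {"id": 2, "street": "Main St", "demolished": False},
    {"id": 3, "street": "Oak Ave", "demolished": True},
    {"id": 4, "street": "Elm St", "demolished": False},
]
SAMPLE_LINKS = {"Main St": {"Oak Ave"}, "Oak Ave": {"Main St"}, "Elm St": set()}
FEATURES = demolition_features(SAMPLE_HOUSES, SAMPLE_LINKS)
```

The `connected_streets` column is where the street graph adds information that a flat table of addresses does not have.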
  34. 34. Network effect
  35. 35. Network effect • Google Prediction API reported 81% accuracy • But is it good? • Early optimization studies moved fire stations and left neighborhoods vulnerable • City can’t maintain it… hasn’t continued to open their data
  36. 36. Looking forward • Ideas for graph databases? Ways to release large graph data: as an API? As JSON files? As a Neo4j dump? • Ideas for statisticians / future research?
  37. 37. Demolitions and Dalí in a Graph Database Nick Doiron - @mapmeld
