An Introduction to
    Neo4j
   Michal Bachman
    @bachmanm
Roadmap
•   Intro to NOSQL
•   Intro to Graph Databases
•   Intro to Neo4j
•   A bit of hacking
•   Current research
•   Q&A



Not Only SQL

Why NOSQL now?

   Driving trends




Trend 1: Data Size




Trend 2: Connectedness
[Figure: technologies in order of increasing information connectivity: Text Documents, Hypertext, Feeds, Blogs, Wikis, UGC, Tagging, Folksonomies, RDFa, Ontologies, GGG]
Trend 3: Semi-structured Data




Trend 4: Application Architecture (80’s)



[Diagram: a single Application on top of a single DB]
Trend 4: Application Architecture (90’s)



[Diagram: three Apps sharing a single DB]
[Diagram: three Applications, each with its own DB]
Side note: RDBMS performance
 Salary List




Four NOSQL Categories




Key-Value Stores
• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
• Data model:
  – Global key-value mapping
  – Big scalable HashMap
  – Highly fault tolerant (typically)
• Examples:
  – Riak, Redis, Voldemort
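
The "big scalable HashMap" idea can be illustrated with a toy store that spreads keys over several in-memory maps by hash. This is only a sketch: the class name, the partition count, and the sample key are invented for illustration, and real stores such as those listed above add replication and more careful partitioning that are not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy key-value store: one logical map, split over several partitions by key hash. */
final class ShardedKeyValueStore {
    private final List<Map<String, String>> partitions = new ArrayList<>();

    ShardedKeyValueStore(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Picks the partition that owns a key; floorMod keeps the index non-negative. */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    /** Returns the stored value, or null if the key is unknown. */
    String get(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    public static void main(String[] args) {
        ShardedKeyValueStore store = new ShardedKeyValueStore(4);
        store.put("user:42", "Michal"); // hypothetical sample data
        System.out.println(store.get("user:42"));
    }
}
```

Lookups only ever touch one partition, which is what makes this model easy to scale out and also why anything beyond get/put by key is awkward.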

Pros and Cons
• Strengths
  – Simple data model
  – Great at scaling out horizontally
     • Scalable
     • Available
• Weaknesses:
  – Simplistic data model
  – Poor for complex data


Column Family (BigTable)
• Google’s “Bigtable: A Distributed Storage
  System for Structured Data” (2006)
• Data model:
  – A big table, with column families
  – Map-reduce for querying/processing
• Examples:
  – HBase, Hypertable, Cassandra



Pros and Cons
• Strengths
  – Data model supports semi-structured data
  – Naturally indexed (columns)
  – Good at scaling out horizontally
• Weaknesses:
  – Unsuited for interconnected data




Document Databases
• Data model
  – Collections of documents
  – A document is a key-value collection
  – Index-centric, lots of map-reduce
• Examples
  – CouchDB, MongoDB




Pros and Cons
• Strengths
  – Simple, powerful data model (just like SVN!)
  – Good scaling (especially if sharding supported)
• Weaknesses:
  – Unsuited for interconnected data
  – Query model limited to keys (and indexes)
     • Map reduce for larger queries




Graph Databases
• Data model:
  – Nodes with properties
  – Named relationships with properties
  – Hypergraph, sometimes
• Examples:
  – Neo4j (of course), Sones GraphDB, OrientDB,
    InfiniteGraph, AllegroGraph



Pros and Cons
• Strengths
  – Powerful data model
  – Fast
     • For connected data, can be many orders of magnitude
       faster than RDBMS
• Weaknesses:
  – Sharding
     • Though they can scale reasonably well
     • And for some domains you can shard too!

Social Network “path exists” Performance
• Experiment:
  • ~1k persons
  • Average 50 friends per person
  • pathExists(a,b) limited to depth 4
  • Caches warm to eliminate disk IO

Database              # persons   Query time
Relational database   1000        2000ms
Neo4j                 1000        2ms
Neo4j                 1000000     2ms
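
For readers who want to see what a depth-limited pathExists(a,b) check involves, here is a minimal sketch as a breadth-first search over an in-memory friendship map. It is not the benchmark code behind the numbers above; the class and method names are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy undirected friendship graph with a depth-limited reachability check. */
final class FriendGraph {
    private final Map<String, Set<String>> friends = new HashMap<>();

    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** True if b can be reached from a by following at most maxDepth friendships. */
    boolean pathExists(String a, String b, int maxDepth) {
        if (a.equals(b)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(a);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(a);
        // Expand one level of friends per iteration, stopping at maxDepth.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, Collections.emptySet())) {
                    if (friend.equals(b)) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }
}
```

The work done depends on how many people lie within the depth limit of the start person, not on the total number of people stored, which is consistent with the Neo4j rows in the table staying flat as the data set grows.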
Four NOSQL Categories




What are graphs good for?
•   Recommendations
•   Business intelligence
•   Social computing
•   Geospatial
•   MDM
•   Systems management
•   Web of things
•   Genealogy
•   Time series data
•   Product catalogue
•   Web analytics
•   Scientific computing (especially bioinformatics)
•   Indexing your slow RDBMS
•   And much more!


Neo4j is a Graph Database

So we need to detour through a little graph theory



Meet Leonhard Euler
    • Swiss mathematician
    • Inventor of Graph
      Theory (1736)




http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg
http://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
Property Graph Model
• nodes / vertices
• relationships / edges
• properties

[Diagram: example property graph with nodes (name: Michal Bachman), (title: Intro to Neo4j, duration: 45), (name: Neo4j) and (name: NOSQL)]
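
The three building blocks of the property graph model can be sketched in a few lines of plain Java. This is an illustration only, not Neo4j's implementation: the class names are made up, and the relationship types DELIVERS and ABOUT are borrowed from the code later in the deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PropertyGraphSketch {

    /** A node: a bag of properties plus the relationships that touch it. */
    static final class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();
        final List<Relationship> incoming = new ArrayList<>();

        Node with(String key, Object value) {
            properties.put(key, value);
            return this;
        }

        /** Creates a directed, named relationship from this node to end. */
        Relationship relateTo(Node end, String type) {
            Relationship relationship = new Relationship(this, end, type);
            outgoing.add(relationship);
            end.incoming.add(relationship);
            return relationship;
        }
    }

    /** A named, directed relationship; it can carry properties too. */
    static final class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        Node speaker = new Node().with("name", "Michal Bachman");
        Node talk = new Node().with("title", "Intro to Neo4j").with("duration", 45);
        Node topic = new Node().with("name", "Neo4j");
        speaker.relateTo(talk, "DELIVERS");
        talk.relateTo(topic, "ABOUT");
        for (Relationship r : speaker.outgoing) {
            System.out.println(r.type + " -> " + r.end.properties.get("title"));
        }
    }
}
```

Because each node holds direct references to its relationships, moving to a neighbour is a pointer hop rather than a join.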
Graphs are very whiteboard-friendly




Neo4j




32 billion nodes
32 billion relationships
64 billion properties
http://opfm.jpl.nasa.gov/




http://news.xinhuanet.com




Community


  Advanced



    Enterprise


How do I use it?




Getting started is easy
• Single package download, includes server stuff
  – http://neo4j.org/download/
• For developer convenience, Ivy (or whatever):
  –   <dependency org="org.neo4j" name="neo4j-community" rev="1.9.M04"/>




Run it!
• Server is easy to start and stop
  – cd <install directory>
  – bin/neo4j start
  – bin/neo4j stop
• Provides a REST API in addition to the other
  APIs we’ve seen
• Provides some ops support
  – JMX, data browser, graph visualisation

Embed it!
• If you want to host the database in your
  process just load the jars

• And point the config at the right place on disk

• Embedded databases can be HA too
  – You don’t have to run as server



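[Diagram: the conference graph used in the examples. Visible node properties: name: Phil Johnson; title: Cognitive Psychology, duration: 30; name: Michal Bachman; name: UX; title: Intro to Neo4j, duration: 45; name: Martin Macke; name: Jeremy White; name: Neo4j; name: NOSQL. Visible relationship label: INTERESTED.]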
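// Imports for the Neo4j 1.9 embedded API used on this slide
import org.neo4j.graphdb.*;
import org.neo4j.kernel.EmbeddedGraphDatabase;

GraphDatabaseService neo = new EmbeddedGraphDatabase("/data/webexpo");

Transaction tx = neo.beginTx();
try {
    Node speaker = neo.createNode();
    speaker.setProperty("name", "Michal Bachman");

    Node talk = neo.createNode();
    talk.setProperty("title", "Intro to Neo4j");

    Relationship delivers = speaker.createRelationshipTo(talk,
            DynamicRelationshipType.withName("DELIVERS"));
    delivers.setProperty("day", "Saturday");

    // Index the speaker by name so it can be looked up later
    neo.index().forNodes("people")
            .add(speaker, "name", "Michal Bachman");

    tx.success(); // mark for commit; without this, finish() rolls back
} finally {
    tx.finish();
}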


[Diagram: (name: Michal Bachman) --[DELIVERS, day: Saturday]--> (title: Intro to Neo4j)]
Core API
• Nodes
  – Properties (optional K-V pairs)
• Relationships
  – Start node (required)
  – End node (required)
  – Properties (optional K-V pairs)




All Conference Topics




    // Walk: conference <-TALKS_AT- speaker -DELIVERS-> talk -ABOUT-> topic
    Node webExpo = neo.getReferenceNode();
    for (Relationship talksAt : webExpo.getRelationships(INCOMING, TALKS_AT)) {
        Node speaker = talksAt.getStartNode();
        for (Relationship delivers : speaker.getRelationships(OUTGOING, DELIVERS)) {
            Node talk = delivers.getEndNode();
            for (Relationship about : talk.getRelationships(OUTGOING, ABOUT)) {
                String topicName = (String) about.getEndNode().getProperty(NAME);
                //add to result...
            }
        }
    }




-------------------
Printing all topics
All topics: development, data, advertising, education, usa, business, microsoft, webdesign, software,
responsiveness, ux, e-commerce, php, psychology, crm, api, chef, javascript, patterns, product design,
marketing, metro, social media, web, startup, analytics, lean, cqrs, node.js, branding, cloud, testing, neo4j,
rest, css, design, publishing, nosql. Took: 2 ms
Which talks should I attend?




    TraversalDescription talksTraversal = Traversal.description()
            .uniqueness(Uniqueness.NONE)
            .breadthFirst()
            .relationships(INTERESTED, OUTGOING)  // attendee -> topic
            .relationships(ABOUT, INCOMING)       // topic <- talk
            .evaluator(Evaluators.atDepth(2));    // keep only nodes two hops away

    Node attendee =
            neo.index().forNodes("people").get("name", "Jeremy White").getSingle();

    Iterable<Node> talks = talksTraversal.traverse(attendee).nodes();

    //iterate over talks and print




------------------------------------------
Suggesting talks for 100 random attendees.
...
Aneta Lebedova: Measure Everything!, To the USA, The real me. Took: 1 ms
Bohumir Kubat: Beyond the polar bear, How (not) to do API, Critical interface design. Took: 1 ms
Vladimir Vales: Application Development for Windows 8 Metro. Took: 1 ms
Suggested talks for 100 random attendees in 449 ms
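
To make the two-hop logic of that traversal concrete without a database, the sketch below does the same walk over plain maps: follow INTERESTED from the attendee to topics, then follow ABOUT backwards from each topic to talks. The class, its methods, and the sample data are assumptions for illustration. One deliberate difference: the result is a set, so a talk reachable through two topics appears once, whereas the traversal above uses Uniqueness.NONE.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy version of the talk suggestion walk: attendee -> topic <- talk. */
final class TalkSuggestions {
    // attendee name -> topics the attendee is INTERESTED in
    private final Map<String, Set<String>> interests = new HashMap<>();
    // topic name -> talks that are ABOUT it (the ABOUT relationship read backwards)
    private final Map<String, Set<String>> talksByTopic = new HashMap<>();

    void interested(String attendee, String topic) {
        interests.computeIfAbsent(attendee, k -> new HashSet<>()).add(topic);
    }

    void about(String talk, String topic) {
        talksByTopic.computeIfAbsent(topic, k -> new HashSet<>()).add(talk);
    }

    /** Depth 1 collects the attendee's topics; depth 2 collects talks about them. */
    Set<String> suggest(String attendee) {
        Set<String> talks = new TreeSet<>();
        for (String topic : interests.getOrDefault(attendee, Collections.emptySet())) {
            talks.addAll(talksByTopic.getOrDefault(topic, Collections.emptySet()));
        }
        return talks;
    }

    public static void main(String[] args) {
        TalkSuggestions graph = new TalkSuggestions();
        // Hypothetical sample data using names from the slides.
        graph.interested("Jeremy White", "Neo4j");
        graph.about("Intro to Neo4j", "Neo4j");
        graph.about("Intro to Neo4j", "NOSQL");
        graph.about("Cognitive Psychology", "UX");
        System.out.println(graph.suggest("Jeremy White"));
    }
}
```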
What do we have in common?




    //retrieve attendeeOne and attendeeTwo from index

    int maxDepth = 2;
    Iterable<Path> paths = GraphAlgoFactory
            .allPaths(Traversal.expanderForAllTypes(), maxDepth)
            .findAllPaths(attendeeOne, attendeeTwo);

    for (Path path : paths) {
        //print it
    }



------------------------------------------------------------
Finding things in common for 100 random couples of attendees
...
Karel Kunc and Phil Smith:

(Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Be a punk consumer!)<--[DISLIKED]--(Phil Smith),
(Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith),
(Karel Kunc)--[LIKED]-->(Shipito.com – business in USA)<--[LIKED]--(Phil Smith).
Took: 0 ms.
...

Found things in common for 100 random couples of attendees in 142 ms.
Youngsters, Y U No Like Java?




Who is my beer mate?

[Diagram: the pattern to find. (myself) and an unknown (beerMate) are both connected to the same unknown (talk).]

start myself=node:people(name = "Emil Votruba")

match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)

return distinct beerMate.name, count(beerMate)

order by count(beerMate) desc

limit 5;




Cypher Query
start myself=node:people(name = "Alex Smart")

match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)

return distinct beerMate.name, count(beerMate)

order by count(beerMate) desc

limit 5;




Cypher Query
start myself=node:people(name = "Emil Votruba")

match (myself)-[:LIKED]->()<-[:LIKED]-(beerMate)

return distinct beerMate.name, count(beerMate)

order by count(beerMate) desc

limit 5;
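
For comparison with the Java shown earlier in the deck, here is roughly what the query computes, written as plain Java over an in-memory map of who LIKED which talk. This is a sketch with made-up class and method names, not how Cypher executes the query. Ties are broken by name here so the output is deterministic; the query itself does not specify an order for ties.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy version of the beer mate ranking: people ordered by number of shared LIKED talks. */
final class BeerMates {
    // person name -> talks that person LIKED
    private final Map<String, Set<String>> likedTalks = new HashMap<>();

    void liked(String person, String talk) {
        likedTalks.computeIfAbsent(person, k -> new HashSet<>()).add(talk);
    }

    /** Returns up to limit (name, sharedCount) entries, highest count first. */
    List<Map.Entry<String, Integer>> beerMates(String myself, int limit) {
        Set<String> mine = likedTalks.getOrDefault(myself, Collections.emptySet());
        Map<String, Integer> shared = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : likedTalks.entrySet()) {
            if (entry.getKey().equals(myself)) {
                continue; // the pattern never matches myself as beerMate
            }
            int count = 0;
            for (String talk : entry.getValue()) {
                if (mine.contains(talk)) {
                    count++;
                }
            }
            if (count > 0) {
                shared.put(entry.getKey(), count);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(shared.entrySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

The Cypher version states the same pattern in one match clause and leaves the iteration to the database.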




Current Research
•   Graph partitioning
•   Graph analytics (“OLAP” and predictive)
•   Performance improvements
•   Query languages
•   MVCC and single-threaded write models
•   ACID (tradeoffs for weakening C and I)
•   Yield and Harvest in distributed systems
•   Application-level
    – Recommendations
    – Protein interactions
    –…

Questions?
Neo4j: http://neo4j.org
Neo Technology: http://neotechnology.com
Twitter: @bachmanm
Code: git://github.com/bachmanm/neo4j-imperial.git

 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 

Kürzlich hochgeladen (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 

Neo4j Introduction at Imperial College London

  • 1. An Introduction to Neo4j Michal Bachman @bachmanm
  • 2. Roadmap
      • Intro to NOSQL
      • Intro to Graph Databases
      • Intro to Neo4j
      • A bit of hacking
      • Current research
      • Q&A
  • 3. Not Only SQL
  • 4. Why NOSQL now? Driving trends
  • 5. Trend 1: Data Size
  • 6. Trend 2: Connectedness (chart: information connectivity rising from Text Documents through Hypertext, Feeds, Blogs, UGC, Wikis, Tagging, Folksonomies, RDFa and Ontologies to GGG)
  • 7. Trend 3: Semi-structured Data
  • 8. Trend 4: Application Architecture (80’s) (diagram: one Application on one DB)
  • 9. Trend 4: Application Architecture (90’s) (diagram: three Apps sharing one DB)
  • 10. (diagram: three Applications, each with its own DB)
  • 11. Side note: RDBMS performance. Salary List
  • 13. Key-Value Stores
      • “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
      • Data model:
          – Global key-value mapping
          – Big scalable HashMap
          – Highly fault tolerant (typically)
      • Examples:
          – Riak, Redis, Voldemort
  • 14. Pros and Cons
      • Strengths:
          – Simple data model
          – Great at scaling out horizontally
              • Scalable
              • Available
      • Weaknesses:
          – Simplistic data model
          – Poor for complex data
  • 15. Column Family (BigTable)
      • Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006)
      • Data model:
          – A big table, with column families
          – Map-reduce for querying/processing
      • Examples:
          – HBase, HyperTable, Cassandra
  • 16. Pros and Cons
      • Strengths:
          – Data model supports semi-structured data
          – Naturally indexed (columns)
          – Good at scaling out horizontally
      • Weaknesses:
          – Unsuited for interconnected data
  • 17. Document Databases
      • Data model:
          – Collections of documents
          – A document is a key-value collection
          – Index-centric, lots of map-reduce
      • Examples:
          – CouchDB, MongoDB
  • 18. Pros and Cons
      • Strengths:
          – Simple, powerful data model (just like SVN!)
          – Good scaling (especially if sharding supported)
      • Weaknesses:
          – Unsuited for interconnected data
          – Query model limited to keys (and indexes)
              • Map reduce for larger queries
  • 19. Graph Databases
      • Data model:
          – Nodes with properties
          – Named relationships with properties
          – Hypergraph, sometimes
      • Examples:
          – Neo4j (of course), Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
  • 20. Pros and Cons
      • Strengths:
          – Powerful data model
          – Fast
              • For connected data, can be many orders of magnitude faster than RDBMS
      • Weaknesses:
          – Sharding
              • Though they can scale reasonably well
              • And for some domains you can shard too!
  • 21. Social Network “path exists” Performance
      • Experiment:
          • ~1k persons
          • Average 50 friends per person
          • pathExists(a,b) limited to depth 4
          • Caches warm to eliminate disk IO

                                 # persons    query time
          Relational database         1000       2000ms
          Neo4j                       1000          2ms
          Neo4j                    1000000          2ms
  • 23. What are graphs good for?
      • Recommendations
      • Business intelligence
      • Social computing
      • Geospatial
      • MDM
      • Systems management
      • Web of things
      • Genealogy
      • Time series data
      • Product catalogue
      • Web analytics
      • Scientific computing (especially bioinformatics)
      • Indexing your slow RDBMS
      • And much more!
  • 24. Neo4j is a Graph Database. So we need to detour through a little graph theory
  • 26. Meet Leonhard Euler
      • Swiss mathematician
      • Inventor of Graph Theory (1736)
      (image: http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg)
  • 28. Property Graph Model
      • nodes / vertices
      • relationships / edges
      • properties
      (example graph: nodes with name: Michal Bachman; title: Intro to Neo4j, duration: 45; name: Neo4j; name: NOSQL)
  • 29. Graphs are very whiteboard-friendly
  • 31. Neo4j
  • 32. 32 billion nodes, 32 billion relationships, 64 billion properties
  • 38. Community, Advanced, Enterprise
  • 39. How do I use it?
  • 40. Getting started is easy
      • Single package download, includes server stuff
          – http://neo4j.org/download/
      • For developer convenience, Ivy (or whatever):
          – <dependency org="org.neo4j" name="neo4j-community" rev="1.9.M04"/>
  • 41. Run it!
      • Server is easy to start and stop
          – cd <install directory>
          – bin/neo4j start
          – bin/neo4j stop
      • Provides a REST API in addition to the other APIs we’ve seen
      • Provides some ops support
          – JMX, data browser, graph visualisation
  • 42. Embed it!
      • If you want to host the database in your process just load the jars
      • And point the config at the right place on disk
      • Embedded databases can be HA too
          – You don’t have to run as server
  • 43. (example graph diagram: nodes with name: Phil Johnson; title: Cognitive Psychology, duration: 30; name: Michal Bachman; name: UX; title: Intro to Neo4j, duration: 45; name: Martin Macke; name: Jeremy White; name: Neo4j; name: NOSQL; one relationship is labelled INTERESTED)
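  • 44.
        import org.neo4j.graphdb.DynamicRelationshipType;
        import org.neo4j.graphdb.GraphDatabaseService;
        import org.neo4j.graphdb.Node;
        import org.neo4j.graphdb.Relationship;
        import org.neo4j.graphdb.Transaction;
        import org.neo4j.kernel.EmbeddedGraphDatabase;

        // Open (or create) an embedded database stored under /data/webexpo
        GraphDatabaseService neo = new EmbeddedGraphDatabase("/data/webexpo");
        Transaction tx = neo.beginTx();
        try {
            Node speaker = neo.createNode();
            speaker.setProperty("name", "Michal Bachman");
            Node talk = neo.createNode();
            talk.setProperty("title", "Intro to Neo4j");
            // Relationship types can be created on the fly from a string
            Relationship delivers = speaker.createRelationshipTo(talk,
                    DynamicRelationshipType.withName("DELIVERS"));
            delivers.setProperty("day", "Saturday");
            // Index the speaker by name so it can be looked up later
            neo.index().forNodes("people").add(speaker, "name", "Michal Bachman");
            tx.success(); // without this, finish() rolls the transaction back
        } finally {
            tx.finish();
        }

        Resulting graph: (name: Michal Bachman) --[DELIVERS, day: Saturday]--> (title: Intro to Neo4j)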
  • 45.
  • 47. Core API
      • Nodes
          – Properties (optional K-V pairs)
      • Relationships
          – Start node (required)
          – End node (required)
          – Properties (optional K-V pairs)
  • 49. (the example graph diagram from slide 43 again)
  • 50. All Conference Topics
        // INCOMING and OUTGOING come from org.neo4j.graphdb.Direction;
        // TALKS_AT, DELIVERS, ABOUT and NAME are presumably constants defined by the application.
        Node webExpo = neo.getReferenceNode();
        for (Relationship talksAt : webExpo.getRelationships(INCOMING, TALKS_AT)) {
            Node speaker = talksAt.getStartNode();
            for (Relationship delivers : speaker.getRelationships(OUTGOING, DELIVERS)) {
                Node talk = delivers.getEndNode();
                for (Relationship about : talk.getRelationships(OUTGOING, ABOUT)) {
                    String topicName = (String) about.getEndNode().getProperty(NAME);
                    //add to result...
                }
            }
        }

        -------------------
        Printing all topics
        All topics: development, data, advertising, education, usa, business, microsoft, webdesign, software, responsiveness, ux, e-commerce, php, psychology, crm, api, chef, javascript, patterns, product design, marketing, metro, social media, web, startup, analytics, lean, cqrs, node.js, branding, cloud, testing, neo4j, rest, css, design, publishing, nosql.
        Took: 2 ms
  • 51. Which talks should I attend?
  • 52. (the example graph diagram from slide 43 again)
  • 53. Which talks should I attend?
        // Follow INTERESTED outwards to topics, then ABOUT backwards to talks,
        // and keep only the nodes found exactly two hops away.
        TraversalDescription talksTraversal = Traversal.description()
                .uniqueness(Uniqueness.NONE)
                .breadthFirst()
                .relationships(INTERESTED, OUTGOING)
                .relationships(ABOUT, INCOMING)
                .evaluator(Evaluators.atDepth(2));
        Node attendee = neo.index().forNodes("people").get("name", "Jeremy White").getSingle();
        Iterable<Node> talks = talksTraversal.traverse(attendee).nodes();
        //iterate over talks and print

        ------------------------------------------
        Suggesting talks for 100 random attendees.
        ...
        Aneta Lebedova: Measure Everything!, To the USA, The real me. Took: 1 ms
        Bohumir Kubat: Beyond the polar bear, How (not) to do API, Critical interface design. Took: 1 ms
        Vladimir Vales: Application Development for Windows 8 Metro. Took: 1 ms
        Suggested talks for 100 random attendees in 449 ms
  • 54. What do we have in common?
  • 55. (the example graph diagram from slide 43 again)
  • 56. What do we have in common?
        //retrieve attendeeOne and attendeeTwo from index
        int maxDepth = 2;
        // All paths of at most maxDepth hops, over relationships of any type
        Iterable<Path> paths = GraphAlgoFactory
                .allPaths(Traversal.expanderForAllTypes(), maxDepth)
                .findAllPaths(attendeeOne, attendeeTwo);
        for (Path path : paths) {
            //print it
        }

        ------------------------------------------------------------
        Finding things in common for 100 random couples of attendees
        ...
        Karel Kunc and Phil Smith:
        (Karel Kunc)--[INTERESTED]-->(ux)<--[INTERESTED]--(Phil Smith),
        (Karel Kunc)--[DISLIKED]-->(Be a punk consumer!)<--[DISLIKED]--(Phil Smith),
        (Karel Kunc)--[DISLIKED]-->(Beyond the polar bear)<--[LIKED]--(Phil Smith),
        (Karel Kunc)--[LIKED]-->(Shipito.com – business in USA)<--[LIKED]--(Phil Smith).
        Took: 0 ms.
        ...
        Found things in common for 100 random couples of attendees in 142 ms.
  • 57. Youngsters, Y U No Like Java?
  • 58. Who is my beer mate? (diagram: nodes myself, beerMate:? and talk:?)
  • 59. Who is my beer mate? (diagram: the same nodes written as (myself), (beerMate), (talk))
  • 60. Who is my beer mate?
        start myself=node:people(name = "Emil Votruba")
        match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
        return distinct beerMate.name, count(beerMate)
        order by count(beerMate) desc
        limit 5;
  • 61. Cypher Query
        start myself=node:people(name = "Alex Smart")
        match (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate)
        return distinct beerMate.name, count(beerMate)
        order by count(beerMate) desc
        limit 5;
  • 62. Cypher Query
        start myself=node:people(name = "Emil Votruba")
        match (myself)-[:LIKED]->()<-[:LIKED]-(beerMate)
        return distinct beerMate.name, count(beerMate)
        order by count(beerMate) desc
        limit 5;
  • 63. Who is my beer mate?
  • 64. Current Research
      • Graph partitioning
      • Graph analytics (“OLAP” and predictive)
      • Performance improvements
      • Query languages
      • MVCC and single-threaded write models
      • ACID (tradeoffs for weakening C and I)
      • Yield and Harvest in distributed systems
      • Application-level
          – Recommendations
          – Protein interactions
          – …
  • 65. Questions?
      Neo4j: http://neo4j.org
      Neo Technology: http://neotechnology.com
      Twitter: @bachmanm
      Code: git://github.com/bachmanm/neo4j-imperial.git
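
The "beer mate" Cypher query on slide 60 can be read as: find everyone who LIKED a talk I also LIKED, count the shared talks per person, and keep the top few. The sketch below mimics that logic in plain Java over an in-memory map. It is only an illustration of what the query computes, not the Neo4j API; the class name `BeerMateSketch`, its methods and the sample names are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy in-memory stand-in for LIKED relationships (hypothetical, not Neo4j). */
public class BeerMateSketch {

    // person name -> titles of the talks that person LIKED
    private final Map<String, Set<String>> liked = new HashMap<>();

    public void like(String person, String talk) {
        liked.computeIfAbsent(person, k -> new HashSet<>()).add(talk);
    }

    /** Mirrors (myself)-[:LIKED]->(talk)<-[:LIKED]-(beerMate), ranked by shared talks. */
    public List<Map.Entry<String, Integer>> beerMates(String myself, int limit) {
        Set<String> myTalks = liked.getOrDefault(myself, new HashSet<>());
        Map<String, Integer> shared = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : liked.entrySet()) {
            if (other.getKey().equals(myself)) {
                continue; // never suggest myself as a beer mate
            }
            int count = 0;
            for (String talk : other.getValue()) {
                if (myTalks.contains(talk)) {
                    count++;
                }
            }
            if (count > 0) {
                shared.put(other.getKey(), count);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(shared.entrySet());
        // order by count desc; break ties by name so the result is deterministic
        ranked.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        BeerMateSketch graph = new BeerMateSketch();
        graph.like("Emil", "Talk A");
        graph.like("Emil", "Talk B");
        graph.like("Emil", "Talk C");
        graph.like("Bob", "Talk A");
        graph.like("Bob", "Talk B");
        graph.like("Carl", "Talk C");
        graph.like("Dana", "Talk D");
        for (Map.Entry<String, Integer> mate : graph.beerMates("Emil", 5)) {
            System.out.println(mate.getKey() + ": " + mate.getValue());
        }
    }
}
```

In a graph database the same question is answered by following relationships from the start node, rather than by scanning every person as this toy version does.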

Editor's notes

  1. Welcome. Introduce myself, NeoTech. Motivations: presented this at a conference; conversations with friends; talked to Serena, no affiliation; BigData and NOSQL are popular terms; graphs are getting more and more popular (Facebook); not much attention at Imperial. Ask about the audience: heard about graph databases? Graphs? Databases? Outcomes: learn about a new technology; see an application of graph theory in practice; tailored to students (not industry). Agenda: Intro to NOSQL; Intro to Graph Databases; Intro to Neo4j; practical part (how to work with one); real experiences; current research; Q&A.
  2. Why now? Nobody woke up one day thinking relational DBs are not cool any more. Trends.
  3. Generate, process, store and work with
  4. UGC = User Generated Content. GGG = Giant Global Graph (what the web will become): every little piece, every unit of interesting data is semantically connected to every other interesting unit of data (Tim Berners-Lee). Data is becoming more connected (linearly). RDFa (Resource Description Framework in attributes) is a technology for carrying structured information inside web pages. RDFa is one of the serialisations of the Resource Description Framework (RDF) data format. In computer science, an ontology is an explicit, formalised description of a particular domain. It is a formal, declarative representation containing a glossary (definitions of terms) and a thesaurus (definitions of the relationships between the terms). An ontology is a dictionary used to store and pass on knowledge about a particular domain.
  5. Data losing predictable structure. Individualisation of data: can’t box each individual, want data about me. Shape of data: less predictable structure. Decentralisation of data creation accelerates this trend.
  6. Apps can choose what makes sense to store the data
  7. This is strictly about connected data: joins kill performance there. No bashing of RDBMS performance for tabular transaction processing.
  8. The beauty of the NOSQL world: nobody dictates your choice, so pick the database that matches the type or characteristics of the data you work with. Key-value databases: one key, one value; hash maps; Redis, Riak (Amazon Dynamo). Mostly highly fault tolerant, simple data model, excellent horizontal scalability, availability. BigTable databases: a k-vvvvvvv store with implicit indexes; Cassandra (Google). Support for semi-structured data, automatic index (columns), good horizontal scalability, again unsuitable for connected data. Document databases: well-known examples are Subversion, MongoDB, CouchDB, … Collections of documents; a document is a collection of key-value pairs; the index is important; lots of map-reduce. Scalability is fairly good (not as good as key-value, because of the more complex data model). Simple and powerful data model, like Subversion. The drawback of all three is that they are not really suited to densely connected data. An overly simple data model (a HashMap: fast, but…) means that any immediate deeper insight into the stored data has to be the responsibility of the application layer (that is, we have to program it somehow). So these databases are very often paired with frameworks such as Map-Reduce, for which we have to write jobs that let us gain that insight. Map-reduce is a batch operation (I would contrast that with an on-line / in-the-click-stream synchronous operation) for getting a view of your connected data. All three work with aggregated data, i.e. they require structure up front: data that logically belongs together (such as an order and its line items) is stored together in the database and is also accessed as a whole in queries. In key-value stores that whole is the value, in column-family stores the column family, and in document DBs the document. That is OK where data access needs exactly this structure. But if we want to look at the data differently, for example to analyse total sales of individual products from the orders, we have to fight the structure a little, and that is why map-reduce is talked about so much in connection with these databases. The advantage of storing data in non-aggregated form is that it can be analysed and presented from different angles depending on the particular case. And of course graph databases, which are why we are here today and which we will learn a bit more about in a minute.
  9. History: Amazon decided that they always wanted the shopping basket to be available, but couldn’t take a chance on RDBMS, so they built their own. Big risk, but simple data model and well-known computing science underpinning it (e.g. consistent hashing, Bloom filters for sensible replication). + Massive read/write scale. - Simplistic data model moves heavy lifting into the app tier (e.g. map reduce).
  10. MongoDB has a reputation for taking liberties with durability to get speed. CouchDB has good multimaster replication from Lotus Notes.
  11. People talk about Codd’s relational model being mature because it was proposed in 1969 – 42 years old.Euler’s graph theory was proposed in 1736 – 275 years old.
  12. Can’t easily shard graphs like documents or KV stores. This means that high performance graph databases are limited in terms of data set size that can be handled by a single machine. Can use replicas to speed things up (and improve availability), but data set size is still limited to a single machine’s disk/memory. Some domains can shard easily (e.g. geo, most web apps) using consistent routing approach and cache sharding; we’ll cover that later.
  13. Graph theory studies the properties of structures called graphs. They consist of vertices connected to one another by edges. A graph is usually drawn as a set of points joined by lines. Formally, a graph is an ordered pair of a set of vertices V and a set of edges E.
  14. The Seven Bridges of Königsberg (today Kaliningrad). Who works for a large company, by which I mean several layers of management and a software architect on a different floor from the developers? This is for you: in such companies it tends to be hard to push through “new” technologies. But the relational model that E.F. Codd came up with in 1969 is only 43 years old. The graph model is 276 years old. So next time your boss or a clever architect responds to adopting NOSQL with something like “we only use mature, proven technologies here”, you know which way to send him… by which I mean, say, this talk on the web or the relevant Wikipedia pages. So how do we store data in a graph…
  15. So how do we store data in a graph… In a graph we store data as vertices, and vertices are really documents that can have arbitrary keys with values assigned to them, just like a document in MongoDB. Where a graph differs from MongoDB is that a graph has relationships between vertices. And that is the trade-off: MongoDB scales better because it does not do this. Neo4j is better for connected data because it does: it stores the relationships between individual vertices, but it does not scale as well. And that is what we have to take into account when solving your problems: do you want massive scalability, or immediate insight into how your data is connected? DESCRIBE THE GRAPH. Relationships have semantic meaning! Speakers, talks in an RDBMS. It is a fairly intuitive way of storing data! The job of a graph database is to take this intuitive data, which we can simply sketch on a whiteboard or a piece of paper, and traverse it quickly in your programs.
  16. And that is one nice property of graphs: they are ideal for whiteboards, backs of envelopes, beer mats and cigarette packets… the things on which the best designs (especially in startups) usually come about. I picked WebExpo as the example; originally I wanted to map the corruption scandals of Czech politicians, but this is a bit more harmless. The relationships between speakers, talks, topics, attendees and so on can be drawn on a beer mat! WebExpo is a domain with lots of relationships: speakers give talks, … You can simply draw that on a whiteboard, which is incidentally what we do as programmers when we sit down with people who need a piece of software and we try to understand the business problem, the domain. We sit down at a whiteboard and draw customers, orders, invoices, products and so on, and the relationships between them! And what do we do next? We take our nice design and normalise it. We sweat over working out how to cram it all into tables. And we are happy and smiling until we put it live, into production… and it runs like a tortoise… What do we do? We denormalise our model! All the energy we invested, the blood, sweat and tears, all for nothing. With a graph database, what is on paper is exactly what you throw into the database.
  17. That does not mean you are excused from the design phase. You still have to think hard about which entities (or objects) make up your domain and what the relationships between them are! You still need design. You cannot simply take the data from the tables you have and force it into your brand-new graph database. You have to start thinking in nodes and relationships. When designing the data model for WebExpo we have to make a lot of design decisions: how do we distinguish speakers from attendees? Is that even necessary? Should Friday and Saturday be nodes, or just a property on the individual talks? You still have to do design, but the point is that designing a data model for a graph database can be a pleasant and natural experience.
  18. It looks after nodes, the relationships between them, and indexes for you. Neo4j is stable and has been running since 2003. Under active development. Primarily for Java, but usable with lots of other technologies. Ideal for clusters on the scale of tens of servers, not hundreds. For densely connected data; it is not a KV store.
  19. 32 billion nodes, 32 billion relationships, 64 billion properties
  20. Fully and militantly ACID. Who doesn’t know what that means? Explain quickly: atomicity, consistency, isolation, durability. Some other NOSQL databases give up some guarantees in favour of performance; in Neo4j this cannot be switched off. Data is always written to disk.
  21. Look up the starting point in the index (Lucene). Explore the neighbourhood.
  22. Look up the starting point in the index (Lucene). Explore the neighbourhood.
  23. Neo has a whole built-in library of graph algorithms, such as shortest path, all paths, etc.
  24. 1M hops per second on an ordinary laptop, with no difference when the amount of data is multiplied. High performance graph operations. Traverses 1,000,000+ relationships / second on commodity hardware.
  25. In general, if you use MySQL and do not pay for it, you will not pay for Neo either.
  26. Let’s show usage in embedded mode on a concrete example. I built a graph from WebExpo: the speakers and talks are real, the 1000 attendees have randomly generated names. Describe the graph and the scenario. Who doesn’t read Java? The code will be on GitHub.
  27. Relationship types can be either strings or Enums, which give you the benefit of static typing in the IDE; to Neo4j it makes no difference. We repeat the process until we have the whole graph.
  28. This is a screenshot of the web console, where we can browse the graph visually. It runs on a laptop; I will give you the URL at the end so you can play with it. So, we have a graph, but how do we now get data out of it?
  29. There are several ways to write queries in Neo4j; they differ in readability, complexity, performance and level of abstraction. I will show you some of them, starting from the bottom, i.e. from the native, fastest API.
  30. The Core API works directly with the units we stored in the database: vertices, edges and their properties.
  31. Let’s look at the whole graph once more. A new graph always has one node with ID 0; we turned that one into WebExpo.
  32. This is an imperative API: the programmer does all the work, and it is the fastest.
  33. Let’s go one level up in abstraction to the so-called traversal API, which lets us write queries declaratively, that is, describe how we want to traverse the graph. Neo4j does the actual traversing for us.
  34. We can write our own evaluators.
  35. Another neat feature is the library of algorithms for finding paths between two nodes.
  36. Also shortest path, Dijkstra and others.
  37. Hard for non-programmers; let’s look at something simpler.
  38. At the highest level of abstraction, Neo4j provides its own query language, partly inspired by SQL. The language is called Cypher and understands human-readable statements such as the one you see here now.
  39. We have to start somewhere: we use the index named people, where we find Mr Emil Votruba by name. Next we have to specify what data we actually want to get, in this case the person’s name and a score of how many things we have in common. Finally, we probably do not want to go for a beer with absolutely everyone, but only with, say, the 5 people we have the most in common with. You can probably see the SQL influence. (Meeting notes, 09/09/2012 20:18: animation.)
  40. Same notes as 39.
  41. Same notes as 39.
  42. And the result for Mr Votruba.
  43. “Tales from the Trenches” for further tips
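
The "path exists" experiment on slide 21 asks whether two people are connected within at most four hops. A minimal sketch of such a depth-limited breadth-first search in plain Java follows. It assumes undirected friendships held in an in-memory map, and the names `PathExistsSketch`, `befriend` and `pathExists` are invented for this illustration; it does not show how Neo4j implements traversals.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Depth-limited breadth-first search over a toy friendship graph (hypothetical). */
public class PathExistsSketch {

    // person -> set of direct friends (friendship is treated as undirected)
    private final Map<String, Set<String>> friends = new HashMap<>();

    public void befriend(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Returns true if 'to' can be reached from 'from' in at most maxDepth hops. */
    public boolean pathExists(String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        List<String> frontier = new ArrayList<>();
        frontier.add(from);
        // Expand one hop per iteration, stopping once the depth limit is reached
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String person : frontier) {
                Set<String> direct = friends.getOrDefault(person, Collections.<String>emptySet());
                for (String friend : direct) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    if (visited.add(friend)) { // add() is false if already visited
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        PathExistsSketch graph = new PathExistsSketch();
        // A simple chain: a - b - c - d - e - f
        graph.befriend("a", "b");
        graph.befriend("b", "c");
        graph.befriend("c", "d");
        graph.befriend("d", "e");
        graph.befriend("e", "f");
        System.out.println("a to e within 4 hops: " + graph.pathExists("a", "e", 4));
        System.out.println("a to f within 4 hops: " + graph.pathExists("a", "f", 4));
    }
}
```

The work done depends on how many people are reachable within the depth limit rather than on the total number of people stored, which is in line with the slide 21 timings staying flat as the data set grows.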