Social graphs and discovery. At scale.
Jason Lucas, Scalability Architect, Tagged
Stig is...
•   A very large-scale non-SQL database...
    •   But it speaks and can emulate SQL
•   A graph-oriented data store...
    •   But it can look like a key-value store, relational tables, or a
        file system

•   A foundation for building general web applications...
    •   But it particularly excels at social apps
•   A distributed system with a shared-nothing architecture...
    •   But it gives developers an easy-to-manage path to data
•   A solution to the complex problem of CAP-limited systems...
Part 1: Stig Project Goals
Part 2: Stig Concepts
Part 3: Lunch Workshop
Part 1: Stig Project Goals
 •   Facilitate the developer.
     • Be like a good waiter
     • Easy should be easy, complex should be possible
     • Untangle some existing messes
 •   Scale like crazy. (Without driving ops crazy.)
     • Go big
     • Go fast
     • Go smooth
 •   Exceed expectations.
     • Enable previously unthinkable features
Facilitate the developer
•   Decrease the burden.
    • Provide a single path to data
    • Create a uniform representation available to multiple
      application languages
    • Reduce the need for “defensive programming”
•   Enforce consistency.
    • Re-introduce atomic transactions
    • Control assumptions with assertions
•   Promote correctness.
    • Provide a more robust data representation
    • Support unit testing
Facilitate the developer
•   Offer power in simplicity.
    • Offer a robust expression language
    • Describe effects rather than details of distribution
•   Above all else:
    • “I want to feel like I'm doing a good job.”
Scale like crazy...
•   Use a distributed architecture.
    • Shard data over multiple machines
    • Use commodity hardware
    • Scale as linearly as possible
    • Use replicas to speed average access
•   Move queries to data.
    • Decompose queries by separating areas of concern
    • Farm sub-queries to shards which hold the relevant data
    • Use comprehensions instead of realizations when
      possible
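
As a rough illustration of "move queries to data", the sketch below farms one sub-query out to each shard and chains the lazy results, so the caller only realizes as many answers as it asks for. The shard layout, the `sub_query` and `query` helpers, and the sample data are all assumptions made for this example; nothing here is Stig's actual API.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical in-memory "shards": each maps a user id to an age.
SHARDS: List[Dict[str, int]] = [
    {"alice": 34, "bob": 17},
    {"carol": 52, "dave": 15},
]

def sub_query(shard: Dict[str, int], keep: Callable[[int], bool]) -> Iterator[str]:
    """Runs next to the data and yields matching ids lazily."""
    return (user for user, age in shard.items() if keep(age))

def query(shards: Iterable[Dict[str, int]], keep: Callable[[int], bool]) -> Iterator[str]:
    """Farms the same sub-query to every shard and chains the lazy results."""
    for shard in shards:
        yield from sub_query(shard, keep)

adults = query(SHARDS, lambda age: age >= 18)  # a comprehension: nothing evaluated yet
first_adult = next(adults)                     # realizes a single answer
remaining_adults = list(adults)                # realizes the rest on demand
```

Because the result is a generator rather than a list, a caller that stops after the first answer never touches the second shard.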
Scale like crazy...
•   Build for the web.
    •  Provide durable sessions
    •  Allow clients to disconnect and reconnect at will
    •  Continue running in the background
•   Increase concurrency.
    •  Break large objects down into smaller ones
    •   Escrow deltas around fields which are partitioned or
        contentious
    •   Use assertions instead of locks to permit interleaving of
        operations
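
One way to picture "assertions instead of locks" is an update that carries its own preconditions and is applied only if they hold at the moment of application. The `Update` and `Store` classes below are hypothetical, single-process stand-ins meant only to show the shape of the idea, not how Stig implements it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]

@dataclass
class Update:
    """A hypothetical update: assertions that must hold, plus deltas to apply."""
    assertions: List[Callable[[State], bool]]
    deltas: Dict[str, int]

@dataclass
class Store:
    state: State = field(default_factory=dict)

    def apply(self, update: Update) -> bool:
        """Applies the deltas only if every assertion holds right now."""
        if not all(check(self.state) for check in update.assertions):
            return False  # nothing was locked, so the caller can simply retry later
        for key, delta in update.deltas.items():
            self.state[key] = self.state.get(key, 0) + delta
        return True

store = Store({"gold": 10})
spend_7 = Update([lambda s: s["gold"] >= 7], {"gold": -7})
first = store.apply(spend_7)   # assertion holds, gold drops to 3
second = store.apply(spend_7)  # assertion fails, state is left unchanged
```

Expressing the change as a delta rather than an absolute value is what lets independent updates to the same field interleave without overwriting each other.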
Without driving Ops crazy
•   Be highly available.
    • Replicate storage across multiple machines
    • Shift responsibilities between machines transparently
    • Bring machines back into service transparently
•   Tolerate partitioning.
    • Fall back transparently to lower levels of service
    • Reconcile database automatically when partitions rejoin
Without driving Ops crazy
•   Simplify maintenance.
    •  Tolerate unreliable hardware
    •  Make software upgrades easy to manage
    •  Be flexible with regard to physical topology
    •  Make system status, performance, and capacity easy to
       measure and comprehend
    •   Degrade gracefully under load
    •   To the greatest degree possible, make the system
        maintain itself
Exceed expectations
•   Enable previously unthinkable features.

    •   Don’t include histories in your schemas; the database
        keeps histories
    •   Design apps with real-time, multi-user
        communications; database sessions are “chatty”
    •   Feel free to compute Erdős Numbers or routes to
        Kevin Bacon
    •   Test for the existence of interesting data states in
        constant time, not log time
    •   Execute queries in time proportionate to the size of
        the answer, not the size of the database
Exceed expectations
•   Decrease development cycle time.

    •   Build working apps on your desktop; the database can
        be simulated
    •   Evolve your schema at will; the database doesn’t make
        a distinction between data and metadata
    •   Use any language you like; the database looks the same
        from all clients
Part 2: Stig Concepts
•   Representing Graphs
•   Deconstructing Commits
•   Making Time Flow
•   Finding Meaning
•   Querying
Representing Graphs...
             Without Stig
•   Graphs in Tables.
    • Walks spread outward in waves
    • Self-joins proliferate
•   Graphs in Key-Value Stores.
    • Generally node-centric
    • Edges are denormalized conjugate sets
    • Non-transactional multi-set is deadly
•   Graphs in XML Stores.
    • Floating chunk syndrome
    • Worst of both worlds
•   Graphs in Doc & Graph Stores.
    • Typeless, interned at nodes
Representing Graphs...
                With Stig
              Locations, Nodes & Edges

  [Diagram: two locations, /user/alice@foo.bar and /user/bob@baz.gak, each
  holding nodes labeled person, pets player, and mafia player; an edge
  labeled owns runs between the two pets player nodes.]
Deconstructing Commits...
             Without Stig
•   Two States.
    • Uncommitted: only me
    • Committed: everybody else
    • One sandbox per connection
•   Variable Isolation.
    • High isolation limits concurrency
    • Low isolation is hard to cope with
•   Two Guarantees.
    • Written to disk
    • Ephemeral
•   Some NoSQL Options.
    • No transactional integrity
    • Post-hoc reconciliation
Deconstructing Commits...
                With Stig
•   Private.
    •  Only me, but I get as many as I want; maybe ephemeral
•   Shared.
    •  Restricted scope, rapid communication; maybe
       ephemeral
•   Global.
    •  A singleton, same as commit
•   Guarantees
    •  Self-consistent
    •  Replicated in data center
    •  Written to disks
    •  Replicated to other data centers
Deconstructing Commits...With Stig
                  Points of View in Diplomacy

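  [Diagram: three private points of view (Alice, Bob, Carol) at the top;
  below them the shared Alice/Bob Alliance point of view, then the shared
  Diplomacy Game point of view, and the global point of view at the bottom.]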
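
The layering of private, shared, and global points of view can be loosely modeled with Python's `collections.ChainMap`, where each layer stores only its own changes and reads fall through to the layers beneath it. The variable names and the game data are invented for this sketch, and the exact nesting of points of view is an assumption drawn from the Diplomacy example.

```python
from collections import ChainMap

# Hypothetical points of view; each layer only stores what it changed.
global_pov = {"turn": 1}
game_pov = ChainMap({}, global_pov)     # shared: Diplomacy Game
alliance_pov = game_pov.new_child()     # shared: Alice/Bob Alliance
alice_pov = alliance_pov.new_child()    # private: Alice
carol_pov = game_pov.new_child()        # private: Carol

alliance_pov["plan"] = "attack Carol"   # visible inside the alliance only
alice_pov["draft_order"] = "move fleet" # visible to Alice only
```

Alice sees her draft, the alliance plan, and the global turn counter; Carol sees only the turn counter, because writes land in the topmost layer of whichever point of view made them.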
Making Time Flow...
             Without Stig
•   Time Flows Naturally.
    • System clock is OK
•   Execution Time ≈ Query Time.
    • A query made after an update will see the results of the
      update because time flow is linear
    • The order of events is definite
•   Locks Enforce Consistency.
    • Updates block each other
•   MVCC in Lieu of Locks.
    • Reads are writes
    • Collisions are rollbacks
Making Time Flow...
                With Stig
•   Time is Uncertain.
    •   Distributed machines cannot rely on their system clocks
•   Declared Dependencies.
    •   Each query declares its predecessors, so causality is a graph
    •   The order of events is unknowable, but any topological sort
        of the graph is OK
•   Assertions Enforce Consistency.
    •   MVCC with Paxos facilitates time travel
    •   Query: seek a time in the past at which assertions are true
    •   Update: seek a time in the future at which assertions are still
        true
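
The claim that any topological sort of the causality graph is acceptable can be sketched with the standard-library `graphlib` module (Python 3.9+). The operation names are borrowed from the checkout example, but the specific predecessor edges below are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical declared predecessors for a checkout; the edges are assumptions.
predecessors = {
    "display_cart": set(),
    "update_qty": {"display_cart"},
    "request_gift_wrap": {"display_cart"},
    "enter_credit_card": {"display_cart"},
    "specify_shipping": {"display_cart"},
    "confirm_order": {"update_qty", "request_gift_wrap",
                      "enter_credit_card", "specify_shipping"},
}

# One valid ordering among several; no system clock is consulted.
order = list(TopologicalSorter(predecessors).static_order())

def respects(order, predecessors):
    """True if every operation appears after all of its declared predecessors."""
    position = {op: i for i, op in enumerate(order)}
    return all(position[p] < position[op]
               for op, preds in predecessors.items() for p in preds)
```

The four middle steps may come out in any relative order, which is the point: only the declared dependencies constrain the result.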
Making Time Flow...
                With Stig
                Checkout Time
  [Diagram: a dependency graph of checkout steps: Display Shopping Cart,
  Request Gift Wrap, Update Qty. of Item, Enter Credit Card, Specify
  Shipping, Confirm Order.]
Finding Meaning...
             Without Stig
•   Tables & Views.
    • Tables store the base data
    • Views collect data from tables and other views
    • Views often present performance bottlenecks
•   Analysis Belongs to Data Definition.
    • Adding or changing a view or index is a schema change
    • Programmers must work with DBAs, limiting individual
      initiative
    •   Changes have the potential to degrade the data service
        as a whole
Finding Meaning...
                 With Stig
•   Asserted & Inferred Edges.
    • Asserted edges store the base data
    • Inferred edges collect data from asserted and inferred
      edges
    •   Inference is distributed, on-going, and subject to time-
        travel
•   Analysis Belongs to Program Definition.
    • Inference rules aren’t “special”
    • Programmers can invent as they like
    • Scope of risk is limited
Finding Meaning...
                 With Stig
                 Inferring Friends & Stalkers
    [Diagram: on the left, Alice and Bob each have a ‘has friendship’ edge
    to a node x; on the right, Alice and Bob are joined directly by an
    inferred ‘is friend of’ edge.]

    <a, ‘is friend of’, b>
      if <a, ‘has friendship’, x>
      and <b, ‘has friendship’, x>
      and a is not b;

    <a, ‘is stalking’, b>
      if <a, ‘is friend of’, b>
      and a.age >= 18
      and b.age < 18;
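
The two inference rules can be approximated in plain Python over a set of edge triples. This is only a sketch of the logic: the triple encoding, the `infer_friends` and `infer_stalkers` helpers, and the ages are all made up for the example, and a real system would evaluate such rules incrementally and across shards.

```python
from itertools import permutations
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]

# Asserted edges: each person points at a shared friendship node.
asserted: Set[Edge] = {
    ("alice", "has friendship", "x"),
    ("bob", "has friendship", "x"),
    ("carol", "has friendship", "y"),
}
ages: Dict[str, int] = {"alice": 30, "bob": 16, "carol": 25}  # made-up data

def infer_friends(edges: Set[Edge]) -> Set[Edge]:
    """<a, 'is friend of', b> if both have a friendship with the same x and a is not b."""
    members: Dict[str, Set[str]] = {}
    for a, label, x in edges:
        if label == "has friendship":
            members.setdefault(x, set()).add(a)
    return {(a, "is friend of", b)
            for group in members.values()
            for a, b in permutations(group, 2)}

def infer_stalkers(friends: Set[Edge]) -> Set[Edge]:
    """<a, 'is stalking', b> if a is a friend of b, a is 18 or older, and b is under 18."""
    return {(a, "is stalking", b) for a, _, b in friends
            if ages[a] >= 18 and ages[b] < 18}

friends = infer_friends(asserted)
stalkers = infer_stalkers(friends)
```

Note that the second rule consumes the output of the first, which mirrors inferred edges collecting data from other inferred edges.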
Querying...
                                    Without Stig
•   SQL
    • Easy-to-use, commonly known, and mostly harmless
    • Suffers from poor composability and is useless as a
      general-purpose programming language
•   Map-Reduce, Erlang, etc.
    • Not so easy-to-use, not so commonly known, and
      capable of shooting you in the foot
    •   Often require knowledge of the underlying distributed
        architecture and are still not front-runners as
        general-purpose programming languages
Querying...
                                             With Stig
•   Robust and General-Purpose Language.
    • Purely functional, lazily evaluated, and strictly, robustly
      typed
    • Pattern-oriented notation for describing walks across
      the graph
•   Composability Rules.
    • Comprehensions of sequences form the foundation
    • Transformations of sequences (map, reduce, filter, zip,
      etc.) are the building blocks
•   Distributed Evaluation Rocks.
    • Queries are broken down and sent to the servers where
      they need to be
    •   Evaluation occurs in parallel
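
To give a feel for composing lazy sequence transformations, here is a small Python analogue using generators and the standard library. It is not Stig syntax; it only shows how comprehension, map, zip, and reduce chain together while realizing as little as possible.

```python
from functools import reduce
from itertools import count, islice

def evens():
    """An unbounded comprehension: nothing is computed until it is consumed."""
    return (n for n in count() if n % 2 == 0)

squares = map(lambda n: n * n, evens())            # map over a lazy sequence
first_five = list(islice(squares, 5))              # realizes only five items
paired = list(zip("abc", first_five))              # zip stops at the shorter input
total = reduce(lambda a, b: a + b, first_five, 0)  # fold the realized items
```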
Querying...
                                         With Stig
•   Compiled & Stored.
    • Queries compile down to machine code and get stored
      in the graph itself
    •  Stored programs are subject to on-going analysis
    •  Programs can call each other
•   Library-Driven.
    •  Language fundamentals support construction of libraries
    •  We can emulate other languages, such as LINQ and
       Python
•   Clients.
    •  Currently Java, Perl, PHP, Python, and C/C++
    •  We can also serve HTTP directly
Querying...
                                       With Stig
                     Mutual Friends
•   /* function definition */
    mutual_friends x y = solve f: [
      <x, ‘is friend of’, f>;
      <y, ‘is friend of’, f> ];
•   /* function application */
    mutual_friends person@/users/alice person@/users/bob;
•   /* results */
    [ { f = person@/users/carol },
      { f = person@/users/dave } ];
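
For readers more at home in Python, the same query can be imitated as a set intersection over edge triples. The edge data and the `friends_of` helper are invented for this sketch; only the shape of the result, one binding of `f` per solution, follows the slide.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]

# Made-up edges; the paths mirror the ones used in the slide.
EDGES: Set[Edge] = {
    ("/users/alice", "is friend of", "/users/carol"),
    ("/users/alice", "is friend of", "/users/dave"),
    ("/users/alice", "is friend of", "/users/erin"),
    ("/users/bob", "is friend of", "/users/carol"),
    ("/users/bob", "is friend of", "/users/dave"),
}

def friends_of(x: str) -> Set[str]:
    """Everyone f such that <x, 'is friend of', f> is in the edge set."""
    return {f for a, label, f in EDGES if a == x and label == "is friend of"}

def mutual_friends(x: str, y: str) -> List[Dict[str, str]]:
    """Returns one binding of f per solution, sorted for a stable order."""
    return [{"f": f} for f in sorted(friends_of(x) & friends_of(y))]

result = mutual_friends("/users/alice", "/users/bob")
```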
Wrapup
Is your project...?
•   Graph-shaped?
    •   Representing graphs as graphs (instead of as tables or key
        pairs) simplifies your life

    •   Stig graphs are fat, meaning they're really any number of
        simultaneous, intersecting graphs, so go nuts
•   Transactional?
    •  Reliably atomic state transitions also simplify your life
    •  Asynchronous transaction management makes it more
       tolerable
•   Real-time?
    • Control the influence of updates with shared points-of-
      view
    •   Never be blocked waiting for the database to respond
Is your project...?
•   Really huge?
    • The store scales very close to linearly, so more data just
      means more machines
    •   The size of the cluster generally doesn't affect
        the performance of individual operations
•   Deeply analytic?
    • Use inferences to describe relations and conditions
      you're interested in
    •   Build up arbitrarily complex libraries of inference to
        extract meaning from data
Open-sourcing this year!
•   About our Code.
    • Written in C++0x and Haskell, with Python for tools
    • Entirely unit-test driven and designed for easy adoption
•   Why Open Source?
    • We want to give back
    • We benefit first and most
    • Competitive advantage would be temporary anyway
    • Knowing it’s open keeps us on our toes
    • There’s more to do than we can do ourselves
    • We attract the kind of people we want to work with
Our doors are open
•   About Tagged
    • #3 in social networking and growing (100+ Million
      members)
    •   Located in downtown SF; named a Top Ten Place to Work by the
        San Francisco Business Journal
    •   Profitable since 2008. We answer only to ourselves and
        our users
Our doors are open




•   About the Stig Team.
    • Five full-time engineers with backgrounds in compilers,
      databases, distributed systems, and AI
    •   Interns year-round with opportunities to publish
    •   And yes, we're hiring!
Got ideas?
•   Contact us!
    • Sign up for Stig news at: www.stigdb.org
    •   Follow the Tagged Dev Blog at: blog.tagged.com
    •   Jason Lucas
        Architect of Scalable Infrastructure
        jlucas@tagged.com
Part 3: Lunch Workshop
•   But wait, there’s more!
•   Join us as we get our hands messy with food and take a
    deep dive into the Stig query language and the Stig API!
•   Lunch 'N Learn 01:15 PM - 02:15 PM

 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 

Kürzlich hochgeladen (20)

Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 

Stig: Social Graphs & Discovery at Scale

  • 1. Social graphs and discovery. At scale. Jason Lucas, Scalability Architect, Tagged
  • 2. Stig is... • A very large-scale non-SQL database... • But it speaks and can emulate SQL • A graph-oriented data store... • But can look like a key value store, relational tables, file system • A foundation for building general web applications... • But it particularly excels at social apps • A distributed system with a shared-nothing architecture... • But it gives developers an easy-to-manage path to data • A solution to the complex problem of CAP-limited systems...
  • 3. Part 1: Stig Project Goals Part 2: Stig Concepts Part 3: Lunch Workshop
  • 4. Part 1: Stig Project Goals • Facilitate the developer. • Be like a good waiter • Easy should be easy, complex should be possible • Untangle some existing messes • Scale like crazy. (Without driving ops crazy.) • Go big • Go fast • Go smooth • Exceed expectations. • Enable previously unthinkable features
  • 5. Facilitate the developer • Decrease the burden. • Provide a single path to data • Create a uniform representation available to multiple application languages • Reduce the need for “defensive programming” • Enforce consistency. • Re-introduce atomic transactions • Control assumptions with assertions • Promote correctness. • Provide a more robust data representation • Support unit testing
  • 6. Facilitate the developer • Offer power in simplicity. • Offer a robust expression language • Describe effects rather than details of distribution • Above all else: • “I want to feel like I'm doing a good job.”
  • 7. Scale like crazy... • Use a distributed architecture. • Shard data over multiple machines • Use commodity hardware • Scale as linearly as possible • Use replicas to speed average access • Move queries to data. • Decompose queries by separating areas of concern • Farm sub-queries to shards which hold the relevant data • Use comprehensions instead of realizations when possible
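The "move queries to data" bullets on slide 7 can be illustrated with a small Python sketch. It is not Stig's implementation. The `SHARDS` layout and the `shard_for`, `add_edge`, and `friends_of` helpers are hypothetical names, and a Python generator stands in for a comprehension that is not realized until someone asks for results.

```python
from itertools import islice

# Hypothetical layout: each shard holds the edges whose source node maps to it.
NUM_SHARDS = 3
SHARDS = [[] for _ in range(NUM_SHARDS)]


def shard_for(node):
    """Pick a shard deterministically from the node name (illustrative only)."""
    return sum(node.encode()) % NUM_SHARDS


def add_edge(src, label, dst):
    """Store the edge on the shard that owns its source node."""
    SHARDS[shard_for(src)].append((src, label, dst))


def friends_of(node):
    """Lazily yield friends; only the shard that owns `node` is scanned."""
    shard = SHARDS[shard_for(node)]
    return (dst for src, label, dst in shard
            if src == node and label == "is friend of")


for s, d in [("alice", "bob"), ("alice", "carol"),
             ("alice", "dave"), ("bob", "alice")]:
    add_edge(s, "is friend of", d)

# A comprehension: nothing is computed until a caller asks for results.
query = friends_of("alice")
first_two = list(islice(query, 2))  # the third friend is not produced yet
```

Because the query is a generator, a caller that needs only two answers never pays for the rest, which is the practical difference between a comprehension and a realized list.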
  • 8. Scale like crazy... • Build for the web. • Provide durable sessions • Allow clients to disconnect and reconnect at will • Continue running in the background • Increase concurrency. • Break large objects down into smaller ones • Escrow deltas around fields which are partitioned or contentious • Use assertions instead of locks to permit interleaving of operations
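One way to picture "escrow deltas around contentious fields" and "assertions instead of locks" from slide 8 is the single-process Python sketch below. It is an assumption about the general technique, not a description of how Stig does it. `EscrowedCounter`, `non_negative`, and the stock numbers are made up for illustration.

```python
class EscrowedCounter:
    """A contended numeric field kept as a base value plus pending deltas."""

    def __init__(self, base=0):
        self.base = base
        self.pending = []  # deltas recorded by writers, not yet folded in

    def value(self):
        """Current value: the base plus every pending delta."""
        return self.base + sum(self.pending)

    def add(self, delta, assertion=lambda v: True):
        """Record a delta only if the assertion holds for the resulting value."""
        if not assertion(self.value() + delta):
            return False  # the writer's assumption failed; nothing is recorded
        self.pending.append(delta)
        return True

    def fold(self):
        """Collapse pending deltas into the base (for example, in the background)."""
        self.base, self.pending = self.value(), []


def non_negative(v):
    """The assertion every writer attaches: stock may never go below zero."""
    return v >= 0


stock = EscrowedCounter(base=5)
ok_a = stock.add(-3, non_negative)  # two buyers interleave without a lock
ok_b = stock.add(-2, non_negative)
ok_c = stock.add(-1, non_negative)  # would oversell, so it is refused
stock.fold()
```

Writers never overwrite the field itself. They only append deltas, so their operations can interleave, and the assertion rejects the one update that would break the invariant.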
  • 9. Without driving Ops crazy • Be highly available. • Replicate storage across multiple machines • Shift responsibilities between machines transparently • Bring machines back into service transparently • Tolerate partitioning. • Fall back transparently to lower levels of service • Reconcile database automatically when partitions rejoin
  • 10. Without driving Ops crazy • Simplify maintenance. • Tolerate unreliable hardware • Make software upgrades easy to manage • Be flexible with regard to physical topology • Make system status, performance, and capacity easy to measure and comprehend • Degrade gracefully under load • To the greatest degree possible, make the system maintain itself
  • 11. Exceed expectations • Enable previously unthinkable features. • Don’t include histories in your schemas; the database keeps histories • Design apps with real-time, multi-user communications; database sessions are “chatty” • Feel free to compute Erdős Numbers or routes to Kevin Bacon • Test for the existence of interesting data states in constant time, not log time • Execute queries in time proportionate to the size of the answer, not the size of the database
  • 12. Exceed expectations • Decrease development cycle time. • Build working apps on your desktop; the database can be simulated • Evolve your schema at will; the database doesn’t make a distinction between data and metadata • Use any language you like; the database looks the same from all clients
  • 13. Part 2: Stig Concepts • Representing Graphs • Deconstructing Commits • Making Time Flow • Finding Meaning • Querying
  • 14. Representing Graphs... Without Stig • Graphs in Tables. • Walks spread outward in waves • Self-joins proliferate • Graphs in Key-Value Stores. • Generally node-centric • Edges are denormalized conjugate sets • Non-transactional multi-set is deadly • Graphs in XML Stores. • Floating chunk syndrome • Worst of both worlds • Graphs in Doc & Graph Stores. • Typeless, interned at nodes
  • 15. Representing Graphs... With Stig • Locations, Nodes & Edges. • [Diagram: two locations, /user/alice@foo.bar and /user/bob@baz.gak; labels in the diagram include person, pets, player, owns, and mafia]
  • 16. Deconstructing Commits... Without Stig • Two States. • Uncommitted: only me • Committed: everybody else • One sandbox per connection • Variable Isolation. • High isolation limits concurrency • Low isolation hard to cope with • Two Guarantees. • Written to disk • Ephemeral • Some NoSQL Options. • No transactional integrity • Post-hoc reconciliation
  • 17. Deconstructing Commits... With Stig • Private. • Only me, but I get as many as I want; maybe ephemeral • Shared. • Restricted scope, rapid communication; maybe ephemeral • Global. • A singleton, same as commit • Guarantees • Self-consistent • Replicated in data center • Written to disks • Replicated to other data centers
  • 18. Deconstructing Commits... With Stig • Points of View in Diplomacy. • [Diagram: Alice (Private), Bob (Private), Carol (Private); Alice/Bob Alliance (Shared); Diplomacy Game (Shared); (Global)]
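The private, shared, and global points of view from slides 17 and 18 behave much like layered overlays. The Python sketch below models that layering with `collections.ChainMap`. It is an analogy under stated assumptions, not Stig's API. The key names and values are invented, and the nesting (private inside alliance inside game inside global) is one reading of the Diplomacy diagram.

```python
from collections import ChainMap

# The global point of view: a singleton that everyone ultimately reads through to.
global_pov = {"turn": 1}

# Each child layer records its own writes and reads through to its parents.
game = ChainMap({}, global_pov)   # shared: the Diplomacy game
alliance = game.new_child()       # shared: the Alice/Bob alliance
alice = alliance.new_child()      # private: only Alice
bob = alliance.new_child()        # private: only Bob
carol = game.new_child()          # private: only Carol

alliance["plan"] = "attack Belgium"  # visible to Alice and Bob, not Carol
alice["draft"] = "betray Bob?"       # visible to Alice only
```

Reads fall through from the most private layer to the global one, while writes stay in the layer where they were made, so Carol never sees the alliance's plan and Bob never sees Alice's draft.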
  • 19. Making Time Flow... Without Stig • Time Flows Naturally. • System clock is OK • Execution Time ≈ Query Time. • A query made after an update will see the results of the update because time flow is linear • The order of events is definite • Locks Enforce Consistency. • Updates block each other • MVCC in Lieu of Locks. • Reads are writes • Collisions are rollbacks
  • 20. Making Time Flow... With Stig • Time is Uncertain. • Distributed machines cannot rely on their system clocks • Declared Dependencies. • Each query declares its predecessors, so causality is a graph • The order of events is unknowable, but any topological sort of the graph is OK • Assertions Enforce Consistency. • MVCC with Paxos facilitates time travel • Query: seek a time in the past at which assertions are true • Update: seek a time in the future at which assertions are still true
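The phrase "Query: seek a time in the past at which assertions are true" on slide 20 can be made concrete with a toy multi-version store in Python. This is a deliberately simplified sketch. It keeps a full copy of the state per version, runs in one process, and involves no Paxos. `VersionedStore` and the `qty` field are hypothetical.

```python
class VersionedStore:
    """Keeps every past state so a query can look backwards in time."""

    def __init__(self):
        self.versions = [{}]  # versions[t] is the full state at time t

    def update(self, **changes):
        """Append a new version derived from the latest one; return its time."""
        self.versions.append({**self.versions[-1], **changes})
        return len(self.versions) - 1

    def query(self, assertion):
        """Seek the most recent time at which the assertion holds."""
        for t in range(len(self.versions) - 1, -1, -1):
            if assertion(self.versions[t]):
                return t, self.versions[t]
        return None


store = VersionedStore()
store.update(qty=1)
store.update(qty=3)
store.update(qty=0)

# The latest state fails the assertion, so the query settles on an earlier time.
hit = store.query(lambda state: state.get("qty", 0) >= 2)
miss = store.query(lambda state: state.get("qty", 0) >= 10)
```

The query never blocks a writer. It simply picks a version in which its assumptions are satisfied.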
  • 21. Making Time Flow... With Stig • Checkout Time. • [Diagram of checkout steps: Display Shopping Cart, Update Qty. of Item, Request Gift Wrap, Specify Shipping, Enter Credit Card, Confirm Order]
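Slide 20's claim that "any topological sort of the graph is OK" can be tried out with Python's standard `graphlib` module (Python 3.9 or later). The step names follow the checkout diagram, but the dependency edges below are assumptions made for illustration, since the transcript does not preserve the arrows. `is_valid_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each step maps to the set of steps it must follow.
predecessors = {
    "Display Shopping Cart": set(),
    "Update Qty. of Item": {"Display Shopping Cart"},
    "Request Gift Wrap": {"Display Shopping Cart"},
    "Specify Shipping": {"Display Shopping Cart"},
    "Enter Credit Card": {"Display Shopping Cart"},
    "Confirm Order": {"Update Qty. of Item", "Request Gift Wrap",
                      "Specify Shipping", "Enter Credit Card"},
}


def is_valid_order(order, predecessors):
    """True if every step appears after all of its declared predecessors."""
    position = {step: i for i, step in enumerate(order)}
    return all(position[p] < position[step]
               for step, preds in predecessors.items() for p in preds)


# One acceptable ordering; the four middle steps may come out in any order.
order = list(TopologicalSorter(predecessors).static_order())
```

The middle four steps have no dependencies on one another, so their relative order is left open, while "Confirm Order" always comes last because it declares all of them as predecessors.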
  • 22. Finding Meaning... Without Stig • Tables & Views. • Tables store the base data • Views collect data from tables and other views • Views often present performance bottlenecks • Analysis Belongs to Data Definition. • Adding or changing a view or index is a schema change • Programmers must work with DBAs, limiting individual initiative • Changes have the potential to degrade the data service as a whole
  • 23. Finding Meaning... With Stig • Asserted & Inferred Edges. • Asserted edges store the base data • Inferred edges collect data from asserted and inferred edges • Inference is distributed, on-going, and subject to time-travel • Analysis Belongs to Program Definition. • Inference rules aren’t “special” • Programmers can invent as they like • Scope of risk is limited
  • 24. Finding Meaning... With Stig • Inferring Friends & Stalkers. • <a, ‘is friend of’, b> if <a, ‘has friendship’, x> and <b, ‘has friendship’, x> and a is not b; • <a, ‘is stalking’, b> if <a, ‘is friend of’, b> and a.age >= 18 and b.age < 18; • [Diagram labels: Alice, Bob, x, ‘has friendship’, ‘is friend of’]
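The two inference rules on slide 24 (two people who share a ‘has friendship’ node are friends, and an adult friend of a minor is flagged as stalking) can be emulated over plain triples in Python. This is a batch sketch, not Stig's distributed, ongoing inference. The `infer` function, the friendship node `f1`, and the ages are invented for the example.

```python
def infer(asserted, ages):
    """Derive 'is friend of' and 'is stalking' edges from asserted triples."""
    # Group the people attached to each friendship node x.
    members = {}
    for a, label, x in asserted:
        if label == "has friendship":
            members.setdefault(x, set()).add(a)

    # Rule 1: a and b share a friendship node and a is not b.
    friends = {(a, "is friend of", b)
               for group in members.values()
               for a in group for b in group if a != b}

    # Rule 2: builds on rule 1, so inferred edges feed further inference.
    stalking = {(a, "is stalking", b)
                for a, _, b in friends
                if ages[a] >= 18 and ages[b] < 18}
    return friends | stalking


asserted = {
    ("alice", "has friendship", "f1"),
    ("bob", "has friendship", "f1"),
}
ages = {"alice": 34, "bob": 16}  # made-up ages
inferred = infer(asserted, ages)
```

The second rule consumes the output of the first, which mirrors the slide's point that inferred edges collect data from both asserted and inferred edges.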
  • 25. Querying... Without Stig • SQL • Easy-to-use, commonly known, and mostly harmless • Suffers from poor composability and is useless as a general-purpose programming language • Map-Reduce, Erlang, etc. • Not so easy-to-use, not so commonly known, and capable of shooting you in the foot • Often require knowledge of the underlying distributed architecture and are still not front-runners as general-purpose programming languages
  • 26. Querying... With Stig • Robust and General-Purpose Language. • Purely functional, lazily evaluated, and strictly, robustly typed • Pattern-oriented notation for describing walks across graph • Composability Rules. • Comprehensions of sequences form the foundation • Transformations of sequences (map, reduce, filter, zip, etc.) are the building blocks • Distributed Evaluation Rocks. • Queries are broken down and sent to the servers where they need to be • Evaluation occurs in parallel
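The "Composability Rules" bullets on slide 26 describe comprehensions as the foundation and map, reduce, filter, and zip as building blocks. Python's lazy iterators give a rough feel for that style, although Python is neither purely functional nor a stand-in for the Stig language. The data below is made up.

```python
from functools import reduce
from itertools import count

ages = {"alice": 34, "bob": 16, "carol": 29}  # made-up data

names = (name for name in sorted(ages))                    # comprehension
adults = filter(lambda n: ages[n] >= 18, names)            # filter
ranked = zip(count(1), adults)                             # zip
labels = map(lambda pair: f"{pair[0]}:{pair[1]}", ranked)  # map

# Nothing above has run yet; reduce finally pulls values through the pipeline.
summary = reduce(lambda acc, label: acc + [label], labels, [])
```

Each stage only wraps the previous one, so the stages compose freely and no intermediate list is built until the final reduction.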
  • 27. Querying... With Stig • Compiled & Stored. • Queries compile down to machine code and get stored in the graph itself • Stored programs are subject to on-going analysis • Programs can call each other • Library-Driven. • Language fundamentals support construction of libraries • We can emulate other languages, such as LINQ and Python • Clients. • Currently Java, Perl, PHP, Python, and C/C++ • We can also serve HTTP directly
  • 28. Querying... With Stig • Mutual Friends • /* function definition */ mutual_friends x y = solve f: [ <x, ‘is friend of’, f>; <y, ‘is friend of’, f> ]; • /* function application */ mutual_friends person@/users/alice person@/users/bob; • /* results */ [ { f = person@/users/carol }, { f = person@/users/dave } ];
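For readers who do not know the Stig language, the `mutual_friends` query on slide 28 can be approximated in Python over a list of triples. This emulates only the meaning of the query (find every `f` that both `x` and `y` are friends with). `solve_mutual_friends` and the sample triples are hypothetical, and nothing here reflects how Stig evaluates `solve`.

```python
def solve_mutual_friends(triples, x, y):
    """Return bindings of f such that <x, 'is friend of', f> and
    <y, 'is friend of', f> both hold."""
    def friends(person):
        return {dst for src, label, dst in triples
                if src == person and label == "is friend of"}

    # Intersect the two friend sets; sort so the result order is stable.
    return [{"f": f} for f in sorted(friends(x) & friends(y))]


triples = [
    ("alice", "is friend of", "carol"),
    ("alice", "is friend of", "dave"),
    ("alice", "is friend of", "erin"),
    ("bob", "is friend of", "carol"),
    ("bob", "is friend of", "dave"),
]
results = solve_mutual_friends(triples, "alice", "bob")
```

As on the slide, the answer is a list of bindings for `f`, here carol and dave, rather than a bare list of names.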
  • 30. Is your project...? • Graph-shaped? • Representing graphs as graphs (instead of as tables or key pairs) simplifies your life • Stig graphs are fat, meaning they're really any number of simultaneous, intersecting graphs, so go nuts • Transactional? • Reliably atomic state transitions also simplify your life • Asynchronous transaction management makes it more tolerable • Real-time? • Control the influence of updates with shared points-of-view • Never be blocked waiting for the database to respond
  • 31. Is your project...? • Really huge? • The store scales very close to linearly, so more data just means more machines • The size of the cluster generally doesn't affect the performance of individual operations • Deeply analytic? • Use inferences to describe relations and conditions you're interested in • Build up arbitrarily complex libraries of inference to extract meaning from data
  • 32. Open-sourcing this year! • About our Code. • Written in C++0x and Haskell, with Python for tools • Entirely unit-test driven and designed for easy adoption • Why Open Source? • We want to give back • We benefit first and most • Competitive advantage would be temporary anyway • Knowing it’s open keeps us on our toes • There’s more to do than we can do ourselves • We attract the kind of people we want to work with
  • 33. Our doors are open • About Tagged • #3 in social networking and growing (100+ Million members) • Located in downtown SF, named a Top Ten Place to Work by the San Francisco Business Journal • Profitable since 2008. We answer only to ourselves and our users
  • 34. Our doors are open • About the Stig Team. • Five full-time engineers with backgrounds in compilers, databases, distributed systems, and AI • Interns year-round with opportunities to publish • And yes, we're hiring!
  • 35. Got ideas? • Contact us! • Sign up for Stig news at: www.stigdb.org • Follow the Tagged Dev Blog at: blog.tagged.com • Jason Lucas Architect of Scalable Infrastructure jlucas@tagged.com
  • 36. Part 3: Lunch Workshop • But wait, there’s more! • Join us as we get our hands messy with food and take a deep dive into the Stig query language and the Stig API! • Lunch 'N Learn 01:15 PM - 02:15 PM