Data, dev-ops, and cloud services


    Building a distributed data-platform


               Charles Care

            Engineering Team
              Kasabi / Talis
Talk overview
●   About me...
●   What Kasabi is,
    ●   what we are trying to do
    ●   how we are working to achieve that
    ●   a quick walk-through
●   Discussion of the Kasabi platform team
    ●   Our technology / architecture
    ●   Our engineering culture
    ●   Lessons learnt
Views are mine...

…and not necessarily those of
my (current/past) employers
About me...
●   2001-2004 – BSc Computer Science (Warwick)
●   2004-2008 – PhD Computer Science (Warwick)
●   2007-2011 – BT Plc
    ●   Technical risk analyst – BT Global MPLS Network
    ●   Software Engineer – Infrastructure for Financial Markets
    ●   Senior Software Engineer – Central software standards
        and tools
●   2011-Present – Talis/Kasabi
    ●   Software Engineer – Semantic web platform
About Kasabi
●   Data market place
●   Bringing together data...
    ●   owners
    ●   consumers
●   Lowering the barrier for data-driven apps to
    enter the market
●   Enabling new opportunities for aggregating and
    mixing data
Data licensing today
[Diagram: data owners and data consumers, linked by bespoke, expensive contracts]
Kasabi as a data platform
[Diagram: the platform connects data owners, data enthusiasts, data engineers, application developers, API developers, and third-party services]
About Kasabi
●   Publish datasets using standard APIs
●   Access data using standard APIs
    ●   Query a dataset using SPARQL
    ●   Search a dataset using a simple full-text search
●   Define, contribute, and share your own APIs
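As a rough illustration of querying a dataset over a standard API, the sketch below builds a SPARQL Protocol GET request. The endpoint URL and the `apikey` parameter name are placeholders invented for this example, not documented Kasabi values; only the `query` parameter comes from the SPARQL Protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a documented Kasabi URL).
ENDPOINT = "https://api.example.org/dataset/example/apis/sparql"

# A generic query: the first ten triples in the dataset.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_url(endpoint: str, query: str, api_key: str) -> str:
    """Return a GET URL that carries the query in the standard `query` parameter.

    The `apikey` parameter name is an assumption made for this sketch.
    """
    return endpoint + "?" + urlencode({"query": query, "apikey": api_key})


url = build_sparql_url(ENDPOINT, QUERY, "YOUR_KEY")
print(url)
```

Any HTTP client can then fetch the URL; the response format depends on the service and the Accept header sent.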
Data marketplace
http://www.kasabi.com/
A dataset
Access data using standard APIs
Contribute custom APIs
Example – contributed APIs
Current organisation
●   Product development
●   Data engineering
●   Customer operations
●   Platform development
Platform architecture
Data Platform
[Diagram: a load balancing and routing layer in front of update, search, and query services, which sit over the datasets]
●   Need to store and update datasets
●   Access data via various services
●   Must scale with load and increasing data
●   Must be tolerant to failure
●   Extensible
    ●   Should be easy to add new services over time
To distribute...

...or not to distribute
Distributed Platform
[Diagram: a routing layer above a dynamic gossip network of update, search, and SPARQL service nodes, with room for a new service; underneath sit a sequence service, a storage service, and monitoring services]
Distributed Platform – updates
●   Updates are sequenced
●   Data stored in distributed storage
Distributed Platform – updates
●   Updates are gossiped around the network
●   Here a SPARQL node realises that it should apply the update
Distributed Platform – query
●   SPARQL queries will now reflect the update that was submitted
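The update flow on the last three slides can be sketched as a toy simulation. This is not the Kasabi implementation: the class names, the ring of peers, and the rule that a node applies an update only for datasets it hosts are all assumptions made for illustration. For determinism each node forwards to every peer, whereas a real gossip protocol would typically pick a random subset.

```python
from dataclasses import dataclass, field
from itertools import count


class SequenceService:
    """Hands out monotonically increasing sequence numbers for updates."""

    def __init__(self) -> None:
        self._counter = count(1)

    def next(self) -> int:
        return next(self._counter)


@dataclass
class Node:
    name: str
    datasets: set                                # datasets this node serves (assumed)
    peers: list = field(default_factory=list)    # nodes it gossips to
    seen: set = field(default_factory=set)       # sequence numbers already heard
    applied: dict = field(default_factory=dict)  # seq -> payload actually applied

    def receive(self, seq: int, dataset: str, storage: dict) -> None:
        if seq in self.seen:
            return                               # already heard this rumour; stop
        self.seen.add(seq)
        if dataset in self.datasets:
            # The node realises the update concerns it and pulls it from storage.
            self.applied[seq] = storage[seq]
        for peer in self.peers:
            peer.receive(seq, dataset, storage)  # pass the rumour on


sequencer = SequenceService()
storage = {}                                     # stands in for the storage service

update_node = Node("update-1", set())
search_node = Node("search-1", {"films"})
sparql_node = Node("sparql-1", {"films"})
other_node = Node("sparql-2", {"music"})

# Wire the nodes into a simple ring.
update_node.peers = [search_node]
search_node.peers = [sparql_node]
sparql_node.peers = [other_node]
other_node.peers = [update_node]


def submit(dataset: str, payload: str) -> int:
    """Sequence the update, store it, then start gossiping it."""
    seq = sequencer.next()
    storage[seq] = payload
    update_node.receive(seq, dataset, storage)
    return seq


seq = submit("films", "add film triples")
print(sparql_node.applied)
```

Until the rumour reaches a node, that node still answers queries from its old state, which is the eventual consistency the platform has to manage.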
Monolithic vs distributed
●   Monolithic
    ●   Easy to synchronise events and data
    ●   Consistent views and queries
    ●   Less inter-process communication / less network overhead
    ●   Easier to optimise for high throughput
    ●   Single code-base
    ●   Fewer processes to monitor
●   Distributed
    ●   Service-oriented - separate concerns run in isolated processes (and can be scaled
        independently)
    ●   Development is component-based
        –   Changes are more focussed / helps avoid scope-creep
    ●   Deployment can be localised to avoid downtime
    ●   Failure is more likely – so you need to plan for it
    ●   Easier to integrate out-of-the-box software – e.g. using standard Apache Solr
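One simple way to plan for failure in a distributed set-up is to try several replicas of a service in turn. The sketch below is a minimal illustration, not the approach described in the talk; the function names are hypothetical, and a real client would usually add timeouts and back-off.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Call each replica in order and return the first successful result."""
    last_error = None
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:
            last_error = exc          # remember the failure, try the next replica
    raise RuntimeError("all replicas failed") from last_error


def down() -> str:
    """Stands in for a node that cannot be reached."""
    raise ConnectionError("node unreachable")


def up() -> str:
    """Stands in for a healthy node."""
    return "42 results"


result = call_with_failover([down, up])
print(result)
```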
Distributed data platform
●   Separate services for each API
●   Communication via Gossip messages
●   Have to manage eventual consistency
●   Highly scalable
●   Easy to add new services
●   Use standard protocols and open-source components
    ●   HTTP libraries / REST / ZeroMQ / Apache Thrift
    ●   RDF and SPARQL using Apache Jena
    ●   Search using Apache Solr
    ●   Avoid modification and forks
●   Deploy into Amazon EC2 (also using: S3, EMR, and ELB)
Benefits of using cloud services
Consider a start-up in 2002
●   Have an idea...
●   Get funding (development, op-ex,
    cap-ex)
●   Acquire servers
    ●   Set-up your servers
        –   mail, web, source code repo, build
            systems
        –   development, staging, live
    ●   Some 'cloud' services
        –   …, SourceForge, shared servers, etc
●   Build and go to market
    ●   Probably embedding open-source
        components
●   Delivery based on full-stack,
    monolithic, architectures
Consider a start-up in 2012
●   Have an idea...
●   Get funding (development capital, op-ex)
    ●   you will probably not get cap-ex
●   Use cloud services... rent rather than buy
    ●   SaaS – Software as a Service
        –   Why would you run your own (chat/email etc)?
        –   Host your code in GitHub/BitBucket etc
    ●   PaaS – Platform as a Service
        –   Do you need to control the full stack?
        –   Could you leverage platforms like Heroku, Joyent,
            AppEngine, etc?
        –   Amazon RDS
    ●   IaaS – Infrastructure as a Service
        –   Cloud services to provide 'bare metal'
●   Build and go to market quickly
●   Scale elastically over time
But what about the enterprise?
●   Benefits of cloud services are
    already transforming the enterprise
    ●   Private clouds
    ●   Virtual appliances
    ●   Cloud bursting
    ●   Independent scaling
    ●   Separation of concerns
    ●   Service-oriented architecture (SOA)
●   And in future...
    ●   Appetite for IaaS is growing
    ●   PaaS and SaaS will follow.
    ●   Perimeter security will be replaced by
        localised security boundaries
So how do we build this stuff...?
How it all happens
●   Constantly iterating through...
    ●   Requirements
    ●   Development (Test-driven)
    ●   Testing/Review
    ●   Deployment
    ●   Operation
●   We're an Agile, dev-ops team...
        so all the above is a shared responsibility
Being a dev-ops team...
●   Removing barriers between development and operations
●   Shared responsibilities rather than distrust
●   Everyone has root access
●   Developers are responsible for operating the systems they build
●   Everyone is free to make changes
        ...and responsible for managing the roll-out of those changes
●   Ops/Deployment/Monitoring are automated
●   Everyone should have full-stack awareness
●   Read more...
    ●   http://dev2ops.org/blog/2010/2/22/what-is-devops.html
    ●   http://www.jedi.be/blog/
    ●   http://en.wikipedia.org/wiki/Devops
    ●   http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
Life-cycle of a change
Requirements and Planning
●   Identification of requirement
●   Planning
    ●   Break down big changes into smaller tasks
        –   Can the change be deployed in small steps?
        –   Can the change be dark-deployed?
    ●   Understand the wider impact
    ●   Find middle ground between generic and specific
●   Team is self-organising
    ●   People pull work from the prioritised, planned stories
Branch based development
●   One branch per change, squash before merge
Writing the code
●   Work on a branch
     ●    don't know if/when you'll merge
●   Test-driven
     ●    Unit tests first
     ●    Do acceptance tests need to change?
     ●    What technology? Which tool-sets?
●   Smoke testing
     ●    How do you know it works?
     ●    What's different in production?
     ●    What are the risks of failure?
●   Feature flags?
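A feature flag is one way to dark-deploy a change: the new code ships but stays off until the flag is switched on. The sketch below is illustrative only; the `FEATURE_` environment-variable convention and the function names are assumptions, not something described in the talk.

```python
import os
from typing import Mapping


def flag_enabled(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True if the flag is switched on in the environment (assumed convention)."""
    value = env.get(f"FEATURE_{name.upper()}", "off")
    return value.lower() in {"1", "on", "true"}


def search(query: str, env: Mapping[str, str] = os.environ) -> str:
    """Route to the new code path only when the flag is on."""
    if flag_enabled("new_ranker", env):
        return f"new:{query}"      # dark-deployed path, off by default
    return f"old:{query}"          # existing behaviour


print(search("kasabi", {}))
print(search("kasabi", {"FEATURE_NEW_RANKER": "on"}))
```

Passing the environment in as a parameter keeps the flag easy to exercise from unit tests.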
Tests run: 110, Failures: 0, Errors: 0, Skipped: 2

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 39 seconds
[INFO] Finished at: Sat Feb 18 15:20:36 GMT 2012
[INFO] Final Memory: 33M/240M
[INFO] ------------------------------------------------------------------------
Writing the code
●   Avoid unnecessary scope-creep
    ●   “I'll just fix this...”
    ●   “It would be much cleaner if I re-factored this...”
    ●   “It would be neat if I also added this...”
    ●   …however, these observations can be written as new stories
    ●   …and sometimes it's good to fix things before they cause pain
    ●   …if extra changes are really necessary, can they be implemented separately?
    ●   …team should be empowered to fix technical debt
    ●   ...managing scope-creep is a shared responsibility
●   Be prepared to abandon a change if it's taking too long; maybe it needs
    more planning?
●   Should you be pairing?
●   Should you demo your work?
Code review
●   Code review is possible for distributed teams
    with tools such as Gerrit or ReviewBoard
●   If you're not following a strict pairing policy,
    code-review is vital
●   Useful to make others aware of changes
●   Gerrit
    ●   Build agent automatically builds your change and
        runs tests – verify +/- 1
    ●   Invite others to review your code, they can give it a
        score between -2 and +2.
    ●   Can only deploy code once at least one person has
        given a +2
    ●   Work-flow is customisable
●   Self-organising... anyone can review

              $> git commit
              $> git review
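The scoring rules above can be written down as a small predicate. This is a sketch of the gate as the slide describes it, plus two assumptions that mirror Gerrit's usual defaults: a -2 review blocks the change, and a failed verification blocks it. As the slide notes, the work-flow is customisable, so a given installation may differ.

```python
def can_submit(code_review: list, verified: list) -> bool:
    """Decide whether a change may be merged and deployed.

    code_review: scores from human reviewers, each between -2 and +2.
    verified:    scores from the build agent, each -1 or +1.
    """
    has_approval = any(score == 2 for score in code_review)
    has_veto = any(score == -2 for score in code_review)   # assumed default
    build_passed = any(v == 1 for v in verified)
    build_failed = any(v == -1 for v in verified)          # assumed default
    return has_approval and not has_veto and build_passed and not build_failed


print(can_submit([2, 1], [1]))
print(can_submit([1, 1], [1]))
```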
Code review (2)
Code review (3)
Merge / Deployment
●   Merge & Deployment
    ●   One-click deployment
    ●   Developer should press the button
    ●   Code is merged into the
        master/release branch
    ●   Build server automatically checks
        out the code and builds, tags, and
        uploads the release to an artefact
        repository
    ●   Package is automatically
        deployed on all servers
        –   Extra orchestration for external-facing
            services to avoid “thundering-herd”
            problems
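One possible form of that extra orchestration is a rolling deployment: servers are updated in small batches with a pause in between, so they do not all restart (and get hit by reconnecting clients) at the same moment. The sketch below is an assumption about how this could look, not the team's actual tooling; `deploy` is a hypothetical callback.

```python
import time
from typing import Callable, Sequence


def rolling_deploy(
    servers: Sequence[str],
    deploy: Callable[[str], None],
    batch_size: int = 1,
    pause_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Deploy to servers batch by batch and return the batches used."""
    batches = []
    for start in range(0, len(servers), batch_size):
        batch = list(servers[start:start + batch_size])
        for server in batch:
            deploy(server)                 # hypothetical per-server deploy step
        batches.append(batch)
        if start + batch_size < len(servers):
            sleep(pause_seconds)           # let the batch settle before the next
    return batches


deployed = []
batches = rolling_deploy(["web-1", "web-2", "web-3"], deployed.append, batch_size=2)
print(batches)
```

Injecting `sleep` keeps the function testable without real delays.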
Managing infrastructure
●   Puppet or Chef
●   Build packages (e.g. DEB or RPM)
●   Centralise configuration management
●   Utilising cloud compute infrastructure
    ●   Amazon EC2
    ●   Amazon S3
    ●   Elastic load balancers
    ●   Elastic Map-Reduce
●   Application monitoring
    ●   Metrics
    ●   Log analysis
    ●   Internal monitoring
    ●   External checks
Lessons learnt

(again, my views!)
Technical lessons learnt
●   Use distributed SOA-based services to reduce
    tight coupling
●   Monitor everything...
●   Leverage cloud offerings
    ●   wrap them with well-defined interfaces to avoid lock-in
●   Design systems to scale
●   Use open and unmodified components where possible
    ●   Standard components fronting external APIs
    ●   E.g. Jena, Solr, HAProxy, Apache
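Wrapping a cloud offering behind a well-defined interface might look like the sketch below. The `BlobStore` interface and all names are hypothetical; application code depends only on the interface, so a cloud-backed implementation (for example, one backed by S3) could be swapped for another provider or for the in-memory version used here.

```python
from typing import Protocol


class BlobStore(Protocol):
    """The only storage interface the application code is allowed to see."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in implementation, useful for tests and local development."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_dataset(store: BlobStore, name: str, payload: bytes) -> str:
    """Application code: knows about BlobStore, not about any cloud provider."""
    key = f"datasets/{name}"
    store.put(key, payload)
    return key


store = InMemoryBlobStore()
key = archive_dataset(store, "films", b"<rdf/>")
print(key, store.get(key))
```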
Practices that have helped us
●   Dev-ops culture
●   Pragmatic approach to agile development
    ●   Task allocation should be 'pull', rather than 'push'
    ●   Teams should be self-organising
    ●   Pairing when working on new problems
●   Test-Driven-Development (TDD)
●   Continuous integration
●   Peer-review of code
●   Continuous deployment
…so, in summary...
Conclusion
●   Isolate your design into components
●   Empower your team to release small changes
    frequently
●   Leverage hosted/cloud offerings
Thanks for listening!
Credits
●   Thanks for the invite to speak
●   Thanks to Kasabi / Talis Systems Ltd

●   Sign up at http://www.kasabi.com




    Graphics from http://www.iconarchive.com/,
    http://www.oxygen-icons.org and http://www.icons-land.com
Questions?

 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Recently uploaded (20)

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Building a distributed data-platform - A perspective on current trends in computing

  • 1. Data, dev-ops, and cloud services Building a distributed data-platform Charles Care Engineering Team Kasabi / Talis
  • 2. Talk overview ● About me... ● What Kasabi is, ● what we are trying to do ● how we are working to achieve that ● a quick walk-through ● Discussion of the Kasabi platform team ● Our technology / architecture ● Our engineering culture ● Lessons learnt
  • 3. Views are mine... …and not necessarily those of my (current/past) employers
  • 5. About me... ● 2001-2004 – BSc Computer Science (Warwick) ● 2004-2008 – PhD Computer Science (Warwick) ● 2007-2011 – BT Plc ● Technical risk analyst – BT Global MPLS Network ● Software Engineer – Infrastructure for Financial Markets ● Senior Software Engineer – Central software standards and tools ● 2011-Present – Talis/Kasabi ● Software Engineer – Semantic web platform
  • 7. About Kasabi ● Data market place ● Bringing together data... ● owners ● consumers ● Lowering the barrier for data-driven apps to enter the market ● Enabling new opportunities for aggregating and mixing data
  • 8. Data licensing today Bespoke, expensive, contracts Data Owners Data Consumers
  • 9. Kasabi as a data platform Data engineers Data enthusiasts Data Owners Application Developers Third-party services API developers
  • 10. About Kasabi ● Publish datasets using standard APIs ● Access data using standard APIs ● Query a dataset using SPARQL ● Search a dataset using a simple full-text search ● Define, contribute, and share your own APIs
  • 13. Access data using standard APIs
  • 16. Current organisation ● Product development ● Data engineering ● Customer operations ● Platform development
  • 17. Current organisation ● Product development ● Data engineering ● Customer operations ● Platform development
  • 19. Data Platform Load balancing and routing Update services Search services Query services Datasets ● Need to store and update datasets ● Access data via various services ● Must scale with load and increasing data ● Must be tolerant to failure ● Extensible ● Should be easy to add new services over time
  • 20. To distribute... ...or not to distribute
  • 21. Distributed Platform Routing layer Dynamic Gossip Network Update service SPARQL Update service Search service service Update New service service? SPARQL service Search Search SPARQL service service service Sequence Service Storage Service Monitoring Services
  • 22. Distributed Platform – updates Routing layer Dynamic Gossip Network Update service SPARQL Update service Search service service Update - Updates are sequenced - Data stored in distributed storage New service service? SPARQL service Search Search SPARQL service service service Sequence Service Storage Service Monitoring Services
  • 23. Distributed Platform – updates Routing layer Dynamic Gossip Network Update service SPARQL Update service Search service service - Updates are gossiped around Update network New service - Here a SPARQL node realises service? that it should apply the update SPARQL service Search Search SPARQL service service service Sequence Service Storage Service Monitoring Services
  • 24. Distributed Platform – query Routing layer Dynamic Gossip Network Update service SPARQL Update service Search service service SPARQL queries will now reflect the update that Update New was submitted service service? SPARQL service Search Search SPARQL service service service Sequence Service Storage Service Monitoring Services
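The update flow on slides 22 to 24 (updates are sequenced, gossiped around the network, then applied by the node that should apply them) can be sketched roughly as below. This is only an illustration under assumptions: the `Update` and `SparqlNode` names, the integer sequence numbers, and the in-memory set of triples are hypothetical and are not taken from the Kasabi code base.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    seq: int       # number assumed to be handed out by the sequence service
    triple: tuple  # payload, here a single (subject, predicate, object)


@dataclass
class SparqlNode:
    """Hypothetical query node that applies gossiped updates in sequence order."""

    applied_seq: int = 0
    pending: dict = field(default_factory=dict)
    triples: set = field(default_factory=set)

    def on_gossip(self, update: Update) -> None:
        # Gossip may deliver the same update more than once; ignore repeats.
        if update.seq <= self.applied_seq:
            return
        # Buffer the update, then apply every update that is now in order.
        self.pending[update.seq] = update
        while self.applied_seq + 1 in self.pending:
            nxt = self.pending.pop(self.applied_seq + 1)
            self.triples.add(nxt.triple)
            self.applied_seq = nxt.seq
```

Because an out-of-order update is only buffered, a query served between deliveries can lag behind the latest write. That lag is the eventual consistency the later slides say has to be managed.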
  • 25. Monolithic vs distributed ● Monolithic ● Easy to synchronise events and data ● Consistent views and queries ● Less inter-process communication / less network overhead ● Easier to optimise for high throughput ● Single code-base ● Fewer processes to monitor ● Distributed ● Service-oriented - separate concerns run in isolated processes (and can be scaled independently) ● Development is component-based – Changes are more focussed / helps avoid scope-creep ● Deployment can be localised to avoid downtime ● Failure is more likely – so you need to plan for it ● Easier to integrate out-of-the-box software – e.g. using standard Apache Solr
  • 26. Distributed data platform ● Separate services for each API ● Communication via Gossip messages ● Have to manage eventual consistency ● Highly scalable ● Easy to add new services ● Use standard protocols and open-source components ● HTTP libraries / REST / ZeroMQ / Apache Thrift ● RDF and SPARQL using Apache Jena ● Search using Apache Solr ● Avoid modification and forks ● Deploy into Amazon EC2 (also using: S3, EMR, and ELB)
  • 27. Benefits of using cloud services
  • 28. Consider a start-up in 2002 ● Have an idea... ● Get funding (development, op-ex, cap-ex) ● Acquire servers ● Set-up your servers – mail, web, source code repo, build systems – development, staging, live ● Some 'cloud' services – …, SourceForge, shared servers, etc ● Build, and go, to market ● Probably embedding open-source components ● Delivery based on full-stack, monolithic, architectures
  • 29. Consider a start-up in 2012 ● Have an idea... ● Get funding (development capital, op-ex) ● you will probably not get cap-ex ● Use cloud services... rent rather than buy ● SaaS – Software as a Service – Why would you run your own (chat/email etc)? – Host your code in GitHub/BitBucket etc ● PaaS – Platform as a Service – Do you need to control the full stack? – Could you leverage platforms like: Heroku, Joyent, AppEngine etc? – Amazon RDS ● IaaS – Infrastructure as a Service – Cloud services to provide 'bare metal' ● Build and go to market quickly ● scale elastically over time
  • 30. But what about the enterprise? ● Benefits of cloud services are already transforming the enterprise ● Private clouds ● Virtual appliances ● Cloud bursting ● Independent scaling ● Separation of concerns ● SOA architecture ● And in future... ● Appetite for IaaS is growing ● PaaS and SaaS will follow. ● Perimeter security will be replaced by localised security boundaries
  • 31. So how do we build this stuff...?
  • 32. How it all happens ● Constantly iterating through... ● Requirements ● Development (Test-driven) ● Testing/Review ● Deployment ● Operation ● We're an Agile, dev-ops team... so all the above is a shared responsibility
  • 33. Being a dev-ops team... ● Removing barriers between development and operations ● Shared responsibilities rather than distrust ● Everyone has root access ● Developers are responsible for operating systems they build ● Everyone is free to make changes ...and responsible to manage the roll-out of those changes ● Ops/Deployment/Monitoring are automated ● Everyone should have full-stack awareness ● Read more... ● http://dev2ops.org/blog/2010/2/22/what-is-devops.html ● http://www.jedi.be/blog/ ● http://en.wikipedia.org/wiki/Devops ● http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • 34. Life-cycle of a change
  • 35. Requirements and Planning ● Identification of requirement ● Planning ● Break down big changes into smaller tasks – Can the change be deployed in small steps? – Can the change be dark-deployed? ● Understand the wider impact ● Find middle ground between generic and specific ● Team is self-organising ● People pull work from the prioritised, planned stories
  • 36. Branch based development ● One branch per change, squash before merge
  • 37. Writing the code ● Work on a branch ● don't know if/when you'll merge ● Test-driven ● Unit tests first ● Do acceptance tests need to change? ● What technology? Which tool-sets? ● Smoke testing ● How do you know it works? ● What's different in production? ● What are the risks of failure? ● Feature flags? Tests run: 110, Failures: 0, Errors: 0, Skipped: 2 [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESSFUL [INFO] ------------------------------------------------------------------------ [INFO] Total time: 39 seconds [INFO] Finished at: Sat Feb 18 15:20:36 GMT 2012 [INFO] Final Memory: 33M/240M [INFO] ------------------------------------------------------------------------
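The "Feature flags?" question on this slide, together with the earlier question about dark-deploying a change, could look something like the sketch below. It is a minimal illustration, assuming flags are read from environment variables; the `FEATURE_` prefix, the `new_ranker` flag, and the `search` function are all made up for the example.

```python
import os
from typing import Mapping


def flag_enabled(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when the (hypothetical) FEATURE_<NAME> variable is switched on."""
    value = environ.get(f"FEATURE_{name.upper()}", "off")
    return value.lower() in {"1", "on", "true"}


def search(query: str, environ: Mapping[str, str] = os.environ) -> str:
    # The new code path ships to production but stays dark until the flag is set.
    if flag_enabled("new_ranker", environ):
        return f"new:{query}"
    return f"old:{query}"
```

Passing the environment in as a parameter keeps both code paths easy to cover in unit tests, which fits the test-driven approach on the slide.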
  • 38. Writing the code ● Avoid unnecessary scope-creep ● “I'll just fix this...” ● “It would be much cleaner if I re-factored this...” ● “It would be neat if I also added this...” ● …however, these observations can be written as new stories ● …and sometimes it's good to fix things before they cause pain ● …if extra changes are really necessary, can they be implemented separately? ● …team should be empowered to fix technical debt ● ...managing scope-creep is a shared responsibility ● Be prepared to abandon a change if it's taking too long, maybe it needs more planning? ● Should you be pairing? ● Should you demo your work?
  • 39. Code review ● Code review possible with tools for distributed teams (e.g. Gerrit or ReviewBoard) ● If you're not following a strict pairing policy, code-review is vital ● Useful to make others aware of changes ● Gerrit ● Build agent automatically builds your change and runs tests – verify +/- 1 ● Invite others to review your code, they can give it a score between -2 and +2. ● Can only deploy code once at least one person has given a +2 ● Work-flow is customisable ● Self-organising... anyone can review $> git commit $> git review
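The review rule described on this slide (a build agent votes verify +/- 1, reviewers score between -2 and +2, and a change needs at least one +2) can be written down as a small predicate. This is a sketch of the rule as stated, not Gerrit's actual implementation. The function name is hypothetical, and treating any -2 as a veto is an assumption based on Gerrit's usual default, which the slide notes is customisable.

```python
from typing import List


def can_submit(verified: List[int], code_review: List[int]) -> bool:
    """Decide whether a change may be merged, given the votes cast on it."""
    # The build agent must have verified the change, with no failed verification.
    built_ok = 1 in verified and -1 not in verified
    # At least one reviewer must have given the top score.
    approved = 2 in code_review
    # Assumption: a -2 blocks the change regardless of other scores.
    vetoed = -2 in code_review
    return built_ok and approved and not vetoed
```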
  • 42. Merge / Deployment ● Merge & Deployment ● One-click deployment ● Developer should press the button ● Code is merged into the master/release branch ● Build server automatically checks out the code and builds, tags, and uploads the release to an artefact repository ● Package is automatically deployed on all servers – Extra orchestration for external-facing services to avoid “thundering-herd” problems
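The slide does not say what the extra orchestration for external-facing services involved. One common way to avoid every server restarting at once is to deploy in small batches and check health between them, sketched below. The `rolling_deploy` function and its `deploy` and `healthy` callbacks are hypothetical stand-ins for whatever the real tooling does.

```python
import time
from typing import Callable, Iterable, List


def rolling_deploy(
    hosts: Iterable[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    batch_size: int = 2,
    pause_seconds: float = 0.0,
) -> List[str]:
    """Deploy to hosts a batch at a time, stopping at the first unhealthy batch."""
    host_list = list(hosts)
    done: List[str] = []
    for start in range(0, len(host_list), batch_size):
        batch = host_list[start:start + batch_size]
        for host in batch:
            deploy(host)
        failed = [host for host in batch if not healthy(host)]
        if failed:
            # Leave the remaining hosts on the old release.
            raise RuntimeError(f"unhealthy after deploy: {failed}")
        done.extend(batch)
        # Give the batch time to warm up before touching the next one.
        time.sleep(pause_seconds)
    return done
```

Stopping on the first unhealthy batch means a bad release only reaches a few servers, which matches the earlier point that failure has to be planned for.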
  • 43. Managing infrastructure ● Puppet or Chef ● Build packages (e.g. DEB or RPM) ● Centralise configuration management ● Utilising cloud compute infrastructure ● Amazon EC2 ● Amazon S3 ● Elastic load balancers ● Elastic Map-Reduce ● Application monitoring ● Metrics ● Log analysis ● Internal monitoring ● External checks
  • 45. Technical lessons learnt ● Use distributed SOA-based services to reduce tight-coupling ● Monitor everything... ● Leverage cloud offerings ● wrap them with well-defined interfaces to avoid lock-in ● Design systems to scale ● Use open and unmodified components where possible ● Standard components fronting external APIs ● E.g. Jena, Solr, HAProxy, Apache
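The advice to wrap cloud offerings with well-defined interfaces can be illustrated with a small storage abstraction. This is a sketch only: `BlobStore`, `InMemoryBlobStore`, and `archive_dataset` are hypothetical names. A class backed by a hosted service such as Amazon S3 would implement the same two methods, and is not shown here.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """The only storage operations the rest of the code is allowed to rely on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in implementation for tests and local development."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_dataset(store: BlobStore, name: str, payload: bytes) -> str:
    # Application code depends on the interface, never on a vendor SDK.
    key = f"datasets/{name}"
    store.put(key, payload)
    return key
```

Swapping providers then means writing one new class rather than touching every caller, and the in-memory version keeps unit tests off the network.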
  • 46. Practices that have helped us ● Dev-ops culture ● Pragmatic approach to agile development ● Task allocation should be 'pull', rather than 'push' ● Teams should be self-organising ● Pairing when working on new problems ● Test-Driven-Development (TDD) ● Continuous integration ● Peer-review of code ● Continuous deployment
  • 48. Conclusion ● Isolate your design into components ● Empower your team to release small changes frequently ● Leverage hosted/cloud offerings
  • 50. Credits ● Thanks for the invite to speak ● Thanks to Kasabi / Talis Systems Ltd ● Sign up at http://www.kasabi.com Graphics from http://www.iconarchive.com/, http://www.oxygen-icons.org and http://www.icons-land.com