Pictures at an Exhibition
                                    Ruby, Rails, NoSQL and Big Data
                                                        John Repko

John Repko -- Pikasoft LLC
Agenda

   The Goal: Exploring Big Data with NoSQL and Ruby on Rails
   Just Two Solutions – Here’s How We Get There
   •      Key-Value Data Stores
           –    Redis
           –    Riak


   •      Document Data Stores
           –    MongoDB
           –    Cassandra


   •      Graph Data Stores
           –    Neo4J


   •      MapReduce
           –    Through Hadoop
           –    Through Riak / MongoDB
           –    Through Elastic MapReduce




So How Did We Get to Big Data Anyway?




  Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg   Source: http://www.startribune.com/sports/164830346.html




                      Big Data Is Not Just About “Big” Data … It’s About FAST Data!
                                       (http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)



Why is Everyone Diving into Big Data?


       There Are Big Data Breakthroughs Everywhere…




     “Watson” Wins on Jeopardy
         Beat the best Jeopardy players of all time
         Source: https://newshour.s3.amazonaws.com/photos/2011/02/16/kayjay_1_blog_main_horizontal.jpg

     Google Wins the Search Market
         Massively parallel web searches with results back in a tenth of a second

     Progressive’s Instant “Overnight” Rate Quotes
         Progressive creates an insurance quote for every car and truck in the US – every night



Exploring Big Data


           Big Data frequently provides solutions to a common set of problems




                     Source: http://www.slideshare.net/cloudera/20100806-cloudera-10-hadoopable-problems-webinar-4931616




                These appear to be “10 Problems” but are really only “2 Problems”
Exploring Big Data


   The many Big Data wins in the press fall into just two solution patterns


    • Foresight
            – We are presented with a pattern – what has the outcome
              been when we’ve seen similar patterns in the past?


    • Hindsight
            – We are presented with an outcome – what pattern of events
              anticipated that outcome in the past?




    You Don’t Need Dozens Of Solution Approaches For Big Data –       Just Two
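
The two patterns can be sketched as a pair of lookups over the same history. This is a minimal illustration only: the `HISTORY` data, the pattern and outcome labels, and the `foresight` / `hindsight` method names are all made up for the example and do not come from the deck.

```ruby
# Hypothetical history of observed [pattern, outcome] pairs (made-up data).
HISTORY = [
  ["late payments", "default"],
  ["late payments", "default"],
  ["late payments", "repaid"],
  ["steady income", "repaid"],
  ["steady income", "repaid"]
]

# Foresight: given a pattern, tally the outcomes that followed it before.
def foresight(history, pattern)
  history.select { |seen, _| seen == pattern }
         .each_with_object(Hash.new(0)) { |(_, outcome), tally| tally[outcome] += 1 }
end

# Hindsight: given an outcome, tally the patterns that preceded it before.
def hindsight(history, outcome)
  history.select { |_, result| result == outcome }
         .each_with_object(Hash.new(0)) { |(pattern, _), tally| tally[pattern] += 1 }
end

p foresight(HISTORY, "late payments")
p hindsight(HISTORY, "repaid")
```

Both methods scan the same records and differ only in which column is the question and which is the answer, which is the sense in which the ten problems reduce to two.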
Exploring Big Data

        In this light, let’s take a look at the “10 Hadoop-able Problems” of Big Data

                        Summary – 10 Common Hadoop-able Problems*

               1. Modeling True Risk
                    •        What past patterns led to success or default?

               2. Customer Churn Analysis
                    •        What do customer churn patterns predict about our products and markets?

               3. Recommendation Engine
                    •        We have search terms – what have the results been from similar searches in the past?

               4. Ad Targeting
                    •        We have profile information – what offers have led to sales for similar profiles in the past?

               5. PoS Transaction Analysis
                    •        We have your purchase history – what deals might we offer in the future?


                                          Foresight                                   Hindsight
Exploring Big Data

       These two solution types apply generally to the Hadoop-able problems

                         Summary – 10 Common Hadoop-able Problems

                6. Analyzing Data Logs to Forecast Events
                     •       We have your logs – what patterns of events have anticipated failures before?

                7. Threat Analysis
                     •       We have a specific event – what results have we seen from similar threats in the past?

                8. Trade Surveillance
                     •       Does this parcel raise any alarms, based on our history of past parcel-tracking?

                9. Search Quality
                     •       We have a set of search terms – what have similar searches succeeded in finding in the past?

                10. Data “Sandbox”
                     •       We have your data, possibly unstructured data. What patterns in that data might we
                             bring to your attention now?


                                         Foresight                                 Hindsight
The Big Data Platform Provides Rich Analytics Tools

                             Key Big Data Analytics Solution Patterns



     1.    Predictive Modeling
     2.    Data Visualization
     3.    Cluster Partitioning
     4.    Collaborative Filtering
     5.    Outlier Analysis
     6.    A/B Testing
     7.    Markov Chains
     8.    Bloom Filters




Exploring Big Data




                    With Just Two Standard Solution Models We Can
                             Solve Most Big Data Problems

                       The Key Is To Shape Big Data Into A Standard
                         Platform Onto Which We Can Apply These
                                     Analytics Tools…


                                              “It is not the technology that creates a competitive edge, but the
                                              management process that exploits technology.”
                                              ~ Peter Keen, Shaping the Future (1991)




The Core Development Platform


     Core Platform: Ubuntu 12.04 + AWS

     •      Clean install of Ubuntu 12.04 with all the latest updates

     •      sudo apt-get update
     •      sudo apt-get upgrade
     •      sudo apt-get dist-upgrade

     •      sudo apt-get install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion

     •      sudo apt-get install libcurl3 libcurl3-gnutls libcurl4-openssl-dev

     •      bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)

     •      source ~/.bashrc

     •      gem update --system (Latest version currently installed)

     •      rvm ruby-1.9.2-p290@rails31 --create --default

     •      sudo apt-get install nodejs

     •      gem install rake

     •      gem install rails -v 3.1.3


Redis
                                                                                                             Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

    •      Example:
            –    http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html

    •      Backing Articles:
            –    http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/

    •      Code:
            –    http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html


        The good news is that we already have our base image, and adding a new Redis data store and
        example app to it took only about an hour. As before, you can play with the URL shortener at Redis
        URL Shortener, and you can download and play with the code for the application at: Redis URL
        Shortener Source Code.


                             Play with this online at:
                       http://jkr-blog.dyndns.org:3001/mini_urls
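
For readers who cannot reach the demo, here is a minimal sketch of how a key-value URL shortener can work. It is not the code from the linked post. `MemoryStore` is a hypothetical in-memory stand-in so the example runs without a server, and the `MiniUrl` class and the `mini_url:` key names are assumptions chosen for illustration. The stand-in only imitates the `incr`, `set`, and `get` calls, which the redis gem's client also provides, so a `Redis.new` connection could be passed in instead.

```ruby
# Hypothetical in-memory stand-in for the three commands used below
# (INCR, SET, GET), so the sketch runs without a Redis server.
class MemoryStore
  def initialize
    @data = {}
  end

  # Increment a counter stored as a string and return the new integer value.
  def incr(key)
    @data[key] = ((@data[key] || "0").to_i + 1).to_s
    @data[key].to_i
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  def get(key)
    @data[key]
  end
end

# Hypothetical shortener: one counter key plus one key per short code.
class MiniUrl
  def initialize(store)
    @store = store
  end

  # Store the long URL under a short base-36 code and return the code.
  def shorten(long_url)
    code = @store.incr("mini_url:next_id").to_s(36)
    @store.set("mini_url:#{code}", long_url)
    code
  end

  # Look the long URL back up; nil when the code is unknown.
  def expand(code)
    @store.get("mini_url:#{code}")
  end
end

shortener = MiniUrl.new(MemoryStore.new)
code = shortener.shorten("http://www.pikasoft.com/journal/")
puts "#{code} -> #{shortener.expand(code)}"
```

The whole data model is two kinds of keys, which is why a key-value store is a natural fit: every request is a single lookup by exact key.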




Riak
                                                                                Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

    •      Example:
            –    http://www.pikasoft.com/journal/2012/1/15/you-only-live-twice-basho-and-riak.html

    •      Backing Articles:
            –    http://jit.nuance9.com/2010/07/ruby-192-rails-3-riak-and-ripple.html
            –    http://jbbarth.com/archives/2011/4/23/basic_usage_of_riak_in/

    •      Code:




MongoDB
                                                                                                                 Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
    •      Example:
            –    http://www.pikasoft.com/journal/2010/7/31/nosql-on-the-cloud-our-first-application.html

    •      Backing Articles:
            –    http://www.mongodb.org/display/DOCS/Building+for+Linux

    •      Code:
            –    http://www.pikasoft.com/journal/2010/8/16/why-our-little-nosql-app-matters.html

   So let's sum up: after a handful of posts and a small (but still sorrowful) amount of command-line and Rails code,
   we've managed to accomplish the following "Hello World" tasks in NoSQL on the cloud:

   • Created a cloud account
   • Created our first app and saw it in a browser on the web
   • Loaded up real development environments (Ruby/Rails we added, Java we got for free)
   • Added a stronger app server (thin >> webrick) and a stronger web server (nginx >> almost anything)
   • Added our first NoSQL data store (MongoDB) and mapping software to simulate ActiveRecord in NoSQL
   • Created a little NoSQL app to show all this, and made it visible through a dynamic DNS address:
     Rails Mongo Notes Example

   Just to wrap the little app up: I updated John Nunemaker's MongoMapper demo app to work with Rails 3 and the
   cloud, and if you like you can take a look at the code for it here: Rails Mongo Code.



                                     Play with this online at:
                                 http://jkr-code.dyndns.org:3000/notes
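
To make the "mapping software" idea concrete, here is a small sketch of what a document mapper does: it turns a plain Ruby object into a schemaless Hash document and stores it in a collection. This is not MongoMapper's API and not the demo app's code. `TinyCollection` and `Note` are hypothetical classes that only imitate the shape of the idea in memory.

```ruby
require "securerandom"

# Hypothetical stand-in for a document collection: an array of Hash documents.
class TinyCollection
  def initialize
    @docs = []
  end

  # Documents are schemaless: any keys are accepted and an id is assigned.
  def insert(doc)
    stored = { "_id" => SecureRandom.hex(12) }.merge(doc)
    @docs << stored
    stored
  end

  # Return every document whose fields match all of the criteria.
  def find(criteria = {})
    @docs.select { |doc| criteria.all? { |key, value| doc[key] == value } }
  end
end

# A minimal mapper: plain Ruby object in, Hash document out.
class Note
  attr_reader :title, :body, :tags

  def initialize(title, body, tags = [])
    @title = title
    @body = body
    @tags = tags
  end

  def to_document
    { "title" => title, "body" => body, "tags" => tags }
  end
end

notes = TinyCollection.new
notes.insert(Note.new("Hello", "First NoSQL note", ["demo"]).to_document)
# A document with a different shape can live in the same collection.
notes.insert("title" => "Untyped", "pinned" => true)
puts notes.find("title" => "Hello").first["body"]
```

The second insert is the point of the exercise: nothing in the collection enforces a schema, so the mapper class is the only place where the document's shape is written down.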

Cassandra
                                                                                                         Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
    •      Example:
            –    http://www.pikasoft.com/journal/2011/2/14/casi-casi-cassandra.html
    •      Backing Articles:
            –    http://www.25hoursaday.com/weblog/2008/05/23/SomeThoughtsOnTwittersAvailabilityProblems.aspx
    •      Code:

     Here's what the code for that broadcast might look like:

     # A tweeter has many followers
     class Tweeter < ActiveRecord::Base
       has_many :followers
     end

     # Each follower belongs to one tweeter
     class Follower < ActiveRecord::Base
       belongs_to :tweeter
     end

     All fine so far -- that's the twittery world we all live in. I can send out my breathless message of what
     I had for breakfast, and then Twitter picks it up and broadcasts the message from me (and all the
     messages from the other tweeters):

     # One query for all tweeters, then one more query per tweeter
     @tweeters = Tweeter.all
     @tweeters.each do |tweeter|
       @followers = tweeter.followers
       @followers.each do |follower|
         tweeter.broadcast_to :recipient => follower
       end
     end

     So here we query for all X tweeters, and then for each of them we run another query for
     their Y followers, followed by a broadcast to every one of those followers.

     Code smell! Fail Whale!!!
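
One commonly described way around the nested read loop is to denormalize: write each message into a per-follower timeline row at tweet time, so that reading a timeline is a single row lookup. The sketch below models that idea with plain Ruby hashes. It is an assumption about the general technique (often called fan-out on write), not the code from the linked post, and `TimelineStore` with its two hashes is a hypothetical stand-in for column families rather than a Cassandra client.

```ruby
# Hypothetical in-memory model of two column families:
#   followers: tweeter name  => list of follower names
#   timelines: follower name => list of [timestamp, tweeter, text] columns
class TimelineStore
  def initialize
    @followers = Hash.new { |hash, key| hash[key] = [] }
    @timelines = Hash.new { |hash, key| hash[key] = [] }
  end

  def follow(follower, tweeter)
    @followers[tweeter] << follower unless @followers[tweeter].include?(follower)
  end

  # Fan-out on write: one append per follower at tweet time.
  def tweet(tweeter, text, at = Time.now)
    @followers[tweeter].each do |follower|
      @timelines[follower] << [at, tweeter, text]
    end
  end

  # Reading is a single row lookup, returned newest first.
  def timeline(follower, limit = 20)
    @timelines[follower].sort_by { |column| column[0] }.reverse.first(limit)
  end
end

store = TimelineStore.new
store.follow("alice", "john")
store.follow("bob", "john")
store.tweet("john", "Eggs for breakfast", Time.at(1))
store.tweet("john", "Toast too", Time.at(2))
store.timeline("alice").each { |_at, who, text| puts "#{who}: #{text}" }
```

The cost moves from read time to write time: a tweet now does one append per follower, but no reader ever has to join tweeters to followers.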




Neo4J
                                                                            Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

    •      Example:
            –    http://www.pikasoft.com/journal/2011/1/21/graph-databases-and-star-wars.html
    •      Backing Articles:
            –    http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
    •      Code:


                             Play with this online at:
        Six Degrees of Kevin Bacon: http://jkr-blog.dyndns.org:9292/
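
The "six degrees" question is a shortest-path query over a graph, which is what a graph store is built to answer. As a rough illustration, the sketch below runs a breadth-first search over a plain Ruby Hash. It does not use Neo4J and is not the demo's code. The `COSTARS` graph and its "Actor A" to "Actor E" entries are made-up placeholder data, and `degrees_of_separation` is a hypothetical helper name.

```ruby
require "set"

# Made-up co-starring graph: each actor maps to the actors they appeared with.
COSTARS = {
  "Kevin Bacon" => ["Actor A", "Actor B"],
  "Actor A"     => ["Kevin Bacon", "Actor C"],
  "Actor B"     => ["Kevin Bacon"],
  "Actor C"     => ["Actor A", "Actor D"],
  "Actor D"     => ["Actor C"],
  "Actor E"     => []
}

# Breadth-first search: returns the shortest path as an array of names,
# or nil when the two actors are not connected.
def degrees_of_separation(graph, from, to)
  return [from] if from == to

  visited = Set.new([from])
  queue = [[from]]
  until queue.empty?
    path = queue.shift
    graph.fetch(path.last, []).each do |neighbor|
      next if visited.include?(neighbor)

      new_path = path + [neighbor]
      return new_path if neighbor == to

      visited << neighbor
      queue << new_path
    end
  end
  nil
end

path = degrees_of_separation(COSTARS, "Actor D", "Kevin Bacon")
puts "#{path.join(' -> ')} (#{path.length - 1} degrees)"
```

In a relational schema the same question needs one self-join per hop, which is the usual argument for keeping this kind of data in a graph store.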




MapReduce via Hadoop, Thrift and AWS

    •      Example:
            –    http://www.pikasoft.com/journal/2011/1/9/nosql-next-up-hadoop-and-cloudera.html

    •      Backing Articles:
            –    http://www.joelonsoftware.com/items/2006/08/01.html

    •      Code:

    Diagram: Map, then Reduce
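
Since the Code entry on this slide is empty, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word count as the example job. It does not use Hadoop. The method names `map_line`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical, and a real cluster would run the map and reduce steps as separate tasks on many machines.

```ruby
# Map: emit a [word, 1] pair for every word in one line of input.
def map_line(line)
  line.downcase.scan(/[a-z']+/).map { |word| [word, 1] }
end

# Shuffle: group the emitted values by key.
def shuffle(pairs)
  groups = Hash.new { |hash, key| hash[key] = [] }
  pairs.each { |key, value| groups[key] << value }
  groups
end

# Reduce: collapse each key's list of values into a single total.
def reduce_counts(key, values)
  [key, values.inject(0) { |sum, value| sum + value }]
end

# Run the three phases in order over an array of input lines.
def word_count(lines)
  pairs = lines.flat_map { |line| map_line(line) }
  Hash[shuffle(pairs).map { |key, values| reduce_counts(key, values) }]
end

lines = ["big data is fast data", "fast data wins"]
word_count(lines).sort.each { |word, count| puts "#{word}\t#{count}" }
```

Because `map_line` looks at one line at a time and `reduce_counts` looks at one key at a time, both phases can be split across machines without coordination, and only the shuffle needs to move data between them.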




MapReduce via Riak / MongoDB

  •     Example:
         –     http://www.control-alt-del.org/2011/09/14/fun-with-bloom-filters-using-riak-mapreduce/
         –     http://verboselogging.com/2010/03/22/super-mongodb-mapreduce-max-out
  •     Backing Articles:
         – MapReduce on Riak
                  •   http://wiki.basho.com/MapReduce.html
                  •   http://stackoverflow.com/questions/2123004/mapreduce-with-riak
                  •   http://www.readwriteweb.com/hack/2011/06/riak-pipe-rethinks-its-mapreduce.php
                  •   http://www.quora.com/What-are-the-advantages-and-limitations-of-MapReduce-backed-by-distributed-key-value-store
                      Riak
         – MapReduce on MongoDB
                  •   http://dllhell.net/2010/07/17/on-mapreduce-in-mongodb/
                  •   http://www.mongodb.org/display/DOCS/MapReduce
                  •   http://jonathanhui.com/mongodb-mapreduce
                  •   http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
                                                                                Source: http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/




Elastic MapReduce

    •      Example:
            –    http://www.commoncrawl.org/mapreduce-for-the-masses/
    •      Backing Articles:
            –    http://www.commoncrawl.org/mapreduce-for-the-masses/
    •      Code:




Summary




                    This Is Only The Beginning. With A
                Standard Platform We’ll See Richer Big Data
                       Discoveries Become Routine

                             The Solution Tools (Slide 9) Become
                             Straightforward if We Run Them on a
                                    Standard Architecture
                                                        “One man’s noise is another man’s data.”
                                                        ~ Bill Stensrud - InstantEncore




Contacts



      •     John Repko:               john.repko@pikasoft.com




                    http://pikasoft.s3.amazonaws.com/Pictures_at_an_Exhibition.pptx





Más contenido relacionado

Was ist angesagt?

From the Big Bang to Ecommerce, a journey in making sense of Big Data
From the Big Bang to Ecommerce, a journey in making sense of Big DataFrom the Big Bang to Ecommerce, a journey in making sense of Big Data
From the Big Bang to Ecommerce, a journey in making sense of Big DataPatrick Deglon
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Gregory Piatetsky-Shapiro
 
Intro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataIntro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataPaco Nathan
 
A Journey into bringing (Artificial) Intelligence to the Enterprise
A Journey into bringing (Artificial) Intelligence to the EnterpriseA Journey into bringing (Artificial) Intelligence to the Enterprise
A Journey into bringing (Artificial) Intelligence to the EnterprisePatrick Deglon
 
Knowing How People Are Playing Your Game Gives You the Winning Hand
Knowing How People Are Playing Your Game Gives You the Winning HandKnowing How People Are Playing Your Game Gives You the Winning Hand
Knowing How People Are Playing Your Game Gives You the Winning HandWilliam Grosso
 
From the Big Bang to the New Economy, a journey in making sense of Big Data
From the Big Bang to the New Economy, a journey in making sense of Big DataFrom the Big Bang to the New Economy, a journey in making sense of Big Data
From the Big Bang to the New Economy, a journey in making sense of Big DataPatrick Deglon
 
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...BigMine
 
Wavelength USA 2012 Making The Web Work For You
Wavelength USA 2012 Making The Web Work For YouWavelength USA 2012 Making The Web Work For You
Wavelength USA 2012 Making The Web Work For YouWavelength
 

Was ist angesagt? (10)

From the Big Bang to Ecommerce, a journey in making sense of Big Data
From the Big Bang to Ecommerce, a journey in making sense of Big DataFrom the Big Bang to Ecommerce, a journey in making sense of Big Data
From the Big Bang to Ecommerce, a journey in making sense of Big Data
 
Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?Public Data and Data Mining Competitions - What are Lessons?
Public Data and Data Mining Competitions - What are Lessons?
 
Analytics Education in the era of Big Data
Analytics Education in the era of Big DataAnalytics Education in the era of Big Data
Analytics Education in the era of Big Data
 
Intro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataIntro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big Data
 
A Journey into bringing (Artificial) Intelligence to the Enterprise
A Journey into bringing (Artificial) Intelligence to the EnterpriseA Journey into bringing (Artificial) Intelligence to the Enterprise
A Journey into bringing (Artificial) Intelligence to the Enterprise
 
Knowing How People Are Playing Your Game Gives You the Winning Hand
Knowing How People Are Playing Your Game Gives You the Winning HandKnowing How People Are Playing Your Game Gives You the Winning Hand
Knowing How People Are Playing Your Game Gives You the Winning Hand
 
13 jun13 gaming-webinar
13 jun13 gaming-webinar13 jun13 gaming-webinar
13 jun13 gaming-webinar
 
From the Big Bang to the New Economy, a journey in making sense of Big Data
From the Big Bang to the New Economy, a journey in making sense of Big DataFrom the Big Bang to the New Economy, a journey in making sense of Big Data
From the Big Bang to the New Economy, a journey in making sense of Big Data
 
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
Challenging Problems for Scalable Mining of Heterogeneous Social and Informat...
 
Wavelength USA 2012 Making The Web Work For You
Wavelength USA 2012 Making The Web Work For YouWavelength USA 2012 Making The Web Work For You
Wavelength USA 2012 Making The Web Work For You
 

Ähnlich wie Ruby, rails, no sql and big data

Using big data_to_your_advantage
Using big data_to_your_advantageUsing big data_to_your_advantage
Using big data_to_your_advantageJohn Repko
 
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMETHE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMEGigaom
 
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Connotate
 
Smart Data Webinar: Advances in Natural Language Processing II - NL Generation
Smart Data Webinar: Advances in Natural Language Processing II - NL GenerationSmart Data Webinar: Advances in Natural Language Processing II - NL Generation
Smart Data Webinar: Advances in Natural Language Processing II - NL GenerationDATAVERSITY
 
Implementing Big Data, NoSQL, & Hadoop - Bigger Is (Usually) Better
Implementing Big Data, NoSQL, & Hadoop - Bigger Is (Usually) BetterImplementing Big Data, NoSQL, & Hadoop - Bigger Is (Usually) Better
Implementing Big Data, NoSQL, & Hadoop - Bigger Is (Usually) BetterDATAVERSITY
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureOdinot Stanislas
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big DecisionsInnoTech
 
Big data and bi best practices slidedeck
Big data and bi best practices slidedeckBig data and bi best practices slidedeck
Big data and bi best practices slidedeckActian Corporation
 
Big Data and BI Best Practices
Big Data and BI Best PracticesBig Data and BI Best Practices
Big Data and BI Best PracticesYellowfin
 
NoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesNoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesVishy Poosala
 
Internet of Things: Lightning Round, Hite
Internet of Things: Lightning Round, HiteInternet of Things: Lightning Round, Hite
Internet of Things: Lightning Round, HiteGovLoop
 
Big data overview external
Big data overview externalBig data overview external
Big data overview externalBrett Colbert
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...BigMine
 
Think Big Analytics AWS for Financial Services
Think Big Analytics AWS for Financial ServicesThink Big Analytics AWS for Financial Services
Think Big Analytics AWS for Financial ServicesAmazon Web Services
 
BI Past Present and Future - 2016 Persepective
BI Past Present and Future - 2016 PersepectiveBI Past Present and Future - 2016 Persepective
BI Past Present and Future - 2016 PersepectiveGary Nuttall MBCS CITP
 
Hadoop is Happening
Hadoop is HappeningHadoop is Happening
Hadoop is HappeningPrecisely
 
2016 06-07 data driven production
2016 06-07 data driven production2016 06-07 data driven production
2016 06-07 data driven productionMark Reynolds
 
Data Science towards the Digital Enterprise
Data Science towards the Digital EnterpriseData Science towards the Digital Enterprise
Data Science towards the Digital EnterpriseJake Bouma
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalIIIT Allahabad
 

Ähnlich wie Ruby, rails, no sql and big data (20)

Using big data_to_your_advantage
Using big data_to_your_advantageUsing big data_to_your_advantage
Using big data_to_your_advantage
 
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMETHE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
 
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
 
Smart Data Webinar: Advances in Natural Language Processing II - NL Generation
Smart Data Webinar: Advances in Natural Language Processing II - NL GenerationSmart Data Webinar: Advances in Natural Language Processing II - NL Generation

Ruby, Rails, NoSQL and Big Data

  • 1. Pictures at an Exhibition Ruby, Rails, NoSQL and Big Data John Repko John Repko -- Pikasoft LLC
  • 2. Agenda The Goal: Exploring Big Data with NoSQL and Ruby on Rails Just Two Solutions – Here’s How We Get There • Key-Value Data Stores – Redis – Riak • Document Data Stores – MongoDB – Cassandra • Graph Data Stores – Neo4J • MapReduce – Through Hadoop – Through Riak / MongoDB – Through Elastic Mapreduce
  • 3. So How Did We Get to Big Data Anyway? Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg Source: http://www.startribune.com/sports/164830346.html Big Data Is Not Just About “Big” Data … It’s About FAST Data! (http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)
  • 4. Why is Everyone Diving into Big Data? There Are Big Data Breakthroughs Everywhere…
      • Google Wins the Search Market – Massively parallel web searches with results back in a tenth of a second
      • Progressive’s Instant “Overnight” rate quotes – Progressive creates an insurance quote for every car and truck in the US – every night
      • “Watson” Wins on Jeopardy – Beat the best Jeopardy players of all time
      Source: https://newshour.s3.amazonaws.com/photos/2011/02/16/kayjay_1_blog_main_horizontal.jpg
  • 5. Exploring Big Data: Big Data frequently provides solutions to a common set of problems. Source: http://www.slideshare.net/cloudera/20100806-cloudera-10-hadoopable-problems-webinar-4931616 These appear to be “10 Problems” but are really only “2 Problems”
  • 6. Exploring Big Data: The variety of Big Data wins in the press falls into just two solution patterns
      • Foresight – We are presented with a pattern. What has the outcome been when we’ve seen similar patterns in the past?
      • Hindsight – We are presented with an outcome. What pattern of events anticipated the outcome in the past?
      You Don’t Need Dozens Of Solution Approaches For Big Data – Just Two
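One way to picture the two patterns is as two lookups over the same history of (pattern, outcome) records. The sketch below is a minimal illustration in plain Ruby; the `HISTORY` records, the field names, and both method names are hypothetical and do not come from the slides.

```ruby
# Hypothetical history of observed (pattern, outcome) pairs.
HISTORY = [
  { pattern: "late_payments", outcome: "default" },
  { pattern: "late_payments", outcome: "default" },
  { pattern: "late_payments", outcome: "repaid" },
  { pattern: "on_time",       outcome: "repaid" },
  { pattern: "maxed_credit",  outcome: "default" }
].freeze

# Foresight: given a pattern, tally the outcomes that followed it in the past.
def foresight(history, pattern)
  history.select { |r| r[:pattern] == pattern }
         .each_with_object(Hash.new(0)) { |r, counts| counts[r[:outcome]] += 1 }
end

# Hindsight: given an outcome, tally the patterns that preceded it in the past.
def hindsight(history, outcome)
  history.select { |r| r[:outcome] == outcome }
         .each_with_object(Hash.new(0)) { |r, counts| counts[r[:pattern]] += 1 }
end

p foresight(HISTORY, "late_payments")
p hindsight(HISTORY, "default")
```

Both lookups read the same data and differ only in which field is the key and which is tallied, which is one reading of why the deck treats ten problems as two.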
  • 7. Exploring Big Data: In this light, let’s take a look at the “10 Hadoop-able Problems” of Big Data. Summary – 10 Common Hadoop-able Problems* (slide legend: Foresight / Hindsight)
      1. Modeling True Risk – What past patterns led to success or default?
      2. Customer Churn Analysis – What do customer churn patterns predict about our products and markets?
      3. Recommendation Engine – We have search terms – what have the results been from similar searches in the past?
      4. Ad Targeting – We have profile information – what offers have led to sales for similar profiles in the past?
      5. PoS Transaction Analysis – We have your purchase history – what deals might we offer in the future?
  • 8. Exploring Big Data: These two solution types apply generally to the Hadoop-able problems. Summary – 10 Common Hadoop-able Problems (slide legend: Foresight / Hindsight)
      6. Analyzing Data Logs to Forecast Events – We have your logs – what pattern of events has anticipated failures before?
      7. Threat Analysis – We have a specific event – what results have we seen from similar threats in the past?
      8. Trade Surveillance – Does this parcel raise any alarms, based on our history of past parcel-tracking?
      9. Search Quality – We have a set of search terms – what have similar searches succeeded in finding in the past?
      10. Data “Sandbox” – We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?
  • 9. The Big Data Platform Provides Rich Analytics Tools. Key Big Data Analytics Solution Patterns: 1. Predictive Modeling 2. Data Visualization 3. Cluster Partitioning 4. Collaborative Filtering 5. Outlier Analysis 6. A/B Testing 7. Markov Chains 8. Bloom Filters
  • 10. Exploring Big Data: With Just Two Standard Solution Models We Can Solve Most Big Data Problems. The Key Is To Shape Big Data Into A Standard Platform Onto Which We Can Apply These Analytics Tools… “It is not the technology that creates a competitive edge, but the management process that exploits technology.” ~ Shaping the Future – Peter Keen (1991)
  • 11. Agenda The Goal: Exploring Big Data Just Two Solutions – Here’s How We Get There • Key-Value Data Stores – Redis – Riak • Document Data Stores – MongoDB – Cassandra • Graph Data Stores – Neo4J • MapReduce – Through Hadoop – Through Riak / MongoDB – Through Elastic Mapreduce
  • 12. The Core Development Platform. Core Platform: Ubuntu 12.04 + AWS
      • Clean install of 12.04 and all latest updates
      • sudo apt-get update
      • sudo apt-get upgrade
      • sudo apt-get dist-upgrade
      • sudo apt-get install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g zlib1g-dev libyaml-dev libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison subversion
      • sudo apt-get install libcurl3 libcurl3-gnutls libcurl4-openssl-dev
      • bash -s stable < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)
      • source ~/.bashrc
      • gem update --system (Latest version currently installed)
      • rvm ruby-1.9.2-p290@rails31 --create --default
      • sudo apt-get install nodejs
      • gem install rake
      • gem install rails -v=3.1.3
  • 13. Agenda The Goal: Exploring Big Data Just Two Solutions – Here’s How We Get There • Key-Value Data Stores – Redis – Riak • Document Data Stores – MongoDB – Cassandra • Graph Data Stores – Neo4J • MapReduce – Through Hadoop – Through Riak – Through Elastic Mapreduce
  • 14. Redis. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
      • Example: http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html
      • Backing Articles: http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
      • Code: http://www.pikasoft.com/journal/2011/1/2/a-quick-redis-key-value-example-for-the-holidays.html
      The good news is, we've already got our base image, and adding a new Redis data store and example app to it only took about an hour. As before, you can play with the URL-shortener at Redis URL Shortener, and you can download and play with the code for the application at: Redis URL Shortener Source Code.
      Play with this online at: http://jkr-blog.dyndns.org:3001/mini_urls
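As a rough illustration of why a URL shortener maps so naturally onto a key-value store, here is a minimal sketch in plain Ruby. It is not the code from the linked post: `FakeKV` is a hypothetical in-memory stand-in whose three methods mirror the Redis commands INCR, SET and GET, and the `UrlShortener` class and the `mini_url:` key prefix are assumptions made for this example.

```ruby
# In-memory stand-in for a key-value store. The three methods mirror the
# Redis commands INCR, SET and GET; the class itself is hypothetical.
class FakeKV
  def initialize
    @data = {}
  end

  # Increment the integer stored at key (starting from 0) and return it.
  def incr(key)
    @data[key] = @data.fetch(key, "0").to_i.succ.to_s
    @data[key].to_i
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  def get(key)
    @data[key]
  end
end

class UrlShortener
  def initialize(store)
    @store = store
  end

  # Allocate the next counter value, encode it in base 36 as the short code,
  # and store code -> long URL.
  def shorten(long_url)
    code = @store.incr("mini_url:next_id").to_s(36)
    @store.set("mini_url:#{code}", long_url)
    code
  end

  # Look up the long URL for a code; returns nil when the code is unknown.
  def expand(code)
    @store.get("mini_url:#{code}")
  end
end

shortener = UrlShortener.new(FakeKV.new)
code = shortener.shorten("http://www.pikasoft.com/journal/")
puts code
puts shortener.expand(code)
```

Because the shortener only calls `incr`, `set` and `get`, a real Redis client object exposing methods with those names could in principle be passed in place of `FakeKV`; that substitution is untested here.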
  • 15. Riak. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
      • Example: http://www.pikasoft.com/journal/2012/1/15/you-only-live-twice-basho-and-riak.html
      • Backing Articles:
          – http://jit.nuance9.com/2010/07/ruby-192-rails-3-riak-and-ripple.html
          – http://jbbarth.com/archives/2011/4/23/basic_usage_of_riak_in/
      • Code:
  • 16. Agenda The Goal: Exploring Big Data Just Two Solutions – Here’s How We Get There • Key-Value Data Stores – Redis – Riak • Document Data Stores – MongoDB – Cassandra • Graph Data Stores – Neo4J • MapReduce – Through Hadoop – Through Riak / MongoDB – Through Elastic Mapreduce
  • 17. MongoDB. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
      • Example: http://www.pikasoft.com/journal/2010/7/31/nosql-on-the-cloud-our-first-application.html
      • Backing Articles: http://www.mongodb.org/display/DOCS/Building+for+Linux
      • Code: http://www.pikasoft.com/journal/2010/8/16/why-our-little-nosql-app-matters.html
      So let's sum up -- after a handful of posts and a small but still sorrowful amount of command-line and rails code, we've managed to accomplish the following "Hello World" tasks in NoSQL on the cloud:
      • Created a cloud account
      • Got our first app created, and saw it in a browser on the web
      • Loaded up real development environments (Ruby/Rails we added, Java we got for free)
      • Added a stronger app server (thin >> webrick) and a stronger web server (nginx >> almost anything)
      • Added our first NoSQL data store (MongoDB) and mapping software to simulate ActiveRecord in NoSQL
      • Created a little NoSQL app to show all this, and made it visible through a dynamic DNS address: Rails Mongo Notes Example
      Just to wrap the little app up: I updated John Nunemaker's MongoMapper demo app to work with Rails3 and the cloud, and if you like you can take a look at the code for it here: Rails Mongo Code.
      Play with this online at: http://jkr-code.dyndns.org:3000/notes
  • 18. Cassandra. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
      • Example: http://www.pikasoft.com/journal/2011/2/14/casi-casi-cassandra.html
      • Backing Articles: http://www.25hoursaday.com/weblog/2008/05/23/SomeThoughtsOnTwittersAvailabilityProblems.aspx
      • Code: Here's what the code for that broadcast might look like:

          # Tweeter
          class Tweeter < ActiveRecord::Base
            has_many :followers
          end

          # Follower
          class Follower < ActiveRecord::Base
            belongs_to :tweeter
          end

      All fine so far -- that's the twittery world we all live in. I can send out my breathless message of what I had for breakfast, and then Twitter picks it up and broadcasts the message from me (and all the messages from the other tweeters):

          # Load every tweeter
          @tweeters = Tweeter.all
          @tweeters.each do |tweeter|
            # One more query per tweeter to load that tweeter's followers
            @followers = tweeter.followers
            @followers.each do |follower|
              # broadcast_to is the post's illustrative method, not a Rails API
              tweeter.broadcast_to :recipient => follower
            end
          end

      So here we're going to do a query for each of the X tweeters, and for them we'll do another query for each of their Y followers. Code smell! Fail Whale!!!
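To make the cost of that nested loop concrete, the following sketch counts lookups for a naive broadcast against a batched one. It is plain Ruby with no database: `CountingStore`, its method names, and the sample `DATA` are hypothetical stand-ins invented for this example, not ActiveRecord or Cassandra APIs.

```ruby
# Hypothetical in-memory store that counts how many "queries" it serves.
class CountingStore
  attr_reader :queries

  def initialize(followers_by_tweeter)
    @followers_by_tweeter = followers_by_tweeter
    @queries = 0
  end

  def all_tweeters
    @queries += 1
    @followers_by_tweeter.keys
  end

  def followers_of(tweeter)
    @queries += 1
    @followers_by_tweeter.fetch(tweeter, [])
  end

  # One batched lookup for many tweeters at once.
  def followers_of_all(tweeters)
    @queries += 1
    tweeters.each_with_object({}) { |t, h| h[t] = @followers_by_tweeter.fetch(t, []) }
  end
end

DATA = { "alice" => %w[bob carol], "bob" => %w[alice], "carol" => [] }

# Naive broadcast: one query for the tweeters, then one more per tweeter.
naive = CountingStore.new(DATA)
naive_deliveries = naive.all_tweeters.flat_map do |tweeter|
  naive.followers_of(tweeter).map { |follower| [tweeter, follower] }
end

# Batched broadcast: two queries in total, however many tweeters there are.
batched = CountingStore.new(DATA)
tweeters = batched.all_tweeters
followers = batched.followers_of_all(tweeters)
batched_deliveries = tweeters.flat_map do |tweeter|
  followers[tweeter].map { |follower| [tweeter, follower] }
end

puts "naive queries:   #{naive.queries}"
puts "batched queries: #{batched.queries}"
```

Both versions produce the same deliveries; only the number of round trips differs, and in the naive form it grows with the number of tweeters. In Rails, eager loading with `includes(:followers)` is the usual way to get the batched shape.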
  • 19. Agenda Exploring Big Data Just Two Solutions – Here’s How We Get There • Key-Value Data Stores – Redis – Riak • Document Data Stores – MongoDB – Cassandra • Graph Data Stores – Neo4J • MapReduce – Through Hadoop – Through Riak / MongoDB – Through Elastic Mapreduce
  • 20. Neo4J. Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
      • Example: http://www.pikasoft.com/journal/2011/1/21/graph-databases-and-star-wars.html
      • Backing Articles: http://purevirtual.de/2010/04/url-shortener-with-redis-and-rails3/
      • Code:
      Play with this online at: Six Degrees of Kevin Bacon = http://jkr-blog.dyndns.org:9292/
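The "Six Degrees of Kevin Bacon" demo is a shortest-path question over a graph. A graph database would run that traversal inside the store; the sketch below only illustrates the underlying idea with a breadth-first search in plain Ruby. It does not use the Neo4J API, and the `GRAPH` data, the placeholder actor names, and the `shortest_path` method are all hypothetical.

```ruby
# Hypothetical co-starring graph as an adjacency list (undirected edges are
# listed in both directions). The names are placeholders.
GRAPH = {
  "Kevin Bacon" => ["Actor A", "Actor B"],
  "Actor A"     => ["Kevin Bacon", "Actor C"],
  "Actor B"     => ["Kevin Bacon"],
  "Actor C"     => ["Actor A", "Actor D"],
  "Actor D"     => ["Actor C"],
  "Actor E"     => []
}.freeze

# Breadth-first search: returns the shortest path from start to goal as an
# array of node names, or nil when the two are not connected.
def shortest_path(graph, start, goal)
  return [start] if start == goal

  previous = { start => nil }
  queue = [start]

  until queue.empty?
    node = queue.shift
    graph.fetch(node, []).each do |neighbor|
      next if previous.key?(neighbor)

      previous[neighbor] = node
      if neighbor == goal
        # Walk the predecessor links back to the start node.
        path = [goal]
        path.unshift(previous[path.first]) while previous[path.first]
        return path
      end
      queue << neighbor
    end
  end

  nil
end

path = shortest_path(GRAPH, "Actor D", "Kevin Bacon")
puts path.join(" -> ")
puts "degrees of separation: #{path.length - 1}"
```

Breadth-first search visits nodes in order of distance from the start, so the first time the goal is reached the path found is a shortest one.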
  • 21. Agenda Exploring Big Data Just Two Solutions – Here’s How We Get There • Key-Value Data Stores – Redis – Riak • Document Data Stores – MongoDB – Cassandra • Graph Data Stores – Neo4J • MapReduce – Through Hadoop – Through Riak – Through Elastic Mapreduce
  • 22. MapReduce via Hadoop, Thrift and AWS
      • Example: http://www.pikasoft.com/journal/2011/1/9/nosql-next-up-hadoop-and-cloudera.html
      • Backing Articles: http://www.joelonsoftware.com/items/2006/08/01.html
      • Code:
      [Figure labels on slide: Map, Reduce]
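For readers new to the model, the map, shuffle and reduce stages can be sketched as a single-process word count in plain Ruby. This is only an illustration of the programming model, not Hadoop, Thrift or the code from the linked post; the method names and the sample input are assumptions made for this example.

```ruby
# Map phase: turn one input line into a list of [word, 1] pairs.
def map_line(line)
  line.downcase.scan(/[a-z0-9']+/).map { |word| [word, 1] }
end

# Shuffle phase: group the emitted values by key.
def shuffle(pairs)
  pairs.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(key, value), groups|
    groups[key] << value
  end
end

# Reduce phase: collapse the values for one key into a single total.
def reduce_counts(key, values)
  [key, values.inject(0) { |sum, v| sum + v }]
end

def word_count(lines)
  pairs = lines.flat_map { |line| map_line(line) }
  totals = shuffle(pairs).map { |key, values| reduce_counts(key, values) }
  Hash[totals]
end

lines = ["Big Data is fast data", "fast data wins"]
p word_count(lines)
```

In a real cluster the map calls run in parallel on separate chunks of input and the shuffle moves data between machines; here all three stages run in one process so the data flow is easy to follow.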
  • 23. MapReduce via Riak / MongoDB
      • Example:
          – http://www.control-alt-del.org/2011/09/14/fun-with-bloom-filters-using-riak-mapreduce/
          – http://verboselogging.com/2010/03/22/super-mongodb-mapreduce-max-out
      • Backing Articles:
          – MapReduce on Riak
              • http://wiki.basho.com/MapReduce.html
              • http://stackoverflow.com/questions/2123004/mapreduce-with-riak
              • http://www.readwriteweb.com/hack/2011/06/riak-pipe-rethinks-its-mapreduce.php
              • http://www.quora.com/What-are-the-advantages-and-limitations-of-MapReduce-backed-by-distributed-key-value-store Riak
          – MapReduce on MongoDB
              • http://dllhell.net/2010/07/17/on-mapreduce-in-mongodb/
              • http://www.mongodb.org/display/DOCS/MapReduce
              • http://jonathanhui.com/mongodb-mapreduce
              • http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
      Source: http://blog.boxedice.com/2010/06/21/map-reduce-and-mongodb/
  • 24. Elastic MapReduce
      • Example: http://www.commoncrawl.org/mapreduce-for-the-masses/
      • Backing Articles: http://www.commoncrawl.org/mapreduce-for-the-masses/
      • Code:
  • 25. Summary: This Is Only The Beginning. With A Standard Platform We’ll See Richer Big Data Discoveries Become Routine. The Solution Tools (Slide 9) Become Straightforward if We Run Them on a Standard Architecture. “One man’s noise is another man’s data.” ~ Bill Stensrud - InstantEncore
  • 26. Contacts • John Repko: john.repko@pikasoft.com http://pikasoft.s3.amazonaws.com/Pictures_at_an_Exhibition.pptx