SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
Ruby for the soul of
  BigData Nerds
Who Am I?
●
    Engineering Team Lead
    Analytics & Data Platforms @ Viki.com




●
    Founder of http://BigData.SG




●
    Contributor to fluentd, pfeed, cartographer, watir
BigData & Its Challenges
"big data" is when the size of the data itself becomes part of the problem
                                                        - Mike Loukides



●
  Twitter produces over 230 million tweets per day
●
  Wal-Mart is logging one million transactions per hour
●
  Facebook creates over 30 billion pieces of content
ranging from web links, news, blogs, photo
Everyone has a big data problem
Evolving Trends

       Batch Processing
        Hadoop , HPCC, Google BigQuery



      Stream Processing
         STORM (Twitter) & S4 (Yahoo)
Common Engineering Challenges

●
    Data Collection
●
    Filtering / Segmentation
●
    Data Storage
●
    Analysis
●
    Visualization
●
    Prediction / Extrapolation
Data Collection + Filtering /
Segmentation




           http://fluentd.org/
Data Collection + Filtering /
Segmentation
                You send events as:
                Http://domain:8080/namespace?key1=value1&key2=value2



                Fluent forwards the data as:
                <timestamp> <namespace> {key1:value1,key2:value2}




           http://fluentd.org/
Screencast:
http://www.bigdata.sg/videos/fluentd/
Storage

          Hadoop HDFS

           OpenTSDB
           (http://opentsdb.net)



          SciDB (DMAS)
Analysis



   Hadoop Streaming (Ruby)

  Hadoop Hive (Using rbhive)
Visualization
          Custom Dashboard
                   (Rails + Google Charts / d3.js)




   Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com
Stream Computing
What is STORM?
STORM terminology
●
 Streams
●
 Spouts
●
 Bolts
●
 Topologies
RedStorm
        (https://github.com/colinsurprenant/redstorm)


$ rvm use jruby-1.6.3
$ bundle install redstorm
$ bundle exec redstorm install
Visualizing average bandwidth
experienced by users while
watching videos on viki.com across
the globe.
Thank you!




    Let's stay in touch :)
●
    Signup for my newsletter at http://parolkar.com
●
    Visit BigData.SG Meetup in Singapore.

Weitere ähnliche Inhalte

Was ist angesagt?

Using mruby in the nosql database Avocadodb
Using mruby in the nosql database AvocadodbUsing mruby in the nosql database Avocadodb
Using mruby in the nosql database Avocadodbavocadodb
 
SeaDataCloud - Developing the SeaDataCloud Virtual Research Environment (VRE)
SeaDataCloud - Developing the SeaDataCloud Virtual Research Environment (VRE)SeaDataCloud - Developing the SeaDataCloud Virtual Research Environment (VRE)
SeaDataCloud - Developing the SeaDataCloud Virtual Research Environment (VRE)EUDAT
 
Overview of Hadoop in 2010 and what's coming up in 2011
Overview of Hadoop in 2010 and what's coming up in 2011Overview of Hadoop in 2010 and what's coming up in 2011
Overview of Hadoop in 2010 and what's coming up in 2011Dan Harvey
 
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 2)
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 2)FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 2)
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 2)FIWARE
 
Wikidata and Semantic MediaWiki
Wikidata and Semantic MediaWiki Wikidata and Semantic MediaWiki
Wikidata and Semantic MediaWiki Bernhard Krabina
 
Proteon - DevOps Live 2019 - OpenShift Pitfalls
Proteon - DevOps Live 2019 - OpenShift PitfallsProteon - DevOps Live 2019 - OpenShift Pitfalls
Proteon - DevOps Live 2019 - OpenShift Pitfallsproteon-openshift-services
 
Chaos engineering for start ups and sm es
Chaos engineering for start ups and sm esChaos engineering for start ups and sm es
Chaos engineering for start ups and sm esJagdeep Singh
 
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...FIWARE
 
Hungarian ClusterGrid and its applications
Hungarian ClusterGrid and its applicationsHungarian ClusterGrid and its applications
Hungarian ClusterGrid and its applicationsFerenc Szalai
 
Webinar: Building a multi-cloud Kubernetes storage on GitLab
Webinar: Building a multi-cloud Kubernetes storage on GitLabWebinar: Building a multi-cloud Kubernetes storage on GitLab
Webinar: Building a multi-cloud Kubernetes storage on GitLabMayaData Inc
 
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysiaFOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysiaLinuxmalaysia Malaysia
 
Expert Roundtable: The Future of Metadata After Hive Metastore
Expert Roundtable: The Future of Metadata After Hive MetastoreExpert Roundtable: The Future of Metadata After Hive Metastore
Expert Roundtable: The Future of Metadata After Hive MetastorelakeFS
 
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019Codemotion
 
FIWARE Wednesday Webinars - Architecting Your Smart Solution Powered by FIWARE
FIWARE Wednesday Webinars - Architecting Your Smart Solution Powered by FIWAREFIWARE Wednesday Webinars - Architecting Your Smart Solution Powered by FIWARE
FIWARE Wednesday Webinars - Architecting Your Smart Solution Powered by FIWAREFIWARE
 
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 1)
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 1)FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 1)
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 1)FIWARE
 
Accelerating Spark with Kubernetes
Accelerating Spark with KubernetesAccelerating Spark with Kubernetes
Accelerating Spark with KubernetesAlluxio, Inc.
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user groupAdam Doyle
 

Was ist angesagt? (20)

Using mruby in the nosql database Avocadodb
Using mruby in the nosql database AvocadodbUsing mruby in the nosql database Avocadodb
Using mruby in the nosql database Avocadodb
 
SeaDataCloud - Developing the SeaDataCloud Virtual Research Environment (VRE)
SeaDataCloud - Developing the SeaDataCloud Virtual Research Environment (VRE)SeaDataCloud - Developing the SeaDataCloud Virtual Research Environment (VRE)
SeaDataCloud - Developing the SeaDataCloud Virtual Research Environment (VRE)
 
Overview of Hadoop in 2010 and what's coming up in 2011
Overview of Hadoop in 2010 and what's coming up in 2011Overview of Hadoop in 2010 and what's coming up in 2011
Overview of Hadoop in 2010 and what's coming up in 2011
 
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 2)
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 2)FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 2)
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 2)
 
Wikidata and Semantic MediaWiki
Wikidata and Semantic MediaWiki Wikidata and Semantic MediaWiki
Wikidata and Semantic MediaWiki
 
Redis IU
Redis IURedis IU
Redis IU
 
Proteon - DevOps Live 2019 - OpenShift Pitfalls
Proteon - DevOps Live 2019 - OpenShift PitfallsProteon - DevOps Live 2019 - OpenShift Pitfalls
Proteon - DevOps Live 2019 - OpenShift Pitfalls
 
Chaos engineering for start ups and sm es
Chaos engineering for start ups and sm esChaos engineering for start ups and sm es
Chaos engineering for start ups and sm es
 
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
FIWARE Wednesday Webinars - Performing Big Data Analysis Using Cosmos With Sp...
 
Hungarian ClusterGrid and its applications
Hungarian ClusterGrid and its applicationsHungarian ClusterGrid and its applications
Hungarian ClusterGrid and its applications
 
Webinar: Building a multi-cloud Kubernetes storage on GitLab
Webinar: Building a multi-cloud Kubernetes storage on GitLabWebinar: Building a multi-cloud Kubernetes storage on GitLab
Webinar: Building a multi-cloud Kubernetes storage on GitLab
 
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysiaFOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
FOSSDAY@IIUM 2012 Cloud Presentation By LinuxMalaysia
 
Expert Roundtable: The Future of Metadata After Hive Metastore
Expert Roundtable: The Future of Metadata After Hive MetastoreExpert Roundtable: The Future of Metadata After Hive Metastore
Expert Roundtable: The Future of Metadata After Hive Metastore
 
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
Miguel Angel Fajardo - NewSQL: the magic wand of data - Codemotion Rome 2019
 
FIWARE Wednesday Webinars - Architecting Your Smart Solution Powered by FIWARE
FIWARE Wednesday Webinars - Architecting Your Smart Solution Powered by FIWAREFIWARE Wednesday Webinars - Architecting Your Smart Solution Powered by FIWARE
FIWARE Wednesday Webinars - Architecting Your Smart Solution Powered by FIWARE
 
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 1)
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 1)FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 1)
FIWARE Wednesday Webinars - The Use of DDS Middleware in Robotics (Part 1)
 
Accelerating Spark with Kubernetes
Accelerating Spark with KubernetesAccelerating Spark with Kubernetes
Accelerating Spark with Kubernetes
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
 
Maximizing the Impact of Institutional Knowledge Using DSpace
Maximizing the Impact of Institutional Knowledge Using DSpaceMaximizing the Impact of Institutional Knowledge Using DSpace
Maximizing the Impact of Institutional Knowledge Using DSpace
 

Ähnlich wie Ruby for soul of BigData Nerds

Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
 
Strategies for Context Data Persistence
Strategies for Context Data PersistenceStrategies for Context Data Persistence
Strategies for Context Data PersistenceFIWARE
 
How @twitterhadoop chose google cloud
How @twitterhadoop chose google cloudHow @twitterhadoop chose google cloud
How @twitterhadoop chose google cloudlohitvijayarenu
 
FIWARE Wednesday Webinars - Strategies for Context Data Persistence
FIWARE Wednesday Webinars - Strategies for Context Data PersistenceFIWARE Wednesday Webinars - Strategies for Context Data Persistence
FIWARE Wednesday Webinars - Strategies for Context Data PersistenceFIWARE
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
Thrombus Training Dec. 2013
Thrombus Training Dec. 2013Thrombus Training Dec. 2013
Thrombus Training Dec. 2013CREATIS
 
Build an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsBuild an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsShawn Zhu
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaborationJulien Pivotto
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...Marcin Bielak
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big AnalyticsAjay Ohri
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryMárton Kodok
 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramSkillspeed
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?CodePolitan
 
Drupal and the semantic web - SemTechBiz 2012
Drupal and the semantic web - SemTechBiz 2012Drupal and the semantic web - SemTechBiz 2012
Drupal and the semantic web - SemTechBiz 2012scorlosquet
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions Yugabyte
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Mark Rittman
 

Ähnlich wie Ruby for soul of BigData Nerds (20)

Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Strategies for Context Data Persistence
Strategies for Context Data PersistenceStrategies for Context Data Persistence
Strategies for Context Data Persistence
 
Big Trends in Big Data
Big Trends in Big DataBig Trends in Big Data
Big Trends in Big Data
 
How @twitterhadoop chose google cloud
How @twitterhadoop chose google cloudHow @twitterhadoop chose google cloud
How @twitterhadoop chose google cloud
 
FIWARE Wednesday Webinars - Strategies for Context Data Persistence
FIWARE Wednesday Webinars - Strategies for Context Data PersistenceFIWARE Wednesday Webinars - Strategies for Context Data Persistence
FIWARE Wednesday Webinars - Strategies for Context Data Persistence
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
Thrombus Training Dec. 2013
Thrombus Training Dec. 2013Thrombus Training Dec. 2013
Thrombus Training Dec. 2013
 
Build an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsBuild an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data Scientists
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaboration
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big Analytics
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x Program
 
Big data nyu
Big data nyuBig data nyu
Big data nyu
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Drupal and the semantic web - SemTechBiz 2012
Drupal and the semantic web - SemTechBiz 2012Drupal and the semantic web - SemTechBiz 2012
Drupal and the semantic web - SemTechBiz 2012
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...
 

Mehr von Abhishek Parolkar

MyDuniya enterprise offering
MyDuniya enterprise offeringMyDuniya enterprise offering
MyDuniya enterprise offeringAbhishek Parolkar
 
Nirvigna - Rendering Hi-Res graphics on commodity cluster
Nirvigna - Rendering Hi-Res graphics on commodity clusterNirvigna - Rendering Hi-Res graphics on commodity cluster
Nirvigna - Rendering Hi-Res graphics on commodity clusterAbhishek Parolkar
 
Building SuperComputers @ Home
Building SuperComputers @ HomeBuilding SuperComputers @ Home
Building SuperComputers @ HomeAbhishek Parolkar
 
Building SMS Applications with Ruby-SMPP
Building SMS Applications with Ruby-SMPPBuilding SMS Applications with Ruby-SMPP
Building SMS Applications with Ruby-SMPPAbhishek Parolkar
 
Beyond Version Controlling Git By Parolkar
Beyond Version Controlling Git By ParolkarBeyond Version Controlling Git By Parolkar
Beyond Version Controlling Git By ParolkarAbhishek Parolkar
 
Canvas Tag By Abhishek Parolkar
Canvas Tag By Abhishek ParolkarCanvas Tag By Abhishek Parolkar
Canvas Tag By Abhishek ParolkarAbhishek Parolkar
 

Mehr von Abhishek Parolkar (6)

MyDuniya enterprise offering
MyDuniya enterprise offeringMyDuniya enterprise offering
MyDuniya enterprise offering
 
Nirvigna - Rendering Hi-Res graphics on commodity cluster
Nirvigna - Rendering Hi-Res graphics on commodity clusterNirvigna - Rendering Hi-Res graphics on commodity cluster
Nirvigna - Rendering Hi-Res graphics on commodity cluster
 
Building SuperComputers @ Home
Building SuperComputers @ HomeBuilding SuperComputers @ Home
Building SuperComputers @ Home
 
Building SMS Applications with Ruby-SMPP
Building SMS Applications with Ruby-SMPPBuilding SMS Applications with Ruby-SMPP
Building SMS Applications with Ruby-SMPP
 
Beyond Version Controlling Git By Parolkar
Beyond Version Controlling Git By ParolkarBeyond Version Controlling Git By Parolkar
Beyond Version Controlling Git By Parolkar
 
Canvas Tag By Abhishek Parolkar
Canvas Tag By Abhishek ParolkarCanvas Tag By Abhishek Parolkar
Canvas Tag By Abhishek Parolkar
 

Kürzlich hochgeladen

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Ruby for soul of BigData Nerds

  • 1. Ruby for the soul of BigData Nerds
  • 2. Who Am I? ● Engineering Team Lead Analytics & Data Platforms @ Viki.com ● Founder of http://BigData.SG ● Contributor to fluentd, pfeed, cartographer, watir
  • 3. BigData & Its Challenges "big data" is when the size of the data itself becomes part of the problem - Mike Loukides ● Twitter produces over 230 million tweets per day ● Wal-Mart is logging one million transactions per hour ● Facebook creates over 30 billion pieces of content ranging from web links, news, blogs, photo
  • 4. Everyone has a big data problem
  • 5. Evolving Trends Batch Processing Hadoop , HPCC, Google BigQuery Stream Processing STORM (Twitter) & S4 (Yahoo)
  • 6. Common Engineering Challenges ● Data Collection ● Filtering / Segmentation ● Data Storage ● Analysis ● Visualization ● Prediction / Extrapolation
  • 7. Data Collection + Filtering / Segmentation http://fluentd.org/
  • 8. Data Collection + Filtering / Segmentation You send events as: Http://domain:8080/namespace?key1=value1&key2=value2 Fluent forwards the data as: <timestamp> <namespace> {key1:value1,key2:value2} http://fluentd.org/
  • 10. Storage Hadoop HDFS OpenTSDB (http://opentsdb.net) SciDB (DMAS)
  • 11. Analysis Hadoop Streaming (Ruby) Hadoop Hive (Using rbhive)
  • 12. Visualization Custom Dashboard (Rails + Google Charts / d3.js) Some Hosted Services: tableaupublic.com, geckoboard.com, splunkstorm.com
  • 15. STORM terminology ● Streams ● Spouts ● Bolts ● Topologies
  • 16. RedStorm (https://github.com/colinsurprenant/redstorm) $ rvm use jruby-1.6.3 $ bundle install redstorm $ bundle exec redstorm install
  • 17. Visualizing average bandwidth experienced by users while watching videos on viki.com across the globe.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Thank you! Let's stay in touch :) ● Signup for my newsletter at http://parolkar.com ● Visit BigData.SG Meetup in Singapore.