SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Moving from C#/.NET to
Hadoop/MongoDB
Robert Vandehey
December 4, 2012
We power the Discovery,
Delivery and Display of
Digital Entertainment
4   © 2012 Rovi Corporation. Company confidential.
Global Reach

137M+                                                 47M+
    Viewers use our guide technologies                Storefronts with entertainment services
    through service provider offerings                powered by Rovi Entertainment Store

    266M+
Consumer electronic (CE) devices                      Data coverage:
have our CE guide technologies
                                                          4.5M+
                                                          TV shows, movies, sports and celebrities
    40M+
    Households reached globally by
    Rovi Advertising Network                              3.3M+
                                                          Album releases and 32M music tracks
    600M+
Devices certified for high quality DivX video
playback                                                  500K+
                                                          Movie titles




7    © 2012 Rovi Corporation. Company confidential.
11   © 2012 Rovi Corporation. Company confidential.
The Problem
13   © 2012 Rovi Corporation. Company confidential.
ETL/Cache Loading Data Takes Too Long




                                                          Node 1                        MemcacheD              MemcacheD
                                                          DB                            (Scratch               Cluster
DSG DB                                                    Server                        Server(s))
               WSP ETL Server                             Backup &                                             MemcacheD
Server(s)
                                                           Restore
                                                                                        MemcacheD
                                   Transform      CI                                                  Cache    MemcacheD
  DSG         Extract   Database                              CI        Table Loading
                                               Database                                              Loading
Database                                                   Database        Process                   Process
                                                                                        MemcacheD              MemcacheDB
                                                          Node 2                                               Cluster
                                                          DB Server                                            MemcacheDB
                                                          Backup &
                                                           Restore
                                                                                                               MemcacheDB
                                                          CI Database




    Page 16
The Solution
17   © 2012 Rovi Corporation. Company confidential.
Hadoop/MongoDB




18   Copyright ®2012 Rovi Corporation. Company confidential.
Network Diagram




20   Copyright ®2012 Rovi Corporation. Company confidential.
Mongo Sharding




21   Copyright ®2012 Rovi Corporation. Company confidential.
Challenges
23   © 2012 Rovi Corporation. Company confidential.
Challenges
• Transition existing Windows/.NET team to Linux/Java
      – Environment setup. Technology framework choices
      – Coding differences
      – Cultural differences
      – Platform differences
      – Easier than expected to transition team from .NET to Java – No religious battles

• Backwards compatibility of CXF web services to Microsoft .NET web services
• Managing new releases of Hadoop
• BCP took too long
      – Converted to base tables. Used Pig to join the data

• Writes to Mongo are very fast. Updates are slower and saturated disks
      – Implemented Diff process (MD5 calc) to allow Hadoop to do the work and minimize writes to Mongo




24   © 2012 Rovi Corporation. Company confidential.
Lessons Learned

25   © 2012 Rovi Corporation. Company confidential.
Lessons Learned
• General
      – Current versions of Hadoop CDH4 and MongoDB 2.0 are actually very stable products
               • We purchased enterprise support agreements from both Cloudera and 10gen
      – Create a developers VM image
      – Deploy early and often even if not ready for real customers
      – Use the same setup in test and production environments
               • Sharding caused differences

• SQL
      – Get raw tables without any transformation or joins
               • Let Hadoop do the processing for you

• Hadoop
      – Do as much work as you can in Hadoop
      – Take the time to create small datasets to iterate fast
      – Take the time to learn and use Pig
               • It is very fast and provides tons of functionality that you don’t need to code in Java
      – Don’t create Runners - Use Oozie workflows
      – Measure, benchmark and track performance – Use Hadoop counters



26   © 2012 Rovi Corporation. Company confidential.
Lessons Learned - 2
• MongoDB
      – RAM, RAM, RAM!!!
      – Many writes from Hadoop can easily overwhelm MongoDB
               • Single database lock
               • Drive bandwidth saturation – Can be expanded through sharding
               • Do as much as possible to minimize writes
               • Measure where your application is blocking and optimize
      – Don’t shard unless you have to – if you do shard, preconfigure your shard key
               • You need a good shard key

      – Use Replica sets. They are easy to setup and work good.
               • Make sure repllog is large enough.

      – Use MongoDB Monitoring Service (MMS) – It’s free
      – Mongo queries are fast!




27   © 2012 Rovi Corporation. Company confidential.
Mongo Query – returns 90 rows from a database of 9
million in 44ms




28   © 2012 Rovi Corporation. Company confidential.
Q&A

31 © 2012 Rovi Corporation. Company confidential.
Follow-up Information

• Email: robert.vandehey@rovicorp.com
• LinkedIn: http://www.linkedin.com/in/bvandehey
• Twitter: @bvandehey
• Rovi Cloud Services: http://developer.rovicorp.com/




32   © 2012 Rovi Corporation. Company confidential.
Thank You
33 © 2012 Rovi Corporation. Company confidential.

Weitere ähnliche Inhalte

Was ist angesagt?

Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesyarapavan
 
Data Evolution in HBase
Data Evolution in HBaseData Evolution in HBase
Data Evolution in HBaseHBaseCon
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training Keylabs
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive TuningAdam Muise
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map ReduceEdureka!
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014Hortonworks
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataCloudera, Inc.
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1Sperasoft
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldLester Martin
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesDataWorks Summit/Hadoop Summit
 

Was ist angesagt? (20)

Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Empower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and HadoopEmpower Data-Driven Organizations with HPE and Hadoop
Empower Data-Driven Organizations with HPE and Hadoop
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
Data Evolution in HBase
Data Evolution in HBaseData Evolution in HBase
Data Evolution in HBase
 
Hadoop online training
Hadoop online training Hadoop online training
Hadoop online training
 
2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning2013 July 23 Toronto Hadoop User Group Hive Tuning
2013 July 23 Toronto Hadoop User Group Hive Tuning
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 

Ähnlich wie Moving from C#/.NET to Hadoop/MongoDB

MySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached APIMySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached APIMat Keep
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesOReillyStrata
 
SQL Server Workshop Paul Bertucci
SQL Server Workshop Paul BertucciSQL Server Workshop Paul Bertucci
SQL Server Workshop Paul BertucciMark Ginnebaugh
 
SQL Server 2008 Migration Workshop 04/29/2009
SQL Server 2008 Migration Workshop 04/29/2009SQL Server 2008 Migration Workshop 04/29/2009
SQL Server 2008 Migration Workshop 04/29/2009Database Architechs
 
Sina App Engine - a distributed web solution on cloud
Sina App Engine - a distributed web solution on cloudSina App Engine - a distributed web solution on cloud
Sina App Engine - a distributed web solution on cloudcong lei
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese天青 王
 
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?  Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You? EMC
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbaseiammutex
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopContinuent
 
Cloud Consolidation with Oracle (RAC) - How much is too much?
Cloud Consolidation with Oracle (RAC) - How much is too much?Cloud Consolidation with Oracle (RAC) - How much is too much?
Cloud Consolidation with Oracle (RAC) - How much is too much?Markus Michalewicz
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineDataWorks Summit
 
Gear6 and Scaling Website Performance: Caching Session and Profile Data with...
Gear6 and Scaling Website Performance:  Caching Session and Profile Data with...Gear6 and Scaling Website Performance:  Caching Session and Profile Data with...
Gear6 and Scaling Website Performance: Caching Session and Profile Data with...Gear6
 
Couchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeCouchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeDipti Borkar
 
Membase Meetup Chicago - january 2011
Membase Meetup Chicago - january 2011Membase Meetup Chicago - january 2011
Membase Meetup Chicago - january 2011Membase
 
Implementing High Availability Caching with Memcached
Implementing High Availability Caching with MemcachedImplementing High Availability Caching with Memcached
Implementing High Availability Caching with MemcachedGear6
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 

Ähnlich wie Moving from C#/.NET to Hadoop/MongoDB (20)

MySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached APIMySQL Cluster NoSQL Memcached API
MySQL Cluster NoSQL Memcached API
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
 
Virtualization and Containers
Virtualization and ContainersVirtualization and Containers
Virtualization and Containers
 
SQL Server Workshop Paul Bertucci
SQL Server Workshop Paul BertucciSQL Server Workshop Paul Bertucci
SQL Server Workshop Paul Bertucci
 
SQL Server 2008 Migration Workshop 04/29/2009
SQL Server 2008 Migration Workshop 04/29/2009SQL Server 2008 Migration Workshop 04/29/2009
SQL Server 2008 Migration Workshop 04/29/2009
 
Sina App Engine - a distributed web solution on cloud
Sina App Engine - a distributed web solution on cloudSina App Engine - a distributed web solution on cloud
Sina App Engine - a distributed web solution on cloud
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese
 
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?  Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbase
 
Real-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to HadoopReal-Time Data Loading from MySQL to Hadoop
Real-Time Data Loading from MySQL to Hadoop
 
SQL Server User Group 02/2009
SQL Server User Group 02/2009SQL Server User Group 02/2009
SQL Server User Group 02/2009
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Cloud Consolidation with Oracle (RAC) - How much is too much?
Cloud Consolidation with Oracle (RAC) - How much is too much?Cloud Consolidation with Oracle (RAC) - How much is too much?
Cloud Consolidation with Oracle (RAC) - How much is too much?
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmine
 
Gear6 and Scaling Website Performance: Caching Session and Profile Data with...
Gear6 and Scaling Website Performance:  Caching Session and Profile Data with...Gear6 and Scaling Website Performance:  Caching Session and Profile Data with...
Gear6 and Scaling Website Performance: Caching Session and Profile Data with...
 
Couchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = ThreeCouchbase Server and IBM BigInsights: One + One = Three
Couchbase Server and IBM BigInsights: One + One = Three
 
Membase Meetup Chicago - january 2011
Membase Meetup Chicago - january 2011Membase Meetup Chicago - january 2011
Membase Meetup Chicago - january 2011
 
Ruby at UW C4C
Ruby at UW C4CRuby at UW C4C
Ruby at UW C4C
 
Implementing High Availability Caching with Memcached
Implementing High Availability Caching with MemcachedImplementing High Availability Caching with Memcached
Implementing High Availability Caching with Memcached
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 

Mehr von MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

Mehr von MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Kürzlich hochgeladen (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Moving from C#/.NET to Hadoop/MongoDB

  • 1. Moving from C#/.NET to Hadoop/MongoDB Robert Vandehey December 4, 2012
  • 2. We power the Discovery, Delivery and Display of Digital Entertainment 4 © 2012 Rovi Corporation. Company confidential.
  • 3. Global Reach 137M+ 47M+ Viewers use our guide technologies Storefronts with entertainment services through service provider offerings powered by Rovi Entertainment Store 266M+ Consumer electronic (CE) devices Data coverage: have our CE guide technologies 4.5M+ TV shows, movies, sports and celebrities 40M+ Households reached globally by Rovi Advertising Network 3.3M+ Album releases and 32M music tracks 600M+ Devices certified for high quality DivX video playback 500K+ Movie titles 7 © 2012 Rovi Corporation. Company confidential.
  • 4.
  • 5.
  • 6. 11 © 2012 Rovi Corporation. Company confidential.
  • 7. The Problem 13 © 2012 Rovi Corporation. Company confidential.
  • 8. ETL/Cache Loading Data Takes Too Long Node 1 MemcacheD MemcacheD DB (Scratch Cluster DSG DB Server Server(s)) WSP ETL Server Backup & MemcacheD Server(s) Restore MemcacheD Transform CI Cache MemcacheD DSG Extract Database CI Table Loading Database Loading Database Database Process Process MemcacheD MemcacheDB Node 2 Cluster DB Server MemcacheDB Backup & Restore MemcacheDB CI Database Page 16
  • 9. The Solution 17 © 2012 Rovi Corporation. Company confidential.
  • 10. Hadoop/MongoDB 18 Copyright ®2012 Rovi Corporation. Company confidential.
  • 11. Network Diagram 20 Copyright ®2012 Rovi Corporation. Company confidential.
  • 12. Mongo Sharding 21 Copyright ®2012 Rovi Corporation. Company confidential.
  • 13. Challenges 23 © 2012 Rovi Corporation. Company confidential.
  • 14. Challenges • Transition existing Windows/.NET team to Linux/Java – Environment setup. Technology framework choices – Coding differences – Cultural differences – Platform differences – Easier than expected to transition team from .NET to Java – No religious battles • Backwards compatibility of CXF web services to Microsoft .NET web services • Managing new releases of Hadoop • BCP took too long – Converted to base tables. Used Pig to join the data • Writes to Mongo are very fast. Updates are slower and saturated disks – Implemented Diff process (MD5 calc) to allow Hadoop to do the work and minimize writes to Mongo 24 © 2012 Rovi Corporation. Company confidential.
  • 15. Lessons Learned 25 © 2012 Rovi Corporation. Company confidential.
  • 16. Lessons Learned • General – Current versions of Hadoop CDH4 and MongoDB 2.0 are actually very stable products • We purchased enterprise support agreements from both Cloudera and 10gen – Create a developers VM image – Deploy early and often even if not ready for real customers – Use the same setup in test and production environments • Sharding caused differences • SQL – Get raw tables without any transformation or joins • Let Hadoop do the processing for you • Hadoop – Do as much work as you can in Hadoop – Take the time to create small datasets to iterate fast – Take the time to learn and use Pig • It is very fast and provides tons of functionality that you don’t need to code in Java – Don’t create Runners - Use Oozie workflows – Measure, benchmark and track performance – Use Hadoop counters 26 © 2012 Rovi Corporation. Company confidential.
  • 17. Lessons Learned - 2 • MongoDB – RAM, RAM, RAM!!! – Many writes from Hadoop can easily overwhelm MongoDB • Single database lock • Drive bandwidth saturation – Can be expanded through sharding • Do as much as possible to minimize writes • Measure where your application is blocking and optimize – Don’t shard unless you have to – if you do shard, preconfigure your shard key • You need a good shard key – Use Replica sets. They are easy to setup and work good. • Make sure repllog is large enough. – Use MongoDB Monitoring Service (MMS) – It’s free – Mongo queries are fast! 27 © 2012 Rovi Corporation. Company confidential.
  • 18. Mongo Query – returns 90 rows from a database of 9 million in 44ms 28 © 2012 Rovi Corporation. Company confidential.
  • 19. Q&A 31 © 2012 Rovi Corporation. Company confidential.
  • 20. Follow-up Information • Email: robert.vandehey@rovicorp.com • LinkedIn: http://www.linkedin.com/in/bvandehey • Twitter: @bvandehey • Rovi Cloud Services: http://developer.rovicorp.com/ 32 © 2012 Rovi Corporation. Company confidential.
  • 21. Thank You 33 © 2012 Rovi Corporation. Company confidential.

Hinweis der Redaktion

  1. This is the new Data Load Process. It makes it look easy…
  2. …The reality it is quite complex. This is just one of our workflows. The orange/tan-ish boxes are Java map/reduce processes. The pink boxes are pig processes. The white boxes are BCP processes. The green boxes are MongoDB collections.
  3. Here is our sharding scheme. We actually have 6 more servers than is shown because we decided to have multiple replicas at each remote site.