Implementing and Visualizing Click-Stream Data with MongoDB

Jan 22, 2013 - New York MongoDB User Group

Cameron Sim - LearnVest.com
Agenda	

•  About LearnVest	

•  HL Application Architecture	

•  Data Capture	

•  Event Packaging	

•  MongoDB Data Warehousing	

•  Loading & Visualization	

•  Finishing up
LearnVest Inc.
www.learnvest.com

Mission Statement
Aiming to make Financial Planning as accessible as having a gym membership

Company
•  Founded in 2008 by Alexa Von Tobel, CEO
•  50+ People and Growing rapidly
•  Based in NYC

Platforms
•  Web & iPhone

Key Products
•  Account Aggregation and Management (Bank, Credit, Loan, Investment, Mort...)
•  Original and Syndicated Newsletter Co...
•  Financial Planning (tiered product offering)

Stack
•  Operational: Wordpress, Backbone.js, Node.js; Java Spring 3, Redis, Memcached, ...
•  Analytics: MongoDB 2.2.0 (3-node replica-set); Java 6, Spring 3
LearnVest.com - Web

LearnVest.com - iPhone
High Level Architecture

[Diagram labels: Production (Delivery, Services); Analytics (Services, Loaders, Dashbo...); HTTPS; pyMongo]
Collection

Capture Everything
•  User-driven events over web and mobile
•  System-level exceptions
•  Everything else

Temporary Data
•  Be 'ok' with approximate data
•  Operational databases are the system of record

Aggregate events as they come in
•  Remove the overhead of basic metrics (counts, sums) on core events
•  Group by user unique id and increment counts per event, over time-dimensions (day, week-ending, month, year)
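
The "aggregate events as they come in" idea can be sketched in a few lines of Python. This is an illustration only: an in-memory Counter stands in for MongoDB, the function names are hypothetical, and treating "week-ending" as the Sunday that closes the event's week is an assumption, not something the talk states.

```python
from collections import Counter
from datetime import datetime, timedelta


def time_buckets(ts: datetime) -> dict:
    """Return the bucket label for each time dimension of one event."""
    d = ts.date()
    # Assumption: "week-ending" is the Sunday that closes the event's week.
    week_ending = d + timedelta(days=6 - d.weekday())
    return {
        "day": d.isoformat(),
        "week": week_ending.isoformat(),
        "month": d.strftime("%Y-%m"),
        "year": d.strftime("%Y"),
    }


def record_event(counts: Counter, user_id: str, event_code: str, ts: datetime) -> None:
    """Increment one counter per time dimension for this user and event."""
    for dimension, bucket in time_buckets(ts).items():
        counts[(dimension, user_id, event_code, bucket)] += 1


counts = Counter()
record_event(counts, "u1", "user.login", datetime(2013, 1, 21, 9, 0))
record_event(counts, "u1", "user.login", datetime(2013, 1, 22, 18, 30))
print(counts[("day", "u1", "user.login", "2013-01-22")])
print(counts[("month", "u1", "user.login", "2013-01")])
```

Because each event only increments counters, basic metrics never need a scan over the raw events afterwards.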
Data Capture

iOS

- (void)sendAnalyticEventType:(NSString *)eventType
                       object:(NSString *)object
                         name:(NSString *)name
                         page:(NSString *)page
                       source:(NSString *)source
{
    // params is not declared on the slide; a local dictionary is assumed here
    NSMutableDictionary *params = [NSMutableDictionary dictionary];
    NSMutableDictionary *eventData = [NSMutableDictionary dictionary];

    // Only set keys that have a value: setObject:forKey: throws on nil
    if (eventType != nil) [params setObject:eventType forKey:@"eventType"];
    if (object != nil) [eventData setObject:object forKey:@"object"];
    if (name != nil) [eventData setObject:name forKey:@"name"];
    if (page != nil) [eventData setObject:page forKey:@"page"];
    if (source != nil) [eventData setObject:source forKey:@"source"];
    if (eventData != nil) [params setObject:eventData forKey:@"eventData"];

    [[LVNetworkEngine sharedManager] analytics_send:params];
}
Data Capture

WEB (JavaScript)

function internalTrackPageView() {
  var cookie = {
    userContext: jQuery.cookie('UserContextCookie')
  };
  var trackEvent = {
    eventType: "pageView",
    eventData: {
      page: window.location.pathname + window.location.search
    }
  };
  // AJAX
  jQuery.ajax({
    url: "/api/track",
    type: "POST",
    dataType: "json",
    data: JSON.stringify(trackEvent),
    // Set Request Headers
    beforeSend: function (xhr, settings) {
      xhr.setRequestHeader('Accept', 'application/json');
      xhr.setRequestHeader('User-Context', cookie.userContext);
      if (settings.type === 'PUT' || settings.type === 'POST') {
        xhr.setRequestHeader('Content-Type', 'application/json');
      }
    }
  });
}
Bus Event Packaging

•  Spring 3 RESTful service layer, controller methods define the eventCode via @tracking annotation
•  Custom Interceptor class extends HandlerInterceptorAdapter and implements ...Handle() (for each event) to invoke calls via Spring @Async to an EventPublisher
•  EventPublisher publishes to a common event bus queue with multiple subscribers, one of which packages the eventPayload Map<String, Object> object and forwards it to the Analytics REST service
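
The publish/subscribe flow described here can be sketched in Python. This is a simplified, synchronous stand-in (the talk uses Spring @Async and a real event bus); the EventBus class, the "events" topic name and the input message shape are all hypothetical, and a list replaces the HTTP call to the analytics service.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone


class EventBus:
    """Minimal in-process bus: every subscriber of a topic sees each message."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


def package_payload(message):
    """Normalize a raw event into the payload shape forwarded for persistence."""
    return {
        "eventCode": message["eventCode"],
        "eventType": message["data"].get("eventType"),
        "eventData": message["data"].get("eventData", {}),
        "eventId": str(uuid.uuid4()),
        "eventTime": datetime.now(timezone.utc),
        "request": message.get("headers", {}),
    }


forwarded = []  # stands in for the POST to the analytics service
audit_log = []  # a second, unrelated subscriber on the same topic

bus = EventBus()
bus.subscribe("events", lambda m: forwarded.append(package_payload(m)))
bus.subscribe("events", lambda m: audit_log.append(m["eventCode"]))

bus.publish("events", {
    "eventCode": "user.login",
    "data": {"eventType": "login", "eventData": {}},
    "headers": {"call-source": "WEB"},
})
print(forwarded[0]["eventCode"], audit_log)
```

Keeping the packaging step in its own subscriber means the request path only has to publish; analytics can fail or slow down without touching the other subscribers.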
Bus Event Packaging

...ing RestController Methods

Interface

@RequestMapping(value = "/user/login", method = RequestMethod.POST,
        headers = "Accept=application/json")
public Map<String, Object> userLogin(@RequestBody Map<String, Object> event,
        HttpServletRequest request);

Concrete/Impl Class

@Override
@tracking("user.login")
public Map<String, Object> userLogin(@RequestBody Map<String, Object> event,
        HttpServletRequest request) {

    // Implementation

    return event;
}
Bus Event Packaging

Custom Interceptor class extends HandlerInterceptorAdapter

protected void handleTracking(String trackingCode, Map<String, Object> modelMap,
        HttpServletRequest request) {

    Map<String, Object> responseModel = new HashMap<String, Object>();

    // remove non-serializables & copy over data from modelMap

    try {
        this.eventPublisher.publish(trackingCode, responseModel, request);
    } catch (Exception e) {
        // A tracking failure is logged and swallowed so the request still completes
        log.error("Error tracking event '" + trackingCode + "' : "
                + ExceptionUtils.getStackTrace(e));
    }
}
Bus Event Packaging

Custom Interceptor class extends HandlerInterceptorAdapter

public void publish(String eventCode, Map<String, Object> eventData,
        HttpServletRequest request) {

    Map<String, Object> payload = new HashMap<String, Object>();
    String eventId = UUID.randomUUID().toString();
    Map<String, String> requestMap = HttpRequestUtils.getRequestHeaders(request);

    // Normalize message (each field is read from its own key)
    payload.put("eventType", eventData.get("eventType"));
    payload.put("eventData", eventData.get("eventData"));
    payload.put("version", eventData.get("version"));
    payload.put("eventId", eventId);
    payload.put("eventTime", new Date());
    payload.put("request", requestMap);
    .
    .
    .
    // Send to the Analytics Service for MongoDB persistence
}

public void sendPost(EventPayload payload) {
    HttpEntity request = new HttpEntity(payload.getEventPayload(), headers);
    Map m = restTemplate.postForObject(endpoint, request, java.util.Map.class);
}
Bus Event Packaging

Serialized Json (User Action)

{
  "eventCode" : "user.login",
  "eventType" : "login",
  "version"   : "1.0",
  "eventTime" : "1358603157746",
  "eventData" : {
      "" : "",
      "" : "",
      "" : ""
  },
  "request" : {
      "call-source"     : "WEB",
      "user-context"    : "00002b4f1150249206ac2b692e48ddb3",
      "user.agent"      : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11",
      "cookie"          : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
      "content-length"  : "204",
      "accept-encoding" : "gzip,deflate,sdch"
  }
}
Bus Event Packaging

Serialized Json (Generic Event)

{
  "eventCode" : "generic.ui",
  "eventType" : "pageView",
  "version"   : "1.0",
  "eventTime" : "1358603157746",
  "eventData" : {
      "page"    : "/learnvest/moneycenter/inbox",
      "section" : "transactions",
      "name"    : "view transactions",
      "object"  : "page"
  },
  "request" : {
      "call-source"     : "WEB",
      "user-context"    : "00002b4f1150249206ac2b692e48ddb3",
      "user.agent"      : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11",
      "cookie"          : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
      "content-length"  : "204",
      "accept-encoding" : "gzip,deflate,sdch"
  }
}
MongoDB Data Warehousing

MongoDB Information
•  2.2.0
•  3-node replica-set
•  Large (primary), 2x Medium (secondary) AWS Amazon-Linux machines
•  ... with single 500GB EBS volumes mounted to /opt/data

MongoDB Config File
... = /opt/data/mongodb/data
rest = true
replSet = voyager

Volumes
•  ... events daily on web, ~600K on mobile
•  ...B per day at start, slowed to ~1GB per day
•  Currently at 78GB (collecting since August 2012)

Future Scaling Strategy
•  ...p 2nd Replica-Set
•  ...d replica-sets to n at 60% / 250GB per EBS volume
•  Shard key probably based on sequential mix of email_address & additional string
MongoDB Data Warehousing

WEB/MOBILE

Persist all events, bucketed by source, event code and time:-
•  WEB/MOBILE
•  user.login
•  Time (day, week-ending, month, year)

Insert into collection e_web / e_mobile

Upsert into:-
•  e_web_user_login_day
•  e_web_user_login_week
•  e_web_user_login_month
•  e_web_user_login_year

Predictable model for scaling and measuring business growth
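
The naming scheme above (source, event code, time dimension) is regular enough to generate. A small Python sketch, assuming the names follow the e_web_user_login_day pattern exactly; the helper names are hypothetical.

```python
DIMENSIONS = ("day", "week", "month", "year")


def raw_collection(source: str) -> str:
    """Collection holding every raw event for a source, e.g. e_web."""
    return "e_" + source.lower()


def rollup_collections(source: str, event_code: str) -> list:
    """One counter collection per time dimension, e.g. e_web_user_login_day."""
    base = raw_collection(source) + "_" + event_code.replace(".", "_")
    return [base + "_" + dim for dim in DIMENSIONS]


print(raw_collection("WEB"))
print(rollup_collections("WEB", "user.login"))
```

Deriving names from the event itself means a new event code needs no schema work: its four counter collections appear on the first upsert.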
MongoDB Data Warehousing

// Shared update document: add 1 to "count"
DBObject newDocument = new BasicDBObject().append("$inc",
        new BasicDBObject().append("count", 1));

// Update day dimension (upsert = true, multi = false)
collection_day.update(new BasicDBObject().append("user-context", userContext)
        .append("eventType", eventType)
        .append("date", sdf_day.format(d)), newDocument, true, false);

// Update week dimension
collection_week.update(new BasicDBObject().append("user-context", userContext)
        .append("eventType", eventType)
        .append("date", sdf_day.format(w)), newDocument, true, false);

// Update month dimension
collection_month.update(new BasicDBObject().append("user-context", userContext)
        .append("eventType", eventType)
        .append("date", sdf_month.format(d)), newDocument, true, false);

// Update year dimension
collection_year.update(new BasicDBObject().append("user-context", userContext)
        .append("eventType", eventType)
        .append("date", sdf_year.format(d)), newDocument, true, false);
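
For comparison, the same upsert-and-increment can be sketched in Python. The assumption here is a current PyMongo, where Collection.update_one(filter, update, upsert=True) plays the role of the Java update(q, o, true, false) call; the helper names are hypothetical, and a small recording class stands in for the collection so the sketch runs without a MongoDB server.

```python
def counter_upsert(user_context: str, event_type: str, bucket: str):
    """Build the (filter, update) pair for one time-dimension counter."""
    query = {"user-context": user_context, "eventType": event_type, "date": bucket}
    update = {"$inc": {"count": 1}}
    return query, update


def bump(collection, user_context: str, event_type: str, bucket: str):
    """Increment the counter, creating the document the first time it is seen.

    `collection` is expected to behave like a pymongo Collection.
    """
    query, update = counter_upsert(user_context, event_type, bucket)
    return collection.update_one(query, update, upsert=True)


class RecordingCollection:
    """Stand-in that records calls instead of talking to MongoDB."""

    def __init__(self):
        self.calls = []

    def update_one(self, query, update, upsert=False):
        self.calls.append((query, update, upsert))


day = RecordingCollection()
bump(day, "c4ca4238a0b923820dcc509a6f75849b", "login", "01/02")
print(day.calls[0])
```

With $inc and upsert together there is no read-then-write step, so concurrent events for the same user and bucket cannot lose an increment.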
MongoDB Data Warehousing

...account_addManual_week
e_web_account_addManual_year
..._user_login_day
..._user_login_week
..._user_login_month
..._user_login_year
e_mobile_generic_ui_day
e_mobile_generic_ui_month
e_mobile_generic_ui_week
e_mobile_generic_ui_year

> db.e_web_user_login_day.find()
{ "_id" : ObjectId("50e4b9871b36921910222c42"), "count" : 5, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
{ "_id" : ObjectId("50cd6cfcb9a80a2b4ee21422"), "count" : 7, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
{ "_id" : ObjectId("50cd6e51b9a80a2b4ee21427"), "count" : 2, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
{ "_id" : ObjectId("50e4b9871b36921910222c42"), "count" : 3, "date" : "01/03", "user-context" : "50e49a561b36921910222c33" }
MongoDB Data Warehousing

... 1, "accept-charset" : "ISO-8859-1,utf-8;q=0.7,*;q=0.3", "cookie" : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8", "content-length" : "255", "accept-encoding" : "gzip,deflate,sdch" }, "eventType" : "flick", "eventData" : { "object" : "...on", "name" : "split transaction button", "page" : "#inbox/79876/", "section" : "...saction_river_details" } }
MongoDB Data Warehousing

Indexing Strategy
•  Indexes on core collections (e_web and e_mobile) come in under 3GB, against 7.5GB of memory on the Large instance and 3.75GB on the Medium instances
•  Datetime kept in two fields, with a compound index on date and other fields like eventType and unique id (user-context)
•  Heavy insertion rates, much lower read rates, so the fewer indexes the better
MongoDB Data Warehousing

Indexing Strategy

> db.e_web.getIndexes()
[
  { "v" : 1, "key" : { "request.user-context" : 1, "created_date" : 1 }, "ns" : "moneycenter.e_web", "name" : "request.user-context_1_created_date_1" },
  { "v" : 1, "key" : { "eventData.name" : 1, "created_date" : 1 }, "ns" : "moneycenter.e_web", "name" : "eventData.name_1_created_date_1" }
]
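
The two-field date convention and the compound indexes can be sketched in Python. The created_date name and the two index keys come from the getIndexes() output; the created_datetime name and the helper are assumptions for illustration. The key specs are written in the list-of-pairs form that PyMongo's Collection.create_index accepts.

```python
from datetime import datetime


def with_date_fields(event: dict, received_at: datetime) -> dict:
    """Return a copy of the event carrying a full timestamp and a day value."""
    doc = dict(event)
    doc["created_datetime"] = received_at  # hypothetical field name
    # Midnight of the same day: per-day matches become simple equality tests.
    doc["created_date"] = received_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return doc


# Compound index key specs mirroring the two indexes listed for e_web.
COMPOUND_INDEXES = [
    [("request.user-context", 1), ("created_date", 1)],
    [("eventData.name", 1), ("created_date", 1)],
]

doc = with_date_fields({"eventType": "pageView"}, datetime(2013, 1, 22, 14, 5, 9))
print(doc["created_date"], doc["created_datetime"])
```

Indexing the truncated day value rather than the full timestamp keeps the index to one entry value per day per user, which fits the goal of few, small indexes under heavy inserts.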
Loading & Visualization

Objective
•  Show historic and intraday stats on core use cases (logins, conversions)
•  Show user funnel rates on conversion pages
•  Show general usability - how do users really use the Web and iOS platforms?

Non-Functionals
•  Intraday doesn't need to be "real-time", polling is good enough for now
•  Overnight batch job for historic must scale horizontally

General Implementation Strategy
•  Do all heavy lifting & object manipulation, UI should just display graph or table
•  Modularize the service to be able to regenerate any graphs/tables without a full load
Loading & Visualization

Java Batch Service

Use the Java Mongo library to query key collections and return user counts and sum of events

DBCursor webUserLogins = c.find(
        new BasicDBObject("date", sdf.format(new Date())));

private HashMap<String, Object> getSumAndCount(DBCursor cursor) {
    HashMap<String, Object> m = new HashMap<String, Object>();

    int sum = 0;    // total events across all users
    int count = 0;  // number of user documents
    DBObject obj;
    while (cursor.hasNext()) {
        obj = (DBObject) cursor.next();
        count++;
        sum = sum + (Integer) obj.get("count");
    }

    m.put("sum", sum);
    m.put("count", count);
    // Guard against an empty cursor; a plain number format is assumed here
    m.put("average", count == 0 ? "0.00" : String.format("%.2f", (float) sum / count));

    return m;
}
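
The same summary is short in Python, which may help when checking the Java output by hand. A sketch assuming each document carries an integer count field as in the e_web_user_login_day examples; the function name is hypothetical.

```python
def sum_and_count(docs):
    """Summarize per-user counter documents.

    count   -> number of documents (users seen in the bucket)
    sum     -> total events across those users
    average -> events per user, 0.0 when there are no documents
    """
    total = 0
    users = 0
    for doc in docs:
        users += 1
        total += doc["count"]
    average = total / users if users else 0.0
    return {"sum": total, "count": users, "average": round(average, 2)}


docs = [
    {"user-context": "a", "count": 5},
    {"user-context": "b", "count": 7},
    {"user-context": "c", "count": 2},
]
print(sum_and_count(docs))
```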
Loading & Visualization

Java Batch Service

Use Aggregation Framework where required on core collections (e_web) and external...

// create aggregation objects
DBObject project = new BasicDBObject("$project",
        new BasicDBObject("day_value", fields));
DBObject day_value = new BasicDBObject("day_value", "$day_value");
DBObject groupFields = new BasicDBObject("_id", day_value);

// create the fields to compute per group, in this case "number"
groupFields.put("number", new BasicDBObject("$sum", 1));

// create the group
DBObject group = new BasicDBObject("$group", groupFields);

// execute
AggregationOutput output = mycollection.aggregate(project, group);

for (DBObject obj : output.results()) {
    ...
}
Loading & Visualization

Java Batch Service

MongoDB Command Line example on aggregation over a time period, e.g. month

> db.e_web.aggregate([
    { $match : { created_date : { $gt : ISODate("2012-10-25T00:00:00") } } },
    { $project : { day_value : { day : { $dayOfMonth : "$created_date" },
                                 month : { $month : "$created_date" } } } },
    { $group : { _id : { day_value : "$day_value" }, number : { $sum : 1 } } },
    // after $group the day lives under _id, so sort on that path
    { $sort : { "_id.day_value" : -1 } }
])
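
The same pipeline can be built as plain Python data and handed to PyMongo's Collection.aggregate. The sketch below assumes that call and sorts on "_id.day_value" because the grouped key lives under _id; the function names are hypothetical. A pure-Python equivalent of the grouping is included for checking results on small samples without a server.

```python
from collections import Counter
from datetime import datetime


def daily_counts_pipeline(since: datetime) -> list:
    """Pipeline counting events per (day, month) after `since`."""
    return [
        {"$match": {"created_date": {"$gt": since}}},
        {"$project": {"day_value": {
            "day": {"$dayOfMonth": "$created_date"},
            "month": {"$month": "$created_date"},
        }}},
        {"$group": {"_id": {"day_value": "$day_value"}, "number": {"$sum": 1}}},
        {"$sort": {"_id.day_value": -1}},
    ]


def daily_counts_local(events, since: datetime) -> Counter:
    """Same grouping in plain Python, keyed by (month, day)."""
    counts = Counter()
    for event in events:
        ts = event["created_date"]
        if ts > since:
            counts[(ts.month, ts.day)] += 1
    return counts


since = datetime(2012, 10, 25)
sample = [
    {"created_date": datetime(2012, 10, 24, 12)},
    {"created_date": datetime(2012, 10, 26, 1)},
    {"created_date": datetime(2012, 10, 26, 5)},
    {"created_date": datetime(2012, 11, 1)},
]
pipeline = daily_counts_pipeline(since)
local = daily_counts_local(sample, since)
print(local)
```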
Loading & Visualization

Java Batch Service

Persisting events into graph and table collections

> db.homeGraphs.find()
{ "_id" : ObjectId("50f57b5c1d4e714b581674e2"), "accounts_natural" : 54, "accounts_total" : 54, "date" : ISODate("2011-02-06T05:00:00Z"), "linked_rate" : [...].96, "premium_rate" : 0, "str_date" : "2011,01,06", "upgrade_rate" : 0, "users_avg_linked" : 3.43, "users_linked" : 7 }
{ "_id" : ObjectId("50f57b5c1d4e714b581674e3"), "accounts_natural" : 144, "accounts_total" : 144, "date" : ISODate("2011-02-07T05:00:00Z"), "linked_rate" : [...].11, "premium_rate" : 0, "str_date" : "2011,01,07", "upgrade_rate" : 0, "users_avg_linked" : 4, "users_linked" : 16 }
{ "_id" : ObjectId("50f57b5c1d4e714b581674e4"), "accounts_natural" : 119, "accounts_total" : 119, "date" : ISODate("2011-02-08T05:00:00Z"), "linked_rate" : [...].13, "premium_rate" : 0, "str_date" : "2011,01,08", "upgrade_rate" : 0, "users_avg_linked" : 4.5, "users_linked" : 18 }
Loading & Visualization

...day numbers

try:
    conn = pymongo.Connection('localhost', ...)
    db = conn['lvanalytics']
    cursor = db.accountmetrics.find(
        {"date": {"$gte": dt_from, "$lte": dt_to}}).sort("date")
    return buildMetricsDict(cursor)
except Exception as e:
    logger.error(e.message)

Return the graph object (as a list or a dict of lists) to the view that called the method

pagedata = {}
pagedata['accountsGraph'] = mongodb_home.getHomeChart()

return render_to_response('home.html', {'pagedata': pagedata},
    context_instance=RequestContext(request))
Loading & Visualization

Django and HighCharts

Populate the series.. (JavaScript with Django templating)

seriesOptions[0] = {
    id: 'naturalAccounts',
    name: "Natural Accounts",
    data: [
        {% for a in pagedata.metrics.accounts_natural %}
            {% if not forloop.first %}, {% endif %}
            [Date.UTC({{a.0}}),{{a.1}}]
        {% endfor %}
    ],
    tooltip: {
        valueDecimals: 2
    }
};
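
On the Python side, the pairs consumed by this template as a.0 and a.1 can be prepared as below. This is a sketch with hypothetical helper names; it assumes a.0 is a string of Date.UTC arguments in the style of the str_date field ("2011,01,06" for 6 February 2011, since JavaScript months are zero-based).

```python
from datetime import datetime


def js_utc_args(d: datetime) -> str:
    """Format a date as arguments for JavaScript's Date.UTC (0-based month)."""
    return "%d,%02d,%02d" % (d.year, d.month - 1, d.day)


def build_series(docs, field):
    """Return [date_args, value] pairs, oldest first, for one metric."""
    ordered = sorted(docs, key=lambda doc: doc["date"])
    return [[js_utc_args(doc["date"]), doc[field]] for doc in ordered]


docs = [
    {"date": datetime(2011, 2, 7, 5), "accounts_natural": 144},
    {"date": datetime(2011, 2, 6, 5), "accounts_natural": 54},
]
series = build_series(docs, "accounts_natural")
print(series)
```

Doing the ordering and formatting here keeps the template to a plain loop, in line with the goal that the UI only displays what the service prepared.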
Loading & Visualization

Django and HighCharts

And Create the Charts and Tables...
Lessons Learned

•  Date Time managed as two fields, Datetime and Date
•  Aggregating and upserting documents as events are received works for us
•  Real-time Map-Reduce in pyMongo - too slow, don't do this.
•  Django-nonrel - unstable, use Django and configure MongoDB as a datastore only
•  Memcached on Django is good enough (at the moment) - use django-celery with rabbitmq to pre-cache all data after data loading
•  HighCharts is buggy - considering D3 & other libraries
•  Don't need to retrieve data directly from MongoDB to Django, perhaps provide all data via a service layer (at the expense of ever-additional features in pyMongo)
Next Steps

•  A/B testing framework, experiments and variances
•  Unauthenticated / Authenticated user tracking
•  Provide data async over service layer
•  Segmentation with graphical libraries like D3 & Crossfilter (http://square.github.com/crossfilter/)
•  Saving Query Criteria, expanding out BI tools for internal users
•  MongoDB Connector, Hadoop and Hive (maybe Tableau and other tools)
•  Storm / Kafka for real-time analytics processing
•  Shard the Replica-Set, looking into Gizzard as the middleware
•  Hrishi Dixit, Chief Technology Officer, hrishi@learnvest.com
•  Kevin Connelly, Director of Engineering, kevin@learnvest.com
•  Will Larche, Lead iOS Developer, will@learnvest.com
•  Cameron Sim, Director of Analytics Tech, cameron@learnvest.com
•  Jeremy Brennan, Director of UI/UX Technology, jeremy@learnvest.com
•  your name here, New Awesome Developer, you@learnvest.com

HIR...

Weitere ähnliche Inhalte

Was ist angesagt?

Benchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseBenchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBase
Christopher Choi
 
파이썬+주요+용어+정리 20160304
파이썬+주요+용어+정리 20160304파이썬+주요+용어+정리 20160304
파이썬+주요+용어+정리 20160304
Yong Joon Moon
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
맵매칭 (부정확한 GPS포인트들로부터 경로 추정하기)
맵매칭 (부정확한 GPS포인트들로부터 경로 추정하기)맵매칭 (부정확한 GPS포인트들로부터 경로 추정하기)
맵매칭 (부정확한 GPS포인트들로부터 경로 추정하기)
if kakao
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 

Was ist angesagt? (20)

HyperGraphQL
HyperGraphQLHyperGraphQL
HyperGraphQL
 
Content Management with MongoDB by Mark Helmstetter
 Content Management with MongoDB by Mark Helmstetter Content Management with MongoDB by Mark Helmstetter
Content Management with MongoDB by Mark Helmstetter
 
Benchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseBenchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBase
 
JDBC - JPA - Spring Data
JDBC - JPA - Spring DataJDBC - JPA - Spring Data
JDBC - JPA - Spring Data
 
Python Matplotlib Tutorial | Matplotlib Tutorial | Python Tutorial | Python T...
Python Matplotlib Tutorial | Matplotlib Tutorial | Python Tutorial | Python T...Python Matplotlib Tutorial | Matplotlib Tutorial | Python Tutorial | Python T...
Python Matplotlib Tutorial | Matplotlib Tutorial | Python Tutorial | Python T...
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
파이썬+주요+용어+정리 20160304
파이썬+주요+용어+정리 20160304파이썬+주요+용어+정리 20160304
파이썬+주요+용어+정리 20160304
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the Roadmap
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
FIWARE Wednesday Webinars - FIWARE Overview
FIWARE Wednesday Webinars - FIWARE OverviewFIWARE Wednesday Webinars - FIWARE Overview
FIWARE Wednesday Webinars - FIWARE Overview
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
맵매칭 (부정확한 GPS포인트들로부터 경로 추정하기)
맵매칭 (부정확한 GPS포인트들로부터 경로 추정하기)맵매칭 (부정확한 GPS포인트들로부터 경로 추정하기)
맵매칭 (부정확한 GPS포인트들로부터 경로 추정하기)
 
Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data Science
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
 

Andere mochten auch

Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingInsights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Spark Summit
 

Andere mochten auch (6)

MongoDB ClickStream and Visualization
MongoDB ClickStream and VisualizationMongoDB ClickStream and Visualization
MongoDB ClickStream and Visualization
 
Clickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customersClickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customers
 
Clickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache SparkClickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache Spark
 
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingInsights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
 
Web log & clickstream
Web log & clickstream Web log & clickstream
Web log & clickstream
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
 

Ähnlich wie Implementing and Visualizing Clickstream data with MongoDB

Open analytics | Cameron Sim
Open analytics | Cameron SimOpen analytics | Cameron Sim
Open analytics | Cameron Sim
Open Analytics
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWARE
FIWARE
 
Firefox OS: HTML5 sur les stĂŠroĂŻdes - HTML5mtl - 2014-04-22
Firefox OS: HTML5 sur les stĂŠroĂŻdes - HTML5mtl - 2014-04-22Firefox OS: HTML5 sur les stĂŠroĂŻdes - HTML5mtl - 2014-04-22
Firefox OS: HTML5 sur les stĂŠroĂŻdes - HTML5mtl - 2014-04-22
FrĂŠdĂŠric Harper
 
Taking Web Apps Offline
Taking Web Apps OfflineTaking Web Apps Offline
Taking Web Apps Offline
Pedro Morais
 
Developing your first application using FI-WARE
Developing your first application using FI-WAREDeveloping your first application using FI-WARE
Developing your first application using FI-WARE
Fermin Galan
 
Engage 2013 - Multi Channel Data Collection
Engage 2013 - Multi Channel Data CollectionEngage 2013 - Multi Channel Data Collection
Engage 2013 - Multi Channel Data Collection
Webtrends
 
Evolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB StitchEvolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB Stitch
MongoDB
 
Practical AngularJS
Practical AngularJSPractical AngularJS
Practical AngularJS
Wei Ru
 

Ähnlich wie Implementing and Visualizing Clickstream data with MongoDB (20)

Open analytics | Cameron Sim
Open analytics | Cameron SimOpen analytics | Cameron Sim
Open analytics | Cameron Sim
 
Developing your first application using FIWARE
Developing your first application using FIWAREDeveloping your first application using FIWARE
Developing your first application using FIWARE
 
Siddhi - cloud-native stream processor
Siddhi - cloud-native stream processorSiddhi - cloud-native stream processor
Siddhi - cloud-native stream processor
 
Firefox OS: HTML5 sur les stĂŠroĂŻdes - HTML5mtl - 2014-04-22
Firefox OS: HTML5 sur les stĂŠroĂŻdes - HTML5mtl - 2014-04-22Firefox OS: HTML5 sur les stĂŠroĂŻdes - HTML5mtl - 2014-04-22
Firefox OS: HTML5 sur les stĂŠroĂŻdes - HTML5mtl - 2014-04-22
 
Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
Taking Web Apps Offline
Taking Web Apps OfflineTaking Web Apps Offline
Taking Web Apps Offline
 
Developing your first application using FI-WARE
Developing your first application using FI-WAREDeveloping your first application using FI-WARE
Developing your first application using FI-WARE
 
Engage 2013 - Multi Channel Data Collection
Engage 2013 - Multi Channel Data CollectionEngage 2013 - Multi Channel Data Collection
Engage 2013 - Multi Channel Data Collection
 
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22
HTML for the Mobile Web, Firefox OS - All Things Open - 2014-10-22
 
NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices -  Michael HacksteinNoSQL meets Microservices -  Michael Hackstein
NoSQL meets Microservices - Michael Hackstein
 
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09
Firefox OS, une plateforme à découvrir - IO Saglac - 2014-09-09
 
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12
Firefox OS, HTML5 to the next level - Python Montreal - 2014-05-12
 
Evolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB StitchEvolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB Stitch
 
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics PlatformWSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform
 
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...
[Serverless Meetup Tokyo #3] Serverless in Azure (Azure Functionsのアップデート、事例、デ...
 
HTML5 on Mobile
HTML5 on MobileHTML5 on Mobile
HTML5 on Mobile
 
Practical AngularJS
Practical AngularJSPractical AngularJS
Practical AngularJS
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
NoSQL meets Microservices
NoSQL meets MicroservicesNoSQL meets Microservices
NoSQL meets Microservices
 

Implementing and Visualizing Clickstream data with MongoDB

  • 1. Implementing and Visualizing Click-Stream Data with MongoDB. Jan 22, 2013, New York MongoDB User Group. Cameron Sim, LearnVest.com
  • 2. Agenda
      • About LearnVest
      • HL Application Architecture
      • Data Capture
      • Event Packaging
      • MongoDB Data Warehousing
      • Loading & Visualization
      • Finishing up
  • 3. LearnVest Inc. (www.learnvest.com)
      Mission Statement: Aiming to make financial planning as accessible as having a gym membership
      Company
      • Founded in 2008 by Alexa Von Tobel, CEO
      • 50+ people and growing rapidly
      • Based in NYC
      Key Products
      • Account Aggregation and Management (Bank, Credit, Loan, Investment, Mortgage)
      • Original and Syndicated Newsletter Content
      • Financial Planning (tiered product offering)
      Platforms
      • Web
      • iPhone
      Stack
      • Operational: Wordpress, Backbone.js, Node.js, Java Spring 3, Redis, Memcached, [...]
      • Analytics: MongoDB 2.2.0 (3-node replica-set), Java 6, Spring 3
  • 5. LearnVest.com: iPhone
  • 6. High Level Architecture (diagram labels). Production: Delivery, Services. Analytics: Services, Loaders, Dashboards. Transports shown: HTTPS, pyMongo.
  • 7. Capture Everything
      Collection
      • User-driven events over web and mobile
      • System-level exceptions
      • Anything else
      Temporary Data
      • 'OK' with approximate data
      • Operational databases are the system of record
      Aggregate events as they come in
      • Remove the overhead of basic metrics (counts, sums) on core events
      • Group by user unique id and increment counts per event, over time dimensions (day, week-ending, month, year)
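A minimal Python sketch of the roll-up idea on this slide: each incoming event increments one counter per user, event type, and time dimension. The function names, the in-memory `Counter`, and the bucket label formats are illustrative assumptions, not the LearnVest implementation. "Week-ending" is assumed here to mean the Sunday that closes the week.

```python
from collections import Counter
from datetime import date, timedelta


def week_ending(d: date) -> date:
    """Return the Sunday that closes the week containing d (assumed convention)."""
    return d + timedelta(days=6 - d.weekday())


def bucket_keys(user_id: str, event_type: str, d: date):
    """Yield one counter key per time dimension for a single event."""
    yield ("day", user_id, event_type, d.isoformat())
    yield ("week", user_id, event_type, week_ending(d).isoformat())
    yield ("month", user_id, event_type, d.strftime("%Y-%m"))
    yield ("year", user_id, event_type, d.strftime("%Y"))


def record_event(counts: Counter, user_id: str, event_type: str, d: date) -> None:
    """Increment every time-dimension counter touched by this event."""
    for key in bucket_keys(user_id, event_type, d):
        counts[key] += 1


if __name__ == "__main__":
    counts = Counter()
    record_event(counts, "u1", "login", date(2013, 1, 22))
    record_event(counts, "u1", "login", date(2013, 1, 23))
    for key, value in sorted(counts.items()):
        print(key, value)
```

Because every dimension is updated at write time, reading a per-user monthly count later is a single lookup rather than a scan over raw events.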
  • 8. Data Capture: iOS (Objective-C)
      // Method declaration
      - (void)sendAnalyticEventType:(NSString *)eventType
                             object:(NSString *)object
                               name:(NSString *)name
                               page:(NSString *)page
                             source:(NSString *)source;

      // Method body: build the event, skipping nil values
      NSMutableDictionary *params = [NSMutableDictionary dictionary];  // declaration not shown on the slide; assumed
      NSMutableDictionary *eventData = [NSMutableDictionary dictionary];
      if (eventType != nil) [params setObject:eventType forKey:@"eventType"];
      if (object != nil) [eventData setObject:object forKey:@"object"];
      if (name != nil) [eventData setObject:name forKey:@"name"];
      if (page != nil) [eventData setObject:page forKey:@"page"];
      if (source != nil) [eventData setObject:source forKey:@"source"];
      if (eventData != nil) [params setObject:eventData forKey:@"eventData"];
      [[LVNetworkEngine sharedManager] analytics_send:params];
  • 9. Data Capture: Web (JavaScript)
      function internalTrackPageView() {
        var cookie = {
          userContext: jQuery.cookie('UserContextCookie')
        };
        var trackEvent = {
          eventType: "pageView",
          eventData: {
            page: window.location.pathname + window.location.search
          }
        };
        // AJAX
        jQuery.ajax({
          url: "/api/track",
          type: "POST",
          dataType: "json",
          data: JSON.stringify(trackEvent),
          // Set request headers
          beforeSend: function (xhr, settings) {
            xhr.setRequestHeader('Accept', 'application/json');
            xhr.setRequestHeader('User-Context', cookie.userContext);
            if (settings.type === 'PUT' || settings.type === 'POST') {
              xhr.setRequestHeader('Content-Type', 'application/json');
            }
          }
        });
      }
  • 10. Event Packaging
      • Spring 3 RESTful service layer; controller methods define the eventCode via the @tracking annotation
      • Custom Interceptor class extends HandlerInterceptorAdapter and implements [...]Handle() (for each event) to invoke calls via Spring @Async to an EventPublisher
      • EventPublisher publishes to a common event bus queue with multiple subscribers, one of which packages the eventPayload Map<String, Object> object and forwards it to the Analytics REST [...]
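The publish/subscribe flow described on this slide can be sketched in a few lines of Python. This is a synchronous stand-in under stated assumptions: the `EventBus` class and its method names are hypothetical, and the original used Spring `@Async` and a queue rather than a direct loop. The one property it tries to preserve is that a failing subscriber is logged and never breaks the request that triggered the event.

```python
import logging
from typing import Any, Callable, Dict, List

log = logging.getLogger("tracking")

Event = Dict[str, Any]
Subscriber = Callable[[str, Event], None]


class EventBus:
    """Hypothetical in-process event bus with multiple subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Subscriber] = []

    def subscribe(self, subscriber: Subscriber) -> None:
        self._subscribers.append(subscriber)

    def publish(self, event_code: str, event: Event) -> None:
        for subscriber in self._subscribers:
            try:
                # Each subscriber gets its own shallow copy of the event.
                subscriber(event_code, dict(event))
            except Exception:
                # Tracking errors are logged, never propagated to the caller.
                log.exception("Error tracking event '%s'", event_code)


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe(lambda code, event: print("analytics got", code, event))
    bus.publish("user.login", {"eventType": "login"})
```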
  • 11. Event Packaging: Spring RestController Methods
      // Interface
      @RequestMapping(value = "/user/login", method = RequestMethod.POST,
                      headers = "Accept=application/json")
      public Map<String, Object> userLogin(@RequestBody Map<String, Object> event,
                                           HttpServletRequest request);

      // Concrete/Impl class
      @Override
      @tracking("user.login")
      public Map<String, Object> userLogin(@RequestBody Map<String, Object> event,
                                           HttpServletRequest request) {
          // Implementation
          return event;
      }
  • 12. Event Packaging: Custom Interceptor class extends HandlerInterceptorAdapter
      protected void handleTracking(String trackingCode, Map<String, Object> modelMap,
                                    HttpServletRequest request) {
          Map<String, Object> responseModel = new HashMap<String, Object>();
          // remove non-serializables and copy over data from modelMap
          try {
              this.eventPublisher.publish(trackingCode, responseModel, request);
          } catch (Exception e) {
              log.error("Error tracking event '" + trackingCode + "' : "
                        + ExceptionUtils.getStackTrace(e));
          }
      }
  • 13. Event Packaging: Custom Interceptor class extends HandlerInterceptorAdapter
      public void publish(String eventCode, Map<String, Object> eventData,
                          HttpServletRequest request) {
          Map<String, Object> payload = new HashMap<String, Object>();
          String eventId = UUID.randomUUID().toString();
          Map<String, String> requestMap = HttpRequestUtils.getRequestHeaders(request);
          // Normalize message
          payload.put("eventType", eventData.get("eventType"));
          // The slide reads "eventType" for the next two keys as well, which looks like a copy-paste slip
          payload.put("eventData", eventData.get("eventData"));
          payload.put("version", eventData.get("version"));
          payload.put("eventId", eventId);
          payload.put("eventTime", new Date());
          payload.put("request", requestMap);
          . . .

      // Send to the Analytics Service for MongoDB persistence
      public void sendPost(EventPayload payload) {
          HttpEntity request = new HttpEntity(payload.getEventPayload(), headers);
          Map m = restTemplate.postForObject(endpoint, request, java.util.Map.class);
      }
  • 14. Event Packaging: Serialized JSON (User Action)
      {
        "eventCode" : "user.login",
        "eventType" : "login",
        "version" : "1.0",
        "eventTime" : "1358603157746",
        "eventData" : { "" : "", "" : "", "" : "" },
        "request" : {
          "call-source" : "WEB",
          "user-context" : "00002b4f1150249206ac2b692e48ddb3",
          "user.agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11",
          "cookie" : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
          "content-length" : "204",
          "accept-encoding" : "gzip,deflate,sdch"
        }
      }
  • 15. Event Packaging: Serialized JSON (Generic Event)
      {
        "eventCode" : "generic.ui",
        "eventType" : "pageView",
        "version" : "1.0",
        "eventTime" : "1358603157746",
        "eventData" : {
          "page" : "/learnvest/moneycenter/inbox",
          "section" : "transactions",
          "name" : "view transactions",
          "object" : "page"
        },
        "request" : {
          "call-source" : "WEB",
          "user-context" : "00002b4f1150249206ac2b692e48ddb3",
          "user.agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11",
          "cookie" : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
          "content-length" : "204",
          "accept-encoding" : "gzip,deflate,sdch"
        }
      }
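The envelope shared by the two serialized events above (eventCode, eventType, version, eventTime, eventData, request) can be sketched as one packaging function. This is an illustration only: `build_payload` and its parameters are hypothetical names, the default version "1.0" is an assumption taken from the samples, and `eventId` follows the UUID generated in the publisher code.

```python
import json
import time
import uuid
from typing import Any, Dict, Optional


def build_payload(event_code: str,
                  event: Dict[str, Any],
                  request_headers: Dict[str, str],
                  now_ms: Optional[int] = None) -> Dict[str, Any]:
    """Wrap a raw tracking event in the normalized envelope."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # epoch milliseconds, as in the samples
    return {
        "eventCode": event_code,
        "eventType": event.get("eventType"),
        "version": event.get("version", "1.0"),  # assumed default
        "eventId": str(uuid.uuid4()),
        "eventTime": str(now_ms),
        "eventData": dict(event.get("eventData", {})),
        "request": dict(request_headers),
    }


if __name__ == "__main__":
    payload = build_payload(
        "generic.ui",
        {"eventType": "pageView",
         "eventData": {"page": "/learnvest/moneycenter/inbox", "object": "page"}},
        {"call-source": "WEB"},
    )
    print(json.dumps(payload, indent=2))
```

Keeping the envelope identical for user actions and generic UI events is what lets one collection per source hold every event type.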
  • 16. MongoDB Data Warehousing
      MongoDB Information
      • 2.2.0
      • 3-node replica-set
      • 1x Large (primary), 2x Medium (secondary) AWS Amazon-Linux machines with single 500GB EBS volumes mounted to /opt/data
      MongoDB Config File
      • dbpath = /opt/data/mongodb/data
      • rest = true
      • replSet = voyager
      Volumes
      • [...] events daily on web, ~600K on mobile
      • [...]GB per day at start, slowed to ~1GB per day
      • Currently at 78GB (collecting since August 2012)
      Future Scaling Strategy
      • Set up 2nd replica-set
      • Shard replica-sets to n at 60% / 250GB per EBS volume
      • Shard key probably based on sequential mix of email_address and additional string
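As a rough worked example of the scaling figures on this slide, the sketch below estimates how long it takes to reach the 250GB sharding threshold from 78GB at about 1GB per day. It assumes constant linear growth, which the slide does not claim; the function name is hypothetical.

```python
import math


def days_until_threshold(current_gb: float,
                         growth_gb_per_day: float,
                         threshold_gb: float) -> int:
    """Whole days until the data set reaches threshold_gb at a constant rate."""
    if current_gb >= threshold_gb:
        return 0
    if growth_gb_per_day <= 0:
        raise ValueError("growth_gb_per_day must be positive")
    return math.ceil((threshold_gb - current_gb) / growth_gb_per_day)


if __name__ == "__main__":
    # Figures from the slide: 78GB today, ~1GB per day, shard at 250GB.
    print(days_until_threshold(78, 1.0, 250))  # prints 172
```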
  • 17. MongoDB Data Warehousing: WEB/MOBILE
      Persist all events, bucketed by source, event code and time:
      • WEB/MOBILE
      • user.login
      • Time (day, week-ending, month, year)
      Insert into collection e_web / e_mobile
      Upsert into:
      • e_web_user_login_day
      • e_web_user_login_week
      • e_web_user_login_month
      • e_web_user_login_year
      Predictable model for scaling and measuring business growth
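The collection naming on this slide follows a regular pattern: `e_` plus the source, the event code with dots turned into underscores, and the time dimension. A small Python sketch of that pattern, with hypothetical function names:

```python
from typing import List

DIMENSIONS = ("day", "week", "month", "year")


def raw_collection(source: str) -> str:
    """Collection holding every raw event for a source, e.g. e_web."""
    return "e_" + source.lower()


def rollup_collections(source: str, event_code: str) -> List[str]:
    """Per-dimension counter collections for one source and event code."""
    base = "{}_{}".format(raw_collection(source), event_code.replace(".", "_"))
    return ["{}_{}".format(base, dimension) for dimension in DIMENSIONS]


if __name__ == "__main__":
    print(raw_collection("WEB"))
    print(rollup_collections("WEB", "user.login"))
```

A predictable name per (source, event code, dimension) means a loader can compute where to read from without a lookup table.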
  • 18. MongoDB Data Warehousing
      // $inc document shared by all four upserts
      DBObject newDocument = new BasicDBObject().append("$inc",
              new BasicDBObject().append("count", 1));
      // Update day dimension (upsert = true, multi = false)
      collection_day.update(new BasicDBObject().append("user-context", userContext)
              .append("eventType", eventType)
              .append("date", sdf_day.format(d)), newDocument, true, false);
      // Update week dimension
      collection_week.update(new BasicDBObject().append("user-context", userContext)
              .append("eventType", eventType)
              .append("date", sdf_day.format(w)), newDocument, true, false);
      // Update month dimension
      collection_month.update(new BasicDBObject().append("user-context", userContext)
              .append("eventType", eventType)
              .append("date", sdf_month.format(d)), newDocument, true, false);
      // Update year dimension (labelled "month" on the slide)
      collection_year.update(new BasicDBObject().append("user-context", userContext)
              .append("eventType", eventType)
              .append("date", sdf_year.format(d)), newDocument, true, false);
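The upsert-with-`$inc` behaviour used on this slide can be illustrated without a database. The sketch below emulates it over a plain Python list: if a document matching the query exists its counter is incremented, otherwise a new document is created from the query fields. `upsert_inc` is a hypothetical helper that only mirrors the semantics; it is not a MongoDB driver call.

```python
from typing import Any, Dict, List


def upsert_inc(collection: List[Dict[str, Any]],
               query: Dict[str, Any],
               field: str,
               amount: int = 1) -> Dict[str, Any]:
    """Increment field on the first matching document, inserting it if absent."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            doc[field] = doc.get(field, 0) + amount
            return doc
    # No match: seed the new document with the query fields, like an upsert.
    doc = dict(query)
    doc[field] = amount
    collection.append(doc)
    return doc


if __name__ == "__main__":
    day_counts: List[Dict[str, Any]] = []
    key = {"user-context": "c4ca4238a0b923820dcc509a6f75849b",
           "eventType": "login", "date": "01/02"}
    upsert_inc(day_counts, key, "count")
    upsert_inc(day_counts, key, "count")
    print(day_counts)
```

The writer never needs to check whether a counter document already exists, which keeps the ingest path to one operation per dimension.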
  • 19. MongoDB Data Warehousing
      Collections (names in the left column are cropped in the source):
      [...]ount_addManual_week
      e_web_account_addManual_year
      [...]_user_login_day
      [...]_user_login_week
      [...]_user_login_month
      [...]_user_login_year
      e_mobile_generic_ui_day
      e_mobile_generic_ui_month
      e_mobile_generic_ui_week
      e_mobile_generic_ui_year

      db.e_web_user_login_day.find()
      { "_id" : ObjectId("50e4b9871b36921910222c42"), "count" : 5, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
      { "_id" : ObjectId("50cd6cfcb9a80a2b4ee21422"), "count" : 7, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
      { "_id" : ObjectId("50cd6e51b9a80a2b4ee21427"), "count" : 2, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
      { "_id" : ObjectId("50e4b9871b36921910222c42"), "count" : 3, "date" : "01/03", "user-context" : "50e49a561b36921910222c33" }
  • 20. MongoDB Data Warehousing (tail of a raw event document; the start is cropped in the source)
      [...] "accept-charset" : "ISO-8859-1,utf-8;q=0.7,*;q=0.3",
        "cookie" : "size=[...]; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
        "content-length" : "255",
        "accept-encoding" : "gzip,deflate,sdch" },
        "eventType" : "flick",
        "eventData" : { "object" : "[...]on", "name" : "split transaction button", "page" : "#inbox/79876/", "section" : "[...]saction_river_details" } }
  • 21. MongoDB Data Warehousing: Indexing Strategy
      • Indexes on core collections (e_web and e_mobile) come in under 3GB, on a 7.5GB Large instance and 3.75GB Medium instances
      • Store datetime in two fields, and use a compound index on date with other fields like eventType and user unique id (user-context)
      • Heavy insertion rates, much lower read rates, so the fewer indexes the better
  • 22. MongoDB Data Warehousing: Indexing Strategy
      db.e_web.getIndexes()
      [
        { "v" : 1,
          "key" : { "request.user-context" : 1, "created_date" : 1 },
          "ns" : "moneycenter.e_web",
          "name" : "request.user-context_1_created_date_1" },
        { "v" : 1,
          "key" : { "eventData.name" : 1, "created_date" : 1 },
          "ns" : "moneycenter.e_web",
          "name" : "eventData.name_1_created_date_1" }
      ]
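The index names in this output follow MongoDB's default naming rule: each key and its direction are joined with underscores. The sketch below reproduces that rule in Python for the two compound indexes shown; `default_index_name` is a hypothetical helper, and the key list uses the same (field, direction) pair shape that PyMongo's `create_index` accepts.

```python
from typing import List, Tuple

IndexKeys = List[Tuple[str, int]]


def default_index_name(keys: IndexKeys) -> str:
    """Join field/direction pairs the way MongoDB names an index by default."""
    return "_".join("{}_{}".format(field, direction) for field, direction in keys)


# The two compound indexes from the slide, user/date and event-name/date.
USER_DATE_INDEX: IndexKeys = [("request.user-context", 1), ("created_date", 1)]
NAME_DATE_INDEX: IndexKeys = [("eventData.name", 1), ("created_date", 1)]

if __name__ == "__main__":
    print(default_index_name(USER_DATE_INDEX))
    print(default_index_name(NAME_DATE_INDEX))
```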
  • 23. Loading & Visualization
      Objective
      • Show historic and intraday stats on core use cases (logins, conversions)
      • Show user funnel rates on conversion pages
      • Show general usability: how do users really use the Web and iOS platforms?
      Non-Functionals
      • Intraday doesn't need to be "real-time"; polling is good enough for now
      • Overnight batch job for historic must scale horizontally
      General Implementation Strategy
      • Do all heavy lifting and object manipulation; UI should just display graph or table
      • Modularize the service to be able to regenerate any graphs/tables without a full load
  • 24. Loading & Visualization: Java Batch Service
      Use Java Mongo library to query key collections and return user counts and sum of events
      DBCursor webUserLogins = c.find(
              new BasicDBObject("date", sdf.format(new Date())));

      private HashMap<String, Object> getSumAndCount(DBCursor cursor) {
          HashMap<String, Object> m = new HashMap<String, Object>();
          int sum = 0;
          int count = 0;
          DBObject obj;
          while (cursor.hasNext()) {
              obj = (DBObject) cursor.next();
              count++;                                  // one document per user
              sum = sum + (Integer) obj.get("count");   // events for that user
          }
          m.put("sum", sum);
          m.put("count", count);
          m.put("average", sdf.format(new Float(sum) / count));
          return m;
      }
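For comparison, the same sum/count/average pass can be sketched in Python over plain dictionaries. This is an illustration rather than a port of the batch service: `sum_and_count` is a hypothetical name, the average is rounded to two decimals instead of being passed through a formatter, and an empty cursor yields 0.0 rather than a division by zero.

```python
from typing import Any, Dict, Iterable


def sum_and_count(docs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Return total events, number of user documents, and events per user."""
    total = 0
    users = 0
    for doc in docs:
        users += 1                  # one document per user per day
        total += int(doc["count"])  # events recorded for that user
    average = total / users if users else 0.0
    return {"sum": total, "count": users, "average": round(average, 2)}


if __name__ == "__main__":
    # Counts taken from the e_web_user_login_day sample for date 01/02.
    sample = [{"count": 5}, {"count": 7}, {"count": 2}]
    print(sum_and_count(sample))
```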
  • 25. Loading & Visualization: Java Batch Service
      Use Aggregation Framework where required on core collections (e_web) and external [...]
      // create aggregation objects
      DBObject project = new BasicDBObject("$project",
              new BasicDBObject("day_value", fields));
      DBObject day_value = new BasicDBObject("day_value", "$day_value");
      DBObject groupFields = new BasicDBObject("_id", day_value);
      // create the fields to group by, in this case "number"
      groupFields.put("number", new BasicDBObject("$sum", 1));
      // create the group
      DBObject group = new BasicDBObject("$group", groupFields);
      // execute
      AggregationOutput output = mycollection.aggregate(project, group);
      for (DBObject obj : output.results()) {
          [...]
  • 26. Loading & Visualization: Java Batch Service
      MongoDB command-line example of aggregation over a time period, e.g. month
      db.e_web.aggregate([
        { $match : { created_date : { $gt : ISODate("2012-10-25T00:00:00") } } },
        { $project : { day_value : { day : { $dayOfMonth : "$created_date" }, month : { $month : "$created_date" } } } },
        { $group : { _id : { day_value : "$day_value" }, number : { $sum : 1 } } },
        { $sort : { "_id.day_value" : -1 } }
      ])
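To make the stages of this pipeline concrete, here is a pure-Python equivalent over a list of event dictionaries: filter by date (`$match`), derive day and month (`$project`), count per day (`$group`), and order the result (`$sort`). It is a sketch under stated assumptions: `events_per_day` is a hypothetical name, the `_id` shape is flattened to `{day, month}`, and rows are sorted by month then day, descending.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, Iterable, List


def events_per_day(events: Iterable[Dict[str, Any]],
                   since: datetime) -> List[Dict[str, Any]]:
    """Count events created after `since`, grouped by (day, month)."""
    counts: Counter = Counter()
    for event in events:
        created = event["created_date"]
        if created > since:                              # $match
            counts[(created.day, created.month)] += 1    # $project + $group
    rows = [{"_id": {"day": day, "month": month}, "number": number}
            for (day, month), number in counts.items()]
    rows.sort(key=lambda row: (row["_id"]["month"], row["_id"]["day"]),
              reverse=True)                              # $sort, newest first
    return rows


if __name__ == "__main__":
    sample = [
        {"created_date": datetime(2012, 10, 26, 9)},
        {"created_date": datetime(2012, 10, 26, 17)},
        {"created_date": datetime(2012, 10, 27, 8)},
        {"created_date": datetime(2012, 10, 24, 12)},
    ]
    for row in events_per_day(sample, datetime(2012, 10, 25)):
        print(row)
```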
  • 27. Loading & Visualization: Java Batch Service
      Persisting events into graph and table collections
      db.homeGraphs.find()
      { "_id" : ObjectId("50f57b5c1d4e714b581674e2"), "accounts_natural" : 54, "accounts_total" : 54, "date" : ISODate("2011-02-06T05:00:00Z"), "linked_rate" : [...].96, "premium_rate" : 0, "str_date" : "2011,01,06", "upgrade_rate" : 0, "users_avg_linked" : 3.43, "users_linked" : 7 }
      { "_id" : ObjectId("50f57b5c1d4e714b581674e3"), "accounts_natural" : 144, "accounts_total" : 144, "date" : ISODate("2011-02-07T05:00:00Z"), "linked_rate" : [...].11, "premium_rate" : 0, "str_date" : "2011,01,07", "upgrade_rate" : 0, "users_avg_linked" : 4, "users_linked" : 16 }
      { "_id" : ObjectId("50f57b5c1d4e714b581674e4"), "accounts_natural" : 119, "accounts_total" : 119, "date" : ISODate("2011-02-08T05:00:00Z"), "linked_rate" : [...].13, "premium_rate" : 0, "str_date" : "2011,01,08", "upgrade_rate" : 0, "users_avg_linked" : 4.5, "users_linked" : 18 }
  • 28. Loading & Visualization
      [...]day numbers
      try:
          conn = pymongo.Connection('localhost', 27017)
          db = conn['lvanalytics']
          cursor = db.accountmetrics.find(
              {"date": {"$gte": dt_from, "$lte": dt_to}}).sort("date")
          return buildMetricsDict(cursor)
      except Exception as e:
          logger.error(e.message)

      Return the graph object (as a list or a dict of lists) to the view that called the method
      pagedata = {}
      pagedata['accountsGraph'] = mongodb_home.getHomeChart()
      return render_to_response('home.html', {'pagedata': pagedata},
                                context_instance=RequestContext(request))
  • 29. Loading & Visualization: Django and HighCharts
      Populate the series (JavaScript with Django templating)
      seriesOptions[0] = {
        id: 'naturalAccounts',
        name: "Natural Accounts",
        data: [
          {% for a in pagedata.metrics.accounts_natural %}
            {% if not forloop.first %}, {% endif %}
            [Date.UTC({{a.0}}),{{a.1}}]
          {% endfor %}
        ],
        tooltip: { valueDecimals: 2 }
      };
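The template above expects each point as a pair: arguments for JavaScript's `Date.UTC` and a value. In the `homeGraphs` sample, `date` ISODate(2011-02-06...) is paired with `str_date` 2011,01,06, which is consistent with `Date.UTC` taking a zero-based month. A hedged Python sketch of building such pairs on the server side follows; `js_utc_args` and `series_points` are hypothetical names, not functions from the talk.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List


def js_utc_args(d: datetime) -> str:
    """Format a date as 'YYYY,MM,DD' with a zero-based month for Date.UTC."""
    return "{:04d},{:02d},{:02d}".format(d.year, d.month - 1, d.day)


def series_points(docs: Iterable[Dict[str, Any]], field: str) -> List[List[Any]]:
    """Return [Date.UTC-argument-string, value] pairs ordered by date."""
    ordered = sorted(docs, key=lambda doc: doc["date"])
    return [[js_utc_args(doc["date"]), doc[field]] for doc in ordered]


if __name__ == "__main__":
    docs = [
        {"date": datetime(2011, 2, 7, 5), "accounts_natural": 144},
        {"date": datetime(2011, 2, 6, 5), "accounts_natural": 54},
    ]
    print(series_points(docs, "accounts_natural"))
```

Doing this shaping in the service keeps the template down to a loop that prints pairs, in line with the stated goal that the UI should only display the graph or table.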
  • 30. Loading & Visualization: Django and HighCharts. And create the charts and tables...
  • 31. Loading & Visualization: Django and HighCharts. And create the charts and tables...
  • 32. Lessons Learned
      • Date Time managed as two fields, Datetime and Date
      • Aggregating and upserting documents as events are received works for us
      • Real-time Map-Reduce in pyMongo: too slow, don't do this
      • Django-noRel: unstable; use Django and configure MongoDB as a datastore only
      • Memcached on Django is good enough (at the moment); use django-celery with rabbitmq to pre-cache all data after data loading
      • HighCharts is buggy; considering D3 and other libraries
      • Don't need to retrieve data directly from MongoDB to Django; perhaps provide all data via a service layer (at the expense of ever-additional features in pyMongo)
  • 33. Next Steps
      • A/B testing framework, experiments and variances
      • Unauthenticated / authenticated user tracking
      • Provide data async over service layer
      • Segmentation with graphical libraries like D3 and Cross-Filter (http://square.github.com/crossfilter/)
      • Saving query criteria, expanding out BI tools for internal users
      • MongoDB Connector, Hadoop and Hive (maybe Tableau and other tools)
      • Storm / Kafka for real-time analytics processing
      • Shard the replica-set, looking into Gizzard as the middleware
  • 34. Hiring
      • Hrishi Dixit, Chief Technology Officer, hrishi@learnvest.com
      • Kevin Connelly, Director of Engineering, kevin@learnvest.com
      • Will Larche, Lead iOS Developer, will@learnvest.com
      • Cameron Sim, Director of Analytics Tech, cameron@learnvest.com
      • Jeremy Brennan, Director of UI/UX Technology, jeremy@learnvest.com
      • "your name here", New Awesome Developer, you@learnvest.com