PAYPAL - BEHAVIORAL TRACKING ON HADOOP

ANIL MADAN
DIRECTOR OF ENGINEERING, MARKETING & ANALYTICS
PAYPAL'S VISION




Delivering the future of money today…

An essential part of our customers' financial and business lives, enabling secure commerce anywhere, anytime, any way.

110 million active accounts, 190 markets, 25 currencies
BEHAVIORAL TRACKING VISION
Understand our customers' behavior and experience anytime, anywhere, any way to drive desirable outcomes for our customers and for PayPal.

•  Ensure privacy, security and trust for our customers
•  Ensure instrumentation standardization across channels
•  Enable self-service analytics for our product and marketing teams
TRACKING PLATFORM OVERVIEW


[Diagram: tracking platform overview]

Channels (top row):  Direct/Home Page, Transaction Emails, Email Marketing, Display Advertising, Search Engine Marketing
Metadata:            Tracking Metadata Tool, Taxonomy, Tag Catalog
Tracking Servers:    Tracking Event Service, Tracking Validation Service
Real Time Systems:   Marketing, Segmentation, Experimentation
Big Data:            Reporting/Visualization, Digital Metrics, Attribution
METADATA - ENTITY MODEL

[Diagram: entity model with the entities LAYOUT, PAGE, ELEMENTS, LINK and COMPONENTS]
METADATA - EVENT MODEL


Tracking Event
    Impression Event
        Component Impression Event
        Page Impression Event
            Client Page Impression Event
            Server Page Impression Event
        Ad Impression Event
    Reaction Event
        Click Event
        Click-Through Event
            Entry Event (placement inferred from the slide layout)
            Exit Event (placement inferred from the slide layout)
        Mouse-over Event
    Conversion Event
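The event hierarchy can be sketched as a parent lookup. This is only an illustration, not PayPal's implementation: the names `PARENT`, `lineage` and `is_a` are hypothetical, and placing Entry Event and Exit Event under Click-Through Event is an assumption read off the slide layout.

```python
# Child -> parent links for the event types named on the slide.
# Assumption: Entry/Exit sit under Click-Through (inferred from layout only).
PARENT = {
    "Impression Event": "Tracking Event",
    "Reaction Event": "Tracking Event",
    "Conversion Event": "Tracking Event",
    "Component Impression Event": "Impression Event",
    "Page Impression Event": "Impression Event",
    "Ad Impression Event": "Impression Event",
    "Client Page Impression Event": "Page Impression Event",
    "Server Page Impression Event": "Page Impression Event",
    "Click Event": "Reaction Event",
    "Click-Through Event": "Reaction Event",
    "Mouse-over Event": "Reaction Event",
    "Entry Event": "Click-Through Event",
    "Exit Event": "Click-Through Event",
}


def lineage(event_type):
    """Return the event type followed by its ancestors up to the root."""
    chain = [event_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(event_type, ancestor):
    """True if event_type equals ancestor or descends from it."""
    return ancestor in lineage(event_type)


print(lineage("Client Page Impression Event"))
print(is_a("Click Event", "Impression Event"))
```

A lookup like this lets a report ask for "all impressions" without listing every leaf type.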
ATTRIBUTION MODEL

Channel               Impression          Click   Open
                      Client    Server
Direct                  ✓         ✓
Organic Search          ✓
Paid Search                                 ✓
Display Offers                    ✓         ✓
Onsite Offers                     ✓         ✓
Transactional Emails                        ✓       ✓
Marketing Emails                            ✓       ✓
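The attribution table can be read as a lookup from channel to the event types checked for it. The sketch below assumes each check mark means "this event type is considered for that channel"; the identifiers `ATTRIBUTION_EVENTS`, `counts_for_attribution` and `channels_for`, and the snake_case event names, are hypothetical.

```python
# Channel -> event types checked in the attribution table.
# Assumption: a check mark means the event type is considered for the channel.
ATTRIBUTION_EVENTS = {
    "Direct": {"client_impression", "server_impression"},
    "Organic Search": {"client_impression"},
    "Paid Search": {"click"},
    "Display Offers": {"server_impression", "click"},
    "Onsite Offers": {"server_impression", "click"},
    "Transactional Emails": {"click", "open"},
    "Marketing Emails": {"click", "open"},
}


def counts_for_attribution(channel, event_type):
    """True if the table has a check for this channel and event type."""
    return event_type in ATTRIBUTION_EVENTS.get(channel, set())


def channels_for(event_type):
    """Channels, in alphabetical order, that are checked for an event type."""
    return sorted(
        channel
        for channel, event_types in ATTRIBUTION_EVENTS.items()
        if event_type in event_types
    )


print(channels_for("open"))
print(counts_for_attribution("Paid Search", "click"))
```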
LOGICAL ARCHITECTURE
[Diagram: logical architecture]

Top row, under the headings Onsite Channels and Marketing Channels:
    Web Tracking JS, Mobile Instrumentation API, Search Engine Marketing Instrumentation, Social Marketing, Email Marketing, Onsite Marketing, Display Advertising

Metadata components:
    Metadata Tool, Tracking Metadata Service, Metadata Repository, Tag Catalog

Tracking components:
    Tracking Servers, Tracking Service, Active MQ Producer, Active MQ, Active MQ Consumer (x2), Tracking Collector, NAS Filer (x2), Tracking Batch, Aggregation/Compression

Message Delivery Services:
    Segmentation Service, Marketing Offers Service

Hadoop Cluster:
    Customer Intelligence, Behavioral Intelligence, Operational Metrics, Reporting, Sessionization, Bot Flagging, Identity Mapping
DATA INGEST PIPELINE

Pre-process:
    Raw Event (Gzip Text), Raw Event (Gzip Text)
      -> Map Reduce: Validate/Dedup Events
      -> Deduped Event (Gzip block compressed SequenceFile)
      -> Map Reduce: Join Client & Server Events
      -> Enriched Event (Gzip block compressed SequenceFile)

Sessionization:
    Enriched Event
      -> Map Reduce: Sessionization
      -> Chain Reducer: Mapper: Geo Lookup (reads Geo Data) -> Mapper: Bot Flagging (reads Bot Data/Rules)
      -> Sessions

Metrics generation:
    Sessions
      -> Map Reduce: Stage 1
      -> Map Reduce: Stage 2
      -> Behavioral Metrics
      -> Reporting MySQL

Adhoc metrics:
    Enriched Event
      -> Pig
      -> Adhoc Metrics
SESSIONIZATION
Events

Visitor ID   Session ID   Timestamp          Event Payload
V1           S1           2012-05-24 05:12   E1
V2           S2           2012-05-24 05:14   E2
V1           S1           2012-05-24 05:15   E3
V1           S1           2012-05-24 05:20   E4
V2           S2           2012-05-24 05:21   E5
V1           S3           2012-05-24 07:25   E6
V1           S3           2012-05-24 07:26   E7

VisitContainer

Visitor ID   Session ID   Payload
V1           S1           ie, winnt, {flash, quicktime}, {ca, usa}, 480 secs, …
                          E1, E3, E4
V2           S2           ff, winxp, {acrobat, mediaplayer}, {wb, in}, 420 secs, …
                          E2, E5
V1           S3           sf, macos, {quicktime, java}, {on, ca}, 60 secs
                          E6, E7

•  Chronologically sort events using secondary sort
        •  SortComparator on visitorid, sessionid and timestamp
        •  Partitioner & Grouping comparator on visitorid and sessionid
•  Normalize data and store it against the session record
        •  Browser, os, plugins, geo-location, duration, bot-flag etc.
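The secondary-sort idea can be illustrated outside Hadoop. The sketch below is an in-memory stand-in under stated assumptions, not the MapReduce job itself: `sort_key` plays the role of the SortComparator (visitor, session, timestamp), `group_key` the role of the Partitioner and Grouping comparator (visitor, session), and all function and field names are hypothetical. The rows are the seven sample events from the slide.

```python
from datetime import datetime
from itertools import groupby

# (visitor_id, session_id, timestamp, payload) rows from the Events table.
EVENTS = [
    ("V1", "S1", "2012-05-24 05:12", "E1"),
    ("V2", "S2", "2012-05-24 05:14", "E2"),
    ("V1", "S1", "2012-05-24 05:15", "E3"),
    ("V1", "S1", "2012-05-24 05:20", "E4"),
    ("V2", "S2", "2012-05-24 05:21", "E5"),
    ("V1", "S3", "2012-05-24 07:25", "E6"),
    ("V1", "S3", "2012-05-24 07:26", "E7"),
]
TIME_FORMAT = "%Y-%m-%d %H:%M"


def sort_key(event):
    # Full composite key: timestamps in this format sort chronologically as text.
    return (event[0], event[1], event[2])


def group_key(event):
    # Natural key: all events of one session reach the same "reduce" call.
    return (event[0], event[1])


def sessionize(events):
    """Group time-ordered events per session and derive the session duration."""
    containers = []
    ordered = sorted(events, key=sort_key)
    for (visitor_id, session_id), rows in groupby(ordered, key=group_key):
        rows = list(rows)
        start = datetime.strptime(rows[0][2], TIME_FORMAT)
        end = datetime.strptime(rows[-1][2], TIME_FORMAT)
        containers.append({
            "visitor_id": visitor_id,
            "session_id": session_id,
            "duration_secs": int((end - start).total_seconds()),
            "events": [row[3] for row in rows],
        })
    return containers


for container in sessionize(EVENTS):
    print(container)
```

Because the events arrive already ordered within each group, the duration falls out of the first and last row without buffering and re-sorting the session in the reducer.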
DIMENSIONS & METRICS

Dimension:    Page, PageFlow, Country, CountryRegion, Plugins, VisitDepth, VisitDuration, VisitByHour, SearchEngine, OS, Browser
Metrics:      Visitors, Sessions, Bounce Rate, Page Views
Time Period:  Hourly, Daily, Weekly, Monthly
METRICS GENERATION
STAGE 1: Compute sessions sorted by visitor, dimension (browser)

Mapper Input
Visitor ID   Session ID   Browser
V1           S1           IE
V1           S2           IE
V2           S3           FF
V3           S4           IE
V4           S5           FF

Mapper Output
Key (visitorid, browser)   Value (#sessions)
V1,IE                      1
V1,IE                      1
V2,FF                      1
V3,IE                      1
V4,FF                      1

Reducer Output
Key (visitorid, browser)   Value (#sessions)
V1,IE                      2
V2,FF                      1
V3,IE                      1
V4,FF                      1

STAGE 2: Compute metrics by dimension

Mapper Input
Key (visitorid, browser)   Value (#sessions)
V1,IE                      2
V2,FF                      1
V3,IE                      1
V4,FF                      1

Mapper Output
Key (browser)   Value (#sessions, #visitors)
IE              2,1
FF              1,1
IE              1,1
FF              1,1

Reducer Output
Key (browser)   Value (#sessions, #visitors)
IE              3,2
FF              2,2
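The two stages can be sketched in plain Python. This is an in-memory illustration of the key design only, not the Hadoop jobs; `stage1` and `stage2` are hypothetical names, and the input is the five sample sessions from the Stage 1 mapper input.

```python
from collections import defaultdict

# (visitor_id, session_id, browser) rows from the Stage 1 mapper input.
SESSIONS = [
    ("V1", "S1", "IE"),
    ("V1", "S2", "IE"),
    ("V2", "S3", "FF"),
    ("V3", "S4", "IE"),
    ("V4", "S5", "FF"),
]


def stage1(sessions):
    """Map to ((visitor, browser), 1), then sum sessions per key."""
    counts = defaultdict(int)
    for visitor_id, _session_id, browser in sessions:
        counts[(visitor_id, browser)] += 1
    return dict(counts)


def stage2(stage1_output):
    """Re-key by browser; each Stage 1 record contributes exactly one visitor."""
    totals = defaultdict(lambda: [0, 0])
    for (_visitor_id, browser), session_count in stage1_output.items():
        totals[browser][0] += session_count  # sessions
        totals[browser][1] += 1              # distinct visitors
    return {browser: tuple(pair) for browser, pair in totals.items()}


per_visitor = stage1(SESSIONS)
print(per_visitor)
print(stage2(per_visitor))
```

Since Stage 1 leaves one record per (visitor, browser) pair, Stage 2 can count visitors by adding 1 per record and never has to hold a set of visitor IDs in memory. For the sample rows this gives 3 sessions from 2 visitors for IE and 2 sessions from 2 visitors for FF.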
PIG – ADHOC QUERIES
/* EventLoader: custom loader; exposes correct data types using metadata for each field */

grunt> data = LOAD '/paypal/event' USING
>> com.paypal.EventLoader(
>> 'visitor_id, session_id, page_name, event_type, event_timestamp');

grunt> describe data;
data: {visitor_id: chararray, session_id: chararray, page_name: chararray,
event_type: chararray, event_timestamp: long }

grunt> events = FILTER data BY event_timestamp >= 1337583600000L and
event_timestamp < 1337587200000L;

grunt> grouped = group events by (page_name, event_type) parallel 20;
grunt> result = foreach grouped {
>>      visitors = distinct events.visitor_id;
>>      sessions = distinct events.session_id;
>>      generate group, COUNT(visitors), COUNT(sessions), COUNT(events);
>> };

grunt> dump result;
((My Account Overview, im), 117875L,119343L,230216L)
((mktg:xsell:merchant::home-inside, im), 462L,466L,655L)
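The FILTER bounds in this query are epoch milliseconds: 1337583600000 is 2012-05-21 07:00:00 UTC, and the upper bound is exactly one hour later. A small helper for producing such bounds might look like the sketch below; `to_epoch_ms` and `hour_window` are hypothetical names, and treating the window as UTC is an assumption, since the slide does not state a time zone.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def hour_window(start):
    """Return (inclusive lower, exclusive upper) bounds for a one-hour window."""
    end = start + timedelta(hours=1)
    return to_epoch_ms(start), to_epoch_ms(end)


# Assumption: the window is expressed in UTC.
start = datetime(2012, 5, 21, 7, 0, tzinfo=timezone.utc)
lower, upper = hour_window(start)
print(f"event_timestamp >= {lower}L and event_timestamp < {upper}L")
```

The half-open interval (>= lower, < upper) keeps adjacent hourly windows from counting the same event twice.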
PIG – ADHOC QUERIES
/* VisitContainerLoader custom loader - Tuple ( Tuple, Bag (Tuple) )*/

grunt> data = LOAD '/paypal/visitcontainer'
>> USING com.paypal.VisitContainerLoader(
>> '{"visit":["visitor_id","session_id","session_start", "session_end", "browser_type"],
"events":["page_name", "event_type"]}');

grunt> describe data;
data: {visit: (visitor_id: chararray, session_id: chararray, session_start: long, session_end:
long, browser_type: chararray),
        events: {event: (page_name: chararray, event_type: chararray)}}

grunt> flattened = foreach data generate FLATTEN(visit), FLATTEN(events);
grunt> impression = FILTER flattened BY event_type == 'im' and session_start >=
1339045200000L and session_end < 1339063200000L;
grunt> grouped = group impression by (page_name, browser_type) parallel 20;
grunt> result = foreach grouped {
>> visitors = distinct impression.visitor_id;
>> sessions = distinct impression.session_id;
>> generate group, COUNT(visitors), COUNT(sessions), COUNT(impression);
>> };

grunt> dump result;
((Account History:Request Money Details, chrome), 522L,528L,726L)
((Account History:Request Money Details, msie), 706L,716L,967L)
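What the two FLATTEN calls and the nested DISTINCT do can be mirrored in Python. The sketch below is illustrative only: the sample containers, the page names and the `cl` event type are made up (`im` is taken from the query), the session_start/session_end filter is left out for brevity, and `flatten` and `impression_counts` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical visit containers shaped like Tuple(visit Tuple, Bag of events).
CONTAINERS = [
    {"visit": {"visitor_id": "V1", "session_id": "S1", "browser_type": "chrome"},
     "events": [{"page_name": "Home", "event_type": "im"},
                {"page_name": "Home", "event_type": "im"},
                {"page_name": "Send Money", "event_type": "im"}]},
    {"visit": {"visitor_id": "V1", "session_id": "S2", "browser_type": "chrome"},
     "events": [{"page_name": "Home", "event_type": "im"}]},
    {"visit": {"visitor_id": "V2", "session_id": "S3", "browser_type": "msie"},
     "events": [{"page_name": "Home", "event_type": "im"},
                {"page_name": "Home", "event_type": "cl"}]},
]


def flatten(containers):
    """Yield one row per event, with the visit fields repeated on each row."""
    for container in containers:
        for event in container["events"]:
            yield {**container["visit"], **event}


def impression_counts(containers):
    """Per (page, browser): distinct visitors, distinct sessions, impressions."""
    groups = defaultdict(lambda: {"visitors": set(), "sessions": set(), "rows": 0})
    for row in flatten(containers):
        if row["event_type"] != "im":
            continue
        group = groups[(row["page_name"], row["browser_type"])]
        group["visitors"].add(row["visitor_id"])
        group["sessions"].add(row["session_id"])
        group["rows"] += 1
    return {
        key: (len(g["visitors"]), len(g["sessions"]), g["rows"])
        for key, g in groups.items()
    }


for key, counts in sorted(impression_counts(CONTAINERS).items()):
    print(key, counts)
```

As in the Pig output, each result row carries three numbers that can differ: one visitor may have several sessions, and one session several impressions of the same page.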
REPORTING




THANK YOU


We Are Hiring!
•  San Jose
•  Boston
•  Bangalore
•  Shanghai

Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

PayPal Behavioral Analytics on Hadoop

  • 1. PAYPAL - BEHAVIORAL TRACKING ON HADOOP. ANIL MADAN, DIRECTOR OF ENGINEERING, MARKETING & ANALYTICS
  • 2. PAYPAL'S VISION Delivering the future of money today… An essential part of our customer's financial and business lives, enabling secure commerce anywhere, anytime, any way. 110 million active accounts, 190 markets, 25 currencies
  • 3. BEHAVIORAL TRACKING VISION Understand our customer’s behavior and experience anytime, anywhere, any way to drive desirable outcomes for our customers and for PayPal. Ensure privacy, security and trust for our customers. Ensure instrumentation standardization across channels. Enable self-service analytics for our product and marketing teams.
  • 4. TRACKING PLATFORM OVERVIEW
      Channels: Direct/Home Page, Transaction Emails, Email Marketing, Display Advertising, Search Engine Marketing
      Components: Metadata, Tracking Servers, Real Time Systems, Tracking Metadata Tool, Tracking Event Service, Marketing Segmentation, Tracking Validation, Tag Catalog, Taxonomy Service, Experimentation
      Big Data, Reporting/Visualization, Digital Metrics, Attribution
  • 5. METADATA - ENTITY MODEL LAYOUT PAGE ELEMENTS LINK COMPONENTS 5
  • 6. METADATA - EVENT MODEL
      Tracking Event
      Impression Event, Reaction Event, Conversion Event
      Component Impression Event, Page Impression Event, Ad Impression Event, Click Event, Click-Through Event, Mouse-over Event
      Client Page Impression Event, Server Page Impression Event, Entry Event, Exit Event
  • 7. ATTRIBUTION MODEL
      Columns: Channel; Impression (Client, Server); Click; Open
      Direct ✓ ✓; Organic Search ✓; Paid Search ✓; Display Offers ✓ ✓; Onsite Offers ✓ ✓; Transactional Emails ✓ ✓; Marketing Emails ✓ ✓
  • 8. LOGICAL ARCHITECTURE Onsite Channels Marketing Channels Mobile Search Display Web Tracking Social Email Onsite Instrumentation Engine Advertising JS Marketing Marketing Marketing API Marketing Instrumentation Tracking Tracking Message Delivery Services Metadata Servers Service Tool Marketing Segmentation Active MQ Offers Service Producer Service Tracking Metadata Active Service MQ Hadoop Cluster Tracking Active MQ Active MQ Collector Consumer Consumer Customer Operational Intelligence Metrics Metadata Tag Repository Catalog NAS Filer NAS Filer Behavioral Intelligence Reporting Aggregation/ Sessionization Identity Tracking Compression Bot Flagging Mapping Batch 8
  • 9. DATA INGEST PIPELINE
      PRE-PROCESS: Raw Event (Gzip Text) -> Map Reduce: Validate/Dedup Events -> Deduped Event (Gzip block compressed SequenceFile) -> Map Reduce: Join Client & Server Events -> Enriched Event (Gzip block compressed SequenceFile)
      SESSIONIZATION (CHAIN REDUCER): Enriched Event -> Map Reduce: Sessionization -> Mapper: Geo Lookup (Geo Data) -> Mapper: Bot Flagging (Bot Data/Rules) -> Enriched Sessions
      METRICS GENERATION: Sessions -> Map Reduce: Stage 1 -> Map Reduce: Stage 2 -> Behavioral Metrics (MySQL), Reporting
      Pig: Enriched Event -> Adhoc Metrics
  • 10. SESSIONIZATION
      Events
        Visitor ID  Session ID  Timestamp         Event Payload
        V1          S1          2012-05-24 05:12  E1
        V2          S2          2012-05-24 05:14  E2
        V1          S1          2012-05-24 05:15  E3
        V1          S1          2012-05-24 05:20  E4
        V2          S2          2012-05-24 05:21  E5
        V1          S3          2012-05-24 07:25  E6
        V1          S3          2012-05-24 07:26  E7
      VisitContainer
        Visitor ID  Session ID  Payload
        V1          S1          ie, winnt, {flash, quicktime}, {ca, usa}, 480 secs, … [E1, E3, E4]
        V2          S2          ff, winxp, {acrobat, mediaplayer}, {wb, in}, 420 secs, … [E2, E5]
        V1          S3          sf, macos, {quicktime, java}, {on, ca}, 60 secs [E6, E7]
      •  Chronologically sort events using secondary sort
      •  SortComparator on visitorid, sessionid and timestamp
      •  Partitioner & Grouping comparator on visitorid and sessionid
      •  Normalize data and store it against the session record
      •  Browser, os, plugins, geo-location, duration, bot-flag etc.
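The secondary-sort idea on slide 10 can be sketched outside Hadoop. This is a minimal Python illustration, not the talk's MapReduce job: `sorted` plays the role of the SortComparator (visitor, session, timestamp) and `itertools.groupby` plays the role of the grouping comparator (visitor, session). The function names and the dictionary layout are assumptions made for the example; the rows are the ones in the slide's Events table.

```python
from datetime import datetime
from itertools import groupby

# Rows from the Events table on slide 10: (visitor_id, session_id, timestamp, event).
EVENTS = [
    ("V1", "S1", "2012-05-24 05:12", "E1"),
    ("V2", "S2", "2012-05-24 05:14", "E2"),
    ("V1", "S1", "2012-05-24 05:15", "E3"),
    ("V1", "S1", "2012-05-24 05:20", "E4"),
    ("V2", "S2", "2012-05-24 05:21", "E5"),
    ("V1", "S3", "2012-05-24 07:25", "E6"),
    ("V1", "S3", "2012-05-24 07:26", "E7"),
]


def parse(ts):
    """Parse the slide's timestamp format."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def sessionize(events):
    """Return one record per (visitor, session) with its events in time order."""
    # Sort on the full composite key: visitor, session, then timestamp.
    ordered = sorted(events, key=lambda e: (e[0], e[1], parse(e[2])))
    containers = []
    # Group on (visitor, session) only, so each group arrives already time-sorted.
    for (visitor, session), group in groupby(ordered, key=lambda e: (e[0], e[1])):
        rows = list(group)
        duration = parse(rows[-1][2]) - parse(rows[0][2])
        containers.append({
            "visitor_id": visitor,
            "session_id": session,
            "duration_secs": int(duration.total_seconds()),
            "events": [r[3] for r in rows],
        })
    return containers


if __name__ == "__main__":
    for container in sessionize(EVENTS):
        print(container)
```

The durations come out as 480, 60 and 420 seconds for (V1, S1), (V1, S3) and (V2, S2), which agrees with the VisitContainer payloads on the slide.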
  • 11. DIMENSIONS & METRICS
      Dimension: Page, PageFlow, Country, CountryRegion, Plugins, VisitDepth, VisitDuration, VisitByHour, SearchEngine, OS, Browser
      Metrics: Visitors, Sessions, Bounce Rate, Page Views
      Time Period: Hourly, Daily, Weekly, Monthly
  • 12. METRICS GENERATION
      STAGE 1: Compute sessions sorted by visitor, dimension (browser)
        Mapper Input (Visitor ID, Session ID, Browser): V1 S1 IE; V1 S2 IE; V2 S3 FF; V3 S4 IE; V4 S5 FF
        Mapper Output, Key (visitorid, browser), Value (#sessions): V1,IE 1; V1,IE 1; V2,FF 1; V3,IE 1; V4,FF 1
        Reducer Output, Key (visitorid, browser), Value (#sessions): V1,IE 2; V2,FF 1; V3,IE 1; V4,FF 1
      STAGE 2: Compute metrics by dimension
        Mapper Input, Key (visitorid, browser), Value (#sessions): V1,IE 2; V2,FF 1; V3,IE 1; V4,FF 1
        Mapper Output and Reducer Output, Key (browser), Value (#sessions, #visitors): IE 2,1; IE 4,3; IE 1,1; FF 1,1; FF 1,1; IE 1,1
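The two-stage computation on slide 12 can be sketched in plain Python. This is an illustration under stated assumptions rather than the talk's MapReduce code: `stage1` and `stage2` are hypothetical names, each function folds its map and reduce steps into one loop, and the input is only the five Stage 1 mapper-input rows shown on the slide, so the printed totals are computed from those rows alone.

```python
from collections import defaultdict

# Stage 1 mapper input from slide 12: (visitor_id, session_id, browser).
SESSIONS = [
    ("V1", "S1", "IE"),
    ("V1", "S2", "IE"),
    ("V2", "S3", "FF"),
    ("V3", "S4", "IE"),
    ("V4", "S5", "FF"),
]


def stage1(sessions):
    """Count sessions per (visitor, browser)."""
    counts = defaultdict(int)
    for visitor, _session, browser in sessions:
        # Map step emits ((visitor, browser), 1); reduce step sums the ones.
        counts[(visitor, browser)] += 1
    return dict(counts)


def stage2(per_visitor):
    """Roll up to (#sessions, #visitors) per browser."""
    totals = defaultdict(lambda: [0, 0])
    for (_visitor, browser), n_sessions in per_visitor.items():
        # Map step emits (browser, (n_sessions, 1)); reduce step sums both parts.
        totals[browser][0] += n_sessions
        totals[browser][1] += 1
    return {browser: tuple(pair) for browser, pair in totals.items()}


if __name__ == "__main__":
    per_visitor = stage1(SESSIONS)
    print(per_visitor)
    print(stage2(per_visitor))
```

Splitting the work this way means each visitor contributes exactly one record per browser to the second stage, so distinct visitors can be counted with a plain sum instead of holding a set of visitor IDs per dimension value.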
  • 13. PIG – ADHOC QUERIES
      /* EventLoader - custom loader; exposes correct data-types using metadata for each field */
      grunt> data = LOAD '/paypal/event' USING
      >>     com.paypal.EventLoader(
      >>     'visitor_id, session_id, page_name, event_type, event_timestamp');
      grunt> describe data;
      data: {visitor_id: chararray, session_id: chararray, page_name: chararray, event_type: chararray, event_timestamp: long}
      grunt> events = FILTER data BY event_timestamp >= 1337583600000L and event_timestamp < 1337587200000L;
      grunt> grouped = group events by (page_name, event_type) parallel 20;
      grunt> result = foreach grouped {
      >>     visitors = distinct events.visitor_id;
      >>     sessions = distinct events.session_id;
      >>     generate group, COUNT(visitors), COUNT(sessions), COUNT(events);
      >> };
      grunt> dump result;
      ((My Account Overview, im), 117875L,119343L,230216L)
      ((mktg:xsell:merchant::home-inside, im), 462L,466L,655L)
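For readers less familiar with Pig, the query on slide 13 (filter by time window, group by page and event type, then count distinct visitors, distinct sessions and events) can be mirrored in a few lines of Python. This is a sketch only: `page_metrics` is a hypothetical name and the sample rows are invented for illustration; only the field names and the time window come from the slide.

```python
from collections import defaultdict

# Hypothetical sample rows; field names follow the EventLoader schema on slide 13.
SAMPLE_EVENTS = [
    {"visitor_id": "V1", "session_id": "S1", "page_name": "My Account Overview",
     "event_type": "im", "event_timestamp": 1337583600000},
    {"visitor_id": "V1", "session_id": "S1", "page_name": "My Account Overview",
     "event_type": "im", "event_timestamp": 1337583660000},
    {"visitor_id": "V2", "session_id": "S2", "page_name": "My Account Overview",
     "event_type": "im", "event_timestamp": 1337584000000},
    # Falls exactly on the end of the window, which is exclusive, so it is dropped.
    {"visitor_id": "V3", "session_id": "S3", "page_name": "My Account Overview",
     "event_type": "im", "event_timestamp": 1337587200000},
]


def page_metrics(events, start_ms, end_ms):
    """Return {(page_name, event_type): (#visitors, #sessions, #events)}."""
    groups = defaultdict(lambda: {"visitors": set(), "sessions": set(), "events": 0})
    for event in events:
        # Same half-open window as the FILTER statement: start <= ts < end.
        if not (start_ms <= event["event_timestamp"] < end_ms):
            continue
        group = groups[(event["page_name"], event["event_type"])]
        group["visitors"].add(event["visitor_id"])   # distinct visitor_id
        group["sessions"].add(event["session_id"])   # distinct session_id
        group["events"] += 1                         # COUNT(events)
    return {
        key: (len(g["visitors"]), len(g["sessions"]), g["events"])
        for key, g in groups.items()
    }


if __name__ == "__main__":
    print(page_metrics(SAMPLE_EVENTS, 1337583600000, 1337587200000))
```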
  • 14. PIG – ADHOC QUERIES
      /* VisitContainerLoader custom loader - Tuple ( Tuple, Bag (Tuple) ) */
      grunt> data = LOAD '/paypal/visitcontainer'
      >>     USING com.paypal.VisitContainerLoader(
      >>     '{"visit":["visitor_id","session_id","session_start", "session_end", "browser_type"], "events":["page_name", "event_type"]}');
      grunt> describe data;
      data: {visit: (visitor_id: chararray, session_id: chararray, session_start: long, session_end: long, browser_type: chararray), events: {event: (page_name: chararray, event_type: chararray)}}
      grunt> flattened = foreach data generate FLATTEN(visit), FLATTEN(events);
      grunt> impression = FILTER flattened BY event_type == 'im' and session_start >= 1339045200000L and session_end < 1339063200000L;
      grunt> grouped = group impression by (page_name, browser_type) parallel 20;
      grunt> result = foreach grouped {
      >>     visitors = distinct impression.visitor_id;
      >>     sessions = distinct impression.session_id;
      >>     generate group, COUNT(visitors), COUNT(sessions), COUNT(impression);
      >> };
      grunt> dump result;
      ((Account History:Request Money Details, chrome), 522L,528L,726L)
      ((Account History:Request Money Details, msie), 706L,716L,967L)
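The key step on slide 14 is FLATTEN over a (visit tuple, bag of event tuples) record, which yields one row per event with the visit fields repeated on each row. A minimal Python sketch of that step and the aggregation that follows is below. The function names, the sample containers and the 'cl' event type are assumptions made for illustration; the field names, the 'im' event type and the time window come from the slide.

```python
from collections import defaultdict

# Hypothetical visit containers shaped like the loader output: (visit, events).
SAMPLE_CONTAINERS = [
    ({"visitor_id": "V1", "session_id": "S1", "session_start": 1339045200000,
      "session_end": 1339045680000, "browser_type": "chrome"},
     [{"page_name": "Home", "event_type": "im"},
      {"page_name": "Home", "event_type": "cl"},   # 'cl' is an invented non-impression type
      {"page_name": "Home", "event_type": "im"}]),
    ({"visitor_id": "V2", "session_id": "S2", "session_start": 1339045260000,
      "session_end": 1339045680000, "browser_type": "msie"},
     [{"page_name": "Home", "event_type": "im"}]),
]


def flatten(containers):
    """Yield one merged row per event, repeating the visit fields on each row."""
    for visit, events in containers:
        for event in events:
            yield {**visit, **event}


def impressions_by_page_and_browser(containers, start_ms, end_ms):
    """Return {(page_name, browser_type): (#visitors, #sessions, #impressions)}."""
    groups = defaultdict(lambda: {"visitors": set(), "sessions": set(), "rows": 0})
    for row in flatten(containers):
        in_window = row["session_start"] >= start_ms and row["session_end"] < end_ms
        if row["event_type"] != "im" or not in_window:
            continue
        group = groups[(row["page_name"], row["browser_type"])]
        group["visitors"].add(row["visitor_id"])
        group["sessions"].add(row["session_id"])
        group["rows"] += 1
    return {
        key: (len(g["visitors"]), len(g["sessions"]), g["rows"])
        for key, g in groups.items()
    }


if __name__ == "__main__":
    print(impressions_by_page_and_browser(SAMPLE_CONTAINERS, 1339045200000, 1339063200000))
```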
  • 15. REPORTING 15
  • 16. THANK YOU We Are Hiring! •  San Jose •  Boston •  Bangalore •  Shanghai