Bullseye P13n Platform
April 7, 2014
Charles Bracher
Bullseye Dev Manager
Ranjan Sinha, PhD
Lead Research Scientist
Bullseye	
  
Outline
P13n Platform
Why Cassandra?
Cassandra Setup
Cassandra Usage
Cassandra Issues and Resolutions
Hand over to Ranjan for the Data Science Perspective
Bullseye Functional Architecture
Offline Analysis
Offline Database / Batch Processing
Recent User Data, 1-5 days (Cassandra)
Real Time Model Evaluation & Caching (sharded/full user state in memory)
Client Access
Near Real Time Event Collection
Tracking
Long Term User Data (Local SSD)
Why Cassandra?
Great write performance
Great replication performance
Reasonable read performance
Reasonable cost
Client controlled consistency settings
Cassandra Setup
Cassandra Version 1.2.9
We use replication
– Cassandra rings deployed to 3 datacenters
Cassandra clients
– We use both the DataStax Java and C++ beta clients
Using CQL table specifications and commands
Not on SSDs
Cassandra Usage
Column Family Design:
– Avoid Tombstones
– Avoid Compaction
With Focus on Short Term Storage:
– Turn off automatic compaction / only manual compaction
– Use unique column key names to avoid tombstones
– Clear out old data with truncation
Cache Miss Flow (New Session)
CREATE TABLE DAY_N (USER_ID TEXT, RECORD_NAME TEXT,
RECORD_VALUE BLOB, PRIMARY KEY (USER_ID, RECORD_NAME));
Write to the active day's column family, keyed by user ID.
Truncate the oldest day's column family.
When rolling over from one day to the next, run a manual compaction on the old day.
On read, pull the user's records from all column families newer than the local SSD data.
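The day-based rotation above can be sketched as follows. This is a minimal illustration, not the Bullseye implementation: the ring size `NUM_DAY_TABLES`, the modulo slot scheme, and all helper names are assumptions. The slides only state that there is one column family per day, that the oldest is truncated, and that reads cover every column family newer than the local SSD data.

```python
from datetime import date, timedelta

# Assumption: a fixed ring of day tables (DAY_0 .. DAY_4), chosen to match
# the "1-5 days" of recent data shown in the architecture slide.
NUM_DAY_TABLES = 5


def day_table(day: date) -> str:
    """Map a calendar day to a slot in the hypothetical ring of DAY_N tables."""
    return f"DAY_{day.toordinal() % NUM_DAY_TABLES}"


def table_to_truncate(today: date) -> str:
    """The slot that tomorrow will reuse currently holds the oldest data."""
    return day_table(today + timedelta(days=1))


def tables_to_read(today: date, ssd_covers_through: date) -> list[str]:
    """Tables for every day newer than the local SSD data, oldest first."""
    days_missing = (today - ssd_covers_through).days
    # Never read more days than the ring can retain.
    days_missing = max(0, min(days_missing, NUM_DAY_TABLES))
    return [day_table(today - timedelta(days=i))
            for i in range(days_missing - 1, -1, -1)]
```

With this scheme a cache miss for a user whose SSD data is two days old reads exactly two tables: yesterday's and today's.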
Queuing Flow (Ongoing Activity)
CREATE TABLE HOUR_N (ID TEXT, RECORD_NAME TEXT,
RECORD_VALUE BLOB, PRIMARY KEY (ID, RECORD_NAME));
Read/write from the active hour's column family, keyed by timestamp rounded to the nearest second.
Store the one-hour-old column family to the offline DB.
Truncate the two-hour-old column family.
Asynchronously probe the record for the current second as well as recent seconds until the state is captured. Data may be read 1-3 times, more if replication is lagging.
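The keying and probing scheme above can be sketched as follows. The helper names, the three-table ring, and the two-second lookback are assumptions made for illustration; the slides only state that keys are timestamps rounded to the nearest second and that the current and recent seconds are probed.

```python
def second_key(ts: float) -> str:
    """Row key: the Unix timestamp rounded to the nearest whole second."""
    return str(round(ts))


def hour_table(ts: float, num_hour_tables: int = 3) -> str:
    """Assumed ring of HOUR_N tables: active, one hour old, two hours old."""
    return f"HOUR_{(int(ts) // 3600) % num_hour_tables}"


def probe_keys(now: float, lookback_seconds: int = 2) -> list[str]:
    """Keys for the current second plus a few recent seconds, newest first.

    Probing recent seconds again gives lagging replicas a chance to catch up,
    which is why a record may be read more than once.
    """
    current = round(now)
    return [str(current - i) for i in range(lookback_seconds + 1)]
```

A reader would issue one asynchronous query per key returned by `probe_keys` against the table named by `hour_table`, and widen `lookback_seconds` if replication is lagging.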
Cassandra Issues and Resolutions
Issues with the DataStax C++ Cassandra beta client
– open source, so we could apply fixes
Performance issues with the cache miss query
– increased heap size
– reduced replication factor
– turned off cross-colo (cross-datacenter) read repair
– deployed a datacenter-aware policy for the C++ client
Personalization Applications
Ranjan Sinha, PhD
Lead Research Scientist
April 7, 2014
Disclaimer: Some of the content in this talk is based on my personal opinion. It does not reflect the views of eBay.
Outline
Why Personalize?
P13N Platform
– Introduction
– Conceptual architecture
– Modeling stages
P13N Applications
– User badges
– Search ranking
– Contextual models
– Deals
Why Personalize?
Enable more relevant experience
Retention of existing users
New user acquisition
Reactivating churned users
Increasing activity per user
Improving conversion from visits to transactions
P13N Platform: Introduction
Maintains activity timeline information
Enables event processing in near real time
Enables in-session personalization
Provides environment for predictive model evaluation
Backup and restore to and from Hadoop/HBase
P13N Platform: Conceptual Architecture
Tracking Event Source
Model Executor (m1, m2, m3, ..., mn)
Filters and forwards events
Activity Timeline + User Badges
In-memory Cache + Model Evaluation
CEP Processor
Client Access
Hadoop/HBase
Offline Modeling Platform
User Badges
Cassandra
P13N Platform: Modeling stages
Realtime
– In-session user intent
– Contextual Models
Nearline
– Update propensity models (aka User Badges)
Offline
– Bootstrap propensity models by mining long-term behavior history
Application (1): User Badges
Name           Description
SaleType       Auction vs. Buy-it-now
ItemCondition  New vs. Used
Category       Preference of categories
Price          Price range of purchasing activity
Deals          Propensity to purchase deals
Social Share   Propensity to share items in social media
Profile based on long-term behavior
Application (2): Search Ranking
Should all queries be personalized in the same manner?
– For some queries (ebay or google), everyone would like the same results
– For other queries, different people may want completely different results
Query: “big ben puzzles”
Not_P13N Rank  P13N Rank  Sold  IsNew  Title
1              1          No    No     LOT OF 7 BIG BEN PUZZLES 5/1000PC. 2/1500 PUZZLES EUC
2              3          No    Yes    1000 Pc MB Big Ben Jigsaw Puzzle Mount Shuksan North Cascades National Park WA
3              2          Yes   No     COMPLETE Fishing Village,Smalls Island MB Big Ben Puzzle 1000 Piece Puzzle Size!
User: always buys used items
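The re-ranking in this example can be reproduced with a small sketch. It assumes a hard preference (items matching the user's preferred condition first, original rank as tie-breaker); the slides do not describe the actual ranking function, which presumably blends more signals. The `Item` class and `personalize` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    base_rank: int  # rank before personalization (1 = best)
    is_new: bool


def personalize(items, prefers_used: bool):
    """Re-rank: items matching the user's condition preference come first,
    with ties broken by the non-personalized rank."""
    def sort_key(item):
        matches = (not item.is_new) if prefers_used else item.is_new
        return (0 if matches else 1, item.base_rank)
    return sorted(items, key=sort_key)


results = [
    Item("LOT OF 7 BIG BEN PUZZLES", 1, is_new=False),
    Item("1000 Pc MB Big Ben Jigsaw Puzzle", 2, is_new=True),
    Item("COMPLETE Fishing Village MB Big Ben Puzzle", 3, is_new=False),
]
reranked = personalize(results, prefers_used=True)
print([item.base_rank for item in reranked])  # [1, 3, 2]
```

The new item drops from position 2 to position 3, matching the P13N Rank column for a user who always buys used items.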
Application (3): Contextual models
Infer categories that user is interested in within the current session
Long and Short term behavior
– Historic behavior may provide benefits at the start of the session
– Short-term behavior may contribute gains in an extended search session
– Combination of session and historic behavior may outperform using either alone
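One simple way to combine the two signals is a weighted blend whose weight shifts from historic to session behavior as the session grows. The sketch below is an illustration under that assumption only; the slides do not specify how the models are combined, and `blend_category_scores` and `session_pivot` are hypothetical names.

```python
def blend_category_scores(historic, session, num_session_events, session_pivot=5.0):
    """Blend per-category interest scores from historic and in-session behavior.

    The session weight is 0 at the start of a session and approaches 1 as
    events accumulate; it equals 0.5 after `session_pivot` events.
    """
    w_session = num_session_events / (num_session_events + session_pivot)
    categories = set(historic) | set(session)
    return {
        c: (1 - w_session) * historic.get(c, 0.0) + w_session * session.get(c, 0.0)
        for c in categories
    }
```

At the start of a session the blend equals the historic profile, which reflects the first bullet above; in a long session the in-session scores dominate.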
[Figure: timeline t of events e1, e2, e3 from an event source, with three stages: Offline, historical; Online, in-session; Nearline, after session expiry]
Application (4): Deals
Personalize categories
Personalize modules
Personalize tabs
Personalize items
fin
Cassandra Day SV 2014: Building a Personalization Platform with Cassandra at eBay
