B2B sales intelligence has become an integral part of LinkedIn’s business to help companies optimize resource allocation and design effective sales and marketing strategies. This new trend of data-driven approaches has “sparked” a new wave of AI and ML needs in companies large and small. Given the tremendous complexity that arises from the multitude of business needs across different verticals and product lines, Apache Spark, with its rich machine learning libraries, scalable data processing engine and developer-friendly APIs, has been proven to be a great fit for delivering such intelligence at scale.
See how Linkedin is utilizing Spark for building sales intelligence products. This session will introduce a comprehensive B2B intelligence system built on top of various open source stacks. The system puts advanced data science to work in a dynamic and complex scenario, in an easily controllable and interpretable way. Balancing flexibility and complexity, the system can deal with various problems in a unified manner and yield actionable insights to empower successful business. You will also learn about some impactful Spark-ML powered applications such as prospect prediction and prioritization, churn prediction, model interpretation, as well as challenges and lessons learned at LinkedIn while building such platform.
1. Songtao Guo, Wei Di
Transforming B2B Sales
with Spark Powered Sales
Intelligence
2. Songtao Guo
Sr. Staff & Tech Lead
Data Analytics Data Mining at
LinkedIn
soguo@linkedin.com
Wei Di
Staff Data Scientist
Data Analytics Data Mining at
LinkedIn
wdi@linkedin.com
3. Outline
Overview of
Sales
Intelligence
❏ B2B landscape
❏ Challenges
❏ One-stop solution overview
❏ Derived insightful features from LinkedIn Graph
❏ Centralized Data Mart
❏ B2B Intelligent Engine
❏ Model learning
❏ Model reasoning
❏ Model management
B2B Intelligent
Framework
Overview &
Details
❏ Problem definition
❏ Labelling Logic & Interpretation
❏ Performance & Lessons learned
Case Study
4. Demand
Generation
Seal the
deal
Onboard &
Retention
Business
Growth
Business
Needs
B2B
Predictive
Modeling
B2B Analytics Landscape
How to prioritize marketing leads?
How much is the potential spending from a
client if acquired?
Who are more likely to become customers?
Why?
What is the best channel to acquire?
What is the market segmentation? How are new customers onboarding?
How are the customers engaged with
products based on different segments?
Which customer is going to renew less, why
and what can we do?
Which customer has upsell potential, why
and what product to buy?
Acquire New Customers Empower Existing Customers
5. Challenges of B2B Data & Application
● Data quality issues
○ Incomplete, sparse, noisy and dynamic over time
○ Missing historical data
● Lack of centralized data covering various needs
● Unclear source of truth
● From score to actionable insights
● Unify existing solutions
○ Multiple owners
○ Multiple predictive scores exist for similar purposes
○ Inconsistent quality
● Scale model building
6. Change Landscape with Spark
Simplicity is the ultimate sophistication.
Leonardo da Vinci
● Chaos of data
● Redundant/repeating work
● No monitoring for feature evolvement
and model degradation
● No scalability and slow iteration
The Old Landscape
One Stop Solution
● Semi-automatic with flexibility of self-
configuration
● Scalable and fast iteration
● Monitoring with closed feedback loop
The New Paradigm
MLlib
8. Data Layer
Feature Mart Feature Management Label Management
• Provide a central feature mart that serves as source
of truth for key account level metrics
• Scale feature generation
• Ensure high reliability and consistency (source
control, peer-review)
• Cover all the company entities and sufficient long time
series
• A standardized feature onboarding process
• Unified metastore to support feature governance
and feature application
• Feature search and feature profiling (Spark SQL)
• Standardize problem definition
• Scale label preparation
• Gold dataset management
• Support continuous performance
monitoring
Feature Mart
Canonical Data
Sources
Member, Company,
Job, Tracking,
SFDC, ...
Service Log
Prioritizer,...
External Data
Sources
SEC, ...
Label Mart
Source Data
label generation
pipelines
Human Labeler
Feature Feature Batch derived
labels
Survey
labels
Feature Metastore
9. Intelligence Layer
Workflows Feature Engineering Algorithms
• Highly configurable model building
Azkaban workflow
Spark MLPipelines
Spark ParamGridBuider
• Highly configurable scoring Azkaban
workflow
• Integrate features from various of sources:
company level feature mart and user
experimental features
• Support feature representation for different
entity types
• Adapt different feature granularities: daily,
weekly, monthly, etc.
• Regression
RandomForest, GradientBoostedTree,
XGBoostTree, LinearRegression
• Classification
DecisionTree, RandomForest,
GradientBoostedTree, XGBoostTree,
LogisticRegression,
MultilayerPerceptronClassification
Intelligence Engine
Model Selection
Score Reasoning
Feature Grouping
Feature Integration
Feature Engineering
Model Validation Model Chaining
Model Ensemble
Model Parameter Tuning
11. Outline
Overview of
Sales
Intelligence
❏ B2B landscape
❏ Challenges
❏ One-stop solution overview
❏ Derived insightful features from LinkedIn Graph
❏ Centralized Data Mart
❏ B2B Intelligent Engine
❏ Model learning
❏ Model reasoning
❏ Model management
B2B Intelligent
Framework
Overview &
Details
❏ Problem definition
❏ Labelling Logic & Interpretation
❏ Performance & Lessons learned
Case Study
12. Derive Features from IN Graph
Activity
SocialMember
Profile
Company
Growth
Product
Booking
Product
Usage
Product
Whitespace
Product
Performance
Company
Profile
Sales
Marketer
Decision maker
Recruiter
Sales
Talent
professional
Decision maker
Sales
Sales
Employee
13. Centralized Feature/Label Mart
Feature Coverage:
● derived company level knowledge
● generic member/company level
● internal & external data sources
● monthly/daily granularity
● region/functionality segmentation
● enriched meta information
● traceback to years early
Feature Management:
● feature profiling
● visualized monitoring system
● whitelist/blacklist features as
needed
● self-serve onboarding platform
Label Management
● Unified label generation logic
for same type of problems
across different verticals
● Vertical independent label
generation pipeline
14. B2B Intelligence Engine
Label
Data
New
Data
Meta &
Data Store
Feature
Engineering
Feature
Integration/Select
Intelligence
Engine
Train/Validation/Test
Dataset
Model
Interpretation
Model Building
Feature
Engineering
Scoring
Model Management &
Update
New Model
Previous
Models
Predictive
Solution
Production Deployment
Reasoning
Model Building
● leverage Spark/ML: wide
coverage of modeling approaches
for regression, classification, and
clustering
● do parallel multi-level model
training
● fast iteration
● self-serve configuration/tuning
● various evaluation methods cater
to business requirement
15. Model Interpretation
Master Model Component Models
SOCIAL
E-LEARNING
MARKETING
AFFILIATION
ENGAGEMENT
SPENDING
GROWTH
PRODUCT
USAGE
(low)
Provide high-level overview and insights
through component scores
Deep dive to
finer-level
predictors
Estimate overall
conversion
probability from
master score
● Monthly active seats
● Avg days to onboard
● Average CRM usage
● Mobile view ratio
● Percent of active seat-
holders who use advanced
features
● Percent of active seat-
holders who imported sf
contacts
…...
Low
High
Low
Medium
Medium
Medium
+
16. Model Management/Monitoring
● Monitor both feature/model performance
changes over time
● Centralized model repo with standard format
● Feed in new training data to generate
“challenger models” to compete
● Pay attention to feature upstream change
● Business customers evolve
dynamically
● Products update periodically
● Performance degradation
● Failure/Outlier examples
● Feature statistics over time:
○ non-null count, sum, medium
○ coefficient of variation for
volatility evaluation
● Model refresh
● Feature diagnosis
feature
monitoring
performance
monitoring
inherent temporal nature
17. Outline
Overview of
Sales
Intelligence
❏ B2B landscape
❏ Challenges
❏ One-stop solution overview
❏ Derived insightful features from LinkedIn Graph
❏ Centralized Data Mart
❏ B2B Intelligent Engine
❏ Model learning
❏ Model reasoning
❏ Model management
B2B Intelligent
Framework
Overview &
Details
❏ Problem definition
❏ Labelling Logic & Interpretation
❏ Performance & Lessons learned
Case Study
18. Case Study
Account Propensity Score
(APS)
Upsell Propensity Score
(UPS)
definition
Which enterprise accounts have higher chances
of buying the product & why?
Which existing enterprise customers have
upsell potential & why ?
common
● business varies significant across region/product
● data evolves dynamically, time series events
● binary propensity model
● actionable insights on dissect aspects
● handle ‘confuse’ samples correctly (p = 0.5?)
differences
● data sparse, noisy
● region level, regions that are very small need
to borrow information from other regions/
global
● score accuracy is important for the whole
spectrum
● evaluation on all tiers
● data is more reliable given rich product
usage features
● product level
● score accuracy is important for the top
ones
● evaluation on recommended potentials
p ( rmaster | f,..f )
w1 * p ( rcompo. |
f1,..fq )
w2* p ( rcompo. | fq+1,..f
q+n) ….
19. Label Generation: APS
Label is defined at (account + region/etc) level
● Positive – 1: closed won opportunity.
● Negative – 0: closed disengaged opportunity
Opportunity evolves: time series events
● lookback from opportunity creation time
● find explicit negatives
● try until won: flip opportunities
● differentiate mid vs. year-end modeling
re-touch
label
20. Label Generation: UPSLabel is defined at (account + product) level for two
scenarios: Add-On and Renewal
● Positives – 1: increased sell & continue contract
● Negatives – 0: other cases
● lookback from upsell creation time
● focus on good quality positives
● consider contract renewal/churn for
add-on case
$500
+$300
Contract 1 Churned
$500 $500 upRenewal
scenario
$500
$500 and
less
Contract 1
Contract 1
label
Add-On
scenario
$500
+$600
$500
-$200
Contract 1
Contract 1
Renewed
Renewed
21. Fast growth reflects potential demands,
find out how our product can help the
customer further grow and increase
social selling & affinity.
Upsell Propensity Score
Acct. A Acct. B Acct. C
Score 90 75 30
Comp. &
Growth
Fast member
growth
Hot industry Edu./Gov
Booking Product A:
$1000
Product B:
$500
Product B:
$700
Usage Highly Engaged Engaged Less Engaged
Perf. Good hiring
perf.
White Space $2,000 $500 $100
Modeling & Interpretation
Account Propensity Score
Acct. A Acct. B Acct. C
Score 92 65 30
Comp. &
Growth
Top Industry Fast Growth Decline
Affinity High Medium Low
Booking High Medium low Low
Social Selling High Medium High Medium low
Better onboarding & training to
boost product usage and
performance.
22. Model Performance & Business Impact
Model numerical validation
• Account Propensity model: AUROC of 0.64; High vs. Avg: 1.7x;
• Product Upsell model: AUROC of 0.84; High vs. Avg: 2.7x; Medium vs. Avg: 1.7x.
Business impact (Prioritization & Marketing)
• Account Propensity: 17Q1: 5.3pp win-rate boost compared to no prioritization.
• Product Upsell:
• Marketing: 2.0x open rate, and 3.3x CTR compared to baseline.
• SMB sales: generated 22% ~ 46% more opportunities with customers.
Business Impact
High Medium
High
Medium Low Avg
1.7x likely to be
upsold than avg.
2.7x likely
to be upsold
than avg.
Account
Propensity
Product
Upsell
Tier 1 Tier 2 Tier 3 Avg
1.7x likely
to be won
than avg.
23. Summary
End-to-end B2B solution
● Centralized large-scale data & advanced intelligence
● Speed up individual model development cycles
● Drive monetization impact and business performance
24. Thank You.
We are Hiring!
Songtao Guo, Wei Di
soguo,wdi@linkedin.com