SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Data Refinement:
The missing link between data collection and decisions

Stephen H. Yu
Data Strategy & Analytics Consultant
What we will cover
•
•
•
•
•
•

Database Marketing Landscape
Analytics and Models
“Model-Ready” Environment
Data Summarization & Categorization
Delivery: Scoring & QC
Closing the Loop

1
Big Data, Small Data, Neat Data, Messy Data
How is the "Big Data” working out for you?
2.5 quintillion bytes collected per “day”
1 quintillion (exabytes) = 1 billion gigabytes

• Did all this data improve your decision making process?
• Do you have the results to show for?
• Information Overload? You bet!
 Harness insights, drop the noise

2
Database Marketing Landscape
• No guessing game – You MUST know your target
• Vast amount of online & offline data collected 
But are they being used properly?
• Analytics play a huge roles in prospecting & CRM
• Short paced marketing cycle getting shorter
• Huge difference between advanced marketers
and those who are falling behind

Winners are the ones who know how to wield the power
of all available data faster.

3
Insights, not Raw Data
• Database marketers must excel in:

Collection

Refinement

Delivery

Size & speed
matters

Get to answers, not
just ingredients

For consumption
by end-users

• Must provide “marketing answers” via advanced analytics,
not just bits and pieces of data
• Insight does not come from data, it is derived from data

4
Refined Answers
Raw Data

Marketing Answers

•
•
•
•
•
•
•
•
•

• Likely to buy a luxury car
• Likely to take a foreign vacation
• Likely to response to free
shipping offer
• Likely to be a high value customer
• Likely to be qualified for credit
• Likely to upgrade
• Likely to leave
• Likely to come back

Demographic / Firmographic
RFM
Products & Services Used
Promotion / Response History
Lifestyle / Survey Responses
Delinquent history
Call / Communication Log
Movement Data
Sentiments

5
Different Types of Analytics
“Analytics” means different things…
• BI (Business Intelligence) Reporting: Display of success metrics, dashboard
reporting
• Descriptive Analytics: Profiling, segmentation, clustering
• Predictive Modeling: Response models, cloning Models, lifetime
value, revenue models, etc.
• Optimization: Channel optimization, marketing spending
analysis, econometrics models

Predictive Modeling for 1-to-1 Marketing

6
Why model?
• Increase Targeting Accuracy
• Reduce costs by contacting less/smart
• Stay relevant
• Consistent results
• Reveal hidden patterns in data

• Repeatable – key for automation
• Expandable
• “Supposedly” save time and effort
Models summarize complex data into simple-to-use “scores”
7
Why NOT model?

• Universe is too small
• Predictable data not available
• 1-to-1 marketing channels not in plan
• Tight budget
• Lack of resources

8
Custom Models for every stage of Marketing Lifecycle

9
Any pain implementing models?
• Not easy to find “Best” customers

• Modelers are fixing data all the time
• Rely on a few popular variables
• Always need more variables

• Takes too long to build models and score
• Inconsistencies shown when scored
• Disappointing results

10
What does your database support?
If you have a database…
• Order Fulfillment
• Contact Management
• Standard Reports
• Ad hoc Reports and Queries
• Name Selections
• Response Analysis
• Trend Analysis

But does it support predictive
modeling and scoring?

11
For modeling, clean the data first
“Garbage-in, garbage-out”
• Most data sets are messy & “unstructured”
• Over 70-80% of model development time
goes to data prep work
• Most databases are NOT model-ready
• Modeling & Scoring
o Extension of database work
o Consistency is “the” key

12
Predictive Modeling is all about “Ranking”

2

1

3

Ultimately, Models
must properly “Rank”

Determine the level of
data accordingly

• Households
• Individuals
• Products

• Relational or unstructured
databases won’t cut it
• Must create “Descriptors” that
fit the level that needs to be
ranked

13
Unstructured to Structured
• Most modern databases optimized for massive storage and
rapid retrieval, not necessarily for predictive analytics
o Relational databases
o NoSQL databases

• Need “Analytical Sandbox” (or Database/Data-mart)
o Structured & de-normalized
o Variables as descriptors of model targets
o Common analytical language (SAS, R, SPSS)
o Must support “in-database” scoring

14
Analytical Sandbox – “Model-Ready” Environment

From “all” types of data collection to decisions. Then repeat.
15
Why Front-end DP Important?
Inexperienced analysts spend majority of time doing DP work
 Modeling work at the last minute!

Creative variables enhance models

Inconsistent data creates a chain reaction to melt-downs

Data append/match becomes ineffective

16
First, Define Analytical Goals
• Rank & select prospect names
• Cross-sell/Up-sell

• Segment the universe for messaging strategy
• Pinpoint attrition point
• Assign lifetime values
• Optimize media/channel spending
• Create product packages
• Project customer value
• Detect fraud

17
3 Major Types of Data for Marketers

Descriptive Data
• Demographic Data
• Firmographic Data
• Geo-demographic Data

Transaction Data /
Behavioral Data
• Transaction data

Attitudinal Data
• Surveys
• Sentiments

• Compiled / Co-op data
• Lifestyle data
• Online behavioral data

3-Dimensions in Predictive Analytics
18
Data Inventory
• “Modeling is making the best of
what we know”
• Beyond obvious RFM data
• Get Deeper
o Product/Service Level Data
o Historical Data
o Channel Data – Inbound & Outbound
o Online activities, sentiments,
unstructured data

• External Data
19
Create Data Menu
• Base it on Companywide Need-Analysis
• Ask the Analysts first:
What type of models are in the plan?
o Affinity/Look-alike Models
o Promotion/Response Models
o Time-series Models

o Attrition Models

• Consider non-analytical departments
• Maintain the ones that fit the objective
 Don’t be afraid to throw out “noises”

20
Data Menu (continued)

Check the ingredients

Cost - Can you afford to maintain it?

• What do you have today?
• What can be bought?
• What can be created?

• Storage/Platform – Consider the
scoring part, too
• Programming/Processing Time
• Software
• Update
• External Data
21
Check Your Data Inventory
You may have more than you thought, so let’s start
with what you have:
•
•

Name & Address: Key to Geo/Demographic Data
Order Transaction Data: “RFM”, Payment Methods

•

Item/SKU Level Data: Products, Price, Units

•

Promotion/Response History: Source, Channel, Offer

•

Life-to-Date/Past “x” Months Summary Data

•

Customer Level Status Flags: Active, Dormant, Delinquent

•

Surveys/Product Registration Forms : Attitudinal/Lifestyle

•

Customer Communication History Data: Call-center, Web

•

Social Media, Click-through, Page views : Sentiment/Intentions

Need Conversion, Categorization, & Summarization
22
Maximize the Power of Transaction Data

Most databases describe shopping baskets
 Start describing your targets
• RFM Data must be Summarized (or Denormalized)
• Turn RFM data into individual / household level
“Descriptors”
• Combine with essential categorical variables
(e.g., product, offer, channel, etc.)

23
Data Summarization – Matching the level of Data
Order Table
Cust ID

Order #

Order Summary Table

Order Date

$ Amount

000123

100011

2009-05-06

$199.99

000123

100128

2010-08-30

$50.49

000123

103082

2011-12-21

100036

2010-06-06

$43.99

003859

101658

2011-01-20

102189

2011-04-15

$119.45

003859

106458

2012-02-18

104535

2012-07-30

$354.72

016899

107296

2011-07-14

102982

2010-09-07

$128.60

019872

103826

2011-04-30

$499.99

019872

109056

2012-03-12

Last Order
Date

$199.99

019872

First Order
Date

$43.99

004593

$ Total

$43.99

003859

#
Orders

$128.60

003859

Cust ID

$59.99

000123

3

$379.08

2009-05-06

2011-12-21

003859

4

$251.42

2010-06-06

2012-02-18

004593

1

$354.72

2012-07-30

2012-07-30

016899

1

$199.99

2011-07-14

2011-07-14

019872

3

$688.58

2010-09-07

2012-03-12

24
Sample Variables after Summarization
Before

Recency

Frequency

Monetary

After Summarization
• Weeks since last online purchase
• Years since member sign up
• Days since last delinquent date
• Months since last response date
• Orders by offer type
• Orders by product/service type
• Payments by pay method
• Average days between transactions

• Total $ past 24 months
• Life-to-date spending
• Average dollars by channel
• Average dollars by product type

25
RFM Data Summary – Timeline
Life-to-date Summary provides
the historical view

May create bias towards
tenured customers

Put time limit on variables (e.g.
12-month, 24-month, etc.)

May require higher number of
variables and complicate the
process

For Lifetime Value & Time
Series Models

Must create historical arrays
(daily, weekly, monthly counts
of events)

26
Who does the summary work?
Answer: Not the statistician!
Key Takeaway

The data variables must be consistent
everywhere in the “Analytical Sandbox”
• Main analytical database
• Model development sample
• Pool of records to be scored

Pre-built summary variables in the
database

27
Data Categorization
Free-form data comes to life through categorization

Don’t Give Up!
• Hidden data in:
o Product, service, offer, channel, source, status, titles, surveys, etc.

• Have categorization guideline?

• Who will do it?
o Consider text mining techniques

• What to throw out?
o Keep data that matters in predictive modeling

28
Categorical Data
Any Non-numeric Data
• Product
• Service
• Offer
• Channel
• Source
• Market
• Region
• Business Title
• Member Status
• Payment Status
• etc…

Offer Code Example:
• Flat Dollar Discount
• % Discount
• Buy 1, Get 1 Free
• Free Shipping
• No Payment Until…
• Free Gift
• etc…

Categorize as much as possible at the data collection stage
29
Categorization Guidelines
Be consistent throughout
•
•
•
•
•
•

Survey Form
Data Entry
Inventory Database
Data Collection & Compilation
Summarization
Modeling, and Scoring

Create “Code” structure
• NEVER allow free-form answers

30
Categorization Guidelines (continued)
Create Rules and DON’T Deviate
from them

Training & Automation

More specific the better

Combine them later

But, don’t allow too many
variations (over 20) in one code

Break into multiple
codes if necessary

Don’t forget the end goals and
don’t over-do it

Must be “relevant”

31
Data Hygiene & Data Append
Do you know how many customers you
have in your database?

• Data conversion
o
o
o
o

Create consistency
Standardization
Edit
Purge

• Cover all bases – PII & RFM Data
• Create rules and be consistent

32
PII – Gateway to External Data
What is hidden behind simple name &
address?
• Standardize Name & Address first
o Maintain PII (Personally Identifiable Information)
o Hygiene via periodic NCOA and standardization

• First & Last Name – Ethnic, Gender
• Name, Address, Email – Demographic Data
• Address – Geo-demographic, Census Data
• Zip – County, Market Region, DMA

33
External Data
• Always consider buying data
before collecting and building
• Compiled Demographic /
Firmographic Data
• Behavioral / Transaction / Co-op
Data
• Lifestyle / Attitudinal / Survey
• Census / Geo-Demographic Data

34
External Data Check List
• Test multiple data sources
1.
2.
3.
4.
5.
6.

Depth of information / Uniqueness
Coverage / Match Rate
Consistency / Update Cycle
Price
User-friendliness
Delivery Options – Real-time?

• Learn about the data sources
o What’s real and what’s imputed?
o Don’t stop at Demographic: always consider
“Behavioral” data

35
Missing Values
“Missing Data can be meaningful.”
• For Numeric Data (e.g., $, Counters, Dates, etc.)
o Incalculable vs. Data-append Non-matches
o Missing is missing: DO NOT fill in with 0’s

$

• For Categorical Data (e.g., Codes, Text, etc.)
o Leave room for “N/A” (e.g., blank, “N/A”, “0”, “.”,
etc.)
o Code “Non-matches” to external files differently

And yet, unstructured database only store
what’s available.

36
More on missing data
• Agree on Imputation Rules
o Do it upfront
o Must be part of scoring codes

Data

• Educate non-analysts
o Hard to undo when combined with
other values
o Train vendors

• Always check % missing
o Development Sample vs. Life
Databases

37
Scoring – Sample vs. Database
• Development Sample vs. Live Database
o
o
o
o

Database Structure
Variable List/Name
Variable Value
Imputation Assumption

Lead to disasters if “anything” is different
• Do NOT play with model groups that are set in the
development sample

38
Scoring QC
Most troubles happen after the models are built…
Check:
•
•
•
•
•
•

Model Group Distributions
Variable distributions (values and indices)
Missing Values
Match rate for appended data
Scoring codes, including score breaks
Compare to previous runs – Check Deterioration

Set parameters for acceptable differences and Enforce

39
Score installment in the database
• Plan ahead to store model
scores in the database and datamarts
 Sync with the main database
• Store raw scores, not just model
groups
• Match the levels of scores
o
o
o
o

Household
Individual
Email
Product

40
Back-end Analysis
Close the Loop Properly!
• 1-to-1 MKT 101:
“Learn from past campaigns”
• Must Plan ahead
o No excuse for not doing it
o Schedule ahead & budget properly
o Keep historical promo/response data (even
without marketing databases)

• Set Metrics Upfront
o List/Data Source
o By Offer, Creative, Season, Product, etc.

41
Response Reports & BI Analytics
• Start with “Canned” Reports and BI Dashboards
from vendors
• Don’t insist real-time for every report
• Get ready to create “Custom” reports and
dashboard views
o Prioritize what you want
• Format and Delivery
• Timing and Interval
• Timeline to be covered (YTD, 12-mo, etc.)

42
Key ROI Metrics
Set ROI Metrics, such as:
• Open, Click-through, Conversion Rates
o “Denominator” in each?

• Revenue
o Per 1,000 mailed/calls
o Per Order
o Per Display, Email, Click-through, Conversion

• By
o Source, Campaign, Time Period, Model
Group, Offer, Creative, Targeting
Criteria, Channel (in & outbound), Ad
server, Publisher, Key word, Script, Daypart, etc.

• Key variables must reside in the database
o Keep them in “ready-to-use” format
43
Where to Begin with Analytical Sandbox
Spec it out:
• Project Goal

• Data Source List (as detailed as possible)
• Final Variable List
• Project Flow:
o Collection

o Scoring

o Conversion & Categorization

o Storage

o Summarization

o Backend Analysis

o Matching / Data Append
o Development Sample

44
Who will build the sandbox?
• In-house vs. Out-sourcing;
Must consider
o Platform
o Software
o Programming
o Staffing

• Cost it out
o Don’t forget the update cost

• Back to the analysts for variable list
review
• Don’t be shy and ask for help from
consultants
45
10 essential items to consider when outsourcing analytics
It is not just about modeling, but all surrounding services as well
10 essential items to consider when outsourcing analytical projects
1. Consulting capability: Translate marketing goals into mathematics
2. Data processing: Conversion, edit, summarization, data-append, etc.
3. Pricing structure: Model development is only one part; hidden fees?
4. Track record in the industry: Not in rocket science, but in marketing

5. Types of models supported: Watch out for one-trick ponies
6. Speed of execution: Turnaround time measured in days, not weeks
7. Documentation: Full disclosure of algorithms, charts and reports
8. Scoring validation: Job not done until fully scored and validated
9. Back-end analysis: For true “Closed-loop” marketing
10. Ongoing Support: Periodic review and update

46
Scope it out
• Know what you need, but don’t
over do it
“Modeling is making the best of
what’s available”
• Take a phased approach
o If budget is tight, start with low
hanging fruits
o Proof of concepts without full
database commitment in the
beginning
o Maintain consistency
o Keep Historical Data

47
Key Takeaways
•
•
•
•

•
•
•
•

Invest in analytics: It is about the insights, not just the data
Set business & analytical goals first
Advanced analytics requires its own sandbox
Ensure consistent data every step of the way: From
sampling to scoring
Check every data source, but don’t wait for a perfect data
set
Match the levels of data (Data Summary)
Don’t over-do it – employ phased approach
Ask for help
Stephen H. Yu · Data Strategy & Analytics Consultant· stephenyu1210@gmail.com· 201.218.2068

Weitere ähnliche Inhalte

Was ist angesagt?

Data Quality
Data QualityData Quality
Data Qualityjerdeb
 
Project analytics in Project Management
Project analytics in Project ManagementProject analytics in Project Management
Project analytics in Project ManagementKetan Gandhi
 
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Erik Fransen
 
Data Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernData Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernAmin Chowdhury
 
Driving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle AnalyticsDriving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle AnalyticsPerficient, Inc.
 
INTRODUCTION TO BUSINESS ANALYTICS
INTRODUCTION TO BUSINESS ANALYTICSINTRODUCTION TO BUSINESS ANALYTICS
INTRODUCTION TO BUSINESS ANALYTICSAninditaGogoi5
 
Improving the customer experience using big data customer-centric measurement...
Improving the customer experience using big data customer-centric measurement...Improving the customer experience using big data customer-centric measurement...
Improving the customer experience using big data customer-centric measurement...Business Over Broadway
 
Harnessing the power of analytics
Harnessing the power of analyticsHarnessing the power of analytics
Harnessing the power of analyticsYogesh Dandawate
 
1120 track 3 prendki_using our laptop
1120 track 3 prendki_using our laptop1120 track 3 prendki_using our laptop
1120 track 3 prendki_using our laptopRising Media, Inc.
 
Business intelligence prof nikhat fatma mumtaz husain shaikh
Business intelligence  prof nikhat fatma mumtaz husain shaikhBusiness intelligence  prof nikhat fatma mumtaz husain shaikh
Business intelligence prof nikhat fatma mumtaz husain shaikhNikhat Fatma Mumtaz Husain Shaikh
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningBarry Leventhal
 
Modeling nordstrom fit_solutions_sf-apr2015
Modeling nordstrom fit_solutions_sf-apr2015Modeling nordstrom fit_solutions_sf-apr2015
Modeling nordstrom fit_solutions_sf-apr2015Edward Mabanglo
 
Marketers Flunk The Big Data Text
Marketers Flunk The Big Data TextMarketers Flunk The Big Data Text
Marketers Flunk The Big Data TextShaun Kollannur
 
Avesh sh
Avesh shAvesh sh
Avesh shAvesh G
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemKiran kumar
 
MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB plc
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB plc
 

Was ist angesagt? (20)

Data Quality
Data QualityData Quality
Data Quality
 
Project analytics in Project Management
Project analytics in Project ManagementProject analytics in Project Management
Project analytics in Project Management
 
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
Best Practices: Datawarehouse Automation Conference September 20, 2012 - Amst...
 
Data Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernData Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing Concern
 
SPI IQ for Retailers
SPI IQ for RetailersSPI IQ for Retailers
SPI IQ for Retailers
 
Driving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle AnalyticsDriving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle Analytics
 
INTRODUCTION TO BUSINESS ANALYTICS
INTRODUCTION TO BUSINESS ANALYTICSINTRODUCTION TO BUSINESS ANALYTICS
INTRODUCTION TO BUSINESS ANALYTICS
 
Improving the customer experience using big data customer-centric measurement...
Improving the customer experience using big data customer-centric measurement...Improving the customer experience using big data customer-centric measurement...
Improving the customer experience using big data customer-centric measurement...
 
Harnessing the power of analytics
Harnessing the power of analyticsHarnessing the power of analytics
Harnessing the power of analytics
 
Big data@work
Big data@workBig data@work
Big data@work
 
Business process based analytics
Business process based analyticsBusiness process based analytics
Business process based analytics
 
1120 track 3 prendki_using our laptop
1120 track 3 prendki_using our laptop1120 track 3 prendki_using our laptop
1120 track 3 prendki_using our laptop
 
Business intelligence prof nikhat fatma mumtaz husain shaikh
Business intelligence  prof nikhat fatma mumtaz husain shaikhBusiness intelligence  prof nikhat fatma mumtaz husain shaikh
Business intelligence prof nikhat fatma mumtaz husain shaikh
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data mining
 
Modeling nordstrom fit_solutions_sf-apr2015
Modeling nordstrom fit_solutions_sf-apr2015Modeling nordstrom fit_solutions_sf-apr2015
Modeling nordstrom fit_solutions_sf-apr2015
 
Marketers Flunk The Big Data Text
Marketers Flunk The Big Data TextMarketers Flunk The Big Data Text
Marketers Flunk The Big Data Text
 
Avesh sh
Avesh shAvesh sh
Avesh sh
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse System
 
MariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStoreMariaDB AX: Solución analítica con ColumnStore
MariaDB AX: Solución analítica con ColumnStore
 
MariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStoreMariaDB AX: Analytics with MariaDB ColumnStore
MariaDB AX: Analytics with MariaDB ColumnStore
 

Ähnlich wie Data Refinement: The missing link between data collection and decisions

Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Vivastream
 
Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Vivastream
 
Data Refinement
Data RefinementData Refinement
Data RefinementVivastream
 
BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1Kelvin Chan
 
Data Detectives - Presentation
Data Detectives - PresentationData Detectives - Presentation
Data Detectives - PresentationClint Campbell
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big dataSeta Wicaksana
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackPrecisely
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Business Intelligence and OLAP Practice
Business Intelligence and OLAP PracticeBusiness Intelligence and OLAP Practice
Business Intelligence and OLAP PracticeTatiana Ivanova
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
Marketing Mix Modelling - Marketing Analytics Summit
Marketing Mix Modelling - Marketing Analytics SummitMarketing Mix Modelling - Marketing Analytics Summit
Marketing Mix Modelling - Marketing Analytics SummitPetri Mertanen
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017Prashant Bhatmule
 
MonetizingStatistics
MonetizingStatisticsMonetizingStatistics
MonetizingStatisticsAaron Sankey
 
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...Databricks
 
Engineering Perspectives on Business
Engineering Perspectives on Business Engineering Perspectives on Business
Engineering Perspectives on Business Michael Zargham
 
What MBA Students Need to Know about CX, Data Science and Surveys
What MBA Students Need to Know about CX, Data Science and SurveysWhat MBA Students Need to Know about CX, Data Science and Surveys
What MBA Students Need to Know about CX, Data Science and SurveysBusiness Over Broadway
 

Ähnlich wie Data Refinement: The missing link between data collection and decisions (20)

Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?
 
Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?Is Your Marketing Database "Model Ready"?
Is Your Marketing Database "Model Ready"?
 
Data Refinement
Data RefinementData Refinement
Data Refinement
 
BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1BI Knowledge Sharing Session 1
BI Knowledge Sharing Session 1
 
Data Detectives - Presentation
Data Detectives - PresentationData Detectives - Presentation
Data Detectives - Presentation
 
Understanding big data and data analytics big data
Understanding big data and data analytics big dataUnderstanding big data and data analytics big data
Understanding big data and data analytics big data
 
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on TrackYour AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
Your AI and ML Projects Are Failing – Key Steps to Get Them Back on Track
 
Data ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingData ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housing
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Data mining
Data miningData mining
Data mining
 
Digital Economics
Digital EconomicsDigital Economics
Digital Economics
 
Chapter 10 supporting decision making
Chapter 10  supporting decision makingChapter 10  supporting decision making
Chapter 10 supporting decision making
 
Business Intelligence and OLAP Practice
Business Intelligence and OLAP PracticeBusiness Intelligence and OLAP Practice
Business Intelligence and OLAP Practice
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
Marketing Mix Modelling - Marketing Analytics Summit
Marketing Mix Modelling - Marketing Analytics SummitMarketing Mix Modelling - Marketing Analytics Summit
Marketing Mix Modelling - Marketing Analytics Summit
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017
 
MonetizingStatistics
MonetizingStatisticsMonetizingStatistics
MonetizingStatistics
 
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...No REST till Production – Building and Deploying 9 Models to Production in 3 ...
No REST till Production – Building and Deploying 9 Models to Production in 3 ...
 
Engineering Perspectives on Business
Engineering Perspectives on Business Engineering Perspectives on Business
Engineering Perspectives on Business
 
What MBA Students Need to Know about CX, Data Science and Surveys
What MBA Students Need to Know about CX, Data Science and SurveysWhat MBA Students Need to Know about CX, Data Science and Surveys
What MBA Students Need to Know about CX, Data Science and Surveys
 

Mehr von Vivastream

Exchange Solutions Datasheet_Ecommerce
Exchange Solutions Datasheet_EcommerceExchange Solutions Datasheet_Ecommerce
Exchange Solutions Datasheet_EcommerceVivastream
 
Exchange Solutions Datasheet_Customer Engagement Roadmap
Exchange Solutions Datasheet_Customer Engagement RoadmapExchange Solutions Datasheet_Customer Engagement Roadmap
Exchange Solutions Datasheet_Customer Engagement RoadmapVivastream
 
Vivastream Poster
Vivastream PosterVivastream Poster
Vivastream PosterVivastream
 
Vivastream Poster
Vivastream PosterVivastream Poster
Vivastream PosterVivastream
 
Breaking Up is Hard to Do: Small Businesses’ Love Affair with Checks
Breaking Up is Hard to Do: Small Businesses’ Love Affair with ChecksBreaking Up is Hard to Do: Small Businesses’ Love Affair with Checks
Breaking Up is Hard to Do: Small Businesses’ Love Affair with ChecksVivastream
 
EY Smart Commerce Report
EY Smart Commerce ReportEY Smart Commerce Report
EY Smart Commerce ReportVivastream
 
EY Global Consumer Banking Survey 2014
EY Global Consumer Banking Survey 2014EY Global Consumer Banking Survey 2014
EY Global Consumer Banking Survey 2014Vivastream
 
EY Global Consumer Banking Survey
EY Global Consumer Banking SurveyEY Global Consumer Banking Survey
EY Global Consumer Banking SurveyVivastream
 
Automation for RDC and Mobile
Automation for RDC and MobileAutomation for RDC and Mobile
Automation for RDC and MobileVivastream
 
Healthcare Payments Automation Center
Healthcare Payments Automation CenterHealthcare Payments Automation Center
Healthcare Payments Automation CenterVivastream
 
Next Generation Recognition Solutions
Next Generation Recognition SolutionsNext Generation Recognition Solutions
Next Generation Recognition SolutionsVivastream
 
Automation Services
Automation ServicesAutomation Services
Automation ServicesVivastream
 
Company Overview
Company OverviewCompany Overview
Company OverviewVivastream
 

Mehr von Vivastream (20)

Exchange Solutions Datasheet_Ecommerce
Exchange Solutions Datasheet_EcommerceExchange Solutions Datasheet_Ecommerce
Exchange Solutions Datasheet_Ecommerce
 
Exchange Solutions Datasheet_Customer Engagement Roadmap
Exchange Solutions Datasheet_Customer Engagement RoadmapExchange Solutions Datasheet_Customer Engagement Roadmap
Exchange Solutions Datasheet_Customer Engagement Roadmap
 
Test
TestTest
Test
 
Tcap
TcapTcap
Tcap
 
SQA
SQASQA
SQA
 
Jeeva jessf
Jeeva jessfJeeva jessf
Jeeva jessf
 
Vivastream Poster
Vivastream PosterVivastream Poster
Vivastream Poster
 
Vivastream Poster
Vivastream PosterVivastream Poster
Vivastream Poster
 
APEX
APEXAPEX
APEX
 
Breaking Up is Hard to Do: Small Businesses’ Love Affair with Checks
Breaking Up is Hard to Do: Small Businesses’ Love Affair with ChecksBreaking Up is Hard to Do: Small Businesses’ Love Affair with Checks
Breaking Up is Hard to Do: Small Businesses’ Love Affair with Checks
 
EY Smart Commerce Report
EY Smart Commerce ReportEY Smart Commerce Report
EY Smart Commerce Report
 
EY Global Consumer Banking Survey 2014
EY Global Consumer Banking Survey 2014EY Global Consumer Banking Survey 2014
EY Global Consumer Banking Survey 2014
 
EY Global Consumer Banking Survey
EY Global Consumer Banking SurveyEY Global Consumer Banking Survey
EY Global Consumer Banking Survey
 
Serano
SeranoSerano
Serano
 
Accura XV
Accura XVAccura XV
Accura XV
 
Automation for RDC and Mobile
Automation for RDC and MobileAutomation for RDC and Mobile
Automation for RDC and Mobile
 
Healthcare Payments Automation Center
Healthcare Payments Automation CenterHealthcare Payments Automation Center
Healthcare Payments Automation Center
 
Next Generation Recognition Solutions
Next Generation Recognition SolutionsNext Generation Recognition Solutions
Next Generation Recognition Solutions
 
Automation Services
Automation ServicesAutomation Services
Automation Services
 
Company Overview
Company OverviewCompany Overview
Company Overview
 

Kürzlich hochgeladen

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

Data Refinement: The missing link between data collection and decisions

  • 1. Data Refinement: The missing link between data collection and decisions Stephen H. Yu Data Strategy & Analytics Consultant
  • 2. What we will cover • • • • • • Database Marketing Landscape Analytics and Models “Model-Ready” Environment Data Summarization & Categorization Delivery: Scoring & QC Closing the Loop 1
  • 3. Big Data, Small Data, Neat Data, Messy Data How is the "Big Data” working out for you? 2.5 quintillion bytes collected per “day” 1 quintillion (exabytes) = 1 billion gigabytes • Did all this data improve your decision making process? • Do you have the results to show for? • Information Overload? You bet!  Harness insights, drop the noise 2
  • 4. Database Marketing Landscape • No guessing game – You MUST know your target • Vast amount of online & offline data collected  But are they being used properly? • Analytics play a huge roles in prospecting & CRM • Short paced marketing cycle getting shorter • Huge difference between advanced marketers and those who are falling behind Winners are the ones who know how to wield the power of all available data faster. 3
  • 5. Insights, not Raw Data • Database marketers must excel in: Collection Refinement Delivery Size & speed matters Get to answers, not just ingredients For consumption by end-users • Must provide “marketing answers” via advanced analytics, not just bits and pieces of data • Insight does not come from data, it is derived from data 4
  • 6. Refined Answers Raw Data Marketing Answers • • • • • • • • • • Likely to buy a luxury car • Likely to take a foreign vacation • Likely to response to free shipping offer • Likely to be a high value customer • Likely to be qualified for credit • Likely to upgrade • Likely to leave • Likely to come back Demographic / Firmographic RFM Products & Services Used Promotion / Response History Lifestyle / Survey Responses Delinquent history Call / Communication Log Movement Data Sentiments 5
  • 7. Different Types of Analytics “Analytics” means different things… • BI (Business Intelligence) Reporting: Display of success metrics, dashboard reporting • Descriptive Analytics: Profiling, segmentation, clustering • Predictive Modeling: Response models, cloning Models, lifetime value, revenue models, etc. • Optimization: Channel optimization, marketing spending analysis, econometrics models Predictive Modeling for 1-to-1 Marketing 6
  • 8. Why model? • Increase Targeting Accuracy • Reduce costs by contacting less/smart • Stay relevant • Consistent results • Reveal hidden patterns in data • Repeatable – key for automation • Expandable • “Supposedly” save time and effort Models summarize complex data into simple-to-use “scores” 7
  • 9. Why NOT model? • Universe is too small • Predictable data not available • 1-to-1 marketing channels not in plan • Tight budget • Lack of resources 8
  • 10. Custom Models for every stage of Marketing Lifecycle 9
  • 11. Any pain implementing models? • Not easy to find “Best” customers • Modelers are fixing data all the time • Rely on a few popular variables • Always need more variables • Takes too long to build models and score • Inconsistencies shown when scored • Disappointing results 10
  • 12. What does your database support? If you have a database… • Order Fulfillment • Contact Management • Standard Reports • Ad hoc Reports and Queries • Name Selections • Response Analysis • Trend Analysis But does it support predictive modeling and scoring? 11
  • 13. For modeling, clean the data first “Garbage-in, garbage-out” • Most data sets are messy & “unstructured” • Over 70-80% of model development time goes to data prep work • Most databases are NOT model-ready • Modeling & Scoring o Extension of database work o Consistency is “the” key 12
  • 14. Predictive Modeling is all about “Ranking” 2 1 3 Ultimately, Models must properly “Rank” Determine the level of data accordingly • Households • Individuals • Products • Relational or unstructured databases won’t cut it • Must create “Descriptors” that fit the level that needs to be ranked 13
  • 15. Unstructured to Structured • Most modern databases optimized for massive storage and rapid retrieval, not necessarily for predictive analytics o Relational databases o NoSQL databases • Need “Analytical Sandbox” (or Database/Data-mart) o Structured & de-normalized o Variables as descriptors of model targets o Common analytical language (SAS, R, SPSS) o Must support “in-database” scoring 14
  • 16. Analytical Sandbox – “Model-Ready” Environment From “all” types of data collection to decisions. Then repeat. 15
  • 17. Why Front-end DP Important? Inexperienced analysts spend majority of time doing DP work  Modeling work at the last minute! Creative variables enhance models Inconsistent data creates a chain reaction to melt-downs Data append/match becomes ineffective 16
  • 18. First, Define Analytical Goals • Rank & select prospect names • Cross-sell/Up-sell • Segment the universe for messaging strategy • Pinpoint attrition point • Assign lifetime values • Optimize media/channel spending • Create product packages • Project customer value • Detect fraud 17
  • 19. 3 Major Types of Data for Marketers Descriptive Data • Demographic Data • Firmographic Data • Geo-demographic Data Transaction Data / Behavioral Data • Transaction data Attitudinal Data • Surveys • Sentiments • Compiled / Co-op data • Lifestyle data • Online behavioral data 3-Dimensions in Predictive Analytics 18
  • 20. Data Inventory • “Modeling is making the best of what we know” • Beyond obvious RFM data • Get Deeper o Product/Service Level Data o Historical Data o Channel Data – Inbound & Outbound o Online activities, sentiments, unstructured data • External Data 19
  • 21. Create Data Menu • Base it on Companywide Need-Analysis • Ask the Analysts first: What type of models are in the plan? o Affinity/Look-alike Models o Promotion/Response Models o Time-series Models o Attrition Models • Consider non-analytical departments • Maintain the ones that fit the objective  Don’t be afraid to throw out “noises” 20
  • 22. Data Menu (continued) Check the ingredients Cost - Can you afford to maintain it? • What do you have today? • What can be bought? • What can be created? • Storage/Platform – Consider the scoring part, too • Programming/Processing Time • Software • Update • External Data 21
  • 23. Check Your Data Inventory You may have more than you thought, so let’s start with what you have: • • Name & Address: Key to Geo/Demographic Data Order Transaction Data: “RFM”, Payment Methods • Item/SKU Level Data: Products, Price, Units • Promotion/Response History: Source, Channel, Offer • Life-to-Date/Past “x” Months Summary Data • Customer Level Status Flags: Active, Dormant, Delinquent • Surveys/Product Registration Forms : Attitudinal/Lifestyle • Customer Communication History Data: Call-center, Web • Social Media, Click-through, Page views : Sentiment/Intentions Need Conversion, Categorization, & Summarization 22
  • 24. Maximize the Power of Transaction Data Most databases describe shopping baskets  Start describing your targets • RFM Data must be Summarized (or Denormalized) • Turn RFM data into individual / household level “Descriptors” • Combine with essential categorical variables (e.g., product, offer, channel, etc.) 23
  • 25. Data Summarization – Matching the level of Data Order Table Cust ID Order # Order Summary Table Order Date $ Amount 000123 100011 2009-05-06 $199.99 000123 100128 2010-08-30 $50.49 000123 103082 2011-12-21 100036 2010-06-06 $43.99 003859 101658 2011-01-20 102189 2011-04-15 $119.45 003859 106458 2012-02-18 104535 2012-07-30 $354.72 016899 107296 2011-07-14 102982 2010-09-07 $128.60 019872 103826 2011-04-30 $499.99 019872 109056 2012-03-12 Last Order Date $199.99 019872 First Order Date $43.99 004593 $ Total $43.99 003859 # Orders $128.60 003859 Cust ID $59.99 000123 3 $379.08 2009-05-06 2011-12-21 003859 4 $251.42 2010-06-06 2012-02-18 004593 1 $354.72 2012-07-30 2012-07-30 016899 1 $199.99 2011-07-14 2011-07-14 019872 3 $688.58 2010-09-07 2012-03-12 24
  • 26. Sample Variables after Summarization Before Recency Frequency Monetary After Summarization • Weeks since last online purchase • Years since member sign up • Days since last delinquent date • Months since last response date • Orders by offer type • Orders by product/service type • Payments by pay method • Average days between transactions • Total $ past 24 months • Life-to-date spending • Average dollars by channel • Average dollars by product type 25
  • 27. RFM Data Summary – Timeline Life-to-date Summary provides the historical view May create bias towards tenured customers Put time limit on variables (e.g. 12-month, 24-month, etc.) May require higher number of variables and complicate the process For Lifetime Value & Time Series Models Must create historical arrays (daily, weekly, monthly counts of events) 26
  • 28. Who does the summary work? Answer: Not the statistician! Key Takeaway The data variables must be consistent everywhere in the “Analytical Sandbox” • Main analytical database • Model development sample • Pool of records to be scored Pre-built summary variables in the database 27
  • 29. Data Categorization Free-form data comes to life through categorization Don’t Give Up! • Hidden data in: o Product, service, offer, channel, source, status, titles, surveys, etc. • Have categorization guideline? • Who will do it? o Consider text mining techniques • What to throw out? o Keep data that matters in predictive modeling 28
  • 30. Categorical Data Any Non-numeric Data • Product • Service • Offer • Channel • Source • Market • Region • Business Title • Member Status • Payment Status • etc… Offer Code Example: • Flat Dollar Discount • % Discount • Buy 1, Get 1 Free • Free Shipping • No Payment Until… • Free Gift • etc… Categorize as much as possible at the data collection stage 29
  • 31. Categorization Guidelines Be consistent throughout • • • • • • Survey Form Data Entry Inventory Database Data Collection & Compilation Summarization Modeling, and Scoring Create “Code” structure • NEVER allow free-form answers 30
  • 32. Categorization Guidelines (continued) Create Rules and DON’T Deviate from them Training & Automation More specific the better Combine them later But, don’t allow too many variations (over 20) in one code Break into multiple codes if necessary Don’t forget the end goals and don’t over-do it Must be “relevant” 31
  • 33. Data Hygiene & Data Append Do you know how many customers you have in your database? • Data conversion o o o o Create consistency Standardization Edit Purge • Cover all bases – PII & RFM Data • Create rules and be consistent 32
  • 34. PII – Gateway to External Data What is hidden behind simple name & address? • Standardize Name & Address first o Maintain PII (Personally Identifiable Information) o Hygiene via periodic NCOA and standardization • First & Last Name – Ethnic, Gender • Name, Address, Email – Demographic Data • Address – Geo-demographic, Census Data • Zip – County, Market Region, DMA 33
  • 35. External Data • Always consider buying data before collecting and building • Compiled Demographic / Firmographic Data • Behavioral / Transaction / Co-op Data • Lifestyle / Attitudinal / Survey • Census / Geo-Demographic Data 34
  • 36. External Data Check List • Test multiple data sources 1. 2. 3. 4. 5. 6. Depth of information / Uniqueness Coverage / Match Rate Consistency / Update Cycle Price User-friendliness Delivery Options – Real-time? • Learn about the data sources o What’s real and what’s imputed? o Don’t stop at Demographic: always consider “Behavioral” data 35
  • 37. Missing Values “Missing Data can be meaningful.” • For Numeric Data (e.g., $, Counters, Dates, etc.) o Incalculable vs. Data-append Non-matches o Missing is missing: DO NOT fill in with 0’s $ • For Categorical Data (e.g., Codes, Text, etc.) o Leave room for “N/A” (e.g., blank, “N/A”, “0”, “.”, etc.) o Code “Non-matches” to external files differently And yet, unstructured database only store what’s available. 36
  • 38. More on missing data • Agree on Imputation Rules o Do it upfront o Must be part of scoring codes Data • Educate non-analysts o Hard to undo when combined with other values o Train vendors • Always check % missing o Development Sample vs. Life Databases 37
  • 39. Scoring – Sample vs. Database • Development Sample vs. Live Database o o o o Database Structure Variable List/Name Variable Value Imputation Assumption Lead to disasters if “anything” is different • Do NOT play with model groups that are set in the development sample 38
  • 40. Scoring QC Most troubles happen after the models are built… Check: • • • • • • Model Group Distributions Variable distributions (values and indices) Missing Values Match rate for appended data Scoring codes, including score breaks Compare to previous runs – Check Deterioration Set parameters for acceptable differences and Enforce 39
  • 41. Score installment in the database • Plan ahead to store model scores in the database and datamarts  Sync with the main database • Store raw scores, not just model groups • Match the levels of scores o o o o Household Individual Email Product 40
  • 42. Back-end Analysis Close the Loop Properly! • 1-to-1 MKT 101: “Learn from past campaigns” • Must Plan ahead o No excuse for not doing it o Schedule ahead & budget properly o Keep historical promo/response data (even without marketing databases) • Set Metrics Upfront o List/Data Source o By Offer, Creative, Season, Product, etc. 41
  • 43. Response Reports & BI Analytics • Start with “Canned” Reports and BI Dashboards from vendors • Don’t insist real-time for every report • Get ready to create “Custom” reports and dashboard views o Prioritize what you want • Format and Delivery • Timing and Interval • Timeline to be covered (YTD, 12-mo, etc.) 42
  • 44. Key ROI Metrics Set ROI Metrics, such as: • Open, Click-through, Conversion Rates o “Denominator” in each? • Revenue o Per 1,000 mailed/calls o Per Order o Per Display, Email, Click-through, Conversion • By o Source, Campaign, Time Period, Model Group, Offer, Creative, Targeting Criteria, Channel (in & outbound), Ad server, Publisher, Key word, Script, Daypart, etc. • Key variables must reside in the database o Keep them in “ready-to-use” format 43
  • 45. Where to Begin with Analytical Sandbox Spec it out: • Project Goal • Data Source List (as detailed as possible) • Final Variable List • Project Flow: o Collection o Scoring o Conversion & Categorization o Storage o Summarization o Backend Analysis o Matching / Data Append o Development Sample 44
  • 46. Who will build the sandbox? • In-house vs. Out-sourcing; Must consider o Platform o Software o Programming o Staffing • Cost it out o Don’t forget the update cost • Back to the analysts for variable list review • Don’t be shy and ask for help from consultants 45
  • 47. 10 essential items to consider when outsourcing analytics It is not just about modeling, but all surrounding services as well 10 essential items to consider when outsourcing analytical projects 1. Consulting capability: Translate marketing goals into mathematics 2. Data processing: Conversion, edit, summarization, data-append, etc. 3. Pricing structure: Model development is only one part; hidden fees? 4. Track record in the industry: Not in rocket science, but in marketing 5. Types of models supported: Watch out for one-trick ponies 6. Speed of execution: Turnaround time measured in days, not weeks 7. Documentation: Full disclosure of algorithms, charts and reports 8. Scoring validation: Job not done until fully scored and validated 9. Back-end analysis: For true “Closed-loop” marketing 10. Ongoing Support: Periodic review and update 46
  • 48. Scope it out • Know what you need, but don’t over do it “Modeling is making the best of what’s available” • Take a phased approach o If budget is tight, start with low hanging fruits o Proof of concepts without full database commitment in the beginning o Maintain consistency o Keep Historical Data 47
  • 49. Key Takeaways • • • • • • • • Invest in analytics: It is about the insights, not just the data Set business & analytical goals first Advanced analytics requires its own sandbox Ensure consistent data every step of the way: From sampling to scoring Check every data source, but don’t wait for a perfect data set Match the levels of data (Data Summary) Don’t over-do it – employ phased approach Ask for help
  • 50. Stephen H. Yu · Data Strategy & Analytics Consultant· stephenyu1210@gmail.com· 201.218.2068