2. Outline of the Talk
• Introduction to the need for causal analytics in retail and telco settings:
identification and explanation of anomalies
• Discussion on the challenges and opportunities
• Challenges in automating causal analysis using only observational data
• Using external data for additional signals in the retail setting
• Overview of possible approaches and expected outcomes
3. Introduction: Anomalies
• Anomalies are ‘points of interest’ in the data: effects for which we want to
know the causes
• May or may not have a fixed definition
• Identification of anomalies: rule-based or data-driven
• Data-driven anomalies: outlier detection with an unknown number of outliers
• Training data without outliers might not be available
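As a rough illustration of data-driven detection when the number of outliers is unknown and no clean training set exists, the sketch below flags points by their modified z-score (median and median absolute deviation). The function name `mad_outliers`, the 3.5 threshold, and the sample numbers are assumptions for illustration only, not something the talk prescribes.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of points whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the bulk of the data is roughly normal.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical daily sales with one spike and one collapse.
daily_sales = [102, 98, 101, 97, 350, 100, 99, 103, 5, 101]
print(mad_outliers(daily_sales))  # [4, 8]
```

Because the median and MAD are barely moved by the outliers themselves, the baseline does not need to be estimated from outlier-free data.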
4. Introduction: Anomalies in Retail
• Potential variable of interest: sales of a product
• Not clearly defined unless scope is limited
• Sales fluctuate based on season, weekday/weekend, promotions
• Need a combination of rule-based and data-driven detection, with rules
derived automatically from the data plus domain expertise
• Example: Sales of a particular product might see a surge during all weekends
but since it is expected, it might not really be an effect of interest
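One way to keep an expected weekend surge from being reported is to compare each day only against the baseline for its own day of the week. The sketch below does that with a per-weekday median and MAD; the function name `unexpected_days`, the threshold, and the synthetic data are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import median

def unexpected_days(sales, threshold=3.5):
    """sales: list of (weekday, units), weekday in 0..6 (5 and 6 = weekend).

    Returns indices of days that deviate from their own weekday's baseline.
    """
    by_day = defaultdict(list)
    for weekday, units in sales:
        by_day[weekday].append(units)

    flagged = []
    for i, (weekday, units) in enumerate(sales):
        group = by_day[weekday]
        base = median(group)
        mad = median(abs(u - base) for u in group)
        scale = mad if mad > 0 else 1.0  # fall back to raw units if no spread
        if 0.6745 * abs(units - base) / scale > threshold:
            flagged.append(i)
    return flagged

# Synthetic data: four weeks, about 100 units on weekdays, about 200 on weekends.
sales = []
for week in range(4):
    for weekday in range(7):
        idx = week * 7 + weekday
        wiggle = idx % 5 - 2                      # small deterministic variation
        base = 200 if weekday >= 5 else 100
        sales.append((weekday, base + wiggle))
sales[9] = (2, 200)  # a Wednesday that sells like a weekend

print(unexpected_days(sales))  # [9]
```

The weekend days sell twice as much as weekdays but are not flagged, while the Wednesday with weekend-level sales is.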
5. Introduction: Anomalies in Telco
• Potential variables of interest: call quality, call drop rate
• Usually clearly defined, completely rule-based
• Data is distributed across multiple data systems
• Big data problem
• Causes can be hierarchical, need to consider propagation of causal effect
6. Challenges and Opportunities
Challenge                            | Opportunity
Reliable anomaly detection [Retail]  | Pointing out non-trivial anomalies in the data
Data consolidation [Telco]           | Providing insights across systems that can map to real impact
External/hidden causes               | Integrating other data signals into the causal analysis
Volume of data                       | Making it feasible to extract and analyze all possible insights and then highlight the most relevant
7. Challenges in Automated Causal Analytics
• Limited ground truth information
• Difficult to evaluate approach without manual verification
• Data might not capture the “real” causes
• The more potential factors there are, the harder it is to determine
causal direction
• All existing approaches are limited by the assumptions they make
8. Using External Data for Additional Signals in Retail
• Product sales are not independent variables
• Impossible to analyze the interaction of every product with every other
product manually
• How to decide if two products are inter-connected?
• Data might be limited due to varying degrees of record-keeping
• Can we use real-world unstructured data to our benefit?
9. Potential Contributions of Unstructured Data
• Assess the real-world connection between products and product categories
• Assess and account for public perception and social media trends
• Corroborate relations mined by automated statistical/probabilistic
approaches that are limited to structured data
11. Overview of Possible Approaches: A Starting Point
• Gold standard: controlled experiments -> not possible for many real-world cases
• Using external causal ground truth data to train a model followed by domain
adaptation
• Using the difference between X->Y and Y->X model fits as features
• Classification into three categories: 1 (positive), -1 (negative), 0 (neutral/undetermined)
• LiNGAM: Linear Non-Gaussian Acyclic Model
• Non-linear relationships with Additive Noise Models
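The idea of comparing X->Y and Y->X fits can be sketched for the simplest case, a linear model with non-Gaussian noise. The code fits a least-squares line in each direction and scores how strongly the residuals still depend on the input; the direction with the more independent residuals is preferred. Several things here are assumptions for illustration: the function names, the 0.1 margin, the reading of label 1 as "X causes Y" and -1 as "Y causes X", and the use of a correlation of squares as a crude stand-in for a proper independence test such as HSIC. This is not the LiNGAM algorithm itself, only the residual-independence intuition behind it and behind additive noise models.

```python
import random
from statistics import fmean

def _corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

def residual_dependence(cause, effect):
    """Fit effect = a*cause + b; score remaining dependence of residuals on cause."""
    mc, me = fmean(cause), fmean(effect)
    centred = [c - mc for c in cause]
    slope = (sum(c * (e - me) for c, e in zip(centred, effect))
             / sum(c * c for c in centred))
    resid = [(e - me) - slope * c for c, e in zip(centred, effect)]
    # Residuals are uncorrelated with the input by construction, so a
    # higher-order measure is needed: correlate the squares instead.
    return abs(_corr([c * c for c in centred], [r * r for r in resid]))

def direction_label(x, y, margin=0.1):
    """1: X->Y preferred, -1: Y->X preferred, 0: undetermined."""
    forward = residual_dependence(x, y)    # feature from the X->Y fit
    backward = residual_dependence(y, x)   # feature from the Y->X fit
    if backward - forward > margin:
        return 1
    if forward - backward > margin:
        return -1
    return 0

# Synthetic pair where X causes Y, with uniform (non-Gaussian) noise.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
y = [xi + rng.uniform(-1, 1) for xi in x]

print(direction_label(x, y))  # X -> Y by construction
print(direction_label(y, x))  # same pair with the roles swapped
```

When both the input and the noise are Gaussian, the two scores are both close to zero and the sketch returns 0, which matches the known non-identifiability of the linear Gaussian case. In a trained classifier, `forward` and `backward` (or their difference) would be input features rather than being thresholded directly.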
12. Expected Outcomes and Impact
• Increased revenue by smarter stocking/promotion decisions
• Increased customer satisfaction
• Prevention of major inventory issues
• Targeted advertising
• Increased revenue by focusing on the right problem areas