This document summarizes a presentation on big data analytics lessons from pioneers. The presentation covers what big data is in terms of volume, variety and velocity. It discusses challenges of big data, complex computation, enterprise readiness, speed and production efficiency, and talent. Stories of successes and failures with big data are provided, such as identifying adverse drug effects, improving airline operations, and detecting flu outbreaks. Recommendations include changing culture, failing fast and learning, engaging broader audiences, treating predictions as products, and moving quickly. The goal is to help new leaders learn from both successes and mistakes of big data pioneers.
2013 10 cu leeds school big data conference - bill jacobs - revolution analytics
1. Big Data Analytics:
Lessons From The Pioneers
Recommendations For New Leaders
CU Leeds School of Business Analytics Conference
September 2013
Boulder, Colorado
#LeedsAnalytics
Bill Jacobs
Director, Product Marketing - Revolution Analytics
@bill_jacobs
2. My Talk Today:
Big Data and Big Analytics
War Stories: Good, Bad and Ugly
Lessons and Recommendations To Consider
5. What is Big Data?
Volume Variety
Confidential to Revolution Analytics
Velocity
6. What is Big Data?
Big Data is big.
A data set so large it cannot be managed in a conventional database
with acceptable performance and at acceptable cost.
Volume
7. What is Big Data?
Big Data is messy.
70-90% of all data generated lacks predefined structure or is
difficult to map into a conventional data model.
Variety
8. What is Big Data?
Big Data moves.
ICU: predict patient events
FICO: flag suspect transactions
Oreo: Super Bowl ad from Tweets
Retail: push in-store offers
Velocity
9. Big Data meets Big Math =
New Business Outcomes
THE PERFECT STORM
+ Computing Power
+ Data
+ Pace of Business
+ Customer Expectations
+ Data Science
+ Computer Science
+ Management Science
Better Business Decisions
New Business Outcomes
10. Second generation predictive analytics
2nd Generation Predictive Analytics
Big Data
Machine Learning
Real Time / Near Real Time
Quick to Fail / Experimentation
Continuous Model Improvement = Value
11. Big Data vs. Big Data Analytics
Volume Variety
Velocity
The More Important V’s:
Assuring Veracity while delivering Value, and
embracing Volatility.
12. Typical Challenges Facing Analytical Organizations
Big Data
• New Data Sources
• Data Variety & Velocity
• Data Movement, Memory Limits
Complex Computation
• Innovative Models
• Experiments
• Many Small Models
• Ensemble Methods
• Simulation
Enterprise Readiness
• Many platform choices
• Production Support
• Deploy to Business Users
Speed & Production Efficiency
• Model Life
• Many Models
• Long Cycle Time
• Faster Decisions
• Big Hardware
Talent
• Finding data scientists
• Training
• Creating an Analytical culture
13. Analytical Competitors of Tomorrow
Big Data & Big Analytics
Better Decisions
More Models More Quickly
Sustainability Analytics
Customer / Marketing Analytics
Parts Optimization / Pricing
HR Analytics
Kaizen Process Excellence
Warranty Analytics
Predictive Asset Analytics
Supply Chain Analytics
14. Tools: Incredible Visualization, Descriptive and
Predictive Statistics, and Machine Learning
Machine Learning Algorithms in R
15. Stories: The Bad, The Ugly and The Good
The Ugly: Abuse.
The Bad: Missteps and Missed Opportunities.
The Good: Big Analytics Doing Good
16. The Ugly: Governmental Overreach Using Big Data
WikiLeaks, Edward Snowden, NSA…
And now the CBP:
Customs and Border Protection are Stopping and
Searching Private Flights.
Aircraft interceptions & searches after the flights
stopped in Colorado [where Pot has been legalized].
Was a law broken? Was an unreasonable search
conducted? How were the flights selected?
Bigger Question: Was the Data Legally Obtained?
17. The Ugly: Commercially - Even LinkedIn!
“When users sign up for LinkedIn they are
required to provide an external email address
as their username and to set up a new
password for their LinkedIn account. LinkedIn
uses this information to hack into the user’s
external email account and extract email
addresses. LinkedIn is able to download these
addresses without requesting the password
for the external email accounts or obtaining
users’ consent.”
19. Stories: Big Analytics Doing Good in the World
The Ugly.
The Bad.
The Good: Big Analytics Doing Good
– Kaiser and Vioxx
– Google Flu
– Medicare and the Big Insurers
– Jeppesen and Cost Containment in Airline Operations
– NYC Building Inspectors Save First Responder Lives
– Netflix and My Movie Watching
– Identity Resolution & Healthcare Fraud
21. The Good: Addressing Drug Outcomes & Side
Effects Retroactively
Vioxx and Celebrex were both approved medications
Kaiser Permanente Studied Outcomes for 1.4M Members
Vioxx was proven to be linked with increased heart attacks
– 27,000 Heart Attacks over 4 years.
Result: Vioxx Pulled from Market. Lives Saved.
22. The Good: Center for Medicare 5 Star Program
Incents Big Data Analysis To Huge Gains
Improvement Incentives + Business Gains Projected to Equal
CMS Incentives Pay Higher Rates for Programs with Higher Satisfaction
Ratings.
Major Insurer Estimates $20B Revenue Improvement for a ½ Star
Increase.
23. The Good: Making Air Travel More Cost Effective
Jeppesen Tail Assignment
Automated Aircraft Routing and Assignment
Found $10M In First Analysis of One Airline’s Data
Optimize Aircraft Assignment:
Fuel Costs
Fuel Consumption
Maintenance Needs
Operational Profile
Passenger Traffic
Additional Opportunities:
Predictive Maintenance
Speed vs. Cost Planning
Regulatory Compliance
Maintenance Period Adjustments
24. Stories: Big Analytics Doing Good in the World
The Ugly.
The Bad.
The Good: Big Analytics Doing Good
– Vioxx, Celebrex in the Court of Kaiser Permanente
– Google Flu
– Medicare and the Big Insurers
– Jeppesen and Cost Containment in Airline Operations
– NYC Building Inspectors Save First Responder Lives
– Netflix and My Movie Watching
– Identity Resolution & Healthcare Fraud
25. Lessons
Big Gets Bigger.
New Data Sets, New Methods, New Audiences
Today: Social Networks and Media, Tomorrow: Internet of Everything
No Business Is Immune
Diverse Businesses Are Capitalizing on Big Data Analytics
Veracity Demands Vigilance; Volatility Demands Investment
Stale Predictions Put Companies On The Line
Humans Often Represent The Greatest Inertia
Regulation Trails Abuses
HR Has a Huge Challenge
NSA on FISA Wiretaps: “We Only Collect Metadata”
Talent Pool Governs Outcomes
Organizational Change Is Critical
Attraction, Cultivation and Retention of Once-Obscure Talents
Build a Shared Big Data Culture
Adapt Business & IT Practices Accordingly
26. Recommendations
Change Your Culture
Fail Fast; Learn From Failure
Engage a Broader Audience
– Identify Profiles of Stakeholders & Adapt To Them
– Develop a Career Path For Prediction’s Stakeholders
Treat Predictions as Products; Data Infrastructure As a Prediction Factory
Life’s Too Short…
27. Thank you
Revolution Analytics is the leading commercial
provider of software and support for the
popular open source R statistics language.
www.revolutionanalytics.com, 1.855.GET.REVO, Twitter: @RevolutionR
Editor's Notes
We’re in the midst of a period of disruption, transitioning from the first generation of predictive analytics, which was dominated by SAS and SPSS, into a second generation of predictive analytics where the leader is open source R.
In this 2nd Generation, we’re busting through 1st generation barriers. We’re moving into using:
• massive data sets
• machine learning
• real-time decision making
We’re also moving from stable models to continually improving models for additional lift, and to quick-to-fail and experimental design approaches that incorporate learning and new information.
An example: Hyundai had 3 cars with telematics on the road. They had already spotted a huge warranty issue, but weren’t prepared to act on the data they already had. Result? A nightmarish shortage of parts.
R has extensive graphics capabilities and can produce stunning 2D and 3D images. It is used by many organisations such as Google, Facebook and media organisations. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. The black example near the bottom, above the Facebook connections image, is the graphical representation of a tree cricket calling song (Oecanthus pellucens).
Examples of New Data Sets:
Sensor data and mountain climbing analogy.
EMC prediction of 300x data growth. Not 300% but 30,000%.
Clothing Retailer Use of Video to Pre-Market Customers.
With huge data come new methods: human-less Machine Learning.
No Business Immune: Monsanto, Crop Production and Seed Recommendation. New York Building Inspectors.
HR: WalMart, 2 years retention max. Gartner Analyst Merv Adrian story about CIO’s Blood Money hiring.
Veracity, Vigilance: Model quality must become a continuous process.
Humans are often the impediment.
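As an illustration of the note that model quality must become a continuous process, here is a minimal Python sketch that tracks a model's accuracy over its most recent predictions and flags it for review when accuracy drops. It was not part of the talk. The class name `RollingAccuracyMonitor`, the `window` and `threshold` parameters, and their default values are all hypothetical choices made for this example.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Hypothetical helper: tracks accuracy over the most recent predictions."""

    def __init__(self, window=100, threshold=0.8):
        # Only the last `window` outcomes are kept; older ones are dropped.
        self.outcomes = deque(maxlen=window)
        # Assumed cutoff below which the model is treated as stale.
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return the share of correct predictions in the window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """Return True when recent accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold
```

In use, each scored prediction would be passed to `record` once its real outcome is known, and `needs_review` would be checked on a schedule, so that a stale model is caught as part of routine operation rather than after a business failure.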
Predictions as factory: Call it a Center of Excellence, call it a Tiger Team. But build it to last.
Factory Analogy:
Data = Raw Material
Predictive Models = The Product
Predictive Analytics = Production Know-How
Data Scientists, Business Analysts = Production Capacity
Model Lifecycle = Production Capacity
Model Accuracy = Product Quality
Life’s too short: join the community of business leaders and analysts, or encourage your staff to do so.