1 Keynote talk – DSAA 2014, Shanghai – Copyright Usama Fayyad © 2014
BigData, Old Data, to AllData:
Predictive Analytics in a Changing Data Landscape
Usama Fayyad, Ph.D.
Chief Data Officer – Barclays
Twitter: @usamaf
Talk at IIT Madras
Chennai, India– March 27, 2015
Outline
• Big Data all around us
• The CDO role and Data Axioms
• Some of the issues in BigData
• Introduction to Data Mining and Predictive
Analytics Over BigData
• Case studies
• Summary and conclusions
What Matters in the Age of Analytics?
1. Being able to exploit all the data that is available
• not just what you've got available
• what you can acquire and use to enhance your actions
2. Proliferating analytics throughout the organization
• make every part of your business smarter
• Actions and not just insights
3. Driving significant business value
• embedding analytics into every area of your business can
significantly drive top line revenues and/or bottom line
cost efficiencies
Why Big Data?
A new term, with associated “Data Scientist” positions:
• Big Data: is a mix of structured, semi-structured, and
unstructured data:
– Typically breaks barriers for traditional RDB storage
– Typically breaks limits of indexing by “rows”
– Typically requires intensive pre-processing before each query
to extract “some structure” – usually using Map-Reduce type
operations
• Above leads to “messy” situations with no standard
recipes or architecture: hence the need for “data
scientists”
– conduct “Data Expeditions”
– Discovery and learning on the spot
The 4-V’s of “Big Data”
• Big Data is characterized by the 3-V’s (the fourth V is the Value they are turned into):
– Volume: larger than “normal” – challenging to load/process
• Expensive to do ETL
• Expensive to figure out how to index and retrieve
• Multiple dimensions that are “key”
– Velocity: Rate of arrival poses real-time constraints on what
are typically “batch ETL” operations
• If you fall behind, catching up is extremely expensive (replicate very
expensive systems)
• Must keep up with rate and service queries on-the-fly
– Variety: Mix of data types and varying degrees of structure
• Non-standard schema
• Lots of BLOB’s and CLOB’s
• DB queries don’t know what to do with semi-structured and
unstructured data.
“Classic” Data: e.g. Yahoo! User DNA
Legend: Registration, Campaign, Behavior, Unknown
• Male, age 32
• Lives in SF
• Lawyer
• Searched on from London last week
• Searched on: “Italian restaurant Palo Alto”
• Checks Yahoo! Mail daily via PC & Phone
• Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people
• Searched on: “Hillary Clinton”
• Clicked on Sony Plasma TV SS ad
• Spends 10 hours/week on the internet
• Purchased Da Vinci Code from Amazon
How Data Explodes: really big
• Male, age 32
• Lives in SF
• Lawyer
• Searched on from London last week
• Searched on: “Italian restaurant Palo Alto”
• Checks Yahoo! Mail daily via PC & Phone
• Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people
• Searched on: “Hillary Clinton”
• Clicked on Sony Plasma TV SS ad
• Spends 10 hours/week on the internet
• Purchased Da Vinci Code from Amazon
• Social Graph (FB)
• Likes & friends likes
• Professional network - reputation
• Web searches on this person, hobbies, work, location
• MetaData on everything
• Blogs, publications, news, local papers, job info, accidents
The Distinction between “Classic Data” and
“Big Data” is fast disappearing
• Most real data sets nowadays come with a serious mix
of semi-structured and unstructured components:
– Images
– Video
– Text descriptions and news, blogs, etc…
– User and customer commentary
– Reactions on social media: e.g. Twitter is a mix of data
anyway
• Using standard transforms, entity extraction, and new
generation tools to transform unstructured raw data
into semi-structured analyzable data
Text Data: The Big Driver
• We speak of “big data” and the “Variety” in 3-V’s
• Reality: biggest driver of growth of Big Data has been
text data
– Most work on analysis of “images” and “video” data has
really been reduced to analysis of surrounding text
Nowhere more so than on the internet
• Map-Reduce popularized by Google to address the
problem of processing large amounts of text data:
– Many operations with each being a simple operation but
done at large scale
– Indexing a full copy of the web
– Frequent re-indexing
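The Map-Reduce idea above (many simple operations, done at large scale) can be sketched in a single Python process. This is only an illustration: the function names `map_phase`, `shuffle`, `reduce_phase` and `build_index`, and the sample documents, are assumptions made for this sketch and not the API of Hadoop or any other framework. It builds a tiny inverted index, the kind of structure used when indexing text.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per word; a simple per-record operation."""
    return [(word.lower(), doc_id) for word in text.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce: collapse the values for one key into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle and reduce in sequence over a dict of {doc_id: text}."""
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical sample documents
docs = {1: "big data text", 2: "text data on the web", 3: "web index"}
index = build_index(docs)
print(index["data"])  # [1, 2]
```

In a real deployment each phase would run in parallel across many machines; the point here is only that each step is trivial on its own.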
Big Data Applications and Uses
[Figure: application areas; platform label: Hadoop + MPP + EDW]
• IT Log & Security Forensics & Analytics
• Automated Device Data Analytics
• Failure Analysis, Proactive Fixes, Product Planning
• Advertising Analytics
• Segmentation, Recommendation
• Social Media
• Big Data Warehouse Analytics
• Cost Reduction, Ad Hoc Insight
• Predictive Analytics
• Find New Signal, Predict Events
• 100% Capture
A few words on:
The Chief Data Officer
Why are companies creating this
position?
Why a Chief Data Officer?
• There is a fundamental realisation that Data needs to
become a primary value driver at organizations
• We have lots of Data
• We spend much on it: in technology and people
• We are not realising the value we expect from it
• A strong business need to create the CDO role:
• Traditional companies are not following, but adopting the
model that actually works in other data-intensive industries
• CDO has a seat at executive table: the voice of Data
• Data done right is an essential element to unify large
enterprises to unlock value from business synergies
Why data is important
Every customer interaction is an opportunity to capture data, learn, and act
Customer Interactions (branch visits, telephone calls, digital, mobile, sales): millions of customer interactions per day
Big Data platform: Customer Interaction CRM, Predictive Analytics, Multivariate Testing
Data capture / opportunity to learn:
• Connie logs into mobile App to check her balance. We know she is looking for a car loan
• CRM data is used to pre-calculate Connie’s borrowing limit for a car loan
• Internal and external data sources, predictive models identify cross-sell opportunities
• Experimenting with different tools for different groups of
Actions (Targeted offers during browsing, Product Discovery, Cross-sell, Measure feedback per session):
• instant car loan product offering is displayed in the app
• personalised journey based on pre-calculated limit for the car loan amount
• Connie is offered a competitively priced ‘bespoke’ offer for car servicing / MOT
• Connie gets a better user experience in real-time during every session
Fundamental Data Principles to
Support Analytics
Usama’s Obvious Data Axioms
Data Axioms
1. Data gains value exponentially when
integrated and coalesced.
– When fragmented: dramatic value loss takes place;
– increased costs;
– reduced utility/integrity;
– and increased security risks
Data Axioms
2. Fusing Data together from
disparate/independent sources is difficult
to achieve and impossible to maintain
Hence only viable approach is:
• Intercepting and documenting at the source
• fusing at the source
• controlling lifecycle and flow
Data Axioms
3. Standardisation is essential
• for sustained ability to integrate data sources
and hence growing value;
• for simplifying down-stream systems and apps
• For enforcing discipline as a firm increases its
data sources
Data Axioms
4. Data governance and policy must be
centralised
• needs to be enforced strongly else we slip into
chaos and a Babylon of terms/languages
• An Enterprise Data Architecture spanning
structured and unstructured data
Data Axioms
5. Recency Matters → data streaming in
modelling and scoring
• Often, accuracy of prediction drops quickly
with time (e.g. consumer shopping)
• Value of alerts drops exponentially with time…
• Ability to trigger responses based on real-
time scoring critical
• Streaming, real-time model updates, real-
time scoring
Data Axioms
6. Data Infrastructure Needs:
• rapid renewal & modernization: the pace of
change and development of technology are very
rapid
– Design for migration and infrastructure replacement
via abstraction layers that remove tech
dependencies
• Encryption and Masking: Persisting
unencrypted confidential and secret data (even
within secure firewalls) is an invitation for
problems and risks
Data Axioms
7. Data is a primary competency and not
a side-activity supporting other
processes
• Hence specialized skills and know-how are a
must
• Generalists will create a hopeless mess
• Data is difficult: modelling, architecture, and
design to support analytics
Reality Check
So I am a marketer, how do I use
BigData for my business?
Social Media? Sentiment Analysis?
SOCIAL LUMAscape (© LUMA Partners LLC 2014)
[Figure: landscape of social media companies between MARKETER and CONSUMER, grouped by category. Legend: denotes acquired company; denotes shuttered company.]
Categories shown: Gamification; Social Data; Community Platforms; Social/Mobile Apps & Games; Social Networks - Other; Social Search & Browsing; Social Commerce Platforms; Analytics; Social Promotion Platforms; Social Publishing Platforms; Social Marketing Management; Twitter Apps; Facebook Gaming; Facebook Apps; Content Curation; URL Shorteners; Stream Platforms; Traditional Publishers; Social Business Software; Content Sharing (Reviews/Q&A/Docs); Image/Video Sharing; Blogging Platforms; External (Customer) Facing; Internal (Employee) Facing; Social Ad Networks; Social Intelligence; Social Scoring; Social TV; Social Referral; Social Shopping; Social Advertising Platforms; Social Login/Sharing; Social Content & Forums; Advocate Platforms; Social Branded Video
Reality Check
So what should Users of Analytics in
Big Data World Do?
Load the Data into a Data Lake
Your life will be so much easier as you can now do
Data Acrobatics and other amazing data feats…
The Data Lake -- according to Waterline
We loaded the Data!
Congratulations
Now What?
From a Data Lake to Amazon Browser
Amazon Simple Search
Amazon gives you facets
Product details
Reality Check
So where do analysts and data
Scientists spend all their time?
Data Analysis Vs. Data Prep
Let’s Mine
the Data
Reality Check
So what do technology people worry
about these days?
To Hadoop or not to Hadoop?
When to use techniques requiring Map-Reduce
and grid computing?
• Typically organizations try to use Map-Reduce
for everything to do with Big Data
– This is actually very inefficient and often irrational
– Certain operations require specialized storage
• Updating segment memberships over large numbers of
users
• Defining new segments on user or usage data
Drivers of Hadoop in Large
Enterprises
Cost of Storage
• Fastest growing demand is more storage
• Data in Data Warehouses has traditionally required
expensive storage technology:
– $100K per terabyte per year – cost of Teradata storage
– $2.5K per terabyte per year (much lower) – cost of Hadoop on commodity storage
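To make the storage argument concrete, here is a small arithmetic sketch that takes both quoted figures as cost per terabyte per year. The warehouse size of 500 TB is a hypothetical input chosen for illustration, and the function names are assumptions of this sketch.

```python
# Per-terabyte annual costs quoted on the slide (USD)
TERADATA_COST_PER_TB_YEAR = 100_000
HADOOP_COST_PER_TB_YEAR = 2_500


def annual_storage_cost(terabytes, cost_per_tb_year):
    """Total yearly storage cost for a given data volume."""
    return terabytes * cost_per_tb_year


def cost_ratio():
    """How many times more expensive the warehouse storage is per terabyte."""
    return TERADATA_COST_PER_TB_YEAR / HADOOP_COST_PER_TB_YEAR


volume_tb = 500  # hypothetical warehouse size
warehouse = annual_storage_cost(volume_tb, TERADATA_COST_PER_TB_YEAR)
hadoop = annual_storage_cost(volume_tb, HADOOP_COST_PER_TB_YEAR)

print(cost_ratio())        # 40.0
print(warehouse - hadoop)  # 48750000
```

At the quoted prices the gap is a factor of 40 per terabyte, which is why storage alone can justify the move regardless of any analytics benefit.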
Analysis & Programming Software
PIG
HIPI
Hadoop Stack
Reality: If Storage is Biggest Driver of
Hadoop Adoption; What is the next biggest?
ETL
• Replaces expensive licenses
• Much higher performance with lower infrastructure
costs (processors, memory)
• Flexibility in changing schema and representation
• Flexibility on taking on unstructured and semi-
structured data
• Plus suite of really cool tools…
Turning the three Vs of Big Data into Value
Understand context and content
• What are appropriate actions?
• Is it Ok to associate my brand with this content?
• Is content sad?, happy?, serious?, informative?
Understand community sentiment
• What is the emotion?
• Is it negative or positive?
• What is the health of my brand online?
Understand customer intent?
• What is each individual trying to achieve?
• Can we predict what to do next?
• Critical in cross-sell, personalization, monetization,
advertising, etc…
Many Business Uses of Predictive Analytics
Analytic technique: Uses in business
• Marketing and sales: identify potential customers; establish the effectiveness of a campaign
• Understanding customer behavior: model churn, affinities, propensities, …
• Web analytics & metrics: model user preferences from data, collaborative filtering, targeting, etc.
• Fraud detection: identify fraudulent transactions
• Credit scoring: establish credit worthiness of a customer requesting a loan
• Manufacturing process analysis: identify the causes of manufacturing problems
• Portfolio trading: optimize a portfolio of financial instruments by maximizing returns & minimizing risks
• Healthcare Application: fraud detection, cost optimization, detection of events like epidemics, etc...
• Insurance: fraudulent claim detection, risk assessment
• Security and Surveillance: intrusion detection, sensor data analysis, remote sensing, object/person detection, link analysis, etc...
Case Studies:
1. Context Analysis (unstructured data)
2. Yahoo! predictive modeling
Understanding Context
Reality Check
So who is the company we think is
best at handling BigData?
Biggest BigData in Advertising?
Understanding Context for Ads
The Display Ads Challenge Today
What Ad would you
place here?
The Display Ads Challenge Today
Damaging to Brand?
The Display Ads Challenge Today
What Ad
would you
place here?
Why did Google Serve this Ad?
This is how NetSeer actually sees this content
NetSeer: SOLVING ACCURACY ISSUES | AMBIGUITY, WASTE, BRAND SAFETY
[Figure: concept clusters associated with a page on WEBSITE.COM]
• ECONOMY CARS: high MPG, ford, low emission, fuel efficiency, economy vehicles, electric vehicles, service record, safety rating
• VISION TOOLS: microscope, lenses, reading glasses, autofocus, bifocal, refraction, eye chart
• MARKET RESEARCH: focus groups, A/B testing, consumer study, surveying, blind study, analytics
NetSeer – How it works
focus <CONCEPT>
DISCERNS AND MONETIZES HUMAN INTENT
+ Identifies Concepts expressed on a page
+ Disambiguates language
+ Builds increasingly rich profile over
52M CONCEPTS
2.3B RELATIONSHIPS BETWEEN CONCEPTS
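The disambiguation step can be illustrated with a toy sketch. It takes the three concept clusters named on the slide (ECONOMY CARS, VISION TOOLS, MARKET RESEARCH) and decides which sense of an ambiguous word such as "focus" a page is about by counting shared terms. The overlap-count rule, the assignment of terms to clusters, and the function name `disambiguate` are assumptions made for illustration; the slides do not describe NetSeer's actual method.

```python
# Concept clusters named on the slide; term-to-cluster assignment is a best-effort reading.
CLUSTERS = {
    "ECONOMY CARS": {"high mpg", "ford", "low emission", "fuel efficiency",
                     "economy vehicles", "electric vehicles", "service record",
                     "safety rating"},
    "VISION TOOLS": {"microscope", "lenses", "reading glasses", "autofocus",
                     "bifocal", "refraction", "eye chart"},
    "MARKET RESEARCH": {"focus groups", "a/b testing", "consumer study",
                        "surveying", "blind study", "analytics"},
}


def disambiguate(page_terms, clusters):
    """Return the cluster sharing the most terms with the page, plus all overlap counts.

    Ties are broken alphabetically by cluster name (an arbitrary choice).
    """
    terms = {term.lower() for term in page_terms}
    overlaps = {name: len(members & terms) for name, members in clusters.items()}
    best = max(sorted(overlaps), key=lambda name: overlaps[name])
    return best, overlaps


# Two hypothetical pages that both mention the ambiguous word "focus"
car_page = {"focus", "Ford", "fuel efficiency", "safety rating"}
optics_page = {"focus", "lenses", "autofocus"}

print(disambiguate(car_page, CLUSTERS)[0])     # ECONOMY CARS
print(disambiguate(optics_page, CLUSTERS)[0])  # VISION TOOLS
```

The word "focus" itself contributes nothing to either decision; the surrounding terms carry the meaning, which is the point of concept-level rather than keyword-level matching.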
NetSeer: Intent for Display
• Currently Processing 4 Billion Impressions per Day
Problem: Hard to Understand User
Intent
Contextual Ad served by Google What NetSeer Sees:
NETSEER CONNECTING THE DOTS
Understanding intent of user online actions and connecting them to define user intent profile over time
• http://diabetes.webmd.com/default.htm
• http://www.healthline.com/health/type-2-diabetes/insulin-pumps
• Search: Sulfonylureas
Case Studies:
1. Context Analysis (unstructured data)
2. Yahoo! Predictive Modeling
Yahoo! – One of Largest Destinations on the Web
80% of the U.S. Internet population uses Yahoo!
– Over 600 million users per month globally!
Global network of content, commerce, media, search and
access products
100+ properties including mail, TV, news, shopping, finance,
autos, travel, games, movies, health, etc.
25+ terabytes of data collected each day
• Representing 1000’s of cataloged consumer behaviors
More people visited Yahoo! in the past month than:
• Use coupons
• Vote
• Recycle
• Exercise regularly
• Have children living at home
• Wear sunscreen regularly
Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
Data is used to develop content, consumer, category and campaign
insights for our key content partners and large advertisers
Yahoo! Big Data – A league of its own…
GRAND CHALLENGE PROBLEMS OF DATA PROCESSING: TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET
Y! Data Challenge Exceeds others by 2 orders of magnitude
Terabytes of Warehoused Data:
• Amazon: 25
• Korea Telecom: 49
• AT&T: 94
• Y!LiveStor: 100
• Y!Panama Warehouse: 500
• Walmart: 1,000
• Y!Main warehouse: 5,000
Millions of Events Processed Per Day:
• SABRE: 50
• VISA: 120
• NYSE: 225
• YSM: 2,000
• Y! Global: 14,000
Behavioral Targeting (BT)
Targeting ads to consumers whose recent behaviors online indicate which product category is relevant to them
[Figure: Search, Ad Clicks, Content, and Search Clicks feeding into BT]
Yahoo! User DNA
Legend: Registration, Campaign, Behavior, Unknown
• Male, age 32
• Lives in SF
• Lawyer
• Searched on from London last week
• Searched on: “Italian restaurant Palo Alto”
• Checks Yahoo! Mail daily via PC & Phone
• Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people
• Searched on: “Hillary Clinton”
• Clicked on Sony Plasma TV SS ad
• Spends 10 hours/week on the internet
• Purchased Da Vinci Code from Amazon
• On a per consumer basis: maintain a behavioral/interests profile and
profitability (user value and LTV) metrics
How it works | Network + Interests + Modelling
• Varying Product Purchase Cycles: analyze predictive patterns for purchase cycles in over 100 product categories
• Rewarding Good Behaviour: in each category, build models to describe behaviour most likely to lead to an ad response (i.e. click)
• Match Users to the Models: score each user for fit with every category… daily
• Identify Most Relevant Users: target ads to users who get highest ‘relevance’ scores in the targeting categories
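The score-then-target loop described above can be sketched as follows. The slides do not say what kind of model was used, so this sketch assumes a simple weighted sum of activity counts per category; the weights, event names, user ids and function names are all hypothetical.

```python
def score_user(activity, weights):
    """Relevance score: weighted sum of a user's activity counts (assumed linear model)."""
    return sum(weights.get(event, 0.0) * count for event, count in activity.items())


def score_all(users, category_models):
    """Score every user against every category, as a daily batch job would."""
    return {
        category: {user: score_user(activity, weights) for user, activity in users.items()}
        for category, weights in category_models.items()
    }


def top_users(scores, category, k):
    """Pick the k users with the highest relevance score in one category."""
    ranked = sorted(scores[category].items(), key=lambda item: item[1], reverse=True)
    return [user for user, _ in ranked[:k]]


# Hypothetical per-category weights and user activity counts
category_models = {
    "autos": {"auto_page_view": 1.0, "auto_search": 3.0, "auto_ad_click": 5.0},
    "travel": {"travel_page_view": 1.0, "travel_search": 2.0},
}
users = {
    "u1": {"auto_page_view": 5, "auto_search": 2, "auto_ad_click": 1},
    "u2": {"travel_page_view": 4, "travel_search": 1},
    "u3": {"auto_page_view": 1},
}

scores = score_all(users, category_models)
print(top_users(scores, "autos", 2))   # ['u1', 'u3']
print(top_users(scores, "travel", 1))  # ['u2']
```

Keeping one model per category is what lets each category weight the same behaviour differently.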
Recency Matters, So Does Intensity
Active now… …and with feeling
Differentiation | Category specific modelling
[Charts: intensity score over time, with an Intense Click Zone, for Example 1: Category Automotive and Example 2: Category Travel/Last Minute]
Different models allow us to weight and determine intensity and recency
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
Differentiation | Category specific modelling
[Chart: intensity score over time, with an Intense Click Zone, for Example 1: Category Automotive]
Different models allow us to weight and determine intensity and recency
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
• user is in the Intense Click Zone
• with no further activity, decay takes effect
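A minimal sketch of how intensity and recency might combine into one score: each event contributes a weight that decays exponentially with its age, and a user is in the "Intense Click Zone" while the score stays above a threshold. The event weights, half-lives and threshold below are hypothetical, and exponential decay is an assumption; the slides show only that the score decays when activity stops.

```python
import math

# Hypothetical event weights for one category model
EVENT_WEIGHTS = {"page_view": 1.0, "search_keyword": 2.0, "search_click": 3.0, "ad_click": 5.0}


def intensity_score(events, now, half_life_days):
    """Sum event weights, each discounted exponentially by its age in days."""
    decay_rate = math.log(2) / half_life_days
    return sum(
        EVENT_WEIGHTS[kind] * math.exp(-decay_rate * (now - day))
        for day, kind in events
    )


def in_intense_click_zone(score, threshold):
    """A user is in the zone while the decayed score stays at or above the threshold."""
    return score >= threshold


# The behaviour from the slide, all on day 0:
# 5 pages, 2 search keywords, 1 search click, 1 ad click
events = (
    [(0, "page_view")] * 5
    + [(0, "search_keyword")] * 2
    + [(0, "search_click"), (0, "ad_click")]
)

THRESHOLD = 10.0  # hypothetical zone boundary
for label, half_life in [("automotive", 30.0), ("travel/last minute", 2.0)]:
    score_after_week = intensity_score(events, now=7, half_life_days=half_life)
    print(label, round(score_after_week, 2), in_intense_click_zone(score_after_week, THRESHOLD))
```

With a long half-life (a slow purchase cycle such as cars) the same behaviour keeps the user in the zone a week later; with a short half-life (last-minute travel) the user has already dropped out, which is the reason for category-specific models.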
Automobile Purchase Intender Example
A test ad-campaign with a major Euro automobile manufacturer
• Designed a test that served the same ad creative to test and control groups on Yahoo
• Success metric: performing specific actions on Jaguar website
Test results: 900% conversion lift vs. control group
• Purchase Intenders were 9 times more likely to configure a vehicle, request a price quote or locate a dealer than consumers in the control group
• ~3x higher click through rates vs. control group
Example: Mortgages
Results from a client campaign on Yahoo! Network
Mortgage Intender Example
We found: 1,900,000 people looking for mortgage loans.
• +122% CTR Lift
• +626% Conv Lift
Example search terms qualified for this target: Mortgages, Home Loans, Refinancing, Ditech
Example Yahoo! Pages visited:
• Financing section in Real Estate
• Mortgage Loans area in Finance
• Real Estate section in Yellow Pages
Source: Campaign click-through rate lift is determined by Yahoo! internal research. Conversion is the number of qualified leads from clicks over number of impressions served. Audience size represents the audience within this behavioral interest category that has the highest propensity to engage with a brand or product and to click on an offer. Date: March 2006
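For readers unfamiliar with lift figures, the sketch below shows how a CTR lift and a conversion lift can be computed, using the slide's definition of conversion (qualified leads over impressions served). The counts are hypothetical, chosen only so the results land near the +122% and +626% reported; they are not the campaign's data, and the function names are assumptions of this sketch.

```python
def rate(events, impressions):
    """Events (clicks or qualified leads) per impression served."""
    return events / impressions


def lift_percent(test_rate, control_rate):
    """Relative improvement of the targeted group over the control group, in percent."""
    return (test_rate / control_rate - 1.0) * 100.0


# Hypothetical counts; both groups are served the same number of impressions
impressions = 1_000_000
control = {"clicks": 1_000, "leads": 100}
targeted = {"clicks": 2_220, "leads": 726}

ctr_lift = lift_percent(
    rate(targeted["clicks"], impressions), rate(control["clicks"], impressions)
)
conv_lift = lift_percent(
    rate(targeted["leads"], impressions), rate(control["leads"], impressions)
)

print(f"CTR lift: {ctr_lift:+.0f}%")          # CTR lift: +122%
print(f"Conversion lift: {conv_lift:+.0f}%")  # Conversion lift: +626%
```

Note that a lift of +100% means the targeted rate is twice the control rate, so lift and "times more likely" differ by one.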
Experience summary at Yahoo!
• Dealing with one of the largest data sources (25
terabytes per day)
• Behavioral Targeting business was grown from $20M
to > $400M in 3 years of investment!
• Yahoo! Specific? -- BigData critical to operations
– Ad targeting creates huge value
– Right teams to build technology (3 years of recruiting)
– Search is a BigData problem (but this has moved to
mainstream)
Lessons Learned
A lot more data than qualified talent
• Finding talent in BigData is very difficult
• Retaining talent in BigData is even harder
At Yahoo! we created a central group that drove huge value to the company
Data people need to feel like they have critical mass
• Makes it easier to attract the right people
• Makes it easier to retain
Drive data efforts by business need, not by technology priorities
• Chief Data Officer role at Yahoo! – now popular
RapidMiner in Big Data
Case Study
RapidMiner’s Strengths
• Open Source Community & Marketplace – Crowd-sourced innovation,
quality assurance, market awareness.
• Fully-integrated Platform – Integrated, process-based business
analytics platform with focus on predictive analytics.
• No Programming Required – Easy-to-use, low maintenance costs,
standard platform for business analysts.
• Advanced Analytics at Every Scale – In-memory, in-database and in-
Hadoop analytics offer best option for every size of database.
• Connectivity – More than 60 connectors (incl. SAP & Hadoop), allowing
easy access to structured and unstructured data.
30,000+ Downloads per Month
SELECT LIST OF RECIPIENT ORGANIZATIONS
Sectors: Government & Defense; Pharma & Healthcare; Consulting; Oil & Gas, Chemicals; Financial Services; Software & Analytics; Retail; Manufacturing; Business Services; Consumer Products; Aerospace; Technology; Entertainment; Academia
Select Customer Stories
PayPal
• Who > world leading online payment services provider
• Solution > Customer feedback and voice of the customer analysis, churn prediction and prevention, text mining and sentiment analysis
SmartSoft
• Who > provider of solutions for preventing fraud, money laundering, and risks in financial institutions
• Solution > Integration of Rapid-I’s predictive analytics engine into their solutions for fraud detection and fraud prevention for the financial and telecom sectors
SO WHAT IS THE
BIG DATA STORY?
So the data is naturally moving to
Hadoop...
Situation:
–The data is moving to Hadoop for Cost (storage) and
Convenience (ETL) forces
–How do we get the value of predictive analytics to the data?
Rather than move the data out, move the analytics to the
data!
–Can we minimize the need for data movement?
–Data copies can become a management nightmare
–Analytics in a “Business As Usual” manner requires
convenience
Radoop – RapidMiner on Hadoop
Opportunity:
–Avoid expensive data movement
–Leverage convenient data transformation
–Thousands of data connectors, many over semi-
structured and unstructured data
Why is this big news?
–Leverages a naturally occurring wave
–Analytics over a richer variety requires much more
processing
–The energy placed on data extraction and loading
moves to energy applied on actual analysis and
modelling
Big Picture on Big Data Analytics
Observations and
Concluding Remarks
Retaining New Yahoo! Mail Registrants
Often,
Simple is Very Powerful!
Integrating Mail and News
Data showed that users often check their mail
and news in the same session
–But no easy way to navigate to Y! News from Y! Mail
Mail users who also visit Y! News are 3X more
active on Yahoo
–Higher retention, repeat visits and time-spent
on Yahoo
“In the news” Module on Mail Welcome Page
Increased retention on Mail for light users by 40%!
–Est. Incremental revenue of $16m a year on Y! Mail alone
Nordstrom: Queries with No
Matches
Julie Bornstein, Web Marketing Director
–What are my customers looking for and not
finding?
–June 2002: queries for “belly button rings”
–returned no matches in store
–Why the sudden interest?
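The "queries with no matches" report is simple to produce, which is part of the point of this example. A minimal sketch, assuming a search log of (query, result count) pairs; the log format, the sample rows and the function name are hypothetical, not Nordstrom's system.

```python
from collections import Counter


def top_unmatched_queries(search_log, n):
    """Count queries that returned zero results and return the n most frequent."""
    misses = Counter(
        query.strip().lower()
        for query, result_count in search_log
        if result_count == 0
    )
    return misses.most_common(n)


# Hypothetical log of (query, number of results returned)
search_log = [
    ("belly button rings", 0),
    ("Belly Button Rings", 0),
    ("black boots", 42),
    ("navel ring", 0),
    ("belly button rings ", 0),
]

print(top_unmatched_queries(search_log, 2))
# [('belly button rings', 3), ('navel ring', 1)]
```

Normalising case and whitespace before counting matters here, since the same unmet demand often arrives in slightly different spellings.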
Nordstrom: Queries with No
Matches
Print Ad Campaign
Models happen to be sporting a
navel ring
Nordstrom does not sell navel
rings
What to do???
The early days of
mass auto
production
Today’s Auto:
It just works!
No need to understand what happens
when you turn on ignition
Very complex inside, but all simplicity
on the outside
Application Dates
Internships: Just launched
Contact : Faizan Chaudhary (faizan.chaudhary@barclays.com) and Tushar
Wadaskar (tushar.wadaskar@barclays.com)
A Sampling of Technology Opportunities at Barclays:
We seek world-class technologists, data scientists, problem-solvers, and data
systems engineers: the team that will re-invent Financial Services
Amer Sajed
CEO, Barclaycard US
Bassel Ojjeh
Chief Data Architect, DSI
Simon Gordon
MD, Head of Risk and Legal Technology - DSI
innovating Data and Analytics
solutions within the Financial
Markets industry ; understand and
influence the future of the global
Derivatives market. Risk
Technology within Barclays are
looking for world-class big data and
quantitative modeling skills.
A great opportunity to work on
some of the hardest technical
problems in the industry by using
open source to catch the bad guys
stealing money all the way up to
helping kids saving for their college
We at Barclays are unleashing the power of
disruptive technology. Innovation led by data
and design keeping our customers at the heart
of every product we build is our mantra. Come
join the revolution.
We have a challenge to quadruple
our business here in the US - and
that can only be delivered through
analytics - in marketing, risk, fraud,
and operations. And, Pune is the
undisputed center of the universe!
Faizan Chaudhary
Director, Data Systems & Insights
(DSI)
88 Imperial College – DSI Distinguished Lecture– Copyright Usama Fayyad © 2014
Usama Fayyad
Faizan.chaudhary@barclays.com
Tushar.Wadaskar@barclays.com
www.barclays.com/joinus
Thank You! & Questions
Twitter – @Usamaf

Weitere ähnliche Inhalte

Was ist angesagt?

The importance of data
The importance of dataThe importance of data
The importance of dataAPNIC
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data AnalyticsTUSHAR GARG
 
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Donghui Zhang
 
Integrating Big Data Technologies
Integrating Big Data TechnologiesIntegrating Big Data Technologies
Integrating Big Data TechnologiesDATAVERSITY
 
IBM Governed Data Lake
IBM Governed Data LakeIBM Governed Data Lake
IBM Governed Data LakeKaran Sachdeva
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Data Lake: A simple introduction
Data Lake: A simple introductionData Lake: A simple introduction
Data Lake: A simple introductionIBM Analytics
 
Big Data Overview 2013-2014
Big Data Overview 2013-2014Big Data Overview 2013-2014
Big Data Overview 2013-2014KMS Technology
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
A Big Data Journey
A Big Data JourneyA Big Data Journey
A Big Data JourneyPaul Boal
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesTony Pearson
 
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...Big Data and Data Analytics in Homeland Security and Public Safety Market 201...
Big Data and Data Analytics in Homeland Security and Public Safety Market 201...Homeland Security Research Corp.
 
Structuring Big Data
Structuring Big DataStructuring Big Data
Structuring Big DataFujitsu UK
 
Introduction to open data in DataOps
Introduction to open data in DataOpsIntroduction to open data in DataOps
Introduction to open data in DataOpsDataops Ghent Meetup
 
The Need to Know for Information Architects: Big Data to Big Information
The Need to Know for Information Architects: Big Data to Big InformationThe Need to Know for Information Architects: Big Data to Big Information
The Need to Know for Information Architects: Big Data to Big InformationDATAVERSITY
 

Usama Fayyad talk at IIT Madras on March 27, 2015: BigData, AllData, Old Data: Predictive Analytics in a Changing Data Landscape

  • 1. 1 Keynote talk – DSAA 2014, Shanghai – Copyright Usama Fayyad © 2014 BigData, Old Data, to AllData: Predictive Analytics in a Changing Data Landscape. Usama Fayyad, Ph.D., Chief Data Officer – Barclays. Twitter: @usamaf. Talk at IIT Madras, Chennai, India – March 27, 2015
  • 2. Outline • Big Data all around us • The CDO role and Data Axioms • Some of the issues in BigData • Introduction to Data Mining and Predictive Analytics Over BigData • Case studies • Summary and conclusions
  • 3. What Matters in the Age of Analytics? 1. Being able to exploit all the data that is available • not just what you've got available • what you can acquire and use to enhance your actions 2. Proliferating analytics throughout the organization • make every part of your business smarter • actions and not just insights 3. Driving significant business value • embedding analytics into every area of your business can significantly drive top-line revenues and/or bottom-line cost efficiencies
  • 4. Why Big Data? A new term, with associated “Data Scientist” positions: • Big Data is a mix of structured, semi-structured, and unstructured data: – Typically breaks barriers for traditional RDB storage – Typically breaks limits of indexing by “rows” – Typically requires intensive pre-processing before each query to extract “some structure”, usually using Map-Reduce type operations • The above leads to “messy” situations with no standard recipes or architecture, hence the need for “data scientists” who – conduct “Data Expeditions” – do discovery and learning on the spot
  • 5. The 4-V’s of “Big Data” • Big Data is characterized by the 3-V’s: – Volume: larger than “normal” – challenging to load/process • Expensive to do ETL • Expensive to figure out how to index and retrieve • Multiple dimensions that are “key” – Velocity: Rate of arrival poses real-time constraints on what are typically “batch ETL” operations • If you fall behind, catching up is extremely expensive (replicate very expensive systems) • Must keep up with rate and service queries on-the-fly – Variety: Mix of data types and varying degrees of structure • Non-standard schema • Lots of BLOBs and CLOBs • DB queries don’t know what to do with semi-structured and unstructured data.
  • 6. “Classic” Data: e.g. Yahoo! User DNA. Diagram category labels: Registration; Campaign; Behavior; Unknown. Profile items: Male, age 32; Lives in SF; Lawyer; Searched on from London last week; Searched on: “Italian restaurant Palo Alto”; Checks Yahoo! Mail daily via PC & Phone; Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people; Searched on: “Hillary Clinton”; Clicked on Sony Plasma TV SS ad; Spends 10 hours/week on the internet; Purchased Da Vinci Code from Amazon
  • 7. How Data Explodes: really big. Profile items: Male, age 32; Lives in SF; Lawyer; Searched on from London last week; Searched on: “Italian restaurant Palo Alto”; Checks Yahoo! Mail daily via PC & Phone; Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people; Searched on: “Hillary Clinton”; Clicked on Sony Plasma TV SS ad; Spends 10 hours/week on the internet; Purchased Da Vinci Code from Amazon. Additional data: Social Graph (FB); Likes & friends’ likes; Professional network – reputation; Web searches on this person, hobbies, work, location; MetaData on everything; Blogs, publications, news, local papers, job info, accidents
  • 8. The Distinction between “Classic Data” and “Big Data” is fast disappearing • Most real data sets nowadays come with a serious mix of semi-structured and unstructured components: – Images – Video – Text descriptions and news, blogs, etc… – User and customer commentary – Reactions on social media: e.g. Twitter is a mix of data anyway • Using standard transforms, entity extraction, and new generation tools to transform unstructured raw data into semi-structured analyzable data
  • 9. Text Data: The Big Driver • We speak of “big data” and the “Variety” in 3-V’s • Reality: biggest driver of growth of Big Data has been text data – Most work on analysis of “images” and “video” data has really been reduced to analysis of surrounding text – Nowhere more so than on the internet • Map-Reduce popularized by Google to address the problem of processing large amounts of text data: – Many operations with each being a simple operation but done at large scale – Indexing a full copy of the web – Frequent re-indexing
  • 10. Big Data Applications and Uses. Diagram labels: IT Log & Security Forensics & Analytics; Automated Device Data Analytics; Failure Analysis; Proactive Fixes; Product Planning; Advertising Analytics; Segmentation; Recommendation; Social Media; Big Data Warehouse Analytics; Cost Reduction; Ad Hoc Insight; Predictive Analytics; Hadoop + MPP + EDW; Find New Signal; Predict Events; 100% Capture
  • 11. A few words on: The Chief Data Officer. Why are companies creating this position?
  • 12. Why a Chief Data Officer? • There is a fundamental realisation that Data needs to become a primary value driver at organizations • We have lots of Data • We spend much on it: in technology and people • We are not realising the value we expect from it • A strong business need to create the CDO role: • Traditional companies are not following, but adopting the model that actually works in other data-intensive industries • CDO has a seat at executive table: the voice of Data • Data done right is an essential element to unify large enterprises to unlock value from business synergies
  • 13. Why data is important: Every customer interaction is an opportunity to capture data, learn, and act; Data capture / opportunity to learn; Actions; instant car loan product offering is displayed in the app; Connie logs into mobile App to check her balance. We know she is looking for a car loan; CRM data is used to pre-calculate Connie’s borrowing limit for a car loan; Internal and external data sources, predictive models identify cross-sell opportunities; Connie is offered a competitively priced ‘bespoke’ offer for car servicing / MOT; Experimenting with different tools for different groups of; Connie gets a better user experience in real-time during every session; personalised journey based on pre-calculated limit for the car loan amount; Customer Interactions (branch visits, telephone calls, digital, mobile, sales); Big Data platform; Customer Interaction; CRM; Predictive Analytics; Multivariate Testing; Targeted offers during browsing; Product Discovery; Cross-sell; Measure feedback per session; Millions of customer interactions per day
  • 14. Fundamental Data Principles to Support Analytics: Usama’s Obvious Data Axioms
  • 15. Data Axioms 1. Data gains value exponentially when integrated and coalesced. – When fragmented: dramatic value loss takes place; – increased costs; – reduced utility/integrity; – and increased security risks
  • 16. Data Axioms 2. Fusing Data together from disparate/independent sources is difficult to achieve and impossible to maintain. Hence the only viable approach is: • Intercepting and documenting at the source • fusing at the source • controlling lifecycle and flow
  • 17. Data Axioms 3. Standardisation is essential • for sustained ability to integrate data sources and hence growing value; • for simplifying down-stream systems and apps • for enforcing discipline as a firm increases its data sources
  • 18. Data Axioms 4. Data governance and policy must be centralised • needs to be enforced strongly else we slip into chaos and a Babylon of terms/languages • An Enterprise Data Architecture spanning structured and unstructured data
  • 19. Data Axioms 5. Recency Matters → data streaming in modelling and scoring • Often, accuracy of prediction drops quickly with time (e.g. consumer shopping) • Value of alerts drops exponentially with time… • Ability to trigger responses based on real-time scoring is critical • Streaming, real-time model updates, real-time scoring
  • 20. Data Axioms 6. Data Infrastructure Needs: • rapid renewal & modernization: the pace of change and development of technology are very rapid – Design for migration and infrastructure replacement via abstraction layers that remove tech dependencies • Encryption and Masking: Persisting unencrypted confidential and secret data (even within secure firewalls) is an invitation for problems and risks
  • 21. Data Axioms 7. Data is a primary competency and not a side-activity supporting other processes • Hence specialized skills and know-how are a must • Generalists will create a hopeless mess • Data is difficult: modelling, architecture, and design to support analytics
  • 22. Reality Check: So I am a marketer, how do I use BigData for my business? Social Media? Sentiment Analysis?
  • 25. SOCIAL LUMAscape (© LUMA Partners LLC 2014): Gamification; Social Data; Community Platforms; Social/Mobile Apps & Games; Social Networks - Other; Social Search & Browsing; Social Commerce Platforms; Analytics; Social Promotion Platforms; Social Publishing Platforms; Social Marketing Management; Twitter Apps; Facebook Gaming; Facebook Apps; Content Curation; URL Shorteners; Stream Platforms; Traditional Publishers; Social Business Software; Content Sharing (Reviews/Q&A/Docs); Image/Video Sharing; Blogging Platforms; External (Customer) Facing; Internal (Employee) Facing; Social Ad Networks; Social Intelligence; Social Scoring; Social TV; Social Referral; MARKETER; CONSUMER; Social Shopping; Social Advertising Platforms; Social Login/Sharing; Social Content & Forums; Advocate Platforms; Social Branded Video. Legend: Denotes acquired company; Denotes shuttered company
  • 26. Reality Check: So what should Users of Analytics in Big Data World Do?
  • 27. Load the Data into a Data Lake: Your life will be so much easier as you can now do Data Acrobatics and other amazing data feats…
  • 28. The Data Lake -- according to Waterline: We loaded the Data! Congratulations. Now What?
  • 29. From a Data Lake to Amazon Browser: Amazon Simple Search; Amazon gives you facets; Product details
  • 30. Reality Check: So where do analysts and data Scientists spend all their time?
  • 31. Data Analysis vs. Data Prep: Let’s Mine the Data
  • 32. Reality Check: So what do technology people worry about these days?
  • 33. To Hadoop or not to Hadoop? When to use techniques requiring Map-Reduce and grid computing? • Typically organizations try to use Map-Reduce for everything to do with Big Data – This is actually very inefficient and often irrational – Certain operations require specialized storage • Updating segment memberships over large numbers of users • Defining new segments on user or usage data
  • 34. Drivers of Hadoop in Large Enterprises: Cost of Storage • Fastest growing demand is more storage • Data in Data Warehouses has traditionally required expensive storage technology: – $100K per terabyte per year: cost of Teradata storage – $2.5K per terabyte per year (much lower): cost of Hadoop on commodity storage
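To make the storage argument concrete, the two per-terabyte figures quoted on this slide can be plugged into a small cost comparison. The 500 TB data volume below is a hypothetical example, not a number from the talk, and the sketch ignores everything except the quoted storage rates.

```python
# Per-terabyte, per-year storage costs as quoted on the slide (USD).
COST_PER_TB_YEAR = {
    "warehouse": 100_000,  # Teradata storage
    "hadoop": 2_500,       # Hadoop on commodity storage
}


def annual_cost(terabytes: float, platform: str) -> float:
    """Yearly storage cost in USD for a given data volume and platform."""
    return terabytes * COST_PER_TB_YEAR[platform]


def annual_savings(terabytes: float) -> float:
    """Yearly saving from holding the same volume on Hadoop instead."""
    return annual_cost(terabytes, "warehouse") - annual_cost(terabytes, "hadoop")


cost_ratio = COST_PER_TB_YEAR["warehouse"] / COST_PER_TB_YEAR["hadoop"]

# Hypothetical volume, chosen only for illustration.
example_tb = 500
example_savings = annual_savings(example_tb)
```

At the quoted rates the warehouse is 40 times more expensive per terabyte, so the saving grows linearly with the volume stored.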
  • 35. Analysis & Programming Software: PIG, HIPI
  • 36. Hadoop Stack
  • 41. Reality: If Storage is the Biggest Driver of Hadoop Adoption, What is the next biggest? ETL • Replaces expensive licenses • Much higher performance with lower infrastructure costs (processors, memory) • Flexibility in changing schema and representation • Flexibility on taking on unstructured and semi-structured data • Plus suite of really cool tools…
  • 42. Turning the three Vs of Big Data into Value. Understand context and content: • What are appropriate actions? • Is it Ok to associate my brand with this content? • Is content sad?, happy?, serious?, informative? Understand community sentiment: • What is the emotion? • Is it negative or positive? • What is the health of my brand online? Understand customer intent: • What is each individual trying to achieve? • Can we predict what to do next? • Critical in cross-sell, personalization, monetization, advertising, etc…
  • 43. Many Business Uses of Predictive Analytics (Analytic technique: Uses in business) – Marketing and sales: Identify potential customers; establish the effectiveness of a campaign – Understanding customer behavior: model churn, affinities, propensities, … – Web analytics & metrics: model user preferences from data, collaborative filtering, targeting, etc. – Fraud detection: Identify fraudulent transactions – Credit scoring: Establish credit worthiness of a customer requesting a loan – Manufacturing process analysis: Identify the causes of manufacturing problems – Portfolio trading: optimize a portfolio of financial instruments by maximizing returns & minimizing risks – Healthcare Application: fraud detection, cost optimization, detection of events like epidemics, etc... – Insurance: fraudulent claim detection, risk assessment – Security and Surveillance: intrusion detection, sensor data analysis, remote sensing, object/person detection, link analysis, etc...
  • 44. Case Studies: 1. Context Analysis (unstructured data) 2. Yahoo! predictive modeling
  • 45. Understanding Context
  • 46. Reality Check: So who is the company we think is best at handling BigData?
  • 47. Biggest BigData in Advertising? Understanding Context for Ads
  • 48. The Display Ads Challenge Today: What Ad would you place here?
  • 49. The Display Ads Challenge Today: Damaging to Brand?
  • 50. The Display Ads Challenge Today: What Ad would you place here?
  • 51. NetSeer: SOLVING ACCURACY ISSUES | AMBIGUITY, WASTE, BRAND SAFETY. Why did Google Serve this Ad? This is how NetSeer actually sees this content
  • 52. NetSeer – How it works: DISCERNS AND MONETIZES HUMAN INTENT + Identifies Concepts expressed on a page + Disambiguates language + Builds increasingly rich profile over; 52M CONCEPTS; 2.3B RELATIONSHIPS BETWEEN CONCEPTS. Concept clusters shown for the term “focus” <CONCEPT> on WEBSITE.COM: ECONOMY CARS (high MPG, ford, low emission, fuel efficiency, economy vehicles, electric vehicles, service record, safety rating); VISION TOOLS (microscope, lenses, reading glasses, autofocus, bifocal, refraction, eye chart); MARKET RESEARCH (focus groups, A/B testing, consumer study, surveying, blind study, analytics)
  • 53. NetSeer: Intent for Display • Currently Processing 4 Billion Impressions per Day
  • 54. Problem: Hard to Understand User Intent. Contextual Ad served by Google; What NetSeer Sees:
  • 55. NETSEER CONNECTING THE DOTS: Understanding intent of user online actions and connecting them to define user intent profile over time. Page: http://diabetes.webmd.com/default.htm
  • 56. NETSEER CONNECTING THE DOTS: Understanding intent of user online actions and connecting them to define user intent profile over time. Page: http://www.healthline.com/health/type-2-diabetes/insulin-pumps
  • 57. NETSEER CONNECTING THE DOTS: Understanding intent of user online actions and connecting them to define user intent profile over time. Search: Sulfonylureas
  • 58. Case Studies: 1. Context Analysis (unstructured data) 2. Yahoo! Predictive Modeling
  • 59. Yahoo! – One of the Largest Destinations on the Web • 80% of the U.S. Internet population uses Yahoo! – Over 600 million users per month globally! • Global network of content, commerce, media, search and access products • 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc. • 25+ terabytes of data collected each day – Representing 1000’s of cataloged consumer behaviors • More people visited Yahoo! in the past month than: – Use coupons – Vote – Recycle – Exercise regularly – Have children living at home – Wear sunscreen regularly • Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers. Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
  • 60. Yahoo! Big Data – A league of its own… GRAND CHALLENGE PROBLEMS OF DATA PROCESSING: TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET. Y! Data Challenge exceeds others by 2 orders of magnitude. Chart 1, Terabytes of Warehoused Data: values 25, 49, 94, 100, 500, 1,000, 5,000; labels Amazon, Korea Telecom, AT&T, Y!LiveStor, Y!Panama Warehouse, Walmart, Y!Main warehouse. Chart 2, Millions of Events Processed Per Day: values 50, 120, 225, 2,000, 14,000; labels SABRE, VISA, NYSE, YSM, Y! Global
  • 61. Behavioral Targeting (BT): Search; Ad Clicks; Content; Search Clicks; BT. Targeting ads to consumers whose recent behaviors online indicate which product category is relevant to them
  • 62. Yahoo! User DNA • On a per consumer basis: maintain a behavioral/interests profile and profitability (user value and LTV) metrics. Registration; Campaign; Behavior; Unknown. Male, age 32; Lives in SF; Lawyer; Searched on from London last week; Searched on: “Italian restaurant Palo Alto”; Checks Yahoo! Mail daily via PC & Phone; Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people; Searched on: “Hillary Clinton”; Clicked on Sony Plasma TV SS ad; Spends 10 hours/week on the internet; Purchased Da Vinci Code from Amazon
  • 63. How it works | Network + Interests + Modelling • Analyze predictive patterns for purchase cycles in over 100 product categories • In each category, build models to describe behaviour most likely to lead to an ad response (i.e. click) • Score each user for fit with every category…daily • Target ads to users who get highest ‘relevance’ scores in the targeting categories. Panel headings: Varying Product Purchase Cycles; Match Users to the Models; Rewarding Good Behaviour; Identify Most Relevant Users
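The score-then-target loop described on this slide can be sketched roughly as follows. The talk does not give the actual models, so the hand-set per-category weights below are a hypothetical stand-in for a learned click model, and the user IDs, event names, and counts are invented for illustration.

```python
from typing import Dict, List

# Hypothetical weights standing in for a per-category click-response model.
CATEGORY_WEIGHTS: Dict[str, Dict[str, float]] = {
    "automotive": {"page_view": 1.0, "search": 2.0, "search_click": 3.0, "ad_click": 5.0},
    "travel": {"page_view": 0.5, "search": 3.0, "search_click": 4.0, "ad_click": 6.0},
}

# Per-user, per-category counts of observed behaviours (made-up data).
UserEvents = Dict[str, Dict[str, Dict[str, int]]]


def score_user(events: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of a user's behaviour counts within one category."""
    return sum(weights.get(kind, 0.0) * count for kind, count in events.items())


def top_users(users: UserEvents, category: str, k: int) -> List[str]:
    """Return the k users with the highest relevance score for a category."""
    weights = CATEGORY_WEIGHTS[category]
    scores = {uid: score_user(per_cat.get(category, {}), weights)
              for uid, per_cat in users.items()}
    # Sort by descending score, breaking ties by user id for determinism.
    return sorted(scores, key=lambda uid: (-scores[uid], uid))[:k]


users: UserEvents = {
    # 5 pages, 2 search keywords, 1 search click, 1 ad click
    "u1": {"automotive": {"page_view": 5, "search": 2, "search_click": 1, "ad_click": 1}},
    "u2": {"automotive": {"page_view": 2}},
    "u3": {"travel": {"search": 1}},
}
automotive_targets = top_users(users, "automotive", k=2)
```

In a daily batch, `top_users` would be re-run for every targeting category so that each user's category membership reflects the latest behaviour.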
  • 64. Recency Matters, So Does Intensity: Active now… …and with feeling
  • 65. Differentiation | Category specific modelling: Different models allow us to weight and determine intensity and recency. Example 1: Category Automotive; Example 2: Category Travel/Last Minute. Both charts (axes: time, intensity score) mark an Intense Click Zone and show Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
  • 66. Differentiation | Category specific modelling: Different models allow us to weight and determine intensity and recency. Example 1: Category Automotive (chart axes: time, intensity score). Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click; user is in the Intense Click Zone; with no further activity, decay takes effect
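One simple way to model the recency and intensity behaviour shown in these charts is an exponentially decayed score with a category-specific half-life. The slides do not state the decay function actually used, so the exponential form, the half-life values, and the click-zone threshold below are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical half-lives (in days): a long purchase cycle decays slowly,
# a last-minute category decays quickly.
HALF_LIFE_DAYS = {"automotive": 30.0, "travel_last_minute": 2.0}

# Hypothetical threshold above which a user counts as in the intense click zone.
CLICK_ZONE_THRESHOLD = 3.0


def decayed_score(events: List[Tuple[float, float]], now: float, half_life: float) -> float:
    """Sum event weights, each halved for every `half_life` days of age.

    `events` holds (day_of_event, weight) pairs; future events are ignored.
    """
    rate = math.log(2) / half_life
    return sum(w * math.exp(-rate * (now - t)) for t, w in events if t <= now)


def in_click_zone(score: float) -> bool:
    """True while the decayed score is still above the threshold."""
    return score >= CLICK_ZONE_THRESHOLD


# One burst of activity on day 0 with total weight 8, then no further activity.
burst = [(0.0, 8.0)]
auto_day_30 = decayed_score(burst, now=30.0, half_life=HALF_LIFE_DAYS["automotive"])
travel_day_30 = decayed_score(burst, now=30.0, half_life=HALF_LIFE_DAYS["travel_last_minute"])
```

With these assumed half-lives the same burst keeps an automotive intender in the zone a month later, while a last-minute travel intender has long since dropped out, which is the category-specific differentiation the slide describes.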
  • 67. Automobile Purchase Intender Example • A test ad-campaign with a major Euro automobile manufacturer – Designed a test that served the same ad creative to test and control groups on Yahoo – Success metric: performing specific actions on Jaguar website • Test results: 900% conversion lift vs. control group – Purchase Intenders were 9 times more likely to configure a vehicle, request a price quote or locate a dealer than consumers in the control group – ~3x higher click through rates vs. control group
  • 68. Mortgage Intender Example. Example: Mortgages. We found: 1,900,000 people looking for mortgage loans. Results from a client campaign on Yahoo! Network: +122% CTR Lift; +626% Conv Lift. Example search terms qualified for this target: Mortgages; Home Loans; Refinancing; Ditech. Example Yahoo! Pages visited: Financing section in Real Estate; Mortgage Loans area in Finance; Real Estate section in Yellow Pages. Source: Campaign Click thru Rate lift is determined by Yahoo! Internal research. Conversion is the number of qualified leads from clicks over number of impressions served. Audience size represents the audience within this behavioral interest category that has the highest propensity to engage with a brand or product and to click on an offer. Date: March 2006
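The lift figures on this slide can be related to raw campaign counts with a short calculation. The slide defines conversion as qualified leads over impressions served but does not give the lift formula or the underlying counts, so this sketch assumes lift means the relative increase of the test-group rate over the control-group rate, and every count below is invented so that the output matches the quoted percentages.

```python
def rate(successes: int, impressions: int) -> float:
    """Successes (clicks or qualified leads) per impression served."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return successes / impressions


def lift_percent(test_rate: float, control_rate: float) -> float:
    """Relative increase of the test rate over the control rate, in percent."""
    if control_rate <= 0:
        raise ValueError("control rate must be positive")
    return (test_rate / control_rate - 1.0) * 100.0


# Hypothetical counts: 100,000 impressions served to each group.
IMPRESSIONS = 100_000
control = {"clicks": 100, "leads": 50}
targeted = {"clicks": 222, "leads": 363}

ctr_lift = lift_percent(rate(targeted["clicks"], IMPRESSIONS),
                        rate(control["clicks"], IMPRESSIONS))
conv_lift = lift_percent(rate(targeted["leads"], IMPRESSIONS),
                         rate(control["leads"], IMPRESSIONS))
```

Serving the same creative to both groups, as in the automobile test on the previous slide, is what lets the difference in rates be attributed to the targeting rather than to the ad itself.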
  • 69. Experience summary at Yahoo! • Dealing with one of the largest data sources (25 Terabytes per day) • Behavioral Targeting business was grown from $20M to > $400M in 3 years of investment! • Yahoo! Specific? -- BigData critical to operations – Ad targeting creates huge value – Right teams to build technology (3 years of recruiting) – Search is a BigData problem (but this has moved to mainstream)
  • 70. Lessons Learned • A lot more data than qualified talent – Finding talent in BigData is very difficult – Retaining talent in BigData is even harder • At Yahoo! we created a central group that drove huge value to the company • Data people need to feel like they have critical mass – Makes it easier to attract the right people – Makes it easier to retain • Drive data efforts by business need, not by technology priorities – Chief Data Officer role at Yahoo! – now popular
  • 71. RapidMiner in Big Data Case Study
  • 72. RapidMiner’s Strengths • Open Source Community & Marketplace – Crowd-sourced innovation, quality assurance, market awareness. • Fully-integrated Platform – Integrated, process-based business analytics platform with focus on predictive analytics. • No Programming Required – Easy-to-use, low maintenance costs, standard platform for business analysts. • Advanced Analytics at Every Scale – In-memory, in-database and in-Hadoop analytics offer best option for every size of database. • Connectivity – More than 60 connectors (incl. SAP & Hadoop), allowing easy access to structured and unstructured data.
  • 73. 30,000+ Downloads per Month. SELECT LIST OF RECIPIENT ORGANIZATIONS: Government & Defense; Pharma & Healthcare; Consulting; Oil & Gas, Chemicals; Financial Services; Software & Analytics; Retail; Manufacturing; Business Services; Consumer Products; Aerospace; Technology; Entertainment; Academia
  • 74. Select Customer Stories. PayPal: Who > world leading online payment services provider; Solution > Customer feedback and voice of the customer analysis, churn prediction and prevention, text mining and sentiment analysis. SmartSoft: Who > provider of solutions for preventing fraud, money laundering, and risks in financial institutions; Solution > Integration of Rapid-I’s predictive analytics engine into their solutions for fraud detection and fraud prevention for the financial and telecom sectors
  • 75. SO WHAT IS THE BIG DATA STORY?
  • 76. So the data is naturally moving to Hadoop... Situation: – The data is moving to Hadoop, driven by Cost (storage) and Convenience (ETL) – How do we get the value of predictive analytics to the data? Rather than move the data out, move the analytics to the data! – Can we minimize the need for data movement? – Data copies can become a management nightmare – Analytics in a “Business As Usual” manner requires convenience
  • 77. Radoop – RapidMiner on Hadoop. Opportunity: – Avoid expensive data movement – Leverage convenient data transformation – Thousands of data connectors, many over semi-structured and unstructured data. Why is this big news? – Leverages a naturally occurring wave – Analytics over a richer variety requires much more processing – The energy placed on data extraction and loading moves to energy applied on actual analysis and modelling
  • 78. Big Picture on Big Data Analytics: Observations and Concluding Remarks
  • 79. Retaining New Yahoo! Mail Registrants: Often, Simple is Very Powerful!
  • 80. Integrating Mail and News: Data showed that users often check their mail and news in the same session – But no easy way to navigate to Y! News from Y! Mail. Mail users who also visit Y! News are 3X more active on Yahoo – Higher retention, repeat visits and time-spent on Yahoo
  • 81. “In the news” Module on Mail Welcome Page: Increased retention on Mail for light users by 40%! – Est. incremental revenue of $16m a year on Y! Mail alone
  • 82. Nordstrom: Queries with No Matches. Julie Bornstein, Web Marketing Director – What are my customers looking for and not finding? – June 2002: queries for “belly button rings” returned no matches in store – Why the sudden interest?
  • 83. Nordstrom: Queries with No Matches. Print Ad Campaign: Models happen to be sporting a navel ring. Nordstrom does not sell navel rings. What to do???
  • 84. The early days of mass auto production
  • 85. Today’s Auto: It just works! No need to understand what happens when you turn on ignition. Very complex inside, but all simplicity on the outside
  • 86. Application Dates. Internships: Just launched. Contact: Faizan Chaudhary (faizan.chaudhary@barclays.com) and Tushar Wadaskar (tushar.wadaskar@barclays.com)
  • 87. A Sampling of Technology Opportunities at Barclays: We seek world-class technologists, data scientists, problem-solvers, Data systems engineers: the team that will re-invent Financial Services. Amer Sajed, CEO, Barclaycard US; Bassel Ojjeh, Chief Data Architect, DSI; Simon Gordon, MD, Head of Risk and Legal Technology - DSI. Innovating Data and Analytics solutions within the Financial Markets industry; understand and influence the future of the global Derivatives market. Risk Technology within Barclays are looking for world-class big data and quantitative modeling skills. A great opportunity to work on some of the hardest technical problems in the industry by using open source to catch the bad guys stealing money all the way up to helping kids saving for their college. We at Barclays are unleashing the power of disruptive technology. Innovation led by data and design keeping our customers at the heart of every product we build is our mantra. Come join the revolution. We have a challenge to quadruple our business here in the US - and that can only be delivered through analytics - in marketing, risk, fraud, and operations. And, Pune is the undisputed center of the universe! Faizan Chaudhary, Director, Data Systems & Insights (DSI)
  • 88. Imperial College – DSI Distinguished Lecture – Copyright Usama Fayyad © 2014. Thank You! & Questions. Usama Fayyad, Twitter – @Usamaf; Faizan.chaudhary@barclays.com; Tushar.Wadaskar@barclays.com; www.barclays.com/joinus