Taming the BigData Beast for Value &
Insights
Usama Fayyad, Ph.D.
Executive Chairman - Oasis500
Chairman & CTO - Blue Kangaroo

Twitter: @usamaf
Sept 18th, 2013
Financial Times Live
Dubai - UAE

Usama Fayyad, Executive Chairman
usama@oasis500.com Twitter: @usamaf
BigData Overview – Copyright Usama Fayyad © 2013 1
Outline
• Big Data all around us
• Introduction to Data Mining and Predictive Analytics
Over BigData
• Some of the issues in BigData
• On-line data and facts
• Case studies on on-line marketing: Yahoo! Big Data
• Summary and conclusions

What Matters in the Age of Analytics?
1. Being able to exploit all the data that is available
• not just what you've got available
• what you can acquire and use to enhance your actions

2. Proliferating analytics throughout the organization
• make every part of your business smarter

3. Driving significant business value
• embedding analytics into every area of your business can
help you drive top line revenues and/or bottom line cost
efficiencies

Why Big Data?
A new term, with associated “Data Scientist” positions:

• Big Data: is a mix of structured, semi-structured, and
unstructured data:
– Typically breaks barriers for traditional RDB storage
– Typically breaks limits of indexing by “rows”
– Typically requires intensive pre-processing before each query
to extract “some structure” – usually using Map-Reduce type
operations

• Above leads to “messy” situations with no standard
recipes or architecture: hence the need for “data
scientists”
– conduct “Data Expeditions”
– Discovery and learning on the spot

What Makes Data “Big Data”?
• Big Data is Characterized by the 3-V’s:
– Volume: larger than “normal” – challenging to load/process
• Expensive to do ETL
• Expensive to figure out how to index and retrieve
• Multiple dimensions that are “key”

– Velocity: Rate of arrival poses real-time constraints on what
are typically “batch ETL” operations
• If you fall behind, catching up is extremely expensive (requires replicating very
expensive systems)
• Must keep up with rate and service queries on-the-fly

– Variety: Mix of data types and varying degrees of structure
• Non-standard schema
• Lots of BLOB’s and CLOB’s
• DB queries don’t know what to do with semi-structured and
unstructured data.

Today’s Data: e.g. Yahoo! User DNA
Profile dimensions: Registration, Campaign, Behavior, Unknown
• Lives in SF
• Male, age 32
• Lawyer
• Clicked on Sony Plasma TV SS ad
• Searched on: “Hillary Clinton”
• Searched on: “Italian restaurant Palo Alto”
• Checks Yahoo! Mail daily via PC & Phone
• Searched on from London last week
• Spends 10 hours/week on the internet
• Purchased Da Vinci Code from Amazon
• Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people

How Data Explodes: really big
The same profile items as on the previous slide, surrounded by:
• Web searches on this person, hobbies, work, location
• Professional network - reputation
• MetaData on everything
• Blogs, publications, news, local papers, job info, accidents
• Likes & friends’ likes
• Social Graph (FB)

The Distinction between “Data” and “Big
Data” is fast disappearing
• Most real data sets nowadays come with a serious mix
of semi-structured and unstructured components:
– Images
– Video
– Text descriptions and news, blogs, etc…
– User and customer commentary
– Reactions on social media: e.g. Twitter is a mix of data anyway

• Using standard transforms, entity extraction, and new
generation tools to transform unstructured raw data
into semi-structured analyzable data
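
As a rough illustration of turning unstructured raw text into semi-structured, analyzable data, the sketch below pulls a few simple entity types out of a tweet-like string with regular expressions. The function name `extract_entities`, the field names, and the sample text are assumptions made for this example; real entity extraction usually relies on dedicated tools rather than three patterns.

```python
import re

# Hypothetical patterns for three simple entity types found in social posts.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


def extract_entities(text):
    """Turn a raw post into a semi-structured record (illustrative only)."""
    urls = URL.findall(text)
    # Remove URLs first so '#' fragments inside links are not read as hashtags.
    stripped = URL.sub(" ", text)
    return {
        "mentions": MENTION.findall(stripped),
        "hashtags": HASHTAG.findall(stripped),
        "urls": urls,
        "word_count": len(stripped.split()),
    }


record = extract_entities("Great talk by @usamaf on #BigData http://example.com/slides")
```

The resulting dictionary has fixed keys, so many such records can be loaded into a table and queried, which is the point of the transformation step.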

Text Data: The Big Driver
• We speak of “big data” and the “Variety” in 3-V’s
• Reality: biggest driver of growth of Big Data has been
text data
– Most work on analysis of “images” and “video” data has
really been reduced to analysis of surrounding text

Nowhere more so than on the internet
• Map-Reduce popularized by Google to address the
problem of processing large amounts of text data:
– Many operations with each being a simple operation but
done at large scale
– Indexing a full copy of the web
– Frequent re-indexing

Reality Check on
Brand/Reputation
What are people saying about my
brand on Social Media?

Reality Check
Surely there are companies I can
work with that can help me make this
practical?

Reality Check
So what do technology people worry
about these days?

To Hadoop or not to Hadoop?
when to use techniques requiring Map-Reduce
and grid computing?
• Typically organizations try to use Map-Reduce
for everything to do with Big Data
– This is actually very inefficient and often irrational
– Certain operations require specialized storage
• Updating segment memberships over large numbers of
users
• Defining new segments on user or usage data

To Hadoop or not to Hadoop?
when to use techniques requiring Map-Reduce and grid
computing?
• Map-Reduce is useful when a very simple operation is
to be applied on a large body of unstructured data
– Typically this is during entity and attribute extraction
– Still need Big Data analysis post Hadoop

• Map-Reduce is not efficient or effective for tasks
involving deeper statistical modeling
– good for gathering counts and simple (sufficient) statistics
• E.g. how many times a keyword occurs, quick aggregation of simple
facts in unstructured data, estimates of variances, density, etc…

– Mostly pre-processing for Data Mining
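
To make the "counts and simple (sufficient) statistics" point concrete, here is a minimal single-machine sketch of the map and reduce steps in plain Python. It is not Hadoop code; the function names, the choice of document length as the measured quantity, and the sample documents are all assumptions for illustration. Each document is mapped to a small record of partial sums, and records are combined by addition, which is what makes the work easy to spread over many machines.

```python
from collections import Counter
from functools import reduce


def map_doc(doc, keyword):
    """Map step: emit per-document sufficient statistics."""
    words = doc.lower().split()
    n_words = len(words)
    return {
        "keyword_hits": Counter(words)[keyword],
        "n": 1,                  # number of documents seen
        "sum": n_words,          # sum of document lengths
        "sum_sq": n_words ** 2,  # sum of squared document lengths
    }


def reduce_stats(a, b):
    """Reduce step: combine two partial results by adding each field."""
    return {key: a[key] + b[key] for key in a}


def summarize(docs, keyword):
    total = reduce(reduce_stats, (map_doc(d, keyword) for d in docs))
    mean = total["sum"] / total["n"]
    # Population variance from sufficient statistics: E[x^2] - (E[x])^2.
    variance = total["sum_sq"] / total["n"] - mean ** 2
    return total["keyword_hits"], mean, variance


docs = ["big data is big", "data mining", "hadoop and map reduce on big data"]
hits, mean_len, var_len = summarize(docs, "big")
```

Anything that cannot be expressed as such additive partial results, for example iterative model fitting, is where this style of processing becomes awkward.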

Hadoop Use Cases by Data Type
• Content and Preference Data: 24%
• IT Log Data: 19%
• Text and Language Data: 16%
• Social Data: 11%
• Advertising Data: 10%
• Science Data: 7%
• CRM Data: 4%
• Supply Chain Data, Sensor Data, Financial Trading Data: 2%, 2% and 4% (pairing of labels and values not recoverable from the chart)
• ERP Financial Data: 1%

Analysis & Programming Software

PIG

HIPI

Many Business Uses
Analytic technique: Uses in business
• Marketing and sales: Identify potential customers; establish the effectiveness of a campaign
• Understanding customer behavior: model churn, affinities, propensities, …
• Web analytics & metrics: model user preferences from data, collaborative filtering, targeting, etc.
• Fraud detection: Identify fraudulent transactions
• Credit scoring: Establish credit worthiness of a customer requesting a loan
• Manufacturing process analysis: Identify the causes of manufacturing problems
• Portfolio trading: optimize a portfolio of financial instruments by maximizing returns & minimizing risks
• Healthcare Application: fraud detection, cost optimization, detection of events like epidemics, etc...
• Insurance: fraudulent claim detection, risk assessment
• Security and Surveillance: intrusion detection, sensor data analysis, remote sensing, object/person detection, link analysis, etc...

So Internet is a big place with
2B+ users and lots happening?
• Do we understand what each individual is
trying to achieve?
• Do we understand what a community’s
sentiment is?
• Do we understand context and content?

Social platforms 2013
• User accounts on FaceBook:
– 971M
– How many fake profiles?
– 83M (per FB 8/2012 report)

• How many users on LinkedIn
– 159.3M (as of 1/2013)

• How many Google+ users?
– 343 million active users in Q4 2012
» Sources: International Business Times 1/28/2013
– http://www.ibtimes.com/google-plus-becomes-worlds-no-2-social-network-afterfacebook-knocking-twitter-1042956

How Do People Spend Their On-line Time?

• On-line Shopping? 5%
• Searches? 21%
• Email/Communication? 19%
• Reading Content? 20%
• Social Networking? 22%
• Multimedia Sites? 13%

Interesting Events
• Google: How many searches in 2012?
– More than 1.2 Trillion (source Google)
– Estimates 1B to 3B per day

• Twitter: How many Tweets/day?
– 500M (per CEO Dick Costolo at IAB Engage, 10/2012)

• Facebook: Updates per day?
– More than 1B

• YouTube: Views/day
– 4 Billion hours/month - 4B views/day in 1/2012
– 72 hours of video uploaded every minute!

• Social Networks: users who have used sites for spying
on their partners?
– 56%
*Sources: Feb.2012 - compiled from Comscoredatamine.com, Nielsen.com,
thisDigitalLife.com, PewInternet.org

Interesting Events
• Country with the most online friends per user?
– Brazil
– 481 friends per user
– Japan has the least at 29

• Country with the most time spent shopping on-line?
– China: 5 hours/week

*Sources: Feb.2012 - compiled from Comscoredatamine.com, Nielsen.com,
thisDigitalLife.com, PewInternet.org

So Internet is a big place with
lots happening?
• Do we understand what each individual is
trying to achieve?
• Do we understand what a community’s
sentiment is?
• Do we understand context and content?

Turning the 3 V’s of Big Data
Into Value

Turning the three Vs of Big Data into Value
Understand context and content
• What are appropriate ads?
• Is it Ok to associate my brand with this content?
• Is content sad, happy, serious, informative?
Understand community sentiment
• What is the emotion?
• Is it negative or positive?
• What is the health of my brand online?
Understand user intent?
• What is each individual trying to achieve?
• Critical in monetization, advertising, etc…
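
As a toy illustration of the "Is it negative or positive?" question, the sketch below scores posts against two tiny word lists and reports the share of positive, negative, and neutral posts. The word lists, function names, and sample posts are invented for this example; a production system would use a curated lexicon or a trained model and would need to handle negation, sarcasm, and context.

```python
# Tiny hypothetical word lists; a real system would use a curated lexicon
# or a trained model.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"hate", "bad", "terrible", "sad"}


def sentiment_score(text):
    """Return (#positive - #negative) words; >0 positive, <0 negative."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def brand_health(posts):
    """Share of posts with positive, negative, and neutral scores."""
    scores = [sentiment_score(p) for p in posts]
    n = len(scores)
    return {
        "positive": sum(s > 0 for s in scores) / n,
        "negative": sum(s < 0 for s in scores) / n,
        "neutral": sum(s == 0 for s in scores) / n,
    }


posts = ["I love this brand!", "Terrible service, bad app.", "Ordered yesterday."]
health = brand_health(posts)
```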

Understanding Context

Reality Check
So who is the company we think is
best at handling BigData?

Biggest BigData in Advertising?
Understanding Context for Ads

The Display Ads Challenge Today

What Ad would you
place here?

The Display Ads Challenge Today
Damaging to Brand?

The Display Ads Challenge Today

What Ad
would you
place here?

The Display Ads Challenge Today
Irrelevant and
Damaging to Brand

Completely Irrelevant

NetSeer: Intent for Display
• Currently Processing 4 Billion Impressions per Day

Problem: Hard to Understand User
Intent
Contextual Ad served by Google

What NetSeer Sees:

Turning the three Vs of Big Data into Value
Understand context and content
• What are appropriate ads?
• Is it Ok to associate my brand with this content?
• Is content sad, happy, serious, informative?
Understand community sentiment
• What is the emotion?
• Is it negative or positive?
• What is the health of my brand online?
Understand user intent?
• What is each individual trying to achieve?
• Critical in monetization, advertising, etc…

User Intent

User Intent
Case Study #1
Yahoo! Behavioral Targeting

Yahoo! – One of Largest Destinations on the Web
• 80% of the U.S. Internet population uses Yahoo!
– Over 600 million users per month globally!
• Global network of content, commerce, media, search and access products
• 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc.
• 25+ terabytes of data collected each day
• Representing 1000’s of cataloged consumer behaviors
More people visited Yahoo! in the past month than:
• Use coupons
• Vote
• Recycle
• Exercise regularly
• Have children living at home
• Wear sunscreen regularly
Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers
Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.

Yahoo! Big Data – A league of its own…
Charts: Terabytes of Warehoused Data; Millions of Events Processed Per Day
Entities compared: Y! Main warehouse, Y! Panama Warehouse, Y! LiveStor, YSM, Y! Global, NYSE, Walmart, VISA, SABRE, AT&T, Amazon, Korea Telecom
Chart values as extracted (pairing with entities not recoverable): 14,000; 5,000; 2,000; 1,000; 500; 225; 120; 100; 94; 50; 49; 25
GRAND CHALLENGE PROBLEMS OF DATA PROCESSING: TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET
Y! Data Challenge Exceeds others by 2 orders of magnitude

Behavioral Targeting (BT)
Diagram labels: Search, Content, BT, Search Clicks, Ad Clicks
Targeting ads to consumers whose recent behaviors online indicate which product category is relevant to them

Yahoo! User DNA
Profile dimensions: Registration, Campaign, Behavior, Unknown
• Lives in SF
• Male, age 32
• Lawyer
• Clicked on Sony Plasma TV SS ad
• Searched on: “Hillary Clinton”
• Searched on: “Italian restaurant Palo Alto”
• Checks Yahoo! Mail daily via PC & Phone
• Searched on from London last week
• Spends 10 hours/week on the internet
• Purchased Da Vinci Code from Amazon
• Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people

• On a per consumer basis: maintain a behavioral/interests profile and
profitability (user value and LTV) metrics

How it works | Network + Interests + Modelling
Varying Product Purchase Cycles | Rewarding Good Behaviour | Match Users to the Models | Identify Most Relevant Users

• Analyze predictive patterns for purchase cycles in over 100 product categories
• In each category, build models to describe behaviour most likely to lead to an ad response (i.e. click)
• Score each user for fit with every category… daily
• Target ads to users who get highest “relevance” scores in the targeting categories
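
The score-then-target loop described on this slide can be sketched in a few lines. The slide does not say what kind of model was used, so the logistic form, the feature names, the weights, and the helper names below are all assumptions chosen only to make the flow runnable: one model per category, every user scored against every category, and the top-scoring users selected for targeting.

```python
import math

# Hypothetical per-category weights over simple behaviour counts.
# In practice these would be fitted from historical ad-response data.
CATEGORY_MODELS = {
    "automotive": {"bias": -3.0, "pages": 0.2, "searches": 0.5, "ad_clicks": 1.0},
    "travel": {"bias": -2.5, "pages": 0.1, "searches": 0.8, "ad_clicks": 0.6},
}


def relevance(model, behaviour):
    """Logistic score in (0, 1): modelled propensity to respond to an ad."""
    z = model["bias"] + sum(
        weight * behaviour.get(feature, 0)
        for feature, weight in model.items()
        if feature != "bias"
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_users(users):
    """Score every user against every category (the daily scoring pass)."""
    return {
        user_id: {cat: relevance(m, behaviour) for cat, m in CATEGORY_MODELS.items()}
        for user_id, behaviour in users.items()
    }


def top_users(scores, category, k):
    """Pick the k users with the highest relevance score in one category."""
    ranked = sorted(scores, key=lambda u: scores[u][category], reverse=True)
    return ranked[:k]


users = {
    "u1": {"pages": 5, "searches": 2, "ad_clicks": 1},
    "u2": {"pages": 1},
    "u3": {"pages": 3, "searches": 4},
}
scores = score_users(users)
auto_targets = top_users(scores, "automotive", k=2)
```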

Recency Matters, So Does Intensity
Active now…

…and with feeling

Differentiation | Category specific modelling
Charts of intensity score over time, each with a marked Intense Click Zone:
• Example 1: Category Automotive
• Example 2: Category Travel/Last Minute
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
Different models allow us to weight and determine intensity and recency

Differentiation | Category specific modelling
Example 1: Category Automotive (chart of intensity score over time)
• User is in the Intense Click Zone
• With no further activity, decay takes effect
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
Different models allow us to weight and determine intensity and recency
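
One way to picture "intensity" and "recency" together is a weighted event count with exponential decay, where each category has its own half-life. The slides do not give the actual formula, so the decay form, event weights, and half-lives below are assumptions for illustration. The events reproduce the slide's "Alt Behaviour 1" (5 pages, 2 search keywords, 1 search click, 1 ad click).

```python
import math


def intensity_score(events, now, half_life_days, weights):
    """Sum of event weights, each discounted by exponential decay with age."""
    decay_rate = math.log(2) / half_life_days
    return sum(
        weights[kind] * math.exp(-decay_rate * (now - day))
        for day, kind in events
    )


# Hypothetical event weights and per-category half-lives (in days).
WEIGHTS = {"page": 1.0, "search": 2.0, "search_click": 3.0, "ad_click": 4.0}
HALF_LIFE = {"automotive": 14.0, "travel_last_minute": 2.0}

# (day, event kind): all activity happens on day 0.
events = (
    [(0, "page")] * 5
    + [(0, "search")] * 2
    + [(0, "search_click"), (0, "ad_click")]
)

auto_day0 = intensity_score(events, 0, HALF_LIFE["automotive"], WEIGHTS)
auto_day14 = intensity_score(events, 14, HALF_LIFE["automotive"], WEIGHTS)
travel_day14 = intensity_score(events, 14, HALF_LIFE["travel_last_minute"], WEIGHTS)
```

With these made-up settings the same behaviour keeps half its score after two weeks in the slow-moving category but has almost entirely decayed in the fast-moving one, which is the reason for modelling each category separately.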

Automobile Purchase Intender Example
• A test ad campaign with a major European automobile manufacturer
– Designed a test that served the same ad creative to test and control groups
on Yahoo
– Success metric: performing specific actions on Jaguar website

• Test results: 900% conversion lift vs. control group
– Purchase Intenders were 9 times more likely to configure a vehicle, request
a price quote or locate a dealer than consumers in the control group
– ~3x higher click through rates vs. control group

Mortgage Intender Example
Example: Mortgages
Ditech
Results from a client campaign on Yahoo! Network:
• +626% Conv Lift
• +122% CTR Lift
We found: 1,900,000 people looking for mortgage loans.
Example search terms qualified for this target: Mortgages, Home Loans, Refinancing
Example Yahoo! Pages visited:
• Financing section in Real Estate
• Mortgage Loans area in Finance
• Real Estate section in Yellow Pages
Date: March 2006
Source: Campaign Click thru Rate lift is determined by Yahoo! Internal
research. Conversion is the number of qualified leads from clicks over
number of impressions served. Audience size represents the audience
within this behavioral interest category that has the highest
propensity to engage with a brand or product and to click on
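
For readers who want to see how a lift figure is typically computed, here is a minimal sketch. It assumes lift is the relative difference between a test group's rate and a control group's rate, with the rate taken per impression served as in the source note; the slide does not spell out the exact formula, and the counts below are made up rather than taken from the campaign.

```python
def rate(events, impressions):
    """Events per impression served (e.g. clicks or qualified leads)."""
    return events / impressions


def lift(test_rate, control_rate):
    """Relative lift of the test group over the control group, as a fraction."""
    return (test_rate - control_rate) / control_rate


# Made-up counts for illustration only; not the campaign's data.
control_ctr = rate(100, 100_000)
test_ctr = rate(222, 100_000)
ctr_lift = lift(test_ctr, control_ctr)  # fraction; multiply by 100 for percent
```

Under this definition, a lift of +122% corresponds to a test rate 2.22 times the control rate, and +626% to a test rate 7.26 times the control rate.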
Experience summary at Yahoo!
• Dealing with one of the largest data sources (25
terabytes per day)
• BT business was grown from $20M to about $500M
in 3 years of investment!
• BigData critical to operations
– Ad targeting creates huge value
– Right teams to build technology (3 years of recruiting)
– Search is a BigData problem

• Big demands for grid computing (Hadoop)
– Not all BigData can be handled via Hadoop
– Spun off BigData segmentation data platform: nPario

Lessons Learned
• A lot more data than qualified talent
– Finding talent in BigData is very difficult
– Retaining talent in BigData is even harder

• At Yahoo! we created a central group that drove huge
value to the company
• Data people need to feel like they have critical mass
– Makes it easier to attract
– Makes it easier to retain

• Drive data efforts by business need, not by technology
priorities
– Chief Data Officer role at Yahoo! – now popular

BigData Analytics for Organizations
• Key to competitive Intelligence:
– Understand context
– Understand intent

• Key to understanding consumer trends through
social media analysis
– Brand issues
– Trend issues
– Anticipating the next shift

Big Picture on Big Data Analytics
Key points

Sometimes, Simple is Very Powerful!

Retaining New Yahoo! Mail Registrants
Integrating Mail and News
• Data showed that users often check their mail and
news in the same session
– But no easy way to navigate to Y! News from Y! Mail

• Mail users who also visit Y! News are 3X more active
on Yahoo
– Higher retention, repeat visits and time-spent on
Yahoo

“In the news” Module on Mail Welcome Page
• Increased retention on Mail for light users by 40%!
– Est. Incremental revenue of $16m a year on Y! Mail alone

Threats & Opportunities
• Data world is changing, especially in on-line businesses
• Major shifts from relational DB to NoSQL, document-oriented
stores
• Connecting new world to “old” world?
– Convenience of execution – integration with data platforms
– Appropriateness of algorithms to BigData
– Unstructured data algorithms:
• Text, Semi-structured and Unstructured data
• Entity extraction a must
• Appropriate theory and probability distributions (power laws, fat tails)
• Sparse Data
– Model management and proper aging of models
– Getting to basics so we can decide what models to use:
• Understanding noise and distributions
• Data tours

Thank You! & Questions
Usama Fayyad -

usama@fayyad.com
usama_fayyad@yahoo.com

Twitter – @Usamaf
+1-206-529-5123
www.BlueKangaroo.com
www.Oasis500.com
www.open-insights.com

BigData Overview – Copyright Usama Fayyad © 2013

59

Weitere ähnliche Inhalte

Was ist angesagt?

How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...DATAVERSITY
 
Data Lake: A simple introduction
Data Lake: A simple introductionData Lake: A simple introduction
Data Lake: A simple introductionIBM Analytics
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementTony Bain
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics ArchitectureArvind Sathi
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practiceVivek Murugesan
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesDATAVERSITY
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesTony Pearson
 
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...Neo4j
 
Business case for Big Data Analytics
Business case for Big Data AnalyticsBusiness case for Big Data Analytics
Business case for Big Data AnalyticsVijay Rao
 
Telco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsTelco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsAlan Quayle
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Edureka!
 
Big Data and BI Best Practices
Big Data and BI Best PracticesBig Data and BI Best Practices
Big Data and BI Best PracticesYellowfin
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIMC Institute
 
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...DATAVERSITY
 
Big Data Trends - WorldFuture 2015 Conference
Big Data Trends - WorldFuture 2015 ConferenceBig Data Trends - WorldFuture 2015 Conference
Big Data Trends - WorldFuture 2015 ConferenceDavid Feinleib
 
Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformNeo4j
 
Enabling digital business with governed data lake
Enabling digital business with governed data lakeEnabling digital business with governed data lake
Enabling digital business with governed data lakeKaran Sachdeva
 
McKinsey Big Data Overview
McKinsey Big Data OverviewMcKinsey Big Data Overview
McKinsey Big Data Overviewoptier
 

Was ist angesagt? (20)

How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
 
Data Lake: A simple introduction
Data Lake: A simple introductionData Lake: A simple introduction
Data Lake: A simple introduction
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use Cases
 
Big data case study collection
Big data   case study collectionBig data   case study collection
Big data case study collection
 
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
 
Business case for Big Data Analytics
Business case for Big Data AnalyticsBusiness case for Big Data Analytics
Business case for Big Data Analytics
 
Telco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsTelco Big Data 2012 Highlights
Telco Big Data 2012 Highlights
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
 
Big Data and BI Best Practices
Big Data and BI Best PracticesBig Data and BI Best Practices
Big Data and BI Best Practices
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data Science
 
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
Smart Data Webinar: Choosing the Right Data Management Architecture for Cogni...
 
Big Data Trends - WorldFuture 2015 Conference
Big Data Trends - WorldFuture 2015 ConferenceBig Data Trends - WorldFuture 2015 Conference
Big Data Trends - WorldFuture 2015 Conference
 
Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data Platform
 
Enabling digital business with governed data lake
Enabling digital business with governed data lakeEnabling digital business with governed data lake
Enabling digital business with governed data lake
 
McKinsey Big Data Overview
McKinsey Big Data OverviewMcKinsey Big Data Overview
McKinsey Big Data Overview
 

Ähnlich wie Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIBOS conference in Dubai

The Data Axioms lecture-overview-big data-usama-9-2015
The Data Axioms lecture-overview-big data-usama-9-2015The Data Axioms lecture-overview-big data-usama-9-2015
The Data Axioms lecture-overview-big data-usama-9-2015CMR WORLD TECH
 
Gai 2013 q3_seminar_20130826_03
Gai 2013 q3_seminar_20130826_03Gai 2013 q3_seminar_20130826_03
Gai 2013 q3_seminar_20130826_03GA InfoMart Ltd
 
Big data and your career final
Big data and your career finalBig data and your career final
Big data and your career finalMarina Kerbel
 
big data analytics pgpmx2015
big data analytics pgpmx2015big data analytics pgpmx2015
big data analytics pgpmx2015Sanmeet Dhokay
 
It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 201...
 It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 201... It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 201...
It’s Not About Big Data – It’s About Big Insights - SAP Webinar - 20 Aug 201...Edgar Alejandro Villegas
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataUmair Shafique
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big dataVedanand Singh
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Big Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInBig Data Ecosystem @ LinkedIn
Big Data Ecosystem @ LinkedInMinh-Hoang Nguyen
 
BigDataFinal.pptx
BigDataFinal.pptxBigDataFinal.pptx
BigDataFinal.pptxPentaTech
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAkshata Humbe
 
Turning big data to business outcomes
Turning big data to business outcomes Turning big data to business outcomes
Turning big data to business outcomes Rolta
 
Future of Power: Big Data - Søren Ravn
Future of Power: Big Data - Søren RavnFuture of Power: Big Data - Søren Ravn
Future of Power: Big Data - Søren RavnIBM Danmark
 
Big data
Big dataBig data
Big dataRiya
 
How to get started in extracting business value from big data 1 of 2 oct 2013
How to get started in extracting business value from big data 1 of 2 oct 2013How to get started in extracting business value from big data 1 of 2 oct 2013
How to get started in extracting business value from big data 1 of 2 oct 2013Jaime Nistal
 
Content1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxContent1. Introduction2. What is Big Data3. Characte.docx
Content1. Introduction2. What is Big Data3. Characte.docxdickonsondorris
 


Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIBOS conference in Dubai

  • 1. Taming the BigData Beast for Value & Insights Usama Fayyad, Ph.D. Executive Chairman -Oasis500 Chairman & CTO - Blue Kangaroo Twitter: @usamaf Sept 18th, 2013 Financial Times Live Dubai - UAE Usama Fayyad, Executive Chairman usama@oasis500.com Twitter: @usamaf BigData Overview – Copyright Usama Fayyad © 2013 1
  • 2. Outline • Big Data all around us • Introduction to Data Mining and Predictive Analytics over BigData • Some of the issues in BigData • On-line data and facts • Case studies on on-line marketing: Yahoo! Big Data • Summary and conclusions
  • 3. What Matters in the Age of Analytics? 1. Being able to exploit all the data that is available • not just what you've got available • what you can acquire and use to enhance your actions 2. Proliferating analytics throughout the organization • make every part of your business smarter 3. Driving significant business value • embedding analytics into every area of your business can help you drive top-line revenues and/or bottom-line cost efficiencies
  • 4. Why Big Data? A new term, with associated “Data Scientist” positions: • Big Data is a mix of structured, semi-structured, and unstructured data: – Typically breaks barriers for traditional RDB storage – Typically breaks limits of indexing by “rows” – Typically requires intensive pre-processing before each query to extract “some structure”, usually using Map-Reduce type operations • The above leads to “messy” situations with no standard recipes or architecture: hence the need for “data scientists” – Conduct “Data Expeditions” – Discovery and learning on the spot
  • 5. What Makes Data “Big Data”? • Big Data is characterized by the 3 V’s: – Volume: larger than “normal”, challenging to load/process • Expensive to do ETL • Expensive to figure out how to index and retrieve • Multiple dimensions that are “key” – Velocity: rate of arrival poses real-time constraints on what are typically “batch ETL” operations • If you fall behind, catching up is extremely expensive (replicate very expensive systems) • Must keep up with rate and service queries on-the-fly – Variety: mix of data types and varying degrees of structure • Non-standard schema • Lots of BLOBs and CLOBs • DB queries don’t know what to do with semi-structured and unstructured data.
  • 6. Today’s Data: e.g. Yahoo! User DNA • Data categories: Registration, Campaign, Behavior, Unknown • Lives in SF • Male, age 32 • Lawyer • Checks Yahoo! Mail daily via PC & Phone • Clicked on Sony Plasma TV SS ad • Searched on: “Hillary Clinton” • Searched on: “Italian restaurant Palo Alto” • Spends 10 hours/week on the internet • Purchased Da Vinci Code from Amazon • Searched on from London last week • Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people
  • 7. How Data Explodes: really big • Web searches on this person, hobbies, work, location • Professional network: reputation • MetaData on everything • Lawyer • Lives in SF • Male, age 32 • Clicked on Sony Plasma TV SS ad • Searched on: “Hillary Clinton” • Checks Yahoo! Mail daily via PC & Phone • Searched on: “Italian restaurant Palo Alto” • Searched on from London last week • Spends 10 hours/week on the internet • Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people • Purchased Da Vinci Code from Amazon • Blogs, publications, news, local papers, job info, accidents • Likes & friends’ likes • Social Graph (FB)
  • 8. The Distinction between “Data” and “Big Data” is fast disappearing • Most real data sets nowadays come with a serious mix of semi-structured and unstructured components: – Images – Video – Text descriptions and news, blogs, etc. – User and customer commentary – Reactions on social media: e.g. Twitter is a mix of data anyway • Using standard transforms, entity extraction, and new generation tools to transform unstructured raw data into semi-structured analyzable data
  • 9. Text Data: The Big Driver • We speak of “big data” and the “Variety” in the 3 V’s • Reality: the biggest driver of growth of Big Data has been text data – Most work on analysis of “images” and “video” data has really been reduced to analysis of surrounding text – Nowhere more so than on the internet • Map-Reduce was popularized by Google to address the problem of processing large amounts of text data: – Many operations, each a simple operation but done at large scale – Indexing a full copy of the web – Frequent re-indexing
  • 10. Reality Check on Brand/Reputation What are people saying about my brand on Social Media?
  • 12. Reality Check Surely there are companies I can work with that can help me make this practical?
  • 14. Reality Check So what do technology people worry about these days?
  • 15. To Hadoop or not to Hadoop? When to use techniques requiring Map-Reduce and grid computing? • Typically organizations try to use Map-Reduce for everything to do with Big Data – This is actually very inefficient and often irrational – Certain operations require specialized storage • Updating segment memberships over large numbers of users • Defining new segments on user or usage data
  • 16. To Hadoop or not to Hadoop? When to use techniques requiring Map-Reduce and grid computing? • Map-Reduce is useful when a very simple operation is to be applied on a large body of unstructured data – Typically this is during entity and attribute extraction – Still need Big Data analysis post Hadoop • Map-Reduce is not efficient or effective for tasks involving deeper statistical modeling – Good for gathering counts and simple (sufficient) statistics • E.g. how many times a keyword occurs, quick aggregation of simple facts in unstructured data, estimates of variances, density, etc. – Mostly pre-processing for Data Mining
  • 17. Hadoop Use Cases by Data Type • Content and Preference Data 24% • IT Log Data 19% • Text and Language Data 16% • Social Data 11% • Advertising Data 10% • Science Data 7% • CRM Data 4% • Financial Trading Data 4% • Sensor Data 2% • Supply Chain Data 2% • ERP Financial Data 1%
  • 18. Analysis & Programming Software PIG HIPI
  • 20. Many Business Uses (analytic technique: uses in business) • Marketing and sales: identify potential customers; establish the effectiveness of a campaign • Understanding customer behavior: model churn, affinities, propensities, … • Web analytics & metrics: model user preferences from data, collaborative filtering, targeting, etc. • Fraud detection: identify fraudulent transactions • Credit scoring: establish credit worthiness of a customer requesting a loan • Manufacturing process analysis: identify the causes of manufacturing problems • Portfolio trading: optimize a portfolio of financial instruments by maximizing returns & minimizing risks • Healthcare application: fraud detection, cost optimization, detection of events like epidemics, etc. • Insurance: fraudulent claim detection, risk assessment • Security and surveillance: intrusion detection, sensor data analysis, remote sensing, object/person detection, link analysis, etc.
  • 21. So Internet is a big place with 2B+ users and lots happening? • Do we understand what each individual is trying to achieve? • Do we understand what a community’s sentiment is? • Do we understand context and content?
  • 22. Social platforms 2013 • User accounts on Facebook: – 971M – How many fake profiles? – 83M (per FB 8/2012 report) • How many users on LinkedIn? – 159.3M (as of 1/2013) • How many Google+ users? – 343 million active users in Q4 2012 » Sources: International Business Times 1/28/2013 – http://www.ibtimes.com/google-plus-becomes-worlds-no-2-social-network-afterfacebook-knocking-twitter-1042956
  • 23. How Do People Spend Their On-line Time? • On-line Shopping? 5% • Searches? 21% • Email/Communication? 19% • Reading Content? 20% • Social Networking? 22% • Multimedia Sites? 13%
  • 24. Interesting Events • Google: How many searches in 2012? – More than 1.2 Trillion (source: Google) – Estimates 1B to 3B per day • Twitter: How many Tweets/day? – 500M (per CEO Dick Costolo at IAB Engage, 10/2012) • Facebook: Updates per day? – More than 1B • YouTube: Views/day – 4 Billion hours/month, 4B views/day in 1/2012 – 72 hours of video uploaded every minute! • Social Networks: users who have used sites for spying on their partners? – 56% *Sources: Feb. 2012, compiled from Comscoredatamine.com, Nielsen.com, thisDigitalLife.com, PewInternet.org
  • 25. Interesting Events • Country with highest number of online friends? – Brazil: 481 friends per user – Japan has the least at 29 • Country with maximum time spent shopping on-line? – China: 5 hours/week *Sources: Feb. 2012, compiled from Comscoredatamine.com, Nielsen.com, thisDigitalLife.com, PewInternet.org
  • 26. So Internet is a big place with lots happening? • Do we understand what each individual is trying to achieve? • Do we understand what a community’s sentiment is? • Do we understand context and content?
  • 27. Turning the 3 V’s of Big Data Into Value
  • 28. Turning the three Vs of Big Data into Value Understand context and content • What are appropriate ads? • Is it OK to associate my brand with this content? • Is content sad? happy? serious? informative? Understand community sentiment • What is the emotion? • Is it negative or positive? • What is the health of my brand online? Understand user intent • What is each individual trying to achieve? • Critical in monetization, advertising, etc.
  • 29. Understanding Context
  • 30. Reality Check So who is the company we think is best at handling BigData?
  • 31. Biggest BigData in Advertising? Understanding Context for Ads
  • 32. The Display Ads Challenge Today What Ad would you place here?
  • 33. The Display Ads Challenge Today Damaging to Brand?
  • 34. The Display Ads Challenge Today What Ad would you place here?
  • 35. The Display Ads Challenge Today • Irrelevant and Damaging to Brand • Completely Irrelevant
  • 36. NetSeer: Intent for Display • Currently Processing 4 Billion Impressions per Day
  • 37. Problem: Hard to Understand User Intent • Contextual Ad served by Google • What NetSeer Sees:
  • 38. Turning the three Vs of Big Data into Value Understand context and content • What are appropriate ads? • Is it OK to associate my brand with this content? • Is content sad? happy? serious? informative? Understand community sentiment • What is the emotion? • Is it negative or positive? • What is the health of my brand online? Understand user intent • What is each individual trying to achieve? • Critical in monetization, advertising, etc.
  • 39. User Intent
  • 40. User Intent Case Study #1 Yahoo! Behavioral Targeting
  • 41. Yahoo! – One of the Largest Destinations on the Web More people visited Yahoo! in the past month than: • Use coupons • Vote • Recycle • Exercise regularly • Have children living at home • Wear sunscreen regularly. 80% of the U.S. Internet population uses Yahoo! – Over 600 million users per month globally! • Global network of content, commerce, media, search and access products • 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc. • 25+ terabytes of data collected each day • Representing 1000’s of cataloged consumer behaviors. Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers. Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
  • 42. Yahoo! Big Data – A league of its own… [Charts: Terabytes of Warehoused Data; Millions of Events Processed Per Day. Entities compared: Y! LiveStor, Y! Panama Warehouse, Y! Main warehouse, YSM, Y! Global, Amazon, Korea Telecom, AT&T, Walmart, VISA, NYSE, SABRE] • Grand challenge problems of data processing: travel, credit card processing, stock exchange, retail, internet • Y! data challenge exceeds others by 2 orders of magnitude
  • 43. Behavioral Targeting (BT) [Diagram labels: Search, Content, Search Clicks, Ad Clicks, BT] • Targeting ads to consumers whose recent behaviors online indicate which product category is relevant to them
  • 44. Yahoo! User DNA • Data categories: Registration, Campaign, Behavior, Unknown • Lives in SF • Male, age 32 • Lawyer • Checks Yahoo! Mail daily via PC & Phone • Clicked on Sony Plasma TV SS ad • Searched on: “Hillary Clinton” • Searched on: “Italian restaurant Palo Alto” • Spends 10 hours/week on the internet • Purchased Da Vinci Code from Amazon • Searched on from London last week • Has 25 IM Buddies, moderates 3 Y! Groups, and hosts a 360 page viewed by 10k people • On a per consumer basis: maintain a behavioral/interests profile and profitability (user value and LTV) metrics
  • 45. How it works | Network + Interests + Modelling • Varying Product Purchase Cycles: analyze predictive patterns for purchase cycles in over 100 product categories • Rewarding Good Behaviour: in each category, build models to describe behaviour most likely to lead to an ad response (i.e. click) • Identify Most Relevant Users: score each user for fit with every category, daily • Match Users to the Models: target ads to users who get the highest “relevance” scores in the targeting categories
  • 46. Recency Matters, So Does Intensity • Active now… • …and with feeling
  • 47. Differentiation | Category specific modelling • Example 1: Category Automotive • Example 2: Category Travel/Last Minute [Charts for each example: intensity score vs. time, with an “Intense Click Zone”] • Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click • Different models allow us to weight and determine intensity and recency
  • 48. Differentiation | Category specific modelling • Example 1: Category Automotive [Chart: intensity score vs. time, with an “Intense Click Zone”] • User is in the Intense Click Zone • With no further activity, decay takes effect • Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click • Different models allow us to weight and determine intensity and recency
  • 49. Automobile Purchase Intender Example • A test ad-campaign with a major Euro automobile manufacturer – Designed a test that served the same ad creative to test and control groups on Yahoo – Success metric: performing specific actions on Jaguar website • Test results: 900% conversion lift vs. control group – Purchase Intenders were 9 times more likely to configure a vehicle, request a price quote or locate a dealer than consumers in the control group – ~3x higher click through rates vs. control group
  • 50. Mortgage Intender Example • Example: Mortgages • Results from a client campaign on Yahoo! Network: +626% Conv Lift, +122% CTR Lift • We found: 1,900,000 people looking for mortgage loans • Example search terms qualified for this target: Mortgages, Home Loans, Refinancing • Example Yahoo! pages visited: Financing section in Real Estate; Mortgage Loans area in Finance; Real Estate section in Yellow Pages • Date: March 2006 • Source: Campaign click-through rate lift is determined by Yahoo! internal research. Conversion is the number of qualified leads from clicks over number of impressions served. Audience size represents the audience within this behavioral interest category that has the highest propensity to engage with a brand or product and to click on • Ditech
  • 51. Experience summary at Yahoo! • Dealing with one of the largest data sources (25 terabytes per day) • BT business was grown from $20M to about $500M in 3 years of investment! • BigData critical to operations – Ad targeting creates huge value – Right teams to build technology (3 years of recruiting) – Search is a BigData problem • Big demands for grid computing (Hadoop) – Not all BigData can be handled via Hadoop – Spun off BigData segmentation data platform: nPario
  • 52. Lessons Learned • A lot more data than qualified talent – Finding talent in BigData is very difficult – Retaining talent in BigData is even harder • At Yahoo! we created a central group that drove huge value to the company • Data people need to feel like they have critical mass – Makes it easier to attract – Makes it easier to retain • Drive data efforts by business need, not by technology priorities – Chief Data Officer role at Yahoo!, now popular
  • 53. BigData Analytics for Organizations • Key to competitive intelligence: – Understand context – Understand intent • Key to understanding consumer trends through social media analysis – Brand issues – Trend issues – Anticipating the next shift
  • 54. Big Picture on Big Data Analytics Key points
  • 55. Sometimes, Simple is Very Powerful! Retaining New Yahoo! Mail Registrants
  • 56. Integrating Mail and News • Data showed that users often check their mail and news in the same session – But no easy way to navigate to Y! News from Y! Mail • Mail users who also visit Y! News are 3X more active on Yahoo – Higher retention, repeat visits and time spent on Yahoo
  • 57. “In the news” Module on Mail Welcome Page • Increased retention on Mail for light users by 40%! – Est. incremental revenue of $16m a year on Y! Mail alone
  • 58. Threats & Opportunities • Data world is changing, especially in on-line businesses • Major shifts from relational DB to NoSQL, document-oriented stores • Connecting new world to “old” world? – Convenience of execution: integration with data platforms – Appropriateness of algorithms to BigData – Unstructured data algorithms: • Text, semi-structured and unstructured data • Entity extraction a must • Appropriate theory and probability distributions (power laws, fat tails) • Sparse data – Model management and proper aging of models – Getting to basics so we can decide what models to use: • Understanding noise and distributions • Data tours
  • 59. Thank You! & Questions • Usama Fayyad: usama@fayyad.com, usama_fayyad@yahoo.com • Twitter: @Usamaf • +1-206-529-5123 • www.BlueKangaroo.com • www.Oasis500.com • www.open-insights.com
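
Slide 16 says Map-Reduce is good for gathering counts and simple (sufficient) statistics, such as keyword occurrences and variance estimates. The sketch below illustrates that idea in plain Python rather than on Hadoop: the function names and the toy documents are assumptions made for illustration, not anything from the talk. The point it tries to show is that each partition can be summarized independently (map) and the partial summaries merged by simple addition (reduce).

```python
from collections import Counter
from functools import reduce

# --- Keyword counts -------------------------------------------------
def map_keywords(document: str) -> Counter:
    """Map step: per-document keyword counts (naive whitespace tokens)."""
    return Counter(document.lower().split())

def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merging partial counts is plain addition."""
    return left + right

# --- Sufficient statistics for a variance estimate -------------------
def map_stats(values):
    """Map step: (count, sum, sum of squares) for one partition."""
    return (len(values), sum(values), sum(v * v for v in values))

def merge_stats(a, b):
    """Reduce step: sufficient statistics add component-wise."""
    return tuple(x + y for x, y in zip(a, b))

def variance(stats):
    """Population variance from the merged sufficient statistics."""
    n, total, total_sq = stats
    mean = total / n
    return total_sq / n - mean * mean

# Toy data standing in for partitions spread over a cluster (hypothetical).
docs = ["big data is big", "data mining over big data"]
counts = reduce(merge_counts, map(map_keywords, docs), Counter())

partitions = [[1.0, 2.0, 3.0], [4.0, 5.0]]
merged = reduce(merge_stats, map(map_stats, partitions))

print(counts["big"], counts["data"], variance(merged))
```

Because both reduce steps are associative, partitions can be merged in any order, which is what makes this kind of pre-processing a natural fit for Map-Reduce. Deeper statistical modeling usually lacks such a simple mergeable summary, which is consistent with the slide's caution.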

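Slides 45 to 48 describe scoring each user per category, weighting behaviours for intensity and letting the score decay with no further activity. A minimal sketch of one way such a score could work follows. The talk does not give the actual Yahoo! model, so the exponential half-life decay, the event weights, the half-lives and the "Intense Click Zone" threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str        # "page_view", "search", "search_click" or "ad_click"
    age_days: float  # days elapsed since the event

# Hypothetical per-category models: weights capture intensity,
# half_life_days captures how quickly recency fades.
CATEGORY_MODELS = {
    "automotive": {
        "weights": {"page_view": 1.0, "search": 2.0,
                    "search_click": 3.0, "ad_click": 5.0},
        "half_life_days": 14.0,   # long purchase cycle (assumed)
        "intense_zone": 10.0,     # assumed threshold
    },
    "travel_last_minute": {
        "weights": {"page_view": 1.0, "search": 3.0,
                    "search_click": 4.0, "ad_click": 6.0},
        "half_life_days": 2.0,    # short purchase cycle (assumed)
        "intense_zone": 10.0,
    },
}

def intensity_score(events, category: str) -> float:
    """Sum of event weights, each halved every half_life_days of age."""
    model = CATEGORY_MODELS[category]
    half_life = model["half_life_days"]
    return sum(
        model["weights"].get(e.kind, 0.0) * 0.5 ** (e.age_days / half_life)
        for e in events
    )

def in_intense_click_zone(events, category: str) -> bool:
    """True while the decayed score stays above the category threshold."""
    return intensity_score(events, category) >= CATEGORY_MODELS[category]["intense_zone"]

def behaviour(age_days: float):
    """The slide's example: 5 pages, 2 search keywords, 1 search click, 1 ad click."""
    kinds = ["page_view"] * 5 + ["search"] * 2 + ["search_click", "ad_click"]
    return [Event(kind, age_days) for kind in kinds]

for category in CATEGORY_MODELS:
    print(category,
          intensity_score(behaviour(0.0), category),
          intensity_score(behaviour(14.0), category))
```

Under these assumed parameters, the same behaviour keeps a user in the automotive zone far longer than in the last-minute travel zone, which is the kind of category-specific difference the slides point to.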
Editor's Notes

  1. One of the reasons Big Data is growing is to help companies analyze and react to conversations happening online.
  2. Imagine cramming this into your Enterprise Data Model. With dozens and dozens of producers of data, how do you integrate all of this “unstructured data”?
  3. Wholly unscientific sampling, sourced from the wiki.apache.org “powered by” page and private meetings.
  4. Add every company; Jython
  5. http://blogs.the451group.com/information_management/files/2011/04/Figures-Aslett_web.jpg
  6. Besides being much more brand safe and avoiding the very embarrassing ad placements we reviewed a few slides back, we do the best job of eliminating the wasted impression. Here is an example: the Google display network serves a DeWalt power tool ad (nail gun) to a page with the keyword match “NailGun”; however, this content is about a Java programming app. On the right is an example of our technology engine and dashboard, identifying not only the concepts that live on this particular web page but also the closely related concept groups. So we see the page as being all about Java software programming, not power tools.