Getting Started with Big Data and Making an Impact

@TOUG Jan. 22, 2014
Ian Abramson
EPAM Systems, Canada
January 2014

Confidential
Big Data: The Silver Bullet?

Agenda
Introductions and Goals

What is Big Data

Technology Choices

Making an Impact with Data Science

Use Cases

About Me
• Degree in Applied Mathematics
• Over 20 years with Oracle software
• Over 10 years with data warehouses
• Big Data Analyst
• Author of numerous Oracle books
• Blogger: http://ians-oracle.blogspot.com/
• Oracle ACE
• IOUG Past-President
• TOUG Board Member
• Toronto based
• Twitter: @iabramson
WHERE IS BIG DATA?

Why Big Data?
• New data sources
• Unprecedented volume
• Real-world issues
– Data systems are reaching capacity, requiring high-cost alternatives
– Archive data is too far offline
– Organizations require cost-effective options
– Retain all data for future analysis

Big Data Defined

Roger Magoulas (O’Reilly Research): “Data becomes ‘Big Data’ when the size of the data becomes a part of the problem.”

Gartner: Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

IDC: Big Data is a term/concept, which is used as a generic name for a “generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis”.

Wikipedia: Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

The Attributes of Big Data
• Classic data attributes:
– Volume
– Velocity
– Variety

• Big Data technical attributes:
– Massive, parallel computing environment
– Highly scalable computing clusters, including cloud
• Three main technical requirements:
– A storage medium that can accommodate large volumes of stored and streaming data
– Computing horsepower and an architectural approach that process the data where it resides, rather than extracting it and processing it elsewhere
– A programming model that performs computations in a highly parallel, scalable environment
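The last two requirements (process data where it lives, compute in parallel) are what the MapReduce model on Hadoop addresses. Below is a minimal single-machine sketch of the map, shuffle, and reduce phases, assuming each input "partition" stands in for an HDFS block held on a different node. It uses Python's thread pool, not Hadoop, so it illustrates the data flow only; the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(partition):
    """Emit (word, 1) for every word in one partition (one 'block' of lines)."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item):
    """Combine all values for one key; here, a simple sum."""
    key, values = item
    return key, sum(values)


def word_count(partitions, workers=4):
    """Run map tasks in parallel, shuffle, then run reduce tasks in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, partitions))
        grouped = shuffle(mapped)
        return dict(pool.map(reduce_phase, grouped.items()))


if __name__ == "__main__":
    blocks = [["big data is big"], ["data stays where it lives"]]
    print(word_count(blocks))
```

In a real cluster the map tasks are scheduled on the nodes that already store each block, which is how the paradigm avoids moving the data to the computation.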

Challenges for Big Data

http://tdwi.org/blogs/fern-halper/2013/10/four-big-data-challenges.aspx

Big Data and Data Warehouse – War or Peaceful Coexistence?

• The problem: different uses need different schemas and different partitioning. In most cases the requirements are orthogonal, so it is impossible to provide data partitioning/indexing that is optimal for everybody.
• The ideal goal: acquire and store data “as is” and access it using multiple models. This needs a powerful artificial intelligence knowledge base and data access code generators.
• It will never be optimal for everybody without huge redundancy.
• The problems are less painful if most of the data is read anyway: good for analytics, not good for OLTP.
• Eventually, Big Data platforms will become DW platforms with well-developed access interfaces.
• Until then: acquire and store, then distribute on demand to conventional DWs and data marts.
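Storing data “as is” and applying a model only at access time is often called schema-on-read. A minimal sketch, assuming raw events land as JSON lines and each consumer applies its own projection when it reads; the event fields and function names are hypothetical.

```python
import json

# Raw events stored exactly as received (hypothetical fields); no schema is enforced on write.
RAW_EVENTS = [
    '{"store": "S1", "sku": "A", "qty": 2, "price": 3.5, "channel": "pos"}',
    '{"store": "S2", "sku": "B", "qty": 1, "price": 10.0}',
    '{"store": "S1", "sku": "B", "qty": 3, "price": 10.0, "channel": "web"}',
]


def read_events(lines):
    """Parse raw lines lazily; structure is interpreted only when the data is read."""
    for line in lines:
        yield json.loads(line)


def sales_by_store(lines):
    """Access model 1: revenue per store, e.g. to feed a data mart."""
    totals = {}
    for e in read_events(lines):
        totals[e["store"]] = totals.get(e["store"], 0.0) + e["qty"] * e["price"]
    return totals


def units_by_channel(lines):
    """Access model 2 over the same raw data; a missing field gets a default."""
    totals = {}
    for e in read_events(lines):
        channel = e.get("channel", "unknown")
        totals[channel] = totals.get(channel, 0) + e["qty"]
    return totals
```

The trade-off noted on the slide shows up here: every read re-parses the full raw data, which suits scan-heavy analytics but not OLTP-style point lookups.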

The New Data Architecture
[Diagram components: data sources (Operational Systems, Enterprise Data, Social & Clickstream, Sensor Generated, Public Data, Historical Data, Other New Sources); Big Data platform (Hadoop: HDFS, Map/Reduce); Data Archive; ODS; Data Warehouse/BI/Analytics]

TECHNOLOGY CHOICES

The Choices for Your Data

RDBMS
- High Concurrency
- TB Storage
- Indexed reads
- Efficient updates
- Caching
- Highly secure

Analytic Appliances
- Scalable
- Medium Concurrency
- High Volume Processing (Postgres)
- No indexes
- TB+
Netezza (128TB/rack)
Oracle (300TB/rack)

NoSQL
- Highly Scalable
- High Concurrency
- Storage Options
- Updates
- Real-time Capable
- Rudimentary indexes
- TB+ Capacity

Hadoop
- Highly scalable
- Low concurrency
- Distributed Storage
- Complex Access
- Security (TBD)
The Open Source/Big Data Landscape

http://www.bigdata-startups.com/open-source-tools/

Hadoop In Detail

Reference: http://blog.blazeclan.com/252/

Hadoop Distributions

For example, if you choose Cloudera…

Comparing Hadoop Distributions

http://www.infoworld.com/d/business-intelligence/enterprise-hadoop-big-data-processing-made-easier-184330?page=0,5
Big Data’s Technical Challenges
• Disaster recovery
• Security

• Data consistency
• Workload management

• Reprocessing
• Troubleshooting

• Performance
DATA SCIENCE

Big Data vs. BI presentation viewpoint

IMPACT

Questions for BI and Big Data
• Sample questions for BI
– What is my sales volume by time, by region, by store, by season?

– What is the average review rating by product category and by product? How are reviews changing over time, and what are the trends?

• Sample questions for Big Data/Data Science
– How do changes in review ratings impact sales?

– What is the time lag between review rating change and sales
volume change?
– What products are purchased together and can I improve product
recommendations?
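The time-lag question can be approached with lagged correlation: shift sales against ratings and see which lag gives the strongest correlation. A minimal sketch, assuming both series are already aggregated to the same period (e.g. weekly) and aligned; the function names are hypothetical, and a strong correlation at some lag is a hypothesis to test, not proof that ratings drive sales.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_lag(ratings, sales, max_lag):
    """Return (lag, r) such that sales[t + lag] correlates most strongly with ratings[t]."""
    if len(ratings) != len(sales):
        raise ValueError("series must be aligned and of equal length")
    if not 0 <= max_lag <= len(ratings) - 3:
        raise ValueError("max_lag leaves too few overlapping periods")
    best = None
    for lag in range(max_lag + 1):
        # Pair each period's rating with the sales observed `lag` periods later.
        r = pearson(ratings[: len(ratings) - lag], sales[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```

Each extra period of lag shortens the overlap between the two series, so `max_lag` should stay well below the series length for the correlations to be meaningful.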

DATA SCIENCE

Data Science: Skills and Science

• Purpose: State the problem
• Research: Discover information about the topic
• Hypothesis: Predict the outcome
• Experiment: Develop a process to test the hypothesis
• Analysis: Record the results
• Conclusion: Compare hypothesis and results
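The Hypothesis, Experiment, Analysis, and Conclusion steps can be made concrete with a simple statistical test. A minimal sketch, assuming the hypothesis is that some change (for example, a promotion) shifts the mean of a metric between two groups; it uses a permutation test, which is one reasonable choice among several, and the function name is hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabelings whose mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # The +1 terms count the observed labeling itself and keep p above zero.
    return (extreme + 1) / (n_permutations + 1)
```

A small p-value supports the hypothesis (Conclusion step); a large one means the observed difference is easily explained by chance.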

Data Science Team
Each team would include:
•

Data Science Analyst – excellent communication skills, science and analytical
background.

•

Data Science Researcher/Solution Architect – good communication skills, good statistics/math, working knowledge of 2 of the following data science libraries and tools: Mahout (or any other machine learning library), RHadoop, R, SAS, SPSS

•

Data Science Technologist – acceptable communication skills, 25% deployable to the client site (at a minimum, a few should be deployable; others can be offshore), good developer, working knowledge of Big Data and related technologies

•

Data Science Presentation Engineer – knowledge of BI and presentation tools

Nordstrom’s Big Data Team Mission:
“Delighting Customers through data-driven
products”

USING BIG DATA

Data Science Sample use cases

Top 10 Use Cases (2013 Computerworld)
1. Modeling Risk
2. Customer Churn Analysis
3. Recommendation Engines
4. Ad Targeting
5. POS Transaction Analysis
6. Analysis of network data to predict future failures
7. Threat Analysis
8. Trade Surveillance
9. Search Quality
10. Data Sandbox
http://www.computerworld.com.sg/resource/storage/iiis-2013-technical-workshops/?page=2
The Big Data of Dating
• From analysis of match.com dating patterns:
• 21+ million members
• 100+ million hits per month
– January 2nd is the busiest day for people to sign up on dating sites
– Women get 60% more attention if their photo is taken indoors
– Men get 19% more attention if theirs is taken outside
– Full-body photos boost both sexes’ success by 203%
– Posing with animals or your best friends might seem cute, but it actually reduces your popularity by 53% (men) and 42% (women)
– Men get 8% fewer messages if they put up selfies
– Mentioning words like “divorce” and “separated” gets men 52% more messages
– Women who are more forward, using words like dinner, drinks or lunch in the first message, get 73% more replies, while men should play it cooler: those who mention the same words in their opening message get 35% fewer replies

Use Case Development
Business Stakeholders → Business Questions → Identify Business Value → Define Success Criteria → Develop Hypothesis and Identify Data Sources → Iterate results and develop data for goals
Use Case Checklist
• Title: an active description which identifies the goals of the primary actor
• Characteristics:
– Primary actor
– Goal in Context
– Scope
– Level
– Stakeholders and Interests
– Precondition
• Success criteria:
– Precondition
– Minimal Guarantees
– Success Guarantees
– Trigger
– Main Success Scenario
– Extensions
• Technology & Data Variations List
• Related Information

Reference: Alistair Cockburn
EXPEDIA CASE STUDY

Archive Use Case
• 1.5 petabytes of continuous ingestion data
• One of the largest Hadoop clusters in the world
• 80% open source EDW

Analytic Infrastructure: Call Center and Online data; Informatica transformation & aggregation; Staging and Historical Analysis

Customer Benefits
• Avoided massive cost of new DW infrastructure
• Able to keep and analyze historical transactions
• Reduce risk of DW replacement
• Able to scale on demand using low-cost servers

Transaction Volume
• > 500 GB daily increases from all sources: transaction, social, contact center

Use Case: Sales Analysis
Sales per sq.ft.: Changes Over time
• Fitting a no-intercept line to the scatter of sales against sales floor area gives a visual baseline Sales-per-Sq.Ft. (SpSF) for each year.
Mathematically, the SpSF measure is the slope coefficient of the trend: 392.51 CAD/Sq.Ft. in 2011 vs. 373.76 CAD/Sq.Ft. in 2012.

[Chart: sales vs. sales floor area with SpSF trend lines for 2011 and 2012; chart labels read “417 in 2011” and “417 in 2012”]
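The SpSF slope comes from least squares with the intercept fixed at zero, which has the closed form slope = Σ xᵢyᵢ / Σ xᵢ². A minimal sketch, assuming x is each store’s sales floor area in square feet and y its annual sales in CAD; the function name and the figures in the checks are hypothetical.

```python
def sales_per_sq_ft(floor_areas, sales):
    """Least-squares slope of a line forced through the origin: sum(x*y) / sum(x*x)."""
    if len(floor_areas) != len(sales) or not floor_areas:
        raise ValueError("need two non-empty series of equal length")
    sxy = sum(x * y for x, y in zip(floor_areas, sales))
    sxx = sum(x * x for x in floor_areas)
    return sxy / sxx
```

Fixing the intercept at zero encodes the assumption that a store with no floor area has no sales, so the slope reads directly as a per-square-foot rate that can be compared year over year.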
Looking for Patterns and Anomalies
This chart shows that most stores have their highest sales on Saturday. Store X, however, peaks on Friday and also does well on Mondays. Why?
[Chart: sales by day of week (THU through WED) for each store; y-axis from 0 to 10,000,000]
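Spotting stores like Store X can be automated once daily sales per store are available. A minimal sketch, assuming a hypothetical dict of store → day → sales; it flags stores whose peak day differs from the most common peak day across all stores.

```python
from collections import Counter

DAYS = ["THU", "FRI", "SAT", "SUN", "MON", "TUE", "WED"]


def peak_days(sales_by_store):
    """Map each store to the day with its highest sales (first day wins on ties)."""
    return {
        store: max(DAYS, key=lambda day: daily[day])
        for store, daily in sales_by_store.items()
    }


def atypical_stores(sales_by_store):
    """Return (most common peak day, sorted list of stores that peak on a different day)."""
    peaks = peak_days(sales_by_store)
    typical, _ = Counter(peaks.values()).most_common(1)[0]
    return typical, sorted(s for s, d in peaks.items() if d != typical)
```

This only surfaces the anomaly; answering the slide’s “Why?” still needs investigation (local events, store hours, customer mix).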
Affinity Analysis Use Case
Build a model that provides the foundation for analyzing and understanding the factors that influence year-over-year change in store performance.

• Affinity analysis is an input to:
– Identifying products purchased in tandem
– Providing guidance and recommendations for upsell and cross-sell
– Redesigning stores, layouts and planograms
– Discount plans and promotions
• Identifying customer baskets in different time periods and geographies
• Investigating patterns at the fine line and product levels
• Ranking customer baskets by number of times bought together and revenue contributed
• Clustering of products
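Identifying products purchased in tandem usually starts from pair counts, with support, confidence, and lift as the standard association measures. A minimal sketch, assuming each basket is a list of product names; the function name and sample baskets are hypothetical, and larger datasets would use an algorithm such as Apriori or FP-Growth instead of enumerating every pair.

```python
from collections import Counter
from itertools import combinations


def pair_affinity(baskets, min_support=0.0):
    """Support, confidence, and lift for every product pair bought together.

    support(a, b)   = P(a and b)
    confidence(a->b) = P(b | a)
    lift(a, b)      = P(a and b) / (P(a) * P(b))
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # dedupe and fix pair order within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    results = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        confidence = count / item_counts[a]
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        results.append((a, b, support, confidence, lift))
    # Rank pairs by how often they were bought together.
    results.sort(key=lambda r: (-r[2], r[0], r[1]))
    return results
```

A lift above 1 means the pair occurs together more often than independent purchases would predict, which is the signal behind pairings such as snow scrapers and washer fluid.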

Snow Scrapers and Washer Fluid

Related Baskets

The size of each circle shows how often the basket has been purchased.
Season: 2012-05-16 to 2012-08-28
This kind of analysis can be used for spotting driver products.

1. Potted annuals/plants, Cell-packs/annual plants
2. Potted annuals/plants, Vegetables/plants
3. Potted annuals/plants, Outdoor soils/outdoor lawn & plant care
4. Cell-packs/annual plants, Vegetables/annual plants
Big Data is Evolving
• The industry is evolving
• Hadoop is now 8 years old, having started in 2006 at Yahoo
• CDH 5 recently released
• $2.5B in venture capital in the space
• Hadoop is now considered a standard
• HBase is an example of a project that has not yet become a standard
• Many tools today; which will still be around 5 years from now?
• How to avoid the big data pitfalls?
• 50% of big data projects fail
• Those who succeed do so through focus
• Insight vs. Impact
• Find one problem and fix it
• Data Science
• Change how you do analysis: apply scientific methods
• New and exciting
• Build a hybrid team to develop data solutions
• The team can program, knows math and statistics, and can communicate
The Big Data Adventure
Thank You and Questions
Ian Abramson
EPAM Systems
Toronto, Canada
GMT -5
Mobile phone: +1 (416) 254-9286
Skype: ian.abramson
E-mail: Ian_Abramson@epam.com

SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

TOUG Big Data Challenge and Impact

  • 1. Getting Started with Big Data and Making an Impact. @TOUG, Jan. 22, 2014. Ian Abramson, EPAM Systems, Canada, January 2014. Confidential
  • 2. Big Data: The Silver Bullet?
  • 3. Agenda: Introductions and Goals; What Is Big Data; Technology Choices; Making an Impact with Data Science; Use Cases
  • 4. About Me: Degree in Applied Mathematics; over 20 years with Oracle software; over 10 years with data warehouses; Big Data Analyst; author of numerous Oracle books; blogger: http://ians-oracle.blogspot.com/; Oracle ACE; IOUG Past-President; TOUG Board Member; Toronto based; Twitter: @iabramson
  • 5. WHERE IS BIG DATA?
  • 6. Why Big Data? New data sources; unprecedented volume; real-world issues: data systems are reaching capacity and require high-cost alternatives, archive data is too far offline, organizations need cost-effective options, and organizations want to retain all data for future analysis.
  • 7. Big Data Defined. Roger Magoulas (O'Reilly Research): “Data becomes ‘Big Data’ when the size of the data becomes a part of the problem.” Gartner: “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” IDC: “Big Data is a term/concept, which is used as a generic name for a ‘generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis’.” Wikipedia: “Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.”
  • 8. The Attributes of Big Data. Classic data attributes: Volume, Velocity, Variety. Big Data technical attributes: a massively parallel computing environment; horizontally scalable computing clusters, including cloud. Three main technical requirements: (1) a storage medium that can accommodate large volumes of stored and streaming data; (2) the computing horsepower and an architecture that process data where it resides, rather than extracting it first and processing it elsewhere; (3) a programming model that performs computations in a highly parallel and scalable environment.
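The "highly parallel" programming model in slide 8 is usually illustrated with MapReduce, the model behind Hadoop's Map/Reduce. The sketch below is a single-process word count that only mimics the three phases; the function names and sample text are hypothetical, and a real framework would run each map call on the node that stores its data block.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(block: str) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in one data block; in Hadoop this runs where the block lives."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle_phase(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine all values for each key; different keys could be reduced on different nodes."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(blocks: List[str]) -> Dict[str, int]:
    # Each block is mapped independently, which is what makes the job parallelizable.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data where it lives"]))
```

Because no map call depends on another, adding nodes adds map capacity; only the shuffle moves data across the network.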
  • 9. Challenges for Big Data (http://tdwi.org/blogs/fern-halper/2013/10/four-big-data-challenges.aspx)
  • 10. Big Data and Data Warehouse: War or Peaceful Coexistence? The problem: different uses need different schemas and different partitioning. In most cases the requirements are orthogonal, so no single partitioning/indexing scheme can be optimal for everybody. The ideal goal: acquire and store data “as is” and access it through multiple models, which requires a powerful artificial-intelligence knowledge base and data-access code generators. Storage will never be optimal for everybody without huge redundancy. The problems are less painful if most of the data is read anyway: good for analytics, not good for OLTP. Eventually, Big Data platforms will become DW platforms with well-developed access interfaces. Until then: acquire and store, then distribute on demand to conventional DWs and data marts.
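The "acquire and store as is, access using multiple models" goal in slide 10 is often called schema-on-read. A minimal sketch, assuming raw events arrive as JSON lines: nothing is enforced when the data is stored, and each consumer applies its own schema when reading. The record fields and view functions are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, List

# Raw records kept exactly as they arrived; note the second record has no "channel" field.
RAW_LOG = [
    '{"user": "u1", "store": "S1", "amount": 25.0, "channel": "web"}',
    '{"user": "u2", "store": "S2", "amount": 40.0}',
    '{"user": "u1", "store": "S1", "amount": 15.0, "channel": "store"}',
]


def sales_by_store(raw: List[str]) -> Dict[str, float]:
    """Reporting view: only (store, amount) is interpreted, and only at read time."""
    totals: Dict[str, float] = {}
    for line in raw:
        record = json.loads(line)
        totals[record["store"]] = totals.get(record["store"], 0.0) + record["amount"]
    return totals


def channel_mix(raw: List[str]) -> Counter:
    """A different view over the same raw data; a missing field gets a default instead of failing."""
    return Counter(json.loads(line).get("channel", "unknown") for line in raw)


if __name__ == "__main__":
    print(sales_by_store(RAW_LOG))
    print(channel_mix(RAW_LOG))
```

The trade-off the slide describes shows up directly: every view re-parses the full raw data, which is acceptable for scan-heavy analytics but not for OLTP-style lookups.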
  • 11. The New Data Architecture (diagram). Components: Operational Systems, Enterprise Data, Social & Clickstream, Sensor Generated, Public Data, Historical Data, Other New Sources; Big Data (Hadoop: HDFS, Map/Reduce); Data Archive; ODS; Data Warehouse/BI/Analytics.
  • 13. The Choices for Your Data. RDBMS: high concurrency; TB storage; indexed reads; efficient updates; caching; highly secure. Analytic Appliances: scalable; medium concurrency; high-volume processing (Postgres); no indexes; TB+ (Netezza: 128 TB/rack; Oracle: 300 TB/rack). NoSQL: highly scalable; high concurrency; storage options; updates; real-time capable; rudimentary indexes; TB+ capacity. Hadoop: highly scalable; low concurrency; distributed storage; complex access; security (TBD).
  • 14. The Open Source/Big Data Landscape (http://www.bigdata-startups.com/open-source-tools/)
  • 15. Hadoop in Detail (reference: http://blog.blazeclan.com/252/)
  • 17. For example, if you choose Cloudera…
  • 19. Big Data’s Technical Challenges: disaster recovery; security; data consistency; workload management; reprocessing; troubleshooting; performance.
  • 21. IMPACT: Big Data vs. BI, presentation viewpoint
  • 22. Questions for BI and Big Data. Sample questions for BI: What is my sales volume by time, region, store, and season? What is the average review rating by product category and by product? How do reviews change over time, and what are the trends? Sample questions for Big Data/Data Science: How do changes in review ratings impact sales? What is the time lag between a review-rating change and a change in sales volume? Which products are purchased together, and can I improve product recommendations?
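One way to approach the slide 22 question about the lag between review-rating changes and sales changes is lagged correlation: shift the sales series by k periods and see which shift lines up best with ratings. This is a sketch, assuming equal-length, evenly spaced series and that higher ratings are expected to raise sales (so the largest positive correlation wins); the function names are hypothetical, and correlation alone does not establish that ratings cause the sales change.

```python
from math import sqrt
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(ratings: List[float], sales: List[float], max_lag: int) -> Tuple[int, float]:
    """Return (k, r): the lag in periods where ratings[t] best correlates with sales[t + k]."""
    best = (0, float("-inf"))
    for k in range(max_lag + 1):
        x = ratings[: len(ratings) - k]  # drop the last k ratings ...
        y = sales[k:]                    # ... and the first k sales, so the pairs align
        r = pearson(x, y)
        if r > best[1]:
            best = (k, r)
    return best


if __name__ == "__main__":
    ratings = [3, 4, 2, 5, 3, 4, 1, 5, 2, 4]
    # Hypothetical sales that simply follow ratings two periods later.
    sales = [7, 9] + [10 * r for r in ratings[:-2]]
    print(best_lag(ratings, sales, max_lag=4))
```

In practice the series would be weekly or monthly aggregates per product, and max_lag should stay well below the series length so each correlation uses enough points.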
  • 23. DATA SCIENCE (Data Science Skills). Purpose: state the problem. Research: discover information about the topic. Hypothesis: predict the outcome. Experiment: develop a process to test the hypothesis. Analysis: record the results. Conclusion: compare the hypothesis and the results.
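The Hypothesis, Experiment, Analysis and Conclusion steps in slide 23 can be made concrete with a statistical test. The sketch below uses an exact permutation test, one choice among many and not necessarily what the presenter uses, on a hypothetical hypothesis: "stores that ran a promotion have a higher average basket value than stores that did not". All data and names are illustrative.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def exact_permutation_p_value(treated: Sequence[float], control: Sequence[float]) -> float:
    """One-sided p-value for the hypothesis mean(treated) > mean(control).

    Enumerates every way of relabeling the pooled values into groups of the
    original sizes, so it is exact but only practical for small samples.
    """
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = mean(treated) - mean(control)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(t) - mean(c) >= observed:  # at least as extreme as what we saw
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    promo_stores = [12, 14, 15, 16]  # Experiment: hypothetical average basket values
    other_stores = [8, 9, 10, 11]
    p = exact_permutation_p_value(promo_stores, other_stores)  # Analysis
    print(f"p = {p:.4f}")  # Conclusion: compare p with a threshold chosen in advance, e.g. 0.05
```

Here the promotion stores hold the four largest values, so only 1 of the 70 possible relabelings is as extreme, giving p = 1/70.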
  • 24. Data Science Team. Each team would include: a Data Science Analyst (excellent communication skills, science and analytical background); a Data Science Researcher/Solution Architect (good communication, good statistics/math, working knowledge of at least two of the following data science libraries and tools: Mahout or any other machine learning library, RHadoop, R, SAS, SPSS); a Data Science Technologist (acceptable communication skills, 25% deployable to the client site, with at least a few deployable and others possibly offshore, good developer, working knowledge of Big Data and related technologies); a Data Science Presentation Engineer (knowledge of BI and presentation tools). Nordstrom’s Big Data Team mission: “Delighting Customers through data-driven products”.
  • 26. Data Science Sample Use Cases
  • 27. Top 10 Use Cases (2013 Computerworld): 1. Modeling Risk; 2. Customer Churn Analysis; 3. Recommendation Engines; 4. Ad Targeting; 5. POS Transaction Analysis; 6. Analysis of Network Data to Predict Future Failures; 7. Threat Analysis; 8. Trade Surveillance; 9. Search Quality; 10. Data Sandbox (http://www.computerworld.com.sg/resource/storage/iiis-2013-technical-workshops/?page=2)
  • 28. The Big Data of Dating. From an analysis of match.com dating patterns (21+ million members; 100+ million hits per month): January 2 is the busiest day for people to sign up on dating sites; women get 60% more attention if their photo is taken indoors; men get 19% more attention if theirs is taken outside; full-body photos boost both sexes’ success by 203%; posing with animals or your best friends might seem cute, but it actually reduces popularity by 53% (men) and 42% (women); men get 8% fewer messages if they post selfies; mentioning words like “divorce” and “separated” gets men 52% more messages; women who are more forward, using phrases like “dinner”, “drinks” or “lunch” in the first message, get 73% more replies, while men should play it cooler: men who mention the same words in their opening message get 35% fewer replies.
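Percentages such as "60% more attention" or "a 203% boost" in slide 28 are easiest to read as relative changes against a baseline. Assuming that is how the figures were computed (the slide does not say), a 203% boost means about 3.03 times the baseline, not 2.03 times, and a 53% reduction leaves 47% of it. The helper names below are hypothetical.

```python
def percent_change(new_value: float, baseline: float) -> float:
    """Relative change of new_value over baseline, in percent (negative means fewer)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new_value / baseline - 1.0) * 100.0


def apply_percent_change(baseline: float, pct: float) -> float:
    """Value implied by applying a percent change to a baseline."""
    return baseline * (1.0 + pct / 100.0)


if __name__ == "__main__":
    print(percent_change(16, 10))          # 16 messages vs. a baseline of 10
    print(apply_percent_change(10, 203))   # a "203% boost" on a baseline of 10
    print(apply_percent_change(100, -53))  # a "53% reduction" on a baseline of 100
```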
  • 30. Use Case Checklist. Title: an active description that identifies the goal of the primary actor. Characteristics: primary actor; goal in context; scope; level; stakeholders and interests; precondition. Success criteria: minimal guarantees; success guarantees; trigger; main success scenario; extensions. Technology & data variations list. Related information. (Reference: Alistair Cockburn)
  • 31. EXPEDIA CASE STUDY (Archive Use Case). 1.5 petabytes of continuously ingested data; one of the largest Hadoop clusters in the world; 80% open source. Uses: EDW staging and historical analysis; call center and online data; Informatica transformation & aggregation. Transaction volume: more than 500 GB of daily increases from all sources (transaction, social, contact center). Customer benefits: avoided the massive cost of new DW infrastructure; able to keep and analyze historical transactions; reduced the risk of DW replacement; able to scale on demand using low-cost servers. (Diagram label: Analytic Infrastructure)
  • 32. Use Case: Sales Analysis. Sales per Sq. Ft.: changes over time. Fitting a no-intercept line to the scatter of sales against sales-floor area gives a visual baseline Sales-per-Sq.-Ft. (SpSF) for each year. Mathematically, the SpSF measure is the slope coefficient of the trend: 392.51 CAD/sq. ft. in 2011 vs. 373.76 CAD/sq. ft. in 2012. [Chart: SpSF trend lines by year; labels read “417 in 2011” and “417 in 2012”.]
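The SpSF measure in slide 32 is the slope of a least-squares line forced through the origin, sales ≈ b × area. Minimizing Σ(yᵢ − b·xᵢ)² over b gives b = Σxᵢyᵢ / Σxᵢ². A minimal sketch, assuming per-store (floor area, annual sales) pairs for one year; the function name and sample values are hypothetical.

```python
from typing import Sequence


def sales_per_sq_ft(floor_area: Sequence[float], sales: Sequence[float]) -> float:
    """Least-squares slope of sales = b * area with the intercept fixed at zero.

    Setting d/db of sum((y - b*x)**2) to zero gives b = sum(x*y) / sum(x*x).
    """
    if len(floor_area) != len(sales) or not floor_area:
        raise ValueError("need equal-length, non-empty inputs")
    sxy = sum(x * y for x, y in zip(floor_area, sales))
    sxx = sum(x * x for x in floor_area)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical stores: (sq. ft., CAD sales) for one year.
    area = [10_000, 20_000, 30_000]
    sales = [4.1e6, 7.9e6, 12.0e6]
    print(round(sales_per_sq_ft(area, sales), 2))
```

Running the same function on each year's stores gives one slope per year, which is how two yearly SpSF values can be compared.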
  • 33. Looking for Pattern Anomalies. This chart shows that most stores have their highest sales on Saturday. Store X, however, peaks on Friday and also does well on Mondays. Why? [Chart: sales by day of week, THU through WED; y-axis from 0 to 10,000,000.]
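The slide 33 observation can be automated: find each store's peak weekday, take the most common peak as "typical", and flag stores that differ. This is a sketch, assuming weekly sales per store are stored as seven values in the chart's THU to WED order; the data layout, function names and sample figures are hypothetical.

```python
from collections import Counter
from typing import Dict, List

DAYS = ["THU", "FRI", "SAT", "SUN", "MON", "TUE", "WED"]  # chart order


def peak_days(sales: Dict[str, List[float]]) -> Dict[str, str]:
    """Map each store to the weekday with its highest sales."""
    return {
        store: DAYS[max(range(len(DAYS)), key=lambda i: values[i])]
        for store, values in sales.items()
    }


def anomalous_stores(sales: Dict[str, List[float]]) -> List[str]:
    """Stores whose peak day differs from the most common peak day across all stores."""
    peaks = peak_days(sales)
    typical, _ = Counter(peaks.values()).most_common(1)[0]
    return sorted(store for store, day in peaks.items() if day != typical)


if __name__ == "__main__":
    weekly = {
        "Store A": [1, 2, 9, 3, 1, 1, 1],
        "Store B": [1, 2, 8, 3, 1, 1, 1],
        "Store X": [1, 9, 5, 3, 6, 1, 1],
    }
    print(peak_days(weekly))
    print(anomalous_stores(weekly))
```

Flagging only finds the anomaly; answering the slide's "Why?" still needs store-level context such as location, events or staffing.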
  • 34. Affinity Analysis Use Case. Build a model that provides the foundation for analyzing and understanding the factors that influence year-over-year change in store performance. Affinity analysis is an input to: identifying products purchased in tandem; providing guidance and recommendations for upsell and cross-sell; redesigning stores, layouts and planograms; discount plans and promotions; identifying customer baskets in different times and geographies. The analysis investigates patterns at the fine-line and product levels and ranks customer baskets by the number of times items are bought together and by revenue contributed.
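The ranking described in slide 34 (by number of times bought together, then by revenue contributed) can be sketched with simple pair counting over transactions. This assumes each basket is a mapping from product to the revenue it contributed in that transaction; the layout, names and sample baskets are hypothetical, and production affinity work typically uses dedicated frequent-itemset algorithms rather than counting every pair.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Basket = Dict[str, float]  # product -> revenue in one transaction (hypothetical layout)


def rank_pairs(baskets: List[Basket]) -> List[Tuple[Tuple[str, str], int, float]]:
    """Rank product pairs by times bought together, breaking ties by revenue contributed."""
    counts: Counter = Counter()
    revenue: Dict[Tuple[str, str], float] = defaultdict(float)
    for basket in baskets:
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            revenue[(a, b)] += basket[a] + basket[b]
    return sorted(
        ((pair, n, revenue[pair]) for pair, n in counts.items()),
        key=lambda row: (-row[1], -row[2]),
    )


if __name__ == "__main__":
    baskets = [
        {"snow scraper": 10.0, "washer fluid": 5.0},
        {"snow scraper": 12.0, "washer fluid": 5.0, "gloves": 8.0},
        {"gloves": 8.0, "washer fluid": 5.0},
    ]
    for pair, times, rev in rank_pairs(baskets):
        print(pair, times, rev)
```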
  • 36. Snow Scrapers and Washer Fluid
  • 37. Related Baskets. The size of each circle shows how often the basket has been purchased. Season: 2012-05-16 to 2012-08-28. This kind of analysis can be used to spot driver products. Baskets shown: 1. Potted annuals/plants + cell-packs/annual plants; 2. Potted annuals/plants + vegetables/plants; 3. Potted annuals/plants + outdoor soils/outdoor lawn & plant care; 4. Cell-packs/annual plants + vegetables/annual plants.
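To spot driver products as slide 37 suggests, a common option (assumed here, not stated in the slide) is the association-rule measures confidence and lift. Confidence(A → B) is the share of baskets containing A that also contain B; lift divides that by B's overall share of baskets, so a lift above 1 means A makes B more likely than usual. Function names and sample baskets are hypothetical.

```python
from typing import List, Set


def confidence(baskets: List[Set[str]], antecedent: str, consequent: str) -> float:
    """Share of baskets containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


def lift(baskets: List[Set[str]], antecedent: str, consequent: str) -> float:
    """Confidence relative to the consequent's base rate; above 1 suggests a driver effect."""
    base_rate = sum(consequent in b for b in baskets) / len(baskets)
    return confidence(baskets, antecedent, consequent) / base_rate if base_rate else 0.0


if __name__ == "__main__":
    baskets = [
        {"potted annuals", "vegetables"},
        {"potted annuals", "outdoor soil"},
        {"potted annuals", "vegetables"},
        {"cell-packs"},
    ]
    print(confidence(baskets, "potted annuals", "vegetables"))
    print(lift(baskets, "potted annuals", "vegetables"))
```

Comparing confidence in both directions helps pick the driver: if B almost always comes with A but not the reverse, B is the item that brings A along.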
  • 38. Big Data Is Evolving. The industry is evolving: Hadoop is now 8 years old, having started at Yahoo in 2006; CDH 5 was recently released; $2.5B in venture capital has gone into the space; Hadoop is now considered a standard, while HBase is an example of a project that has not become a standard. There are many tools today; which will still be around 5 years from now? How do you avoid the big data pitfalls? 50% of big data projects fail; those that succeed are driven by focus. Insight vs. impact: find one problem and fix it. Data science: change how you do analysis by applying scientific methods; it is new and exciting. Build a hybrid team to develop data solutions: a team that can program, knows math and statistics, and can communicate.
  • 39. The Big Data Adventure
  • 40. Thank You and Questions. Ian Abramson, EPAM Systems, Toronto, Canada (GMT -5). Mobile phone: +1 (416) 254-9286; Skype: ian.abramson; E-mail: Ian_Abramson@epam.com