A Day of
Empowerment

Building Predictive Analytics on
Big Data Platforms
1. Opportunity: Big Data
2. Demystifying Predictive Analytics
3. Taking advantage of combined power
Striving for an
“unfair”
competitive advantage
Old Days
New Days
Big Data
can look like rubbish
until you find
a use for it
“Data are becoming the new raw
material of business”
- Craig Mundie, head of research and strategy, Microsoft
Modeling true risk

Network data analysis to
predict failure

Customer churn analysis

Threat analysis

Recommendations

Feature Usage analysis

Ad targeting

…
Collect and
Store

• Complex data (text
files, audio, video, images, …)
• Multiple sources
• Lots of data

Process

• Batch processing
• Parallel execution
• Cluster solution

Analyze

• Simple visualization (reports, dashboards)
• Text mining
• Sentiment analysis
• Prediction models
• Collaborative filtering
[Diagram: event-processing architecture]

• Event sources (log files, Windows Event Log, WMI, SNMP, database, etc.)
• Event Ingestion
• Event Transport
• Event Aggregation and Transformation
• Event Serialization and Archiving
• Event Storage: Event DB, Full-text Index
• Event Processing and Analytics: Query Engine, Full-text Search engine, Rules Engine, Predictive Analytics
• Presentation: Interactive Search, Reports and Dashboards, Alerts Visualization
• User
• E-mail, SMS, SNMP, etc.
• Operational Management Tools
[Diagram: event-processing architecture with example technologies]

• Event sources (log files, Windows Event Log, WMI, SNMP, database, etc.)
• Event Ingestion
• Event Transport, Event Aggregation and Transformation: Apache Flume
• Event Serialization and Archiving: Protobuf, Avro, Thrift, MessagePack
• Event Storage
  • Event DB: HDFS, HBase, Cassandra
  • Full-text Index
• Event Processing and Analytics
  • Query Engine: Impala
  • Full-text Search engine: Solr, ElasticSearch
  • Rules Engine: Drools
  • Predictive Analytics: R
• Presentation
  • Interactive Search: Custom
  • Reports and Dashboards: JasperSoft, Tableau
  • Alerts Visualization: Custom
• User
• E-mail, SMS, SNMP, etc.
• Operational Management Tools: Cloudera Manager, Apache Ambari
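
The stages listed above can be illustrated with a toy, in-memory pipeline. This is only a sketch under assumptions: the event fields (`host`, `level`, `msg`), the JSON encoding, and the threshold rule are hypothetical stand-ins for the serialization formats and rules engine named in the slide, and nothing here uses Flume, Avro, or Drools.

```python
import json
from collections import Counter

# Hypothetical raw events, as they might arrive from log files.
RAW_EVENTS = [
    {"host": "web-1", "level": "ERROR", "msg": "timeout"},
    {"host": "web-1", "level": "INFO", "msg": "ok"},
    {"host": "web-2", "level": "ERROR", "msg": "disk full"},
    {"host": "web-1", "level": "ERROR", "msg": "timeout"},
]


def serialize(event):
    """Serialization stage: encode an event as one JSON line."""
    return json.dumps(event, sort_keys=True)


def deserialize(line):
    """Inverse of serialize, used by the processing stages."""
    return json.loads(line)


def aggregate_errors(events):
    """Aggregation stage: count ERROR events per host."""
    return Counter(e["host"] for e in events if e["level"] == "ERROR")


def apply_rules(error_counts, threshold=2):
    """Rules stage: alert on hosts at or above the (assumed) threshold."""
    return sorted(h for h, n in error_counts.items() if n >= threshold)


# Transport and storage are simulated by a plain list of serialized lines.
stored = [serialize(e) for e in RAW_EVENTS]
events = [deserialize(line) for line in stored]
alerts = apply_rules(aggregate_errors(events))
print(alerts)
```

Keeping each stage as a separate function mirrors the layered diagram: any one stage could be swapped for a real component without touching the others.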
“The idea that the future is
unpredictable is undermined every
day by the ease with which the
past is explained”
― Daniel Kahneman, Thinking, Fast and Slow
More data is
available for
companies

Storage
technologies
make it possible
to store and
work with it

Advanced
analytics can
be applied to
this new data to
achieve
competitive
advantage
Descriptive: What happened?
Diagnostic: Why did it happen?
Predictive: What is going to happen?
Prescriptive: What should we do about that?

Hindsight → Insight → Foresight
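
To make the difference between "what happened" and "what is going to happen" concrete, here is a minimal sketch. The monthly sales figures are invented for illustration, and the straight-line trend is an assumed model, not one taken from the slides.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative only).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]


def describe(values):
    """Descriptive analytics: summarize what already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(xs, ys, x_new):
    """Predictive analytics: extrapolate the fitted trend to a new point."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept


summary = describe(sales)             # hindsight
forecast = predict(months, sales, 6)  # foresight for month 6
print(summary, forecast)
```

The descriptive step only restates the past; the predictive step commits to a model of how the past continues, which is where the real risk and value lie.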
Management levels: Senior (Executive) Management, Middle Management, Junior (Line) Management

Ambiguity
• The goals to be achieved or the problem to be solved are unclear.
• Alternatives are difficult to define.
• Information about outcomes is unavailable.

Uncertainty
• Managers know which goals they wish to achieve.
• Information about alternatives and future events is incomplete.

Risk
• A decision has clear goals and good information is available, but the future outcomes associated with each alternative are subject to chance.

Certainty
• All of the information the decision maker needs is fully available.
Define objective

• Increase customer
satisfaction level
• Identify
prospective
customers
• Identify cross-selling
opportunities
• Decrease time to
market
• Decrease costs of
marketing
campaigns

Identify
data sets

• Historical data on customers from CRM system
• Geographical location data
• Smartphone data
• Social network data
• Text data from Internet pages
• Image data from medical sources

Design the
model

• Classification model for Internet users, defining what each one is interested in
• Adaptive control models for managing IT and network infrastructure
• Probabilistic model for defining creditworthiness

Design the
solution

• Data storage type
• Logical database
design
• Availability and
scalability of the
solution
• Integration into
corporate
information
environment
• Solution
deployment
model

Implement
the solution

• Add new
functionality to
the existing
corporate BI
platform
• Implement new BI
solution
• Enrich existing
business system
(CRM, ERP) with
the predictive
analytics
functionality
Business tasks, model families, and algorithms

Model family: Classification
Business tasks:
• Identify prospective customers
• Identify traffic jams in the city
• Recommend restaurants and menus
• Adjust the UI to the particular user
• Classify the body part in an X-ray image
Algorithms:
• Naïve Bayes
• Logistic regression
• Support Vector Machines
• Neural Networks

Model family: Clustering
Business tasks:
• Identify a market niche
• Identify influencers in social networks
• Identify similar customers or projects in a portfolio
• Identify informal groups in the organization
Algorithms:
• K-Means
• K-nearest neighbors
• Self-organizing maps
• Mixture of Gaussians

Model family: Anomaly Detection
Business tasks:
• Identify fraudulent bank transactions
• Identify network intrusion attempts
• Provide automatic aircraft engine testing
• Provide automatic IT infrastructure monitoring
• Provide clinical test analysis
Algorithms:
• Mixture of Gaussians
• Self-learning anomaly detection

Model family: Optimization
Business tasks:
• Determine the best price for goods or services to maximize profits
• Determine the best working schedule for the store
• Determine the best production volume
• Determine the best business rules
Algorithms:
• Gradient descent
• Simplex method
• Newton’s method
• Normal equations
• Genetic algorithms
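
As an illustration of how two entries from the lists above fit together, the following sketch trains a logistic regression classifier with batch gradient descent. The one-dimensional data set, learning rate, and epoch count are assumptions chosen for the example, not values from the slides.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical training data: small values are class 0, large values class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
predictions = [classify(w, b, x) for x in xs]
print(predictions)
```

Here the model family (classification) fixes what is being predicted, while the optimization algorithm (gradient descent) is only the means of fitting it; the same model could be fitted with Newton's method instead.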
Google to Buy Waze
for $1.3 Billion
Xerox plans to clear
traffic on I-10

The promise of better
data has MetLife investing
$300M in new tech

Gracenote did a whole
business on recommending
music

Obama’s data scientists built
a volunteer army on Facebook
Description:
Cloud-based service that provides more
accurate estimates of creditworthiness
(loan scoring) using publicly available
data from social networks.
The service is intended for use by banks.

Technologies:
• Amazon EC2
• MySQL
• SAP HANA
• R
• Java

[Diagram: Facebook, Twitter, LinkedIn API → Preprocessing (data filtering, data cleansing; MySQL) → Processing (scoring model; SAP HANA) → Credit scoring API → Credit Score]
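
The two-step flow described above (preprocessing, then a scoring model) might look roughly like this. Everything specific here is hypothetical: the feature names, the weights, and the bias are invented for illustration, and a real scoring model would be fitted to historical data rather than hard-coded.

```python
import math

# Hypothetical feature weights and bias (not from the original service).
WEIGHTS = {"years_employed": 0.4, "connections": 0.002}
BIAS = -2.0


def preprocess(records):
    """Data filtering and cleansing: drop incomplete records, coerce to float."""
    cleaned = []
    for rec in records:
        if any(rec.get(k) in (None, "") for k in WEIGHTS):
            continue  # filter out records with missing features
        features = {k: float(rec[k]) for k in WEIGHTS}
        cleaned.append({"id": rec["id"], **features})
    return cleaned


def score(record):
    """Scoring model: logistic function of a weighted feature sum."""
    z = BIAS + sum(WEIGHTS[k] * record[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical profiles as they might come back from social network APIs.
profiles = [
    {"id": "a", "years_employed": "5", "connections": 500},
    {"id": "b", "years_employed": None, "connections": 120},
    {"id": "c", "years_employed": 0, "connections": 0},
]

scores = {rec["id"]: score(rec) for rec in preprocess(profiles)}
print(scores)
```

Profile "b" is dropped during cleansing because a feature is missing, so only complete records reach the scoring step.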
Description:
Computer-aided diagnostic system that
recognizes the human body part in an
X-ray image and detects broken or
fractured bones.

Technologies:
• Matlab/Octave
• Python
• PyBrain
• NumPy
• SciPy

[Diagram: X-Ray Image → Analytical Engine → "This is a hand. Broken bone detected"]
Technology Expertise
• Big Data and NoSQL
• Data Warehouse
• Data Integration
• BI Platforms

Services
• Big Data Analytics
• Predictive Analytics
• Data Science Service
• Data Integration
• Data Warehousing
• Data Visualization and Analysis
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Building Predictive Analytics on Big Data Platforms

  • 1. A Day of Empowerment: Building Predictive Analytics on Big Data Platforms
  • 2. (1) Opportunity: Big Data; (2) Demystifying Predictive Analytics; (3) Taking advantage of combined power
  • 7. Big Data can look like rubbish
  • 9. “Data are becoming the new raw material of business” - Craig Mundie, head of research and strategy, Microsoft
  • 10. Modeling true risk; network data analysis to predict failure; customer churn analysis; threat analysis; recommendations; feature usage analysis; ad targeting; …
  • 11. Three stages:
      • Collect and Store: complex data (text files, audio, video, images, …); multiple sources; lots of data
      • Process: batch processing; parallel execution; cluster solution
      • Analyze: simple visualization (reports, dashboard); text mining; sentiment analysis; prediction models; collaborative filtering
  • 13. Event-processing architecture (diagram):
      • Event sources: log files, Windows Event Log, WMI, SNMP, database, etc.
      • Event Ingestion: Event Transport; Event Aggregation and Transformation; Event Serialization and Archiving
      • Event Storage: Event DB; Full-text Index
      • Event Processing and Analytics: Query Engine; Full-text Search engine; Rules Engine; Predictive Analytics
      • Presentation: Interactive Search; Reports and Dashboards; Alerts; Visualization
      • User; E-mail, SMS, SNMP, etc.
      • Operational Management Tools
  • 14. Event-processing architecture with example technologies (diagram):
      • Event sources: log files, Windows Event Log, WMI, SNMP, database, etc.
      • Event Ingestion: Event Transport, Event Aggregation and Transformation (Apache Flume); Event Serialization and Archiving (Protobuf, Avro, Thrift, MessagePack)
      • Event Storage: Event DB (HDFS, HBase, Cassandra); Full-text Index
      • Event Processing and Analytics: Query Engine (Impala); Full-text Search engine (Solr, ElasticSearch); Rules Engine (Drools); Predictive Analytics (R)
      • Presentation: Interactive Search (custom); Reports and Dashboards (JasperSoft, Tableau); Alerts; Visualization (custom)
      • User; E-mail, SMS, SNMP, etc.
      • Operational Management Tools: Cloudera Manager, Apache Ambari
  • 15. “The idea that the future is unpredictable is undermined every day by the ease with which the past is explained” ― Daniel Kahneman, Thinking, Fast and Slow
  • 16. More data is available to companies. Storage technologies make it possible to store and process that data. Advanced analytics can be applied to this new data to achieve competitive advantage.
  • 17. Four kinds of analytics, progressing from hindsight through insight to foresight:
      • Descriptive: What happened?
      • Diagnostic: Why did it happen?
      • Predictive: What is going to happen?
      • Prescriptive: What should we do about that?
  • 18. Decision conditions, shown alongside management levels from Senior (Executive) Management through Middle Management to Junior (Line) Management:
      • Ambiguity: the goals to be achieved or the problem to be solved are unclear; alternatives are difficult to define; information about outcomes is unavailable.
      • Uncertainty: managers know which goals they wish to achieve; information about alternatives and future events is incomplete.
      • Risk: a decision has clear goals and good information is available, but the future outcomes associated with each alternative are subject to chance.
      • Certainty: all of the information the decision maker needs is fully available.
  • 19. Five steps:
      • Define objective: increase customer satisfaction level; identify prospective customers; identify cross-selling opportunities; decrease time to market; decrease costs of marketing campaigns
      • Identify data sets: historical data on customers from the CRM system; geographical location data; smartphone data; social network data; text data from Internet pages; image data from medical sources
      • Design the model: classification model for Internet users defining what one is interested in; adaptive control models for managing IT and network infrastructure; probabilistic model for defining credit worthiness
      • Design the solution: data storage type; logical database design; availability and scalability of the solution; integration into the corporate information environment; solution deployment model
      • Implement the solution: add new functionality to the existing corporate BI platform; implement a new BI solution; enrich an existing business system (CRM, ERP) with predictive analytics functionality
  • 20. Business tasks, model families, and algorithms:
      • Classification
          • Business tasks: define prospective customers; define traffic jams in the city; recommend restaurants and menus; adjust the UI to the particular user; classify the body part on an X-ray image
          • Algorithms: Naïve Bayes; logistic regression; Support Vector Machines; neural networks
      • Clustering
          • Business tasks: define a market niche; define influencers in social networks; define similar customers or projects in a portfolio; define informal groups in the organization
          • Algorithms: K-Means; K nearest neighbor; self-organized maps; mixture of Gaussians
      • Anomaly Detection
          • Business tasks: define fraudulent bank transactions; define network intrusion attempts; provide automatic aircraft engine testing; provide automatic IT infrastructure monitoring; provide clinical test analysis
          • Algorithms: mixture of Gaussians; self-learning anomaly detection
      • Optimization
          • Business tasks: define the best price for goods or services to maximize profits; define the best working schedule for the store; define the best amount of production; define the best business rules
          • Algorithms: gradient descent; simplex method; Newton’s method; normal equations; genetic algorithms
  • 21. Headlines:
      • Google to Buy Waze for $1.3 Billion
      • Xerox plans to clear traffic on I-10
      • The promise of better data has MetLife investing $300M in new tech
      • Gracenote did a whole business on recommending music
      • Obama’s data scientists built a volunteer army on Facebook
  • 23. Credit Score
      • Description: cloud-based service that provides more accurate estimates of credit worthiness (loan scoring) using publicly available data from social networks. The service is intended for use by banks.
      • Technologies: Amazon EC2; MySQL; SAP HANA; R; Java
  • 25. X-Ray Image
      • Description: computer-aided diagnostic system that can recognize the human body part on an X-ray image and detect broken or fractured bones.
      • Technologies: Matlab/Octave; Python; PyBrain; NumPy; SciPy
      • Flow shown on the slide: X-Ray Image, then Analytical Engine, then the result “This is a hand. Broken bone detected”
  • 27. Big Data and NoSQL; Data Warehouse; Data Integration; BI Platforms
  • 28. Big Data Analytics; Predictive Analytics; Data Science Service; Data Integration; Data Warehousing; Data Visualization and Analysis
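
Slide 20 pairs classification tasks such as identifying prospective customers with logistic regression, and lists gradient descent among the optimization algorithms. The sketch below shows how the two might fit together in plain Python. It is an illustration only, not code from the presentation: the churn-style toy data, the feature names, and the helper functions `train_logistic` and `predict_proba` are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that example x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y  # dLoss/dz for one example
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # Step against the average gradient.
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical toy data: [support tickets last month, tenure in years].
# Label 1 means the customer churned, 0 means the customer stayed.
ROWS = [[5, 0.5], [4, 1.0], [6, 0.3], [0, 3.0], [1, 4.0], [0, 2.5]]
LABELS = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(ROWS, LABELS)
for customer in ([7, 0.2], [0, 5.0]):
    print(customer, round(predict_proba(weights, bias, customer), 3))
```

In practice a library implementation (for example in R, which the deck names for predictive analytics) would replace the hand-written loop; the point here is only that the "model family" and "algorithm" columns of slide 20 correspond to a model definition and the procedure used to fit it.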