Big Data Solutions for
Marketing Analytics
Natalino Busa
@natalinobusa
Parallelism Hadoop Cassandra Akka
Machine Learning Statistics Big Data
Algorithms Cloud Computing Scala Spray
Natalino Busa
@natalinobusa
www.natalinobusa.com
Humanize Data
The bank statements
Back to routine.
Groceries, broken washing machine
After-vacation fun
Pancake house.
Traveling back.
Just back home. Pizza.
Shopping in Sicily
Vacation!
The bank statements How I read the bank bills What happened those days
data is the fabric of our lives
Let’s give more meaning and context to data.
Abraham Harold Maslow (April 1, 1908 –
June 8, 1970) was an American psychologist
who was best known for creating Maslow's
hierarchy of needs
breathing, food, water, sleep
security of body, resources,
health, employment, property
friend, family, partner
security of love and belonging
self-esteem, confidence,
achievements, respect
spontaneity, creativity,
acceptance, freedom, ethics
Physiology
Contractual
Love & Caring
Esteem
Self-actualization
Very human needs
How caring can
technology be?
Connectivity, Electricity, Hardware /
Infra
security of basic operations
REST APIs, Encryption, Authentication
Notification, Alerts,
Social bonding, Predictions
Set goals, planning,
Achievements, Advisory role
Freedom,
Trusted Companion
Physiology
Contractual
Love & Caring
Esteem
Self-actualization
Technology is reaching out
Data science top 3
Dimensionality
Reduction
Predictive
Analytics
Clustering
Segmentation
Data science: what’s working?
- Random Forests
- Artificial Neural Networks
- Clustering Algorithms
- Pattern Recognition
- Time-series analysis
- Regression
Most real-world models are a
combination of these
Data science ^.^/
keep it scientific
cross-validate your models
keep it measurable
play with it
create new features
explore the available data
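The advice to cross-validate and keep things measurable can be illustrated in Python with scikit-learn. This is a minimal sketch, not the author's workflow: the iris dataset, the decision tree, the 5 folds, and the fixed random seed are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def cross_validated_accuracy(n_folds=5, seed=0):
    """Return the per-fold accuracy of a decision tree on the iris dataset."""
    iris = load_iris()
    clf = DecisionTreeClassifier(random_state=seed)
    # cross_val_score refits the model on each training split and scores it
    # on the held-out fold, so every sample is used for testing exactly once.
    return cross_val_score(clf, iris.data, iris.target, cv=n_folds)


scores = cross_validated_accuracy()
print("accuracy per fold:", scores)
print("mean accuracy: %.3f" % scores.mean())
```

Reporting the spread across folds, not only the mean, gives a rough idea of how stable the model is before any new feature is added.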
How to code data science?
# Multiple Linear Regression Example
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results
● Language for statistics
● Easy to Analyze and shape data
● Advanced statistical package
● Fueled by academia and professionals
● Very clean visualization packages
Packages for machine learning
time series forecasting, clustering, classification
decision trees, neural networks
Remote procedure calls (RPC)
From Scala/Java via RProcess and Rserve
Data Science: R
>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(iris.data, iris.target)
● Flexible, concise language
● Quick to code and prototype
● Portable, visualization libraries
Machine learning libraries:
scipy, statsmodels, sklearn,
matplotlib, ipython
Web libraries
flask, tornado, (no)SQL clients
Data Science: Python
Earn the trust
The customer’s context
Personal history:
number of transactions ever made
Long term Interaction:
how the user's actions correlate with those of others
Real time events:
Trends and recent events
The customer’s context
context is related to time:
slow changing: the defining characteristic of a person
fast changing: events which influence our lives, trends
Require very different
technology solutions !!!
Challenges
Not much time to react
Events must be delivered fast to the new machine APIs
It's web and mobile apps: the latency budget is limited
Loads of information to process
Understand the user's history well
Access a larger context
Big Data and Fast data
ranking and preference
segmentation and clustering
short term trending topics
rule-based recommendations
Tens of terabytes of data.
This can take hours ….
Hundreds of events per second.
This must be fast ….
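The "fast data" side, short-term trending topics, can be sketched as a counter over a sliding time window. This is a minimal illustration under stated assumptions: the class name `TrendingTopics`, the 60-second window, and the sample events are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class TrendingTopics:
    """Count the topics seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, topic), oldest first
        self._counts = Counter()

    def add(self, timestamp, topic):
        """Register one event, then evict everything older than the window."""
        self._events.append((timestamp, topic))
        self._counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that fell out of the window and undo their counts.
        while self._events and now - self._events[0][0] > self.window_seconds:
            _, old_topic = self._events.popleft()
            self._counts[old_topic] -= 1
            if self._counts[old_topic] == 0:
                del self._counts[old_topic]

    def top(self, k=3):
        """Return the k most frequent topics in the current window."""
        return self._counts.most_common(k)


trends = TrendingTopics(window_seconds=60.0)
for ts, topic in [(0, "pizza"), (10, "travel"), (20, "travel"),
                  (70, "grocery"), (75, "travel")]:
    trends.add(ts, topic)
print(trends.top(2))  # [('travel', 2), ('grocery', 1)]
```

Each update touches only the newest and the expired events, which is why this kind of computation fits a per-event latency budget, while segmentation over the full history is left to batch jobs.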
Back to the drawing board
core banking systems
SOAP
services
and DBs
System
BUS
customer-facing
apps
channels
A high-level bank schematic
Higher
separation !
Fewer silos
Interactions
with core
systems
Bigger and Faster
Human-centric applications
Some techs
Hadoop: Distributed Data OS
Reliable
Distributed, Replicated File System
Low cost
↓ Cost vs ↑ Performance/Storage
Computing Powerhouse
All the cluster's CPUs work in parallel to
run queries
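The idea of many CPUs working on one query can be sketched with the map, shuffle, and reduce steps written as plain Python functions. This is a single-process imitation for illustration only, not the Hadoop API: the function names and the sample transactions are hypothetical, and on a real cluster each `map_phase` call would run on a different node, next to its own partition of the data.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit (category, amount) pairs for one partition of transactions."""
    return [(tx["category"], tx["amount"]) for tx in partition]


def shuffle(mapped_partitions):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the amounts of each category."""
    return {key: sum(values) for key, values in groups.items()}


# Two partitions stand in for data blocks stored on two different nodes.
partitions = [
    [{"category": "grocery", "amount": 25}, {"category": "travel", "amount": 300}],
    [{"category": "grocery", "amount": 40}, {"category": "restaurant", "amount": 18}],
]
totals = reduce_phase(shuffle(map(map_phase, partitions)))
print(totals)  # {'grocery': 65, 'travel': 300, 'restaurant': 18}
```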
Cassandra: A low-latency 2D store
Reliable
Distributed, replicated data store
Low latency
Sub-millisecond read/write operations
Tunable CAP
Define your level of consistency
Data model:
hashed rows, sorted wide columns
Architecture model:
No SPOF, ring of nodes,
homogeneous system
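The data model of hashed rows and sorted wide columns can be pictured with a toy in-memory store. This is a sketch under stated assumptions, not Cassandra's implementation or driver API: `WideRowStore` and `node_for` are hypothetical names, the dates are made up, and node placement uses a simple modulo where a real cluster uses a token ring.

```python
import bisect
import hashlib
from collections import defaultdict


def node_for(row_key, n_nodes):
    """Pick a node by hashing the row key (simplified: modulo, no token ring)."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_nodes


class WideRowStore:
    """Toy model: row key -> columns kept sorted by column name."""

    def __init__(self):
        self._rows = defaultdict(list)  # row key -> sorted [(column, value)]

    def put(self, row_key, column, value):
        columns = self._rows[row_key]
        # (column,) sorts just before any (column, value) pair with that name.
        i = bisect.bisect_left(columns, (column,))
        if i < len(columns) and columns[i][0] == column:
            columns[i] = (column, value)        # overwrite existing column
        else:
            columns.insert(i, (column, value))  # keep the row sorted

    def slice(self, row_key, start, end):
        """Return the columns with start <= name < end, in sorted order."""
        columns = self._rows.get(row_key, [])
        lo = bisect.bisect_left(columns, (start,))
        hi = bisect.bisect_left(columns, (end,))
        return columns[lo:hi]


store = WideRowStore()
store.put("customer:42", "2014-08-03", "pizza")
store.put("customer:42", "2014-08-01", "shopping in Sicily")
store.put("customer:42", "2014-08-05", "grocery")
print(node_for("customer:42", n_nodes=3))
print(store.slice("customer:42", "2014-08-01", "2014-08-04"))
```

Because the columns of a row stay sorted, a range such as "all events of one customer within a few days" is one contiguous read, which is what makes this layout attractive for low-latency lookups.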
Scala / Akka / Spray:
a WEB API reactive framework
Actor A
Actor B
Actor C
msg 1
msg 2
msg 3
msg 4
● it scales horizontally (can run in cluster mode)
● maximum use of the available cores/memory
● processing is non-blocking, threads are re-used
● can parallelize computing power across many actors
Very fast: thousands of messages per second
Very reliable: auto recovery
Lazy: compute only when required
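The actor idea (private state, a mailbox, non-blocking message passing) can be imitated in Python with `asyncio`. This is only a sketch of the pattern, not Akka: the `Actor` and `Forwarder` classes are hypothetical, `None` is used as an ad hoc stop signal, and clustering, supervision, and automatic recovery are left out.

```python
import asyncio


class Actor:
    """Minimal actor: private state, a mailbox, one message at a time."""

    def __init__(self, name):
        self.name = name
        self.mailbox = asyncio.Queue()
        self.received = []

    def tell(self, message):
        """Fire-and-forget send: the sender never waits for processing."""
        self.mailbox.put_nowait(message)

    async def run(self):
        while True:
            message = await self.mailbox.get()  # yields instead of blocking
            if message is None:                 # stop signal for this sketch
                break
            await self.receive(message)

    async def receive(self, message):
        self.received.append(message)


class Forwarder(Actor):
    """Actor that records each message and passes it on to a target actor."""

    def __init__(self, name, target):
        super().__init__(name)
        self.target = target

    async def receive(self, message):
        await super().receive(message)
        self.target.tell(f"{message} via {self.name}")


async def demo():
    c = Actor("C")
    b = Forwarder("B", target=c)
    a = Forwarder("A", target=b)
    tasks = [asyncio.create_task(actor.run()) for actor in (a, b, c)]
    for i in range(1, 5):
        a.tell(f"msg {i}")
    # Stop the actors in pipeline order so no message is left behind.
    for actor, task in zip((a, b, c), tasks):
        actor.tell(None)
        await task
    return c.received


received = asyncio.run(demo())
print(received)
```

All three actors share one thread here; the event loop switches between them whenever a mailbox is empty, which mirrors the point about non-blocking processing and thread reuse.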
Putting it all together
Hadoop
application (actor based)
millions of millions of
λ=
conversions
( lambda )
Data queues
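Assuming the λ on this slide refers to the lambda architecture pattern, where a slow batch layer and a fast real-time layer are combined at query time, the serving step can be sketched as a merge of two views. The function name `merge_views` and the sample counts are hypothetical.

```python
def merge_views(batch_view, realtime_view):
    """Combine a precomputed batch view with the increments seen since."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged


# Assumed inputs: counts per topic from the last batch job (e.g. on Hadoop)
# and counts accumulated by the actor-based application since that job ran.
batch_view = {"travel": 1200, "grocery": 5400}
realtime_view = {"grocery": 3, "pizza": 1}
print(merge_views(batch_view, realtime_view))
# {'travel': 1200, 'grocery': 5403, 'pizza': 1}
```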
Science & Engineering
Statistics,
Data Science
Python
R
Visualization
IT Infra
Big Data
Java
Scala
SQL
Hadoop: Big Data Infrastructure, Data Science on large datasets
Big Data and Fast Data
require different profiles to
achieve the best results
Some lessons learned
● Mixing and matching technologies is a good thing
● Fast Data must complement Big Data
● Ease integration among teams
● Hadoop, Cassandra, and Akka
● Data Science takes time to figure out
Parallelism Mathematics Programming
Languages Machine Learning Statistics
Big Data Algorithms Cloud Computing
Natalino Busa
@natalinobusa
www.natalinobusa.com
Thanks!
Any questions?

Big data solutions for advanced marketing analytics

  • 1. Big Data Solutions for Marketing Analytics Natalino Busa @natalinobusa
  • 2. Parallelism Hadoop Cassandra Akka Machine Learning Statistics Big Data Algorithms Cloud Computing Scala Spray Natalino Busa @natalinobusa www.natalinobusa.com
  • 5. Back to routine. Grocery, broken washing machine. After-vacation fun. Pancake house. Traveling back. Just back home. Pizza. Shopping in Sicily. Vacation! The bank statements. How I read the bank bills.
  • 6. Back to routine. Grocery, broken washing machine. After-vacation fun. Pancake house. Traveling back. Just back home. Pizza. Shopping in Sicily. Vacation! The bank statements. How I read the bank bills. What happened those days.
  • 7. data is the fabric of our lives Let’s give more meaning and context to data.
  • 8. Abraham Harold Maslow (April 1, 1908 – June 8, 1970) was an American psychologist who was best known for creating Maslow's hierarchy of needs
  • 9. Very human needs. Physiology: breathing, food, water, sleep. Contractual: security of body, resources, health, employment, property. Love & Caring: friend, family, partner, security of love and belonging. Esteem: self-esteem, confidence, achievements, respect. Self-actualization: spontaneity, creativity, acceptance, freedom, ethics.
  • 10. How much caring can technology be?
  • 11. Technology is reaching out. Physiology: connectivity, electricity, hardware / infra. Contractual: security of basic operations, REST APIs, encryption, authentication. Love & Caring: notifications, alerts, social bonding, predictions. Esteem: set goals, planning, achievements, advisory role. Self-actualization: freedom, trusted companion.
  • 12. Data science top 3: Dimensionality Reduction; Predictive Analytics; Clustering Segmentation
  • 13. Data science: what’s working? - Random Forests - Artificial Neural Networks - Clustering Algorithms - Pattern Recognition - Time-series analysis - Regression. Most models in actual use are a combination of these.
  • 14. Data science ^.^/ keep it scientific: cross-validate your models; keep it measurable; play with it; create new features; explore the available data
  • 15. How to code data science?
  • 16. Data Science: R
        # Multiple Linear Regression Example
        fit <- lm(y ~ x1 + x2 + x3, data=mydata)
        summary(fit) # show results
        ● Language for statistics ● Easy to analyze and shape data ● Advanced statistical packages ● Fueled by academia and professionals ● Very clean visualization packages
        Packages for machine learning: time series forecasting, clustering, classification, decision trees, neural networks
        Remote procedure calls (RPC): from Scala/Java via RProcess and Rserve
  • 17. Data Science: Python
        >>> from sklearn.datasets import load_iris
        >>> from sklearn import tree
        >>> iris = load_iris()
        >>> clf = tree.DecisionTreeClassifier()
        >>> clf = clf.fit(iris.data, iris.target)
        ● Flexible, concise language ● Quick to code and prototype ● Portable, visualization libraries
        Machine learning libraries: scipy, statsmodels, sklearn, matplotlib, ipython
        Web libraries: flask, tornado, (no)SQL clients
  • 19. The customer’s context. Personal history: number of transactions ever made. Long-term interaction: how the user's actions correlate with those of others. Real-time events: trends and recent events.
  • 20. The customer’s context. Context is related to time. Slow changing: the defining characteristics of a person. Fast changing: events which influence our lives, trends. These require very different technology solutions!
  • 21. Challenges. Not much time to react: events must be delivered fast to the new machine APIs; it's Web and mobile apps, so the latency budget is limited. Loads of information to process: understand the user history well; access a larger context.
  • 22. Big Data and Fast Data: ranking and preference; segmentation and clustering; short-term trending topics; rule-based recommendations. Tens of terabytes of data: this can take hours. Hundreds of events per second: this must be fast.
  • 23. Back to the drawing board
  • 24. A high-level bank schematic: core banking systems; SOAP services and DBs; system bus; customer-facing applications; channels
  • 25. Higher separation! Fewer silos. Interactions with core systems. Bigger and faster.
  • 28. Hadoop: Distributed Data OS. Reliable: distributed, replicated file system. Low cost: ↓ cost vs ↑ performance/storage. Computing powerhouse: all the cluster's CPUs work in parallel to run queries.
  • 29. Cassandra: a low-latency 2D store. Reliable: distributed, replicated data store. Low latency: sub-millisecond read/write operations. Tunable CAP: define your level of consistency. Data model: hashed rows, sorted wide columns. Architecture model: no SPOF, ring of nodes, homogeneous system.
  • 30. Scala / Akka / Spray: a reactive Web API framework. Actor A, Actor B, Actor C exchanging msg 1, msg 2, msg 3, msg 4. ● It scales horizontally (can run in cluster mode) ● Maximum use of the available cores/memory ● Processing is non-blocking, threads are re-used ● Can parallelize computing power across many actors. Very fast: thousands of messages/sec. Very reliable: auto recovery. Lazy: compute only when required.
  • 31. Putting it all together: Hadoop; application (actor based); millions of millions of λ = conversions (lambda); data queues
  • 32. Science & Engineering: Statistics, Data Science, Python, R, Visualization; IT Infra, Big Data, Java, Scala, SQL. Hadoop: Big Data infrastructure, data science on large datasets. Big Data and Fast Data require different profiles to achieve the best results.
  • 33. Some lessons learned ● Mixing and matching technologies is a good thing ● Fast Data must complement Big Data ● Ease integration among teams ● Hadoop, Cassandra, and Akka ● Data Science takes time to figure out
  • 34. Parallelism Mathematics Programming Languages Machine Learning Statistics Big Data Algorithms Cloud Computing Natalino Busa @natalinobusa www.natalinobusa.com Thanks ! Any questions?
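
Slide 14 in the transcript above advises cross-validating models. The following is a minimal sketch of k-fold cross-validation in plain Python, assuming a single-feature least-squares line as the model and made-up toy data; the helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical and not taken from the deck. In practice the rows would usually be shuffled before splitting.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x on one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def cross_validate(xs, ys, k=5):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(mean((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx))
    return scores


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
scores = cross_validate(xs, ys, k=5)
```

Each score comes from data the model never saw during fitting, which is what keeps the evaluation "measurable" in the sense of the slide.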
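
Slide 22 lists "short-term trending topics" among the workloads that must keep up with hundreds of events per second. One possible way to do this, sketched here under the assumption that events arrive in timestamp order, is a sliding-window counter. The class `TrendingWindow` and the sample topics are hypothetical illustrations, not part of the original talk.

```python
from collections import Counter, deque


class TrendingWindow:
    """Counts topics seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, topic) pairs, oldest first
        self.counts = Counter()  # topic -> occurrences inside the window

    def add(self, timestamp, topic):
        """Record one event, then drop everything that fell out of the window."""
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now):
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, n):
        """Return the n most frequent topics currently inside the window."""
        return self.counts.most_common(n)


window = TrendingWindow(window_seconds=60)
window.add(0, "pizza")
window.add(5, "pizza")
window.add(30, "travel")
early_top = window.top(1)   # "pizza" leads while both of its events are recent
window.add(65, "travel")
window.add(66, "travel")
late_top = window.top(1)    # the "pizza" events have expired by now
```

Each event is appended once and removed once, so the cost per event stays constant on average regardless of how much history has passed through.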
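
Slide 28 describes Hadoop as putting all the cluster's CPUs to work in parallel. The classic programming model behind that is MapReduce, and the sketch below runs its three phases (map, shuffle, reduce) in a single process to show the data flow. It is an illustration only: the transaction records and function names are made up, and a real job would spread the map and reduce calls across nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (key, value) pair per transaction record."""
    customer, category, amount_cents = record
    yield (customer, category), amount_cents


def shuffle(pairs):
    """Group all values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values of one key into a single total."""
    return key, sum(values)


def run_job(records):
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Hypothetical bank transactions: (customer, category, amount in cents).
transactions = [
    ("alice", "grocery", 2350),
    ("alice", "pizza", 1200),
    ("bob", "grocery", 4100),
    ("alice", "grocery", 650),
]
totals = run_job(transactions)
```

Because each map call looks at one record and each reduce call at one key, both phases can be split across machines without the functions changing.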
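
Slide 29 summarizes Cassandra's data model as "hashed rows, sorted wide columns" on a "ring of nodes". The sketch below illustrates both ideas in plain Python. It is a simplification under stated assumptions: SHA-256 stands in for Cassandra's actual partitioner, each node owns a single token, and the classes `Ring` and `WideRow` are hypothetical names rather than any driver API.

```python
import bisect
import hashlib


class Ring:
    """Maps a row key to the node owning the next token on the ring."""

    def __init__(self, node_names):
        self.tokens = sorted((self._token(name), name) for name in node_names)

    @staticmethod
    def _token(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, row_key):
        index = bisect.bisect_left(self.tokens, (self._token(row_key), ""))
        return self.tokens[index % len(self.tokens)][1]  # wrap around the ring


class WideRow:
    """One row whose columns are kept sorted by column name."""

    def __init__(self):
        self.names = []   # sorted column names
        self.values = {}  # column name -> value

    def put(self, name, value):
        if name not in self.values:
            bisect.insort(self.names, name)
        self.values[name] = value

    def columns_between(self, start, end):
        """Range scan over the sorted columns, both bounds inclusive."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, end)
        return [(name, self.values[name]) for name in self.names[lo:hi]]


ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.node_for("customer-42")

row = WideRow()
row.put("2014-08-03", "pancake house")
row.put("2014-08-01", "shopping in Sicily")
row.put("2014-08-02", "pizza")
first_two_days = row.columns_between("2014-08-01", "2014-08-02")
```

Hashing the row key spreads customers evenly over the nodes, while keeping columns sorted makes a time-range read within one customer a contiguous scan.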
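
Slide 30 shows actors A, B and C exchanging messages. The deck uses Akka on Scala; the following Python sketch only mimics the core idea (a private mailbox drained one message at a time) with a thread and a queue. The `Actor` class is a hypothetical stand-in and offers none of Akka's supervision, clustering, or thread re-use.

```python
import queue
import threading


class Actor:
    """Handles messages from its mailbox one at a time on its own thread."""

    _STOP = object()  # sentinel that ends the message loop

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire-and-forget send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self):
        """Stop after all previously sent messages have been handled."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self._handler(message)


# Two-stage pipeline: `doubler` forwards its results to `collector`.
received = []
collector = Actor(received.append)
doubler = Actor(lambda amount: collector.tell(amount * 2))

for amount in (1, 2, 3):
    doubler.tell(amount)

doubler.stop()    # drains the doubler's mailbox first
collector.stop()  # then drains the collector's mailbox
```

Since each actor touches its own state only from its own message loop, no locks are needed around `received`; the mailbox is the only shared structure.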
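
Slide 33 states that Fast Data must complement Big Data. One common way to combine the two, sketched here as an assumption rather than as the architecture the talk actually used, is to serve queries from a precomputed batch view plus the increments seen since the last batch run. The function name and sample figures are hypothetical.

```python
def merged_view(batch_view, realtime_counts):
    """Add counts seen since the last batch run onto the batch totals."""
    merged = dict(batch_view)  # copy so the batch view stays untouched
    for key, delta in realtime_counts.items():
        merged[key] = merged.get(key, 0) + delta
    return merged


# Hypothetical per-customer transaction counts.
batch_view = {"alice": 120, "bob": 45}    # computed overnight on the full history
realtime_counts = {"alice": 2, "carol": 1}  # events since the batch finished
current = merged_view(batch_view, realtime_counts)
```

The slow job can take hours without making answers stale, because the small real-time delta covers the gap until the next batch view replaces it.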