Create a Data Science Lab with
Microsoft and Open Source Tools
Marcel Franke, pmOne AG, Germany
About me – Marcel Franke
Practice Lead Advanced Analytics & Data Science
pmOne AG – Germany, Austria, Switzerland
More than 10 years of experience with large-scale
data warehouses based on SQL Server
Blog: dwjunkie.wordpress.com
What is data science?
The Definition
Data science incorporates varying
elements and builds on techniques and
theories from many fields, including
mathematics, statistics, data engineering,
pattern recognition and learning, advanced
computing, visualization, uncertainty
modeling, data warehousing, and high
performance computing with the goal of
extracting meaning from data and
creating data products.

Source: http://en.wikipedia.org/wiki/Data_science
A brief look into history
GAMBLING –
THAT’S WHERE
EVERYTHING
STARTED
The beginnings of gambling
Gambling has existed since 3000 BC
The first games were based on dice
Origins in China and Mesopotamia
* Source: Tiemeyer, E.; Zsifkovitis, H.: Information als Führungsmittel, München: Computerwoche Verlag 1995
Scientific foundations
17th century: the paradox of the
Chevalier de Méré
Pascal and Fermat discussed
the paradox in several letters
The beginning of probability theory
* Source: http://de.wikipedia.org/wiki/De-M%C3%A9r%C3%A9-Paradoxon
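The de Méré problem comes down to a single probability rule. A gambler's rule of proportion suggests that "at least one six in four rolls of one die" and "at least one double six in 24 rolls of two dice" are equally good bets (4/6 = 24/36), but experience showed otherwise. The complement rule, P(at least one success) = 1 - (1 - p)^n, explains the difference. The minimal sketch below uses exact fractions. The function name is only illustrative.

```python
from fractions import Fraction


def p_at_least_one(p_success: Fraction, trials: int) -> Fraction:
    """Probability of at least one success in `trials` independent trials.

    Complement rule: 1 - P(no success at all) = 1 - (1 - p)^n.
    """
    return 1 - (1 - p_success) ** trials


# Bet 1: at least one six in four rolls of a single die.
p_single_six = p_at_least_one(Fraction(1, 6), 4)
# Bet 2: at least one double six in 24 rolls of two dice.
p_double_six = p_at_least_one(Fraction(1, 36), 24)

print(f"one six in 4 rolls:     {float(p_single_six):.4f}")
print(f"double six in 24 rolls: {float(p_double_six):.4f}")
```

The first bet wins about 51.8% of the time and the second only about 49.1%. This small gap is the discrepancy de Méré noticed at the gaming table.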
The science in Data Science
Calculate probabilities
Pattern recognition
Calculation of analytical variance
Machine Learning
Simulations
Predictions
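The "Calculate probabilities" and "Simulations" items can be linked with a small illustration. A Monte Carlo simulation estimates a probability by repeating a random experiment many times, and the estimate can then be checked against the analytical value. This sketch assumes a fair six-sided die. The function name, seed and number of trials are arbitrary choices.

```python
import random


def estimate_at_least_one_six(rolls: int, trials: int, seed: int = 42) -> float:
    """Monte Carlo estimate of P(at least one six in `rolls` rolls of a fair die)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    hits = 0
    for _ in range(trials):
        # any() stops at the first six, which is enough to decide this trial
        if any(rng.randint(1, 6) == 6 for _ in range(rolls)):
            hits += 1
    return hits / trials


exact = 1 - (5 / 6) ** 4
estimate = estimate_at_least_one_six(rolls=4, trials=200_000)
print(f"exact={exact:.4f}  simulated={estimate:.4f}")
```

When no closed-form answer exists, the simulated value is the only estimate available, so this pattern is useful beyond dice. Its accuracy improves roughly with the square root of the number of trials.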
BI, Data Mining & Prediction
WEATHER
FORECAST
What do companies do today?
Walmart – The pioneer of data analytics

Source: Data Unser – Dr. Bloching; images: walmart.com, yourdealz.de, squidoo.com, fuzzybrew.com
Visa

Predicts divorces within the next 5 years
with 80% accuracy
Reason: divorce is the highest risk factor
for personal insolvency
Source: visa.de
Customers need to find the right case

What do consumers
really do?
Blonde looks
somehow different 

The new washing powder is really great…
Data can be accessed easily…
… but it's hard to analyze.
Other areas of application
• Social media
• Product recommendation
• Retargeting
• Predictive maintenance
• Risk prediction
• Sales predictions
• Customer analysis
• Dynamic pricing
• Disposition
How does this fit with Big Data?
Our starting point…
• Structured data and unstructured data
• Harmonize them and generate information (the role of the "Data Scientist")
• "Big Data": Volume, Variety, Velocity
Typical Big Data Architecture
• Big Data Analytics and Big Data Advanced Analytics (tools: Excel, PowerPivot)
• Big Data Preparation (SQL, MapReduce)
• Big Data Storage Platform with massively parallel processing (structured and unstructured data)
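The "Big Data Preparation" layer mentions MapReduce. The pattern can be shown in a minimal single-process sketch with three steps: map each raw record to key/value pairs, shuffle (group) the pairs by key, then reduce each group. In this example the key is the HTTP status code of web server log lines. The log format (Common Log Format) and the sample lines are assumptions. On a real cluster the framework distributes the map and reduce tasks across nodes, and this sketch only shows the data flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (status_code, 1) for each log line.

    Assumes Common Log Format, where the HTTP status is the second-to-last field.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[-2], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: aggregate the values of each key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map -> shuffle -> reduce over the input records."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical web server log lines (illustrative data, not from the talk).
sample_logs = [
    '10.0.0.1 - - [01/May/2013:10:00:00 +0200] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/May/2013:10:00:01 +0200] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [01/May/2013:10:00:02 +0200] "GET /about.html HTTP/1.1" 200 1024',
]
print(run_job(sample_logs))
```

The map and reduce functions do not depend on each other's inputs, so a framework can run many instances of each in parallel. Only the shuffle step needs to move data between nodes.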
“[Facebook] started in the Hadoop world. We are now bringing in
relational to enhance that. We're kind of going [in] the other
direction.”
“We've been there, and [we] realized that using the wrong
technology for certain kinds of problems can be difficult. We
started at the end and we're working our way backwards, bringing
in both.”
Ken Rudin, Director of Analytics for Facebook
Source: http://tdwi.org/articles/2013/05/06/facebooks-relationalplatform.aspx
A few words about R
• R is a language and environment for statistical
computing and graphics
• R is open source under the GNU General Public License
• One of the most widely used statistical software packages
• Everything happens in memory
• Comes with a package manager (~5,000 packages)
• Also provides graphics capabilities
Samples of R
How to approach projects?
Starting Point
Problems already familiar from the BI world are further exacerbated by
big data:
• The complexity of systems constantly grows
• The amount of data grows exponentially (= Big Data)
• Change is needed more frequently and increasingly reaches deeper into
business rules
• Solutions can no longer be planned in advance
Solution Option 1 – Classic Deterministic
Everything can be planned and
designed on the drawing board…
How do the products and components of a system, and their
relationships, behave with each other?
Source: Cesar Hidalgo
Solution Option 2 – Learn from "Mother Nature"
• How does nature deal with complex non-linear systems?
• Evolution – variation and selection – "trial and error"

"It is not the strongest of the species that
survives, nor the most intelligent, but the one
most responsive to change." (commonly attributed to Charles Darwin)
A candlestick?
45 Iterations

Technology helps to speed up iterations.
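The "trial and error" principle can be sketched as a minimal (1+1) evolutionary loop. Each iteration varies the current design at random, measures the variant, and keeps it only if it is not worse than the current one. The bit-string design, the `target` and the `fitness` function are purely illustrative assumptions. In a real experiment, fitness would come from a measurement or a simulation. The 45 iterations simply mirror the slide.

```python
import random
from typing import List, Tuple


def fitness(candidate: List[int], target: List[int]) -> int:
    """Score a design: number of positions that match the (hypothetical) target."""
    return sum(c == t for c, t in zip(candidate, target))


def mutate(parent: List[int], rng: random.Random, rate: float) -> List[int]:
    """Variation: flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in parent]


def evolve(target: List[int], iterations: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """(1+1) evolutionary loop: one parent, one child per iteration, keep the better."""
    rng = random.Random(seed)
    n = len(target)
    parent = [rng.randint(0, 1) for _ in range(n)]
    best = fitness(parent, target)
    history = [best]
    for _ in range(iterations):
        child = mutate(parent, rng, rate=1.0 / n)  # variation
        score = fitness(child, target)
        if score >= best:  # selection: keep the child if it is not worse
            parent, best = child, score
        history.append(best)
    return parent, history


target = [1, 0] * 10  # hypothetical "ideal design" with 20 features
design, history = evolve(target, iterations=45)
print(f"fitness after 0 iterations: {history[0]}, after 45: {history[-1]} (max {len(target)})")
```

Selection never accepts a worse design, so the recorded fitness can only stay the same or rise from one iteration to the next. Faster technology does not change the loop itself. It increases how many iterations fit into the available time.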
Laboratory & Factory
The laboratory
• Trial & error
• Pattern recognition
• Analytical apps
An efficient laboratory to experiment
• Microsoft Excel with Power Pivot (in-memory), Power View, Power Query and Power Map
• Source systems with structured data: SQL Server, SAP, databases, Data Marketplace
• Source systems with unstructured data: web server logs, sensor data
• Connectors: OLE DB, OData
• Easy to consume

The factory
• Integrated in the business process
• Analyze mass data
• Host it and run it
• At enterprise scale
• For the real-time enterprise
Stable Big Data Architecture
• Prediction & data science
• Front ends & mobile
• Windows Azure and on-premises
• HDInsight and SQL Server PDW
• Source systems with structured data: SAP, databases, Data Marketplace
• Source systems with unstructured data: web server logs, sensor data
How do we scale?
The battle
• Relational data & compute: SQL Server 2012 Parallel Data Warehouse, half rack
• Connected via InfiniBand
• Analytical data & compute: HP DL385, 40 cores, 2 TB RAM, Fusion-io card
What is Revolution Analytics?
• Founded in 2007
• Aim: evolve R for high-performance computing
• Offers R packages for faster performance and
greater stability
• Enterprise & community products
• Stand-alone, scale-out (HPC), on Hadoop
How do we handle our data?
Data preparation → data transfer → predictive scripts
• R-ODBC: 10 MB/s
• Flat file export: 80 MB/s
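The workaround shown here exports data to a flat file instead of reading it over ODBC. The deck also reports that R's binary RDS format loads faster than CSV. Both ideas can be sketched in Python as a swapped-in analog, not the author's R code. Python's `pickle` stands in for RDS, and the column names and sample data are hypothetical. A text format has to be parsed value by value on load, which is the usual reason binary formats read faster. Actual timings depend on the data and should be measured.

```python
import csv
import io
import pickle
import time
from typing import List


def export_csv(rows: List[List[float]], header: List[str]) -> str:
    """Write rows as CSV text (a text-based flat file)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def import_csv(text: str) -> List[List[float]]:
    """Parse CSV text back into floats; every value is converted from text."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [[float(value) for value in row] for row in reader]


def export_binary(rows: List[List[float]]) -> bytes:
    """Serialize rows in a binary format (pickle, standing in for R's RDS)."""
    return pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)


def import_binary(blob: bytes) -> List[List[float]]:
    """Deserialize rows; only unpickle data you produced yourself."""
    return pickle.loads(blob)


# Hypothetical sample data: 10,000 rows x 3 numeric columns.
header = ["customer_id", "feature_a", "feature_b"]
rows = [[float(i), i * 0.5, i / 3] for i in range(10_000)]

csv_text = export_csv(rows, header)
blob = export_binary(rows)

start = time.perf_counter()
from_csv = import_csv(csv_text)
csv_seconds = time.perf_counter() - start

start = time.perf_counter()
from_binary = import_binary(blob)
binary_seconds = time.perf_counter() - start

print(f"CSV load: {csv_seconds:.4f} s, binary load: {binary_seconds:.4f} s")
```

Both paths return the same values. The binary path skips text parsing, but it trades away the portability and readability of a plain CSV file.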
Results
• Generate predictions for 30,000 customers
– 50,000 rows per customer, 54 columns
– Customer goal: 5 minutes
– Our solution: 7,500 customers in 5 minutes
– Benchmark: 1 minute
• The Revolution Analytics ODBC driver does not work with PDW
• The standard R ODBC driver reads data at 10 MB/s
• Workaround via flat file export
• The RDS format is faster than CSV
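Simple arithmetic puts the figures on the Results slide in perspective. 30,000 customers × 50,000 rows gives 1.5 billion rows to score. At 7,500 customers per 5 minutes, scoring all customers takes about 20 minutes, which is 4× the 5-minute goal. The throughput comparison uses a hypothetical 1,000 MB extract. That volume is an assumption, not a figure from the talk.

```python
# Figures from the "Results" and "How do we handle our data?" slides.
CUSTOMERS = 30_000
ROWS_PER_CUSTOMER = 50_000
COLUMNS = 54
GOAL_MINUTES = 5
ACHIEVED_CUSTOMERS = 7_500  # customers scored ...
ACHIEVED_MINUTES = 5        # ... in this many minutes

ODBC_MB_PER_S = 10
FLAT_FILE_MB_PER_S = 80

total_rows = CUSTOMERS * ROWS_PER_CUSTOMER
total_values = total_rows * COLUMNS
achieved_per_minute = ACHIEVED_CUSTOMERS / ACHIEVED_MINUTES
required_per_minute = CUSTOMERS / GOAL_MINUTES
minutes_for_all = CUSTOMERS / achieved_per_minute
speedup_needed = required_per_minute / achieved_per_minute


def transfer_seconds(volume_mb: float, mb_per_s: float) -> float:
    """Time to move `volume_mb` megabytes at a sustained `mb_per_s` throughput."""
    return volume_mb / mb_per_s


print(f"rows to score: {total_rows:,} ({total_values:,} values)")
print(f"all customers at achieved rate: {minutes_for_all:.0f} min "
      f"(speed-up needed: {speedup_needed:.0f}x)")
# Hypothetical 1,000 MB extract: the volume is an assumption, not a figure from the talk.
print(f"1,000 MB via ODBC: {transfer_seconds(1000, ODBC_MB_PER_S):.1f} s, "
      f"via flat file: {transfer_seconds(1000, FLAT_FILE_MB_PER_S):.1f} s")
```

The 8× difference between the two transfer paths holds for any data volume. The scoring gap, however, still requires a 4× speed-up to reach the customer goal.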
Other solutions?
• R in database
• R on Hadoop
– RHadoop
– Revolution Analytics RHadoop
Other solutions?
• Services & Cloud
THANK YOU!
• For attending this session and
PASS SQLRally Nordic 2013, Stockholm
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Recently uploaded (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Create a Data Science Lab with Microsoft and Open Source tools

  • 2. Create a Data Science Lab with Microsoft and Open Source Tools Marcel Franke, pmOne AG, Germany
  • 3. About me – Marcel Franke: Practice Lead Advanced Analytics & Data Science, pmOne AG – Germany, Austria, Switzerland. More than 10 years of experience with large-scale data warehouses based on SQL Server. Blog: dwjunkie.wordpress.com
  • 4. What is data science?
  • 5. The Definition Data science incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products. Source: http://en.wikipedia.org/wiki/Data_science
  • 6. A brief look into history
  • 8. The beginnings of gambling: Gambling has existed since 3000 BC. The first games were based on dice. Origins in China and Mesopotamia. * Source: Tiemeyer, E.; Zsifkovitis, H.: Information als Führungsmittel, München: Computerwoche Verlag 1995
  • 9. Scientific foundations: the 17th-century paradox of the Chevalier de Méré. Pascal and Fermat discussed the paradox in several letters. The beginning of probability theory. * Source: http://de.wikipedia.org/wiki/De-M%C3%A9r%C3%A9-Paradoxon
  • 10. The science in Data Science: calculating probabilities, pattern recognition, analysis of variance, machine learning, simulations, predictions
  • 11. BI, Data Mining & Prediction
  • 13. What do companies do today?
  • 14. Walmart – The pioneer of data analytics. Source: Data Unser – Dr. Bloching; images: walmart.com, yourdealz.de, squidoo.com, fuzzybrew.com
  • 15. Visa: predicts divorces within the next 5 years with 80% accuracy. Reason: divorce is the biggest risk factor for personal insolvency. Source: visa.de
  • 16. Customers need to find the right case. What do consumers really do? Blonde looks somehow different. The new washing powder is really great…
  • 17. Data can be accessed easily…
  • 18. … but it's hard to analyze it.
  • 19. Other areas of application: social media, product recommendation, retargeting, predictive maintenance, risk prediction, sales predictions, customer analysis, dynamic pricing, disposition
  • 20. How does this fit to Big Data?
  • 21. Our starting point… Structured and unstructured data; harmonize them and generate information (role of the "data scientist"); "Big Data": volume, variety, velocity
  • 22. Typical Big Data Architecture: Big Data Analytics; Big Data Advanced Analytics; Big Data Preparation (SQL, MapReduce); Big Data Storage Platform with massively parallel processing; structured and unstructured data; tools: Excel, PowerPivot
  • 23. "[Facebook] started in the Hadoop world. We are now bringing in relational to enhance that. We're kind of going [in] the other direction." "We've been there, and [we] realized that using the wrong technology for certain kinds of problems can be difficult. We started at the end and we're working our way backwards, bringing in both." Ken Rudin, Director of Analytics for Facebook. Source: http://tdwi.org/articles/2013/05/06/facebooks-relationalplatform.aspx
  • 24. A few words about R • R is a language and environment for statistical computing and graphics • R is open source under the GNU General Public License • One of the most widely used statistical software packages • Everything happens in memory • Comes with a package manager (~5,000 packages) • Also provides graphics capabilities
  • 26. How to approach projects?
  • 27. Starting point: Problems we already know from the BI world are further exacerbated by big data. • The complexity of systems grows constantly • The amount of data grows exponentially (= Big Data) • Changes are needed more frequently and reach ever deeper into business rules • Solutions can no longer be fully planned in advance
  • 28. Solution Option 1 – Classic, deterministic: everything can be planned and designed on the drawing board…
  • 29. How do the products and components of a system, and their relationships, behave with one another? Source: Cesar Hidalgo
  • 30. Solution Option 2 – Learn from "Mother Nature" • How does nature deal with complex non-linear systems? • Evolution – variation and selection – "trial and error" "It is not the strongest of the species that survives, nor the most intelligent, but the one most responsive to change." (commonly attributed to Charles Darwin)
  • 32. 45 iterations. Technology helps to speed up iterations.
  • 34. The laboratory: trial and error, pattern recognition, analytical apps
  • 35. An efficient laboratory to experiment: Microsoft Excel with Power Pivot (in-memory), Power Query, Power View and Power Map; SQL Server; source systems with structured and unstructured data, such as SAP, databases, OLE DB, OData, Data Marketplace, web server logs and sensor data
  • 37. The factory: easy to consume, integrated in the business process, analysis on mass data, host it and run it, at enterprise scale, for the real-time enterprise
  • 38. Stable Big Data Architecture: prediction & data science; front-ends & mobile; Windows Azure (HDInsight, Data Marketplace) and on-premises (SQL Server PDW); source systems with structured data (SAP, databases) and unstructured data (web server logs, sensor data)
  • 40. How do we scale?
  • 42. How do we scale? Relational data & compute: SQL Server 2012 Parallel Data Warehouse, half rack, InfiniBand. Analytical data & compute: HP DL 385, 40 cores, 2 TB RAM, Fusion-io card
  • 43. What is Revolution Analytics? • Founded in 2007 • Aim: evolve R for high performance • Offers R packages for faster performance and greater stability • Enterprise & Community products • Stand-alone, scale-out (HPC), on Hadoop
  • 44. How do we handle our data? Data preparation, data transfer (R-ODBC: 10 MB/s; flat-file export: 80 MB/s), predictive scripts
  • 45. Results • Generate predictions for 30,000 customers – 50,000 rows per customer, 54 columns – Customer goal: 5 minutes – Our solution: 7,500 customers in 5 minutes – Benchmark: 1 minute • Revolution Analytics ODBC driver does not work with PDW • Standard R ODBC driver reads data at 10 MB/s • Workaround via flat-file export • RDS format faster than CSV
  • 46. Other solutions? • R in database • R on Hadoop – RHadoop – Revolution Analytics RHadoop
  • 48. THANK YOU! • For attending this session and PASS SQLRally Nordic 2013, Stockholm
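Slide 9 and speaker note 5 refer to the paradox of the Chevalier de Méré without stating it. In its commonly told form (assumed here), de Méré compared betting on at least one six in 4 rolls of one die with betting on at least one double six in 24 rolls of two dice. A minimal Python sketch that computes both probabilities exactly with the standard `fractions` module and cross-checks them with a seeded simulation; the function names are illustrative:

```python
from fractions import Fraction
import random


def p_at_least_one(p_single: Fraction, trials: int) -> Fraction:
    """Exact probability of at least one success in `trials` independent tries."""
    return 1 - (1 - p_single) ** trials


# Bet 1: at least one six in 4 rolls of a single die.
p_one_die = p_at_least_one(Fraction(1, 6), 4)
# Bet 2: at least one double six in 24 rolls of two dice.
p_two_dice = p_at_least_one(Fraction(1, 36), 24)


def simulate(n_games: int, seed: int = 42) -> tuple[float, float]:
    """Estimate both win rates by Monte Carlo simulation (seeded for reproducibility)."""
    rng = random.Random(seed)
    wins_one = sum(
        any(rng.randint(1, 6) == 6 for _ in range(4)) for _ in range(n_games)
    )
    wins_two = sum(
        any(rng.randint(1, 6) == 6 and rng.randint(1, 6) == 6 for _ in range(24))
        for _ in range(n_games)
    )
    return wins_one / n_games, wins_two / n_games


if __name__ == "__main__":
    print(f"One die, 4 rolls:   {float(p_one_die):.4f}")
    print(f"Two dice, 24 rolls: {float(p_two_dice):.4f}")
    print("Simulated:", simulate(100_000))
```

The exact values are 671/1296 ≈ 0.518 for the single die and about 0.491 for the pair, so the first bet is favourable and the second is not, even though the naive proportional argument (4/6 = 24/36) suggests they should be equal.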
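Speaker note 6 asks whether there is a correlation between customer income and sales amount. One standard way to quantify a linear relationship is the Pearson correlation coefficient; the sketch below computes it from scratch. The income and sales figures are hypothetical toy values, not data from the talk:

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: yearly income (kEUR) and sales amount (EUR) per customer.
income = [28.0, 35.0, 42.0, 51.0, 60.0, 75.0]
sales = [310.0, 360.0, 420.0, 480.0, 530.0, 650.0]

if __name__ == "__main__":
    r = pearson(income, sales)
    # Values near +1 indicate a strong positive linear relationship.
    print(f"r = {r:.3f}")
```

On Python 3.10 or later, `statistics.correlation` from the standard library computes the same coefficient.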
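Slide 19 lists product recommendation among the areas of application, and speaker note 10 (cash desk, reward, nappy) hints at shopping-basket analysis behind the Walmart slide. The talk does not say which technique is used; association rules scored by support, confidence and lift are one common approach, sketched here for item pairs only. The baskets and the `pair_rules` helper are hypothetical:

```python
from collections import Counter
from itertools import combinations


def pair_rules(
    baskets: list[set[str]], min_support: float = 0.2
) -> list[tuple[str, str, float, float, float]]:
    """Score every item-pair rule A -> B by support, confidence and lift.

    Returns (antecedent, consequent, support, confidence, lift) tuples for
    pairs whose joint support is at least `min_support`, highest lift first.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for ante, cons in ((a, b), (b, a)):
            confidence = count / item_counts[ante]
            lift = confidence / (item_counts[cons] / n)
            rules.append((ante, cons, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical shopping baskets.
baskets = [
    {"nappies", "beer", "chips"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

if __name__ == "__main__":
    for ante, cons, sup, conf, lift in pair_rules(baskets):
        print(f"{ante} -> {cons}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means two items are bought together more often than they would be if purchases were independent. Real systems typically use a dedicated algorithm such as Apriori or FP-Growth instead of counting all pairs.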
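Slides 21, 35 and 38 name web server logs as unstructured data that has to be harmonized into information. A minimal sketch of that first step, assuming the logs use the Apache Common Log Format (real servers are often configured differently, for example with the Combined Log Format); the `parse_clf` helper and the sample line are illustrative:

```python
import re
from datetime import datetime
from typing import Optional

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[dict]:
    """Parse one Common Log Format line into a structured record (None if it does not match)."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # Split 'GET /path HTTP/1.0' into method and path.
    method, _, rest = record["request"].partition(" ")
    record["method"] = method
    record["path"] = rest.rsplit(" ", 1)[0] if rest else ""
    return record


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_clf(sample))
```

Each parsed record has typed fields (timestamp, status code, byte count) that can be loaded into a relational table next to the structured source data.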
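Slide 22 places MapReduce next to SQL in the data preparation layer. The sketch below simulates the three phases of the MapReduce programming model (map, shuffle, reduce) for a word count in a single Python process; it illustrates the model only and is not Hadoop or HDInsight code:

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce over a collection of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["error disk full", "info job done", "error network down"]
    print(word_count(logs))
```

On a cluster, the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; the functions themselves stay this simple.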
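Slides 30 and 32 describe evolution (variation and selection) and fast iterations as the alternative to planning everything up front. As an illustration of that principle only, not a method from the talk, here is a (1+1) evolutionary algorithm on the toy OneMax problem (maximize the number of 1 bits); the problem size, mutation rate and seed are arbitrary assumptions:

```python
import random


def one_max(bits: list[int]) -> int:
    """Fitness of a candidate: the number of 1 bits (toy objective)."""
    return sum(bits)


def evolve(
    n_bits: int = 20, max_iterations: int = 10_000, seed: int = 1
) -> tuple[list[int], int]:
    """(1+1) evolutionary algorithm: mutate, keep the child if it is not worse.

    Returns the best candidate found and the number of iterations used.
    """
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    rate = 1.0 / n_bits  # variation: flip each bit with probability 1/n
    for iteration in range(1, max_iterations + 1):
        child = [1 - b if rng.random() < rate else b for b in parent]
        if one_max(child) >= one_max(parent):  # selection: trial and error
            parent = child
        if one_max(parent) == n_bits:
            return parent, iteration
    return parent, max_iterations


if __name__ == "__main__":
    best, iterations = evolve()
    print(f"fitness {one_max(best)} after {iterations} iterations")
```

No individual step is planned; progress comes only from cheap variations and keeping what works, which is why fast iterations matter.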
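Slide 44 quotes 10 MB/s for R-ODBC and 80 MB/s for a flat-file export. A back-of-the-envelope sketch of what that difference means, assuming a hypothetical 100 GB extract, constant throughput and 1 MB = 10^6 bytes:

```python
def transfer_seconds(volume_bytes: float, throughput_mb_per_s: float) -> float:
    """Time to move `volume_bytes` at a constant throughput in MB/s (1 MB = 10**6 bytes)."""
    return volume_bytes / (throughput_mb_per_s * 10**6)


VOLUME = 100 * 10**9  # hypothetical 100 GB extract

odbc = transfer_seconds(VOLUME, 10)       # R-ODBC, 10 MB/s (slide 44)
flat_file = transfer_seconds(VOLUME, 80)  # flat-file export, 80 MB/s (slide 44)

if __name__ == "__main__":
    print(f"ODBC:      {odbc / 3600:.2f} h")
    print(f"Flat file: {flat_file / 3600:.2f} h")
    print(f"Speed-up:  {odbc / flat_file:.1f}x")
```

Under these assumptions the ODBC path takes about 2.8 hours and the flat-file path about 21 minutes, an 8x difference.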
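The figures on slide 45 can be turned into the required speed-up and a rough data volume. The sketch assumes 8 bytes per value (64-bit numeric columns) and ignores compression and R's per-object overhead, so the memory figure is an order-of-magnitude estimate only:

```python
CUSTOMERS = 30_000
ROWS_PER_CUSTOMER = 50_000
COLUMNS = 54
TARGET_MINUTES = 5
ACHIEVED_CUSTOMERS_IN_TARGET = 7_500
BYTES_PER_VALUE = 8  # assumption: every value stored as a 64-bit number

# Throughput in customers per minute: required vs. achieved.
required_rate = CUSTOMERS / TARGET_MINUTES
achieved_rate = ACHIEVED_CUSTOMERS_IN_TARGET / TARGET_MINUTES
speedup_needed = required_rate / achieved_rate
minutes_for_all = CUSTOMERS / achieved_rate

# Raw size of the input data if it were held in memory at once.
total_rows = CUSTOMERS * ROWS_PER_CUSTOMER
raw_bytes = total_rows * COLUMNS * BYTES_PER_VALUE

if __name__ == "__main__":
    print(f"Speed-up needed: {speedup_needed:.1f}x ({minutes_for_all:.0f} min for all customers)")
    print(f"Rows: {total_rows:,}; raw size: {raw_bytes / 10**9:.0f} GB")
```

At 7,500 customers per 5 minutes, scoring all 30,000 customers takes 20 minutes, so the customer goal needs a 4x speed-up. The raw values alone come to about 648 GB, which illustrates why an in-memory tool like R needs a large-memory server such as the 2 TB machine on slide 42.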

Editor's Notes

  1. A lot of topics and skills are combined. Data Warehouse is also a part of it. More statistics and mathematics skills are needed.
  2. Where does Data Science come from?
  3. When you do some research on that topic, you will automatically stumble upon gambling or games of chance.
  4. Dice cup
  5. Two scientists started thinking about gambling in a more scientific way, writing very long letters back and forth. Different probability to win if you play with one die or two.
  6. 1.) How big is the probability to win or lose, or to reach a certain goal? 2.) Is there any correlation between customer income and the sales amount? 5.) What happens if we change certain parameters like price? 6.) What is the sales amount of a certain product in the next quarter or year?
  7. How does this topic fit into BI?
  8. What can I do with it?
  9. So what do companies do with it? I consciously didn't use the word Big Data, but you all know that this new area is very hot in marketing and news. So what are the good examples & use cases?
  10. Kasse – cash desk; Belohnung – reward; Windel – nappy
  11. Emphasize the importance of R -> almost all vendors are based on R. Widely used in the open source space.
  12. Injector for washing pellets. Waste, poor quality
  13. Idea of a process model called Lab & Factory: experimental approach, iterative, fast, find new patterns
  14. Is for the data scientist to experiment
  15. If we find something interesting, we can deploy it to the factory. It's the place where we run our analytical code at enterprise scale.
  16. Most of the analytical tools, like databases, R, SAS and SPSS, have been out there for years. We often see limitations in scalability & performance here. DB -> MPP; R, SAS -> in-memory
  17. POC on different analytic use cases with the big vendors: complex SQL queries, simulations, predictions with R
  18. SQL -> we know how to scale. R -> scaling is difficult, hence Revolution
  19. Not a stable market, many options