SAS & Hadoop: a good fit!
Guido Oswald ( @guidooswald )
www.sasforum.com/ch
Copyright © 2015, SAS Institute Inc. All rights reserved.
WHERE DOES BIG DATA BEGIN?!
When Excel explodes?
When I leave my comfort zone?
As soon as I have unstructured data?
Anything over 1 TB?
The three Vs?
BIG DATA IS LIKE TEENAGE LOVE?
Everyone talks about it, nobody knows how to do it,
but everyone thinks everyone else is doing it,
so everyone claims to be doing it too.
HADOOP THE CUTE ELEPHANT
WHY IS HADOOP INTERESTING?
SCALABILITY
HIGH PERFORMANCE
LOW COST - open source
DISTRIBUTED PROCESSING
DATA REDUNDANCY
COMMODITY SERVERS
WHY IS HADOOP INTERESTING?
• Hadoop will very soon be a complement to (not a replacement for):
  • Business Intelligence;
  • Data Warehousing;
  • Data Integration;
  • Analytics.
SOURCE: 10 Myths About Hadoop - TDWI Best Practices Report
HADOOP IN PRODUCTION:
• Reason #1 for using Hadoop: analytics (71%)
• Challenges when using Hadoop:
  • Hadoop has no built-in analytical functions.
  • Cost: expensive because of extensive, home-grown solutions.
[Chart: respondents with Hadoop in production, by time frame: today (10%), < 12 months, < 24 months, < 36 months, 3+ years, never]
WHY SAS?
IN-MEMORY
HIGH-PERFORMANCE
ANALYTICS
BUSINESS INTELLIGENCE
VISUALIZATION
DATA MANAGEMENT
SAS & HADOOP: REASONS FOR COMBINING BOTH WORLDS
• High-performance advanced analytics;
• Business intelligence and data visualization;
• Massively scalable, on distributed commodity hardware
ERA OF ABUNDANCE: “BIG DATA” (DATA IN ABUNDANCE)
ERA OF ABUNDANCE: “HADOOP”
ERA OF ABUNDANCE: “ANALYTICS”
ERA OF ABUNDANCE: “ANALYTICS”
Abundance of data, processing power, and intelligence
BIG DATA ANALYTICS: BUILDING BLOCKS OF USE CASES
Known data (DWH): customers, households, accounts, balances, products, history, …
New, unknown, and unused data: ATMs and self-service terminals, online banking, mobile apps, cooperation partners, complaints, web & social, press, balance sheets / XBRL, …
Analytics: pattern recognition, correlations, forecasts, text analytics, …
Technological enablers: in-memory, Hadoop, SAP HANA, …
DETOUR…
BIG DATA LAB
FIND YOUR BIG DATA STRATEGY WITH SAS
BIG DATA APPROACH: TRADITIONAL PROJECT APPROACH
From idea to result: business case, management decision, budget approval, team setup, tool selection, infrastructure build-out, data acquisition, model building, production preparation, test, go-live.
Also labeled on the diagram: requirements.
BIG DATA APPROACH: INNOVATION LAB (AGILE, LOW-RISK, SCALABLE)
From idea to result: business case, management decision, budget approval, team setup, tool selection, infrastructure build-out, data acquisition, model building, production preparation, test, go-live.
Also labeled on the diagram: Innovation Lab, Big Data Lab, refine models, update data.
SAS OFFERING: BIG DATA LAB
Ready-to-use complete package for independently developing big data use cases, at a fixed price.
TECHNOLOGY
Sizing: S, M, L
Deployment: on-premise, cloud
Software solutions:
Data management
► Data Loader for Hadoop
► Access to Hadoop
► Metadata management
Analytics
► Visual Analytics
► Visual Statistics
► In-Memory Statistics
SERVICE
► Installation
► Configuration
► Training
► Implementation of a sample use case
Additional bookable services:
► Coaching and provision of experts (data scientist, data management expert)
► Consulting
BACK TO THE TOPIC..
SAS & HADOOP: SAS® AND THE HADOOP ECOSYSTEM
[Architecture diagram]
Users: SAS® user, next-gen SAS® user
Layers: user interface, metadata, data access, data processing, file system
SAS components: SAS® Data Management, SAS® Visual Analytics, SAS® Visual Statistics, SAS® Enterprise Miner™, SAS® Studio, SAS® In-Memory Statistics for Hadoop, SAS® LASR™ Analytic Server, SAS Embedded Process, SAS Metadata, Base SAS & SAS/ACCESS® to Hadoop™, in-memory data access
Hadoop components: Hive, Pig, MapReduce, HDFS
MAPREDUCE: A (SIMPLE) WORD COUNT…
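The slide image for the word count is not part of this transcript. As a stand-in, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It runs in a single process and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not taken from the deck.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by their key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the grouped values for one word into a total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical single-process driver)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data lake"]))
```

On a real cluster, the map and reduce steps run on many nodes and the shuffle moves data between them over the network, so even this simple job involves considerably more moving parts than the sketch suggests.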
Hadoop can get
very complex
very quickly!
HADOOP ECOSYSTEM: REDUCING COMPLEXITY
Pig (scripting language)
Hive (SQL)
Cloudera Impala
PROC HADOOP (Base SAS)
SAS/ACCESS to Hadoop
SAS/ACCESS to Impala
SAS DATA LOADER FOR HADOOP
Self-service big data preparation for business users
Certified by Hortonworks and Cloudera
SAS & HADOOP: HOW?
SAS & Hadoop connect in several ways:
• SAS can treat Hadoop like any other data source and read data FROM Hadoop when that is the appropriate route.
• SAS can work WITH Hadoop and lift data into a specialized in-memory environment for advanced analytics.
• SAS can work directly IN Hadoop and use Hadoop's distributed processing capabilities.
SAS & HADOOP: SAS FROM HADOOP
SAS accesses data in Hadoop and moves it to a SAS server for processing. Results are written back.
• A bridge is built from Hadoop to existing SAS environments.
• Hadoop is used as just another data source.
• Performance is limited to the bandwidth of a single pipe.
• Ideal when not all the data to be analyzed resides in Hadoop, or when an established process cannot run in Hadoop.
DATA MOVEMENT
SAS & HADOOP: SAS WITH HADOOP
SAS accesses data in Hadoop and processes it on a SAS server, while both the data and the computations are massively parallelized.
• Provides capabilities that Hadoop cannot handle well on its own.
• Supports advanced analytics through distributed processing.
• Allows data storage and analytical processing to scale independently of each other.
• Ideal when analytical accuracy, mature algorithms, and governance are required.
DATA LIFT INTO MEMORY
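The difference between a single pipe (FROM) and a parallel lift into memory (WITH) can be illustrated with a back-of-the-envelope calculation. The speaker notes mention 16 data nodes giving a 16-fold faster lift; the sketch below assumes exactly that kind of ideal linear scaling, with no network contention or overhead. The function name and all numbers are hypothetical and for illustration only, not measurements.

```python
def transfer_seconds(data_gb: float, pipe_gb_per_s: float, parallel_pipes: int = 1) -> float:
    """Idealized transfer time: every pipe runs at full speed, with no overhead."""
    if data_gb < 0 or pipe_gb_per_s <= 0 or parallel_pipes < 1:
        raise ValueError("data_gb must be >= 0, pipe_gb_per_s > 0, parallel_pipes >= 1")
    return data_gb / (pipe_gb_per_s * parallel_pipes)


# Illustrative numbers only (assumptions, not measurements).
single_pipe = transfer_seconds(data_gb=1024, pipe_gb_per_s=1.0)
sixteen_nodes = transfer_seconds(data_gb=1024, pipe_gb_per_s=1.0, parallel_pipes=16)
speedup = single_pipe / sixteen_nodes

if __name__ == "__main__":
    print(f"single pipe: {single_pipe:.0f} s, 16 nodes: {sixteen_nodes:.0f} s, speedup: {speedup:.0f}x")
```

In practice the speedup will usually be lower than the node count, since the model ignores coordination cost and uneven data distribution.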
SAS & HADOOP: SAS IN HADOOP
SAS processes data directly in the Hadoop cluster.
SAS LOGIC
• The SAS Embedded Process enables scalable compute power in Hadoop.
• SAS computes in Hadoop, fine-tuned through Hadoop technology.
• Support for data transformation, data quality, and scoring in Hadoop.
• Ideal when all data is held in Hadoop and Hadoop is the right place for processing.
SAS & HADOOP: SAS IN HADOOP (continued)
• SAS in-memory solutions can also be installed directly in the Hadoop cluster on shared infrastructure.
SAS & HADOOP: THE PRAGMATIC APPROACH
Use the right approach for what needs to be done!
• Prepare data IN Hadoop for analytics
• Move data FROM Hadoop into a SAS environment
• Lift data IN to memory for analytics at scale
• Explore data at scale, in-memory, WITH data visualization
• Model data at scale, in-memory, WITH advanced modeling tools
• Deploy and manage model score code IN Hadoop
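The "ideal when" criteria on the FROM, WITH, and IN slides can be condensed into a small decision helper. This is a deliberately simplified sketch: the function name, the parameter names, and the order of the checks are assumptions made for illustration, and real projects typically combine all three approaches across the analytics life cycle.

```python
def choose_approach(all_data_in_hadoop: bool,
                    process_can_run_in_hadoop: bool,
                    needs_in_memory_analytics: bool) -> str:
    """Return "FROM", "WITH", or "IN" based on the slide criteria (simplified)."""
    # FROM: not all data is in Hadoop, or an established process cannot run there.
    if not all_data_in_hadoop or not process_can_run_in_hadoop:
        return "FROM"
    # WITH: advanced analytics that benefits from a dedicated in-memory environment.
    if needs_in_memory_analytics:
        return "WITH"
    # IN: all data is in Hadoop and Hadoop is the right place to process it.
    return "IN"


if __name__ == "__main__":
    print(choose_approach(all_data_in_hadoop=True,
                          process_can_run_in_hadoop=True,
                          needs_in_memory_analytics=False))
```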
ROGERS MEDIA
• Data visualization & high performance analytics
• Processing data on 12 million customers
• 40 million records per month in Hortonworks
• More than 600 relevant web characteristics
“Several of us from Rogers in the room looked at each other, and said, ‘That is really wicked; that’s cool.’”
Chris Dingle, Senior Director of Audience Solutions, Rogers Communications
MACY’S
• 20% reduction in churn
• $500,000 annual savings
• Customer lifetime value analysis
• More accurate response prediction
• Optimized promotions
“... they can look at data and spend more time analyzing it and become internal consultants who provide more of the insight behind the data.”
Kerem Tomak, Vice President of Analytics
www.sasforum.com/ch
Guido Oswald (@guidooswald) – Guido.Oswald@sas.com

SAS Forum Switzerland 2015: Big Data - Guido Oswald


Editor's notes

  1. Paul Kent. Data volumes that are too large, too complex, or too fast-changing to be analyzed with manual and traditional data-processing methods.
  2. Everyone talks about it, nobody knows how to do it, but everyone thinks everyone else is doing it, so everyone claims to be doing it too.
  3. Created by someone working at Yahoo, Hadoop was released as an open source project in 2008 and is today managed by the non-profit Apache Software Foundation. Hadoop was built for fast, low-cost, efficient, and data-protected file manipulation. It excels at massively parallelized file manipulation: it can handle huge amounts of data, of any kind, quickly. It was not built for advanced analytics. Because of its architecture (the nodes do not intercommunicate except through sorts and shuffles), iterative algorithms require multiple map, shuffle/sort, and reduce phases to complete. This creates multiple intermediate files between MapReduce phases and is very inefficient for advanced analytic computing. It has gotten better in recent years with the addition of many third-party components (some of them also open source), but it remains, essentially, a bulk file and data manipulation and storage system. Some of the more popular uses for the framework today: low-cost storage and active data archive; staging area for a data warehouse and analytics store; data lake; sandbox for discovery and analysis.
  4. TDWI research (Q2 2014) sponsored by Cloudera, EMC Greenplum, Hortonworks, ParAccel, SAP, SAS, Tableau Software, and Teradata. Hadoop is clearly poised to become a *complement*, and NOT a replacement, to BI, DW, DI, and analytics. Interestingly, however, when asked whether Hadoop was currently in production, only 10% of respondents confirmed that their Hadoop deployment was actually used in production today. Why such a low number? While the first and foremost driver for Hadoop is to become an enabler of analytics, it does not have actual analytics capabilities built in. Trying to develop those capabilities within the Hadoop ecosystem, using Hadoop components such as MapReduce, Hive, etc., results in staffing issues and a high cost of in-house development.
  5. A privately held company established in 1976, SAS is the #1 world leader in advanced business analytics, with a 38% market share in 2013. We have 14,000 employees worldwide, and our solutions are deployed in over 135 countries. SAS leads the world in analytics (latest reviews by IDC and Forrester) and is in the Leaders quadrant in *17* of Gartner’s Magic Quadrants (from data management to business intelligence to advanced analytics). SAS reported revenues of US$3 billion in 2013 and is famous for its industry-leading 25% reinvestment in R&D.
  6. Why? To provide High-performance Advanced Analytics, Business Intelligence and Data Visualization on a Low Cost, Distributed, Massive Scale.
  7. Big Data, defined as the point at which the volume, variety, or velocity of the data is just too much for an organization’s systems or processes to manage in a timely manner for making business decisions, was introduced in 2011. Before that time, virtually no one had heard of these terms in this context.
  8. Interestingly enough, Hadoop was around before the “Big Data” term was coined, and has been on a steady incline ever since.
  9. Finally, if we look at the interest in analytics, we find also a steady incline for the last 10 years or so.
  10. This is where we are now. What we call, the “Era of Abundance”. Lots of data, the processing power to handle it, and the Intelligence to do the right thing with it.
  11. Finally, the summary: Hadoop brings the variety of data, and SAS brings the big data analytics technology. Together, they make the recipe for new use cases!
  12. Bringing all of the use cases together…. We are an integral part of the rapidly evolving Hadoop ecosystem We integrate with Hadoop… We complement Hadoop… We go beyond what Hadoop can offer for each component of the Analytics Lifecycle…
  13. Three years ago we told you about Hadoop and its limitations … now the market and the community have responded… SAS leads the way with in-memory and alternative parallel processing patterns … edge… Moving from left to right… all of the four above-mentioned design patterns are covered. The goal is to meet SAS user needs today and in the future… and also to understand and meet the needs of a new generation of users (e.g. data scientists)….
  14. SAS Data Loader for Hadoop is a new offering from SAS, purpose-built to solve the big data challenges that we talked about previously. It has a web-based, wizard-driven user interface that minimizes the need for training and improves the productivity of business analysts and data scientists working with Hadoop. Certified by Cloudera and Hortonworks.
  15. The From approach is the “traditional” established SAS approach, where Hadoop can be treated simply like any other data source. As noted on the slide, the FROM is really bi-directional, and can write back TO Hadoop using the same approach. It is mostly meant to represent that we mainly take data FROM the Hadoop cluster to process in a SAS environment.
  16. With the WITH approach, SAS introduces a number of concepts. First, we now have the LASR Analytic Server. This is a core piece of our technology that allows for massively parallel, distributed, in-memory processing of advanced analytics. LASR is a purpose-built analytics server that can run advanced analytics in a massively parallelized environment (meaning it leverages memory *and* processing from multiple servers). Since it was built for advanced analytics, it can produce results faster and with very few instructions, whereas the same results on Hadoop are traditionally produced using hundreds, even thousands, of lines of code. Second, we also have the SAS Embedded Process, a lightweight, non-invasive technology that communicates with and leverages Hadoop technologies to lift data into memory in an optimized, extremely fast way. Notice the multiple arrows, which mean that if you have, say, 16 data nodes, you will be able to parallelize and lift the data into SAS’ in-memory environment 16 times faster. This WITH concept really means “BESIDES HADOOP”, or ALONGSIDE it, as long as we’re leveraging massive parallelization for both the data and the processing.
  17. The ‘IN’ approach also leverages the lightweight SAS Embedded Process, but this time to run specialized SAS code (data quality, data transformation and manipulation, scoring) directly in the Hadoop cluster, effectively leveraging the massive Hadoop parallel processing and native resources such as MapReduce. Not all SAS code can be executed this way. Strategic deployments such as scoring code, data transformation code, or data quality code can be applied in this manner. SAS has been doing this for a very long time… this is not new for us: taking sophisticated scoring code and running it in place, inside a database. Now we’re extending this capability to Hadoop. This is ideal when data is so voluminous that lifting it all into memory would be prohibitive. We can explore the data to find what is relevant even before doing data transformations for modelling. Alternatively, we can also “model at scale”: the idea of automatically segmenting the data (with tools such as SAS® Visual Statistics) and then building models by segment.
  18. Another version of the ‘IN’ approach: the in-memory solutions from SAS are deployed IN the Hadoop cluster, effectively sharing the cluster with Hadoop and leveraging YARN to manage the necessary resources.
  19. The complete analytical life cycle is important to understand, as this is the reality most companies face. Data needs to be prepared specifically for analytics (a crucial step); then it needs to be explored in a highly efficient environment, purpose-built for interactive visualization; then it needs to be modeled in a purpose-built advanced analytics environment. Finally, the final scoring can often happen where the bulk of the data resides, in Hadoop. Through it all, key metadata acts as the glue, ensuring proper governance of the processes and data and tracking lineage and impact analysis, so that the user knows what may result from a change at any point in the cycle.
  20. The ultimate goal was to show the most relevant advertising to each customer visiting Rogers’ web site. A trait is a characteristic or parameter of a visit: for example, the time of the visit, the number of clicks, the browser used, or the device used (iPad, Samsung, etc.). The 600 traits used in the final model were derived from a list of 75,000 original traits. http://youtu.be/wTnkg16jHwg
  21. The initial objective: stop the “one size fits all” approach to email marketing, resulting in a 20% reduction in subscription churn. This led to more accurate, real-time decisions about customer preferences. The ability to gain customer insight across channels is a critical part of improving customer satisfaction and revenues, and Macys.com uses SAS to validate and guide the site's cross- and up-sell offer algorithms. http://www.sas.com/en_us/customers/macys.html
  22. This diagram shows two axes: the degree of intelligence and the level of competitive advantage that can be achieved. I am going to propose that using data, and applying analytics to that data, can accelerate the loop of intelligence and experience that links strategy and operations. Most would agree that in the area of collections and recoveries, historic intelligence is not a great predictor of the present, let alone the future. Organisations start with data and may build data marts to access the data locked away in operational systems. Some bring in most of their data; others pick data sources based upon past experience, so complaints data and call centre file notes are often omitted, yet both could be really useful in segmentation and predictive modelling. The data needs to be cleaned up as it is consolidated: garbage in, garbage out! Then there is a whole set of reports, queries and alerts that tell you where you have been, and may also tell you where you are today, provided the information is available fast enough. But it is when you start to apply analytics to the data that business intelligence and competitive advantage start to grow. Exploring data is all about understanding more about the data, and the relationships between data sources, than you knew from experience or intuition. Yes, we may know that impairments on zero-rate balance transfers on the credit card are a risk group, but what other factors are key in determining the different segments to which we may wish to apply different collections strategies? Forecasting is not about continuing the line on the graph, but about applying a range of forecasting techniques to sets of data to work out what is most likely to occur in the future. Prediction involves building models based upon past experience. These models may be very complex and predict a binary outcome or a probability.
So, for example, we might explore the customer base to identify the factors most likely to lead to default, purchase or churn. We could then build models based upon that data to predict which customers may become impaired and, if they did, which would respond best to pre-delinquency contact. Finally, the pinnacle of analytics is the use of optimisation to deploy resources appropriately to achieve the greatest collection of debt within business constraints. By way of example: suppose we wanted to put all impaired customers through at least one collection strategy, pull the over-90-days debt down by 30%, tackle a problem with the silver card customers, make only one call to each customer a week, and have enough call centre staff that no caller waits longer than 5 seconds for an answer. What would be the right level of outbound mailing to generate the optimum level of collections whilst giving all objectives an appropriate level of attention? I will explore this in more detail later. That’s the power of SAS Analytics. According to Gartner (in a report issued February 2008): “SAS dominates in advanced analytic solutions. No other vendor in the Magic Quadrant has its range of capabilities or can point to the same number of advanced analytic deployments.” Forrester Research (in a report issued July 2008) says that “SAS remains the best game in town for fully integrated high-end analytics from a single vendor.”
  23. Why Hadoop is being considered (or has been implemented) and HOW it will actually be exploited to derive value are sometimes two very different things, depending on the point of view. These are the key value drivers for how SAS affects Hadoop. Analysts, statisticians, data scientists, etc. will be very interested in increasing the ACCURACY of their analysis, mainly because they can now run their analysis on more data (sometimes even all the data) and run more complex algorithms thanks to the massively parallel processing. SCALABILITY will generally be a concern of IT as well as of the business side, but maybe not from the same angle. IT wants to make sure it does not paint itself into a corner, and that whatever architecture it deploys will meet the needs of the business down the road, while the business just wants to be able to embrace all of the Big Data coming its way. IT folks will likely be very focused on the GOVERNANCE of data: making sure it is properly secured, comprehensive, timely, etc. Finally, the VALUE (ECONOMICS) of the project needs to be embraced and recognized by all. Economic value can come from: (1) increased self-service acquisition of Hadoop data by SAS analysts, which increases their ability to generate insight from Hadoop as a new and rich data source; (2) better value from Hadoop data, enabled by the scale and accuracy of analytics possible through the SAS LASR Server (the ‘With’ approach), so that insights are not bound by processing capability; (3) better value from Hadoop data used for analytic insight (better quality, data shaping for analytics, and ease of deploying score code in place), together with the ability to deploy in-memory capabilities in the Hadoop cluster.