SAS & Hadoop: a good fit!
Guido Oswald ( @guidooswald )
www.sasforum.com/ch
Copyright © 2015, SAS Institute Inc. All rights reserved.

SAS Forum Switzerland 2015: Big Data - Guido Oswald

Many companies, from multinational corporations to SMEs, are already experimenting with Hadoop as a reliable and inexpensive data platform.
Whether as a replacement for the data warehouse (DWH), in parallel with the DWH, or as a staging platform (the so-called data lake), Hadoop offers many advantages in efficiency and performance and, to begin with, carries no license costs. The cute elephant has the potential to repeat the career of Linux in the data center.
For SAS, Hadoop is a real stroke of luck: not only as an inexpensive and agile data store, but also as a compute platform for the distributed procedures and the massively parallel in-memory engine "LASR".
This talk shows how SAS can use a Hadoop cluster and how other MPP databases (SAP HANA, Teradata, Pivotal) fit into this picture.

  • Paul Kent
    Data sets that are too large or too complex, or that change too quickly, to be analyzed with manual and classical data-processing methods
  • Everyone talks about it, nobody knows how to do it, but everyone thinks everyone else is doing it, so everyone claims to be doing it too
  • Created by Doug Cutting, who worked at Yahoo during its early development, Hadoop became a top-level Apache open source project in 2008, and today is managed by the non-profit Apache Software Foundation.

    Hadoop was built for fast, low-cost, efficient, and data-protected file manipulation. It excels at massively parallel file manipulation: it can handle huge amounts of data, of any kind, quickly.

    It was not built for advanced analytics. Because of its architecture (the nodes do not communicate with each other except through sorts and shuffles), iterative algorithms require multiple map, shuffle/sort, and reduce phases to complete. This creates multiple intermediate files between MapReduce phases and is very inefficient for advanced analytic computing. It has gotten better in recent years with the addition of many third-party tools (some of them also open source), but it remains, essentially, a bulk file and data manipulation and storage system.

    Here are some of the more popular uses for the framework today:
    Low-cost storage and active data archive.
    Staging area for a data warehouse and analytics store.
    Data lake.
    Sandbox for discovery and analysis.
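The map, shuffle/sort, and reduce phases mentioned in the note above can be sketched in a few lines of plain Python. This is only an illustration of the pattern on a single machine, not Hadoop's API: the function names and the sample sentences are made up for this example. On a real cluster each phase would write its output to files before the next phase starts, which is why an iterative algorithm that chains many such rounds becomes expensive.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_sort(pairs):
    """Group emitted values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Combine all values that share a key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run one full map -> shuffle/sort -> reduce round over the lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_sort(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped)


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data big elephant", "cute elephant"]
    print(word_count(sample))
```

An algorithm that needs several passes over the data (for example, an iterative clustering method) would have to call something like `word_count` once per iteration, re-reading and re-writing the intermediate data each time.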
  • TDWI research (Q2 2014) sponsored by Cloudera, EMC Greenplum, Hortonworks, ParAccel, SAP, SAS, Tableau Software, and Teradata.

    Hadoop is clearly poised to become a *complement*, and NOT a replacement, to BI, DW, DI and analytics.

    Interestingly, however, when asked whether Hadoop was currently in production, only 10% of respondents confirmed that their Hadoop deployment was actually used in production today. Why such a low number?

    While Hadoop is adopted first and foremost as an enabler of analytics, it does not have the actual analytics capabilities built in. Trying to develop those capabilities within the Hadoop ecosystem, using Hadoop components such as MapReduce, Hive, etc., results in staffing issues and a high cost of in-house development.
  • A privately-held company established in 1976, SAS is the #1 World Leader in Advanced Business Analytics - 38% Market Share in 2013
    We have 14,000 Employees Worldwide, with our solutions deployed in over 135 countries.

    SAS leads the world in Analytics (latest reviews by IDC and Forrester), as well as being in the Leader quadrants in *17* of Gartner’s Magic Quadrants (from Data Management, to Business Intelligence to Advanced Analytics)

    SAS reported revenues of US$3 billion in 2013, and is famous for its industry-leading 25% reinvestment in R&D.
  • Why? To provide High-performance Advanced Analytics, Business Intelligence and Data Visualization on a Low Cost, Distributed, Massive Scale.
  • The term Big Data, defined as the point at which the volume, variety or velocity of the data is too much for an organization’s systems or processes to manage in time to make business decisions, was introduced in 2011. Before that time, virtually no one had heard of these terms in this context.
  • Interestingly enough, Hadoop was around before the term “Big Data” was coined, and has been on a steady incline ever since.
  • Finally, if we look at the interest in analytics, we also find a steady incline over the last 10 years or so.
  • This is where we are now: what we call the “Era of Abundance”. Lots of data, the processing power to handle it, and the intelligence to do the right thing with it.
  • Finally, the summary:
    Hadoop brings the variety of data and
    SAS brings the Big Data analytics technology.

    Together, this is the recipe for new use cases!
  • Bringing all of the use cases together….

    We are an integral part of the rapidly evolving Hadoop ecosystem
    We integrate with Hadoop…
    We complement Hadoop…
    We go beyond what Hadoop can offer for each component of the Analytics Lifecycle…
  • Three years ago we told you about Hadoop and its limitations … now the market and the community have responded… SAS leads the way with in-memory and alternative parallel processing patterns … edge…

    Moving from left to right, all four of the above-mentioned design patterns are covered. The goal is to meet SAS user needs today and in the future, and also to understand and meet the needs of a new generation of users (e.g. data scientists).
  • SAS Data Loader for Hadoop is a new offering from SAS, purpose-built to solve the big data challenges that we talked about previously.
    It has a web-based, wizard-driven user interface that minimizes the need for training and improves the productivity of business analysts and data scientists working with Hadoop.

    Certified by Cloudera and Hortonworks
  • The FROM approach is the “traditional”, established SAS approach, where Hadoop is treated simply like any other data source. As noted on the slide, FROM is really bi-directional: SAS can write back TO Hadoop using the same approach. The name mostly reflects that we mainly take data FROM the Hadoop cluster to process it in a SAS environment.

  • With the WITH approach, SAS introduces a number of concepts.

    First, we now have the LASR Analytic Server. This is a core piece of our technology that allows for massively parallel, distributed, in-memory processing of advanced analytics. LASR is a purpose-built analytics server that can run advanced analytics in a massively parallelized environment (meaning it leverages memory *and* processing from multiple servers). Since it was built for advanced analytics, it can produce results faster and with very few instructions, whereas the same results on Hadoop are traditionally produced using hundreds or even thousands of lines of code.

    Second, we also have the SAS Embedded Process, a lightweight, non-invasive technology that communicates with and leverages Hadoop technologies to lift data into memory in an optimized, extremely fast way. Notice the multiple arrows: if you have, say, 16 data nodes, you will be able to parallelize and lift the data into SAS’s in-memory environment 16 times faster.

    This WITH concept really means “BESIDES HADOOP”, or ALONGSIDE it, as long as we’re leveraging massive parallelization for both the data and the processing.
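The "16 data nodes, 16 times faster" arithmetic in the note above can be made concrete with a small sketch. It is a toy model, not the SAS Embedded Process: threads stand in for data nodes, `read_partition` is a hypothetical placeholder for a node handing over its local rows, and the timing formula assumes perfectly even partitions with no coordination overhead, so the 16x figure is an idealized upper bound.

```python
from concurrent.futures import ThreadPoolExecutor


def read_partition(node_id, rows_per_node):
    """Stand-in for one data node handing over its local block of rows."""
    start = node_id * rows_per_node
    return list(range(start, start + rows_per_node))


def lift_in_parallel(num_nodes, rows_per_node):
    """Read every partition concurrently and collect the rows in node order."""
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partitions = pool.map(
            read_partition, range(num_nodes), [rows_per_node] * num_nodes
        )
        return [row for part in partitions for row in part]


def ideal_transfer_seconds(total_rows, rows_per_second, parallel_streams):
    """Idealized transfer time: ignores overhead and uneven partitions."""
    return total_rows / (rows_per_second * parallel_streams)


if __name__ == "__main__":
    rows = lift_in_parallel(num_nodes=16, rows_per_node=1000)
    single_pipe = ideal_transfer_seconds(len(rows), 1000, parallel_streams=1)
    sixteen_pipes = ideal_transfer_seconds(len(rows), 1000, parallel_streams=16)
    print(len(rows), single_pipe / sixteen_pipes)
```

The same formula with `parallel_streams=1` corresponds to the single-pipe limitation described for the FROM approach.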
  • The ‘IN’ approach also leverages the light weight SAS Embedded Process, but this time, it is to run specialized SAS code (data quality, data transformation and manipulation, scoring) directly in the Hadoop cluster… effectively leveraging the massive Hadoop parallel processing and native resources such as MapReduce.

    Not all SAS code can be executed this way. Strategic deployments such as scoring code, data transformation code or data quality code can be applied in this manner.

    SAS has been doing this for a very long time… this is not new for us. Taking sophisticated scoring code and running it in place, inside a database. Now we’re extending this capability to Hadoop.

    This is ideal when data is so voluminous that lifting it all in memory would be prohibitive. We can explore the data to find what is relevant even before doing data transformations for modelling. Alternatively, we can also “model at scale” – the idea of automatically segmenting the data (with tools such as SAS® Visual Statistics) and then building models by segment.
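The "model at scale" idea from the note above (segment the data, then build one model per segment) can be illustrated with a minimal sketch. It does not use any SAS product: the segmentation key is assumed to be given, each segment gets a simple least-squares line, and the records are invented for illustration. It also assumes every segment contains at least two distinct x values.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # assumes the segment has at least two distinct x values
    return slope, mean_y - slope * mean_x


def fit_by_segment(records):
    """Split (segment, x, y) records by segment and fit one model per segment."""
    by_segment = defaultdict(list)
    for segment, x, y in records:
        by_segment[segment].append((x, y))
    return {segment: fit_line(points) for segment, points in by_segment.items()}


if __name__ == "__main__":
    # Hypothetical records: (segment, x, y).
    records = [
        ("a", 0, 1), ("a", 1, 3), ("a", 2, 5),
        ("b", 0, 10), ("b", 2, 6), ("b", 4, 2),
    ]
    for segment, (slope, intercept) in fit_by_segment(records).items():
        print(segment, slope, intercept)
```

Because each segment is fitted independently, the per-segment work is the kind of task that can be spread across the nodes of a cluster.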
  • Another version of the ‘IN’ approach, where the in-memory solutions from SAS are deployed IN the Hadoop cluster, effectively sharing the cluster with Hadoop and leveraging YARN to manage the necessary resources.
  • The complete analytical life cycle is important to understand, as this is the reality most companies face:

    - Data needs to be prepared specifically for analytics (a crucial step); then it needs to be explored in a highly efficient environment, purpose-built for interactive visualization; then it needs to be modeled in a purpose-built advanced analytics environment. Finally, the final scoring can often happen where the bulk of the data resides, in Hadoop.

    Through it all, key metadata act as glue, ensuring proper governance of the processes and data, tracking lineage and impact analysis, so that the user can know what may result from any changes at any point in the cycle.
  • The ultimate goal was to serve the most suitable advertising to each visitor on Rogers’ web site.

    A trait is a characteristic or parameter of a visit: for example, the time of the visit, the number of clicks, the target browser, or the device used (iPad, Samsung, etc.). The 600 traits used in the final model were derived from a list of 75,000 original traits.

    http://youtu.be/wTnkg16jHwg
  • The initial objective: stop the “one size fits all” email marketing approach, which resulted in a 20% reduction in subscription churn. This led to more accurate, real-time decisions about customer preferences. The ability to gain customer insight across channels is a critical part of improving customer satisfaction and revenues, and Macys.com uses SAS to validate and guide the site's cross- and up-sell offer algorithms.

    http://www.sas.com/en_us/customers/macys.html
  • This diagram shows two axes: the degree of intelligence and the level of competitive advantage that can be achieved. I am going to propose that using data, and applying analytics to that data, can accelerate the loop of intelligence and experience that links strategy and operations.
    Most would agree that in the area of collections and recoveries, historic intelligence is not a great predictor of the present, let alone the future.

    Organisations start with data and may build data marts to access the data locked away in operational systems. Some bring in most of their data; others pick data sources based on past experience, so complaints data and call centre file notes are often omitted, yet both could be really useful in segmentation and predictive modelling.
    The data needs to be cleaned up as it is consolidated: garbage in, garbage out!
    Then there is a whole set of reports, queries and alerts that tell you where you have been, or may also tell you where you are today, provided the information is available fast enough.
    But it is when you start to apply analytics to the data that business intelligence and competitive advantage start to grow.

    Exploring data is all about understanding more about the data, and the relationships between data sources, than you knew from experience or intuition. Yes, we may know that impairments on zero-rate balance transfers on the credit card are a risk group, but what other factors are key in determining the different segments to which we may wish to apply different collections strategies?

    Forecasting is not about continuing the line on the graph, but about applying a range of forecasting analytical techniques to sets of data to work out what is most likely to occur in the future.

    Prediction involves building models based upon past experience. These models may be very complex and predict a binary outcome or a probability outcome. So for example, we might explore the customer base to identify the factors that are most likely to lead to default, purchase or churn. We could then build models based upon that data and predict which customers may impair and if they did, which would respond best to pre-delinquency contact?

    Finally, the pinnacle of the use of analytics is optimisation: deploying resources appropriately to achieve the greatest collection of debt within business constraints. By way of example, suppose we wanted to put all impaired customers through at least one collection strategy, pull the over-90-days debt down by 30%, tackle a problem with the silver card customers, make only one call to each customer a week, and have enough call centre staff to avoid any caller waiting longer than 5 seconds for an answer. What would be the right level of outbound mailing to generate the optimum level of collections whilst giving all objectives an appropriate level of attention?
    I will explore this in more detail later.

    That’s the power of SAS Analytics.

    According to Gartner (in a report issued February 2008): “SAS dominates in advanced analytic solutions. No other vendor in the Magic Quadrant has its range of capabilities or can point to the same number of advanced analytic deployments.”

    Forrester Research (in a report issued July 2008) says that “SAS remains the best game in town for fully integrated high-end analytics from a single vendor.”
  • Why Hadoop is being considered (or has been implemented), and HOW it will actually be exploited to derive value, are sometimes two very different things… Depending on the point of view.

    These are the key value drivers regarding how SAS affects Hadoop.

    Analysts, statisticians, data scientists, etc will be very interested in increasing the ACCURACY of their analysis, mainly because they can now:

    Run their analysis on more data (sometimes even all the data); and
    Run more complex algorithms because of the massively parallel processing

    SCALABILITY will generally be a concern of IT as well as of the business side, but maybe not from the same angle. IT wants to make sure it does not paint itself into a corner, and that whatever architecture it deploys will meet the needs of the business down the road, while the business just wants to be able to embrace all of the Big Data coming its way.

    IT folks will likely be very focused on the GOVERNANCE of data: making sure it is properly secured, it is comprehensive and timely, etc.


    Finally, the VALUE (ECONOMICS) of the project needs to be embraced and recognized by all. Economics can be derived by:
    Increased self-service Hadoop data acquisition by SAS analysts, which increases the ability to generate insight from Hadoop as a new and rich data source
    Better value from Hadoop data is enabled through scale and accuracy of analytics possible through the SAS LASR Server (the ‘With’ approach). Insights are not bound by processing capability
    Better value from Hadoop data used for analytic insight (better quality, data shaping for analytics and ease of score code deployment in place) and ability to deploy In-Memory capabilities in the Hadoop cluster


  • SAS Forum Switzerland 2015: Big Data - Guido Oswald

    1. SAS & Hadoop: a good fit! Guido Oswald (@guidooswald), www.sasforum.com/ch
    6. WHERE DOES BIG DATA BEGIN?! When Excel explodes? When I leave my “comfort zone”? As soon as I have unstructured data? Anything over 1 TB? The three Vs?
    7. BIG DATA IS LIKE TEENAGE LOVE? Everyone talks about it, nobody knows how to do it, but everyone thinks everyone else is doing it, so everyone claims to be doing it too
    10. HADOOP THE CUTE ELEPHANT
    11. WHY IS HADOOP INTERESTING? SCALABILITY; HIGH PERFORMANCE; LOW COST (open source); DISTRIBUTED PROCESSING; DATA REDUNDANCY; COMMODITY SERVERS
    12. WHY IS HADOOP INTERESTING? Hadoop will very soon be a complement (not a replacement) to: • Business Intelligence • Data Warehousing • Data Integration • Analytics. SOURCE: 10 Myths About Hadoop - TDWI Best Practices Report. HADOOP IN PRODUCTION: • Reason #1 for deploying Hadoop: analytics (71%) • Challenges when deploying Hadoop: • Hadoop has no built-in analytical functions. • Cost: expensive because of extensive home-grown solutions. Chart categories: TODAY; < 12 MONTHS; < 24 MONTHS; < 36 MONTHS; 3+ YEARS; NEVER. Value shown: 10%
    13. WHY SAS? IN-MEMORY; HIGH-PERFORMANCE; ANALYTICS; BUSINESS INTELLIGENCE; VISUALIZATION; DATA MANAGEMENT
    14. SAS & HADOOP: REASONS FOR COMBINING BOTH WORLDS • High-performance advanced analytics • Business intelligence and data visualization • Massively scalable on distributed commodity hardware
    16. ERA OF ABUNDANCE: “BIG DATA”, AN ABUNDANCE OF DATA
    17. ERA OF ABUNDANCE: “HADOOP”
    18. ERA OF ABUNDANCE: “ANALYTICS”
    19. ERA OF ABUNDANCE: “ANALYTICS”. Abundance of: data; processing power; intelligence
    21. BIG DATA ANALYTICS: BUILDING BLOCKS OF USE CASES. Known data (DWH): customers, households, accounts, balances, products, history, … New, unknown and unused data: ATMs and self-service terminals, online banking, mobile apps, cooperation partners, complaints, web & social, press, balance sheets / XBRL, … Analytics: pattern recognition, correlations, forecasts, text analytics, … Technological enablers: in-memory, Hadoop, SAP HANA, …
    22. DETOUR…
    23. Company Confidential - For Internal Use Only. BIG DATA LAB: FIND YOUR BIG DATA STRATEGY WITH SAS
    24. BIG DATA APPROACH: TRADITIONAL PROJECT APPROACH. Business case; management decision; budget approval; set up team; tool selection; build infrastructure; acquire data; build models; prepare production; test; go live; idea; result; requirements
    25. BIG DATA APPROACH: INNOVATION LAB (AGILE, LOW-RISK, SCALABLE). Business case; management decision; budget approval; set up team; tool selection; build infrastructure; acquire data; build models; prepare production; test; go live; idea; result; Big Data Lab; refine models; update data
    26. SAS OFFERING: BIG DATA LAB. TECHNOLOGY; SERVICE. Sizing: S, M, L. Deployment: on-premise, cloud. Data management ► Data Loader for Hadoop ► Access to Hadoop ► Metadata management. Analytics ► Visual Analytics ► Visual Statistics ► In-Memory Statistics. Software solutions ► Installation ► Configuration ► Training ► Implementation of a sample use case. Additional bookable services: ► Coaching and provision of experts (data scientist, data management expert) ► Consulting. Ready-to-use complete package for independently developing Big Data use cases at a fixed price
    27. BACK TO THE TOPIC..
    28. SAS & HADOOP: SAS® AND THE HADOOP ECOSYSTEM. Diagram labels: Next-Gen SAS® User; SAS® User; User Interface; Metadata; Data Access; Data Processing; File System; SAS Metadata; In-Memory Data Access; Hive; Pig; MapReduce; HDFS; Base SAS & SAS/ACCESS® to Hadoop™; In-Memory Data Access; Hive; Pig; SAS® Data Management; SAS® Visual Analytics; SAS® Visual Statistics; SAS® Enterprise Miner™; SAS® Studio; SAS® LASR™ Analytic Server; SAS Embedded Process; SAS® In-Memory Statistics for Hadoop
    29. MAPREDUCE: A (SIMPLE) WORD COUNT…
    30. Hadoop can get very complex very quickly!
    31. HADOOP ECOSYSTEM: REDUCING COMPLEXITY. Pig (scripting language); Hive (SQL); Cloudera Impala; PROC HADOOP (Base SAS); SAS/ACCESS to Hadoop; SAS/ACCESS to Impala
    32. SAS DATA LOADER FOR HADOOP: Self-service big data preparation for business users. Certified by Hortonworks and Cloudera
    33. SAS & HADOOP: HOW? SAS and Hadoop connect in several ways: • SAS can treat Hadoop like any other data source and read data FROM Hadoop when that is the appropriate path. • SAS can work WITH Hadoop and lift data into a specialized in-memory environment for advanced analytics. • SAS can work directly IN Hadoop and use Hadoop's distributed processing capabilities.
    34. SAS & HADOOP: SAS FROM HADOOP. SAS accesses data in Hadoop and sends it to a SAS server for processing. Results are written back. • A bridge is built from Hadoop to existing SAS environments. • Hadoop is used as one more data source. • Performance is limited to the bandwidth of a single pipe. • Ideal when not all of the data to be analyzed resides in Hadoop, or when an established process cannot run in Hadoop. Diagram label: DATA MOVEMENT
    35. SAS & HADOOP: SAS WITH HADOOP. SAS accesses data in Hadoop and processes it on a SAS server, while both the data and the computations are massively parallelized. • Provides capabilities that Hadoop cannot handle well on its own. • Supports advanced analytics through distributed processing. • Allows data storage and analytical processing to be scaled independently of each other. • Ideal when analytical accuracy, algorithmic sophistication and governance are required. Diagram label: DATA LIFT INTO MEMORY
    36. SAS & HADOOP: SAS IN HADOOP. SAS processes data directly in the Hadoop cluster. • The SAS Embedded Process enables scalable compute power in Hadoop. • SAS computes in Hadoop, fine-tuned through Hadoop technology. • Support for data transformation, data quality and scoring in Hadoop. • Ideal when all data is held in Hadoop and Hadoop is the right place for the processing. Diagram label: SAS LOGIC
    37. SAS & HADOOP: SAS IN HADOOP. SAS processes data directly in the Hadoop cluster. • The SAS Embedded Process enables scalable compute power in Hadoop. • SAS computes in Hadoop, fine-tuned through Hadoop technology. • Support for data transformation, data quality and scoring in Hadoop. • Ideal when all data is held in Hadoop and Hadoop is the right place for the processing. • SAS in-memory solutions can also be installed directly in the Hadoop cluster on shared infrastructure.
    38. SAS & HADOOP: THE PRAGMATIC APPROACH • Prepare data IN Hadoop for analytics • Move data FROM Hadoop into a SAS environment • Deploy and manage model score code IN Hadoop • Lift data IN to memory for analytics at scale • Model data at scale in-memory WITH advanced modeling tools • Explore data at scale, in-memory WITH data visualization. Use the right approach for what needs to be done!
    39. ROGERS MEDIA • Data visualization & high performance analytics • Processing data on 12 million customers • 40 million records per month in Hortonworks • More than 600 relevant web characteristics. “Several of us from Rogers in the room looked at each other, and said ‘That is really wicked; that’s cool.’” Chris Dingle, Senior Director of Audience Solutions, Rogers Communications
    40. MACY’S • 20% reduction in churn • $500,000 annual savings • Customer lifetime value analysis • More accurate response prediction • Optimized promotions. “... they can look at data and spend more time analyzing it and become internal consultants who provide more of the insight behind the data.” Kerem Tomak, Vice President of Analytics
    41. www.sasforum.com/ch Guido Oswald (@guidooswald), Guido.Oswald@sas.com
