Big data webex sascha oehl
1. CONFIDENTIAL – For use by Hitachi Data Systems employees and other audiences under NDA only.
Big Data Webex
Sascha Oehl
2.
Is it real?
3.
WHERE FROM
Future
If only we knew the future, we could make the right decisions in the present.
4.
Life Sciences Research
Location-Based Advertising
One-to-One Marketing
On-Demand Maintenance
Satellite Images
Fraud Detection
Churn Analysis
Risk Analysis
Sentiment Analysis
Geomation Farming
Oil Exploration
Network Monitoring
Asset Tracking
Traffic Flow Optimization
Seismic Monitoring
5.
6.
Car insurance
Premium calculated at first only by my driving ability
Then by my car's horsepower
Then also by car type
Then also by region
Then also by marital status, children, occupation, age, parking space, living situation, …
7.
8.
Supermarket
Layout at first by functionality
Then by the store manager's intuition
Then by customer surveys
Then by analysis of the individual purchase
Then by analysis of my shopping behavior
Then by analysis of my shopping behavior across stores
9.
10.
Automotive supplier
Just-in-time delivery
Need for a forecast of what will be required
SAP APO (Advanced Planning and Optimization)
Forecasting the future (what will be needed) based on the past and the present
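Forecasting demand from the past and the present can be illustrated with simple exponential smoothing. This is only a minimal sketch and not a description of how SAP APO works internally; the function name, the smoothing factor `alpha`, and the demand figures are assumptions made up for illustration.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """Return a one-step-ahead forecast from a list of past demand values.

    alpha weights the most recent observation (the "present");
    1 - alpha weights the running forecast (the "past").
    """
    if not history:
        raise ValueError("history must contain at least one value")
    forecast = history[0]
    for demand in history[1:]:
        forecast = alpha * demand + (1 - alpha) * forecast
    return forecast


# Hypothetical weekly demand for one part (illustrative numbers only).
weekly_demand = [100, 120, 110, 130]
next_week = exponential_smoothing_forecast(weekly_demand, alpha=0.5)
print(next_week)
```

A higher `alpha` makes the forecast react faster to the latest delivery call-offs; a lower one smooths out short-term noise.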
11.
Development
12.
13.
Classic decision-making
HiPPO
Highest paid person's opinion
The "gut feeling" / the experience of the decision-maker
14.
15.
A shift in decision-making
Analysis of data
Fast reactions
16.
Big Data Transforms How We Capture and Capitalize on Data
It is one of the biggest drivers of IT spend today!
Business Process
‒ Database Data: OLTP
Machine
‒ Sensor Data, Complex Data: M2M, Log Files, Satellite Imaging, Bio-Informatics, Sensors, Recording, Video
Human
‒ Enterprise Content, External Sources: Email, Documents, Web Logs, Social
Scale: 1x, 10x, 100x
17.
Store, Understand, Control
Lots of data in
‒ More data sources
‒ More data volume
‒ Faster storage and analysis
‒ Fewer algorithms
‒ Lower data quality
Insight out!
‒ Faster decisions
‒ More precise control
‒ Less effort
18.
Consequence
19.
Databases
Classic
‒ You know the reason
‒ You create indexes
‒ The application is clear
‒ You store what you need
Big Data
‒ You simply store
‒ No clarity about what you will search for
‒ You do not index
20.
Infrastructure
Classic
‒ Planned and individual
‒ Redundancy and security
Big Data
‒ Fast and cheap
‒ Scalable
‒ Simple
21.
Analysis
Classic
‒ Complicated algorithms
‒ Predicting the future from the past
Big Data
‒ Interpreting the present in connection with the past (what happened the last time in such a constellation?)
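The question "what happened the last time in such a constellation?" can be sketched as a nearest-neighbour lookup over stored observations. This is a minimal illustration, not a method named in the presentation; the function name, the features, and all numbers are hypothetical.

```python
import math


def most_similar_past_case(history, current):
    """Return the past (situation, outcome) pair closest to `current`.

    history: list of (feature_tuple, outcome) pairs
    current: feature tuple describing the present situation
    """
    if not history:
        raise ValueError("history must not be empty")
    # Euclidean distance between the stored situation and the current one.
    return min(history, key=lambda case: math.dist(case[0], current))


# Hypothetical past observations:
# (temperature in degrees Celsius, day of week 0-6) -> units sold
past = [
    ((30.0, 5), 420),
    ((12.0, 2), 150),
    ((28.0, 6), 390),
]
situation, outcome = most_similar_past_case(past, (29.0, 5))
print(situation, outcome)
```

No model is trained here: the present is simply matched against everything that was stored, which is why keeping large volumes of raw data matters for this style of analysis.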
22.
Decision
Classic
‒ Monthly / weekly decisions
Big Data
‒ Daily / hourly decisions
23.
Implementation
24.
HDS infrastructure for Big Data
HDS Solution – OPAD: One Platform for All Data
Big Data, Dark Data
Multi-protocol / multi-data type
Virtualized
Data Mobility
Universal Management
Infrastructure On Demand
Resilience and Protection
HDI/HCP, UCP
SAP, Oracle, Microsoft®, HNAS, Hadoop
25.
Hadoop
Software platform for processing large volumes of unstructured and semi-structured data.
Distributes the work across many servers.
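How work gets split up and recombined can be illustrated with the map, shuffle, and reduce steps of a word count. The sketch below runs in a single Python process and is not Hadoop's Java API; the function names and the sample text are assumptions for illustration. In a real cluster each chunk would be processed on a different server.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Shuffle: group all emitted values by key across chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Each chunk stands in for a block of data held by a different server.
chunks = ["big data big", "data is big"]
mapped = [map_phase(chunk) for chunk in chunks]  # independent per chunk
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Because each call to `map_phase` depends only on its own chunk, the map step can be spread over as many servers as there are chunks.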
26.
Cloudera
A commercial implementation of Apache Hadoop
27.
Hadoop Ecosystem
Java Virtual Machine
Operating system – Linux (Ubuntu, Red Hat…) / Windows
Hardware
28.
Hadoop Ecosystem
Data storage: HDFS, HBase
Coordination: ZooKeeper
Data processing: MapReduce, Task Tracker
Java Virtual Machine
Operating system – Linux (Ubuntu, Red Hat…) / Windows
Hardware
29.
Hadoop Ecosystem
Network
Data storage: HDFS, HBase
Coordination: ZooKeeper
Data processing: MapReduce, Task Tracker
Java Virtual Machine
Operating system – Linux (Ubuntu, Red Hat…) / Windows
Hardware
30.
Hadoop Ecosystem
Network
Client access: Hue, Hive, Pig
Data access: Flume, Sqoop
Data mining: Mahout
Orchestration: Oozie
Data storage: HDFS, HBase
Coordination: ZooKeeper
Data processing: MapReduce, Task Tracker
Java Virtual Machine
Operating system – Linux (Ubuntu, Red Hat…) / Windows
Hardware
31.
Hadoop node types
Network
Data Node (3-…)
‒ Data storage: HDFS, HBase
‒ Data processing: MapReduce, Task Tracker
Java Virtual Machine
Operating system – Linux (Ubuntu, Red Hat…) / Windows
Hardware
32.
Hadoop node types
Network
Data Node (3-…)
‒ Data storage: HDFS, HBase
‒ Data processing: MapReduce, Task Tracker
Cluster Name Node
‒ Management of the distributed file system
‒ Job Tracker
Java Virtual Machine
Operating system – Linux (Ubuntu, Red Hat…) / Windows
Hardware
33.
Hadoop node types
Network
Data Node (3-…)
‒ Data storage: HDFS, HBase
‒ Data processing: MapReduce, Task Tracker
Edge Node
‒ Accepts requests
‒ Client access: Hue, Hive, Pig
‒ Orchestration: Oozie
‒ Coordination: ZooKeeper
Cluster Name Node
‒ Management of the distributed file system
‒ Job Tracker
Java Virtual Machine
Operating system – Linux (Ubuntu, Red Hat…) / Windows
Hardware
34.
Hadoop: second copy for the Name Nodes
Network
Shared FS
Active Cluster Name Node
‒ Management of the distributed file system
‒ Job Tracker
‒ Java Virtual Machine
‒ Operating system – Linux (Ubuntu, Red Hat…) / Windows
‒ Hardware
Standby Cluster Name Node
‒ Housekeeping
‒ Backup copy
‒ Management of the distributed file system after takeover
‒ Job Tracker after takeover
‒ Java Virtual Machine
‒ Operating system – Linux (Ubuntu, Red Hat…) / Windows
‒ Hardware
35.
36.
Data Warehouse
Application server
Database server
Storage
37.
Data Warehouse
Application server
Database server
Storage
Big Data
Edge
Cluster Name Node
Data Nodes
38.
Hadoop Reference Architecture
Pre-tested, pre-integrated hardware and software
[Diagram: Management node, Name Node, Secondary Name Node, and Data Nodes (HDFS, Task Tracker) connected via LAN]
39.
Hadoop Reference Architecture
Can be customized to fit any application
‒ Customer purchases Cloudera and other applications from vendors
Comprehensive Big Data services
‒ Red Hat Linux, Cloudera
[Diagram: Management node, Name Node, Secondary Name Node, and Data Nodes (HDFS, Task Tracker) connected via LAN]
40.
Hadoop Reference Architecture
Management node
‒ 1 x Compute Rack 210H server
‒ 2 x 6-core E2620 @ 2 GHz processors
‒ 64 GB RAM
‒ 2 x GigE (onboard)
‒ 2 x 300 GB SAS 10K RPM
Networking
‒ 2 x Cisco Nexus 3348
[Diagram: Management node, Name Node, Secondary Name Node, and Data Nodes (HDFS, Task Tracker) connected via LAN]
41.
Hadoop Reference Architecture
Hadoop Cluster Name Nodes (primary + secondary)
‒ 2 x Hitachi Compute Rack 220S server
‒ 2 x 8-core E2470 @ 2.3 GHz
‒ 64 GB RAM
‒ 2 x GigE (onboard)
‒ 12 x 3.5-inch 3 TB NL-SAS 7200 RPM drives
Data redundancy through RAID 5 (11+1)
[Diagram: Management node, Name Node, Secondary Name Node, and Data Nodes (HDFS, Task Tracker) connected via LAN]
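For the Name Node configuration, the effect of the RAID 5 (11+1) layout on capacity can be worked out in a few lines. This is a rough sketch in decimal terabytes that ignores file system and formatting overhead; the function name is hypothetical.

```python
def raid5_usable_tb(data_drives, parity_drives, drive_tb):
    """Usable capacity of a RAID 5 group: one drive's worth goes to parity."""
    if parity_drives != 1:
        raise ValueError("RAID 5 uses exactly one drive's worth of parity")
    return data_drives * drive_tb


raw_tb = 12 * 3                         # 12 drives of 3 TB each, from the slide
usable_tb = raid5_usable_tb(11, 1, 3)   # 11+1 layout, from the slide
print(raw_tb, usable_tb)
```

So each Name Node offers about 33 TB usable out of 36 TB raw and tolerates the loss of a single drive.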
42.
Hadoop Reference Architecture
Hadoop Data Nodes
‒ 3… x Hitachi Compute Rack 220S server
‒ 2 x 8-core E2470 @ 2.3 GHz
‒ 64 GB RAM
‒ 2 x GigE (onboard)
‒ 12 x 3.5-inch 3 TB NL-SAS 7200 RPM drives
Data redundancy through copies of the data on multiple nodes
Performance increases by adding more nodes
Basis for sizing the solution
‒ TeraSort – 120 MB/s per node
‒ TestDFSIO – 75 MB/s write per node
[Diagram: Management node, Name Node, Secondary Name Node, and Data Nodes (HDFS, Task Tracker) connected via LAN]
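The per-node figures above can be turned into a rough node-count estimate. The sketch below assumes that throughput scales linearly with the number of nodes and that HDFS keeps three copies of each block (the HDFS default; the slide only says data is copied to multiple nodes). The function name and the example targets are hypothetical, and space for temporary data is ignored.

```python
import math

DFSIO_WRITE_MB_S_PER_NODE = 75   # TestDFSIO write rate per node, from the slide
RAW_TB_PER_NODE = 12 * 3         # 12 x 3 TB drives per data node, from the slide


def data_nodes_needed(required_write_mb_s, required_capacity_tb,
                      replication_factor=3, minimum_nodes=3):
    """Estimate the data node count from write throughput and capacity targets.

    replication_factor=3 is an assumption (the HDFS default).
    minimum_nodes=3 reflects the smallest configuration on the slide.
    """
    by_throughput = math.ceil(required_write_mb_s / DFSIO_WRITE_MB_S_PER_NODE)
    usable_tb_per_node = RAW_TB_PER_NODE / replication_factor
    by_capacity = math.ceil(required_capacity_tb / usable_tb_per_node)
    return max(by_throughput, by_capacity, minimum_nodes)


# Hypothetical target: 600 MB/s sustained writes and 100 TB of data.
print(data_nodes_needed(600, 100))
```

In this example capacity, not throughput, drives the result: 600 MB/s needs 8 nodes, but 100 TB at 12 TB usable per node needs 9.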
43.
Disaster recovery protection
‒ Use of DistCp
"Mirroring"
[Diagram: two clusters, each with a Management node, Name Node, Secondary Name Node, and Data Nodes (HDFS, Task Tracker) on a LAN]
Primary site: Cluster A
Secondary site: Cluster B
Parallel copy over IP
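A mirroring run from Cluster A to Cluster B with DistCp might be driven from a small script like the following. The host names, port, and path are hypothetical placeholders, and the helper only builds the command line; it does not execute anything.

```python
import shlex


def build_distcp_command(source_uri, target_uri, update=True):
    """Build the argument list for a `hadoop distcp` run (does not execute it)."""
    command = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ on the target.
        command.append("-update")
    command += [source_uri, target_uri]
    return command


# Hypothetical Name Node addresses and paths for Cluster A and Cluster B.
cmd = build_distcp_command("hdfs://cluster-a-nn:8020/data",
                           "hdfs://cluster-b-nn:8020/data")
print(shlex.join(cmd))

# To actually run it on a machine with a Hadoop client installed:
# import subprocess; subprocess.run(cmd, check=True)
```

DistCp itself runs as a MapReduce job, so the copy is spread across the cluster's nodes rather than funnelled through a single machine.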
44.
Backup
Data backup
[Diagram: Management node, Name Node, Secondary Name Node, and Data Nodes (HDFS, Task Tracker) connected via LAN]
Primary site: Cluster A
Incremental backup of the HDFS files
Backup of the Name Nodes
45.
46.
Hitachi Hadoop Appliance
Scalable from 3 to many Data Nodes
Uses market-leading integration
Clear performance expectations
Available
47.
Questions?