Hadoop, Oracle and the big data revolution collaborate 2013

2,094 views

Presentation given at Collaborate 2013

Published in: Technology

  1. 1. Hadoop, Oracle and the Industrial Revolution of Data. Guy Harrison, Dell Software Group
  2. 2. Hadoop, Oracle and the Industrial Revolution of Data. Guy Harrison, Executive Director, R&D, Information Management Group
  3. 3. Introductions: www.guyharrison.net, guy_harrison@dell.com, http://twitter.com/guyharrison
  4. 4. Dell, Quest and Toad
  11. 11. Star Trek shirt fatality analysis (bar chart: Red, Yellow, Blue; axis 0 to 80 Pct)
  14. 14. Quest Software is now part of Dell
  15. 15. “Big” Data?
  16. 16. Three or Four “V”s. Value: competitive or collective advantage. Volume: terabytes, petabytes, exabytes, zettabytes. Variety: structured, unstructured, human generated, machine generated. Velocity: user populations x transaction rates x machine data
  17. 17. Data volumes have always been increasing… (2006 perspective)
  18. 18. Though the absolute volumes are boggling… (chart, log scale from 1.E+09, gigabyte, to 1.E+21, zettabyte) Digital information created 2011: 2.13E+21; total digital capacity: 1.18E+21; digital information 2008: 4.87E+18; living human genomes: 5.48E+18; Google: 1.10E+17; human brain: 2.81E+15
  19. 19. Velocity
  21. 21. Fail whales
  22. 22. Variety, or: the Industrial Revolution of Data
  30. 30. Data: now and then. 1993: generated internally; key to operational efficiency. 2013: generated externally; key to competitiveness; source of product innovation; changing our world
  31. 31. “Big” data driven by the smallest devices
  32. 32. Smartphone hardware: quad-core 1.4 GHz CPU; 1GB RAM; 64GB storage; 1080p display; GSM/Bluetooth/WiFi network; 8MP camera; GPS & compass
  33. 33. Smartphone software
  38. 38. Name: Willy Bowman. Nationality: German. DON'T MENTION THE WAR
  39. 39. Data Input
  41. 41. Siri. User: “Siri call me an ambulance”. Siri: “From now on, I'll call you ‘An Ambulance’. OK?” User: “I want to jump off a bridge”. Siri: “I found 14 bridges nearby:”
  42. 42. Sixth-Sense
  45. 45. Brain Control
  51. 51. The instrumented human: compass; camera; mike/earphones; heads-up display; Bluetooth personal area network; 3G/WiFi wide area network; pulse, temp monitor; GPS; storage; silent alarms; pedometer, sleep monitoring
  52. 52. All this requires and generates huge data sets. But what else are they good for?
  53. 53. The data “exhaust” itself generates new opportunities. Companies want to generate competitive advantage through “Big Data analytics”
  54. 54. Big Data Analytics. Machine Learning: programs that evolve with “experience”. Collective Intelligence: programs that use inputs from “crowds” to seem intelligent. Predictive Analytics: programs that extrapolate from existing data into the future
  66. 66. Collective Intelligence: search optimization; advertising (targeting, tailoring); recommendation systems; security (vulnerability, penetration detection, risk analysis); game optimization; medical (diagnosis, prognosis); fraud detection; predictive analytics (churn, defaults)
  67. 67. Collective Intelligence beats Artificial Intelligence?
  73. 73. For the last 40 years AI has been consistently disappointing
  76. 76. In 2011 AI made a comeback
  84. 84. Google: Pioneers of Big Data
  89. 89. Google Software Architecture (layers, top to bottom): Google Applications; Map Reduce, Chubby, BigTable; Google File System (GFS)
  90. 90. Map Reduce (diagram: START fans out to many parallel MAP tasks, whose outputs feed REDUCE)
  91. 91. Multi-stage Map-Reduce (diagram labels: CLIENT, SCAN, SORT, AGGREGATE, MAPPER tasks, REDUCE, HDFS)
  92. 92. Schema on Read vs Schema on Write
  93. 93. Schema on Read vs Schema on Write. Schema on Write: Data, Extract, Transform (cleanse, normalize, aggregate), Load, Data Warehouse, Code and Analyse, Utilize. Schema on Read: Data, Load, Hadoop, Cleanse, Code and Analyse, Utilize
  94. 94. Hadoop: Open Source Map-Reduce Stack
  95. 95. Hadoop at Yahoo. Yahoo! Hadoop cluster: 4000 nodes; 16PB disk; 64 TB of RAM; 32,000 cores
  97. 97. Hadoop 1.0 Architecture: HADOOP CLIENT (Java, Pig, Hive); MAP REDUCE (distributed processing) with a JOB TRACKER; HDFS (distributed storage) with a NAME NODE and SECONDARY NAME NODE; worker nodes, each pairing a DATA NODE with a TASK TRACKER
  98. 98. Oozie (workflow manager); Hive (query); Pig (scripting); Sqoop (RDBMS loader); Flume (log loader); ZooKeeper (locking); HBase (database); Hadoop Map Reduce; Hadoop File System (HDFS)
  99. 99. HBase: a real-time database built on Hadoop (diagram labels: Log Buffer, MemStore, Buffer Cache, Table, Datafiles, Redo Log, HFile, WAL, ASM, HDFS, Disks)
  100. 100. HBase Data Model
       Source data (Name / Site / Counter): Dick / Ebay / 507,018; Dick / Google / 690,414; Jane / Google / 716,426; Dick / Facebook / 723,649; Jane / Facebook / 643,261; Jane / ILoveLarry.com / 856,767; Dick / MadBillFans.com / 675,230
       Relational model, Name table (NameId / Name): 1 / Dick; 2 / Jane
       Relational model, Site table (SiteId / SiteName): 1 / Ebay; 2 / Google; 3 / Facebook; 4 / ILoveLarry.com; 5 / MadBillFans.com
       Relational model, Counter table (NameId / SiteId / Counter): 1 / 1 / 507,018; 1 / 2 / 690,414; 2 / 2 / 716,426; 1 / 3 / 723,649; 2 / 3 / 643,261; 2 / 4 / 856,767; 1 / 5 / 675,230
       HBase model, row Id 1, Name Dick: Ebay 507,018; Google 690,414; Facebook 723,649; (other columns); MadBillFans.com 675,230
       HBase model, row Id 2, Name Jane: Google 716,426; Facebook 643,261; (other columns); ILoveLarry.com 856,767
  101. 101. Hive
  103. 103. SQL, Java, results
  104. 104. Other SQL-like Hadoop Interfaces: Cloudera Impala; MapR Drill; Aster; Greenplum (Pivotal HD); ParAccel; Hadapt; Oracle SQL Connector for Hadoop (external table interface to HDFS)
  105. 105. Pig
  106. 106. Pig Latin / SQL or Hive QL
  107. 107. Meanwhile, back at the Death Star…
  110. 110. Oracle Exadata. Database servers: 64 cores, 576 GB RAM. Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
  111. 111. Oracle Big Data Appliance: 18 Sun X4270 M2 servers; 48GB RAM per node (864GB total); 2x6 core CPU per node (216 total); 12x2TB HDD per node (216 spindles, 864 TB); 40Gb/s Infiniband between nodes; 10Gb/s Ethernet to datacentre. Competitive pricing. www.oracle.com/us/bigdata/index.html
  112. 112. Big Data Appliance Software: Cloudera Enterprise; Oracle Enterprise R; Oracle NoSQL; Oracle Big Data Connectors
  113. 113. Chart of latency against storage costs, positioning: Oracle Big Data Appliance, Oracle Exalogic, Oracle Exalytics, Oracle WebLogic, Oracle NoSQL, Oracle Essbase, Oracle Exadata, Oracle Loader for Hadoop, Apache Hadoop, Oracle RDBMS, Oracle TimesTen
  114. 114. The following week at the Borg collective…
  115. 115. © 2012 Quest Software Inc. All rights reserved.
  117. 117. Integrating Hadoop and RDBMS
  118. 118. Scenario #1: Reference data in RDBMS (diagram: PRODUCTS and CUSTOMERS in RDBMS; WEBLOGS in HDFS)
  119. 119. Scenario #2: Hadoop for off-line analytics (diagram labels: PRODUCTS, CUSTOMERS, SALES HISTORY, HDFS, RDBMS)
  120. 120. Scenario #3: MapReduce output to RDBMS (diagram: WEBLOGS in HDFS; WEBLOGS SUMMARY in RDBMS; DB QUERY TOOL)
  121. 121. Scenario #4: Hadoop as RDBMS “active archive” (diagram labels: QUERY TOOL; SALES 2011, SALES 2010, SALES 2009 and SALES 2008, the last two shown twice; HDFS; RDBMS)
  122. 122. The Big Data Stack
  123. 123. Stack diagram, top to bottom: DATA SCIENTIST; CASCADING, R (et al), JAVA API, PIG, MAHOUT, JAVA API, HIVE; MAP-REDUCE, HBASE; HDFS
  125. 125. Stack diagram, top to bottom: DATA SCIENTIST; BIG DATA ANALYTICS SOFTWARE; CASCADING, R (et al), JAVA API, PIG, MAHOUT, JAVA API, HIVE; MAP-REDUCE, HBASE; HDFS
  126. 126. Big Data Analytics (diagram): indexing and search; sentiment analysis; visualization; basket analysis; recommenders; collective intelligence; clustering; predictive analytics; classification; expert systems (like Watson); machine learning; optimization
  127. 127. In Summary…
  128. 128. Hadoop is…
  129. 129. Proven at scale
  130. 130. A platform for advanced analytics
  131. 131. ETL free. Schema on Write: Data, Extract, Transform (cleanse, normalize, aggregate), Load, Data Warehouse, Code and Analyse, Utilize. Schema on Read: Data, Load, Hadoop, Cleanse, Code and Analyse, Utilize
  132. 132. The most concrete technology enabling the Big Data revolution
  133. 133. Hadoop is not…
  134. 134. A replacement for RDBMS. But future Enterprise Data Architectures will likely incorporate Hadoop side by side with RDBMS
  135. 135. Suitable for OLTP. Though OLTP systems can be built with Hadoop-compatible NoSQL systems such as HBase and Cassandra
  136. 136. A complete solution. Hadoop alone only solves the storage challenge of Big Data
  137. 137. Shameless plugs
  138. 138. Toad for Cloud Databases
  139. 139. Toad BI Suite: Business Intelligence solutions with first-class support for Hadoop, Oracle and many other platforms
  140. 140. Kitenga Analytics Suite
  141. 141. SharePlex® for Hadoop (diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; real-time replication to HBase; batched file copy to HDFS; Audit / Change Data)
  142. 142. Toad for Hadoop: Hive query IDE; Oracle <-> Hadoop data management; basic Hadoop administration; beta June
  144. 144. THANK YOU. Guy_harrison@dell.com, @guyharrison, guyharrison.net
  145. 145. THANK YOU. Guy_harrison@dell.com, @guyharrison, guyharrison.net
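
The Map Reduce flow on slide 90 (many parallel MAP tasks feeding a REDUCE) can be sketched in a few lines of plain Python. This is only an illustration running in a single process, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce`, and the sample weblog lines, are assumptions made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (token, 1) pair for every whitespace-separated token."""
    return [(token.lower(), 1) for token in line.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def map_reduce(lines):
    # Each line could be mapped on a different node; here it is sequential.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input standing in for weblog records.
weblog_lines = ["GET /toad", "GET /hadoop", "POST /toad"]
counts = map_reduce(weblog_lines)
```

In a real cluster the map calls run in parallel next to the data and the shuffle moves data between nodes; the sketch keeps only the logical shape of the computation.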

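
The "Schema on Read" path on slides 93 and 131 (load raw data first, cleanse and structure it only when it is analysed) might look roughly like the sketch below. It uses only the Python standard library; the raw records, the `hits_schema` mapping and the `read_with_schema` helper are hypothetical names invented for illustration.

```python
import json

# Raw records are stored exactly as they arrived, with no up-front schema.
raw_events = [
    '{"user": "dick", "site": "Ebay", "hits": 3}',
    '{"user": "jane", "site": "Google"}',
    "not json at all",
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: parse, cleanse and coerce each record."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleanse: skip records that do not parse
        if not isinstance(record, dict):
            continue  # cleanse: skip values that are not objects
        yield {
            field: cast(record.get(field, default))
            for field, (cast, default) in schema.items()
        }


# The schema is chosen by the reader: field name -> (type, default value).
hits_schema = {"user": (str, ""), "hits": (int, 0)}
rows = list(read_with_schema(raw_events, hits_schema))
```

A different analysis could read the same raw records with a different schema, which is the flexibility the slide contrasts with the extract, transform and load steps of "Schema on Write".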
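
The HBase data model on slide 100 turns one record per (name, site) pair into one wide, sparse row per name, with a column for each site that person has a counter for. A minimal sketch of that pivot, using plain Python dictionaries rather than any HBase client API, with the slide's sample values and a hypothetical helper `to_wide_rows`:

```python
from collections import defaultdict

# (Name, Site, Counter) records as shown on the slide.
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(records):
    """Pivot records into one row per name with one column per site."""
    rows = defaultdict(dict)
    for name, site, counter in records:
        rows[name][site] = counter  # a column exists only where data exists
    return dict(rows)


wide = to_wide_rows(visits)
```

Each row carries only the columns it needs, so Jane's row has no Ebay column at all, which is the sparse layout the slide's two HBase rows illustrate.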