Next Generation Data Platforms - Deon Thomas

A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data, by enabling high velocity capture, discovery and/or analysis.

  1. NEXT GENERATION DATA PLATFORMS: Beyond the traditional view
  2. What is Big Data? A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery and/or analysis.
  3. Volume: terabytes, records, tables, files. Variety: structured, unstructured, semi-structured, all of the above. Velocity: batch, real time, streams, near real time.
  4. Volume + Velocity + Variety = Value
  5. Beyond the traditional view
  6. Part 1: Variety
  7. Brewer's CAP Theorem (Consistency, Availability, Partition tolerance). CA: RDBMS. CP: MongoDB, HBase, Redis. AP: CouchDB, Cassandra, DynamoDB, Riak.
  8. Database evolution. NoSQL: no ACID transactions; sharded indexes; restricted joins; support for columnar storage. In-memory DB: real-time transactions; not fully geared for enterprise-level data; variety of indexes; complex joins. Also shown: HDFS, GDF, HBase.
  9. Hadoop has become synonymous with Big Data.
  10. MapReduce: Map → Shuffle → Reduce
  11. Part 2: Velocity
  12. Real time vs. batch (image: https://www.flickr.com/photos/lopetz/3912416793/)
  13. Real time vs. batch (MapReduce)
  14. Achieving velocity (parallel computing) with shared memory: Processor 1 locks the shared data, the worker operates on the shared data, then the shared data is unlocked.
  15. Achieving velocity (parallel computing) with message passing (e.g. MapReduce): the shared data is split into Chunk 1, Chunk 2 and Chunk 3, which are processed by Processor 1, Processor 2 and Processor 3.
  16. Part 3: Volume
  17. Part 4: Value
  18. Analytics
  19. Conventional architectures: pull-based batch loads; enterprise data models; complex ETL logic; poorly suited to non-relational data; emergent design is difficult.
  20. OLAP (Online Analytical Processing): SELECT SUM(s.dollar_cost), s.product_key, p.description FROM SALES_FACT s … GROUP BY s.product_key, p.description
  21. Why databases? Transaction processing (ACID properties); SQL, indexes and queries. OLAP: transaction processing is not needed for analytics; data is moved via ETL; with large volumes of data, indexes become largely irrelevant (analytical queries tend to scan most rows rather than look up a few); schema-on-write vs. schema-on-read.
  22. Analytics is not just SQL queries
  23. There is more to analytics than you think!
  24. Do we lose hope? How do we move forward?
  25. Summary: (1) Variety: how do we deal with different kinds of data? (2) Volume: how do we cope with large volumes of data? (3) Velocity: how do we solve real-time problems? (4) Value: what is our value?
  26. Lambda Architecture
  27. Lambda Architecture components: HDFS (Hadoop), MapReduce, HBase (Storm), ElephantDB, Solr (three instances)
  28. Real-time processing + batch since 1983
  29. Should I adopt it or not?
  30. “Change before you have to” (Jack Welch)
  31. Questions? thoughtworks.com, @DuffleDoe, http://deonthomas.blogspot.com. We’re hiring...
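
Slide 7 places databases on Brewer's CAP triangle. The trade-off is easiest to see when a write arrives while two replicas cannot reach each other. The following is a minimal toy sketch, assuming two in-memory replicas with synchronous replication on a healthy network; `TwoReplicaStore` is a hypothetical name and does not model any of the databases named on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class TwoReplicaStore:
    """Hypothetical toy store illustrating CP vs. AP behaviour under a partition.

    mode="CP": refuse writes during a partition (consistent, not available).
    mode="AP": accept writes locally during a partition (available, may diverge).
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica: Replica, key: str, value: str) -> bool:
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: replicate synchronously to both nodes.
            replica.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            # Peer unreachable: reject the write rather than let replicas diverge.
            return False
        # AP: accept the write locally; the peer is now stale.
        replica.data[key] = value
        return True

    def consistent(self) -> bool:
        return self.a.data == self.b.data


cp = TwoReplicaStore("CP")
cp.partitioned = True
print(cp.write(cp.a, "x", "1"), cp.consistent())  # False True

ap = TwoReplicaStore("AP")
ap.partitioned = True
print(ap.write(ap.a, "x", "1"), ap.consistent())  # True False
```

The CP store stays consistent by giving up the write; the AP store stays available and must reconcile the replicas later.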
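
Slide 10 names the three MapReduce phases. A minimal single-process sketch of the classic word-count example makes each phase explicit. It assumes everything runs in one Python process; in Hadoop the framework distributes map and reduce tasks across nodes and performs the shuffle (including sorting by key) itself, and none of the function names below are Hadoop APIs.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework would
    # do between mappers and reducers.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values that share a key.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data", "big value"]))  # {'big': 2, 'data': 1, 'value': 1}
```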
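
Slide 14 shows the shared-memory model: workers lock the shared data, operate on it, then unlock it. A minimal sketch using Python threads and `threading.Lock` follows; `SharedCounter` and `run_workers` are hypothetical names. Note that in CPython the GIL prevents threads from executing Python bytecode in parallel, so this shows the locking pattern, not a speed-up; the lock is still required because `+=` on shared state is not atomic.

```python
import threading


class SharedCounter:
    """Hypothetical shared data protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def add(self, amount: int) -> None:
        # Lock -> update shared data -> unlock; the `with` block releases
        # the lock even if the body raises.
        with self._lock:
            self.value += amount


def run_workers(num_workers: int, increments_per_worker: int) -> int:
    counter = SharedCounter()

    def worker() -> None:
        for _ in range(increments_per_worker):
            counter.add(1)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_workers(4, 10_000))  # 40000
```

The cost of this model is contention: every worker serializes on the same lock, which is what the message-passing model on slide 15 avoids.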
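
Slide 15 contrasts this with message passing: the data is split into chunks, each processor works only on the chunk it receives, and partial results are combined at the end. A minimal sketch follows, assuming threads and `queue.Queue` stand in for separate processors or machines (a real deployment would use separate processes or cluster nodes); the task of summing squares and all function names are hypothetical.

```python
import queue
import threading


def chunked(data: list[int], num_chunks: int) -> list[list[int]]:
    # Split the input into roughly equal chunks (ceiling division).
    size = max(1, -(-len(data) // num_chunks))
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    # A worker only sees the messages it receives and never touches
    # another worker's state, so no locks are needed.
    while True:
        chunk = inbox.get()
        if chunk is None:  # sentinel: no more work
            break
        outbox.put(sum(x * x for x in chunk))


def parallel_sum_of_squares(data: list[int], num_workers: int = 3) -> int:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    workers = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    chunks = chunked(data, num_workers)
    for chunk in chunks:
        inbox.put(chunk)  # send one message per chunk
    for _ in workers:
        inbox.put(None)   # one stop signal per worker, queued after all chunks
    for w in workers:
        w.join()
    # Combine the partial results sent back by the workers.
    return sum(outbox.get() for _ in chunks)


print(parallel_sum_of_squares([1, 2, 3, 4, 5, 6]))  # 91
```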
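
Slide 20 shows a typical OLAP aggregate over a fact table, with part of the query elided. The runnable sketch below uses Python's built-in `sqlite3` module. The `PRODUCT_DIM` dimension table, its join on `product_key`, the `ORDER BY` clause and the sample rows are all assumptions made for illustration; only the `SELECT` list, `SALES_FACT` and the `GROUP BY` come from the slide.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    # Hypothetical star schema: one fact table, one product dimension.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PRODUCT_DIM (product_key INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE SALES_FACT (product_key INTEGER REFERENCES PRODUCT_DIM, dollar_cost REAL);
        INSERT INTO PRODUCT_DIM VALUES (1, 'Laptop'), (2, 'Phone');
        INSERT INTO SALES_FACT VALUES (1, 900.0), (1, 1100.0), (2, 400.0);
    """)
    return conn


def sales_by_product(conn: sqlite3.Connection) -> list[tuple[float, int, str]]:
    return conn.execute("""
        SELECT SUM(s.dollar_cost), s.product_key, p.description
        FROM SALES_FACT s
        JOIN PRODUCT_DIM p ON p.product_key = s.product_key  -- assumed join
        GROUP BY s.product_key, p.description
        ORDER BY s.product_key
    """).fetchall()


print(sales_by_product(build_demo_warehouse()))
# [(2000.0, 1, 'Laptop'), (400.0, 2, 'Phone')]
```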
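
Slide 21 ends with schema-on-write vs. schema-on-read. A minimal sketch of the difference follows, assuming JSON input records and a hypothetical `Sale` record type: with schema-on-write, a record is validated and converted when it is loaded, so a malformed record is rejected at load time; with schema-on-read, raw records are kept as-is and structure is imposed only when a query reads them.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    # Hypothetical target schema used by both approaches.
    product: str
    amount: float


def write_with_schema(store: list[Sale], raw: dict) -> None:
    # Schema-on-write: convert before storing; a missing field raises
    # KeyError here, at load time.
    store.append(Sale(product=str(raw["product"]), amount=float(raw["amount"])))


def read_with_schema(raw_lines: list[str]) -> list[Sale]:
    # Schema-on-read: raw lines were stored untouched; this query decides
    # how to interpret them and skips records it cannot use.
    sales = []
    for line in raw_lines:
        record = json.loads(line)
        if "product" in record and "amount" in record:
            sales.append(Sale(str(record["product"]), float(record["amount"])))
    return sales


raw = ['{"product": "Laptop", "amount": "900"}', '{"product": "Phone"}', '{"sku": 7, "amount": 5}']
print(read_with_schema(raw))  # [Sale(product='Laptop', amount=900.0)]

store: list[Sale] = []
try:
    write_with_schema(store, json.loads(raw[1]))
except KeyError as e:
    print("rejected at write time:", e)  # rejected at write time: 'amount'
```

Schema-on-read keeps every raw record available for future questions, at the cost of pushing interpretation (and error handling) into each query.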
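
Slides 26 to 28 combine real-time and batch processing in the Lambda Architecture. As the architecture is commonly described, a batch layer periodically recomputes views from the full, append-only master dataset, a speed layer updates views incrementally for data the last batch run has not yet covered, and queries merge the two. The toy page-view counter below is a minimal sketch of that idea under simplifying assumptions: a single process, a hypothetical `LambdaCounter` class, and no events arriving while the batch recomputation runs. It does not use the HDFS, MapReduce, Storm, HBase or ElephantDB APIs listed on slide 27.

```python
from collections import Counter


class LambdaCounter:
    """Hypothetical toy page-view counter in the Lambda Architecture style."""

    def __init__(self) -> None:
        self.master_dataset: list[str] = []      # append-only raw events
        self.batch_view: Counter = Counter()     # recomputed by the batch layer
        self.realtime_view: Counter = Counter()  # incremental, recent events only

    def ingest(self, page: str) -> None:
        # New events go to the master dataset and the speed layer.
        self.master_dataset.append(page)
        self.realtime_view[page] += 1

    def run_batch(self) -> None:
        # Batch layer: recompute from scratch over all data, then drop the
        # real-time state the new batch view now covers.
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view = Counter()

    def query(self, page: str) -> int:
        # Serving: merge the batch view with the real-time view.
        return self.batch_view[page] + self.realtime_view[page]


lc = LambdaCounter()
for p in ["home", "home", "about"]:
    lc.ingest(p)
print(lc.query("home"))  # 2 (all from the real-time view)
lc.run_batch()
lc.ingest("home")
print(lc.query("home"))  # 3 (2 from the batch view + 1 real-time)
```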
