
Lunch & Learn Intro to Big Data


On Friday, September 25th, Devin Hopps led us through an introduction to Big Data and how technology has evolved to harness its power.


  1. Simplify your Business. Lunch and Learn, September 25, 2015: Introduction to Big Data. Devin Hopps
  2. Welcome! • Introductions • Big Data Fun Quiz
  3. How much data does Twitter generate in one day? 8 TB!!!
  4. What is this: “sudo mv new.dat old.dat”? A Linux command to rename a file.
  5. Introduction to Big Data • What is Big Data? • Big Data Technologies • Why Should YOU Care About Big Data?
  6. What is Big Data?
  7. What is Big Data? Gartner Analyst Doug Laney, 2001: • Volume • Velocity • Variety
  8. Volume
  9. Velocity
  10. Variety: “Unstructured” Data + Classification = Structure
  11. What are the Biggest Data Sources?
  12. The 4th V: Value • Big Data = Unprecedented availability and resolution of what is observed in our universe. • Distilling valuable new information from the 3 V’s is the domain of Big Data technologies.
  13. Big Data Technologies • Hadoop • NoSQL • Machine Intelligence
  14. A Short History: Hadoop is a top-level Apache project
  15. • Provides Distributed: – Storage = HDFS – Processing = Map-Reduce
  16. HDFS: • Files Stored as 128 MB Replicated Blocks. Fault Tolerant!
  17. Map-Reduce: • Jobs are Coordinated Tasks Processed on Data Nodes. Fault Tolerant!
  18. Hadoop vs. Supercomputer. Hadoop: • Code is processed by data nodes. • Node failure is expected and handled. • Low cost of entry. • Flexible, near-linear scalability. Supercomputer: • Processor reads from the storage cloud. • Hardened systems designed not to fail. • Big initial spend. • Limited capacity and diminishing ROI.
  19. NoSQL
  20. NoSQL • Non-RDBMS Database systems designed to address Big Data challenges. – Volume: “sharded” data, not single server file system. – Velocity: distributed processing, not centralized. – Variety: flexible entity models, not predefined.
  21. NoSQL • RDBMS continues to be a powerful tool for many database use cases. – Maintaining referential integrity in normalized and semi-normalized models. – Atomic, Consistent, Isolated, and Durable (ACID) transactions. – Efficient joins (hashing and merging) across entities.
  22. NoSQL • Types of NoSQL Databases: – Key-Value – Wide Column – Document – Graph
  23. NoSQL: Key-Value • Simple data model consisting of only key-value pairs. • Highly flexible: no predefined structure to limit new types of keys or values.
  24. NoSQL: Wide Column • Column oriented storage. • Large number of columns can be added sparsely, which increases flexibility. • Optimized for reading one column of data.
  25. NoSQL: Document • Data is stored as properties that classify an entity. • Entities can have varying sets of properties, which makes it more flexible. • Similar to instantiated classes, so often considered a good fit for persisting application data.
  26. NoSQL: Graph • Stores data as entities (aka, nodes with properties) and relationships (aka, edges). • New types of nodes and relationships can be defined without changing structure. • Optimized for traversing associative sets, not summarizing large amounts of data.
  27. Machine Learning. Arthur Samuel, 1959: "Field of study that gives computers the ability to learn without being explicitly programmed." How many pretzels? Programmed: c + x (customers) + y (beers) = pretzels. Machine Learning: people, weather, music, score, lighting, time of day, season.
  28. Machine Learning • Supervised: The computer is presented with example inputs and outputs and learns to predict outputs from similar inputs. • Unsupervised: All data is presented as input and the computer finds structure and patterns in the input.
  29. Machine Learning • Why is machine learning a Big Data technology? – It is computationally intensive and often requires massive scalability. – The explosion of unstructured and semi-structured data has potential for valuable machine learning insights.
  30. Why Should YOU Care About Big Data? • It’s Cool • It’s Growing • It May Work for You
  31. Why Should YOU Care About Big Data? • It’s Cool: The trajectory of Big Data technologies is a case study in innovative solution design and effective software development. – Out-of-the-Box Thinking • Inventing new designs to solve novel problems. • Many sizes fit many, not one size fits all. – Open Source • Engaging a community of contributors to optimize and add functionality to the Big Data toolset. • There is a resurgence in existing Open Source technologies (e.g., Linux, R and Python) that have been extended to Big Data problems.
  32. Why Should YOU Care About Big Data? • It’s Growing: Job opportunities for Big Data developers are increasing geometrically.
  33. Why Should YOU Care About Big Data? • It May Work for You: Even if you are not processing data on a massive scale, Big Data technologies may provide the best solution for your existing use cases. – Handling Variety • Many of us have used “workarounds” in pre-Big Data technologies to cope with variability in our data (e.g., SQL Server XML data type). – Managing Cost • Open source Big Data technologies may significantly reduce licensing costs for solutions we develop.
  34. Simplify your Business. Question & Answer
  35. Simplify your Business. Next Event: September 25, 2015, 12 - 1 pm. Big Data Deep Dive. Harit Gohel
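
Slide 17 describes Map-Reduce as coordinated tasks processed on data nodes. The idea can be sketched in a single process as three phases: map, shuffle, and reduce. This is only an illustration, assuming a word-count job; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and are not part of Hadoop's API, and a real cluster would run the map and reduce tasks in parallel on separate nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map task: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce task: collapse all values for one key into a single total."""
    return key, sum(values)

def word_count(lines):
    # On a cluster, each line (or block of lines) would be mapped on the
    # data node that stores it; here everything runs sequentially.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data is big", "data nodes process data"])
```

Because each map call depends only on its own record and each reduce call only on its own key, a failed task can be rerun on another node without affecting the rest of the job, which is what makes the model fault tolerant.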

Editor's Notes

  • unstructured/semi-structured
    example: Dominos pizza app -- ordering pizza with eye movements is distilling structure from “unstructured” images
    collect first/understand later
  • 4th V = Value
    The explosion of data has dramatically increased the resolution with which we can describe many aspects of our environment and behavior.
    Distilling these data into more novel, precise and important insights and predictions is the opportunity space for Big Data technologies.
    Examples: Targeted advertising, weather prediction, image and voice recognition.
  • Hadoop
    Hadoop Timeline
  • Atomicity: All or nothing.
    Consistency: Transaction brings DB from one valid state to another.
    Isolation: Multiple transactions can execute in parallel, but they won’t interfere with each other.
    Durability: Once a transaction is committed, data is persisted and available.
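
The atomicity and consistency notes above can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the sample balances are assumptions made for this example, not something from the presentation. A transfer that would violate the `CHECK` constraint is rolled back as a whole, so the earlier update inside the same transaction does not survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds in one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit to dst was undone too.
        return False

ok = transfer(conn, "alice", "bob", 30)       # valid: both updates commit
failed = transfer(conn, "alice", "bob", 500)  # overdraft: nothing is applied
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, only the first transfer is reflected in `balances`: the second one left the database in the same valid state it started from.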
