
Getting Started with Big Data and Splunk

A beginner's introduction to the topic of Big Data: where you find it, how to get it into Splunk, and how to search it and get insights once it is there. Take an investigative journey through my mailbox as I seek to find out which messages could be deleted to make the biggest impact on reducing its footprint before my privileges are cut off!

  1. © 2017 Splunk Inc. Getting Started with Big Data and Splunk. Tom Chavez | Sr. Manager, Developer Marketing | tchavez@splunk.com | March 2019 | ConFoo.ca
  2. Forward-Looking Statements: During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved. THIS SLIDE IS REQUIRED FOR ALL 3RD PARTY PRESENTATIONS.
  3. What is Big Data?
  4. Outline: ▶ What is Big Data? • What are its characteristics? ▶ Where do we put Big Data? ▶ Who uses Big Data? What’s the value? ▶ What is creating Big Data? ▶ What is Splunk? ▶ How do we get Big Data into Splunk?
  5. What is Big Data? Some Definitions: ▶ “Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.” ▶ “computing data held in such large amounts that it can be difficult to process” ▶ “a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.”
  6. Some Characteristics of Big Data: ▶ Volume: there can be a crazy amount of data! Storage used to be a problem. ▶ Velocity: data can arrive slowly or very quickly, and may need action now, soon, or later. ▶ Variety: data arrives in all types of formats • Structured: database records of numeric data, CSV, spreadsheets • Unstructured: text docs, email, logs, video, audio, pictures ▶ Variability: data doesn’t always arrive at a constant rate, but can burst • Constant rate: temperature samples, system metrics, blood pressure readings • Bursts: Black Friday sales transactions, tax filings ▶ Complexity: many sources and no standard formats, yet we still need to understand and correlate them
  7. Who Uses Big Data? “Every company is a data company; they just might not know it.” ▶ Banking • Understand customers, boost satisfaction • Minimize risk and fraud ▶ Education • Identify at-risk students • Better system for evaluation • Support teachers and principals ▶ Government • Manage utilities, run agencies • Deal with traffic congestion, prevent crime ▶ Health Care • Uncover hidden insights that improve patient care • Comply with regulations ▶ Manufacturing • Boost quality and output • Detect issues before they become problems • Solve problems faster
  8. What’s the Value of Big Data? “The importance is not in how much you have, but what you do with it!” ▶ With analysis, you can accomplish business-related tasks such as: • Predicting weather and its impact on your transportation business • Suggesting webpages for a web searcher • Promoting products to a customer browsing your website based on their past buying behavior or that of a similar buyer • Detecting fraudulent behavior by employees or buyers • Determining the root cause of failures in your software systems, then predicting them before they happen
  9. “We” Are Big Data! ▶ Human genome: “The 2.9 billion base pairs of the haploid human genome correspond to a maximum of about 725 megabytes of data, since every base pair can be coded by 2 bits.” (Wikipedia) ▶ With ~7.7 billion people in the world, that’s about 5.6 exabytes (725 MB × 7.7 billion ≈ 5.6 × 10^18 bytes) • (And with compression due to >98% common DNA, we could get that down a bit!)
  10. We Are Creating More Data Every Day! ▶ Supermarket: every product we buy ▶ Visa: every credit card transaction ▶ Navigation system: everywhere we walk or drive (GPS) ▶ Airlines: everywhere we travel ▶ Browser: every website we visit ▶ Google: every search we make, every page we view, every ad we see or click
  11. And the IoT Data Being Created! ▶ Car sensor data ▶ Airplane sensors everywhere, in every system • Engine data: 500Gb per engine per flight! • Analyzed and transmitted during the flight so repairs can be scheduled and parts delivered in advance ▶ Temperature, humidity, and environmental sensors ▶ Motion sensors ▶ Factory equipment and sensors ▶ RFID tags, Bluetooth, WiFi networks ▶ Disney’s MagicBand • Everywhere you walk, every time you purchase, every attraction you ride, every photo, …
  12. And Then There Is the Machine Data: the data about the data ▶ Machine data is created at 9x the rate of application data
  13. Data Warehouse or Data Lake? ▶ Data warehouses store vast amounts of structured data with a rigid schema that is defined before the data arrives • Requires ETL (Extract, Transform, and Load) processes to get data in • Schema is predefined (schema-on-write) • Rejects data records that aren’t correctly structured • Populated periodically, perhaps on a nightly cycle, with batch reporting ▶ A data lake is a repository for raw data in its natural format, usually object blobs or files • Schema is applied at the time of analysis (schema-on-read) • Useful for data scientists, data developers, and business analysts • Used for machine learning, predictive analytics, data discovery and profiling
  14. What is Splunk? Any Question, Any Data. Embrace Data Chaos
  15. What is Splunk? A Big Data Platform ▶ “Splunk makes machine data accessible, usable, and valuable to everyone” ▶ We help you Investigate, Monitor, Analyze and Act on your data ▶ We help you find the jewels in your dark caverns of big data! It’s Spelunking!
  16. How Do You Use Splunk? ▶ Get Data In, monitor for new data ▶ Analyze the data: search, sort, slice and dice, visualize, and understand ▶ Act! Set up reports, alerts, automated actions!
  17. Splunk Home
  18. Get Data In
  19. Getting Data In
  20. Thousands of Public Data Sources, in case you don’t have your own! ▶ Agriculture ▶ Biology ▶ Climate+Weather ▶ ComplexNetworks ▶ ComputerNetworks ▶ DataChallenges ▶ EarthScience ▶ Economics ▶ Education ▶ Energy ▶ Finance ▶ GIS ▶ Government ▶ Healthcare ▶ ImageProcessing ▶ MachineLearning ▶ Museums ▶ NaturalLanguage ▶ Neuroscience ▶ Physics ▶ ProstateCancer ▶ Psychology+Cognition ▶ PublicDomains ▶ SearchEngines ▶ SocialNetworks ▶ SocialSciences ▶ Software ▶ Sports ▶ TimeSeries ▶ Transportation
  21. Get Data In: Apps
  22. Thousands of Apps. Even Uber!
  23. My Big Data Problem ▶ How many of you are “Inbox=0” types? • I’m not, and I need to clean up my mailbox!
  24. Splunkbase Apps to the Rescue!
  25. Demo: What did I find after indexing my mailbox?
  26. A Lot of Mail Messages! Click on imap to see verbose results
  27. Interesting Fields (click on imap) ▶ Interesting: • Date (year, month, date, day of week, time) • From • Subject • Size ▶ Not interesting: • Index (all in the same index) • Server (all on the same server) • Date_zone (all in Pacific) • Date_second (only 60 seconds in a minute) • Date_minute (only 60 minutes in an hour)
  28. Spread Across the Years (click on the date_year field) ▶ 2017 was the busiest • Or the worst for deleting ▶ In 2018 I was better at keeping clean
  29. Most Popular Day of Week for Mail? (click date_wday, then Top Values) ▶ Wednesday, then Thursday ▶ Tuesday, Monday, then Friday ▶ Much less on the weekends
  30. And When in the Month? ▶ The 18th through the 21st had the most mail
  31. And the Rarest? ▶ The end and start of the month had the fewest messages
  32. Top Senders (click on the From field) ▶ Jokes from Woz ▶ Meetup ▶ USATODAY.com ▶ LinkedIn messages ▶ Quora ▶ LinkedIn updates ▶ LinkedIn digest ▶ Groupon
  33. What About Size? Does Top Size Matter? ▶ It’s odd that some messages have exactly the same size ▶ Or is it?
  34. How Many Have the Top Size? ▶ Disney sent all of my largest emails ▶ CVS.com also sends large emails
  35. But Who Has Sent Me the Most Bytes? ▶ Steve Wozniak wins, with more than 1 GB of email!
  36. Next Steps and Resources: What do I do now?
  37. Next Steps: Get Splunk ▶ Download Splunk for free at splunk.com/en_us/download.html • This gives you 500 MB of indexing every day! ▶ Get free Splunk training online: splunk.com/en_us/training.html ▶ Splunk Quick Reference Card: • splunk.com/pdfs/solution-guides/splunk-quick-reference-guide.pdf
  38. Get Some Data ▶ Visit Splunkbase at splunkbase.splunk.com/ for thousands of apps • Search for a specific technology or browse a category • Add apps to Splunk to get data in ▶ Grab a public data set from: • github.com/awesomedata/awesome-public-datasets • kaggle.com/datasets • quora.com/Where-can-I-find-large-datasets-open-to-the-public • toolbox.google.com/datasetsearch • data.cityofchicago.org/browse
  39. Lots of Blogs about Analyzing Data, or Just Follow a Recipe! ▶ Falling in Love ▶ Finding Love ▶ Coffee ▶ Golf swing ▶ CPAP usage ▶ BBQ smoker ▶ Ironman Training ▶ Your neighborhood safety ▶ Public transportation, bike sharing ▶ Girl Scout Cookie Sales ▶ Picking your kid’s name ▶ Predict the next NFL football play ▶ Collecting live audience votes ▶ Brewing Beer ▶ Analyzing NBA basketball data ▶ Display Real-Time Gaming Data ▶ Graph and Analyze NOAA Buoy Data ▶ Analyze Energy and Water Usage ▶ Track Airline Performance Statistics ▶ Analyze Your Favorite TV Show (if your favorite show is Doctor Who) ▶ Party Dashboard
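
The genome storage estimate on slide 9 can be checked with a few lines of arithmetic. This is a minimal Python sketch that uses only the numbers quoted on that slide (2.9 billion base pairs, 2 bits per base pair, ~7.7 billion people) and decimal SI units; it ignores compression and the fact that each person actually carries two copies of the genome. 725 MB per person times 7.7 billion people comes to roughly 5.6 × 10^18 bytes, which is on the order of exabytes rather than zettabytes.

```python
# Storage estimate for slide 9, using the slide's own inputs and decimal (SI) units.
BASE_PAIRS = 2.9e9        # haploid human genome, per the quoted Wikipedia figure
BITS_PER_BASE_PAIR = 2    # four bases (A, C, G, T) fit in 2 bits
PEOPLE = 7.7e9            # ~7.7 billion people

bytes_per_genome = BASE_PAIRS * BITS_PER_BASE_PAIR / 8
total_bytes = bytes_per_genome * PEOPLE

print(f"per genome: {bytes_per_genome / 1e6:.0f} MB")  # per genome: 725 MB
print(f"everyone:   {total_bytes / 1e18:.1f} EB")      # everyone:   5.6 EB
print(f"            {total_bytes / 1e21:.4f} ZB")      # 0.0056 ZB
```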
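
To make the schema-on-write versus schema-on-read distinction from slide 13 concrete, here is a toy Python sketch. The `Sale` record type, the `parse` and `total` helpers, and the `key=value` rows are all hypothetical; real warehouses and lakes use database engines, ETL tools, and object storage rather than Python lists. The only point is where the schema gets applied: before loading (bad rows rejected, unknown fields dropped) or at query time (raw text kept and interpreted per question).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """Hypothetical warehouse schema, fixed before any data arrives."""
    store: str
    amount: float


def parse(line: str) -> dict:
    """Split hypothetical 'key=value,key=value' text into a dict of strings."""
    return dict(pair.split("=", 1) for pair in line.split(","))


raw_rows = [
    "store=Boston,amount=19.99",
    "store=Denver,amount=not-a-number",       # malformed value
    "store=Austin,amount=5.00,coupon=SPRING",  # field the schema doesn't know
]

# Schema-on-write: transform and validate before loading; misfits are rejected.
warehouse, rejected = [], []
for line in raw_rows:
    fields = parse(line)
    try:
        # only the schema's columns are kept, so "coupon" is silently dropped
        warehouse.append(Sale(store=fields["store"], amount=float(fields["amount"])))
    except (KeyError, ValueError):
        rejected.append(line)

# Schema-on-read: keep the raw text and interpret it only when a question is asked.
lake = list(raw_rows)


def total(field: str, lines: list) -> float:
    """Sum a numeric field at query time, skipping rows where it doesn't fit."""
    result = 0.0
    for line in lines:
        try:
            result += float(parse(line)[field])
        except (KeyError, ValueError):
            continue
    return result


warehouse_total = sum(sale.amount for sale in warehouse)
lake_total = total("amount", lake)
coupon_rows = sum(1 for line in lake if "coupon" in parse(line))  # a question asked later
print(len(warehouse), len(rejected), coupon_rows)  # 2 1 1
```

Both stores agree on the sales total, but only the lake can still answer the unplanned coupon question, at the cost of parsing and tolerating messy rows on every query.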
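
The date fields used in the mailbox demo on slides 27 to 31 (date_year, date_wday, date_mday, and so on) are derived by Splunk from each event's timestamp. As a rough illustration of where such values come from, this Python sketch parses an RFC 2822 `Date:` header with the standard library's `email.utils.parsedate_to_datetime` and returns a dictionary keyed by the same names. The value formats (lowercase month and weekday names, zone as a UTC offset in minutes) are an approximation and not a guaranteed match for Splunk's output; `date_fields` is a hypothetical helper and the sample header is made up.

```python
from email.utils import parsedate_to_datetime


def date_fields(date_header: str) -> dict:
    """Derive Splunk-style date_* values from an RFC 2822 Date: header (approximation)."""
    dt = parsedate_to_datetime(date_header)
    offset = dt.utcoffset()  # None when the header's zone is -0000 (unknown)
    return {
        "date_year": dt.year,
        # %B / %A give English names under the default "C" locale
        "date_month": dt.strftime("%B").lower(),
        "date_mday": dt.day,
        "date_wday": dt.strftime("%A").lower(),
        "date_hour": dt.hour,
        "date_minute": dt.minute,
        "date_second": dt.second,
        # UTC offset in minutes, e.g. -420 for Pacific Daylight Time
        "date_zone": None if offset is None else int(offset.total_seconds() // 60),
    }


fields = date_fields("Wed, 21 Mar 2018 09:15:30 -0700")
print(fields["date_wday"], fields["date_mday"], fields["date_zone"])  # wednesday 21 -420
```

Note that date_minute and date_second can only ever take 60 distinct values, which is part of why slide 27 files them under "not interesting."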
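
Slides 29 to 31 rely on Splunk's Top Values view and its opposite, the rarest values. Splunk's `top` and `rare` commands report a count and, by default, a percentage for each value; the plain-Python sketch below approximates that with `collections.Counter`. The function names mirror the SPL commands but are hypothetical helpers, tie-breaking is a choice made here (first-seen order for `top`, alphabetical for `rare`), and the weekday values are invented sample data.

```python
from collections import Counter


def top(values, limit=10):
    """Most common values as (value, count, percent), most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    # most_common() keeps first-seen order among ties
    return [(v, c, round(100 * c / total, 2)) for v, c in counts.most_common(limit)]


def rare(values, limit=10):
    """Least common values as (value, count, percent), least frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    # sort by count, then alphabetically so ties come out in a stable order
    least_first = sorted(counts.items(), key=lambda item: (item[1], str(item[0])))
    return [(v, c, round(100 * c / total, 2)) for v, c in least_first[:limit]]


# Invented date_wday values for nine messages.
wdays = ["wednesday", "wednesday", "wednesday", "thursday", "thursday",
         "tuesday", "monday", "friday", "saturday"]

print(top(wdays, limit=2))   # [('wednesday', 3, 33.33), ('thursday', 2, 22.22)]
print(rare(wdays, limit=2))  # [('friday', 1, 11.11), ('monday', 1, 11.11)]
```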
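
Slides 33 to 35 ask two different questions: who sent the largest individual messages, and who sent the most bytes overall. A minimal Python sketch with invented senders and sizes (the `example.com` addresses are placeholders, not data from the talk) shows why the answers can differ. In Splunk the second question corresponds to a search along the lines of `stats sum(size) by from`, where `size` and `from` are assumed field names; the real names depend on how the mailbox was indexed.

```python
from collections import Counter, defaultdict

# Invented (sender, size_in_bytes) pairs; not data from the talk.
messages = [
    ("woz@example.com", 350_000),
    ("woz@example.com", 420_000),
    ("woz@example.com", 600_000),
    ("woz@example.com", 700_000),
    ("store@example.com", 900_000),
    ("store@example.com", 900_000),
    ("news@example.com", 52_000),
]

# How many messages share each exact size, and who sent the largest ones?
size_counts = Counter(size for _, size in messages)
largest = max(size_counts)
largest_senders = Counter(sender for sender, size in messages if size == largest)

# Total bytes per sender, biggest first.
bytes_by_sender = defaultdict(int)
for sender, size in messages:
    bytes_by_sender[sender] += size
ranking = sorted(bytes_by_sender.items(), key=lambda kv: kv[1], reverse=True)

print(largest, dict(largest_senders))  # 900000 {'store@example.com': 2}
print(ranking[0])                      # ('woz@example.com', 2070000)
```

Here one sender owns the largest messages while another wins on total bytes, the same pattern as Disney versus Woz in the demo.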