Copyright © 2013 Splunk Inc.




Big Data at the
Speed of Business

Raanan Dagan, Big Data PM, Splunk

Maciej Jagiellowicz, Monitoring and Response Senior Specialist, Allegro
What We’ll Talk About

•   What is Splunk?
•   Real-Time Monitoring and Alerts at Allegro
•   Integration Platform with Splunk Applications
•   Archiving Big Data at Allegro
•   Q&A
Splunk
• Company (NASDAQ: SPLK)
  – Founded 2004, first software release in 2006
  – HQ: San Francisco, CA
• 5,200+ Enterprise Customers
• #1 Big Data Innovator*
• #1 Big Data – Pure Play Vendor**
  * Fast Company's Most Innovative Companies Issue (March 2013)
  ** Forbes/Wikibon (Feb 2013)

Allegro
• Online transaction platform
• Was formed in 1999
• E-commerce leader in Central and Eastern Europe, a group of companies managing 129 platforms in over 23 countries
• More than 12.5 million users
• Web site: allegro.pl
Big Data Comes from Machines
Volume | Velocity | Variety | Variability

Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data.

Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
What Does Machine Data Look Like?
Sources:
• Order Processing
• Middleware Error
• Care IVR
• Twitter
Machine Data Contains Critical Insights
Sources and the fields they contain:
• Order Processing: Customer ID, Order ID, Product ID
• Middleware Error: Order ID, Customer ID
• Care IVR: Time Waiting On Hold, Customer ID
• Twitter: Twitter ID, Customer’s Tweet, Company’s Twitter ID
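The idea on this slide is that events from unrelated sources share identifiers, so they can be joined into one view of a customer. A minimal Python sketch of that idea follows. The event records, field names, values, and the Twitter-to-customer lookup are all invented for illustration; they are not taken from the slides or from Splunk.

```python
from collections import defaultdict

# Hypothetical, hand-written events; real machine data would be raw log text.
events = [
    {"source": "order_processing", "customer_id": "C42", "order_id": "O1001", "product_id": "P7"},
    {"source": "middleware_error", "customer_id": "C42", "order_id": "O1001"},
    {"source": "care_ivr", "customer_id": "C42", "seconds_on_hold": 540},
    {"source": "twitter", "twitter_id": "@unhappy_user", "tweet": "still waiting on my order"},
    {"source": "order_processing", "customer_id": "C77", "order_id": "O1002", "product_id": "P3"},
]

# Hypothetical lookup from Twitter ID to customer ID (an assumption of this sketch).
twitter_to_customer = {"@unhappy_user": "C42"}


def resolve_customer(record, twitter_lookup):
    """Return the customer ID for an event, using the Twitter lookup if needed."""
    if "customer_id" in record:
        return record["customer_id"]
    return twitter_lookup.get(record.get("twitter_id"))


def group_by_customer(records, twitter_lookup):
    """Group events from all sources by the customer they belong to."""
    grouped = defaultdict(list)
    for record in records:
        customer_id = resolve_customer(record, twitter_lookup)
        if customer_id is not None:
            grouped[customer_id].append(record)
    return dict(grouped)


timeline = group_by_customer(events, twitter_to_customer)
for customer_id, customer_events in timeline.items():
    sources = [event["source"] for event in customer_events]
    print(customer_id, "->", ", ".join(sources))
```

In this toy data set, customer C42 ends up with an order, a middleware error, a long hold time in the IVR, and a tweet, which is the kind of cross-source story the slide points at.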
Splunk: The Platform for Machine Data
Machine Data is collected into the Splunk Index, which delivers Operational Intelligence:
• Insight and Visualizations for Executives
• Statistical Analysis
• Proactive Monitoring
• Search and Investigation
Serves Needs Across IT and Business
Use cases:
• IT Operations Management
• Application Management
• Security and Compliance
• Web Intelligence
• Business Analytics

Users:
• Customer Support
• Operations Teams
• System Administrator
• Application Developers
• Security Analysts
• Auditors
• IT Executives
• Website/Business Analysts
• LOB Owners/Executives
Splunk for Real-Time Monitoring and Alerts
Why do we like Splunk …

•   Meets strategic needs across IT
•   Scales from laptop to datacenter to cloud
•   For all types of users
•   Users want to use it
Where do we use Splunk
• Real-time monitoring
    - Web servers
    - App servers
    - Active Directory
    - Security devices
• Post-incident log analysis
    - Historical data analysis
• Application debugging
    - Real-time log analysis
Splunk Architecture
•   Concurrent Users = 250
•   Search Heads = 5
•   Indexers = 2
•   Forwarders = 1500

•   Total Data Processed Per Day = 100GB
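To put these figures in proportion, the short Python calculation below derives rough per-component averages. It assumes decimal units (1 GB = 1000 MB) and a perfectly even spread of data across indexers and forwarders; neither assumption comes from the slides, and real load is rarely that uniform.

```python
# Figures from the slide.
DAILY_VOLUME_GB = 100
INDEXERS = 2
FORWARDERS = 1500

# Assumptions of this sketch: decimal units and an even spread of data.
MB_PER_GB = 1000
SECONDS_PER_DAY = 24 * 60 * 60

gb_per_indexer = DAILY_VOLUME_GB / INDEXERS
mb_per_forwarder = DAILY_VOLUME_GB * MB_PER_GB / FORWARDERS
avg_mb_per_second = DAILY_VOLUME_GB * MB_PER_GB / SECONDS_PER_DAY

print(f"Per indexer:   {gb_per_indexer:.0f} GB/day")
print(f"Per forwarder: {mb_per_forwarder:.1f} MB/day")
print(f"Average rate:  {avg_mb_per_second:.2f} MB/s")
```

Under these assumptions each indexer handles about 50 GB per day, an average forwarder sends about 67 MB per day, and the sustained average across the deployment is a little over 1 MB/s.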
Visualizing Real-Time Data in Splunk
Real-time monitoring:
• Transactions with financial institutions and banks
• Monitoring of key referrals to the allegro.pl web site
• Monitoring of application JMS queues
• Top areas of application errors
• Business transactions
• Monitoring of SMS and mobile device communications
Key Functions

•   Searching and Reporting (Search Head)

•   Indexing and Search Services (Indexer)

•   Local and Distributed Management (Deployment Server)

•   Data Collection and Forwarding (Forwarder)

    A Splunk install can be one or all roles…
Splunk Components and Scalability
•   Distributed analysis
•   Automatic load balancing linearly scales indexing
•   Role-based security

•   Search Heads: offload search load to Splunk Search Heads
•   Indexers: auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day
•   Forwarders: send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
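The load-balanced forwarding described above can be pictured with a toy model. The Python sketch below spreads events across indexers in round-robin order. Round-robin is a stand-in chosen for simplicity: the slides do not say how Splunk forwarders pick an indexer, and the indexer names and events here are hypothetical.

```python
from collections import Counter
from itertools import cycle


def forward_round_robin(events, indexers):
    """Assign each event to the next indexer in turn and count events per indexer."""
    counts = Counter({name: 0 for name in indexers})
    targets = cycle(indexers)
    for _ in events:
        counts[next(targets)] += 1
    return counts


# Hypothetical indexer names and a synthetic batch of events.
indexers = ["indexer-1", "indexer-2"]
events = [f"event {n}" for n in range(1001)]

distribution = forward_round_robin(events, indexers)
print(dict(distribution))
```

Adding a third name to `indexers` spreads the same batch three ways, which is the sense in which indexing capacity grows as indexers are added.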
Splunk Real-time Analytics



Data inputs:
• Monitor Input
• TCP/UDP Input
• Scripted Input

Parsing Pipeline:
• Source, event typing
• Character set normalization
• Line breaking
• Timestamp identification

Parsed events feed Real-time Search and are written to the Splunk Index (raw data and index files).
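Two of the parsing steps named above, line breaking and timestamp identification, can be sketched in a few lines of Python. This is an illustration of the concepts only, not Splunk's implementation. It assumes every event begins with a timestamp in the form `YYYY-MM-DD HH:MM:SS`, and the sample log text is invented.

```python
import re
from datetime import datetime

# Assumption: each event starts with a timestamp like "2013-06-26 10:15:02".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def break_into_events(raw_text):
    """Line breaking: start a new event at each line that begins with a timestamp.

    Other lines (for example stack-trace lines) are appended to the current event.
    """
    events = []
    for line in raw_text.splitlines():
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


def identify_timestamp(event):
    """Timestamp identification: parse the leading timestamp, or return None."""
    match = TIMESTAMP.match(event)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


# Invented sample: one multi-line error event followed by a single-line event.
raw = (
    "2013-06-26 10:15:02 ERROR payment failed order=O1001\n"
    "  at PaymentService.charge\n"
    "  at OrderController.submit\n"
    "2013-06-26 10:15:05 INFO retry scheduled order=O1001\n"
)

parsed_events = break_into_events(raw)
for event in parsed_events:
    print(identify_timestamp(event), "|", event.splitlines()[0])
```

The four raw lines become two events: the indented stack-trace lines stay attached to the error they belong to, and each event gets a parsed timestamp it can be indexed by.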
Splunk Delivers Big Data in Days or Weeks

Product-based Solution
• Easy to download and deploy
• Pre-integrated, end-to-end functionality
• Enterprise-grade features

Real-time Platform
• Collects data from tens of thousands of sources
• Advanced real-time and historical analysis of data
• Fast, custom visualizations for IT and business users

Performance at scale
• Proven at multi-terabyte scale per day
• Upwards of PB under management
• Thousands of enterprise customers
Do You Hadoop?
Splunk: A Platform for Big Data Integration

• Ad hoc search
• Monitor and alert
• Report and analyze
• Custom dashboards
• Developer Platform

Splunk Dev Platform
• API and SDKs to build Big Data apps

Splunk DB Connect
• Real-time integration to relational DBs (SQL)

Splunk Hadoop Connect
• Reliable bi-directional integration to Hadoop
Splunk Hadoop Connect




Delivers reliable integration between Splunk and Hadoop
Splunk DB Connect
Reliable, scalable, real-time integration between Splunk and traditional relational databases
• Enrich search results with additional business context
• Easily import data into Splunk for deeper analysis
• Integrate multiple DBs concurrently
• Simple set-up, non-invasive and secure

Java Bridge Server: Database Lookup, Connection Pooling, Database Query
JDBC connections to: Oracle Database, Microsoft SQL Server, Other Databases
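Enriching search results with business context amounts to looking up each result's identifier in a relational table. The Python sketch below illustrates that with an in-memory SQLite database as a stand-in. It does not use DB Connect or its API; the table, column names, and records are hypothetical.

```python
import sqlite3

# Stand-in for a relational database; DB Connect itself reaches databases over JDBC.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT, tier TEXT)"
)
connection.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("C42", "Example Customer A", "gold"), ("C77", "Example Customer B", "standard")],
)

# Hypothetical search results that carry only an identifier.
search_results = [
    {"customer_id": "C42", "error": "payment failed"},
    {"customer_id": "C99", "error": "timeout"},
]


def enrich(results, db):
    """Add name and tier from the customers table to each result, when a row exists."""
    enriched = []
    for result in results:
        row = db.execute(
            "SELECT name, tier FROM customers WHERE customer_id = ?",
            (result["customer_id"],),
        ).fetchone()
        extra = {"name": row[0], "tier": row[1]} if row else {"name": None, "tier": None}
        enriched.append({**result, **extra})
    return enriched


enriched_results = enrich(search_results, connection)
for item in enriched_results:
    print(item)
```

Results with no matching row keep their original fields and get empty context, so a missing customer record does not drop the event.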
Splunk Developer Platform
1. Accelerate Dev & Test
2. Integrate with IT Infrastructure
3. Build Real-time Data Applications

Developer Platform (REST API, SDKs)

Enables enterprise developers to extend the power of Splunk Enterprise with a robust API and Java, JavaScript and Python SDKs
Splunk Hadoop Monitoring
• Splunk HadoopOps Forwarder Package on every host (Host, Operating System, Infrastructure)
• Splunk HadoopOps Dashboards, alerts and notifications, powered by Splunk search
• Add Knowledge, Collect & Index Data, Distributed Search, Monitor & Alert, Rich UI Framework
Archiving Big Data in Hadoop
Hadoop Components

•   Hive
•   Flume
•   Mahout
•   MapReduce
Hadoop Cluster
Why and Where do we Use Hadoop

• Big Data archive
• Web services statistics
• Mail flow statistics
Where we do not use Hadoop

•   Not for Visualization
•   Not for Analytics
•   Not for Real-time
•   Not for Access Control
Where we are today and where we want to be tomorrow
Splunk 5,200+ Licensed Customers


• Cloud and Online Services
• Education
• Energy and Utilities
• Financial Services and Insurance
• Government
• Healthcare
• Manufacturing
• Media
• Retail
• Technology
• Telecommunications
• Travel and Leisure
Splunk Big Data Platform


• Product-based solution
• Real-time Platform
• Performance at scale

Visit Splunk Booth
Thank You
splunk.com/bigdata

Implementing Big Data at the Speed of Business

  • 1. Copyright © 2013 Splunk Inc. Big Data at the Speed of Business Raanan Dagan, Big Data PM, Splunk Maciej Jagiellowicz, Monitoring and Response Senior Specialist, Allegro
  • 2. What We’ll Talk About • What is Splunk? • Real-Time Monitoring and Alerts at Allegro • Integration Platform with Splunk Applications • Archiving Big Data at Allegro • Q&A
  • 3. Splunk: • Company (NASDAQ: SPLK) – Founded 2004, first software release in 2006 – HQ: San Francisco, CA • 5,200+ Enterprise Customers • #1 Big Data Innovator* • #1 Big Data – Pure Play Vendor** (* Fast Company's Most Innovative Companies Issue (March 2013); ** Forbes/Wikibon (Feb 2013)) Allegro: • Online transaction platform • Was formed in 1999 • E-commerce leader in Central and Eastern Europe, a group of companies managing 129 platforms in over 23 countries • More than 12.5 million users • Web site: allegro.pl
  • 4. Big Data Comes from Machines Volume | Velocity | Variety | Variability Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data. Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
  • 5. What Does Machine Data Look Like? Sources Order Processing Middleware Error Care IVR Twitter
  • 6. Machine Data Contains Critical Insights Sources: Order Processing (Customer ID, Order ID, Product ID) • Middleware Error (Order ID, Customer ID) • Care IVR (Time Waiting On Hold, Customer ID) • Twitter (Twitter ID, Customer’s Tweet, Company’s Twitter ID)
  • 7. Splunk: The Platform for Machine Data Machine Data Operational Intelligence Insight and Visualizations for Executives Statistical Analysis Proactive Monitoring Splunk Index Search and Investigation
  • 8. Serves Needs Across IT and Business Solution areas: IT Operations Management • Web Intelligence • Application Management • Business Analytics • Security and Compliance Users: Customer Support • LOB Owners/Executives • Operations Teams • Website/Business Analysts • System Administrator • Application Developers • IT Executives • Security Analysts • Auditors
  • 9. Splunk for Real-Time Monitoring and Alerts
  • 10. Why do we like Splunk … • Meets strategic needs across IT • Scales from laptop to datacenter to cloud • For all types of users • Users want to use it
  • 11. Where do we use Splunk • Real-time monitoring - Web servers - App servers - Active Directory - Security devices • Post-incident log analysis - Historical data analysis • Application debugging - Real-time log analysis
  • 12. Splunk Architecture • Concurrent Users = 250 • Search Heads = 5 • Indexers = 2 • Forwarders = 1500 • Total Data Processed Per Day = 100GB
  • 13. Visualizing Real-Time Data in Splunk Real time monitoring: • Transactions with financial institutions and banks • Monitoring of key referrals to allegro.pl web site • Monitoring of applications JMS queues • Top areas of application errors • Business transactions • Monitoring of SMS and mobile devices communications
  • 14. Key Functions • Searching and Reporting (Search Head) • Indexing and Search Services (Indexer) • Local and Distributed Management (Deployment Server) • Data Collection and Forwarding (Forwarder) A Splunk install can be one or all roles…
  • 15. Splunk Components and Scalability • Automatic load balancing linearly scales indexing • Distributed analysis • Role-based security Search Heads: Offload search load to Splunk Search Heads Indexers: Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day Forwarders: Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
  • 16. Splunk Real-time Analytics Data inputs: Monitor Input, TCP/UDP Input, Scripted Input Parsing Pipeline: • Source, event typing • Character set normalization • Line breaking • Timestamp identification Outputs: Real-time Search; Splunk Index (Raw data, Index Files)
  • 17. Splunk Delivers Big Data in Days or Weeks Product-based Solution: Easy to download and deploy • Pre-integrated, end-to-end functionality • Enterprise-grade features Real-time Platform: Collects data from tens of thousands of sources • Advanced real-time and historical analysis of data • Fast, custom visualizations for IT and business users Performance at scale: Proven at multi-terabyte scale per day • Upwards of PB under management • Thousands of enterprise customers
  • 19. Splunk: A Platform for Big Data Integration Ad hoc search • Monitor and alert • Report and analyze • Custom dashboards • Developer Platform Splunk Dev Platform: • API and SDKs to build Big Data apps Splunk DB Connect: • Real-time integration to relational DBs (SQL) Splunk Hadoop Connect: • Reliable bi-directional integration to Hadoop
  • 20. Splunk Hadoop Connect Delivers reliable integration between Splunk and Hadoop
  • 21. Splunk DB Connect Reliable, scalable, real-time integration between Splunk and traditional relational databases Java Bridge Server: Database Lookup, Connection Pooling, Database Query; JDBC to Oracle Database, Microsoft SQL Server, Other Databases • Enrich search results with additional business context • Easily import data into Splunk for deeper analysis • Integrate multiple DBs concurrently • Simple set-up, non-invasive and secure
  • 22. Splunk Developer Platform 1. Accelerate Dev & Test 2. Integrate with IT Infrastructure 3. Build Real-time Data Applications Developer Platform (REST API, SDKs) Enables enterprise developers to extend the power of Splunk Enterprise with robust API and Java, JavaScript and Python SDKs
  • 23. Splunk Hadoop Monitoring Splunk HadoopOps Forwarder Package on every host Splunk HadoopOps: Dashboards, alerts and notifications, powered by Splunk search Collect & Index Data • Add Knowledge • Distributed Search • Monitor & Alert • Rich UI Framework Host Operating System Infrastructure
  • 24. Archiving Big Data in Hadoop
  • 25. Hadoop Components • Hive • Flume • Mahout • MapReduce
  • 27. Why and Where do we Use Hadoop • Big Data archive • Web services statistics • Mail flow statistics
  • 28. Where we do not use Hadoop • Not for Visualization • Not for Analytics • Not for Real-time • Not for Access Control
  • 29. Where we are today and where do we want to be tomorrow
  • 30. Splunk 5,200+ Licensed Customers Cloud and Online Services Education Energy and Utilities Financial Services and Insurance Government Healthcare Manufacturing Media Retail Technology Telecommunications Travel and Leisure
  • 31. Splunk Big Data Platform Product-based solution • Real-time Platform • Performance at scale Visit Splunk Booth
  • 32. Copyright © 2013 Splunk Inc. Thank You splunk.com/bigdata
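
Slide 16 lists line breaking and timestamp identification among the parsing pipeline steps. The following is a minimal Python sketch of those two steps only, not Splunk's implementation. The log format, the regular expression, and the function names `break_events` and `identify_timestamp` are assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import List, Optional

# Assumed timestamp layout for this illustration: "2013-05-14 10:15:02"
TS_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def break_events(raw: str) -> List[str]:
    """Line breaking: a line that starts with a timestamp opens a new event;
    any other line (for example a stack-trace line) joins the current event."""
    events: List[str] = []
    for line in raw.splitlines():
        if TS_PATTERN.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


def identify_timestamp(event: str) -> Optional[datetime]:
    """Timestamp identification: parse the leading timestamp, if present."""
    match = TS_PATTERN.match(event)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


# Hypothetical raw data: one multi-line error event and one single-line event.
raw = (
    "2013-05-14 10:15:02 ERROR order=42 failed\n"
    "  at com.example.Order.process\n"
    "2013-05-14 10:15:07 INFO retry scheduled"
)
events = break_events(raw)
timestamps = [identify_timestamp(e) for e in events]
```

Here the indented stack-trace line stays attached to the error event, so the three raw lines become two events, each with its own parsed timestamp.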

Editor's Notes

  1. Let’s examine for a second one of the fastest growing, most complex and most valuable segments of big data – machine data. All the webservers, applications, network devices – all of the technology infrastructure running your enterprise – generate massive streams of data, in an array of unpredictable formats that are difficult to process and analyze by traditional methods or in a timely manner. Why is this “machine data” valuable? Because it contains a trace - a categorical record - of user behavior, cyber-security risks, application behavior, service levels, fraudulent activity and customer experience. For Splunk the last two Vs are very important: Variety of data and Variability of data (change in format; for example, new fields are added to the log file).
  2. Why is this “machine data” valuable? Because it contains a trace - a categorical record - of user behavior, cyber-security risks, application behavior, service levels, fraudulent activity and customer experience. Order Processing = Order of a product. Middleware Error = WebLogic Application Server error. Care IVR = Telephone call to complain about the error. Twitter = Comments on the bad experience. Parsing this information for database consumption is very hard and time-consuming. The data is hard to normalize because of the last two Vs: Variety of data and Variability of data (change in format; for example, new fields are added to the log file).
  3. Example of a Customer ID that Splunk can correlate across: Order Processing -> Application Server Error -> Customer calling to complain about the issue -> Twitter record that the customer gave up on waiting.
  4. Splunk is the platform for machine data, optimized for real-time, low latency and interactivity. It reliably collects and indexes all the streaming data from IT systems and technology devices in real time: tens of thousands of sources in unpredictable formats and types. The Splunk platform indexes the data, making it available for searching, monitoring, analysis and visualizations. It enables you to interact with your data and gain operational intelligence from it: 1. Find and fix problems dramatically faster. 2. Automatically monitor to identify issues, problems and attacks. 3. Gain end-to-end visibility to track and deliver on IT KPIs and make better-informed IT decisions. 4. Gain real-time insight from operational data to make better-informed business decisions.
  5. Both IT and business professionals can analyze machine data to get real-time visibility and operational intelligence. With our data engine and our customers' machine data, organizations can meaningfully improve their performance in a wide range of areas, e.g. meet service levels, reduce costs, mitigate security risks, maintain compliance and gain insights.
  6. Splunk can be divided into four logical functions. First, from the bottom up, is forwarding. Splunk forwarders come in two packages: the full Splunk distribution or a dedicated “Universal Forwarder”. The full Splunk distribution can be configured to filter data before transmitting, execute scripts locally, or run SplunkWeb. This gives you several options depending on the footprint size your endpoints can tolerate. The universal forwarder is an ultra-lightweight agent designed to collect data in the smallest possible footprint. Both flavors of forwarder come with automatic load balancing, SSL encryption and data compression, and the ability to route data to multiple Splunk instances or third-party systems. To manage your distributed Splunk environment, there is the Deployment Server. Deployment Server helps you synchronize the configuration of your search heads during distributed searching, as well as your forwarders, to centrally manage your distributed data collection. Of course, Splunk has a simple flat-file configuration system, so feel free to use your own config management tools if you're more comfortable with what you already have. The core of the Splunk infrastructure is indexing. An indexer does two things: it accepts and processes new data, adding it to the index and compressing it on disk. The indexer also services search requests, looking through the data it has via its indices and returning the appropriate results to the searcher over a compressed communication channel. Indexers scale out almost limitlessly and with almost no degradation in overall performance, allowing Splunk to scale from single-instance small deployments to truly massive Big Data challenges. Finally, the Splunk most users see is the search head. This is the webserver and app-interpreting engine that provides the primary, web-based user interface. Since most of the data interpretation happens as needed at search time, the role of the search head is to translate user and app requests into actionable searches for its indexer(s) and display the results. The Splunk web UI is highly customizable, either through our own view and app system, or by embedding Splunk searches in your own web apps via includes or our API.
  7. Splunk uses commodity servers to scale. Splunk customers use the product to harness multiple TB of data per day. 1000s of Forwarders -> Indexers <- Search heads support hundreds or thousands of users all accessing the data
  8. Open Source software, such as Hadoop and Cassandra, requires development cycles of 6+ months and specialized development resources.
  9. Splunk DB Connect enables you to enrich and combine machine data with database data. Easily configure database queries and lookups in minutes via the Splunk Enterprise user interface and conduct connection pooling as well as flexible search commands to query database tables.
  10. The Splunk App for HadoopOps provides several specialized features to monitor Hadoop: Monitoring Nodes on cluster – Displays a complete view of all of the servers in the cluster. The monitoring gives the Hadoop administrator a view into the health of the cluster and tracks disk usage, CPU, and RAM from one single view rather than opening multiple consoles for information. Cluster visualization can display a rack-specific or node-specific failure. Monitoring MapReduce jobs – Displays information on the Map and Reduce tasks. The information here delivers real-time as well as historical statistics on how the individual tasks are operating and how they are working together. Information gathered here is used to troubleshoot MapReduce performance issues by comparing similar jobs and drilling from JobIDs to TaskIDs. Furthermore, it correlates used core slots with MapReduce and pinpoints the MapReduce attempts that are using them. Monitoring Hadoop Services – Displays information about the health of the Name node, Secondary Name node, and Data node. The services explore HDFS I/O, HDFS capacity per user, HDFS size, as well as the CPU and memory of the HDFS daemons. Information here is used for monitoring the load and capacity, which can be used to justify hardware and software acquisitions. View Hadoop Configuration – Displays information about the configuration of each node and each daemon in the Hadoop cluster. Hadoop is highly dependent on the hardware and network it uses. Therefore, any changes made to the Hadoop configurations can create service disruption. The information indexed by Splunk allows Hadoop administrators to view configurations from HDFS, MapReduce, and the entire surrounding environment, which can lead to faster resolution times. Search Logs – Splunk distributed search and indexing allows for real-time display of information from all Hadoop, Linux, database, and network log files to further enhance the end-to-end debugging of issues. Headlines and Alerts Notifications – Splunk allows for alerts that can be triggered based on a single event as well as a group of events. Per-result alerting gives users granular control over the notifications received when one of the Hadoop nodes, MapReduce tasks, or HDFS daemons is failing.
  11. More than 4,800 users in over 85 countries have purchased the enterprise license of Splunk. This includes a majority of the Fortune 100. Enterprises, service providers and government agencies in 80 countries use Splunk to improve service levels, reduce IT operations costs, mitigate security risks and drive new levels of operational visibility. As they gain new visibility into their real-time and historical machine data, Splunk’s customers are finding answers and solving the most challenging issues facing IT and the business.
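
Editor's note 3 describes following one Customer ID from order processing through a middleware error, an IVR call, and a tweet. The following rough Python sketch shows that kind of correlation; it is plain Python, not Splunk search syntax. The event fields, values, and the `correlate` helper are invented for illustration, and it assumes the Twitter ID has already been mapped to a customer ID.

```python
from collections import defaultdict

# Hypothetical, already-parsed events from the four sources on slides 5 and 6.
events = [
    {"source": "order_processing", "time": 1, "customer_id": "C100", "order_id": "O1"},
    {"source": "middleware_error", "time": 2, "customer_id": "C100", "order_id": "O1"},
    {"source": "care_ivr", "time": 3, "customer_id": "C100", "hold_seconds": 540},
    {"source": "twitter", "time": 4, "customer_id": "C100", "tweet": "gave up waiting"},
    {"source": "order_processing", "time": 2, "customer_id": "C200", "order_id": "O2"},
]


def correlate(events, key="customer_id"):
    """Group events by a shared key and list their sources in time order."""
    timeline = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        timeline[event[key]].append(event["source"])
    return dict(timeline)


journeys = correlate(events)
```

For customer `C100`, the resulting timeline runs from the order through the error and the support call to the tweet, which is the chain the note describes.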