Copyright © 2012 Splunk Inc.




Introducing Splunk –
The Big Data Engine
5th Big Data Usergroup Meeting
Zurich, 21.01.2012
Splunk – The Big Data Company
Company (NASDAQ: SPLK)
  • Founded 2004, first software release in 2006
  • HQ: San Francisco / Regional HQ: London, Hong Kong
  • Over 600 employees, based in 12 countries
  • FY2012 revenue: $120 million; +83% year-over-year

5,000+ Customers
  • Customers in over 80 countries
  • 54 of the Fortune 100
  • Largest license: 100 terabytes per day
Over 3,000 Customers in 70+ Countries

  • Cloud and Online Services
  • Education
  • Energy and Utilities
  • Financial Services and Insurance
  • Government
  • Healthcare
  • Manufacturing
  • Media
  • Retail
  • Technology
  • Telecommunications
  • Travel and Leisure
Some Splunk Big Data Customers
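[Table: Customer / Daily Data Volume. The customer names were shown as logos and are not preserved.]
Daily data volumes, largest first: 12 TB, 6 TB, 4 TB, 1.2 TB, 900 GB, 800 GB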
Big Data Comes from Machines
Volume | Velocity | Variety | Variability

Machine-generated data is one of the fastest growing, most complex
and most valuable segments of big data.

Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging,
Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics,
Storage, Servers, Security Devices, Desktops
Big Data Technologies
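[Figure: big data technologies arranged along a time axis]
  • Approaches, left to right: Single RDBMS, Single Bigger RDBMS, RDBMS Sharding, SQL & Map/Reduce, NoSQL, Map/Reduce
  • Products named: Aster Data, Greenplum, Hadoop, Cassandra, HBase, MongoDB
  • Data structure, left to right: Relational Database (highly structured); Key/Value, Tables or Other (semi-structured); Temporal, Unstructured, Heterogeneous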
Splunk: the Platform for Machine Data
Innovative, Easy to Use and Powerful

[Figure: platform diagram]
  • Data collection and indexing
  • Splunk storage, Other Big Data stores
  • Ad hoc search, Monitor and alert, Report and analyze, Custom dashboards, Developer Platform
Apps and Solutions
[Figure: layered architecture, top to bottom]
  • Application Monitoring, IT Operations, Security, Compliance, Web Intelligence, Business Analytics
  • User Interface, APIs, SDK
  • Core Functions: Access Controls, Stats/Analytics, Alerts, Reports, Dashboards
  • Search
  • Indexing
  • Collection
Scales to TBs/day and Thousands of Users
  • Automatic load balancing linearly scales indexing
  • Distributed search and MapReduce linearly scale search and reporting
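
The idea of scaling search with a MapReduce-style split can be sketched as follows. This is a minimal illustration and not Splunk's implementation: the event layout, the `status` field and the function names are assumptions made up for the example. Each indexer counts only its own events (map), and a coordinating node merges the partial counts (reduce).

```python
from collections import Counter
from functools import reduce


def map_count_by_status(events):
    """Map step: one indexer counts its own events by status code."""
    return Counter(event["status"] for event in events)


def reduce_counts(partials):
    """Reduce step: merge the per-indexer partial counts into one result."""
    return reduce(lambda left, right: left + right, partials, Counter())


# Hypothetical events, already spread across three indexers.
indexers = [
    [{"status": 200}, {"status": 500}],
    [{"status": 200}, {"status": 200}],
    [{"status": 404}, {"status": 500}],
]

partials = [map_count_by_status(events) for events in indexers]
totals = reduce_counts(partials)
print(dict(totals))
```

Because every partial count is computed independently, adding an indexer adds search capacity without changing the merge step.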




What Does Machine Data Look Like?
Sources:
  • Order Processing
  • Middleware Error
  • Care IVR
  • Twitter
Machine Data Contains Critical Insights
Sources and the fields highlighted in their events:
  • Order Processing: Customer ID, Order ID, Product ID
  • Middleware Error: Order ID, Customer ID
  • Care IVR: Time Waiting On Hold, Customer ID
  • Twitter: Twitter ID, Customer’s Tweet, Company’s Twitter ID
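
The value of shared identifiers across sources can be sketched in a few lines. This is an illustration only: the events, field names and values below are hypothetical, and the Twitter source is left out because the slide shows no Customer ID for it. Grouping by customer ID lines up what one customer experienced across otherwise unrelated systems.

```python
from collections import defaultdict

# Hypothetical events whose fields have already been extracted.
events = [
    {"source": "order_processing", "customer_id": "C42", "order_id": "O1001", "product_id": "P7"},
    {"source": "middleware_error", "customer_id": "C42", "order_id": "O1001"},
    {"source": "care_ivr", "customer_id": "C42", "hold_seconds": 620},
    {"source": "order_processing", "customer_id": "C77", "order_id": "O1002", "product_id": "P3"},
]


def group_by_customer(events):
    """Collect, per customer ID, the sources that mention that customer."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["customer_id"]].append(event["source"])
    return dict(grouped)


journeys = group_by_customer(events)
print(journeys["C42"])
```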
What do we do? Collect and index Machine Data

Customer Facing Data
  • Click-stream data
  • Shopping cart data
  • Online transaction data

Outside the Datacenter
  • Manufacturing, logistics…
  • CDRs & IPDRs
  • Power consumption
  • RFID data
  • GPS data

Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets

  • Windows: Registry, Event logs, File system, sysinternals
  • Linux/Unix: Configurations, syslog, File system, ps, iostat, top
  • Virtualization & Cloud: Hypervisor, Guest OS, Apps, Cloud
  • Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
  • Databases: Configurations, Audit/query logs, Tables, Schemas
  • Networking: Configurations, syslog, SNMP, netflow
What do we do? Collect and index Machine Data

  • Any amount, any location, any source.
  • No upfront schema
  • No custom connectors
  • No RDBMS
  • No need to filter/forward
Inside Universal Indexing

  • Automatic event boundary identification
  • Automatic timestamp normalization

...enable accurate searching and trending by time across all data:
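
A minimal sketch of these two steps, under stated assumptions: a new event starts on any line that begins with a recognised timestamp, only the two timestamp formats listed in `FORMATS` are known, and all timestamps are treated as UTC. None of this reflects Splunk's actual rules; the names and sample log lines are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed timestamp formats: (pattern that finds it, format that parses it).
FORMATS = [
    (re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
    (re.compile(r"^\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}"), "%d.%m.%Y %H:%M:%S"),
]


def leading_timestamp(line):
    """Return the line's leading timestamp as a UTC datetime, or None."""
    for pattern, fmt in FORMATS:
        match = pattern.match(line)
        if match:
            parsed = datetime.strptime(match.group(0), fmt)
            return parsed.replace(tzinfo=timezone.utc)
    return None


def split_events(raw):
    """Start a new event at each timestamped line; append other lines to the current event."""
    events = []
    for line in raw.splitlines():
        stamp = leading_timestamp(line)
        if stamp is not None or not events:
            events.append({"time": stamp.timestamp() if stamp else None, "raw": line})
        else:
            events[-1]["raw"] += "\n" + line  # continuation, e.g. a stack trace line
    return events


raw = (
    "2012-01-21 10:00:00 ERROR order failed\n"
    "  at OrderService.submit\n"
    "21.01.2012 10:00:05 INFO retry ok"
)
events = split_events(raw)
for event in events:
    print(event["time"], repr(event["raw"]))
```

Once both events carry the same epoch-seconds representation, they can be sorted and bucketed by time even though their sources wrote the timestamps differently.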


Inside Universal Indexing
  • Segmentation & dense indexing of every term

...enable Boolean search on anything in the original event:
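
How indexing every term leads to Boolean search can be sketched with a small inverted index. This is a simplified illustration: the segmentation rule here (split on any non-alphanumeric character, lowercase everything) is an assumption for the example and much cruder than a real segmenter, and the sample events are hypothetical.

```python
import re
from collections import defaultdict


def segment(event):
    """Split a raw event into lowercase terms on non-alphanumeric characters."""
    return {term for term in re.split(r"[^A-Za-z0-9]+", event.lower()) if term}


def build_index(events):
    """Map every term to the set of event IDs that contain it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for term in segment(event):
            index[term].add(event_id)
    return index


events = [
    "ERROR order=1001 customer=42 payment declined",
    "INFO order=1002 customer=77 payment accepted",
    "ERROR order=1003 customer=42 timeout",
]
index = build_index(events)

# Boolean operators become set operations on the posting sets.
error_and_42 = index["error"] & index["42"]              # AND
declined_or_timeout = index["declined"] | index["timeout"]  # OR
error_not_payment = index["error"] - index["payment"]    # NOT
print(sorted(error_and_42), sorted(declined_or_timeout), sorted(error_not_payment))
```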




Inside Search-time Knowledge Extraction
  • Automatically discovered fields
  • And user-defined fields

...enable statistics and precise search on specific fields:
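
Search-time extraction can be sketched as follows. The assumptions are loud ones: "automatically discovered" is reduced to finding `key=value` pairs, the user-defined field `level` and its pattern are invented for the example, and the raw events are hypothetical. The point is that the stored events stay raw and fields are derived only when a search runs.

```python
import re
from collections import Counter

# Assumed rule for automatic discovery: any key=value pair in the raw text.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")

# Hypothetical user-defined field: the leading uppercase word is the log level.
USER_DEFINED = {"level": re.compile(r"^([A-Z]+)\b")}


def extract_fields(raw_event):
    """Derive fields from a raw event at search time; nothing is stored."""
    fields = dict(KEY_VALUE.findall(raw_event))
    for name, pattern in USER_DEFINED.items():
        match = pattern.search(raw_event)
        if match:
            fields[name] = match.group(1)
    return fields


raw_events = [
    "ERROR order=1001 customer=42 payment declined",
    "INFO order=1002 customer=77 payment accepted",
    "ERROR order=1003 customer=42 timeout",
]

parsed = [extract_fields(event) for event in raw_events]

# Precise search on a specific field.
orders_for_42 = [fields["order"] for fields in parsed if fields["customer"] == "42"]

# Statistics on a specific field: error count per customer.
errors_per_customer = Counter(
    fields["customer"] for fields in parsed if fields["level"] == "ERROR"
)
print(orders_for_42, dict(errors_per_customer))
```

Changing a pattern in `USER_DEFINED` changes the next search's results without re-indexing anything, which is the flexibility the slide attributes to search-time knowledge.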




New Approach to Heterogeneous Data
Universal Indexing
  • No data normalization
  • Automatically handles timestamps
  • Parsers not required
  • Index every term & pattern “blindly”
  • No attempt to “understand” up front

Search-time Knowledge
  • Knowledge applied at search-time
  • No brittle schema to work around
  • Multiple views into the same data
  • Splunk helps find transactions, patterns and trends

Flexibility and Fast Time to Value
  • Normalization as it’s needed
  • Faster implementation
  • Easy search language
  • Multiple views into the same data
Splunk Used Across IT and the Business
  • Application Management
  • Operations Management
  • Security & Compliance
  • Web and Business Analytics
Provides Strong Machine Data Governance
  • Provides comprehensive controls for data security, retention and integrity
  • Single sign-on integration enables pass-through authentication of user credentials
Splunk Big Data Strategy
Deliver ease of use, real-time analytics and enterprise capabilities

[Figure: platform diagram]
  • Data collection and indexing
  • Splunk storage, Other Stores
  • Ad hoc search, Monitor and alert, Report and analyze, Custom dashboards, Developer Platform
Deploying New Technologies is a Challenge




Splunk-Hadoop: Co-existence use cases
  • Side by Side: Real-time Analytics; ETL / recommendation system
  • Splunk in front of Hadoop: Collect, Visualize, Report; ETL, Archival, Long Running Queries
  • Splunk visualizes and secures Hadoop data: Splunk indexes Hadoop data
  • Combine
Splunk: Enabling the Big Data Ecosystem
  • Real-time Collection and Analysis
  • Dashboards, Reports, Access Controls

Splunk Hadoop Connect
  • Reliable Data Export
  • Import Hadoop Data

Splunk App for HadoopOps
  • End-to-end monitoring, troubleshooting, analysis of Hadoop environment
Splunk Hadoop Connect

Delivers reliable integration between Splunk and Hadoop
  • Export events to Hadoop
  • Explore and browse Hadoop directories
  • Import and index Hadoop data into Splunk
Splunk App for HadoopOps
Monitoring the full Hadoop environment – Hadoop, Switch, OS, AS, and Database

[Figure: HadoopOps data flow]
  • Splunk HadoopOps Forwarder Package on every host
  • Add Knowledge, Collect & Index Data, Distributed Search, Monitor & Alert, Rich UI Framework
  • Splunk HadoopOps dashboards, alerts and notifications, powered by Splunk search
  • Host, Operating System, Infrastructure
Splunk and Big Data

Product-based Solution
  • Easy to download and deploy
  • Pre-integrated, end-to-end functionality
  • Enterprise-grade features

Integrated and End-to-end
  • Collects data from tens of thousands of sources
  • Advanced real-time and historical analysis of data
  • Fast, custom visualizations for IT and business users
  • Developer API, SDKs

Performance at scale
  • Proven at multi-terabyte scale per day
  • Upwards of PB under management
  • Thousands of enterprise customers
Thank You

More Related Content

What's hot

Getting Started with Splunk (Hands-On)
Getting Started with Splunk (Hands-On) Getting Started with Splunk (Hands-On)
Getting Started with Splunk (Hands-On) Splunk
 
Splunk for Enterprise Security and User Behavior Analytics
 Splunk for Enterprise Security and User Behavior Analytics Splunk for Enterprise Security and User Behavior Analytics
Splunk for Enterprise Security and User Behavior AnalyticsSplunk
 
Splunk for ITOps
Splunk for ITOpsSplunk for ITOps
Splunk for ITOpsSplunk
 
Splunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersSplunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersHarry McLaren
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observabilityTheo Schlossnagle
 
Splunk Overview
Splunk OverviewSplunk Overview
Splunk OverviewSplunk
 
Microsoft Defender for Endpoint
Microsoft Defender for EndpointMicrosoft Defender for Endpoint
Microsoft Defender for EndpointCheah Eng Soon
 
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...Splunk
 
SplunkLive 2011 Beginners Session
SplunkLive 2011 Beginners SessionSplunkLive 2011 Beginners Session
SplunkLive 2011 Beginners SessionSplunk
 
Splunk for IT Operations
Splunk for IT OperationsSplunk for IT Operations
Splunk for IT OperationsSplunk
 
Security Automation & Orchestration
Security Automation & OrchestrationSecurity Automation & Orchestration
Security Automation & OrchestrationSplunk
 
Splunk Architecture overview
Splunk Architecture overviewSplunk Architecture overview
Splunk Architecture overviewAlex Fok
 
Do You Really Need to Evolve From Monitoring to Observability?
Do You Really Need to Evolve From Monitoring to Observability?Do You Really Need to Evolve From Monitoring to Observability?
Do You Really Need to Evolve From Monitoring to Observability?Splunk
 
Splunk Enterprise 6.4
Splunk Enterprise 6.4Splunk Enterprise 6.4
Splunk Enterprise 6.4Splunk
 
dlux - Splunk Technical Overview
dlux - Splunk Technical Overviewdlux - Splunk Technical Overview
dlux - Splunk Technical OverviewDavid Lutz
 
Power of Splunk Search Processing Language (SPL) ...
Power of Splunk Search Processing Language (SPL)                             ...Power of Splunk Search Processing Language (SPL)                             ...
Power of Splunk Search Processing Language (SPL) ...Splunk
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability SessionSplunk
 
Microsoft Azure Cost Optimization and improve efficiency
Microsoft Azure Cost Optimization and improve efficiencyMicrosoft Azure Cost Optimization and improve efficiency
Microsoft Azure Cost Optimization and improve efficiencyKushan Lahiru Perera
 

What's hot (20)

Getting Started with Splunk (Hands-On)
Getting Started with Splunk (Hands-On) Getting Started with Splunk (Hands-On)
Getting Started with Splunk (Hands-On)
 
Splunk for Enterprise Security and User Behavior Analytics
 Splunk for Enterprise Security and User Behavior Analytics Splunk for Enterprise Security and User Behavior Analytics
Splunk for Enterprise Security and User Behavior Analytics
 
Splunk for ITOps
Splunk for ITOpsSplunk for ITOps
Splunk for ITOps
 
Splunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersSplunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy Forwarders
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Splunk Overview
Splunk OverviewSplunk Overview
Splunk Overview
 
Microsoft Defender for Endpoint
Microsoft Defender for EndpointMicrosoft Defender for Endpoint
Microsoft Defender for Endpoint
 
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
How to Move from Monitoring to Observability, On-Premises and in a Multi-Clou...
 
SplunkLive 2011 Beginners Session
SplunkLive 2011 Beginners SessionSplunkLive 2011 Beginners Session
SplunkLive 2011 Beginners Session
 
Splunk for IT Operations
Splunk for IT OperationsSplunk for IT Operations
Splunk for IT Operations
 
Security Automation & Orchestration
Security Automation & OrchestrationSecurity Automation & Orchestration
Security Automation & Orchestration
 
Splunk Architecture overview
Splunk Architecture overviewSplunk Architecture overview
Splunk Architecture overview
 
Do You Really Need to Evolve From Monitoring to Observability?
Do You Really Need to Evolve From Monitoring to Observability?Do You Really Need to Evolve From Monitoring to Observability?
Do You Really Need to Evolve From Monitoring to Observability?
 
Splunk Enterprise 6.4
Splunk Enterprise 6.4Splunk Enterprise 6.4
Splunk Enterprise 6.4
 
dlux - Splunk Technical Overview
dlux - Splunk Technical Overviewdlux - Splunk Technical Overview
dlux - Splunk Technical Overview
 
Observability
ObservabilityObservability
Observability
 
Understanding SASE
Understanding SASE Understanding SASE
Understanding SASE
 
Power of Splunk Search Processing Language (SPL) ...
Power of Splunk Search Processing Language (SPL)                             ...Power of Splunk Search Processing Language (SPL)                             ...
Power of Splunk Search Processing Language (SPL) ...
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session
 
Microsoft Azure Cost Optimization and improve efficiency
Microsoft Azure Cost Optimization and improve efficiencyMicrosoft Azure Cost Optimization and improve efficiency
Microsoft Azure Cost Optimization and improve efficiency
 

Similar to Introducing Splunk – The Big Data Engine

How a Cloud Computing Provider Reached the Holy Grail of Visibility
How a Cloud Computing Provider Reached the Holy Grail of VisibilityHow a Cloud Computing Provider Reached the Holy Grail of Visibility
How a Cloud Computing Provider Reached the Holy Grail of Visibilityeladgotfrid
 
Solving Compliance for Big Data
Solving Compliance for Big DataSolving Compliance for Big Data
Solving Compliance for Big Datafbeckett1
 
Experiences Streaming Analytics at Petabyte Scale
Experiences Streaming Analytics at Petabyte ScaleExperiences Streaming Analytics at Petabyte Scale
Experiences Streaming Analytics at Petabyte ScaleDataWorks Summit
 
Implementing Big Data at the Speed of Business
Implementing Big Data at the Speed of BusinessImplementing Big Data at the Speed of Business
Implementing Big Data at the Speed of BusinessDataWorks Summit
 
Utilisation du cloud dans les systèmes intelligent
Utilisation du cloud dans les systèmes intelligentUtilisation du cloud dans les systèmes intelligent
Utilisation du cloud dans les systèmes intelligentMicrosoft Technet France
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureOdinot Stanislas
 
Hadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesHadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesDataWorks Summit
 
Big Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage StrategyBig Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage StrategyHitachi Vantara
 
01 im overview high level
01 im overview high level01 im overview high level
01 im overview high levelJames Findlay
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Sybase Türkiye
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBigDataCloud
 
Big Data: Beyond the "Bigness" and the Technology (webcast)
Big Data: Beyond the "Bigness" and the Technology (webcast)Big Data: Beyond the "Bigness" and the Technology (webcast)
Big Data: Beyond the "Bigness" and the Technology (webcast)Apigee | Google Cloud
 
Streaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionDATAVERSITY
 
Cetas Analytics as a Service for Predictive Analytics
Cetas Analytics as a Service for Predictive AnalyticsCetas Analytics as a Service for Predictive Analytics
Cetas Analytics as a Service for Predictive AnalyticsJ. David Morris
 

Similar to Introducing Splunk – The Big Data Engine (20)

How a Cloud Computing Provider Reached the Holy Grail of Visibility
How a Cloud Computing Provider Reached the Holy Grail of VisibilityHow a Cloud Computing Provider Reached the Holy Grail of Visibility
How a Cloud Computing Provider Reached the Holy Grail of Visibility
 
Solving Compliance for Big Data
Solving Compliance for Big DataSolving Compliance for Big Data
Solving Compliance for Big Data
 
Secure Big Data Analytics - Hadoop & Intel
Secure Big Data Analytics - Hadoop & IntelSecure Big Data Analytics - Hadoop & Intel
Secure Big Data Analytics - Hadoop & Intel
 
Experiences Streaming Analytics at Petabyte Scale
Experiences Streaming Analytics at Petabyte ScaleExperiences Streaming Analytics at Petabyte Scale
Experiences Streaming Analytics at Petabyte Scale
 
Implementing Big Data at the Speed of Business
Implementing Big Data at the Speed of BusinessImplementing Big Data at the Speed of Business
Implementing Big Data at the Speed of Business
 
vBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and BeyondvBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and Beyond
 
IBM Big Data Platform Nov 2012
IBM Big Data Platform Nov 2012IBM Big Data Platform Nov 2012
IBM Big Data Platform Nov 2012
 
Utilisation du cloud dans les systèmes intelligent
Utilisation du cloud dans les systèmes intelligentUtilisation du cloud dans les systèmes intelligent
Utilisation du cloud dans les systèmes intelligent
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform Architecture
 
Hadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesHadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation Architectures
 
Big data use cases
Big data use casesBig data use cases
Big data use cases
 
Big Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage StrategyBig Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage Strategy
 
01 im overview high level
01 im overview high level01 im overview high level
01 im overview high level
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
 
Accelerate Return on Data
Accelerate Return on DataAccelerate Return on Data
Accelerate Return on Data
 
Big Data: Beyond the "Bigness" and the Technology (webcast)
Big Data: Beyond the "Bigness" and the Technology (webcast)Big Data: Beyond the "Bigness" and the Technology (webcast)
Big Data: Beyond the "Bigness" and the Technology (webcast)
 
Streaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise AdoptionStreaming Hadoop for Enterprise Adoption
Streaming Hadoop for Enterprise Adoption
 
Cetas Analytics as a Service for Predictive Analytics
Cetas Analytics as a Service for Predictive AnalyticsCetas Analytics as a Service for Predictive Analytics
Cetas Analytics as a Service for Predictive Analytics
 
Cetas Predictive Analytics Prezo
Cetas Predictive Analytics PrezoCetas Predictive Analytics Prezo
Cetas Predictive Analytics Prezo
 

Introducing Splunk – The Big Data Engine

  • 1. Copyright © 2012 Splunk Inc. Introducing Splunk – The Big Data Engine 5th Big Data Usergroup Meeting Zurich, 21.01.2012
  • 2. Splunk – The Big Data Company. Company (NASDAQ: SPLK): Founded 2004, first software release in 2006; HQ: San Francisco / Region HQ: London, Hong Kong; Over 600 employees, based in 12 countries; FY2012 $120 million; +83% year-over-year. 5,000+ Customers: Customers in over 80 countries; 54 of the Fortune 100; Largest license: 100 Terabytes per day.
  • 3. Over 3,000 Customers in 70+ Countries. Cloud and Online Services; Education; Energy and Utilities; Financial Services and Insurance; Government; Healthcare; Manufacturing; Media; Retail; Technology; Telecommunications; Travel and Leisure.
  • 4. Some Splunk Big Data Customers. Customer / Daily Data Volume: 12 TB; 6 TB; 4 TB; 1.2 TB; 900 GB; 800 GB.
  • 5. Big Data Comes from Machines. Volume | Velocity | Variety | Variability. Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data. GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops.
  • 6. Big Data Technologies. Aster Data Cassandra Greenplum Hbase MongoDB Hadoop Single Single RDBMS SQL & NoSQL RDBMS Bigger Sharding Map/Reduce RDBMS Map / Reduce. Relational Database (highly structured); Key/Value, Tables or Other (semi-structured); Temporal, Unstructured, Heterogeneous. Time.
  • 7. Splunk: the Platform for Machine Data. Innovative, Easy to Use and Powerful. Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform. Data collection and indexing. Splunk storage; Other Big Data stores.
  • 8. Apps and Solutions: Application Monitoring; IT Operations; Web Intelligence; Business Analytics; Security; Compliance. User Interface; APIs; SDK. Core Functions: Access Controls; Stats/Analytics; Alerts; Reports; Dashboards; Search; Indexing; Collection.
  • 9. Scales to TBs/day and Thousands of Users. Automatic load balancing linearly scales indexing. Distributed search and MapReduce linearly scales search and reporting.
  • 10. What Does Machine Data Look Like? Sources: Order Processing; Middleware Error; Care IVR; Twitter.
  • 11. Machine Data Contains Critical Insights. Sources: Order Processing (Customer ID, Order ID, Product ID); Middleware Error (Order ID, Customer ID); Care IVR (Time Waiting On Hold, Customer ID); Twitter (Twitter ID, Customer’s Tweet, Company’s Twitter ID).
  • 12. What do we do? Collect and index Machine Data. Customer Facing Data: Click-stream data; Shopping cart data; Online transaction data. Outside the Datacenter: Manufacturing, logistics…; CDRs & IPDRs; Power consumption; RFID data; GPS data. Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets. Windows: Registry; Event logs; File system; sysinternals. Linux/Unix: Configurations; syslog; File system; ps, iostat, top. Virtualization & Cloud: Hypervisor; Guest OS, Apps; Cloud. Applications: Web logs; Log4J, JMS, JMX; .NET events; Code and scripts. Databases: Configurations; Audit/query logs; Tables; Schemas. Networking: Configurations; syslog; SNMP; netflow.
  • 13. What do we do? Collect and index Machine Data. Any amount, any location, any source. No upfront schema. No custom connectors. No RDBMS. No need to filter/forward. Customer Facing Data: Click-stream data; Shopping cart data; Online transaction data. Outside the Datacenter: Manufacturing, logistics…; CDRs & IPDRs; Power consumption; RFID data; GPS data. Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets. Windows: Registry; Event logs; File system; sysinternals. Linux/Unix: Configurations; syslog; File system; ps, iostat, top. Virtualization & Cloud: Hypervisor; Guest OS, Apps; Cloud. Applications: Web logs; Log4J, JMS, JMX; .NET events; Code and scripts. Databases: Configurations; Audit/query logs; Tables; Schemas. Networking: Configurations; syslog; SNMP; netflow.
  • 14. Inside Universal Indexing. Automatic event boundary identification and automatic timestamp normalization enable accurate searching and trending by time across all data.
  • 15. Inside Universal Indexing. Segmentation and dense indexing of every term enable Boolean search on anything in the original event.
  • 16. Inside Search-time Knowledge Extraction. Automatically discovered fields and user-defined fields enable statistics and precise search on specific fields.
  • 17. New Approach to Heterogeneous Data. Universal Indexing: No data normalization; Automatically handles timestamps; Parsers not required; Index every term & pattern “blindly”; No attempt to “understand” up front. Search-time Knowledge: Knowledge applied at search-time; No brittle schema to work around; Multiple views into the same data; Splunk helps find transactions, patterns and trends. Flexibility and Fast Time to Value: Normalization as it’s needed; Faster implementation; Easy search language; Multiple views into the same data.
  • 18. Splunk Used Across IT and the Business: Application Management; Operations Management; Security & Compliance; Web and Business Analytics.
  • 19. Provides Strong Machine Data Governance. Provides comprehensive controls for data security, retention and integrity. Single sign-on integration enables pass-through authentication of user credentials.
  • 20. Splunk Big Data Strategy. Deliver ease of use, real-time analytics and enterprise capabilities. Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform. Data collection and indexing. Splunk storage; Other Stores.
  • 21. Deploying New Technologies is a Challenge
  • 22. Splunk-Hadoop: Co-existence use cases Real-time Analytics Side by Side ETL / recommendation system Splunk in-front of Hadoop Collect, Visualize, Report ETL, Archival, Long Running Queries Splunk visualize and secure Hadoop Data } Combine Splunk Index Hadoop Data
  • 23. Splunk: Enabling the Big Data Ecosystem. Real-time Collection and Analysis; Dashboards, Reports, Access Controls. Splunk Hadoop Connect: Reliable Data Export; Import Hadoop Data. Splunk App for HadoopOps: End-to-end monitoring, troubleshooting, analysis of Hadoop environment.
  • 24. Splunk Hadoop Connect. Delivers reliable integration between Splunk and Hadoop: Export events to Hadoop; Explore and Browse Hadoop directories; Import and Index Hadoop data into Splunk.
  • 25. Splunk App for HadoopOps. Monitoring the full Hadoop environment – Hadoop, Switch, OS, AS, and Database. Splunk HadoopOps Forwarder Package on every host. Splunk HadoopOps: Dashboards, alerts and notifications, Rich UI powered by Splunk search. Add Knowledge; Collect & Index Data; Distributed Search; Monitor & Alert. Framework; Host Operating System; Infrastructure.
  • 26. Splunk and Big Data. Product-based Solution: Easy to download and deploy; Pre-integrated, end-to-end functionality; Enterprise-grade features. Integrated and End-to-end: Collects data from tens of thousands of sources; Advanced real-time and historical analysis of data; Fast, custom visualizations for IT and business users. Performance at scale: Proven at multi-terabyte scale per day; Upwards of PB under management; Thousands of enterprise customers. Developer API, SDKs.
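
Slides 14 to 16 describe timestamp normalization, indexing of every term, and field extraction at search time. The Python sketch below illustrates those three ideas on a toy scale. It is not Splunk's implementation: the sample events, the function names, the fixed timestamp format and the key=value field convention are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sample events; formats and values are invented for this sketch.
RAW_EVENTS = [
    "2012-01-21 10:15:02 order_id=1001 customer_id=42 status=ok",
    "2012-01-21 10:15:07 order_id=1002 customer_id=77 status=error",
    "2012-01-21 10:16:30 ivr customer_id=77 hold_time=310",
]


def parse_timestamp(event):
    """Normalize the leading timestamp to a UTC datetime.

    Assumes every event starts with 'YYYY-MM-DD HH:MM:SS' already in UTC.
    """
    stamp = datetime.strptime(event[:19], "%Y-%m-%d %H:%M:%S")
    return stamp.replace(tzinfo=timezone.utc)


def segment(event):
    """Split an event into lowercase terms on runs of non-word characters."""
    return {term for term in re.split(r"[^\w]+", event.lower()) if term}


def build_index(events):
    """Index every term: map each term to the ids of events containing it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for term in segment(event):
            index[term].add(event_id)
    return index


def search_and(index, *terms):
    """Boolean AND: ids of events that contain all of the given terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def extract_fields(event):
    """Search-time extraction: pull key=value pairs out of the raw event."""
    return dict(re.findall(r"(\w+)=(\S+)", event))


if __name__ == "__main__":
    index = build_index(RAW_EVENTS)
    for event_id in sorted(search_and(index, "customer_id", "77")):
        event = RAW_EVENTS[event_id]
        print(parse_timestamp(event).isoformat(), extract_fields(event))
```

No schema is declared before indexing in this sketch: the raw text is stored unchanged, and fields such as customer_id only appear when a search asks for them, which is the trade-off slide 17 summarizes as "knowledge applied at search-time".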