SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Downloaden Sie, um offline zu lesen
March 2012
Apache Flume (NG)
Alexander Lorenz | Customer Operations Engineer
Overview

    • Stream data (events, not files) from clients
      to sinks
    • Clients: files, syslog, avro, …
    • Sinks: HDFS files, HBase, …
    • Configurable reliability levels
      – Best effort: “Fast and loose”
      – Guaranteed delivery: “Deliver no matter what”
    • Configurable routing / topology

2                                    Š2012
                       Cloudera, Inc. All Rights Reserved.
Architecture
    Component   Function

    Agent       The JVM running Flume. One per machine. Runs
                many sources and sinks.
    Client      Produces data in the form of events. Runs in a
                separate thread.
    Sink        Receives events from a channel. Runs in a separate
                thread.
    Channel     Connects sources to sinks (like a queue).
                Implements the reliability semantics.
    Event       A single datum; a log record, an avro object, etc.
                Normally around ~4KB.




3                                 Š2012
                    Cloudera, Inc. All Rights Reserved.
Agent

    • Runs many clients and sinks
    • Java properties-based configuration
    • Low overhead (-Xmx20m)
      – But adding RAM increases performance




4                                   Š2012
                      Cloudera, Inc. All Rights Reserved.
Sources

    • Plugin interface
    • Managed by a SourceRunner that controls
      threading and execution model (e.g. polling
      vs. event-based)
    • Included: exec, avro, syslog, …




5                                   Š2012
                      Cloudera, Inc. All Rights Reserved.
Sources
    public class MySource implements PollableSource {
      public Status process() {
        // Do something to create an Event..
        Event e = EventBuilder.withBody(…).build();
        // A channel instance is injected by Flume.
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          channel.put(e);
          tx.commit();
        } catch (ChannelException ex) {
          tx.rollback();
          return Status.BACKOFF;
        } finally {
          tx.close();
        }
        return Status.READY;
      }
    }




6                                          Š2012
                             Cloudera, Inc. All Rights Reserved.
Channel

    •   Plugin interface
    •   Transactional
    •   Provide queuing between source / sink
    •   Provide reliability semantics
        – MemoryChannel: Basically a Java
          BlockingQueue.
        – JDBC
        – WAL


7                                     Š2012
                        Cloudera, Inc. All Rights Reserved.
Sinks

    • Plugin interface
    • Managed by a SinkRunner that controls
      threading and execution model
    • Included: HDFS files (various formats)




8                                  Š2012
                     Cloudera, Inc. All Rights Reserved.
Sinks Code
    public class MySink implements PollableSink {
      public Status process() {
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          Event e = channel.take();
          if (e != null) {
            // …
            tx.commit();
          } else {
            return Status.BACKOFF;
          }
        } catch (ChannelException ex) {
          tx.rollback();
          return Status.BACKOFF;
        } finally {
          tx.close();
        }
        return Status.READY;
      }
    }




9                                          Š2012
                             Cloudera, Inc. All Rights Reserved.
Tiered collection

 • Send events from agents to another tier of
   agents to aggregate
 • Use an Avro sink (really just a client) to
   send events to an Avro source (really just a
   server) in another machine
 • Failover supported
 • Load balancing (soon)
 • Transactions guarantee handoff

10                               Š2012
                   Cloudera, Inc. All Rights Reserved.
Tiered collection – The handoff

 •   Agent 1: Tx begin
 •   Agent 1: Channel take event
 •   Agent 1: Sink send
 •   Agent 2: Tx begin
 •   Agent 2: Channel put
 •   Agent 2: Tx commit, respond OK
 •   Agent 1: Tx commit (or rollback)


11                                 Š2012
                     Cloudera, Inc. All Rights Reserved.
Configuration

 • done in a single file
 • identifier for flows
 • <identifier>.type.subtype.parameter.config,
   where <identifier> is the name of the agent
 • flow has a client, type, channel, sink
 • can have multiple channels in one flow




12                                Š2012
                    Cloudera, Inc. All Rights Reserved.
Simple Configuration Example
   syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = Console

syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140

syslog-agent.sources.Syslog.channels =
MemoryChannel-1
syslog-agent.sinks.Console.channel = MemoryChannel-1

syslog-agent.sinks.Console.type = logger
syslog-agent.channels.MemoryChannel-1.type = memory



13                                 Š2012
                     Cloudera, Inc. All Rights Reserved.
HDFS Configuration Example
   syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB

syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140

syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1

syslog-agent.sinks.HDFS-LAB.type = hdfs

syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/
flumetest/'%{host}''
syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles
syslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60
syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFile
syslog-agent.channels.MemoryChannel-1.type = memory


 14                                     Š2012
                          Cloudera, Inc. All Rights Reserved.
Features

 •   Fan out: One source, many channels
 •   Fan in: Many sources, one channel
 •   Processors (aka: decorators)
 •   Auto-batching of events in RPCs…
 •   Multiplexing channels for datamining
 •   Avro implementation in both ways



15                                 Š2012
                     Cloudera, Inc. All Rights Reserved.
Thank You

 • Web: https://cwiki.apache.org/FLUME/
   getting-started.html
 • ML: flume-user@incubator.apache.org

 • Mail: alexander@cloudera.com
 • Blog: mapredit.blogspot.com
 • Twitter: @mapredit


16                              Š2012
                  Cloudera, Inc. All Rights Reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutesdwmclary
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFSAlexander Alten
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...DataWorks Summit
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingRapheephan Thongkham-Uan
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil DubeySwapnil Dubey
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with FlumeRatnakar Pawar
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexasArvind Prabhakar
 
Apache Flume
Apache FlumeApache Flume
Apache FlumeGetInData
 
Flume intro-100715
Flume intro-100715Flume intro-100715
Flume intro-100715Cloudera, Inc.
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016Jayesh Thakrar
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudDataWorks Summit
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkStreamNative
 
Flume vs. kafka
Flume vs. kafkaFlume vs. kafka
Flume vs. kafkaOmid Vahdaty
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoopChristophe Marchal
 
Highlights Of Sqoop2
Highlights Of Sqoop2Highlights Of Sqoop2
Highlights Of Sqoop2Alexander Alten
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...JinfengHuang3
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingStreamNative
 

Was ist angesagt? (20)

Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutes
 
Apache flume
Apache flumeApache flume
Apache flume
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFS
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Apache Flume and its use case in Manufacturing
Apache Flume and its use case in ManufacturingApache Flume and its use case in Manufacturing
Apache Flume and its use case in Manufacturing
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil Dubey
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with Flume
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Flume intro-100715
Flume intro-100715Flume intro-100715
Flume intro-100715
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache Flink
 
Flume vs. kafka
Flume vs. kafkaFlume vs. kafka
Flume vs. kafka
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
 
Highlights Of Sqoop2
Highlights Of Sqoop2Highlights Of Sqoop2
Highlights Of Sqoop2
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and Processing
 

Ähnlich wie Apache Flume (NG)

Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server InternalsPraveen Gollakota
 
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud ManagementOracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud ManagementMarketingArrowECS_CZ
 
How we scale DroneCi on demand
How we scale DroneCi on demandHow we scale DroneCi on demand
How we scale DroneCi on demandPatrick Jahns
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
 
Lessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networkingLessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networkingTony Georgiev
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBryan Bende
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
signalr
signalrsignalr
signalrOwen Chen
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAkshaya Mahapatra
 
Hands on with CoAP and Californium
Hands on with CoAP and CaliforniumHands on with CoAP and Californium
Hands on with CoAP and CaliforniumJulien Vermillard
 
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...Amazon Web Services
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceDocker, Inc.
 
Weblogic
WeblogicWeblogic
Weblogicsudeeporcl
 
Play Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and ScalaPlay Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and ScalaYevgeniy Brikman
 
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto Docker, Inc.
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Eran Harel
 

Ähnlich wie Apache Flume (NG) (20)

Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Play Framework
Play FrameworkPlay Framework
Play Framework
 
Tornado Web Server Internals
Tornado Web Server InternalsTornado Web Server Internals
Tornado Web Server Internals
 
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud ManagementOracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
Oracle Enterprise Manager - EM12c R5 Hybrid Cloud Management
 
How we scale DroneCi on demand
How we scale DroneCi on demandHow we scale DroneCi on demand
How we scale DroneCi on demand
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
Lessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networkingLessons learned in reaching multi-host container networking
Lessons learned in reaching multi-host container networking
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
signalr
signalrsignalr
signalr
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps Approach
 
Hands on with CoAP and Californium
Hands on with CoAP and CaliforniumHands on with CoAP and Californium
Hands on with CoAP and Californium
 
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experience
 
Weblogic
WeblogicWeblogic
Weblogic
 
Play Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and ScalaPlay Framework: async I/O with Java and Scala
Play Framework: async I/O with Java and Scala
 
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
Orchestrating Docker with Terraform and Consul by Mitchell Hashimoto
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)
 

Mehr von Alexander Alten

Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoTAlexander Alten
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern EnterpriseAlexander Alten
 
The Future of Energy
The Future of EnergyThe Future of Energy
The Future of EnergyAlexander Alten
 
Beyond Hadoop and MapReduce
Beyond Hadoop and MapReduceBeyond Hadoop and MapReduce
Beyond Hadoop and MapReduceAlexander Alten
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction Alexander Alten
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Alexander Alten
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)Alexander Alten
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)Alexander Alten
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache HadoopAlexander Alten
 

Mehr von Alexander Alten (10)

Is big data dead?
Is big data dead?Is big data dead?
Is big data dead?
 
Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoT
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern Enterprise
 
The Future of Energy
The Future of EnergyThe Future of Energy
The Future of Energy
 
Beyond Hadoop and MapReduce
Beyond Hadoop and MapReduceBeyond Hadoop and MapReduce
Beyond Hadoop and MapReduce
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache Hadoop
 

KĂźrzlich hochgeladen

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

KĂźrzlich hochgeladen (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Apache Flume (NG)

  • 1. March 2012 Apache Flume (NG) Alexander Lorenz | Customer Operations Engineer
  • 2. Overview • Stream data (events, not files) from clients to sinks • Clients: files, syslog, avro, … • Sinks: HDFS files, HBase, … • Configurable reliability levels – Best effort: “Fast and loose” – Guaranteed delivery: “Deliver no matter what” • Configurable routing / topology 2 Š2012 Cloudera, Inc. All Rights Reserved.
  • 3. Architecture Component Function Agent The JVM running Flume. One per machine. Runs many sources and sinks. Client Produces data in the form of events. Runs in a separate thread. Sink Receives events from a channel. Runs in a separate thread. Channel Connects sources to sinks (like a queue). Implements the reliability semantics. Event A single datum; a log record, an avro object, etc. Normally around ~4KB. 3 Š2012 Cloudera, Inc. All Rights Reserved.
  • 4. Agent • Runs many clients and sinks • Java properties-based configuration • Low overhead (-Xmx20m) – But adding RAM increases performance 4 Š2012 Cloudera, Inc. All Rights Reserved.
  • 5. Sources • Plugin interface • Managed by a SourceRunner that controls threading and execution model (e.g. polling vs. event-based) • Included: exec, avro, syslog, … 5 Š2012 Cloudera, Inc. All Rights Reserved.
  • 6. Sources public class MySource implements PollableSource { public Status process() { // Do something to create an Event.. Event e = EventBuilder.withBody(…).build(); // A channel instance is injected by Flume. Transaction tx = channel.getTransaction(); tx.begin(); try { channel.put(e); tx.commit(); } catch (ChannelException ex) { tx.rollback(); return Status.BACKOFF; } finally { tx.close(); } return Status.READY; } } 6 Š2012 Cloudera, Inc. All Rights Reserved.
  • 7. Channel • Plugin interface • Transactional • Provide queuing between source / sink • Provide reliability semantics – MemoryChannel: Basically a Java BlockingQueue. – JDBC – WAL 7 Š2012 Cloudera, Inc. All Rights Reserved.
  • 8. Sinks • Plugin interface • Managed by a SinkRunner that controls threading and execution model • Included: HDFS files (various formats) 8 Š2012 Cloudera, Inc. All Rights Reserved.
  • 9. Sinks Code public class MySink implements PollableSink { public Status process() { Transaction tx = channel.getTransaction(); tx.begin(); try { Event e = channel.take(); if (e != null) { // … tx.commit(); } else { return Status.BACKOFF; } } catch (ChannelException ex) { tx.rollback(); return Status.BACKOFF; } finally { tx.close(); } return Status.READY; } } 9 Š2012 Cloudera, Inc. All Rights Reserved.
  • 10. Tiered collection • Send events from agents to another tier of agents to aggregate • Use an Avro sink (really just a client) to send events to an Avro source (really just a server) in another machine • Failover supported • Load balancing (soon) • Transactions guarantee handoff 10 Š2012 Cloudera, Inc. All Rights Reserved.
  • 11. Tiered collection – The handoff • Agent 1: Tx begin • Agent 1: Channel take event • Agent 1: Sink send • Agent 2: Tx begin • Agent 2: Channel put • Agent 2: Tx commit, respond OK • Agent 1: Tx commit (or rollback) 11 Š2012 Cloudera, Inc. All Rights Reserved.
  • 12. Configuration • done in a single file • identifier for flows • <identifier>.type.subtype.parameter.config, where <identifier> is the name of the agent • flow has a client, type, channel, sink • can have multiple channels in one flow 12 Š2012 Cloudera, Inc. All Rights Reserved.
  • 13. Simple Configuration Example syslog-agent.sources = Syslog syslog-agent.channels = MemoryChannel-1 syslog-agent.sinks = Console syslog-agent.sources.Syslog.type = syslogTcp syslog-agent.sources.Syslog.port = 5140 syslog-agent.sources.Syslog.channels = MemoryChannel-1 syslog-agent.sinks.Console.channel = MemoryChannel-1 syslog-agent.sinks.Console.type = logger syslog-agent.channels.MemoryChannel-1.type = memory 13 Š2012 Cloudera, Inc. All Rights Reserved.
  • 14. HDFS Configuration Example syslog-agent.sources = Syslog syslog-agent.channels = MemoryChannel-1 syslog-agent.sinks = HDFS-LAB syslog-agent.sources.Syslog.type = syslogTcp syslog-agent.sources.Syslog.port = 5140 syslog-agent.sources.Syslog.channels = MemoryChannel-1 syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1 syslog-agent.sinks.HDFS-LAB.type = hdfs syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://NN.URI:PORT/ flumetest/'%{host}'' syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles syslog-agent.sinks.HDFS-LAB.hdfs.file.rollInterval = 60 syslog-agent.sinks.HDFS-LAB.hdfs.file.Type = SequenceFile syslog-agent.channels.MemoryChannel-1.type = memory 14 Š2012 Cloudera, Inc. All Rights Reserved.
  • 15. Features • Fan out: One source, many channels • Fan in: Many sources, one channel • Processors (aka: decorators) • Auto-batching of events in RPCs… • Multiplexing channels for datamining • Avro implementation in both ways 15 Š2012 Cloudera, Inc. All Rights Reserved.
  • 16. Thank You • Web: https://cwiki.apache.org/FLUME/ getting-started.html • ML: flume-user@incubator.apache.org • Mail: alexander@cloudera.com • Blog: mapredit.blogspot.com • Twitter: @mapredit 16 Š2012 Cloudera, Inc. All Rights Reserved.