First slide
1) Apache Flume is a distributed, highly available service that can collect and move large amounts of streaming data from one location to another.
2) Most frequently it is used to deliver log data into HDFS.
Second slide
1) Event and Client are the logical components of Flume.
2) An Event is a singular unit of data that can be transported by Flume NG from its source to its destination.
3) Typically an Event is composed of zero or more headers and a body. The headers are used for contextual routing: based on the header values, an event can be routed to the next eligible destination.
4) A Client is an event generator. It generates events and sends them to one or more agents.
Eg: Apache web servers, which continuously generate a huge amount of log data.
Third slide
1) A Flume agent is a JVM daemon process that hosts all Flume-NG components: Sources, Channels, Sinks, etc.
2) The Source puts events on a Channel, which stores them until a Sink takes them and sends them on.
Fourth slide
1) Source is an active component, which receives data from different locations and places it on one or more Channels.
2) The declaration of the source component in the “.conf” file of agent “a1” is listed here; s1 is the Source name and a1 is the agent name.
a1.sources=s1
# netcat is one of the built-in Source types
a1.sources.s1.type=netcat
3) Different Source types are available, such as pollable sources (self-driven, e.g. an exec source running “tail -F” or a sequence-generator source), event-driven sources, and the netcat source.
4) We can even write our own Source type and specify its custom class name as the source's type parameter.
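As a sketch, a custom source is wired in by giving its fully qualified class name as the type; the class name below is hypothetical:

```properties
a1.sources=s1
# hypothetical custom class; any class implementing a Flume source can be named here
a1.sources.s1.type=com.example.flume.MyCustomSource
```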
Fifth slide
1) A channel is a bridge between the Source and the Sink.
2) The channel stores events from the Source and later hands them to the Sink.
3) There are three different types of channels: the memory channel, which is very fast but offers no guarantee against data loss; the file channel, which stores events on the file system before sending them to the sink; and the database channel, which stores events in a database.
4) A single Channel can be connected to any number of Sources and Sinks.
Sixth slide
1) A sink receives events from one channel only.
3. Agenda
• What is Flume?
• Core flume-ng Concepts.
• Flow Reliability in Flume.
• Starting an Agent.
4. What is Flume?
• Apache Flume is a distributed and reliable service
for efficiently collecting, aggregating, and moving
large amounts of log data from one place to
another.
• Its main goal is to deliver streaming data from
applications into Apache Hadoop's HDFS, the
most common destination.
5. Core Concepts: Event, Client
Event:- An Event is a singular unit of data that can
be transported by Flume NG from origin to its final
destination.
An event is composed of zero or more headers
and a body; the headers are used for contextual routing.
Client:- An entity that generates events and sends
them to one or more agents.
Apache web servers - which generate huge
amounts of log files on a daily basis.
A logging package, like a log4j appender, that
sends events directly to Flume NG's source.
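As a sketch, Flume ships a log4j appender that can forward application log events to an agent's Avro source; a minimal log4j.properties fragment (hostname and port are illustrative) might look like:

```properties
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=localhost
log4j.appender.flume.Port=41414
log4j.rootLogger=INFO, flume
```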
6. Core Concepts: agent
• A Flume agent is a JVM daemon process
that hosts all Flume-NG components:
Sources, Channels, Sinks, etc.
7. Core Components: Source
• Source is an active component, which receives data
from different locations and places it on one or more
Channels.
Different Source types:-
1) Pollable source (self-driven): Exec, SEQ.
2) Event-driven source: e.g. the Avro source, which accepts Avro
RPC calls and converts the RPC payload into a Flume
event.
3) Netcat source: listens on a port, like the ‘nc’ command-line tool
running in server mode; syslog sources are similar.
a1.sources=s1
a1.sources.s1.type=netcat
8. Core Components: Channel
• A channel is the glue between a Source and a Sink.
A channel may be in memory, which is fast but
makes no guarantee against data loss, or it may be
file- or database-backed (fully durable), where every
event is guaranteed to be delivered to the
connected sink even in failure cases like power
loss.
A single Channel can be connected to any number
of Sources and Sinks.
a1.channels=c1
a1.channels.c1.type=memory
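For durability, the same channel could instead be declared as a file channel; a minimal sketch (the directory paths are illustrative):

```properties
a1.channels=c1
a1.channels.c1.type=file
# optional: where the channel keeps its checkpoint and event data
a1.channels.c1.checkpointDir=/var/flume/checkpoint
a1.channels.c1.dataDirs=/var/flume/data
```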
9. Core Components: Sink
• For a flat Flume NG agent, the sink is the destination for
data. The sink removes events from the
channel and transmits them to the next eligible
destination (if one exists).
Built-in Sinks:-
1) hdfs, which writes events to HDFS.
2) logger, which simply logs all events received.
3) null, an auto-consuming sink that discards events. … etc.
a1.sinks=k1
a1.sinks.k1.type=logger
10. Interceptors
• Interceptors: An interceptor is a point in your
data flow where you can inspect and modify Flume
events (for example, adding headers that later drive
routing). You can chain zero or more interceptors;
they run after the source creates an event and
before the event is written to the channel.
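As a minimal sketch, the built-in timestamp interceptor could be chained onto source s1 like this:

```properties
a1.sources.s1.interceptors=i1
# adds a timestamp header to every event as it leaves the source
a1.sources.s1.interceptors.i1.type=timestamp
```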
11. channel selectors
• Channel selectors are responsible for how data
moves from a source to one or more channels.
There are two built-in channel selectors.
1) A replicating channel selector (the default)
simply puts a copy of the event into each channel,
assuming you have configured more than one.
2) A multiplexing channel selector can write to
different channels depending on certain header
information (contextual routing).
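A multiplexing selector might be configured like this sketch (the “state” header and its values are illustrative):

```properties
a1.sources.s1.channels=c1 c2
a1.sources.s1.selector.type=multiplexing
a1.sources.s1.selector.header=state
# events whose "state" header is CA go to c1, NY to c2; others to c1
a1.sources.s1.selector.mapping.CA=c1
a1.sources.s1.selector.mapping.NY=c2
a1.sources.s1.selector.default=c1
```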
12. sink processor
• A Sink Processor is responsible for invoking one
sink from an assigned group of sinks. The
Sink Processor is invoked by the sink runner.
Built-in Sink Processors:-
1) Load Balancing Sink Processor
2) Failover Sink Processor
3) Default Sink Processor
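For example, a failover sink processor over a two-sink group could be sketched as follows (sink names and priorities are illustrative):

```properties
a1.sinkgroups=g1
a1.sinkgroups.g1.sinks=k1 k2
a1.sinkgroups.g1.processor.type=failover
# the higher-priority sink is used until it fails, then the next one takes over
a1.sinkgroups.g1.processor.priority.k1=10
a1.sinkgroups.g1.processor.priority.k2=5
```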
13. Flow Reliability in Flume
An event is removed from the channel (a passive
component) only when the sink commits/ends its
transaction; until then, the channel retains the
event so it can be redelivered after a failure.
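Putting the per-slide snippets together, a minimal single-agent configuration (netcat source → memory channel → logger sink) might look like this sketch:

```properties
# name the components of agent a1
a1.sources=s1
a1.channels=c1
a1.sinks=k1

# netcat source listening on an illustrative address/port
a1.sources.s1.type=netcat
a1.sources.s1.bind=localhost
a1.sources.s1.port=44444
a1.sources.s1.channels=c1

# in-memory channel (fast, but not durable)
a1.channels.c1.type=memory

# logger sink attached to exactly one channel
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
```

Such an agent is typically started with the flume-ng launcher, e.g. `bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console`.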