Fluentd is an open source data collector that allows flexible data collection, processing, and output. It supports streaming data from sources like logs and metrics to destinations like databases, search engines, and object stores. Fluentd's plugin-based architecture allows it to support a wide variety of use cases. Fluentd v0.14 added features such as new plugin APIs, nanosecond time resolution, and Windows support, and Fluentd's model fits containerized environments well.
1. Fluentd Overview, Now and Then
Satoshi Tagomori (@tagomoris)
Fluentd meetup in Matsue #fluentdmeetup
4. What’s Fluentd?
Simple core
+ Variety of plugins
Buffering, HA (failover),
Secondary output, etc.
Like syslogd, in a streaming manner
AN EXTENSIBLE & RELIABLE DATA COLLECTION TOOL
5. Log collection with traditional logrotate + rsync
Diagram: Applications on Servers A, B and C each write rotated log files, which are copied to a Log Server with rsync.
Hard to analyze!! Complex text parsers
High latency!! Must wait for a day
6. Streaming way with Fluentd
Diagram: Applications on Servers A, B and C write log files, which Fluentd streams to the Log Server.
Low latency! Seconds or minutes
Easy to analyze!! Parsed and formatted
7. M x N problem for data integration
Diagram: every pair of data source (LOG, syslog, Tweets) and destination needs its own glue: scripts to parse data, cron jobs for loading, filtering scripts, syslog scripts, Tweet-fetching scripts, aggregation scripts, and rsync to servers.
20. Error Handling and Recovery
Diagram: in_tail reads /var/log/access.log; events are buffered by buf_file in /var/log/fluentd/buffer before output.
Buffering for any outputs
Retrying automatically with exponential wait, persistence on a disk, and secondary output
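The retry behavior described above (exponential wait, then secondary output) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Fluentd's implementation: `flush_with_retry`, `write_primary`, `write_secondary`, `base_wait` and `max_retries` are hypothetical names, and persisting the chunk to disk between attempts is left out.

```python
import time
from typing import Callable, List


def flush_with_retry(
    chunk: bytes,
    write_primary: Callable[[bytes], None],
    write_secondary: Callable[[bytes], None],
    base_wait: float = 1.0,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Try the primary output, doubling the wait after each failure.
    Once retries are exhausted, hand the chunk to the secondary output.
    Returns the list of waits that were used."""
    waits: List[float] = []
    for attempt in range(max_retries + 1):
        try:
            write_primary(chunk)
            return waits
        except OSError:  # e.g. ConnectionError from an unreachable destination
            if attempt == max_retries:
                break
            wait = base_wait * (2 ** attempt)  # 1, 2, 4, 8, ... seconds
            waits.append(wait)
            sleep(wait)
    # Retries exhausted: fall back to the secondary output
    write_secondary(chunk)
    return waits
```

Injecting `sleep` keeps the example testable; a real buffered output would also keep the chunk on disk so that it survives a restart while waiting.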
24. Data partitioning by time on HDFS / S3
Diagram: in_tail reads access.log into a buffer; a custom file formatter writes the output and slices files based on time:
2016-01-01/01/access.log.gz
2016-01-01/02/access.log.gz
2016-01-01/03/access.log.gz
…
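The hourly layout shown above (for example 2016-01-01/01/access.log.gz) can be reproduced by deriving a slice key from each event's timestamp. This sketch assumes UTC and a `%Y-%m-%d/%H` slice format; `time_slice_path` and `group_by_slice` are hypothetical helpers, not Fluentd APIs.

```python
from datetime import datetime, timezone


def time_slice_path(event_time: float, basename: str = "access.log.gz",
                    fmt: str = "%Y-%m-%d/%H") -> str:
    """Map an event's Unix timestamp to an hourly time-sliced object path (UTC)."""
    slice_key = datetime.fromtimestamp(event_time, tz=timezone.utc).strftime(fmt)
    return f"{slice_key}/{basename}"


def group_by_slice(events):
    """Group (timestamp, line) pairs into one bucket per time-sliced path."""
    buckets = {}
    for ts, line in events:
        buckets.setdefault(time_slice_path(ts), []).append(line)
    return buckets
```

Each bucket then becomes one compressed file, so a query over a given hour only needs to read that hour's directory.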
28. Microsoft
Operations Management Suite uses Fluentd: "The core of the agent uses an existing open source data aggregator called Fluentd. Fluentd has hundreds of existing plugins, which will make it really easy for you to add new data sources."
Diagram: on a Linux computer, the agent components are omsagent, omsconfig (DSC) with PS DSC Providers, and an OMI Server (CIM Server); data sources are Syslog, the operating system, Apache, MySQL and containers. Data is uploaded to OMSService over HTTPS and configuration is pulled over HTTPS, through a firewall/proxy.
29. Atlassian
"At Atlassian, we've been impressed by Fluentd and have chosen to use it in Atlassian Cloud's logging and analytics pipeline."
Diagram components: ingestion service, Kinesis, Elasticsearch cluster.
30. Amazon Web Services
"The architecture of Fluentd (Sponsored by Treasure Data) is very similar to Apache Flume or Facebook’s Scribe. Fluentd is easier to install and maintain and has better documentation and support than Flume and Scribe."
Diagram (Types of Data, Collect, Store): web apps, IoT, applications, logging and mobile apps produce four types of data, each with a matching store:
• Transactional: database reads & writes (OLTP), cache; stored in Database
• Search: logs, streams; stored in Search
• File: log files (/var/log), log collectors & frameworks; stored in File Storage
• Stream: log records, sensors & IoT data; stored in Stream Storage
32. The Container Era
                      Server Era        Container Era
Service Architecture  Monolithic        Microservices
System Image          Mutable           Immutable
Managed By            Ops Team          DevOps Team
Local Data            Persistent        Ephemeral
Log Collection        syslogd / rsync   ?
Metrics Collection    Nagios / Zabbix   ?
33. The Container Era
How should log & metrics collection be done in The Container Era?
35. The traditional logrotate + rsync on containers
Diagram: Applications in Containers A, B and C write log files, which are copied to a Log Server.
Hard to analyze!! Complex text parsers
High latency!! Must wait for a day
Ephemeral!! Could be lost at any time
36. Small & many containers overload storage systems
Diagram: Server 1 runs Containers A and B, and Server 2 runs Containers C and D; every containerized application sends data directly to Kafka, Elasticsearch and HDFS.
Too many connections from micro containers!
37. System images are immutable
Diagram: Containers A to D on Servers 1 and 2, each application sending data directly to Kafka, Elasticsearch and HDFS.
Too many connections from micro containers!
Embedding destination IPs in ALL Docker images makes management hard
39. Text logging with --log-driver=fluentd
Diagram: on a Server, the App in a Container writes to STDOUT / STDERR, and Docker forwards each line to Fluentd.

docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224

{
  "container_id": "ad6d5d32576a",
  "container_name": "myapp",
  "source": "stdout"
}
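40. Metrics collection with fluent-logger
Diagram: on a Server, the App in a Container sends events to Fluentd through the fluent-logger library.

# Python fluent-logger: send structured events to the local Fluentd
from fluent import sender
from fluent import event

# Every event tag is prefixed with 'app.events' (port defaults to 24224)
sender.setup('app.events', host='localhost')
# Emitted with the tag app.events.purchase
event.Event('purchase', {
    'user_id': 21, 'item_id': 321, 'value': 1
})

tag = app.events.purchase
{
  "user_id": 21,
  "item_id": 321,
  "value": 1
}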
41. Shared data volume and tailing
Diagram: on a Server, the App container writes its logs to a shared data volume, /mnt/nginx/logs, which the Fluentd container tails.

<source>
  @type tail
  path /mnt/nginx/logs/access.log
  pos_file /var/log/fluentd/access.log.pos
  format nginx
  tag nginx.access
</source>
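The `pos_file` in the configuration above lets tailing resume where it stopped after a restart. The sketch below shows the idea with a simplified position file that stores only a byte offset (Fluentd's own pos_file records more than this); `tail_once` is a hypothetical helper, and its rotation handling is deliberately naive.

```python
import os


def tail_once(path: str, pos_path: str) -> list:
    """Read complete lines appended to `path` since the offset stored in
    `pos_path`, then persist the new offset so a restart resumes there."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as f:
            offset = int(f.read().strip() or 0)
    # Naive rotation/truncation handling: start over if the file shrank
    if os.path.getsize(path) < offset:
        offset = 0
    lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # EOF or a partial line the writer has not finished yet
            lines.append(line.rstrip(b"\n").decode())
            offset += len(line)
    with open(pos_path, "w") as f:
        f.write(str(offset))
    return lines
```

Only complete lines advance the offset, so a line that is half-written when the tailer runs is read in full on the next call.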
42. Logging methods for each purpose
• Collecting log messages
> --log-driver=fluentd
• Application metrics
> fluent-logger
• Access logs, logs from middleware
> Shared data volume
• System metrics (CPU usage, Disk capacity, etc.)
> Fluentd’s input plugins
(Fluentd pulls this data periodically)
44. Primitive deployment…
Diagram: Containers A to D on Servers 1 and 2, each application sending data directly to Kafka, Elasticsearch and HDFS.
Too many connections from many containers!
Embedding destination IPs in ALL Docker images makes management hard
45. Source aggregation decouples config from apps
Diagram: each server (Server 1 with Containers A and B, Server 2 with Containers C and D) runs its own Fluentd; applications send to that local Fluentd, which forwards to Kafka, Elasticsearch and HDFS.
The destination is always localhost from the app's point of view
46. Destination aggregation makes storage systems scalable for high traffic
Diagram: the Fluentd on each server (Server 1 with Containers A and B, Server 2 with Containers C and D) forwards to aggregation server(s), deployed as active / standby or with load balancing.
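Choosing among aggregation servers in an active / standby or load-balanced setup can be sketched as round-robin over healthy active nodes, with standby nodes used only when every active node is down. This is loosely modeled on out_forward's `standby` server option, but `Node` and `AggregatorSelector` are hypothetical, and real forwarding also uses weights and heartbeats, which are omitted here.

```python
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    standby: bool = False
    healthy: bool = True


@dataclass
class AggregatorSelector:
    """Round-robin over healthy active nodes; fall back to healthy standby
    nodes only when no active node is available."""
    nodes: List[Node]
    _counter: itertools.count = field(default_factory=itertools.count)

    def select(self) -> Node:
        active = [n for n in self.nodes if n.healthy and not n.standby]
        pool = active or [n for n in self.nodes if n.healthy and n.standby]
        if not pool:
            raise RuntimeError("no healthy aggregator available")
        return pool[next(self._counter) % len(pool)]
```

Example: with two active nodes and one standby, successive calls alternate between the active nodes until both are marked unhealthy, after which the standby receives traffic.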
47. Aggregation servers
• Logging directly from microservices makes log
storages overloaded.
> Too many RX connections
> Too frequent import API calls
• Aggregation servers make the logging infrastructure
more reliable and scalable.
> Connection aggregation
> Buffering for less frequent import API calls
> Data persistence during downtime
> Automatic retry on recovery from downtime
48. Fluentd ♡ Container
• Fluentd's model fits container-based systems
> This is why Treasure Data joined CNCF
> TD wants to improve the cloud native ecosystem
• Fluentd, Prometheus, Docker and Kubernetes
collaboration is good for modern systems
• Easy to scale and easy to maintain
• Fluentd logging driver in Docker
• fluent-plugin-prometheus to expose application metrics
to Prometheus
• EFK for log visualization in Kubernetes
50. v0.14
• v0.14.0: Released on May 31, 2016
• v0.14.1: Released on Jun 30, 2016
• New Features
• New Plugin APIs, Plugin Helpers & Plugin Storage
• Time with Nanosecond resolution
• ServerEngine based Supervisor
• Windows support
51. New Plugin APIs
• Input/Output plugin APIs w/ well-controlled lifecycle
• stop, shutdown, close, terminate
• New Buffer API for delayed commit of chunks
• parallel/async "commit" operation for chunks
• 100% Compatible w/ v0.12 plugins
• compatibility layer for traditional APIs
• it will be supported throughout v1.x
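Delayed commit, as listed above, separates sending a chunk from committing it: the chunk stays pending until the destination acknowledges it, and is rolled back for resending if no acknowledgement arrives in time. The sketch below is a hypothetical Python model of that bookkeeping; `DelayedCommitBuffer` and its methods are not Fluentd's API.

```python
import time
from typing import Callable, Dict, List


class DelayedCommitBuffer:
    """Sent chunks stay 'pending' until acknowledged; unacknowledged chunks
    go back to the queue after `timeout` seconds so they are sent again."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.queue: List[str] = []           # chunk ids waiting to be sent
        self.pending: Dict[str, float] = {}  # chunk id -> time it was sent
        self.committed: List[str] = []
        self.timeout = timeout
        self.clock = clock

    def enqueue(self, chunk_id: str) -> None:
        self.queue.append(chunk_id)

    def send_next(self) -> str:
        chunk_id = self.queue.pop(0)
        self.pending[chunk_id] = self.clock()  # sent, but not yet committed
        return chunk_id

    def commit(self, chunk_id: str) -> None:
        # Called asynchronously when the destination acknowledges the chunk
        if self.pending.pop(chunk_id, None) is not None:
            self.committed.append(chunk_id)

    def rollback_expired(self) -> List[str]:
        now = self.clock()
        expired = [c for c, t in self.pending.items() if now - t >= self.timeout]
        for c in expired:
            del self.pending[c]
            self.queue.append(c)  # will be sent again
        return expired
```

Because sending never blocks on the acknowledgement, several chunks can be in flight at once, which is what makes parallel/async commits possible.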
52. v0.12 buffer design
Diagram: Input / Filter plugins emit events (Tag, Time, Record) through the Router to an Output. The Output's Buffer holds one chunk per key (e.g. key:foo, key:bar, key:baz), each limited by buffer_chunk_limit. A chunk is enqueued when it exceeds flush_interval or buffer_chunk_limit; the Queue holds up to buffer_queue_limit chunks, which the Output writes out.
Key pattern:
- BufferedOutput: empty string or specified key
- ObjectBufferedOutput: tag
- TimeSlicedOutput: time slice
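The v0.12 buffer design above can be modeled as a map from key to an open chunk plus a bounded queue. In this sketch, `chunk_limit` counts records rather than bytes (buffer_chunk_limit is a size limit), and `KeyedBuffer`, `emit` and `tick` are hypothetical names; the key would be the tag or the time slice depending on the output type.

```python
import time
from typing import Callable, Dict, List, Tuple


class KeyedBuffer:
    """One open chunk per key; a chunk moves to the queue when it reaches
    chunk_limit records or is older than flush_interval seconds."""

    def __init__(self, chunk_limit: int, flush_interval: float, queue_limit: int,
                 clock: Callable[[], float] = time.monotonic):
        self.chunk_limit = chunk_limit        # stand-in for buffer_chunk_limit
        self.flush_interval = flush_interval  # stand-in for flush_interval
        self.queue_limit = queue_limit        # stand-in for buffer_queue_limit
        self.clock = clock
        self.chunks: Dict[str, Tuple[float, List[dict]]] = {}
        self.queue: List[Tuple[str, List[dict]]] = []

    def emit(self, key: str, record: dict) -> None:
        _created, records = self.chunks.setdefault(key, (self.clock(), []))
        records.append(record)
        if len(records) >= self.chunk_limit:
            self._enqueue(key)

    def tick(self) -> None:
        # Called periodically: enqueue chunks older than flush_interval
        now = self.clock()
        for key, (created, _) in list(self.chunks.items()):
            if now - created >= self.flush_interval:
                self._enqueue(key)

    def _enqueue(self, key: str) -> None:
        if len(self.queue) >= self.queue_limit:
            raise BufferError("buffer queue limit exceeded")
        _, records = self.chunks.pop(key)
        self.queue.append((key, records))
```

Raising when the queue is full mirrors the idea that buffer_queue_limit bounds memory or disk use; what an output does in that situation is a separate policy decision.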
54. Plugin Storage & Helpers
• Plugin Storage: new plugin type for plugins
• provides key-value storage for plugins
• to persist intermediate state of plugins
• built-in plugins (planned): in-memory, local file
• pluggable: 3rd party plugin to store data in Redis?
• Plugin Helpers:
• collections of utility methods for plugins
• making threads, sockets, network servers, ...
• fully integrated with test drivers to run test code after
the setup phase of helpers (e.g., after created threads have started)
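The two planned built-in storage types (in-memory and local file) can share one small key-value interface. The sketch below is a hypothetical Python illustration; `MemoryStorage`, `LocalFileStorage` and their `get`/`put`/`save` methods are assumptions, not the Plugin Storage API itself.

```python
import json
import os
from typing import Any, Dict


class MemoryStorage:
    """In-memory key-value storage: fast, but lost on restart."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def save(self) -> None:
        pass  # nothing to persist


class LocalFileStorage(MemoryStorage):
    """Same interface, but loads from and saves to a JSON file so that a
    plugin's intermediate state survives restarts."""

    def __init__(self, path: str) -> None:
        super().__init__()
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self._data = json.load(f)

    def save(self) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self._data, f)
        os.replace(tmp, self.path)  # atomic rename on the same filesystem
```

Keeping the interface identical means a plugin can switch between volatile and persistent state (or a third-party backend such as Redis) through configuration alone.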
57. Time with nanosecond resolution
• For sub-second systems: Elasticsearch, InfluxData, etc.
• Fluent::EventTime
• behaves as Integer (used as time in v0.12)
• has methods to get sub-second resolution
• is serialized into msgpack using Ext type
• Fluentd core can handle both Integer and EventTime as
time
• compatible with older versions and software in the
ecosystem (e.g., fluent-logger, Docker logging driver)
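The behavior listed above can be sketched in Python: a value that compares and computes like an integer number of seconds but also carries nanoseconds, and that serializes as a msgpack ext value. The layout used here (fixext 8, ext type 0, 32-bit big-endian seconds, then 32-bit big-endian nanoseconds) follows the Fluentd Forward protocol specification's description of EventTime; the class name mirrors Fluent::EventTime but this is not the Ruby implementation.

```python
import struct


class EventTime(int):
    """Behaves as an int (whole seconds) but also carries nanoseconds."""

    def __new__(cls, sec: int, nsec: int = 0):
        if not 0 <= nsec < 1_000_000_000:
            raise ValueError("nsec must be in [0, 1e9)")
        obj = super().__new__(cls, sec)
        obj.nsec = nsec
        return obj

    @property
    def sec(self) -> int:
        return int(self)

    def to_float(self) -> float:
        return self.sec + self.nsec / 1e9

    def to_msgpack_ext(self) -> bytes:
        # fixext 8 header (0xd7), ext type 0, then 32-bit big-endian seconds
        # and 32-bit big-endian nanoseconds
        return b"\xd7\x00" + struct.pack(">II", self.sec, self.nsec)
```

Code that treats time as a plain integer keeps working, since `EventTime(1451606400, 123456789) == 1451606400` holds, while newer consumers can read the extra precision.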
58. ServerEngine-based Supervisor
• Replacing supervisor process with ServerEngine
• it has SocketManager to share listening sockets
between 2 or more worker processes
• Replacing Fluentd's processing model from fork to
spawn
• to support Windows environment
59. Windows support
• Fluentd and core plugins work on Windows
• several companies are already using
v0.14.0.pre versions in production
• We will send patches to popular plugins if
they don't work on Windows
• Use HTTP RPC instead of signals
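Because Windows lacks Unix signals such as SIGUSR1 (which Fluentd uses to flush buffers) and SIGHUP (reload), control requests can travel over HTTP instead. The sketch below is a minimal standard-library Python illustration; the endpoint paths, the `rpc` decorator and the `serve` helper are hypothetical examples, not Fluentd's actual RPC interface.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict

# Hypothetical control actions, registered by URL path
ACTIONS: Dict[str, Callable[[], str]] = {}


def rpc(path: str):
    def register(fn: Callable[[], str]) -> Callable[[], str]:
        ACTIONS[path] = fn
        return fn
    return register


@rpc("/api/plugins.flushBuffers")
def flush_buffers() -> str:
    return "flushed"


@rpc("/api/config.reload")
def reload_config() -> str:
    return "reloaded"


class RPCHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        action = ACTIONS.get(self.path)
        status, body = (200, {"ok": action()}) if action else (404, {"error": "unknown path"})
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args) -> None:
        pass  # keep the example quiet


def serve(port: int = 0) -> HTTPServer:
    """Start the RPC server on localhost in a background thread."""
    server = HTTPServer(("127.0.0.1", port), RPCHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to 127.0.0.1 keeps the control surface local to the host, which is roughly the same reach that sending a signal has.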
60. v0.14.x - v1
• v0.14.x (some versions in 2016)
• Symmetric multi-core processing
• Counter API
• TLS/authentication/authorization support
(merging secure forward)
• https://github.com/fluent/fluentd/issues/1000
• v1 (4Q in 2016 or 1Q in 2017)
• Stable version for new APIs / features
• Fully compatible with v0.12
• excluding v0 config syntax and detach_process
61. Symmetric multi core processing
• 2 or more workers share a configuration file
• and share listening sockets via PluginHelper
• under a supervisor process (ServerEngine)
• Multi core scalability for huge traffic
• one input plugin for a tcp port, some filters and
one (or some) output plugin
• buffer paths are managed automatically by
Fluentd core
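When several workers share one configuration file, one configured buffer path must be split so that workers never write the same chunk files. The `<path>/workerN` layout and the `worker_buffer_paths` name below are assumptions for illustration, not a description of how Fluentd core lays out its directories.

```python
import os
from typing import List


def worker_buffer_paths(base_path: str, workers: int) -> List[str]:
    """Derive one buffer directory per worker from a single configured path,
    so workers sharing a config file never write to the same files."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return [os.path.join(base_path, f"worker{i}") for i in range(workers)]
```

Deriving the path from the worker index keeps the user's configuration unchanged as the number of workers grows.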
63. Counter API
• APIs to increment/decrement values
• shared by some processes
• persisted on disk backed by Storage API
• Useful for collecting metrics or stats filters
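A counter shared by several processes and persisted on disk can be sketched with an exclusive file lock around a read-modify-write cycle. This is a hypothetical POSIX-only Python illustration (it uses `fcntl.flock`); a JSON file stands in for the Storage API backing mentioned above, and `FileCounter` is not Fluentd's Counter API.

```python
import fcntl
import json
import os


class FileCounter:
    """Counters shared by processes: each update locks the backing file,
    re-reads it, applies the change and writes it back, so concurrent
    increments are not lost and values survive restarts."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Create the file if needed without truncating existing data
        os.close(os.open(path, os.O_RDWR | os.O_CREAT, 0o644))

    def _update(self, name: str, delta: int) -> int:
        with open(self.path, "r+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)  # released when the file is closed
            raw = f.read()
            data = json.loads(raw) if raw else {}
            data[name] = data.get(name, 0) + delta
            f.seek(0)
            f.truncate()
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
            return data[name]

    def increment(self, name: str, by: int = 1) -> int:
        return self._update(name, by)

    def decrement(self, name: str, by: int = 1) -> int:
        return self._update(name, -by)

    def get(self, name: str) -> int:
        with open(self.path) as f:
            fcntl.flock(f, fcntl.LOCK_SH)
            raw = f.read()
        return json.loads(raw).get(name, 0) if raw else 0
```

A filter plugin could, for example, increment a per-tag counter for every event and let a separate process read the totals.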
64. TLS/Authn/Authz support for forward plugin
• secure-forward will be merged into built-in forward
• TLS w/ at-least-once semantics
• Simple authentication/authorization w/ non-SSL
forwarding
• Authentication and Authorization providers
• Who can connect to input plugins?
What tags are permitted for clients?
• New plugin types (3rd party authors can write them)
• Mainly for in/out forward, but available from others
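Shared-key authentication in the forward protocol (inherited from secure-forward) works along these lines: the client proves it knows the shared key by sending a SHA-512 hex digest of a salt, its hostname, a server-issued nonce and the key, instead of the key itself. The sketch below follows that general idea and adds a tag check as an example of authorization; the function names are hypothetical, and `fnmatch` globbing differs from Fluentd's own match patterns (where `*` matches a single tag part).

```python
import fnmatch
import hashlib
import hmac


def shared_key_digest(salt: bytes, hostname: str, nonce: bytes, shared_key: str) -> str:
    """Hex SHA-512 over salt + hostname + nonce + shared key."""
    h = hashlib.sha512()
    h.update(salt)
    h.update(hostname.encode())
    h.update(nonce)
    h.update(shared_key.encode())
    return h.hexdigest()


def authenticate(client_digest: str, salt: bytes, hostname: str,
                 nonce: bytes, shared_key: str) -> bool:
    """Server side: recompute the digest and compare in constant time."""
    expected = shared_key_digest(salt, hostname, nonce, shared_key)
    return hmac.compare_digest(client_digest, expected)


def tag_permitted(tag: str, allowed_patterns: list) -> bool:
    """Hypothetical authorization rule: a client may only emit tags that
    match one of its allowed glob patterns."""
    return any(fnmatch.fnmatchcase(tag, p) for p in allowed_patterns)
```

Because the nonce changes per connection, a captured digest cannot simply be replayed on a later connection.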