Chicago Data Summit: Flume: An Introduction
Flume
Logging for the Enterprise
Jonathan Hsieh, Henry Robinson, Patrick Hunt, Eric Sammer
Cloudera, Inc.
Chicago Data Summit, 4/26/11

Who Am I?
- Cloudera:
  - Software Engineer on the Platform Team
  - Flume Project Lead / Designer / Architect
- U of Washington:
  - “On Leave” from PhD program
  - Research in Systems and Programming Languages
- Previously:
  - Computer Security, Embedded Systems
Jonathan Hsieh, Chicago Data Summit 4/26/2011

An Enterprise Scenario
- You have a bunch of departments with servers generating log files.
- You are required to keep logs and want to analyze and profit from them.
- Because of the volume of raw data, you’ve started using Cloudera’s Distribution including Apache Hadoop.
- … and you’ve got several ad-hoc, legacy scripts/systems that copy data from servers/filers to HDFS.
It’s log, log... Everyone wants a log!

Ad-hoc gets complicated
- Black box?
  - What happens if the person who wrote it leaves?
- Not extensible?
  - Is it one-off or flexible enough to handle future needs?
- Unmanageable?
  - Do you know when something goes wrong?
- Unreliable?
  - If something goes wrong, will it recover?
- Unscalable?
  - Hit an ingestion rate limit?

Cloudera Flume
Flume is a framework and conduit for collecting data records from many sources and quickly shipping them to one centralized place for storage and processing.
Project Goals:
- Scalability
- Reliability
- Extensibility
- Manageability
- Openness

The Canonical Use Case
[Figure: many servers, each running a Flume agent (agent tier), send data to three collectors (collector tier), which write to HDFS]

The Canonical Use Case
[Figure: the agent tier and collector tier writing to HDFS, with a Flume master added]

Flume’s Key Abstractions
- Data path and control path
- Nodes are in the data path
  - Nodes have a source and a sink
  - They can take different roles
    - A typical topology has agent nodes and collector nodes.
    - Optionally it has processor nodes.
- Masters are in the control path.
  - Centralized point of configuration.
  - Specify sources and sinks
  - Can control flows of data between nodes
  - Use one master or use many with a ZooKeeper (ZK)-backed quorum
[Figure: an agent node and a collector node, each with a source and a sink, both configured by the master]

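To make the node abstraction concrete, here is a minimal Python sketch of a node as a source paired with a sink, wired into a two-tier agent → collector path. Flume itself is written in Java and its real source and sink interfaces differ; the `Node` class, `tail_lines`, and the in-memory lists standing in for the network and HDFS are all hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Event = dict  # hypothetical: an event is a dict with a "body" field


@dataclass
class Node:
    """A data-path node: every event its source yields is handed to its sink."""
    name: str
    role: str                              # "agent", "collector", or "processor"
    source: Callable[[], Iterable[Event]]  # produces events
    sink: Callable[[Event], None]          # consumes one event at a time

    def run(self) -> int:
        delivered = 0
        for event in self.source():
            self.sink(event)
            delivered += 1
        return delivered


def tail_lines() -> Iterable[Event]:
    # stand-in for tailing a log file
    for line in ["GET /index.html", "GET /about.html"]:
        yield {"body": line}


wire: List[Event] = []     # stand-in for the network link between tiers
storage: List[Event] = []  # stand-in for HDFS

agent = Node("agent1", "agent", tail_lines, wire.append)
collector = Node("collector1", "collector", lambda: list(wire), storage.append)

agent.run()
collector.run()
print([e["body"] for e in storage])
```

Agents and collectors share one shape and differ only in how their source and sink are configured, which mirrors the slide's point that nodes can take different roles.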
Outline
- What is Flume?
- Scalability
  - Horizontal scalability of all nodes and masters
- Reliability
  - Fault tolerance and high availability
- Extensibility
  - Unix principle: all kinds of data, all kinds of sources, all kinds of sinks
- Manageability
  - Centralized management supporting dynamic reconfiguration
- Openness
  - Apache v2.0 License and an active and growing community

Scalability

Data path is horizontally scalable
- Add collectors to increase availability and to handle more data
  - Assumes a single agent will not dominate a collector
  - Fewer connections to HDFS, which tax the resource-constrained NameNode
  - Larger, more efficient writes to HDFS and fewer files avoid the “small file problem”
  - Simplifies the security story when supporting Kerberized HDFS or protected production servers
- Write log locally to avoid collector disk IO bottleneck and catastrophic failures
- Compression and batching (trade CPU for network)
- Push computation into the event collection pipeline (balance IO, memory, and CPU resource bottlenecks)
[Figure: agents on servers sending to a collector that writes to HDFS]

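The "compression and batching (trade CPU for network)" point can be illustrated with a small Python sketch: events are grouped into batches and each batch is compressed into a single payload, so fewer, smaller messages cross the wire. It uses the standard `zlib` and `json` modules as stand-ins; this is not Flume's wire format, and `batch_and_compress` / `decompress` are hypothetical helpers.

```python
import json
import zlib
from typing import List


def batch_and_compress(events: List[str], batch_size: int) -> List[bytes]:
    """Group events into batches and compress each batch into one payload."""
    payloads = []
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        payloads.append(zlib.compress(json.dumps(batch).encode("utf-8")))
    return payloads


def decompress(payloads: List[bytes]) -> List[str]:
    """Reverse of batch_and_compress: recover the original events in order."""
    events: List[str] = []
    for payload in payloads:
        events.extend(json.loads(zlib.decompress(payload).decode("utf-8")))
    return events


# Repetitive log lines, which compress well
events = [f'127.0.0.1 - - "GET /page/{i % 10} HTTP/1.1" 200 512' for i in range(1000)]
payloads = batch_and_compress(events, batch_size=100)

raw_bytes = sum(len(e.encode("utf-8")) for e in events)
sent_bytes = sum(len(p) for p in payloads)
print(len(payloads), raw_bytes, sent_bytes)
```

Larger batches mean fewer messages and better compression ratios, at the cost of holding events longer before sending (the latency-for-throughput trade mentioned on the next slide).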
Node scalability limits and optimization plans
- In most deployments today, a single collector is not saturated.
- The current implementation can write at 20 MB/s over 1GbE (~1.75 TB/day) due to unoptimized network usage.
- Assuming 1GbE with aggregate disk able to write at close to the GbE rate, we can probably reach:
  - 3-5x by batching to get to the wire/disk limit (trade latency for throughput)
  - 5-10x by compression to trade CPU for throughput (logs are highly compressible)
- The limit is probably in the ballpark of 40 effective TB/day/collector.
[Figure: agents on servers sending to a collector that writes to HDFS]

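The slide's numbers can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^6 MB), a raw 1GbE wire limit of 125 MB/s, and that the batching and compression gains multiply; these are simplifying assumptions, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day(mb_per_sec: float) -> float:
    """Convert a sustained MB/s rate into TB/day (decimal units: 1 TB = 1,000,000 MB)."""
    return mb_per_sec * SECONDS_PER_DAY / 1_000_000


current = tb_per_day(20)       # measured rate from the slide
wire = tb_per_day(1000 / 8)    # 1 Gbit/s = 125 MB/s raw
low = current * 3 * 5          # 3x batching, 5x compression
high = current * 5 * 10        # 5x batching, 10x compression

print(f"current ~{current:.2f} TB/day, 1GbE wire ~{wire:.1f} TB/day")
print(f"batching x compression: ~{low:.0f}-{high:.0f} effective TB/day")
```

20 MB/s works out to about 1.73 TB/day (the slide rounds to ~1.75), raw 1GbE tops out near 10.8 TB/day, and 3-5x batching keeps the wire rate at 60-100 MB/s, under that cap. The combined 15x-50x gains give roughly 26-86 effective TB/day, which brackets the slide's ~40 TB/day estimate.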
Control plane is horizontally scalable
- A master controls dynamic configurations of nodes
  - Uses a consensus protocol to keep state consistent
  - Scales well for configuration reads
  - Allows for adaptive repartitioning in the future
- Nodes can talk to any master.
- Masters can talk to an existing ZooKeeper (ZK) ensemble
[Figure: three masters backed by a ZK ensemble (ZK1, ZK2, ZK3), with nodes each connected to a master]

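The read-scaling argument rests on every master seeing the same configuration state, so any master can answer a node's configuration request. The Python sketch below illustrates only that property: a single shared object (`ConfigStore`, hypothetical) stands in for the ZooKeeper-backed state, and no consensus protocol is implemented.

```python
from typing import Dict, Tuple


class ConfigStore:
    """Stand-in for ZooKeeper-replicated state; here it is just one shared dict."""

    def __init__(self) -> None:
        self._configs: Dict[str, Tuple[str, str]] = {}
        self.version = 0

    def write(self, node: str, source: str, sink: str) -> int:
        self._configs[node] = (source, sink)
        self.version += 1
        return self.version

    def read(self, node: str) -> Tuple[str, str]:
        return self._configs[node]


class Master:
    def __init__(self, name: str, store: ConfigStore) -> None:
        self.name = name
        self.store = store

    def get_config(self, node: str) -> Tuple[str, str]:
        # Any master can serve a read because all masters see the same state.
        return self.store.read(node)


store = ConfigStore()
masters = [Master(f"master{i}", store) for i in range(1, 4)]
masters[0].store.write("node001", 'tail("/var/log/app/log")', "autoE2ESink")

# Spread reads round-robin across masters; every master returns the same answer.
answers = [masters[i % len(masters)].get_config("node001") for i in range(6)]
print(set(answers))
```

Adding masters adds read capacity, while writes still go through the shared, consistent store.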
Reliability

Failures
- Faults can happen at many levels
  - Software applications can fail
  - Machines can fail
  - Networking gear can fail
  - Excessive network congestion or machine load
  - A node goes down for maintenance.
- How do we make sure that events make it to a permanent store?

Tunable failure recovery modes
- Best effort
  - Fire and forget
- Store on failure + retry
  - Writes to disk on detected failure
  - One-hop TCP acks
  - Failover when faults are detected
- End-to-end reliability
  - Write-ahead log on agent
  - Checksums and end-to-end acks
  - Data survives compound failures, and may be retried multiple times
[Figure: for each mode, an agent sends to a collector, which writes to HDFS]

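The end-to-end mode combines three ideas from the slide: a write-ahead log on the agent, checksums, and end-to-end acks. The following Python sketch (hypothetical `Agent` and `Collector` classes, in-memory structures in place of a disk log and HDFS, CRC32 as the checksum) shows why an event leaves the log only after an ack, and why a lost ack leads to the event being delivered more than once.

```python
import zlib
from typing import Dict, List, Tuple


class Agent:
    """Keeps every event in a write-ahead log until the collector acknowledges it."""

    def __init__(self) -> None:
        # event id -> (body, CRC32); in-memory stand-in for an on-disk log
        self.wal: Dict[int, Tuple[bytes, int]] = {}
        self.next_id = 0

    def append(self, body: bytes) -> None:
        self.wal[self.next_id] = (body, zlib.crc32(body))
        self.next_id += 1

    def pending(self) -> List[Tuple[int, bytes, int]]:
        return [(eid, body, crc) for eid, (body, crc) in sorted(self.wal.items())]

    def ack(self, eid: int) -> None:
        self.wal.pop(eid, None)  # only acknowledged events leave the log


class Collector:
    def __init__(self) -> None:
        self.hdfs: List[bytes] = []  # stand-in for the permanent store

    def receive(self, body: bytes, crc: int, lose_ack: bool = False) -> bool:
        """Return True if an ack reaches the agent."""
        if zlib.crc32(body) != crc:
            return False  # corrupted: no ack, so the agent will retry
        self.hdfs.append(body)
        return not lose_ack  # simulate an ack lost after a successful write


def deliver(agent: Agent, collector: Collector, lose_ack_for: frozenset = frozenset()) -> None:
    for eid, body, crc in agent.pending():
        if collector.receive(body, crc, lose_ack=eid in lose_ack_for):
            agent.ack(eid)


agent, collector = Agent(), Collector()
for body in [b"event-a", b"event-b", b"event-c"]:
    agent.append(body)

deliver(agent, collector, lose_ack_for=frozenset({1}))  # the ack for event-b is lost
deliver(agent, collector)                               # retry everything still in the WAL
print(collector.hdfs, agent.wal)
```

After the retry the log is empty, but event-b has reached the store twice: that is what "may be retried multiple times" implies for data that survives compound failures.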
Load balancing
- Use randomization to pre-specify failovers when many collectors exist
- Spread load if a collector goes down.
- Spread load if new collectors are added to the system.
[Figure: agents spread across three collectors]

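One way to read "use randomization to pre-specify failovers" is sketched below in Python: each agent gets its own random ordering of the collectors (primary first, then backups), so when a collector goes down its agents fall through to different backups instead of piling onto one. The function names and the fixed seed are illustrative assumptions, and this is not Flume's implementation.

```python
import random
from collections import Counter
from typing import Dict, List, Set


def failover_chains(agents: List[str], collectors: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Give each agent its own random ordering of collectors: primary first, then backups."""
    rng = random.Random(seed)
    chains = {}
    for agent in agents:
        order = collectors[:]
        rng.shuffle(order)
        chains[agent] = order
    return chains


def active_collector(chain: List[str], down: Set[str]) -> str:
    # The first collector in the chain that is still up receives the agent's events.
    return next(c for c in chain if c not in down)


agents = [f"agent{i:03d}" for i in range(1, 101)]
collectors = ["collector1", "collector2", "collector3"]
chains = failover_chains(agents, collectors)

before = Counter(active_collector(chains[a], set()) for a in agents)
after = Counter(active_collector(chains[a], {"collector1"}) for a in agents)
print(before)
print(after)
```

Agents whose primary is still up are unaffected; only the failed collector's agents move, and their backups are spread across the survivors.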
Load balancing and collector failover
[Figure: agents spread across three collectors, illustrating collector failover]

Control plane is fault tolerant
- A master controls dynamic configurations of nodes
  - Uses a consensus protocol to keep state consistent
  - Scales well for configuration reads
  - Allows for adaptive repartitioning in the future
- Nodes can talk to any master.
- Masters can talk to an existing ZooKeeper (ZK) ensemble
[Figure: three masters backed by a ZK ensemble (ZK1, ZK2, ZK3), with nodes each connected to a master]

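Because nodes can talk to any master, a node whose master fails can simply move to another one. The Python sketch below shows that failover loop under stated assumptions: the `Master`, `NodeClient`, and `MasterUnavailable` names are hypothetical, a shared dict stands in for ZooKeeper-replicated state, and a failed heartbeat is modeled as an exception.

```python
from typing import Dict, List, Optional


class MasterUnavailable(Exception):
    pass


class Master:
    def __init__(self, name: str, configs: Dict[str, str]) -> None:
        self.name = name
        self.configs = configs  # shared dict: stand-in for state replicated through ZooKeeper
        self.up = True

    def heartbeat(self, node: str) -> str:
        if not self.up:
            raise MasterUnavailable(self.name)
        return self.configs[node]


class NodeClient:
    """Talks to its current master; on failure, tries the other masters in order."""

    def __init__(self, name: str, masters: List[Master]) -> None:
        self.name = name
        self.masters = masters
        self.current = 0  # index of the master this node is attached to

    def fetch_config(self) -> Optional[str]:
        n = len(self.masters)
        for step in range(n):
            idx = (self.current + step) % n
            try:
                config = self.masters[idx].heartbeat(self.name)
            except MasterUnavailable:
                continue
            self.current = idx  # stick with the master that answered
            return config
        return None  # no master reachable; the caller decides when to retry


shared = {"node001": 'tail("/var/log/app/log") | autoE2ESink'}
masters = [Master(f"master{i}", shared) for i in range(1, 4)]
node = NodeClient("node001", masters)

print(node.fetch_config(), node.current)
masters[0].up = False  # master1 fails
print(node.fetch_config(), node.current)
```

Since every master serves the same state, the node gets an identical configuration from whichever master answers.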
Extensibility

Flume is easy to extend
- Simple source and sink APIs
  - An event streaming design
  - Many simple operations compose for complex behavior
- Plug-in architecture so you can add your own sources, sinks and decorators
[Figure: sources, decorators ("deco"), and a fanout composed into dataflows ending in sinks]

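The "many simple operations compose" idea can be sketched with plain Python closures: a sink is any function that takes an event, a decorator wraps a sink and returns a new one, and a fanout sends each event to several sinks. The `fanout`, `sample`, and `uppercase` helpers are hypothetical and do not reproduce Flume's Java plug-in API.

```python
from typing import Callable, List

Sink = Callable[[str], None]


def fanout(*sinks: Sink) -> Sink:
    """Send every event to all child sinks."""
    def sink(event: str) -> None:
        for s in sinks:
            s(event)
    return sink


def sample(every_n: int, inner: Sink) -> Sink:
    """Decorator: forward only every n-th event to the inner sink."""
    count = 0

    def sink(event: str) -> None:
        nonlocal count
        count += 1
        if count % every_n == 0:
            inner(event)
    return sink


def uppercase(inner: Sink) -> Sink:
    """Decorator: transform each event before passing it on."""
    return lambda event: inner(event.upper())


archive: List[str] = []
monitor: List[str] = []

# source -> fanout[ archive (all events), sample(2) -> uppercase -> monitor ]
pipeline = fanout(archive.append, sample(2, uppercase(monitor.append)))

for line in ["a", "b", "c", "d"]:  # stand-in for a source
    pipeline(line)
print(archive, monitor)
```

Because decorators and fanouts are themselves sinks, they nest arbitrarily, which is what lets small plug-ins build up complex dataflows.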
Variety of Connectors
- Sources produce data
  - Console, Exec, Syslog, Scribe, IRC, Twitter
  - In the works: JMS, AMQP, pubsubhubbub/RSS/Atom
- Sinks consume data
  - Console, Local files, HDFS, S3
  - Contributed: Hive (Mozilla), HBase (Sematext), Cassandra (Riptano/DataStax), Voldemort, ElasticSearch
  - In the works: JMS, AMQP
- Decorators modify data sent to sinks
  - Wire batching, compression, sampling, projection, extraction, throughput throttling
  - Custom near real-time processing (Meebo)
  - JRuby event modifiers (InfoChimps)
  - Cryptographic extensions (Rearden)
  - Streaming SQL in-stream analytics system: FlumeBase (Aaron Kimball)
[Figure: source → deco → sink]

Migrating previous enterprise architecture
[Figure: existing pipelines (a poller reading from a filer, a message bus read over AMQP, and a custom app sending Avro) each feed a Flume agent and collector that write to HDFS]

Data ingestion pipeline pattern
[Figure: agents on servers send to a Flume collector whose fanout writes to HBase (key lookup, range query), an incremental search index (search query, faceted query), and HDFS (Hive query, Pig query)]

Manageability
Wheeeeee!

Configuring Flume
Node: tail("file") | filter [ console, roll(1000) { dfs("hdfs://namenode/user/flume") } ] ;
- A concise and precise configuration language for specifying dataflows in a node.
- Dynamic updates of configurations
  - Allows for live failover changes
  - Allows for handling newly provisioned machines
  - Allows for changing analytics
[Figure: tail → filter → fanout, which feeds console and roll → hdfs]

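In this configuration, `roll(1000) { dfs(...) }` is read here as wrapping the HDFS sink so that its output is periodically closed and a new file opened. Assuming the argument is a period in milliseconds, the Python sketch below models that rolling behavior; `RollingSink` is hypothetical, and time is taken from event timestamps rather than a wall-clock timer to keep the example deterministic.

```python
from typing import List, Optional


class RollingSink:
    """Start a new output segment once `period_ms` has elapsed since the current one opened."""

    def __init__(self, period_ms: int) -> None:
        self.period_ms = period_ms
        self.segments: List[List[str]] = []
        self._opened_at: Optional[int] = None

    def append(self, ts_ms: int, body: str) -> None:
        if self._opened_at is None or ts_ms - self._opened_at >= self.period_ms:
            self.segments.append([])  # roll: close the old segment, open a new one (a new file)
            self._opened_at = ts_ms
        self.segments[-1].append(body)


roll = RollingSink(period_ms=1000)
for ts, body in [(0, "a"), (400, "b"), (999, "c"), (1000, "d"), (2500, "e")]:
    roll.append(ts, body)
print(roll.segments)
```

Rolling bounds how long data sits in an open file, at the cost of producing more, smaller files.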
Output bucketing
- Automatic output file management
  - Writes HDFS files into directories based on time-based tags
node : collectorSource | collectorSink("hdfs://namenode/logs/web/%Y/%m%d/%H00", "data")
Example files written by the collector:
/logs/web/2010/0715/1200/data-xxx.txt
/logs/web/2010/0715/1200/data-xxy.txt
/logs/web/2010/0715/1300/data-xxx.txt
/logs/web/2010/0715/1300/data-xxy.txt
/logs/web/2010/0715/1400/data-xxx.txt
…

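The bucketing path on this slide uses strftime-like escapes (`%Y`, `%m`, `%d`, `%H`) filled in from each event's timestamp. The Python sketch below reproduces the example paths with the standard `time.strftime`, assuming UTC timestamps; `bucket_path` and the `prefix-suffix.txt` naming are hypothetical, and Flume's own escape handling is not shown.

```python
import calendar
import time


def bucket_path(template: str, ts_seconds: int, prefix: str, suffix: str) -> str:
    """Expand strftime-style escapes in the template using the event's timestamp (UTC)."""
    directory = time.strftime(template, time.gmtime(ts_seconds))
    return f"{directory}/{prefix}-{suffix}.txt"


template = "hdfs://namenode/logs/web/%Y/%m%d/%H00"
ts_a = calendar.timegm((2010, 7, 15, 12, 34, 56))  # 2010-07-15 12:34:56 UTC
ts_b = calendar.timegm((2010, 7, 15, 13, 5, 0))    # 2010-07-15 13:05:00 UTC

print(bucket_path(template, ts_a, "data", "xxx"))
print(bucket_path(template, ts_b, "data", "xxx"))
```

Every event in the same hour lands in the same directory, so downstream jobs can select a time range just by picking directories.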
Configuration is straightforward
node001: tail("/var/log/app/log") | autoE2ESink;
node002: tail("/var/log/app/log") | autoE2ESink;
…
node100: tail("/var/log/app/log") | autoE2ESink;
collector1: autoCollectorSource | collectorSink("hdfs://logs/app/","applogs")
collector2: autoCollectorSource | collectorSink("hdfs://logs/app/","applogs")
collector3: autoCollectorSource | collectorSink("hdfs://logs/app/","applogs")

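Each line of this configuration follows the flat form `node: source | sink`. The hypothetical parser below turns that form into a `{node: (source, sink)}` map in Python. It handles only the flat lines shown here (one spec per line, optional trailing `;`, no `|` inside arguments) and is not Flume's configuration parser, which also supports decorators, fanout brackets, and nested sinks.

```python
from typing import Dict, Tuple


def parse_config(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse flat 'node: source | sink' lines into {node: (source, sink)}."""
    specs: Dict[str, Tuple[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip().rstrip(";").strip()
        if not line:
            continue
        node, _, flow = line.partition(":")    # split on the first ':' only
        source, _, sink = flow.partition("|")  # split on the first '|' only
        specs[node.strip()] = (source.strip(), sink.strip())
    return specs


agent_lines = [f'node{i:03d}: tail("/var/log/app/log") | autoE2ESink;' for i in range(1, 101)]
collector_lines = [
    f'collector{i}: autoCollectorSource | collectorSink("hdfs://logs/app/","applogs")'
    for i in range(1, 4)
]
specs = parse_config("\n".join(agent_lines + collector_lines))
print(len(specs), specs["node042"], specs["collector2"])
```

Because 100 agents share one line shape, the whole fleet's configuration can be generated by a script and pushed through the master in one place.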
Centralized Dataflow Management Interfaces
- One place to specify node sources, sinks and data flows.
- Basic Web interface
- Flume Shell
  - Command line interface
  - Scriptable
- Cloudera Enterprise
  - Flume Monitor App
  - Graphical web interface

Enterprise Friendly
- Integrated as part of CDH3 and Cloudera Enterprise
  - RPM and DEB packaging for enterprise Linux
  - Flume Node for Windows (beta)
- Cloudera Enterprise Support
  - 24-7 Support SLAs
  - Professional Services
- Cloudera Flume Features for Enterprises
  - Kerberos Authentication support for writing to “secure” HDFS
  - Detailed JSON-exposed metrics for monitoring integration (beta)
  - Log4J collection (beta)
  - High Availability via Multiple Master (alpha)
  - Encrypted SSL / TLS data path and control path support (dev)

An enterprise story
[Figure: department servers (Windows and Linux) run Flume agents fed through an API; the agents send to a collector tier that writes to Kerberos-secured HDFS, with authentication through Active Directory / LDAP]

Openness and Community

Flume is Open Source
- Apache v2.0 Open Source License
  - Independent from the Apache Software Foundation
  - You have the right to fork or modify the software
- GitHub source code repository
  - http://github.com/cloudera/flume
  - Regular tarball releases every 2-3 months.
  - Regular CDH packaging updates every 3-4 months.
- Always looking for contributors and committers

Growing user and developer community
- Lots of innovation comes from the community
- Community folks are willing to try incomplete features.
- Early feedback and community fixes
- Many interesting topologies in the community

[company logo]: Multi Datacenter
[Figure: agents on API servers ("api") and on processor servers ("proc") send to a collector tier, which writes to HDFS]

[company logo]: Multi Datacenter
[Figure: agents on API servers and processor servers send through a relay and a collector tier to HDFS]

[company logo]: Near Real-time Aggregator
[Figure: agents on ad servers send to a Flume collector; a tracker writes to a DB for quick reports, and a Hive job over HDFS verifies the reports]

Weitere ähnliche Inhalte

Was ist angesagt?

Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfhik_lhz
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clustersenissoz
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
DevopsItalia2015 - DHCP at Facebook - Evolution of an infrastructure
DevopsItalia2015 - DHCP at Facebook - Evolution of an infrastructureDevopsItalia2015 - DHCP at Facebook - Evolution of an infrastructure
DevopsItalia2015 - DHCP at Facebook - Evolution of an infrastructureAngelo Failla
 
Challenges for Deploying a High-Performance Computing Application to the Cloud
Challenges for Deploying a High-Performance Computing Application to the CloudChallenges for Deploying a High-Performance Computing Application to the Cloud
Challenges for Deploying a High-Performance Computing Application to the CloudIntel® Software
 
XPDDS17: To Grant or Not to Grant? - João Martins, Oracle
XPDDS17: To Grant or Not to Grant? - João Martins, Oracle XPDDS17: To Grant or Not to Grant? - João Martins, Oracle
XPDDS17: To Grant or Not to Grant? - João Martins, Oracle The Linux Foundation
 
SCU 2015 - Hyper-V Replica
SCU 2015 - Hyper-V ReplicaSCU 2015 - Hyper-V Replica
SCU 2015 - Hyper-V ReplicaMike Resseler
 
Texter blue - gdpr watchdog
Texter blue - gdpr watchdogTexter blue - gdpr watchdog
Texter blue - gdpr watchdogLuis Cabaceira
 
hadoop architecture -Big data hadoop
   hadoop architecture -Big data hadoop   hadoop architecture -Big data hadoop
hadoop architecture -Big data hadoopjasikadogra
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemWhat's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemCloudera, Inc.
 
SVC / Storwize: cache partition analysis (BVQ howto)
SVC / Storwize: cache partition analysis  (BVQ howto)   SVC / Storwize: cache partition analysis  (BVQ howto)
SVC / Storwize: cache partition analysis (BVQ howto) Michael Pirker
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicasenissoz
 
XPDS13: Enabling Fast, Dynamic Network Processing with ClickOS - Joao Martins...
XPDS13: Enabling Fast, Dynamic Network Processing with ClickOS - Joao Martins...XPDS13: Enabling Fast, Dynamic Network Processing with ClickOS - Joao Martins...
XPDS13: Enabling Fast, Dynamic Network Processing with ClickOS - Joao Martins...The Linux Foundation
 
Yeti DNS Project
Yeti DNS ProjectYeti DNS Project
Yeti DNS ProjectAPNIC
 

ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 

Chicago Data Summit: Flume: An Introduction

  • Lots of innovation comes from the community
  • Community folks are willing to try incomplete features.
  • Early feedback and community fixes