


Apache Flume and its use case in Manufacturing

3,412 views

Published on

CSCI E90 Cloud Computing
Harvard University Extension School

Published in: Education, Technology


Final Project: Apache Flume

Rapheephan Thongkham-Uan (Nancy)
cscie90 Cloud Computing
Harvard University Extension School
Prof. Zoran B. Djordjević @TakeshiDemonkey
What is Apache Flume?

▪ Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralised data store. (http://flume.apache.org/FlumeUserGuide.html)
▪ Currently available versions are 0.9.x and 1.x.
▪ I want to focus on Flume use cases in manufacturing.

@Rapheephan
Applying Flume to the Manufacturing Process

▪ In the factory, there are many machines used in production.
▪ If each machine produces one log data file whenever one lot of product finishes processing, a large amount of log data accumulates on the server every day.
▪ For quality control and production control improvement, our objective is to analyse these log files in real time.
▪ First, we need to collect these log data files from the production lines into HDFS, then pass them through the analysis process.
Multi-agent flow in the production system

[Diagram: AGENT 1, AGENT 2, and AGENT 3 each forward events to AGENT 4, which consolidates them and writes to HDFS.]
My Sample

[Diagram: agent 1, consisting of a SOURCE, CHANNEL, and SINK, writing to HDFS.]

▪ My system
  ▪ Java Runtime Environment (Java 1.6.0_31)
  ▪ Cloudera's Distribution Including Apache Hadoop (CDH4.3)
▪ Working steps
  1. Install Apache Flume on the host machine (Flume installation guide for CDH4: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_12.html)
  2. Create 2 log generation Java applications, for machine1 and machine2
  3. Configure the Flume agent
  4. Start the Flume agent and test the system
Prepare the log generation application

▪ Create 2 virtual machines for generating machine1's and machine2's log data.
▪ Create a simple socket Java program that produces log events to the agent's source on a specific port (11111).
▪ Export it as an executable JAR file, and move it to virtual machine1.
▪ Copy and move the other to virtual machine2.
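The slides do not show the generator's source, so here is a minimal sketch of what such a genLog.jar might look like: it connects to the agent's netcat source and writes one timestamped line per second. The class name, message text, and default host/port are assumptions based on the output shown later (port 11111, "This is a sample log file from machine N." events); only the event format itself is taken from the slides.

```java
import java.io.PrintWriter;
import java.net.Socket;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical log generator: sends timestamped sample events to a
// Flume netcat source over a plain TCP socket.
public class GenLog {

    static final SimpleDateFormat FMT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    // Build one log event in the format seen in the HDFS output later.
    static String formatLine(Date when, int machineId) {
        return FMT.format(when) + ": This is a sample log file from machine " + machineId + ".";
    }

    public static void main(String[] args) throws Exception {
        // Defaults mirror the agent configuration on the following slides.
        String host = args.length > 0 ? args[0] : "133.196.211.209";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 11111;
        int machineId = args.length > 2 ? Integer.parseInt(args[2]) : 1;

        Socket socket = new Socket(host, port);
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
        try {
            for (int i = 0; i < 10; i++) {      // send ten sample events
                out.println(formatLine(new Date(), machineId));
                Thread.sleep(1000);
            }
        } finally {
            out.close();
            socket.close();
        }
    }
}
```

On each virtual machine you would run it as, e.g., `java -jar genLog.jar <agent-host> 11111 1`, with the machine id changed per VM.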
Configuring the Flume-ng agent on the host

▪ We have to configure every sink, channel, and source in the flow. My agent name is hdfs-agent.
▪ First, name the components in the agent:

    hdfs-agent.sources = log-collect
    hdfs-agent.channels = memoryChannel
    hdfs-agent.sinks = hdfs-write

▪ Next, define the source's properties as follows:

    hdfs-agent.sources.log-collect.type = netcat
    hdfs-agent.sources.log-collect.bind = 133.196.211.209
    hdfs-agent.sources.log-collect.port = 11111
    hdfs-agent.sources.log-collect.channels = memoryChannel

▪ My source is a netcat-like source that listens on port 11111.
▪ Don't forget to define the channel used by the source.
Configuring the Flume-ng agent on the host (2)

▪ We want to collect the log data and write it to the 'testFlume' directory on the HDFS cluster. Therefore, the sink should be defined as follows:

    hdfs-agent.sinks.hdfs-write.type = hdfs
    hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://<namenode>/user/<myusername>/testflume
    hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
    hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
    hdfs-agent.sinks.hdfs-write.channel = memoryChannel

▪ Don't forget to specify the channel used by the sink.
▪ Finally, configure the channel:

    hdfs-agent.channels.memoryChannel.type = memory
    hdfs-agent.channels.memoryChannel.capacity = 1000

▪ The channel stores the log data in memory, with a maximum of 1000 events.
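Taken together, the source, sink, and channel settings above make up the whole flume.conf that the agent is started with on the next slide. A consolidated sketch of that file, with the namenode and username placeholders left exactly as on the slides:

```properties
# Name the components of the hdfs-agent
hdfs-agent.sources = log-collect
hdfs-agent.channels = memoryChannel
hdfs-agent.sinks = hdfs-write

# Source: netcat-like listener on port 11111
hdfs-agent.sources.log-collect.type = netcat
hdfs-agent.sources.log-collect.bind = 133.196.211.209
hdfs-agent.sources.log-collect.port = 11111
hdfs-agent.sources.log-collect.channels = memoryChannel

# Sink: write events as text into the testflume directory on HDFS
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://<namenode>/user/<myusername>/testflume
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.channel = memoryChannel

# Channel: in-memory, holding at most 1000 events
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 1000
```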
Start the Flume agent and get the result

▪ My configuration file name is 'flume.conf', and my agent name is 'hdfs-agent'.
▪ Start the Flume agent using the following command:

    $ flume-ng agent --conf-file flume.conf --name hdfs-agent

▪ Execute genLog.jar on both machines.
▪ On the Flume master, you will be able to see something like this:

    13/12/17 14:36:13 INFO hdfs.BucketWriter: Creating hdfs://<namenode>:8020/user/<my userid>/testflume/FlumeData.1387258572230.tmp
    13/12/17 14:36:19 INFO hdfs.BucketWriter: Renaming hdfs://cmc-cldULL6400.toshiba.co.jp:8020/user/g0092010/testflume/FlumeData.1387258572230.tmp to hdfs://<namenode>:8020/user/<my userid>/testflume/FlumeData.1387258572230

▪ Verify that the log data has been stored as events on HDFS:

    g0092010@cmc-cldULL6400:~$ hadoop fs -cat testflume/*30
    2013-12-17 14:32:19: This is a sample log file from machine 1.
    2013-12-17 14:32:24: This is a sample log file from machine 1.
    2013-12-17 14:32:27: This is a sample log file from machine 2.
    2013-12-17 14:32:29: This is a sample log file from machine 1.
Next steps

▪ Analyse the log data and visualise it in (near) real time.

[Diagram: AGENT 1, AGENT 2, and AGENT 3 feed AGENT 4, which writes to HDFS; analysis and visualisation on top via MapReduce, Hive, Mahout, Impala, and visualisation tools.]

▪ Improving the throughput of the system.
▪ Analysing and predicting future trends.
▪ etc.