Hooking up Flume with HBase
      LA-HUG Aug’11

        -Dani Abel Rayan
Who am I ?
•   Big Data Ninja at Riot Games
•   Flume Contributor
•   Cloudera Intern Alum
•   Graduated with a Masters in CS from Georgia Tech.
What am I presenting here ?
•   Flume event model
•   HBase data model
•   Compelling reasons to hook ‘em up
•   Configuration examples
•   What are the new upcoming Sinks ?
•   How to write a new Flume Sink
What is needed before we start...
• Understanding of Flume’s architecture
• Usage of Flume’s abstractions such as
  Plugins, Events, Sources, Sinks, Escape Sequences
  and Decorators*
• Understanding of HBase and Hadoop
• Regex
• That’s it!
*http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html
A Quick Glance …
Flume Event Model
• A Flume event has six main fields: Unix timestamp, nanosecond timestamp, priority, source host, body, and a metadata table with an arbitrary number of attribute-value pairs (see the illustrative event below).
• The body is the raw log entry. By default the body is truncated to a maximum of 32KB per event; this limit is configurable.
• One can custom-bucket attributes with the help of escape sequences.
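
An illustrative event for a single /proc/vmstat line (all values hypothetical):

  Unix timestamp : 1313244007
  Nanos          : 24353455
  Priority       : INFO
  Source host    : lahug-demo-01
  Body           : "nr_free_pages 594693"
  Attributes     : { colname = "nr_free_pages", value = "594693" } (added later by a decorator such as regexAll)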
HBase Data Model
What is a Flume Sink ?
Reasons For HBase Sink
• Near Real-Time aggregation of Streaming Data
• Low Latency access to the aggregated data
• Offline Big Data Analytics
Types of Flume HBase Sink
1. hbase(): Highly expressive (see the illustrative call after this list)
hbase("table", "rowkey", "cf1", "c1", "val1"[, "cf2", "c2", "val2", ....] {, writeBufferSize=int, writeToWal=true|false})

2. attr2hbase(): Flexible and powerful semantics, but can be confusing at first glance
attr2hbase("table"[,"sysFamily"[,"writeBody"[,"attrPrefix"[,"writeBufferSize"[,"writeToWal"]]]]])
How to Use a Plugin ?
• Compile. Add the jar with the new plugin classes to Flume's classpath.
• In flume-site.xml, add the class names of the new sources, sinks, and/or decorators to the flume.plugin.classes property (see the sketch after this list).
• Restart the Flume nodes (including the Master).
• To verify that your plugin is loaded, check that it is displayed at http://flume-master:35871/masterext.jsp
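
A minimal flume-site.xml sketch, assuming the Hadoop-style XML configuration format used by Flume 0.9.x (the class name below is a hypothetical placeholder):

<configuration>
  <property>
    <name>flume.plugin.classes</name>
    <!-- comma-separated list of plugin source/sink/decorator classes -->
    <value>com.example.flume.MyStoreSink</value>
  </property>
</configuration>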
hbase()
Source: tail("/proc/vmstat")

nr_free_pages 594693
nr_inactive_anon 1392
nr_active_anon 45259
nr_inactive_file 107132
nr_active_file 141458


Sink:
regexAll("(\w+)\s+(\w+)", "colname", "value")

Resulting Flume Events:

  timestamp   colname            value
  24353457    nr_active_anon     45259
  24353456    nr_inactive_anon   1392
  24353455    nr_free_pages      594693
hbase()
• hbase("tablename", ”%s", ”stats", ”%{colname}", ”%{value}")
use %{nanos} instead of %s if you want nano-second timestamp



  Rowkey     Timestamp   Column Family: stats
  24353455   T1          nr_free_pages = 594693
  24353456   T2          nr_inactive_anon = 1392
  24353457   T3          nr_active_anon = 45259
hbase()
• Thus the FDL syntax would be:

node: tail("/proc/vmstat") | regexAll("(\w+)\s+(\w+)", "colname", "value") collector(300000) { hbase("table", "%s", "stats", "%{colname}", "%{value}") }
Demo
attr2hbase()
• You don't have to list every event attribute you want to store in HBase along with its destination column family and qualifier.

• Sources and/or decorators can produce any (reasonable) number of attributes, with dynamic names (e.g. depending on the values), and they will all be written into HBase.
attr2hbase()
• attr2hbase("table"[,"sysFamily"[,"writeBody"[,"attrPrefix"[,"writeBufferSize"[,"writeToWal"]]]]])
• sysFamily holds the name of the column family used to store "system" data (event timestamp, host, priority).
• If this parameter is absent or equals "", the sink doesn't write "system" data.
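
For example (table and family names hypothetical): attr2hbase("mytable", "sys") writes each event's timestamp, host, and priority into the column family sys.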
attr2hbase()
• writeBody indicates whether the event body should be written along with the other "system" data. By default (when this parameter is absent or equals ""), the body is not written.
• To write the body, this parameter must have the "column-family:qualifier" format; the sink then writes the body to that specific column family and qualifier.
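
For example (names hypothetical): attr2hbase("mytable", "sys", "raw:line") writes each event body into column family raw under qualifier line.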
attr2hbase()
• attrPrefix defines which attributes are written to HBase: every attribute whose name starts with the attrPrefix parameter's value is written. To be written properly, the attribute key should have the format "<attrPrefix><colfam>:<qual>".
• The default value of attrPrefix is "2hb_", so all attributes named "2hb_<colfam>:<qual>" are written to HBase.
• The attribute whose key is exactly "<attrPrefix>" must contain the row key for the Put; if no row key can be extracted, the event is skipped and no record is written to the HBase table.
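
For example, with the default prefix, an event carrying the attributes 2hb_ = "row1" and 2hb_stats:value = "42" (hypothetical values) produces a Put on row "row1" setting stats:value = 42; an event with no 2hb_ attribute is skipped.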
attr2hbase() example
• node: tail("/proc/vmstat") | regexAll("(\w+)\s+(\w+)", "colname", "value") value("2hb_", "%{colname}%s", escape=true) value("2hb_stat:value", "%{value}", escape=true) attr2hbase("table-attr2hbase", "system", "body:contents")



  Rowkey             Timestamp   Column Family: stat
  pgpgin1313244007   t1          value=985543
  pgpgin1313244008   t2          value=985543
  pgpgin1313244009   t3          value=985543
Demo Time
What are the New Plugins ?
• https://cwiki.apache.org/FLUME/flume-plugins.html

• I pushed an OpenTSDB Sink just a few weeks back
How to Contribute a new Plugin ?
• Extend EventSink.Base
• Override open(): set up your connections to the store
• Override append(): every new Event gets processed here; do the "Puts" into the store
• Override close(): clean up the connections, flush to the store, etc.
• Implement a SinkBuilder builder() (a minimal sketch follows)
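
A minimal sketch of such a sink, assuming the classic Flume 0.9.x (CDH3-era) plugin API (EventSink.Base, SinkFactory.SinkBuilder, Context); MyStoreSink, MyStoreClient, and the FDL name mystore are hypothetical placeholders for your own store:

import java.io.IOException;

import com.cloudera.flume.conf.Context;
import com.cloudera.flume.conf.SinkFactory.SinkBuilder;
import com.cloudera.flume.core.Event;
import com.cloudera.flume.core.EventSink;

public class MyStoreSink extends EventSink.Base {
  private final String table;    // sink argument from the FDL spec
  private MyStoreClient client;  // hypothetical client for your store

  public MyStoreSink(String table) {
    this.table = table;
  }

  @Override
  public void open() throws IOException {
    // Set up the connection to the store before any events arrive.
    client = MyStoreClient.connect(table);
  }

  @Override
  public void append(Event e) throws IOException {
    // Called once per event: do the "Put" into the store here.
    client.put(e.getTimestamp(), e.getBody());
    super.append(e); // let Base update its counters/reporting
  }

  @Override
  public void close() throws IOException {
    // Flush buffered writes and release the connection.
    client.flush();
    client.close();
  }

  // Wires the sink into FDL, e.g. mystore("mytable")
  public static SinkBuilder builder() {
    return new SinkBuilder() {
      @Override
      public EventSink build(Context ctx, String... argv) {
        if (argv.length != 1) {
          throw new IllegalArgumentException("usage: mystore(\"table\")");
        }
        return new MyStoreSink(argv[0]);
      }
    };
  }
}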
My Contacts
• drayan@riotgames.com
• dr@verticalengine.com
• Twitter: rayanandi

             P.S. We are Hiring!
GOOD LUCK,
 HAVE FUN!
           Play Free!
http://www.leagueoflegends.com/

Speaker Notes

  1. ----- Meeting Notes (8/17/11 16:51) ----- Good evening, gentlemen. I'm Dani. ----- Meeting Notes (8/17/11 17:01) ----- Let's see how to hook up these guys, Flume and HBase.
  2. ----- Meeting Notes (8/17/11 16:51) ----- Just a brief background. Several patches to Flume: 1. Flogger 2. A few things in the HBase sink 3. Recently contributed the OpenTSDB sink.
  3. ----- Meeting Notes (8/17/11 16:51) ----- My assumption is that folks here know what Flume does and what HBase does, so I'm focusing on...
  4. ----- Meeting Notes (8/17/11 17:08) ----- If anyone hasn't used Flume or HBase, let me know.
  5. ----- Meeting Notes (8/17/11 17:08) ----- I can take more questions at the end of the presentation.
  6. ----- Meeting Notes (8/17/11 17:11) ----- Single row, a million column names.
  7. ----- Meeting Notes (8/17/11 17:11) ----- Check out the Flume User Guide.
  8. ----- Meeting Notes (8/17/11 17:14) ----- HBase is integrated with Hive and MR.
  9. ----- Meeting Notes (8/17/11 17:14) ----- Those who haven't used it: just think of it as "which of the overloaded functions" Flume has to use. You can change the parameters at run time.
  10. ----- Meeting Notes (8/17/11 17:15) ----- In daemon mode: flume-env.sh
  11. Just put LAHUG in the subject line.
  12. WE WOULD LOVE TO HOST THE NEXT HADOOP MEETUP. OpenTSDB... it goes a step further and gives you awesome graphs for your data.