SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Downloaden Sie, um offline zu lesen
Hadoop and
                Subsystems
                     in
                 livedoor
                Hadoop Conference Japan
                        2011 Fall
                      2011/09/26
                       tagomoris
2011   9   26
2011   9   26
we are hiring!



2011   9   26
what's livedoor?



2011   9   26
2011   9   26
large scale web services

                     2800+ servers
                      3200+ hosts

                   530+ web servers


2011   9   26
20 Aug 2009




2011   9   26
Aug 2011


                15Gbps
                (10Gbps + CDN 5Gbps)



2011   9   26
Hadoop in livedoor
                • 10 nodes (1+9)
                 • 36 core, 32TB HDFS
                • CDH3b2
                 • with libhdfs, fuse-hdfs
                 • Hive 0.6.0 (community package)




2011   9   26
Hadoop in livedoor
                   data mining
                     reporting
                  page views, unique users,
                  traffic amount per page,
                             ...
2011   9   26
super large scale
                 'sed | grep | wc'
                         with
                Hadoop Streaming + Hive


2011   9   26
httpd logs

                from 96 servers
                   (apache / nginx)




                580GB/day (raw)

2011   9   26
overview



                           hourly
                            daily     on
                  hourly
                                    demand

2011   9   26
topics

                •log delivery network with scribe
                 •and 'scribeline'
                •hive client web application 'shib'


2011   9   26
overview



                           hourly
                            daily     on
                  hourly
                                    demand

2011   9   26
scribe
                       log delivery daemon
                          based on Thrift
                         scalable, reliable
                          supports HDFS

                https://github.com/facebook/scribe



2011   9   26
scribe nodes
                               scribed




                               scribed




                               scribed




2011   9   26
deliver node traffic




2011   9   26
scribe nodes
                               scribed




                               scribed




                               scribed




2011   9   26
what we want
                from scribe agent
                •easy to deploy
                •works w/o any httpd configurations
                •delivery target failover/takeback
                •lightweight (without JVM)
                •stable
2011   9   26
scribe nodes
                               scribed




                               scribed




                scribeline
                               scribed




2011   9   26
scribeline
                     log delivery agent tool
                        python 2.4, thrift

              easy to setup and start/stop
         works without any httpd configurations
            works with logrotate-ed log files
       automatic delivery target failover/takeback

            https://github.com/tagomoris/scribe_line
2011   9   26
how to setup scribeline
                     in livedoor

                       1. yum install scribeline
                (tar xzf && cd && sudo make install)

                     2. vi /etc/scribeline.conf
                       blog /var/log/httpd/access_log
                      blogimg /var/log/nginx/access_log



                   3. /etc/init.d/scribeline start

2011   9   26
scribe nodes
                               scribed




                               scribed




                               scribed




2011   9   26
overview



                           hourly
                            daily     on
                  hourly
                                    demand

2011   9   26
what we want
                 about hive client
                •easy to experiment
                 •from PC on our desks
                •result caching
                •protection against data loss
                •friendly look & feel
2011   9   26
shib
                   hive client web application
                   node.js, thrift, kyoto tycoon

                       query history browser
                query editor, based on copy&paste
                result caching & download tsv/csv
                   filter INSERT/DROP/CREATE ...

                https://github.com/tagomoris/shib
2011   9   26
2011   9   26
shib system overview




2011   9   26
what shib cannot
                     do now
                •access control
                •graph & chart
                •hive 0.7.0+ features support
                 •database, authentication and ...
                •mapreduce status notification
2011   9   26
what we are trying now

                •New cluster
                 •more nodes
                 •CDH3b2 + Hive 0.6.0 -> CDH3u1
                •New tools
                 •Hoop (instead of fuse-hdfs)
                 •Any stream processing framework
2011   9   26
thanks!



2011   9   26

Weitere ähnliche Inhalte

Was ist angesagt?

Openstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - IsraelOpenstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - Israel
Arthur Berezin
 

Was ist angesagt? (20)

CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOKCEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
CEPH DAY BERLIN - DEPLOYING CEPH IN KUBERNETES WITH ROOK
 
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
SAS Institute on Changing All Four Tires While Driving an AdTech Engine at Fu...
 
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with ScyllaScylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla
 
Benchmarking Redis by itself and versus other NoSQL databases
Benchmarking Redis by itself and versus other NoSQL databasesBenchmarking Redis by itself and versus other NoSQL databases
Benchmarking Redis by itself and versus other NoSQL databases
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDS
 
Back your App with MySQL & Redis, the Cloud Foundry Way- Kenny Bastani, Pivotal
Back your App with MySQL & Redis, the Cloud Foundry Way- Kenny Bastani, PivotalBack your App with MySQL & Redis, the Cloud Foundry Way- Kenny Bastani, Pivotal
Back your App with MySQL & Redis, the Cloud Foundry Way- Kenny Bastani, Pivotal
 
Understanding Storage I/O Under Load
Understanding Storage I/O Under LoadUnderstanding Storage I/O Under Load
Understanding Storage I/O Under Load
 
Scylla Summit 2018: Scylla 3.0 and Beyond
Scylla Summit 2018: Scylla 3.0 and BeyondScylla Summit 2018: Scylla 3.0 and Beyond
Scylla Summit 2018: Scylla 3.0 and Beyond
 
Scylla Summit 2018: Getting the Most Out of Scylla on Kubernetes
Scylla Summit 2018: Getting the Most Out of Scylla on KubernetesScylla Summit 2018: Getting the Most Out of Scylla on Kubernetes
Scylla Summit 2018: Getting the Most Out of Scylla on Kubernetes
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Openstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - IsraelOpenstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - Israel
 
Persistent Storage for Containerized Applications
Persistent Storage for Containerized ApplicationsPersistent Storage for Containerized Applications
Persistent Storage for Containerized Applications
 
Ansible and CloudStack
Ansible and CloudStackAnsible and CloudStack
Ansible and CloudStack
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
 
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
 
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times FasterScylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
Scylla Summit 2018: How We Made Large Partition Scans Over Two Times Faster
 
Red Hat Ceph Storage Roadmap: January 2016
Red Hat Ceph Storage Roadmap: January 2016Red Hat Ceph Storage Roadmap: January 2016
Red Hat Ceph Storage Roadmap: January 2016
 

Ähnlich wie Hadoop and subsystems in livedoor #Hcj11f

Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
Deepak Garg
 
Cloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBaseCloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBase
DATAVERSITY
 
Splunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gxSplunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gx
Damien Dallimore
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
Timothy Spann
 

Ähnlich wie Hadoop and subsystems in livedoor #Hcj11f (20)

Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
 
Manuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4octManuel Hurtado. Couchbase paradigma4oct
Manuel Hurtado. Couchbase paradigma4oct
 
Couchbase Day
Couchbase DayCouchbase Day
Couchbase Day
 
Mining Your Logs - Gaining Insight Through Visualization
Mining Your Logs - Gaining Insight Through VisualizationMining Your Logs - Gaining Insight Through Visualization
Mining Your Logs - Gaining Insight Through Visualization
 
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
 
Cloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBaseCloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBase
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
Splunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gxSplunk as a_big_data_platform_for_developers_spring_one2gx
Splunk as a_big_data_platform_for_developers_spring_one2gx
 
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
 
The path to a serverless-native era with Kubernetes
The path to a serverless-native era with KubernetesThe path to a serverless-native era with Kubernetes
The path to a serverless-native era with Kubernetes
 
OpenStack cloud for ConoHa, Z.com and GMO AppsCloud in okinawa opendays 2015 ...
OpenStack cloud for ConoHa, Z.com and GMO AppsCloud in okinawa opendays 2015 ...OpenStack cloud for ConoHa, Z.com and GMO AppsCloud in okinawa opendays 2015 ...
OpenStack cloud for ConoHa, Z.com and GMO AppsCloud in okinawa opendays 2015 ...
 
Build DynamoDB-Compatible Apps with Python
Build DynamoDB-Compatible Apps with PythonBuild DynamoDB-Compatible Apps with Python
Build DynamoDB-Compatible Apps with Python
 
Couchbase Data Pipeline
Couchbase Data PipelineCouchbase Data Pipeline
Couchbase Data Pipeline
 
Achieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with ChefAchieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with Chef
 
How to run a bank on Apache CloudStack
How to run a bank on Apache CloudStackHow to run a bank on Apache CloudStack
How to run a bank on Apache CloudStack
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
dodai grizzly
dodai grizzlydodai grizzly
dodai grizzly
 
OpenStack Deployments with Chef
OpenStack Deployments with ChefOpenStack Deployments with Chef
OpenStack Deployments with Chef
 

Mehr von SATOSHI TAGOMORI

Mehr von SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
To Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT ToTo Have Own Data Analytics Platform, Or NOT To
To Have Own Data Analytics Platform, Or NOT To
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Hadoop and subsystems in livedoor #Hcj11f

  • 1. Hadoop and Subsystems in livedoor Hadoop Conference Japan 2011 Fall 2011/09/26 tagomoris 2011 9 26
  • 2. 2011 9 26
  • 5. 2011 9 26
  • 6. large scale web services 2800+ servers 3200+ hosts 530+ web servers 2011 9 26
  • 8. Aug 2011 15Gbps (10Gbps + CDN 5Gbps) 2011 9 26
  • 9. Hadoop in livedoor • 10 nodes (1+9) • 36 core, 32TB HDFS • CDH3b2 • with libhdfs, fuse-hdfs • Hive 0.6.0 (community package) 2011 9 26
  • 10. Hadoop in livedoor data mining reporting page views, unique users, traffic amount per page, ... 2011 9 26
  • 11. super large scale 'sed | grep | wc' with Hadoop Streaming + Hive 2011 9 26
  • 12. httpd logs from 96 servers (apache / nginx) 580GB/day (raw) 2011 9 26
  • 13. overview hourly daily on hourly demand 2011 9 26
  • 14. topics •log delivery network with scribe •and 'scribeline' •hive client web application 'shib' 2011 9 26
  • 15. overview hourly daily on hourly demand 2011 9 26
  • 16. scribe log delivery daemon based on Thrift scalable, reliable supports HDFS https://github.com/facebook/scribe 2011 9 26
  • 17. scribe nodes scribed scribed scribed 2011 9 26
  • 19. scribe nodes scribed scribed scribed 2011 9 26
  • 20. what we want from scribe agent •easy to deploy •works w/o any httpd configurations •delivery target failover/takeback •lightweight (without JVM) •stable 2011 9 26
  • 21. scribe nodes scribed scribed scribeline scribed 2011 9 26
  • 22. scribeline log delivery agent tool python 2.4, thrift easy to setup and start/stop works without any httpd configurations works with logrotate-ed log files automatic delivery target failover/takeback https://github.com/tagomoris/scribe_line 2011 9 26
  • 23. how to setup scribeline in livedoor 1. yum install scribeline (tar xzf && cd && sudo make install) 2. vi /etc/scribeline.conf blog /var/log/httpd/access_log blogimg /var/log/nginx/access_log 3. /etc/init.d/scribeline start 2011 9 26
  • 24. scribe nodes scribed scribed scribed 2011 9 26
  • 25. overview hourly daily on hourly demand 2011 9 26
  • 26. what we want about hive client •easy to experiment •from PC on our desks •result caching •protection against data loss •friendly look & feel 2011 9 26
  • 27. shib hive client web application node.js, thrift, kyoto tycoon query history browser query editor, based on copy&paste result caching & download tsv/csv filter INSERT/DROP/CREATE ... https://github.com/tagomoris/shib 2011 9 26
  • 28. 2011 9 26
  • 30. what shib cannot do now •access control •graph & chart •hive 0.7.0+ features support •database, authentication and ... •mapreduce status notification 2011 9 26
  • 31. what we are trying now •New cluster •more nodes •CDH3b2 + Hive 0.6.0 -> CDH3u1 •New tools •Hoop (instead of fuse-hdfs) •Any stream processing framework 2011 9 26
  • 32. thanks! 2011 9 26