Analyzing Logs/Configs of 200'000 Systems with Hadoop
christoph.schnidrig@netapp.com

What is AutoSupport?

• AutoSupport is NetApp's 'phone home' mechanism
• Each AutoSupport message (ASUP) is a collection of:
  – Log files
  – XML files
  – Command output captures
  – Counter Manager output

Business Challenges

Gateways
• 600K ASUPs every week
• 40% of them arrive over the weekend
• 2 TB of growth every week

ETL
• Data must be parsed and loaded within 15 minutes

Data Warehouse
• Only 5% of the data goes into the data warehouse
• The Oracle DBMS is struggling to scale; maintenance and backups are challenging
• No easy way to access this unstructured content

Reporting
• Numerous data-mining requests currently go unsatisfied
• Huge untapped potential of valuable information for lead generation, supportability, and BI

Finally, the incoming load doubles every 16 months!

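The gateway figures above translate into average arrival rates. This sketch assumes the weekend spans 48 hours and the rest of the week 120 hours, a split the slide does not define. Since a peak rate can never be lower than the average, the results are only a floor for what the 15-minute ETL window has to absorb.

```python
# Average ASUP arrival rates derived from the slide's gateway figures:
# 600K ASUPs per week, 40% of them arriving over the weekend.
# Assumption (not on the slide): weekend = 48 hours, weekdays = 120 hours.

WEEKLY_ASUPS = 600_000
WEEKEND_SHARE = 0.40
WEEKEND_HOURS = 48
WEEKDAY_HOURS = 7 * 24 - WEEKEND_HOURS  # 120 hours


def per_minute(count: float, hours: float) -> float:
    """Average arrival rate in ASUPs per minute over a window of `hours`."""
    return count / (hours * 60)


def ingest_rates() -> dict[str, float]:
    """Average per-minute rates for the weekend and weekday windows."""
    weekend = WEEKLY_ASUPS * WEEKEND_SHARE
    weekday = WEEKLY_ASUPS - weekend
    return {
        "weekend_per_min": per_minute(weekend, WEEKEND_HOURS),
        "weekday_per_min": per_minute(weekday, WEEKDAY_HOURS),
    }


if __name__ == "__main__":
    for name, rate in ingest_rates().items():
        print(f"{name}: {rate:.1f} ASUPs/min")
    # weekend_per_min: 83.3 ASUPs/min
    # weekday_per_min: 50.0 ASUPs/min
```

Under these assumptions the weekend average is roughly two thirds higher than the weekday average, which is why the slide calls out the weekend share separately.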
Hadoop Architecture

[Diagram]

Solution Architecture

[Diagram]

Client Apps – how the customer sees it

[Diagram]

Physical Architecture

[Diagram: cluster hardware layout]
• 12 data nodes: 12 cores, 48 GB RAM each
• 3 E-series (E2600) storage arrays (~600 TB)
• Name Node, Secondary Name Node and Job Tracker
• FAS2040 storage system (controllers A and B)
• Networks: 1 Gb Ethernet and 10 Gb Ethernet

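The hardware list implies some aggregate figures. This sketch assumes the ~600 TB is the raw total across all three arrays, that the data nodes are spread evenly over the arrays, and that usable HDFS space is raw space divided by the replication factor (HDFS defaults to 3). RAID overhead on the arrays is ignored, so real usable space would be lower.

```python
# Aggregate capacity of the cluster described on the slide.
# Assumptions (not on the slide): ~600 TB is the raw total over all arrays,
# nodes are split evenly across arrays, and RAID overhead is ignored.

from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    data_nodes: int = 12
    cores_per_node: int = 12
    ram_gb_per_node: int = 48
    arrays: int = 3
    raw_tb: float = 600.0

    @property
    def total_cores(self) -> int:
        return self.data_nodes * self.cores_per_node

    @property
    def total_ram_gb(self) -> int:
        return self.data_nodes * self.ram_gb_per_node

    @property
    def nodes_per_array(self) -> float:
        # Even split is an assumption about how nodes attach to arrays.
        return self.data_nodes / self.arrays

    def usable_tb(self, replication: int = 3) -> float:
        """Raw capacity divided by the HDFS replication factor (default 3)."""
        return self.raw_tb / replication


if __name__ == "__main__":
    c = Cluster()
    print(c.total_cores, c.total_ram_gb, c.nodes_per_array)  # 144 576 4.0
    print(c.usable_tb(), c.usable_tb(replication=2))         # 200.0 300.0
```

The second call only shows how strongly the replication factor drives usable capacity; the slide does not state which factor the cluster used.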
Some performance numbers

Metric                                   Hadoop
Raw ASUP ingest throughput               1000 ASUPs/min or 1.5 GB/min
ASUP configuration data parse & load     1000 ASUPs/min
Event messages (EMS) process & load      < 1 hour for 2 billion records (~> 200 GB/hour)
EMS ad-hoc analysis                      4-6M records/sec (~200 MB/sec on LZO-compressed data)

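The table's figures imply some average sizes and rates. This sketch only divides the numbers as given, assuming decimal units (1 GB = 10^9 bytes), which the slide does not specify. Because the EMS figures are stated as bounds ("< 1 hour", "> 200 GB/hour"), the results are rough approximations, not measurements.

```python
# Implied averages derived only from the performance table's figures.
# Assumption: decimal units (the slide does not say GB vs GiB).

GB = 10**9
MB = 10**6


def avg_asup_mb(asups_per_min: float = 1000, gb_per_min: float = 1.5) -> float:
    """Average ASUP size in MB at the raw ingest rate."""
    return gb_per_min * GB / asups_per_min / MB


def ems_load_records_per_sec(records: float = 2e9, hours: float = 1.0) -> float:
    """Load rate if 2 billion records take one hour; the slide says '< 1 hour',
    so this is a lower bound."""
    return records / (hours * 3600)


def ems_bytes_per_record(gb_per_hour: float = 200, records_per_hour: float = 2e9) -> float:
    """Approximate EMS record size if ~200 GB corresponds to ~2 billion records."""
    return gb_per_hour * GB / records_per_hour


def adhoc_compressed_bytes_per_record(
    mb_per_sec: float = 200, records_per_sec: tuple[float, ...] = (4e6, 6e6)
) -> tuple[float, ...]:
    """Compressed (LZO) bytes per record at the ad-hoc scan rates."""
    return tuple(mb_per_sec * MB / r for r in records_per_sec)


if __name__ == "__main__":
    print(avg_asup_mb())                        # 1.5
    print(round(ems_load_records_per_sec()))    # 555556
    print(ems_bytes_per_record())               # 100.0
    print(adhoc_compressed_bytes_per_record())  # (50.0, 33.333333333333336)
```

Read together, the numbers suggest an average ASUP of about 1.5 MB and EMS records of roughly 100 bytes uncompressed, or 33 to 50 bytes after LZO compression.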
New possibilities with Hadoop

• Correlate disk latency (hot) with disk type
  – 24 billion records
  – 4 weeks to run the query before Hadoop
  – Hadoop implementation: 10.5 hours
• Bug detection through pattern matching
  – 240 billion records: too large to run before Hadoop
  – Hadoop implementation: 18 hours

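Both queries are batch scans that group or filter billions of records, the workload MapReduce is designed for. The pure-Python sketch below imitates the map, shuffle and reduce phases for the latency-versus-disk-type question. It does not use the Hadoop API, and the record layout `(disk_type, latency_ms)` and the 50 ms "hot" threshold are hypothetical, not taken from the AutoSupport schema.

```python
# In-memory imitation of the MapReduce pattern for correlating hot disk
# latency with disk type. Hypothetical: the (disk_type, latency_ms) record
# layout and the HOT_LATENCY_MS threshold are illustrative only.

from collections import defaultdict
from typing import Iterable, Iterator

HOT_LATENCY_MS = 50.0  # hypothetical threshold for a "hot" latency sample


def map_phase(records: Iterable[tuple[str, float]]) -> Iterator[tuple[str, tuple[int, int]]]:
    """Emit (disk_type, (is_hot, 1)) for every latency sample."""
    for disk_type, latency_ms in records:
        yield disk_type, (int(latency_ms > HOT_LATENCY_MS), 1)


def shuffle(pairs: Iterable[tuple[str, tuple[int, int]]]) -> dict[str, list[tuple[int, int]]]:
    """Group mapper output by key, as Hadoop's shuffle/sort step would."""
    groups: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[tuple[int, int]]]) -> dict[str, float]:
    """Fraction of samples per disk type that exceed the hot threshold."""
    result = {}
    for disk_type, values in groups.items():
        hot = sum(h for h, _ in values)
        total = sum(n for _, n in values)
        result[disk_type] = hot / total
    return result


if __name__ == "__main__":
    samples = [("SATA", 80.0), ("SATA", 12.0), ("SAS", 8.0), ("SAS", 9.5), ("SSD", 0.4)]
    print(reduce_phase(shuffle(map_phase(samples))))
    # {'SATA': 0.5, 'SAS': 0.0, 'SSD': 0.0}
```

Because the `(hot, total)` pairs sum associatively, the same summing logic could also run as a combiner on each node to shrink shuffle traffic. Taking the slide's figures at face value, 4 weeks (672 hours) versus 10.5 hours is a 64x reduction; 24 billion records in 10.5 hours averages about 635,000 records/s, and 240 billion records in 18 hours about 3.7 million records/s.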
Incoming AutoSupport Volumes and TB Consumption

[Chart: Flat-File Storage Requirement. Total usage and projected total usage (TB, axis 0–3500), Jan 2005 to Jan 2016; the projection doubles.]

• At the current projected rate of growth, total storage requirements continue doubling every 16 months
• Cost model: > $15M per year in ecosystem costs

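The growth claim can be written as T(m) = T_0 · 2^(m/16), where m is the number of months from a starting usage T_0. The chart's data points are not recoverable, so the sketch uses a hypothetical starting value of 1000 TB purely for illustration.

```python
# Projection under the slide's growth model: storage doubles every 16 months.
# The 1000 TB starting value in the example is hypothetical.

DOUBLING_MONTHS = 16


def projected_tb(current_tb: float, months_ahead: float) -> float:
    """Storage requirement after `months_ahead` months of doubling growth."""
    return current_tb * 2 ** (months_ahead / DOUBLING_MONTHS)


def annual_growth_factor() -> float:
    """Growth over 12 months when the doubling period is 16 months."""
    return 2 ** (12 / DOUBLING_MONTHS)


if __name__ == "__main__":
    print(f"{annual_growth_factor():.3f}")  # 1.682
    print(projected_tb(1000, 48))           # 8000.0
```

A 16-month doubling period works out to roughly 68% growth per year, so three doubling periods (4 years) multiply the requirement by eight.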
References
• NetApp Accelerates AutoSupport Analytics with NetApp Open Solution for Hadoop
  http://media.netapp.com/documents/asup-hadoop.pdf
• NetApp Open Solution for Hadoop Solutions Guide
  http://media.netapp.com/documents/tr-3969.pdf
• ESG: Lab Validation Report
  http://media.netapp.com/documents/ar-esg-netapp-open-solution.pdf

16.07.12 Analyzing Logs/Configs of 200'000 Systems with Hadoop (Christoph Schnidrig, NetApp)
