Leveraging Hadoop Clusters for Carrier-Grade Applications




                             Copyright © 2011 Flytxt B.V. All rights reserved.   1/17/2012
No Personalization


Service discovery




   600-800 GB of CDRs (call detail records) per day
    ◦   GPRS signaling: 50 GB/day
    ◦   3G signaling: 300 GB/day
    ◦   Voice: 100 GB/day
    ◦   SMS: 200 GB/day
   100-200 GB/day of web data
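To make these volumes concrete, here is a back-of-envelope sizing sketch. It only adds up the per-day figures listed above; the HDFS replication factor of 3 (the HDFS default for dfs.replication), the absence of compression, and the 30-day retention window are assumptions for illustration, not figures from the slide.

```python
# Back-of-envelope HDFS sizing from the daily volumes on this slide.
# Assumptions (not from the slide): replication factor 3 (HDFS default),
# no compression, and a hypothetical 30-day retention window.

CDR_GB_PER_DAY = {
    "GPRS signaling": 50,
    "3G signaling": 300,
    "Voice": 100,
    "SMS": 200,
}
WEB_GB_PER_DAY = (100, 200)  # low and high estimates from the slide


def daily_raw_gb(web_gb: int) -> int:
    """Total raw ingest per day: all CDR streams plus web data."""
    return sum(CDR_GB_PER_DAY.values()) + web_gb


def hdfs_tb(days: int, web_gb: int, replication: int = 3) -> float:
    """Disk space (decimal TB) that `days` of raw data occupies in HDFS."""
    return daily_raw_gb(web_gb) * replication * days / 1000


if __name__ == "__main__":
    print("CDR subtotal:", sum(CDR_GB_PER_DAY.values()), "GB/day")
    for web in WEB_GB_PER_DAY:
        print(f"web={web} GB/day -> raw {daily_raw_gb(web)} GB/day, "
              f"30-day HDFS footprint {hdfs_tb(30, web):.1f} TB")
```

The four listed streams add up to 650 GB/day, which sits inside the stated 600-800 GB range; with web data included, the raw daily ingest comes to roughly 750-850 GB before replication.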



Mammoth Data
Data Analysis




   A framework for distributed processing of large data sets
    across clusters of computers
   Consists of
    ◦ Hadoop Distributed File System, aka HDFS (file system)
    ◦ Hadoop MapReduce (programming model)
   Characteristics
    ◦ Performance should scale linearly as nodes are added
    ◦ Computation moves to the data, not the data to the computation
    ◦ Simple core; modular and extensible
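The MapReduce programming model can be illustrated with a single-process sketch: a map phase emits (key, value) pairs, a shuffle groups values by key, and a reduce phase folds each group. Real Hadoop runs these phases across many nodes and schedules map tasks next to the HDFS blocks they read; this toy version only shows the data flow. The comma-separated "caller,callee,duration" CDR format is a hypothetical example.

```python
# Single-process sketch of MapReduce: map -> shuffle (group by key) -> reduce.
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, int]


def map_reduce(records: Iterable[str],
               mapper: Callable[[str], Iterable[KV]],
               reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        # Map phase: each record independently emits (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values for the same key together.
            groups[key].append(value)
    # Reduce phase: one call per distinct key, visited in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def count_calls_mapper(cdr_line: str) -> Iterable[KV]:
    # Hypothetical CDR format "caller,callee,duration": count calls per caller.
    caller = cdr_line.split(",")[0]
    yield caller, 1


def sum_reducer(_key: str, values: List[int]) -> int:
    return sum(values)


if __name__ == "__main__":
    cdrs = ["alice,bob,30", "bob,carol,12", "alice,carol,5"]
    print(map_reduce(cdrs, count_calls_mapper, sum_reducer))
    # {'alice': 2, 'bob': 1}
```

Because the mapper sees one record at a time and the reducer sees one key at a time, both can be spread over many machines without coordination, which is what lets performance grow with cluster size.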



   Current bottlenecks

    ◦ Data resides on multiple nodes/zones/VM instances, and there is no
      elegant, reliable and efficient way of extracting it

    ◦ Loading terabytes of data into a database is slow

    ◦ Parallel computing is not possible in conventional BI ETL

    ◦ User profile and application data reside in a DB that can only
      scale vertically




   Structured Data



         sqoop import --connect jdbc:mysql://db.example.com/website \
           --table USERS --as-sequencefile
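The Sqoop import above runs as parallel map tasks rather than a single serial extract. By default Sqoop splits the work on the table's primary key (overridable with --split-by) across four map tasks. The sketch below illustrates the idea: read MIN and MAX of a numeric key, cut that range into non-overlapping intervals, and give each mapper a bounded query. The boundary arithmetic here is our own simplification, not Sqoop's implementation, and the `id` column is a hypothetical key.

```python
# Simplified sketch of splitting a table import across parallel mappers.
from typing import List, Tuple


def split_key_range(min_key: int, max_key: int, num_mappers: int = 4) -> List[Tuple[int, int]]:
    """Return inclusive (lo, hi) key ranges that exactly cover [min_key, max_key]."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if max_key < min_key:
        raise ValueError("max_key must be >= min_key")
    total = max_key - min_key + 1
    num_mappers = min(num_mappers, total)  # never create empty splits
    base, extra = divmod(total, num_mappers)
    ranges = []
    lo = min_key
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first splits
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def split_query(table: str, key: str, lo: int, hi: int) -> str:
    # Each mapper would run one bounded query like this against the source DB.
    return f"SELECT * FROM {table} WHERE {key} >= {lo} AND {key} <= {hi}"


if __name__ == "__main__":
    # Hypothetical USERS table whose numeric key `id` runs from 1 to 10.
    for lo, hi in split_key_range(1, 10):
        print(split_query("USERS", "id", lo, hi))
```

Evenly sized key ranges only give balanced mappers when keys are roughly uniformly distributed; a heavily skewed key column would leave some mappers doing most of the work.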



   Unstructured Data




   A distributed data collection server
    ◦   Scalable
    ◦   Configurable
    ◦   Extensible
    ◦   Manageable


   Built around the concept of flows
    ◦ A single flow corresponds to one type of data source
    ◦ Compression, batching and reliability settings are configured per flow


   Data comes in through a source
    ◦ is optionally processed by one or more decorators
    ◦ and is transmitted out via a sink
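The flow concept (source, optional decorators, sink) can be modeled in a few lines. This is a toy illustration only: every class and function name below is hypothetical, not the collector's actual API, and of the per-flow settings it models batching alone.

```python
# Toy model of a flow: source -> decorator chain -> batched sink.
# All names here are hypothetical illustrations, not a real collector API.
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

Event = str
# A decorator transforms an event, or returns None to drop it.
Decorator = Callable[[Event], Optional[Event]]


@dataclass
class ListSink:
    """In-memory stand-in for a real sink such as HDFS; records each batch."""
    batches: List[List[Event]] = field(default_factory=list)

    def write_batch(self, batch: List[Event]) -> None:
        self.batches.append(list(batch))  # copy, since the caller reuses its list


@dataclass
class Flow:
    """One flow: a source, an optional decorator chain, and a sink."""
    source: Iterable[Event]
    decorators: List[Decorator]
    sink: ListSink
    batch_size: int = 2  # per-flow batching setting

    def run(self) -> None:
        batch: List[Event] = []
        for raw in self.source:
            event: Optional[Event] = raw
            for decorate in self.decorators:
                event = decorate(event)
                if event is None:
                    break  # a decorator filtered this event out
            if event is None:
                continue
            batch.append(event)
            if len(batch) == self.batch_size:
                self.sink.write_batch(batch)
                batch.clear()
        if batch:  # flush the last, partially filled batch
            self.sink.write_batch(batch)


def drop_empty(event: Event) -> Optional[Event]:
    return event if event.strip() else None


if __name__ == "__main__":
    sink = ListSink()
    Flow(["sms a", "", "voice b", "gprs c"], [drop_empty, str.upper], sink).run()
    print(sink.batches)  # [['SMS A', 'VOICE B'], ['GPRS C']]
```

Keeping decorators as plain functions makes the chain easy to extend per data source, which mirrors the "one flow per type of data source" design on the slide.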




   MapReduce is very powerful, but:
    ◦ It requires Java programming
    ◦ Users have to re-invent common functionality (join, filter, etc.)

   Pig is an execution engine atop Hadoop

   Pig provides a higher-level language, Pig Latin

   Opens the system to non-Java programmers

   Provides common operations like join, group, filter, sort




   Web log processing
   Data processing for web search platforms
   Ad hoc queries across large data sets
   Rapid prototyping of algorithms for processing large data sets
   The Pig client runs on a local machine and its jobs execute on the
    Hadoop cluster; the -x local flag used here runs the job locally
    instead, which is convenient for testing
       $ cd /usr/share/cloudera/pig/
       $ bin/pig -x local
       grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
       grunt> grpd = GROUP log BY user;
       grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
       grunt> STORE cntd INTO 'output';
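For readers more familiar with general-purpose code, here is what the Pig script above computes, written as plain Python over a small in-memory sample: load tab-separated (user, timestamp, query) rows, group them by user, and count each group. It assumes the log is tab-delimited, which matches Pig's default PigStorage loader; the sample rows are made up.

```python
# Plain-Python equivalent of: LOAD ... AS (user, timestamp, query);
# GROUP log BY user; FOREACH grpd GENERATE group, COUNT(log);
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def count_queries_per_user(log: TextIO) -> Dict[str, int]:
    counts: Counter = Counter()
    # Tab-delimited, no quote handling (queries may contain quote characters).
    for row in csv.reader(log, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        user = row[0]      # first field, like `user` in the LOAD schema
        counts[user] += 1  # GROUP BY user + COUNT
    return dict(counts)


if __name__ == "__main__":
    sample = io.StringIO(
        "u1\t970916105432\tbasketball\n"
        "u2\t970916105433\tweather\n"
        "u1\t970916105440\tnba scores\n"
    )
    # STORE would write one "user<TAB>count" line per group into 'output'.
    for user, n in sorted(count_queries_per_user(sample).items()):
        print(f"{user}\t{n}")
```

The Pig version is four statements and runs unchanged on a full cluster, whereas this in-memory version is limited to what fits on one machine, which is the trade-off the previous slide describes.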




   System for querying and managing structured data
   Built on top of Hadoop
   Uses MapReduce for execution
   SQL-like syntax; supports
    ◦   FROM-clause subqueries
    ◦   ANSI JOIN (equi-joins only)
    ◦   Multi-table insert
    ◦   Multi group-by
    ◦   Sampling
    ◦   Object traversal
   Engagement
    ◦ Summarization
    ◦ Ad hoc analysis
    ◦ Spam detection
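Since Hive executes queries as MapReduce jobs, an equi-join has to be expressed in map and reduce steps. One common strategy is a reduce-side join: mappers tag each row with its source table and emit it under the join key, the shuffle brings equal keys together, and the reducer pairs left and right rows. The sketch below shows that idea only; Hive's planner also has other strategies (for example map-side joins), and the subscriber/CDR tables and the `msisdn` column are hypothetical.

```python
# Simplified reduce-side equi-join (inner join) in a single process.
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def reduce_side_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    # Map + shuffle: bucket rows by join-key value, tagged by side ("L"/"R").
    buckets: Dict[str, Dict[str, List[Row]]] = defaultdict(lambda: {"L": [], "R": []})
    for row in left:
        buckets[row[key]]["L"].append(row)
    for row in right:
        buckets[row[key]]["R"].append(row)
    # Reduce: for each key, emit every left/right combination.
    joined: List[Tuple[Row, Row]] = []
    for k in sorted(buckets):
        for l_row in buckets[k]["L"]:
            for r_row in buckets[k]["R"]:
                joined.append((l_row, r_row))
    return joined


if __name__ == "__main__":
    subscribers = [{"msisdn": "111", "plan": "prepaid"},
                   {"msisdn": "222", "plan": "postpaid"}]
    cdrs = [{"msisdn": "111", "type": "SMS"},
            {"msisdn": "111", "type": "Voice"},
            {"msisdn": "333", "type": "SMS"}]
    for sub, cdr in reduce_side_join(subscribers, cdrs, "msisdn"):
        print(sub["msisdn"], sub["plan"], cdr["type"])
```

Only equality on the key can be matched this way, because rows meet in a reducer solely when their key values hash to the same bucket; this is one reason early Hive restricted joins to equi-joins.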



Feature                          Hive                              Pig
Language                         SQL-like                          Pig Latin
Schemas/Types                    Yes (explicit)                    Yes (implicit)
Partitions                       Yes                               No
Server                           Optional (Thrift)                 No
User Defined Functions           Yes                               Yes
Custom Serializer/Deserializer   Yes                               Yes
DFS Direct Access                Yes (implicit)                    Yes (explicit)
Join/Order/Sort                  Yes                               Yes
Shell                            Yes                               Yes
Streaming                        Yes                               No
Web Interface                    Yes                               No
JDBC/ODBC                        Yes (limited)                     No




