Wrangling Customer Usage
Data with Hadoop
Clearwire – Thursday, June 27th
Carmen Hall – IT Director
Mathew Johnson – Sr. IT Manager
Starting With…
• …a little ingenuITy!
ingenuITy Day @ Clearwire
• Opportunity for everyone in IT to innovate and present
new and even crazy ideas
• One of those crazy ideas was from Roger Hosto
• Roger had the solution for Clearwire’s Big Data
problem: Hadoop
But Wait!
• Now we had a solution for Big Data
• We needed a Big Data opportunity
• We had just the thing…
The Perfect Problem
• Customer Usage Data – our commodity to Wholesale
partners
Totally (un)Wired
• Americans used more than 1,304 petabytes of
wireless data in 2012 - an increase of 69.3% over the
previous 12 months' usage (827 PB)
• Clearwire processes over 3B individual usage detail
records each month
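
For a sense of scale, the monthly record count can be converted to a sustained rate. This is only a back-of-the-envelope sketch: the 30-day month and the evenly spread load are simplifying assumptions, not figures from the slides.

```python
# Rough sustained rate implied by 3B usage detail records per month.
# Assumptions (not from the slides): 30-day month, evenly distributed load.
RECORDS_PER_MONTH = 3_000_000_000
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

records_per_second = RECORDS_PER_MONTH / SECONDS_PER_MONTH
records_per_hour = records_per_second * 3600

print(f"~{records_per_second:,.0f} records/second")
print(f"~{records_per_hour:,.0f} records/hour")
```

Real traffic is likely to be bursty, so peak rates would be higher than this average.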
Shifting Landscape
• The U.S. wireless industry is a $195.5 billion
enterprise - larger than publishing, agriculture, hotels
and lodging, air transportation and movies – just to
name a few
• Prepaid/Pay-As-You-Go services account for 23.4% of
overall market penetration, which increases exposure to
lost revenue if usage delivery is delayed
• In some cases, a customer can consume data faster
than we can bill for it
Anatomy Of Latency - Legacy
[Diagram: usage data path. Labels: ASN GW, PTS, SPB, AAA, OSS, SDU, Internet, IT Usage Processing, Wholesale Partners. Latency labels: 1 Hour; Up to 90 Minutes.]
Let’s Talk Numbers
• Assume a 2GB plan
• An HD movie from Netflix consumes 2+ GB per hour
• Assume wholesale price = $6/GB
• Assume the retail price for a GB of data (as top up or
overage) ranges from $20 – $100
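
The assumptions above can be worked through as a small calculation. This is a sketch under stated assumptions: the customer starts with a fresh plan and streams continuously, "2+ GB per hour" is taken at its lower bound, and the 2.5-hour lag is the legacy end-to-end delivery time quoted on the results slide. The function and variable names are illustrative only.

```python
# Value of data consumed beyond the plan before any usage is delivered.
# Inputs are the slide's assumptions; names are hypothetical.
PLAN_GB = 2.0
STREAM_GB_PER_HOUR = 2.0            # "2+ GB per hour", lower bound
WHOLESALE_PRICE_PER_GB = 6.0
RETAIL_PRICE_PER_GB = (20.0, 100.0)  # low and high ends of the range


def overage_exposure(lag_hours: float) -> dict:
    """Return usage beyond the plan after streaming for lag_hours, and its value."""
    consumed_gb = STREAM_GB_PER_HOUR * lag_hours
    overage_gb = max(0.0, consumed_gb - PLAN_GB)
    low, high = RETAIL_PRICE_PER_GB
    return {
        "hours_to_exhaust_plan": PLAN_GB / STREAM_GB_PER_HOUR,
        "consumed_gb": consumed_gb,
        "overage_gb": overage_gb,
        "wholesale_value": overage_gb * WHOLESALE_PRICE_PER_GB,
        "retail_value_low": overage_gb * low,
        "retail_value_high": overage_gb * high,
    }


legacy = overage_exposure(2.5)
print(legacy)
```

Under these assumptions the plan is exhausted after one hour, well inside the delivery lag, and the usage beyond the plan is 3 GB per customer before the first record arrives.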
As if that wasn’t enough -
• Clearwire was locked into a very expensive vendor
contract which handled both network provisioning and
usage delivery needs
• Legacy solution was not adaptable or flexible
• We needed something innovative, reliable, internally
supportable, scalable – and we needed it fast
Putting ingenuITy to Work!
• Roger’s idea was suddenly a project
• We needed to build a platform to ingest, process, and
provide cleaned usage data for downstream
applications – and quickly
• We needed:
• A Hadoop Cluster
• 24x7 Operations
• Code to ingest data and handle a myriad of business
rules
• Integration with legacy and new systems
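
To make "ingest data and handle business rules" concrete, here is a minimal sketch of the general idea in plain Python. It is not the Atlas code: the slides do not show the real record layout or rules, so the fields (`record_id`, `subscriber_id`, `bytes_used`) and the three rules below are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical record layout and sample data, for illustration only.
RAW = """record_id,subscriber_id,bytes_used
r1,subA,1000
r2,subB,2500
r1,subA,1000
r3,subA,-5
r4,,700
r5,subB,abc
"""


def clean_and_aggregate(raw_text):
    """Apply illustrative rules; return (bytes per subscriber, rejected count)."""
    totals = defaultdict(int)
    seen_ids = set()
    rejected = 0
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            used = int(row["bytes_used"])
        except (TypeError, ValueError):
            rejected += 1  # rule: usage must be numeric
            continue
        if not row["subscriber_id"] or used < 0 or row["record_id"] in seen_ids:
            rejected += 1  # rules: subscriber required, no negatives, no duplicates
            continue
        seen_ids.add(row["record_id"])
        totals[row["subscriber_id"]] += used
    return dict(totals), rejected


totals, rejected = clean_and_aggregate(RAW)
print(totals, rejected)
```

On a cluster the same kind of filtering and aggregation would be expressed in a distributed tool rather than a single-process loop.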
Atlas was Born
• Development work began immediately on Clearwire’s
private cloud infrastructure
• Selected the Bigtop packaging of Apache Hadoop v1.0.1
• Wrote custom code leveraging Hive and other common
tools to ingest and process data
• Built the infrastructure
Hybrid Approach to Hadoop
• Virtual Edge Nodes
• Leveraged our existing private cloud
• Physical Data Nodes
• Per Unit Cost (Storage & CPU) was lower than
existing infrastructure
• Smaller and more efficient than you think
• 24 data nodes, each with 3TB of usable storage
• Gives us 72TB of usable space
• 3x block replication for production data
• Deployed identical DR/Analytics platform
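
The sizing arithmetic can be checked in a few lines. One assumption is labeled in the code: the slide does not say whether the 72TB figure is counted before or after replication, so the sketch treats it as space before replication and shows what 3x replication would then leave for unique data.

```python
# Cluster sizing from the slide's figures.
DATA_NODES = 24
USABLE_TB_PER_NODE = 3
REPLICATION_FACTOR = 3

total_usable_tb = DATA_NODES * USABLE_TB_PER_NODE

# Assumption (not stated on the slide): the 72TB is measured before
# replication, so each block stored three times consumes 3x its size.
unique_data_tb = total_usable_tb / REPLICATION_FACTOR

print(f"Total usable space: {total_usable_tb} TB")
print(f"Unique data at {REPLICATION_FACTOR}x replication: {unique_data_tb:.0f} TB")
```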
Operational in No Time
• 2.5 months from project approval to production
• Leveraged our existing support organizations
• Solution leveraged common tools and did not require
specialized teams
• Fault tolerance inherent within Hadoop helps us
minimize late night calls
• An endless supply of data was quickly flowing through
the system
• The results were looking good!
Real Results
• 65% improvement in end-to-end delivery times
• From 2.5 hours to 1.3 hours
• Reduced catch-up time from upstream outages by
more than half
• Reduced outage impacts by introducing flexibility to
deliver partial files
• Eliminated 4-hour weekly usage delivery outages tied
to provisioning system maintenance
Anatomy of Latency - Now
[Diagram: usage data path. Labels: ASN GW, PTS, SPB, AAA, OSS, SDU, Internet, Wholesale Partners. Latency labels: 1 Hour; Average of 15 Minutes (Atlas ~6 Minutes, Medusa ~9 Minutes).]
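
The latency labels from the two diagrams can be added up as follows. This is a sketch with two assumptions: the 1-hour label is read as the upstream segment that is unchanged between the two diagrams, and the legacy "up to 90 minutes" is taken at its maximum.

```python
# Latency budget from the diagram labels, in minutes.
# Assumptions: the 1-hour segment is the unchanged upstream portion;
# the legacy IT figure uses the "up to 90 minutes" worst case.
UPSTREAM_MIN = 60
LEGACY_IT_MIN = 90
ATLAS_MIN = 6
MEDUSA_MIN = 9

now_it_min = ATLAS_MIN + MEDUSA_MIN
legacy_total_hours = (UPSTREAM_MIN + LEGACY_IT_MIN) / 60
now_total_hours = (UPSTREAM_MIN + now_it_min) / 60
it_reduction = (LEGACY_IT_MIN - now_it_min) / LEGACY_IT_MIN

print(f"IT processing now: {now_it_min} min")
print(f"End to end: {legacy_total_hours} h legacy, {now_total_hours} h now")
print(f"IT processing segment reduced by {it_reduction:.0%}")
```

The totals line up with the 2.5-hour and roughly 1.3-hour end-to-end figures quoted in the results.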
Real (Financial) Results
• 6-month return on investment
• Delivered at 1/3 the cost of competing solutions
• Foundational – Enabling Wholesale support plan of
legacy platform migration
• Saving Clearwire tens of millions of dollars over life of
contract and internalizing support and development
The Intangibles
• Proved to internal and external partners that we
deliver what we promise with limited negative impacts
to ongoing business
• This was KEY to the speed at which we were able to
migrate our billing platform
• Delivered more than just a single, targeted process –
delivered an enterprise usage platform to grow from
• Kept true to our innovative spirit and the commitment
to IT professionals that they can make a difference
Evolution – Proving More
The Atlas Hadoop platform is now a go-to IT solution
• LTE Usage Data – Now in production
• Other Data Sources - ESR Data
• Data Replication and real-time ETL
• Exploring opportunities with network team to move
closer to usage generation
• Changing mindset of what IT can mean to an
organization
Q & A
