The idea was to predict customers' experience and perception of the O2 network, at both the user and area levels, to drive network and marketing investments. Here is why and how we got there.
To measure and predict customer network experience, O2 needed a streaming big data solution that could consume billions of events coming in from the network in real time and measure the performance of the network as experienced by the customer. It was important to build a platform that gathers all the relevant data and correlates it with the customer satisfaction index (CSI) surveys to understand how the metrics relate to the score. We applied machine learning methods to predict the CSI for all users on the network. Customer insights from the network helped us build customer segmentations, which are shaping various marketing and digital propositions at O2.
- The overall solution was based on a hybrid architecture in which open source technologies were combined with Tableau visualization, enabling O2 to keep maintenance costs to a minimum.
- For a quick ROI, the solution was built as a prototype that continued to evolve. It now handles 30 billion transactions a day, continuously streaming into the platform, and predicts customer experience for 35m+ users.
The O2 solution continued to expand every year to accommodate multi-fold growth in traffic and additional features. The decision to move from a community edition of Hadoop to the Hortonworks-based platform gave us a supported, faster and more reliable service. The migration to Hortonworks was completed in October 2018, giving us a reliable platform on which to expand the analytics use cases across the wider O2 businesses.
Driving Network and Marketing Investments at O2 by Focusing on Improving the Customer Experience
1. Driving Network and Marketing Investments at O2 by Focusing on Improving the Customer Experience
Ankur Agarwal
Network Data & Analytics Lead
Ajay Kaushik
Data Platform Design Lead
Evolution of O2 Network Data Analytics Hub
2. About the Speakers today
Ankur Agarwal
Ankur currently drives the data strategy and roadmap of Network Data & Analytics at O2. Prior to this role, Ankur led the BI and Big Data design at O2, where he owned the data architecture, design and validation of data deliveries.
Before joining O2, Ankur led the Presales and Solutions function for the Data and Analytics practice at TCS, where he helped a number of clients in the UK and EU with their data initiatives.
Ajay Kaushik
Ajay is a Platform Design Lead in Telefónica’s Big Data Analytics team with a diverse background in systems engineering and platform design. He has wide-ranging experience in the digital and network domain, evangelising DevOps.
3. O2 UK - Telefónica
O2 is a mobile network operator and the principal commercial brand of Telefónica UK Limited, which is part of the global telecommunications group Telefónica S.A., headquartered in Spain and operating in Europe and North, Central and South America.
O2 was awarded Best Network Coverage in 2019 by uSwitch for a second year in a row. With over 32 million connections to the network, it runs 2G, 3G and 4G services across the UK, as well as operating its nationwide O2 Wifi service.
The company is the network of choice for mobile virtual network operators such as giffgaff, Sky Mobile and Lyca Mobile, and manages a 50:50 joint venture with Tesco for Tesco Mobile.
O2 has around 6,700 employees and over 450 retail stores, and sponsors England Rugby, The O2 and 19 O2 Academy music venues across the UK. Through a comprehensive sustainability strategy, O2 is also creating work experience opportunities for 16-24 year olds via its GoThinkBig platform, enabling customers to reduce their impact on the environment by recycling their old devices through O2 Recycle and, in partnership with the NSPCC, helping parents to keep their children safe online.
O2 is the only mobile operator in the 2018 Social Mobility Employer Index and was named as one of the best places to work in the 2019 Glassdoor Employees’ Choice Award.
4. Background
• Specific focus was to be given to keeping the maintenance cost ultra low
• O2 needed a data platform to ingest network events for measuring & predicting customer network experience
• A lot of manual intervention was required to correlate the data from a number of data sources
• Very large volumes of structured and semi-structured data had to be ingested in real time without overhead on the network
• Complex enrichment required to correlate the data in the platform during ingestion
• Requirement for consistent transformation rules and data governance & quality monitoring
5. Key Architectural Decision for building the platform
• Build vs Buy
• Open Source or not
• Transform & Enrich Data while in motion or at rest
• Start small with a prototype & evolve
• On-premise vs Cloud
• To Hadoop or not
6. HDP
High Level Architecture (diagram)
Layers: Data Sources, Data Ingestion (streaming and mini batches), Data Storage & Analytics, Data Presentation, Data Consumption
Data sources: Web, Mobility, Probes, CRM, DPI, GSMA, Surveys, Alarms, Tickets
Ingestion: Micro services based ETL Engine, HDF
Other components shown: Features, KPIs, M/L Models, Master data, Aggregated views, Hive, Tableau Extracts, Data feeds, API, MongoDB ODS
Cluster sizing shown: 30TB RAM, 4500 VCores, 2.5 PB; 20TB RAM, 3800 VCores, 650 TB
Volume: 30B+ Daily Events
7. Technology Selection and Considerations
Data Loading / Storage / Lineage
• Exploiting the capability of HDF and HDP to meet the business requirements.
• Easily expandable for future use cases.
• Deployed on standard commodity hardware.
ETL
• Micro services based ETL platform
• Complex enrichment capability during ingestion
Reporting
• O2 selected tool for Discovery & Visualisation.
• Hosted on-premise & (coming up in) Cloud.
Data Science Toolkit
• Open source tools chosen for having continuous contribution from developers
• Deployed on GPU machines
8. Platform Evolution
Platform Evolution (timeline diagram)
Years: 2014, 2015-16, 2017, 2018, 2019
Dimensions: Analytics, Data Management, Platform Capability, Cluster Size
Cluster sizes shown: 42 Nodes, 200 Nodes, 260 Nodes, 320 Nodes
Milestones shown: Micro Service Based ETL; Spark Adoption for performance; Introduced Self Service; Migration to Supported HDP; Data Governance Tools; Policy Based Security; E2E Lineage; Data Encryption; Virtual Drive Test Experience; Customer Segmentation (Personalization); CSI Predictor (NCX); NCX Predictor (Voice); Data APIs; ML Capability; NLP; NCX What-if; Hybrid Cloud; Analytics APIs; Hybrid
9. CSI Predictor (NCX)
CSI Predictor data flow (diagram)
Inputs shown: Weblogs, IPRF, Magnet, Arcanum, GSMA, User Catalogue, Fanbase, Web Analytics, Mobility, Signalling, Calls, MME, CRM
Stages shown: Feature Table (FT) Day 1 to Day 7; Feature Table Aggregated (FT); ML; Feature Table Aggregated Scored (NCX); Hive
10. NCX driving focus on Network & Marketing investments
• Marketing - NCX as a driver of customer communication
  • Always-on marketing campaign highlighting customers who have had an x improvement in their NCX score
  • Use individual NCX as a post-disruption targeting mechanism
  • Experience / improvement reinforcement message to people who we know have had their experience improved
• Customer Service
  • Use NCX to identify if there is a tipping point at which customers churn or complain
• Networks
  • Driving end to end network performance
  • Impact on customer experience due to network roll out and changes
  • Strategic network forecasting
  • Using NCX to prioritise capacity, coverage and technology investment
11. O2 Network Data & Analytics Platform
Spokes: O2 Labs, Marketing D&A, TEF Research & Innovation, Digital, Network Ops & Performance, Revenue Assurance, GiffGaff/Sky, Smart Metering
Data, Platform & Analytics capabilities are consumed by all the spokes.
Analytics products like automated Anomaly Detection are developed jointly between the Netpulse & TEF Research & Innovation teams.
Collaboration between Netpulse, O2 Labs and the Marketing D&A team to jointly evolve more re-usable analytics products like NLP, initially built by Netpulse.
Extending the analytics development capability to the Digital team to enhance the Smartsteps & Smartcities products using customer insight generated from network data.
And now we are.. Network Data & Analytics Hub
12. Considerations and Lessons Learnt
Build the Datalake on specific Business Value
Always build a Datalake on defined use cases that have business value from day 1. This ensures that the lake won’t turn into a very expensive resource with no financial return for the business.
Utilise the experts – Hortonworks
1) Helped correctly size the environment (NiFi, Data Nodes, Edge Nodes).
2) Installed all software and set up the initial environment.
3) Part of the core team, answering queries and responding to technical tickets.
4) Provided subject matter experts, architectural guidance, design and security knowledge.
13. Considerations and Lessons Learnt
Our approach to a Hybrid Environment (Cloud and on-premise)
• Quick time to market for capacity expansion
• Avoid huge Cloud cost by keeping a hot skeleton site active and bursting it on demand, leveraging cloud elasticity
• Determine any data security and residency requirements
• Plug readily available cognitive services APIs in the Cloud into the analytics pipeline to rapidly experiment with models
• Option to explore alternative architecture options with bucket storage and auto-scaling compute & APIs
• Cold/Warm Storage capability
Explain the need for a data and analytics platform to ingest and analyze the network events.
Various teams used to gather and correlate data from various systems, which was time consuming and did not deliver insight when it was needed.
The platform had to scale to handle 30 billion events a day streaming continuously from the network and correlate them in real time to measure and monitor the customer experience. A number of metrics had to be created to assess their impact on the actual customer experience.
Instead of various teams correlating data with differing definitions, this platform was to create consistent transformation/enrichment rules to provide coherent analysis.
For building the data platform, here are some of the key decisions that were taken.
There are good lessons learnt from some of them, explained in the later slides.
Open Source or not – The decision was taken to go open source, in line with our strategy.
Transform & Enrich Data while in motion or at rest – Enrichment in motion avoids re-processing a huge data set again, which would have added latency. A proof of concept was conducted to compare the non-functional characteristics of enrichment while data is in motion versus at rest.
To Hadoop or not – The decision was taken to go with community Hadoop over the MPP relational databases available at that time.
On-premise vs Cloud – The decision was taken to go on-premise to leverage in-house build capability. Cloud was not the strategy at that time because of the uncertain cost of a large data platform and the sensitivity of the data.
Build vs Buy – No scalable technology was available to handle the scale without burning a hole in our pockets. Compatibility with Hadoop was not proven for many of the products. Correlating data in motion was a question mark at such a volume. The decision was to build and not buy.
Prototype and Evolve – Instead of going big bang and setting up a massive data lake, smaller use cases were chosen for quick ROI and then evolved.
If you ask us today how many decisions we would like to revert: maybe half of them.
This slide describes the architecture: the types of source data currently used in the platform and the various formats in which the network produces that data. The point to note is that specific focus was given to not creating overhead on the network when integrating data for this platform, and therefore standard network logs were used.
A very performant, modular, micro services based ingestion layer (ETL engine) was built to consume, parse, transform/enrich and load the data into the Hadoop cluster (HDFS). Additionally, the more dynamic and agile HDF technology was deployed for simplified ETL, which could be written by technical analysts instead of expensive developers.
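The consume, parse, transform/enrich and load stages can be sketched as a chain of generators. This is only a minimal illustration, not the O2 engine: the JSON event format, the field names (`imsi`, `cell_id`), the `CELL_REFERENCE` lookup and the in-memory list standing in for HDFS are all hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical reference data used for enrichment (cell id -> region).
CELL_REFERENCE: Dict[str, str] = {"cell-001": "London", "cell-002": "Leeds"}


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Parse raw JSON log lines, skipping malformed records."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Attach the region to each event while it is still in motion."""
    for event in events:
        region = CELL_REFERENCE.get(event.get("cell_id"), "unknown")
        yield {**event, "region": region}


def load(events: Iterable[dict], sink: List[dict]) -> int:
    """Append events to a sink (a list standing in for HDFS); return the count."""
    count = 0
    for event in events:
        sink.append(event)
        count += 1
    return count


raw_lines = [
    '{"imsi": "user-a", "cell_id": "cell-001", "dropped": false}',
    "not valid json",
    '{"imsi": "user-b", "cell_id": "cell-999", "dropped": true}',
]
store: List[dict] = []
loaded = load(enrich(parse(raw_lines)), store)
```

Because each stage is a generator, an event is enriched as it passes through and is never written and re-read, which is the point of enriching in motion.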
In the data storage layer there are 3 pipelines: a) master data generation from events, e.g. how many users use the O2 network, including MVNO users and in-roamers; b) KPI generation for network and customer reporting; c) feature generation, which measures various performance parameters that are then fed into machine learning models to predict the customer experience.
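As a rough sketch of what the three pipelines produce, the snippet below derives master data, a KPI and per-user features from the same small batch of events. The event fields (`user`, `cell`, `dropped`, `latency_ms`) and the chosen KPI and features are assumptions made for illustration; the real platform's definitions are not given here.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical enriched events; field names are illustrative only.
events: List[dict] = [
    {"user": "a", "cell": "c1", "dropped": False, "latency_ms": 40},
    {"user": "a", "cell": "c1", "dropped": True, "latency_ms": 90},
    {"user": "b", "cell": "c2", "dropped": False, "latency_ms": 50},
]

# a) Master data: the set of distinct users seen on the network.
active_users: Set[str] = {e["user"] for e in events}

# b) KPI: dropped-event rate per cell.
totals: Dict[str, int] = defaultdict(int)
drops: Dict[str, int] = defaultdict(int)
for e in events:
    totals[e["cell"]] += 1
    drops[e["cell"]] += int(e["dropped"])
drop_rate = {cell: drops[cell] / totals[cell] for cell in totals}

# c) Features: per-user aggregates that a model could consume.
latencies: Dict[str, List[int]] = defaultdict(list)
for e in events:
    latencies[e["user"]].append(e["latency_ms"])
features = {
    user: {"events": len(vals), "mean_latency_ms": sum(vals) / len(vals)}
    for user, vals in latencies.items()
}
```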
Data is then exposed through Hive and Tableau to end-user tools such as Zeppelin, Tableau and Ambari, and to data feeds. We are also building APIs that will take the insight from the platform directly to customer and agent facing channels to help them handle/orchestrate the customer interaction appropriately.
Stats are provided to show the size and design of the platform that processes and stores 30B+ events daily.
This slide shows our tool of choice for each layer mentioned in the previous slide.
This slide shows our journey from 2014 across various dimensions of data & analytics: cluster size, platform capability, data management and analytics.
2014 – 2016 – We started small and delivered an initial use case, the virtual drive test analytics tool, which provides a more accurate and timely view of network experience at network element level every hour. Previously, O2 relied on drive test results from various agencies, which came to us periodically. The power of the data in the platform also enabled Web & Location Analytics to support the Smartsteps, revenue assurance, Weve & Personalization use cases, and that’s when we started getting ROI.
The major breakthrough happened in 2017 with the launch of NCX, which started driving the network & marketing investments mentioned in slide 10.
2019 will be the year to integrate NCX with the customer journey in channels to drive CX.
This slide shows the more technical data flow and the data science process used to predict the NCX.
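The flow on that slide (daily feature tables, an aggregated feature table, then a scored table) might look roughly like the sketch below. It is an illustration under loud assumptions: the feature names are invented, averaging over days is only one possible aggregation, and `score` is a made-up linear placeholder on an assumed 0 to 100 scale, not the trained model used at O2.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical daily feature tables: one dict per day, user -> features.
daily_tables: List[Dict[str, Dict[str, float]]] = [
    {"a": {"drop_rate": 0.10, "mean_latency_ms": 60.0}},
    {"a": {"drop_rate": 0.30, "mean_latency_ms": 80.0},
     "b": {"drop_rate": 0.00, "mean_latency_ms": 40.0}},
]


def aggregate(tables: List[Dict[str, Dict[str, float]]]) -> Dict[str, Dict[str, float]]:
    """Average each feature per user over the days on which the user appears."""
    collected: Dict[str, Dict[str, List[float]]] = {}
    for table in tables:
        for user, feats in table.items():
            for name, value in feats.items():
                collected.setdefault(user, {}).setdefault(name, []).append(value)
    return {u: {n: mean(v) for n, v in f.items()} for u, f in collected.items()}


def score(feats: Dict[str, float]) -> float:
    """Placeholder model: an invented linear penalty clipped to 0..100."""
    raw = 100.0 - 200.0 * feats["drop_rate"] - 0.25 * feats["mean_latency_ms"]
    return max(0.0, min(100.0, raw))


aggregated = aggregate(daily_tables)
scored = {user: score(feats) for user, feats in aggregated.items()}
```

In practice the placeholder would be replaced by a model trained on users who answered the CSI survey and then applied to every user's aggregated features.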
We work with the Network Performance team and the regional teams to drive performance E2E. Regional teams identify customer groups/regions with a poor NCX score (< 40), then identify whether there is a known issue in the area or a new radio issue.
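A minimal sketch of that triage step is shown below. The threshold of 40 comes from the notes; the sample data and the choice to average user scores per region are assumptions, since the notes do not say how scores are rolled up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

POOR_NCX_THRESHOLD = 40  # threshold quoted in the notes

# Hypothetical (region, ncx_score) pairs, one per user.
scores: List[Tuple[str, float]] = [
    ("north", 35.0),
    ("north", 42.0),
    ("south", 70.0),
    ("south", 65.0),
]

by_region: Dict[str, List[float]] = defaultdict(list)
for region, ncx in scores:
    by_region[region].append(ncx)

# Regions whose average score falls below the threshold need investigation.
poor_regions = sorted(
    region for region, vals in by_region.items() if mean(vals) < POOR_NCX_THRESHOLD
)
```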
The platform and the team continued to grow, and there was increasing interest from various parts of the business in utilizing the analytics from network data. We now act as the hub and serve various spokes in the business for their data & analytics needs.
Where needed, we collaborate with other analytics teams to jointly develop analytics products that have much wider use cases.
Data from the platform helps detect anomalies in customer network profiles, which are notified to MVNOs so they can take appropriate action. The platform will soon be opened up to MVNOs for reporting.
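The notes do not say how anomalies are detected. As one simple possibility, a z-score check over a customer's daily usage could look like this; the method, the threshold of 2.0 and the sample figures are all assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def find_anomalies(values: Sequence[float], threshold: float = 2.0) -> List[int]:
    """Return indices whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily data usage (MB) for one customer over a week.
daily_data_mb = [100, 110, 95, 105, 98, 102, 900]
anomalies = find_anomalies(daily_data_mb)
```

A single extreme value inflates the standard deviation, so a production detector would more likely use a robust baseline such as the median; the sketch only shows the shape of the check.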