Big Data CDR Analyzer

Project Supervisors-
Mr. Thilina Anjitha – hSenid
Dr. Shahani Markus Weerawarana

080201N – M.K.P.R. Jayawardhana
080254D – P.K.A.M. Kumara
080331L – W.D.A.I. Paranawithana
080357V – T.D.K. Perera
Overview
•   Background
•   Current Situation
•   Scope and Assumptions
•   Kanthaka – big data CDR Analyzer System
•   Technology Comparison
       - Map Reduce
       - NoSQL Databases
•   Architecture
•   Project Plan
•   Risks and Possible Remedies
•   References
Background
Mobile Promotions
Current Situation
• Promotions are based only on subscribers' network usage
• Only the active call switch is used for triggering
  promotions
• No way of analyzing and processing high-volume
  CDRs (call detail records)
• No efficient CDR analysis method
• No access to historical data
• Complex rules are not supported
Kanthaka to the rescue
• Select eligible users for both commercial-organization-based
  and network-usage-based promotions.
  E.g., a 20% discount for pizza lovers aged 16-40 who
     have called Pizza Hut more than 5 times a month
• High-volume CDR analysis.
• Near-real-time selection of eligible users for
  promotions.
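
The example rule above can be sketched as a small filter over CDRs. This is only a minimal illustration, not Kanthaka's implementation: the record fields (`caller`, `callee`, `event_type`, `month`), the `ages` lookup and the Pizza Hut number are all hypothetical, since the slides do not name them.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical CDR shape; the real attribute set is not given in the slides.
@dataclass(frozen=True)
class Cdr:
    caller: str
    callee: str
    event_type: str  # e.g. "VOICE", "SMS"
    month: str       # e.g. "2012-06"


def eligible_subscribers(cdrs, ages, target_number, month,
                         call_threshold=5, min_age=16, max_age=40):
    """Return subscribers who called target_number MORE than call_threshold
    times in the given month and whose age lies in [min_age, max_age]."""
    calls = Counter(
        c.caller for c in cdrs
        if c.event_type == "VOICE"
        and c.callee == target_number
        and c.month == month
    )
    return sorted(
        sub for sub, n in calls.items()
        # Subscribers with unknown age get -1 and are never eligible.
        if n > call_threshold and min_age <= ages.get(sub, -1) <= max_age
    )


PIZZA_HUT_NUMBER = "0110000000"  # made-up number, for illustration only
cdrs = (
    [Cdr("A", PIZZA_HUT_NUMBER, "VOICE", "2012-06")] * 6    # 6 calls, age 25
    + [Cdr("B", PIZZA_HUT_NUMBER, "VOICE", "2012-06")] * 5  # only 5 calls
    + [Cdr("C", PIZZA_HUT_NUMBER, "VOICE", "2012-06")] * 7  # 7 calls, age 55
)
ages = {"A": 25, "B": 30, "C": 55}
result = eligible_subscribers(cdrs, ages, PIZZA_HUT_NUMBER, "2012-06")
print(result)
```

Only subscriber A satisfies both the call-count and the age condition here; B has exactly 5 calls (not more than 5) and C is outside the age group.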
• CDR Analyzer system which
 ▫ can process 30 million records per day
 ▫ can produce results within 10-15 seconds
 ▫ provides a GUI to define dynamic rules
 ▫ can be used to offer real-time sales promotions
   for mobile subscribers
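
To put the targets above in perspective, the daily volume can be converted into an average arrival rate. The short calculation below is a back-of-the-envelope sketch that assumes records arrive evenly over 24 hours; real traffic would have peaks well above this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds
records_per_day = 30_000_000            # target from the slide

# Average arrival rate, assuming an even spread over the day.
avg_rate = records_per_day / SECONDS_PER_DAY
print(f"average rate: {avg_rate:.0f} records/s")

# Records that arrive during one 15-second result window at that rate.
window_seconds = 15
records_per_window = avg_rate * window_seconds
print(f"records per {window_seconds}s window: {records_per_window:.0f}")
```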
Scope and Assumptions
Scope




  Real system operation    Operation expected by Kanthaka
  30 M                     30 M
  Multiple rules           Single rule
  Offer promotion          Select eligible users
                           for promotion only
Assumptions

• CDRs are received only in CSV format.
• Event types include SMS, voice call, MMS, USSD,
  top-up, GPRS and LBS.
• CDRs may arrive at the system in batches,
  asynchronously.
• Only 6 of the many CDR attributes are
  considered during processing.
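
The assumptions above suggest a simple ingestion step: read a CSV batch, keep a fixed subset of columns and drop rows with unknown event types. The sketch below shows one way this could look. The six column names and the event-type codes are hypothetical, because the slides state only that 6 attributes are kept, not which ones.

```python
import csv
import io

# Hypothetical names: the slides do not say which 6 attributes are kept.
KEPT_FIELDS = ("caller", "callee", "event_type",
               "timestamp", "duration", "cell_id")
# Hypothetical codes for the event types listed in the assumptions.
KNOWN_EVENTS = {"SMS", "VOICE", "MMS", "USSD", "TOPUP", "GPRS", "LBS"}


def parse_batch(csv_text):
    """Parse one CSV batch, keeping only KEPT_FIELDS and skipping rows
    whose event type is not one of KNOWN_EVENTS."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("event_type") not in KNOWN_EVENTS:
            continue
        rows.append({name: row[name] for name in KEPT_FIELDS})
    return rows


# A made-up batch with two extra columns that get dropped.
batch = (
    "caller,callee,event_type,timestamp,duration,cell_id,imei,charge\n"
    "0770000001,0110000000,VOICE,2012-06-01T10:15:00,42,C17,x1,3.50\n"
    "0770000001,0710000002,SMS,2012-06-01T10:20:00,0,C17,x1,0.20\n"
    "0770000003,0710000002,FAX,2012-06-01T10:21:00,0,C02,x2,0.00\n"
)
records = parse_batch(batch)
print(len(records), "records kept")
```

Because batches arrive asynchronously, `parse_batch` is written as a pure function of one batch, so batches could be processed in any order or in parallel.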
Technology Comparison
A lot of data + higher speed
                  --> Scale-out system
Map Reduce
  Hadoop MapReduce
 • Can handle a lot of data
 • Latency is too high for cases where results are expected in near real time

Word count on a 100 KB file
          Start time                 = 01.04.44
          End time                   = 01.05.12
          Total time                 = 28 sec
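
For readers unfamiliar with the benchmark job, word count can be sketched as three phases: map, shuffle and reduce. The code below runs in a single Python process purely to show the data flow. It is not Hadoop code and says nothing about Hadoop's latency, which for such a small input presumably comes from job setup and scheduling, not from the counting itself.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


lines = ["big data big speed", "Big data CDR"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```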
DB Technology Comparison

• RDBMS
 ▫   Provides ACID properties
 ▫   Uses sharding to scale out
 ▫   Management overhead is huge when scaling out
 ▫   Performance degrades with higher data load
 ▫   Less partition tolerant
DB Technology Comparison Ctd.

• NoSQL
 ▫ Many options available (Cassandra, HBase,
   MongoDB, Hive)
 ▫ Promises easy scaling (many big users –
   Facebook, Twitter)
 ▫ Provides BASE properties under the CAP theorem
 ▫ Hard to fit the system into a limited data model
 ▫ Partition tolerant
 ▫ More memory --> Higher performance
DB Technology Comparison Ctd.
• NewSQL
 ▫   Provides ACID properties
 ▫   Familiar relational data model
 ▫   Options available (ScaleDB, VoltDB)
 ▫   Runs entirely in memory, hence needs a lot of memory
 ▫   Promises speed
 ▫   Persistence achieved by replaying logs
With persistence, fewer hardware restrictions and
proven performance, NoSQL is the best option to try out.

• Cassandra – a key-value column-family
  store (used at Facebook, Twitter, eBay)
• HBase – a key-value column-family store
  (Facebook)
• MongoDB – document store (Adobe)
• Hive – HDFS-based database
YCSB Benchmarks




• With more big users, active mailing lists and the most
  promising features (secondary indexes,
  counters), Cassandra is the best option to try out.
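
Counters are named above as one reason for choosing Cassandra. One way they could help is by keeping a running call count per subscriber as CDRs arrive, so that a rule queries pre-aggregated counts instead of scanning raw records. The sketch below emulates that idea with an in-memory `collections.Counter`. It does not use any Cassandra driver, and the key layout (subscriber, destination, month) is an assumption, not something the slides specify.

```python
from collections import Counter


class CallCounters:
    """In-memory stand-in for a counter table keyed by
    (subscriber, destination, month). The key layout is hypothetical."""

    def __init__(self):
        self._counts = Counter()

    def record_call(self, subscriber, destination, month):
        """Increment the running count when a matching CDR arrives."""
        self._counts[(subscriber, destination, month)] += 1

    def subscribers_over(self, destination, month, threshold):
        """Subscribers with more than `threshold` calls to
        `destination` in `month`."""
        return sorted(
            sub for (sub, dest, m), n in self._counts.items()
            if dest == destination and m == month and n > threshold
        )


counters = CallCounters()
for _ in range(6):
    counters.record_call("0770000001", "PIZZA", "2012-06")
for _ in range(3):
    counters.record_call("0770000002", "PIZZA", "2012-06")

over_five = counters.subscribers_over("PIZZA", "2012-06", 5)
print(over_five)
```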
Technology selection
Technologies left behind
• Complex Event Processing (CEP) engines
  ▫ No persistence
• Rules Engine
  ▫ More layers --> More latency
• Hadoop
• NoSQL DB - HBase, MongoDB, Hive

Technologies selected
• NoSQL DB - Cassandra
Architecture
Project Plan
Milestones                              Target date   Status
First chapters of final report                -       Done
ERU abstracts                                 -       Accepted
ERU Paper                               31/07/2012    Due
Architecture                            06/06/2012    Done
Setting up the Cassandra cluster        06/06/2012    Done
GUI for rule definition                 15/06/2012    Ongoing
Bulk data load to Cassandra             15/06/2012    Ongoing
System Requirement Specification        20/06/2012    Due
Query data from database periodically   26/06/2012    Due
Initial Design Document                 27/06/2012    Due
Algorithm for Pre-processing            10/07/2012    Due
Testing                                 10/07/2012    Due
Final report                            10/08/2012    Due
Risks and Possible Remedies

• NoSQL databases
  High performance needs more memory
  Remedy: use an external cluster with decent memory

• In the long run
  Performance degrades as data grows
  Remedy: archiving

• Handling concurrency issues
  Locking the database lowers speed
  Remedy: use a shadow copy

• NoSQL fails to meet the requirements
  Options:
  NewSQL – VoltDB (runs entirely in memory)
  CEP (needs extra measures to preserve persistence)

• Handling sudden peaks
  An auto-balancing mechanism should be ready
Final Deliverables
• Big Data CDR Analyzer system
• Research Paper
• Final Report
References

• http://www.slideshare.net/gvdinesh/cap-and-base-8169489
• B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan,
  and R. Sears, “Benchmarking cloud serving systems
  with YCSB,” 2010, pp. 143–154.

Visit us at Kanthaka
Thank You!

Kanthaka - High Volume CDR Analyzer
