Kanthaka - High Volume CDR Analyzer

Big Data CDR Analyzer

Project Team
080201N – M.K.P.R. Jayawardhana
080254D – P.K.A.M. Kumara
080331L – W.D.A.I. Paranawithana
080357V – T.D.K. Perera

Project Supervisors
Mr. Thilina Anjitha – hSenid
Dr. Shahani Markus Weerawarana

Overview
• Background
• Current Situation
• Scope and Assumptions
• Kanthaka – Big Data CDR Analyzer System
• Technology Comparison
- MapReduce
- NoSQL Databases
• Architecture
• Project Plan
• Risks and Possible Remedies
• References

Current Situation
• Promotions are based only on subscribers' network usage
• Only the active call switch is used to trigger promotions
• No way of analyzing and processing high-volume CDR records
• No efficient CDR analysis method
• No access to historical data
• Complex rules are not supported
Kanthaka to the rescue
• Selecting eligible users for both commercial-organization-based and network-usage-based promotions.
E.g. giving a 20% discount to pizza lovers aged 16-40 who have called Pizza Hut more than 5 times a month.
• High-volume CDR analysis.
• Near-real-time selection of eligible users for promotions.
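The Pizza Hut rule above can be expressed as a filter plus a per-subscriber count over one month of CDRs. The following is a minimal sketch, not Kanthaka's rule engine: the `Cdr` fields, the `PIZZA_HUT_NUMBER` value, the `"VOICE"` event label and the subscriber-age lookup are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Cdr:
    # Hypothetical CDR fields, for illustration only.
    caller: str
    callee: str
    event_type: str
    timestamp: datetime

PIZZA_HUT_NUMBER = "0112000000"  # hypothetical merchant number

def eligible_pizza_lovers(cdrs, ages, year, month, min_calls=5, age_range=(16, 40)):
    """Subscribers within age_range who made more than min_calls voice calls
    to PIZZA_HUT_NUMBER in the given month."""
    calls = Counter(
        c.caller
        for c in cdrs
        if c.event_type == "VOICE"
        and c.callee == PIZZA_HUT_NUMBER
        and (c.timestamp.year, c.timestamp.month) == (year, month)
    )
    low, high = age_range
    # Subscribers with no known age (-1) never fall inside the range.
    return sorted(
        sub for sub, n in calls.items()
        if n > min_calls and low <= ages.get(sub, -1) <= high
    )

# Usage: A calls 6 times (eligible), B only 3 times, C is outside the age range.
cdrs = [Cdr("A", PIZZA_HUT_NUMBER, "VOICE", datetime(2012, 6, d)) for d in range(1, 7)]
cdrs += [Cdr("B", PIZZA_HUT_NUMBER, "VOICE", datetime(2012, 6, d)) for d in range(1, 4)]
cdrs += [Cdr("C", PIZZA_HUT_NUMBER, "VOICE", datetime(2012, 6, d)) for d in range(1, 8)]
print(eligible_pizza_lovers(cdrs, {"A": 25, "B": 30, "C": 52}, 2012, 6))  # ['A']
```

Scanning raw records like this is the straightforward approach; at 30 million records per day it is exactly the kind of work that must be made cheap to reach near-real-time selection.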

• CDR Analyzer system which
▫ can process 30 million records per day
▫ can produce results within 10-15 seconds
▫ provides a GUI to define dynamic rules
▫ can be used to offer real-time sales promotions
for mobile subscribers
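As a rough sanity check on the first target, 30 million records per day corresponds to the average ingest rate below. This is an average only; real traffic is uneven, so peak rates will be higher.

```latex
\frac{30 \times 10^{6}\ \text{records/day}}{86\,400\ \text{s/day}} \approx 347\ \text{records/s}
```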

Scope and Assumptions
Scope

Real system operation | Operation expected of Kanthaka
30 M records per day | 30 M records per day
Multiple rules | Single rule
Offers promotions | Selects eligible users for promotions only

Assumptions

• CDR records are only in .CSV format.
• Events can be of different types, such as SMS, voice call, MMS, USSD, top-up, GPRS and LBS.
• CDRs can be received by the system asynchronously, in batches.
• Only 6 of the many attributes will be considered during processing.
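Under these assumptions, ingesting a batch amounts to parsing a CSV file and projecting each row onto six columns. The sketch below uses only Python's standard `csv` module; the six column names and the event-type labels are hypothetical placeholders, since the actual attributes are not listed here.

```python
import csv
import io

# Hypothetical names for the six retained attributes.
KEPT_COLUMNS = ("caller", "callee", "event_type", "timestamp", "duration", "cell_id")
# Hypothetical labels for the event types named in the assumptions.
EVENT_TYPES = {"SMS", "VOICE", "MMS", "USSD", "TOPUP", "GPRS", "LBS"}

def parse_cdr_batch(text):
    """Parse one CSV batch with a header row, keep only KEPT_COLUMNS and
    drop rows whose event type is missing or unknown."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        if row.get("event_type") not in EVENT_TYPES:
            continue  # skip unknown or missing event types
        records.append({col: row.get(col) for col in KEPT_COLUMNS})
    return records

batch = (
    "caller,callee,event_type,timestamp,duration,cell_id,imsi\n"
    "A,B,VOICE,2012-06-01T10:00:00,60,C1,X1\n"
    "A,B,FAX,2012-06-01T10:05:00,0,C1,X1\n"
)
print(parse_cdr_batch(batch))
```

The extra `imsi` column is discarded and the `FAX` row is rejected, so one six-field record remains.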

Large data volume + high speed
--> Scale-out system

MapReduce
Hadoop MapReduce
• Can handle large volumes of data
• High latency, so not suitable where results are expected in near real time

Word count on a 100 KB file:
Start time = 01:04:44
End time = 01:05:12
Total time = 28 s
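The word-count job timed above follows the map, shuffle and reduce pattern. The single-process sketch below only illustrates that programming model; it does not reproduce Hadoop's job startup and scheduling overhead, which is likely where most of the 28 seconds went for such a small input.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the partial counts for one word.
    return key, sum(values)

def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

print(word_count(["to be or", "not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```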

DB Technology Comparison

• RDBMS
▫ Provides ACID properties
▫ Uses sharding to scale out
▫ Management overhead is huge when scaling out
▫ Performance degrades under higher data load
▫ Less partition tolerant
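One source of the sharding overhead is rebalancing. The sketch below assumes naive modulo placement on a hash of the subscriber key (an illustrative assumption, not necessarily how any particular RDBMS shards) and measures how many keys must move when a fifth shard is added to four.

```python
import hashlib

def shard_for(key, num_shards):
    # Naive modulo placement on a stable hash (MD5 used only for its spread).
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def fraction_moved(keys, old_shards, new_shards):
    """Fraction of keys whose shard changes when the cluster is resized."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)

# Hypothetical subscriber numbers used as shard keys.
subscribers = [f"07{n:08d}" for n in range(10_000)]
print(f"{fraction_moved(subscribers, 4, 5):.0%} of keys move from 4 to 5 shards")
```

A key stays put only if its hash leaves the same remainder modulo 4 and modulo 5, i.e. the hash modulo 20 is 0 to 3, so with a uniform hash roughly 80% of the data has to be migrated.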


• NoSQL
▫ Many options available (Cassandra, HBase, MongoDB, Hive)
▫ Promises easy scaling out (many large users, e.g. Facebook, Twitter)
▫ Provides BASE properties under the CAP theorem
▫ Hard to fit the system into a limited data model
▫ Partition tolerant
▫ More memory --> higher performance

• NewSQL
▫ Provides ACID properties
▫ Familiar relational data model
▫ Options available (ScaleDB, VoltDB)
▫ Runs entirely in memory, hence needs a lot of memory
▫ Promises high speed
▫ Persistence achieved by replaying logs
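Persistence by log replay can be illustrated with a toy in-memory key-value store that appends every write to a command log before applying it, and rebuilds its state after a restart by replaying that log. This is a simplified sketch of the general idea, not the actual mechanism of VoltDB or ScaleDB; `LoggedStore` and the JSON log format are hypothetical.

```python
import json
import os
import tempfile

class LoggedStore:
    """Toy in-memory key-value store made durable by an append-only command log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild in-memory state by re-applying every logged command in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, cmd):
        if cmd["op"] == "put":
            self.data[cmd["key"]] = cmd["value"]
        elif cmd["op"] == "delete":
            self.data.pop(cmd["key"], None)

    def _log_and_apply(self, cmd):
        # Make the command durable on disk before changing memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(cmd) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(cmd)

    def put(self, key, value):
        self._log_and_apply({"op": "put", "key": key, "value": value})

    def delete(self, key):
        self._log_and_apply({"op": "delete", "key": key})

path = os.path.join(tempfile.mkdtemp(), "commands.log")
store = LoggedStore(path)
store.put("A", 6)
store.put("B", 3)
store.delete("B")
recovered = LoggedStore(path)  # simulates a restart: state comes only from the log
print(recovered.data)  # {'A': 6}
```

Replay time grows with the log, which is why real in-memory systems combine logs with periodic snapshots.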

Given persistence, less restrictive hardware requirements and proven performance, NoSQL is the best option to try out.

• Cassandra – a key-value column family store (used at Facebook, Twitter, eBay)
• HBase – a key-value column family store (Facebook)
• MongoDB – a document store (Adobe)
• Hive – an HDFS-based database

YCSB Benchmarks

• With more large users, active mailing lists and the most promising features (secondary indexes, counters), Cassandra is the best option to try out.
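Counters matter for this workload because they let counts be maintained at ingest time. The plain-Python sketch below mimics a counter column family (it does not use any Cassandra driver): each incoming CDR increments a per-subscriber, per-month counter, so a condition such as "called Pizza Hut more than 5 times a month" becomes a single lookup instead of a scan. The `CallCounters` class and its row-key layout are hypothetical design choices.

```python
from collections import defaultdict

class CallCounters:
    """Stand-in for a counter column family:
    row key = (subscriber, "YYYY-MM"), column = called number, value = count."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(int))

    def record_call(self, caller, callee, month):
        # Incremented once per incoming CDR, at ingest time.
        self.rows[(caller, month)][callee] += 1

    def calls_to(self, caller, callee, month):
        # A single lookup of a precomputed count; .get avoids creating empty rows.
        row = self.rows.get((caller, month), {})
        return row.get(callee, 0)

counters = CallCounters()
for _ in range(6):
    counters.record_call("A", "PIZZA_HUT", "2012-06")
print(counters.calls_to("A", "PIZZA_HUT", "2012-06") > 5)  # True
```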

Technology selection
Technologies left behind
• Complex Event Processing (CEP) engines
▫ No persistence
• Rules engine
▫ More layers --> more latency
• Hadoop
• NoSQL DBs – HBase, MongoDB, Hive

Technologies selected
• NoSQL DB – Cassandra

Project Plan
Milestone | Target date | Status
First chapters of final report | - | Done
ERU abstracts | - | Accepted
ERU paper | 31/07/2012 | Due
Architecture | 06/06/2012 | Done
Setting up the Cassandra cluster | 06/06/2012 | Done
GUI for rule definition | 15/06/2012 | Ongoing
Bulk data load to Cassandra | 15/06/2012 | Ongoing
System Requirement Specification | 20/06/2012 | Due
Query data from database periodically | 26/06/2012 | Due
Initial Design Document | 27/06/2012 | Due
Algorithm for pre-processing | 10/07/2012 | Due
Testing | 10/07/2012 | Due
Final report | 10/08/2012 | Due

Risks and Possible Remedies

• NoSQL databases
▫ Risk: higher performance --> more memory needed
▫ Remedy: use an external cluster with decent memory

• In the long run
▫ Risk: more data --> performance degrades
▫ Remedy: archiving

• Handling concurrency issues
▫ Risk: locking the database --> low speed
▫ Remedy: use a shadow copy

• NoSQL fails to meet the requirements
▫ Option: NewSQL – VoltDB (runs entirely in memory)
▫ Option: CEP (needs extra measures to ensure persistence)

• Handling sudden peaks
▫ Remedy: have an auto-balancing mechanism ready
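The shadow-copy remedy for locking can be sketched as follows. Readers always query an immutable snapshot, while the loader builds an updated copy on the side and publishes it with one reference swap, so queries never wait on a database-wide lock. This is a minimal single-process illustration under those assumptions (`ShadowCopyTable` is hypothetical), not a description of how Kanthaka or Cassandra implements it.

```python
import threading
from types import MappingProxyType

class ShadowCopyTable:
    """Readers see an immutable snapshot; writers build a shadow copy and swap it in."""

    def __init__(self):
        self._current = MappingProxyType({})
        self._write_lock = threading.Lock()  # serializes writers only; readers never lock

    def snapshot(self):
        # Readers take whatever snapshot is current; it never changes under them.
        return self._current

    def apply_batch(self, updates):
        with self._write_lock:
            shadow = dict(self._current)  # copy the current state
            shadow.update(updates)        # apply the new batch to the copy
            # Publishing is a single reference assignment (atomic in CPython).
            self._current = MappingProxyType(shadow)

table = ShadowCopyTable()
before = table.snapshot()
table.apply_batch({"A": 6})
print(before.get("A"), table.snapshot()["A"])  # None 6
```

Copying the whole table for every batch costs time proportional to its size; a real system would copy at a finer granularity, such as per partition.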

Final Deliverables
• Big Data CDR Analyzer system
• Research Paper
• Final Report

References

• http://www.slideshare.net/gvdinesh/cap-and-base-8169489
• B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with YCSB,” 2010, pp. 143–154.

Visit us at Kanthaka
