'Kanthaka' is an attempt to bring the benefits of Big Data technologies to the telecom industry. The objective of the system is to analyze CDRs (Call Detail Records) and give results in near real time.
This is being carried out as the final year project for my B.Sc. of Engineering (Hons) degree at the University of Moratuwa, in a team with three colleagues, under the supervision of a senior lecturer and an industry expert.
The presentation covers the background, the findings of the literature review and the architecture proposed for the system so far. Any feedback on improvements that can be made is warmly welcome!
Kanthaka - High Volume CDR Analyzer
1. Big Data CDR Analyzer
080201N – M.K.P.R. Jayawardhana
080254D – P.K.A.M. Kumara
080331L – W.D.A.I. Paranawithana
080357V – T.D.K. Perera
Project Supervisors -
Mr. Thilina Anjitha – hSenid
Dr. Shahani Markus Weerawarana
2. Overview
• Background
• Current Situation
• Scope and Assumptions
• Kanthaka – big data CDR Analyzer System
• Technology Comparison
- Map Reduce
- NoSQL Databases
• Architecture
• Project Plan
• Risks and Possible Remedies
• References
4. Current Situation
• Promotions are based only on subscribers' network usage
• Only the active call switch is used for triggering promotions
• No way of analyzing and processing high-volume CDR records
• No efficient CDR analyzing method
• No access to historical data
• Complex rules not supported
5. Kanthaka to the rescue
• Selecting eligible users for both commercial-organization-based and network-usage-based promotions.
E.g. giving a 20% discount to pizza lovers in the 16-40 age group who have called Pizza Hut more than 5 times a month
• High-volume CDR analysis.
• Near-real-time selection of eligible users for promotions.
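The pizza example above can be read as a rule with three conditions: a dialled number, a minimum call count per month and an age range. The following is only a minimal sketch of such a rule in Python, not the Kanthaka implementation; the `Call` record, its field names and the `eligible_subscribers` function are hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One voice-call CDR, reduced to the fields the rule needs (hypothetical names)."""
    caller: str   # subscriber who made the call
    callee: str   # dialled number
    month: str    # billing month, e.g. "2012-06"


def eligible_subscribers(calls, ages, target, month, min_calls=5, age_range=(16, 40)):
    """Return subscribers who called `target` more than `min_calls` times in
    `month` and whose age lies inside `age_range` (inclusive)."""
    # Count calls per subscriber to the target number in the given month.
    counts = Counter(c.caller for c in calls if c.callee == target and c.month == month)
    low, high = age_range
    return sorted(
        subscriber
        for subscriber, n in counts.items()
        if n > min_calls and subscriber in ages and low <= ages[subscriber] <= high
    )
```

A rule defined through the GUI would, under this assumption, reduce to one call such as `eligible_subscribers(calls, ages, target="PIZZA_HUT", month="2012-06")`.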
6. • CDR Analyzer system which
▫ can process 30 million records per day
▫ can produce results within 10-15 seconds
▫ provides a GUI to define dynamic rules
▫ can be used to offer real-time sales promotions
for mobile subscribers
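To put the targets above in perspective, the daily volume can be converted into an average arrival rate. The short calculation below is a back-of-the-envelope sketch that assumes records arrive uniformly over the day, which real traffic will not do; the function names are illustrative only.

```python
RECORDS_PER_DAY = 30_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(records_per_day=RECORDS_PER_DAY):
    """Average number of records arriving per second, assuming uniform arrival."""
    return records_per_day / SECONDS_PER_DAY


def records_in_window(window_seconds, records_per_day=RECORDS_PER_DAY):
    """Average number of records arriving during one response-time window."""
    return average_rate(records_per_day) * window_seconds


print(round(average_rate()))         # about 347 records per second
print(round(records_in_window(15)))  # about 5208 records per 15-second window
```

Peak-hour rates will be higher than this average, so the figures are a lower bound for sizing.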
7. Scope and Assumptions
Scope
Real system operation | Operation expected of Kanthaka
30 M | 30 M
Multiple rules | Single rule
Offer promotion | Select eligible users for promotion only
8. Assumptions
• CDR records arrive only in CSV format.
• Events can be of different types, such as SMS, voice call, MMS, USSD, top-up, GPRS and LBS.
• CDRs can be received by the system in batches, asynchronously.
• Only 6 of the many attributes will be considered during processing.
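The assumptions above suggest a simple pre-processing step: read a CSV batch, keep only known event types and project each record onto the six attributes of interest. The sketch below illustrates this in Python. The slides do not name the six attributes or the event-type codes, so the column names in `KEPT_FIELDS` and the codes in `EVENT_TYPES` are purely hypothetical.

```python
import csv
import io

# Hypothetical codes for the event types listed in the assumptions.
EVENT_TYPES = {"SMS", "VOICE", "MMS", "USSD", "TOPUP", "GPRS", "LBS"}

# Hypothetical names for the 6 attributes kept for processing.
KEPT_FIELDS = ("caller", "callee", "event_type", "timestamp", "duration", "cell_id")


def parse_batch(csv_text):
    """Parse one CSV batch (with a header row) into a list of dicts that
    contain only KEPT_FIELDS; rows with an unknown event type are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        event = (row.get("event_type") or "").strip().upper()
        if event not in EVENT_TYPES:
            continue  # ignore event types the system does not handle
        record = {name: (row.get(name) or "").strip() for name in KEPT_FIELDS}
        record["event_type"] = event
        records.append(record)
    return records
```

Because each batch is parsed independently, batches that arrive asynchronously could be handed to separate workers.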
10. Lots of data + higher speed
--> Scale-out system
11. Map Reduce
Hadoop MapReduce
• Can handle a lot of data
• Latency is too high for cases where results are expected in near real time
Word count on a 100 KB file:
Start time = 01.04.44
End time = 01.05.12
Total time = 28 sec
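For readers unfamiliar with the model, the word-count job measured above follows the usual map, shuffle and reduce steps. The sketch below runs those steps in a single Python process; it does not use the Hadoop API and says nothing about Hadoop's timings, it only shows the shape of the computation.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

The computation itself is trivial for a 100 KB input, which suggests that the 28 seconds measured are dominated by job setup and scheduling rather than by the counting.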
12. DB Technology Comparison
• RDBMS
▫ Provides ACID properties
▫ Uses sharding to scale out
▫ Management overhead is huge when scaling out
▫ Performance degrades with higher data load
▫ Less partition tolerant
13. DB Technology Comparison Ctd.
• NoSQL
▫ Many options available (Cassandra, HBase, MongoDB, Hive)
▫ Promises easy scaling (many big users, e.g. Facebook, Twitter)
▫ Provides BASE properties under the CAP theorem
▫ Hard to fit the system into a limited data model
▫ Partition tolerant
▫ More memory --> higher performance
14. DB Technology Comparison Ctd.
• NewSQL
▫ Provides ACID properties
▫ Familiar relational data model
▫ Options available (ScaleDB, VoltDB)
▫ Runs entirely in memory, hence needs a lot of memory
▫ Promises speed
▫ Persistence achieved by replaying logs
15. With persistence, less restrictive hardware requirements and proven performance, NoSQL is the best option to try out.
• Cassandra – a key-value column family store (used at Facebook, Twitter, eBay)
• HBase – a key-value column family store (Facebook)
• MongoDB – document store (Adobe)
• Hive – HDFS-based database
16. YCSB Benchmarks
• With more big users, active mailing lists and the most promising features (secondary indexes, counters), Cassandra is the best option to try out.
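Counters fit promotion rules that depend on how often something happened, such as calls to one number per month. The sketch below models a counter column family in memory with plain Python dictionaries to show the idea; it is not the Cassandra API, and the class name and the row-key layout (subscriber plus month) are assumptions made for illustration.

```python
from collections import defaultdict


class CounterColumnFamily:
    """In-memory stand-in for a counter column family:
    row key -> column name -> integer counter."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(int))

    def incr(self, row_key, column, delta=1):
        """Add `delta` to one counter, creating it at zero if needed."""
        self._rows[row_key][column] += delta

    def get(self, row_key, column):
        """Read one counter; a counter that was never written reads as 0."""
        return self._rows.get(row_key, {}).get(column, 0)


# Hypothetical layout: one row per subscriber and month, one column per dialled number.
calls_per_month = CounterColumnFamily()
calls_per_month.incr("0771234567:2012-06", "PIZZA_HUT")
```

With this layout, checking a rule such as "more than 5 calls this month" becomes a single counter read instead of a scan over raw CDRs.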
17. Technology selection
Technologies left behind
• Complex Event Processing (CEP) engines
▫ No persistency
• Rules Engine
▫ More layers, more latency
• Hadoop
• NoSQL DB - HBase, MongoDB, Hive
Technologies selected
• NoSQL DB - Cassandra
19. Project Plan
Milestone | Target date | Status
First chapters of final report | - | Done
ERU abstracts | - | Accepted
ERU paper | 31/07/2012 | Due
Architecture | 06/06/2012 | Done
Setting up the Cassandra cluster | 06/06/2012 | Done
GUI for rule definition | 15/06/2012 | Ongoing
Bulk data load to Cassandra | 15/06/2012 | Ongoing
System Requirement Specification | 20/06/2012 | Due
Query data from database periodically | 26/06/2012 | Due
Initial Design Document | 27/06/2012 | Due
Algorithm for pre-processing | 10/07/2012 | Due
Testing | 10/07/2012 | Due
Final report | 10/08/2012 | Due
20. Risks and Possible Remedies
• NoSQL databases
Risk: high performance needs more memory
Remedy: use an external cluster with decent memory
• In the long run
Risk: more data leads to performance degradation
Remedy: archiving
21. • Handling concurrency issues
Risk: locking the database leads to low speed
Remedy: use a shadow copy
• NoSQL fails to achieve the requirements
Options:
NewSQL – VoltDB (runs entirely in memory)
CEP (needs extra measures to ensure persistence)
• Handling sudden peaks
Should have an auto-balancing mechanism ready
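The shadow-copy remedy for concurrency can be illustrated as follows: readers always work on a published snapshot, while a writer prepares a modified copy on the side and then swaps it in, so readers never wait on a lock. This is only a minimal in-memory sketch in Python under that reading of the remedy; the class and method names are hypothetical and it is not tied to any particular database.

```python
import threading


class ShadowCopyStore:
    """Readers use the published snapshot; writers build a shadow copy
    and publish it by rebinding a single reference."""

    def __init__(self, initial=None):
        self._live = dict(initial or {})
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        """Return the current published data. Callers must treat it as read-only."""
        return self._live

    def update(self, changes):
        """Apply `changes` to a shadow copy, then publish the copy."""
        with self._write_lock:
            shadow = dict(self._live)  # copy; the published data is left untouched
            shadow.update(changes)
            self._live = shadow        # swap: later readers see the new version
```

A reader that took a snapshot before an update keeps seeing the old, consistent version, at the cost of copying the data on every write.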
23. References
• http://www.slideshare.net/gvdinesh/cap-and-base-8169489
• B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with YCSB,” 2010, pp. 143–154.
Visit us at Kanthaka