Log, monitoring and QoS platform of a large-scale service
Phan Huy Hoàng – Lead Engineer
Web Mobile – Zing
hoangph@vng.com.vn
Agenda
• 1/ Quality of Service Platform: Why we need to pay attention to QoS
• 2/ Logging: Handling lots of data
• 3/ Monitoring
• 4/ Questions & Answers
Quality of Service Platform
Why we need to pay attention to QoS
Why do we need QoS?
• How does your app actually work in real life?
• Which functions are users using?
Chat, social, search nearby…?
• Do those functions even work at all?
• If they do work, how well, and in which environments?
A few numbers
• Lots of data
• 3M users
• 28M chat messages/day
• 600M events/day
Too much data
Logging
Handling lots of data
Logging Flow v1
Logging Flow v2
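The speaker notes for the second logging flow mention keeping a local copy of the log that does not have to persist instantly: data is cached and flushed to disk in chunks. A minimal sketch of that idea follows. The class name `ChunkedLogWriter`, the chunk size, and the file path are assumptions made for illustration; the talk does not show the actual implementation.

```python
import os
import tempfile


class ChunkedLogWriter:
    """Buffer log lines in memory and append them to a local file in chunks.

    Hypothetical helper: not the talk's code, only an illustration of
    "cache data and flush to disk in chunks".
    """

    def __init__(self, path, max_buffered_lines=1000):
        self.path = path
        self.max_buffered_lines = max_buffered_lines
        self._buffer = []

    def write(self, line):
        # Normalise the line ending, then flush once the chunk is full.
        self._buffer.append(line.rstrip("\n") + "\n")
        if len(self._buffer) >= self.max_buffered_lines:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        # One append per chunk instead of one disk write per event.
        with open(self.path, "a", encoding="utf-8") as f:
            f.writelines(self._buffer)
        self._buffer.clear()

    def buffered(self):
        return len(self._buffer)


def demo_chunked_writer():
    path = os.path.join(tempfile.mkdtemp(), "events.log")
    writer = ChunkedLogWriter(path, max_buffered_lines=3)
    for i in range(7):
        writer.write(f"event {i}")
    pending = writer.buffered()  # lines still held in memory
    writer.flush()
    with open(path, encoding="utf-8") as f:
        lines_on_disk = len(f.readlines())
    return pending, lines_on_disk
```

The trade-off is that lines still in the buffer are lost if the process dies before the next flush, which matches the notes' point that the local copy does not need to be persisted instantly.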
Monitoring
Charting
• Live data
• Trends data
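The speaker notes list three figures the custom tools aggregate in real time: total requests of each action, failure rate, and average execution time. The sketch below shows one way such per-action counters could be kept in fixed time buckets before being charted. The names `ActionStats` and `Aggregator`, and the 60-second bucket size, are assumptions for illustration and do not come from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ActionStats:
    total: int = 0          # total requests of this action in the bucket
    failures: int = 0       # how many of them failed
    total_ms: float = 0.0   # summed execution time in milliseconds

    @property
    def failure_rate(self):
        return self.failures / self.total if self.total else 0.0

    @property
    def avg_ms(self):
        return self.total_ms / self.total if self.total else 0.0


class Aggregator:
    """Hypothetical in-memory aggregator keyed by (bucket start, action)."""

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        self.buckets = defaultdict(ActionStats)

    def record(self, timestamp, action, ok, elapsed_ms):
        # Round the timestamp down to the start of its bucket.
        ts = int(timestamp)
        bucket_start = ts - ts % self.bucket_seconds
        stats = self.buckets[(bucket_start, action)]
        stats.total += 1
        if not ok:
            stats.failures += 1
        stats.total_ms += elapsed_ms


agg = Aggregator(bucket_seconds=60)
agg.record(120, "chat", ok=True, elapsed_ms=10)
agg.record(150, "chat", ok=False, elapsed_ms=30)
agg.record(185, "chat", ok=True, elapsed_ms=5)
```

Each bucket holds only three numbers per action, so the aggregate stays small no matter how many raw events arrive, in line with the notes' remark that aggregate data does not have to be very precise.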
Dashboard
• Simplifies your life
• Concentrate on drastic changes
Anomaly data points detection
• How do we deal with stuff like this?
• Too many data points deviating from the normal range should trigger an alert
• You will get a lot of false positives
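The rule on this slide can be sketched as follows: measure how far recent points fall from a baseline, and alert only when several of them are outliers rather than on a single spike. The talk does not say which algorithm is used, so the k-sigma threshold, the function names, and the default parameters below are all assumptions.

```python
from statistics import mean, pstdev


def count_outliers(history, recent, k=3.0):
    """Count recent points more than k standard deviations from the baseline mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat baseline: any different value counts as a deviation.
        return sum(1 for x in recent if x != mu)
    return sum(1 for x in recent if abs(x - mu) > k * sigma)


def should_alert(history, recent, k=3.0, min_outliers=3):
    """Alert only when at least min_outliers recent points deviate."""
    return count_outliers(history, recent, k) >= min_outliers


# Illustrative numbers only, not measurements from the talk.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
spiky = [100, 150, 99, 160, 155]
single_blip = [100, 150, 99, 101, 98]
```

Requiring several outliers in the recent window is one simple way to cut false positives; a lone blip such as `single_blip` does not fire, while a sustained shift such as `spiky` does.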
What's next?
• Sending alerts
• By Zalo, SMS, email
• Happy life
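The speaker notes add that reducing the number of alerts helps greatly. One common way to do that is a per-key cooldown, so the same problem is not reported again until some time has passed. This is a sketch under that assumption: the class `AlertThrottle`, the key format, and the 600-second cooldown are hypothetical, and the actual delivery by Zalo, SMS, or email is left out.

```python
class AlertThrottle:
    """Suppress repeated alerts for the same key within a cooldown period."""

    def __init__(self, cooldown_seconds=600):
        self.cooldown_seconds = cooldown_seconds
        self._last_sent = {}

    def should_send(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down, drop this alert
        self._last_sent[key] = now
        return True


throttle = AlertThrottle(cooldown_seconds=600)
results = [
    throttle.should_send("chat.failure_rate", now=0),     # first alert goes out
    throttle.should_send("chat.failure_rate", now=300),   # suppressed
    throttle.should_send("search.failure_rate", now=300), # different key
    throttle.should_send("chat.failure_rate", now=600),   # cooldown elapsed
]
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch easy to test.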
Questions?

Editor's notes

  1. A lot of modules to keep track of
     • 25 main logic modules spanning 6 application servers (75 logic instances)
     • 30 database instances spanning 9 database servers
     • 45 servers in total
     Different application platforms & networks
     • Mobile operating systems: iOS, Android, S40, Windows Phone
     • Mobile operators: Viettel, Vina, Mobifone, foreign mobile operators
     • Network types: Wifi, 3G, EDGE, HSDPA, LTE, …
  2. 2 versions
     1st version
     • Scribe as log shipper: fast logging
     • Hadoop as data storage: data replication
     • Hive for ad hoc querying: acceptable query time on large datasets, distributed calculation
     Limitations
     • You don't have real-time stats (30 minutes later or more)
     • You cannot insert logic mid-stream
     • You don't have access to the actual log files (our system operators have to set up sync scripts to sync logs to our servers)
  3. How could we improve?
     • Keep your enemy close
     • Start with what you have and slowly improve it
     • Write to a localhost Scribe proxy: the worker won't get stuck if the remote server goes down
     • Write an aggregator server using the Scribe interface
       • Keep a local copy of the log
       • Aggregate data and write it to an RRD database
       • Perform trend analysis
       • Chain logs from our log aggregator to Hadoop
     • Aggregate data doesn't have to be very precise
       • Approximate data often works
       • Data written into RRD will be normalized anyway
     • The local copy of the log doesn't have to persist instantly
       • Cache data and flush to disk in chunks
     • Easy to extend once writes exceed local capacity
       • Set up a small HDFS ring and write directly to it
     • RRD advantages
       • Storage efficient & very easy to query
       • Lots of plug-ins to display RRD
  4. Nagios for alerting
     Cacti for graphing charts
     Custom-built tools
     • Provide real-time aggregated information on multiple dimensions, both client & server side
     • Total requests of each action
     • Failure rate
     • Average execution time
  5. Things happen when you least expect them
     • Detect anomalous data based on trends & anomaly detection
     • Keep improving your algorithm
  6. Reducing the number of alerts can help greatly