8. PROS
• Data structures in MySQL
• Effective memory caching implementations
• Effective SEO implementations
• Effective search server implementations
• Urbanesia is successfully consumed as a
Directory
9. CONS
• No effective separation of Backend & Frontend web
applications
• Source Code = Spaghetti Code
• Storing low value, high volume data in MySQL
• Many queries using GROUP BY with highly populated tables
• After a warm boot, any page takes 20+ seconds to generate
• Difficult to scale horizontally & vertically
• Very low concurrency
• The product’s identity is weak
• Many features are left unused by users
10. WHAT WE LEARNED
• Do NOT use MySQL as session storage
• Use a NoSQL database for low value, high volume
data
• Separate backend & frontend web applications,
create APIs for backends
• Use output caching where available
• When using PHP-APC, make sure apc.stat = 0
• Increase concurrency by reverse proxying
requests to Apache
11. CHALLENGES
• Handle Googlebot traffic of over 1 TB/month
with only 2 servers
• Do output caching with CodeIgniter
• Achieve sub-second page generation even in
warm boots
• Redesign backend by creating an API for our
native apps
13. PROS
• Achieved sub second page generation in warm boots
• Aggressive & effective caching mechanism
• Optimized MY_Controller
• Session storage handled by Memcache
• MySQL read/write access lowered from ~400 qps to only 1 qps
• Lean memory usage in database server
• Created an OAuth-enabled API
• Concurrency increased by using nginx as reverse proxy
• The same server setup can theoretically handle 10x the current traffic
without scaling horizontally
• Googlebot is now limited only by bandwidth, not by the efficiency of our code
• Index properly with MySQL
• Replaced stock MySQL with a custom-built MySQL alternative: Percona Server
14. CONS
• Source code = Spaghetti code
• Unpredictable code behavior because of what was inherited from V0;
as tables fill with more rows, queries become bottlenecks
• Subqueries still exist
• Everything is still synchronous, no message queue yet
• The end product fails to give users the illusion of
speed
• New hires have a steeper learning curve because of the
inherited complexity on top of V1's own complexity
• Still difficult to scale horizontally & vertically
15. WHAT WE LEARNED
• CodeIgniter enables fast product delivery, but the optimization &
efficiency of the resulting code are questionable at best
• Need to enable asynchronous architecture
• Do not do things in real time; offload them to message queues instead
• To impress users with the illusion of speed, JavaScript must be
thoroughly implemented
• Do not handle email delivery ourselves; use third-party email
solutions like AWS SES
• Offload server-side international bandwidth to clients; for
Facebook, use the Facebook JS SDK instead of the PHP SDK
• The product gains more engagement with content that is more
focused (thematic)
• Speed of content delivery is important to engagement metrics
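
The "offload to message queues" lesson can be sketched roughly as below. This is a minimal in-process illustration using Python's standard queue and threading modules, not the stack described in these slides; send_email and handle_signup are hypothetical names, and a real setup would put a message broker where the in-memory queue is.

```python
import queue
import threading

# In-process stand-in for a real message broker (an assumption for illustration).
jobs = queue.Queue()
sent = []  # records what the background worker has processed


def send_email(address):
    # Hypothetical slow task; a real system would call a third-party email API here.
    sent.append(address)


def worker():
    # Runs in the background and drains the queue one job at a time.
    while True:
        address = jobs.get()
        if address is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        send_email(address)
        jobs.task_done()


def handle_signup(address):
    # The request handler only enqueues the job and returns immediately.
    jobs.put(address)
    return "ok"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
responses = [handle_signup(a) for a in ("a@example.com", "b@example.com")]
jobs.put(None)
jobs.join()  # wait until the worker has processed everything
```

The handler returns before any email is sent, so page generation time no longer depends on the slow task.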
16. CHALLENGES
• Build a third iteration with a strong identity based on users’
personas
• Focus more on verticals, create the illusion of a
discovery/recommendation platform
• Progressive Disclosure of content
• A JavaScript framework that is light, fast and has minimal
dependencies
• Make everything asynchronous and message/event based
• Redefine Urbanesia’s atomic data structure
• Do MySQL JOINs on the server side
• Get the data first FAST, compute later
17. PRODUCTS & TECHNOLOGIES
Does the product make the technology,
or does the technology make the product?
19. REAL WORLD EXAMPLES
• We need to know which part of Urbanesia will
really work for users
• Store the preferences for each user's dynamic
activity
• Calculate which other content a user
might consume
• Present the content unobtrusively
• Do it fast and almost realtime
20. TECHNICAL SPEAK
We need to know which part of Urbanesia will really work
for users
• Mine every user's data on each visit, including
anonymous users
• Log everything FAST and asynchronously
• Low value & high volume data
• Avoid MySQL at all cost
• Model data based on the chosen NoSQL database's data model
21. TECHNICAL SPEAK
Introducing Redis
• Read/Write data from memory
• Stores data on disk
• Key/value store, similar to Memcache
• Ability to perform atomic operations without worrying about state
• Redis’ primitive data types are very simple
• Ideal for low value/high volume data
• Less is more!
22. TECHNICAL SPEAK
Store the preferences for each user's dynamic activity
• Simple increments
• Perfect for Sorted Sets in Redis
• Keeping them sorted means analytics functions are supported
natively by Redis == High Performance
• Fire & Forget – Consider using async frameworks like
Node.js & trigger using JavaScript
• Why trigger with JavaScript? To make sure at the very
least that it’s actually users accessing the page
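
The "simple increments" idea can be sketched as below. To stay runnable without a Redis server, this uses a small in-memory stand-in whose two methods are modeled on the Redis sorted set commands ZINCRBY and ZREVRANGE; the class, the track_view helper and the prefs:<user id> key layout are assumptions for illustration, not Urbanesia's actual schema.

```python
from collections import defaultdict


class SortedSetStore:
    """In-memory stand-in for Redis sorted sets (illustration only)."""

    def __init__(self):
        self._sets = defaultdict(dict)  # key -> {member: score}

    def increment(self, key, amount, member):
        # Modeled on ZINCRBY: add `amount` to the member's score, starting from 0.
        scores = self._sets[key]
        scores[member] = scores.get(member, 0) + amount
        return scores[member]

    def top(self, key, count):
        # Modeled on ZREVRANGE with scores: highest score first.
        ranked = sorted(
            self._sets[key].items(),
            key=lambda item: (item[1], item[0]),
            reverse=True,
        )
        return ranked[:count]


store = SortedSetStore()


def track_view(user_id, category):
    # Fire & forget: one increment per page view, no read before write.
    store.increment(f"prefs:{user_id}", 1, category)


for category in ["cafe", "bar", "cafe", "mall", "cafe", "bar"]:
    track_view(42, category)

top_two = store.top("prefs:42", 2)
print(top_two)  # [('cafe', 3), ('bar', 2)]
```

Because the set stays ordered by score, "what does this user like most" is a single range read rather than a GROUP BY over a log table.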
23. TECHNICAL SPEAK
Node.js & Socket.io
• Node.js is a Network ready daemon with Chrome’s V8
JavaScript engine inside
• Node.js is asynchronous by default (event based)
• Socket.io is the transport used for data
• Socket.io abstracts the transport and falls back gracefully between
WebSocket, Flash and plain AJAX
• JavaScript clients should only subscribe to onFailed
events to minimize overhead
24. TECHNICAL SPEAK
Calculate which other content a user
might consume
• Use Machine Learning algorithms to learn
users' behavior
• Naïve Bayes Classifier to the rescue
• Assumes each keyword is independent
• Proven algorithm used by many big websites
25. TECHNICAL SPEAK
Naïve Bayes Classifier
• There are no right or wrong assumptions, only
accuracy
• Accuracy is increased with more data and better
classifications
• Relatively easy to code
• Lots of libraries out there in different languages
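
To show why it is "relatively easy to code", here is a minimal sketch of a Naïve Bayes classifier over keywords. It keeps counts in memory and uses add-one (Laplace) smoothing with log probabilities; the class name, the training keywords and the categories are made up for illustration and are not the implementation described in these slides.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> keyword -> count
        self.doc_counts = Counter()              # category -> training documents
        self.vocabulary = set()

    def train(self, words, category):
        self.doc_counts[category] += 1
        for word in words:
            self.word_counts[category][word] += 1
            self.vocabulary.add(word)

    def classify(self, words):
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, docs in self.doc_counts.items():
            # Log prior plus the sum of per-keyword log likelihoods.
            # Treating keywords as independent is the "naive" assumption.
            score = math.log(docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                # Add-one smoothing so unseen keywords do not zero out a category.
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


clf = NaiveBayes()
clf.train(["coffee", "wifi", "latte"], "cafe")
clf.train(["espresso", "coffee"], "cafe")
clf.train(["beer", "live", "music"], "bar")

print(clf.classify(["coffee", "latte"]))  # cafe
print(clf.classify(["beer", "music"]))    # bar
```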
26. TECHNICAL SPEAK
Present the content unobtrusively
• Giving users the illusion that we understand
them
• Do not make this feature dominant
• Show it where you want the content to look
smart
27. TECHNICAL SPEAK
Do it fast and almost realtime
• Fast is an illusion
• Realtime is overrated
• If you don't have enough resources to do so,
schedule it and pre-generate content
• Scale vertically
31. NAÏVE BAYES CLASSIFIER
First Iteration:
• Took ~1000 seconds to classify 1 keyword
• MySQL as storage
• No micro optimizations
32. NAÏVE BAYES CLASSIFIER
Second Iteration:
• Took ~400 seconds to classify 1 keyword
• MongoDB as storage
• Macro optimization trimmed 600 of 1000
seconds
• No micro optimizations
33. NAÏVE BAYES CLASSIFIER
Third Iteration:
• Took ~1 second to classify 1 keyword
• Redis as storage
• Insane macro optimization boost
• No micro optimizations
34. NAÏVE BAYES CLASSIFIER
Fourth Iteration:
• Took 0.01428 seconds to classify 1 keyword
• Redis as storage
• Reworked classification algorithm
• Get the data first and compute later
• More memory usage, faster execution time
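
One way to read "get the data first and compute later" is sketched below: instead of one storage round trip per lookup, fetch every needed count in a single call and do the arithmetic in memory. CountingStore is a hypothetical stand-in for a remote key/value store, get_many is comparable in spirit to Redis MGET, and the category:keyword key layout is assumed. The slides do not spell out the actual rework, so treat this as an interpretation.

```python
class CountingStore:
    """Stand-in for a remote key/value store that counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key, 0)

    def get_many(self, keys):
        # One round trip for all keys.
        self.round_trips += 1
        return [self._data.get(key, 0) for key in keys]


def score_one_by_one(store, categories, keywords):
    # Compute while fetching: one round trip per (category, keyword) pair.
    return {c: sum(store.get(f"{c}:{k}") for k in keywords) for c in categories}


def score_fetch_first(store, categories, keywords):
    # Fetch everything up front, then compute purely in memory.
    keys = [f"{c}:{k}" for c in categories for k in keywords]
    counts = dict(zip(keys, store.get_many(keys)))
    return {c: sum(counts[f"{c}:{k}"] for k in keywords) for c in categories}


data = {"cafe:coffee": 2, "cafe:latte": 1, "bar:beer": 1}
categories = ["cafe", "bar"]
keywords = ["coffee", "latte", "beer"]

slow_store, fast_store = CountingStore(data), CountingStore(data)
slow = score_one_by_one(slow_store, categories, keywords)
fast = score_fetch_first(fast_store, categories, keywords)

print(slow == fast)                                   # True
print(slow_store.round_trips, fast_store.round_trips)  # 6 1
```

The second version holds all counts in memory at once, which matches the trade-off named above: more memory usage, faster execution time.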
35. NAÏVE BAYES CLASSIFIER
Fifth Iteration:
• Reworked the trainer methods
• Created deTrain method to update data
• Created helpers for keyword blacklists
• Consistent performance from CLI or HTTP
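
A rough sketch of what a deTrain method and a keyword blacklist helper might look like is given below. Only the name deTrain comes from the slides; the TrainingData class, the filter_keywords helper, the blacklist entries and the in-memory counters are all assumptions for illustration.

```python
from collections import Counter, defaultdict

BLACKLIST = {"the", "and", "di"}  # hypothetical blacklist entries


def filter_keywords(words):
    # Helper: lowercase the keywords and drop blacklisted ones.
    return [w.lower() for w in words if w.lower() not in BLACKLIST]


class TrainingData:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> keyword -> count
        self.doc_counts = Counter()              # category -> training documents

    def train(self, words, category):
        self.doc_counts[category] += 1
        self.word_counts[category].update(filter_keywords(words))

    def detrain(self, words, category):
        # Reverse an earlier train() call so stale content stops influencing results.
        self.doc_counts[category] -= 1
        self.word_counts[category].subtract(filter_keywords(words))
        # Unary plus keeps only entries with a positive count.
        self.word_counts[category] = +self.word_counts[category]
        self.doc_counts = +self.doc_counts


data = TrainingData()
data.train(["The", "Coffee", "latte"], "cafe")
data.train(["coffee", "and", "wifi"], "cafe")
data.detrain(["The", "Coffee", "latte"], "cafe")

print(dict(data.word_counts["cafe"]))  # {'coffee': 1, 'wifi': 1}
print(data.doc_counts["cafe"])         # 1
```

Being able to subtract a document's counts means updated content can be detrained and retrained without rebuilding the whole data set.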
36. NAÏVE BAYES CLASSIFIER
What we learned:
• Always be open to new things
• Geek Talk with peers from the industry
• Very talented people will always come up with a smarter and
better way to do something
• Decide, get smart or get smarter?
• Algorithms are the engine, but they mean nothing
without implementation
• Consider opening up source code for others to examine,
the smarter the population, the better products we create
• Focus on USERS instead of technology
37. NAÏVE BAYES CLASSIFIER
More insights below:
http://www.bango29.com/go/blog/2012/naive-bayes-classifier-revisited
51. WHAT’S NEXT
• A rework from scratch both in Product Design
and Technical Implementation
• Focusing more on users and our RICH content
• A social network useful for everyday city life
• Machine learning implementation for our
recommendation engine
52. WHAT’S NEXT
Live Beta opening soon!
Email to dev@urbanesia.com for access
54. KEY TAKEAWAYS
• Empower people working with you
• Invest in company culture
• Focus on USERS, not technology
• Macro to Micro optimizations & scaling
• Be open to new ideas (things)
• Geek Talks over anything, like Basketball or Beer
• Good is not Great
• Whatever WORKS