Urbanesia - Development History

URBANESIA
Development History
Business Connect – 29 October 2012

Prepared by: Batista Harahap

URBANESIA BETA V0
The first public iteration of Urbanesia

PROS

• Data structures in MySQL
• Effective memory caching implementations
• Effective SEO implementations
• Effective search server implementations
• Users successfully adopted Urbanesia as a
directory

CONS

• No effective separation of Backend & Frontend web
applications
• Source Code = Spaghetti Code
• Storing low value, high volume data in MySQL
• Many queries using GROUP BY with highly populated tables
• After a warm boot, any page takes 20+ seconds to generate
• Difficult to scale horizontally & vertically
• Very low concurrency

• The product’s identity is weak
• Many features are left unused by users

WHAT WE LEARNED

• Do NOT use MySQL as session storage
• Use a NoSQL database for low value, high volume
data
• Separate backend & frontend web applications,
create APIs for the backend
• Use output caching where available
• When using PHP-APC, make sure apc.stat = 0
• Increase concurrency by reverse proxying
requests to Apache
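
As a rough illustration of the output caching lesson above, here is a minimal sketch in Python (not the CodeIgniter code Urbanesia ran). The `OutputCache` class, the URL and the 60 second TTL are all assumptions made for this example; the idea is only that a fully rendered page is stored once and served again without touching the database.

```python
import time
from typing import Callable, Dict, Tuple


class OutputCache:
    """Hypothetical helper: caches fully rendered pages by URL for a fixed TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, url: str, render: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._store.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]          # cache hit: no queries, no templating
        html = render()            # cache miss: do the expensive work once
        self._store[url] = (now + self.ttl, html)
        return html


render_calls = []


def render_directory_page() -> str:
    # Stand-in for the real page generation (queries + templates).
    render_calls.append(1)
    return "<html>directory listing</html>"


cache = OutputCache(ttl_seconds=60)
first = cache.get_or_render("/jakarta/restaurants", render_directory_page)
second = cache.get_or_render("/jakarta/restaurants", render_directory_page)
```

The second call returns the stored HTML, so a crawler hitting the same URL repeatedly costs one render per TTL window rather than one per request.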

CHALLENGES

• Handle Google Bots traffic of over 1 TB/month
with only 2 servers
• Do output caching with CodeIgniter
• Achieve sub-second page generation even after
warm boots
• Redesign backend by creating an API for our
native apps

URBANESIA V1
The second iteration, based on refined code
and infrastructure design

PROS

• Achieved sub-second page generation after warm boots
• Aggressive & effective caching mechanism
• Optimized MY_Controller
• Session storage handled by Memcache
• MySQL read/write access lowered from ~400 qps to only 1 qps
• Lean memory usage in database server
• Created an OAUTH enabled API
• Concurrency increased by using nginx as reverse proxy
• The same server setup can theoretically handle 10x the current traffic
without scaling horizontally
• Google bot traffic is now limited only by bandwidth, not by code efficiency
• Proper indexing in MySQL
• Replaced stock MySQL with a custom-built alternative: Percona Server

CONS

• Source code = Spaghetti code
• Unpredictable code behavior inherited from V0;
as tables fill with more rows, queries become bottlenecks
• Subqueries still exist
• Everything is still synchronous, no message queue yet
• The end product fails to give users the illusion of
speed
• New hires face a steeper learning curve because of the
inherited complexity plus V1’s own complexity
• Still difficult to scale horizontally & vertically

WHAT WE LEARNED

• CodeIgniter enables fast product delivery, but the optimization &
efficiency of its code are questionable at best
• Need to enable asynchronous architecture
• Do not do things in realtime; offload to message queues instead
• To impress users with the illusion of speed, JavaScript must be
thoroughly implemented
• Do not handle email ourselves; use third-party email
solutions like AWS SES
• Offload server-side international bandwidth to clients; for
Facebook, use the Facebook JS SDK instead of the PHP SDK
• The product gains more engagement with content that is more
focused (thematic)
• Speed of content delivery is important to engagement metrics

CHALLENGES

• Build a third iteration with a strong identity based on users’
personas
• Focus more on verticals, create the illusion of a
discovery/recommendation platform
• Progressive disclosure of content
• A JavaScript framework that is light, fast and has minimal
dependencies
• Make everything asynchronous and message/event based
• Redefine Urbanesia’s atomic data structure
• Do MySQL JOINs on the server side
• Get the data first FAST, compute later
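
One reading of the last two bullets is to replace a database JOIN with two simple queries and combine the rows in application code. The sketch below assumes that reading; it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `places` and `reviews` tables are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the MySQL server
db.executescript("""
    CREATE TABLE places  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, place_id INTEGER, rating INTEGER);
    INSERT INTO places  VALUES (1, 'Warung A'), (2, 'Cafe B');
    INSERT INTO reviews VALUES (1, 1, 5), (2, 1, 3), (3, 2, 4);
""")

# Step 1: get the data first, using two cheap single-table queries.
places = db.execute("SELECT id, name FROM places").fetchall()
ids = [place_id for place_id, _ in places]
marks = ",".join("?" * len(ids))
reviews = db.execute(
    f"SELECT place_id, rating FROM reviews WHERE place_id IN ({marks})", ids
).fetchall()

# Step 2: compute later, joining and aggregating in application memory.
ratings_by_place = {}
for place_id, rating in reviews:
    ratings_by_place.setdefault(place_id, []).append(rating)

averages = {
    name: sum(ratings_by_place[place_id]) / len(ratings_by_place[place_id])
    for place_id, name in places
    if place_id in ratings_by_place
}
```

Each query hits a single table on an indexed column, so the database does no JOIN or GROUP BY work; the trade-off is more memory and CPU on the application side.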

PRODUCTS & TECHNOLOGIES
Does the product make the technology,
or does the technology make the product?

THE PRODUCT MAKES THE TECHNOLOGY!

REAL WORLD EXAMPLES

• We need to know which part of Urbanesia will
really work for users
• Store the preferences for each user’s dynamic
activity
• Calculate what other content a user
might consume
• Present the content unobtrusively
• Do it fast and almost realtime

TECHNICAL SPEAK

We need to know which part of Urbanesia will really work
for users

• Mine every user’s data on each visit, including
anonymous users
• Log everything FAST and asynchronously
• Low value & high volume data
• Avoid MySQL at all cost
• Model data based on the chosen NoSQL database’s data model

TECHNICAL SPEAK

Introducing Redis

• Read/Write data from memory
• Stores data on disk
• Key/value store, similar to Memcache
• Performs atomic operations without worrying about state
• Redis’ primitive data types are very simple
• Ideal for low value/high volume data
• Less is more!

TECHNICAL SPEAK

Store the preferences for each user’s dynamic activity

• Simple increments
• Perfect for Sorted Sets in Redis
• Keeping them sorted means analytics functions are supported
natively by Redis == High Performance
• Fire & Forget – Consider using async frameworks like
Node.js & trigger using JavaScript
• Why trigger with JavaScript? To make sure at the very
least that it’s actually users accessing the page
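
To make the "simple increments" idea concrete, here is a small in-memory stand-in that mimics two Redis sorted set commands, ZINCRBY and ZREVRANGE (with scores). The `SortedSetStore` class and the `prefs:user:42` key are assumptions for illustration only; with a real Redis server and the redis-py client, the equivalent calls would be `zincrby(name, amount, value)` and `zrevrange(name, start, end, withscores=True)`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class SortedSetStore:
    """In-memory stand-in for Redis sorted sets (simplified tie-breaking)."""

    def __init__(self) -> None:
        self._sets: Dict[str, Dict[str, float]] = defaultdict(dict)

    def zincrby(self, key: str, amount: float, member: str) -> float:
        """Add `amount` to the member's score, creating it at 0 if missing."""
        scores = self._sets[key]
        scores[member] = scores.get(member, 0.0) + amount
        return scores[member]

    def zrevrange(self, key: str, start: int, stop: int) -> List[Tuple[str, float]]:
        """Return (member, score) pairs from highest score down; stop is inclusive."""
        ranked = sorted(self._sets[key].items(), key=lambda kv: (-kv[1], kv[0]))
        end = None if stop == -1 else stop + 1
        return ranked[start:end]


store = SortedSetStore()

# Each page view fires one increment for the category the user looked at.
for category in ["coffee", "sushi", "coffee", "coffee", "sushi", "bakery"]:
    store.zincrby("prefs:user:42", 1, category)

# Reading the top preferences is a single ranked lookup, no GROUP BY needed.
top_two = store.zrevrange("prefs:user:42", 0, 1)
```

Because the write is a single increment with no read beforehand, it suits the fire & forget pattern described above.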

TECHNICAL SPEAK

Node.js & Socket.io

• Node.js is a network-ready runtime built on Chrome’s V8
JavaScript engine
• Node.js is asynchronous by default (event based)
• Socket.io is the transport used for data
• Socket.io abstracts the transport and falls back gracefully between
WebSocket, Flash and plain AJAX
• JavaScript clients should only subscribe to onFailed
events to minimize overhead

TECHNICAL SPEAK

Calculate what other content a user
might consume

• Use Machine Learning algorithms to learn
users’ behavior
• Naïve Bayes Classifier to the rescue
• Assumes each keyword is independent of the others
• Proven algorithm used by many big websites

TECHNICAL SPEAK

Naïve Bayes Classifier

• There are no right or wrong assumptions, only
accuracy
• Accuracy is increased with more data and better
classifications
• Relatively easy to code
• Lots of libraries out there in different languages
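
As a hedged illustration of how little code the technique needs, below is a minimal multinomial Naïve Bayes sketch in Python with add-one (Laplace) smoothing. It is not the Urbanesia PHP library; the class name, the labels and the training sentences are made up for this example.

```python
import math
from collections import Counter, defaultdict
from typing import Optional


class NaiveBayes:
    """Tiny multinomial Naive Bayes text classifier with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, text: str, label: str) -> None:
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text: str) -> Optional[str]:
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior, then add one log likelihood per word: this sum is
            # exactly the "each keyword is independent" assumption.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes()
model.train("cheap street food noodles", "culinary")
model.train("spicy noodles and satay", "culinary")
model.train("live music and cocktails", "nightlife")
model.train("late night dj music", "nightlife")

guess = model.classify("spicy street noodles")
```

Adding more labelled examples changes only the counts, which is why accuracy tends to improve with more data and better classifications.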

TECHNICAL SPEAK

Present the content unobtrusively

• Giving users the illusion that we understand
them
• Do not make this feature dominant
• Show it where you want the content to look
smart

TECHNICAL SPEAK

Do it fast and almost realtime

• Fast is an illusion
• Realtime is overrated
• If you don’t have enough resources to do so,
schedule it and pre-generate content
• Scale vertically

Talk is cheap, show me the CODE!

URBANESIA @ Github

https://github.com/Urbanesia

URBANESIA @ Github

https://github.com/Urbanesia/Simple-Naive-Bayes-Classifier-for-PHP

NAÏVE BAYES CLASSIFIER

First Iteration:
• Took ~1000 seconds to classify 1 keyword
• MySQL as storage
• No micro optimizations


Second Iteration:
• Took ~400 seconds to classify 1 keyword
• MongoDB as storage
• Macro optimization trimmed 600 of 1000
seconds


Third Iteration:
• Took ~1 second to classify 1 keyword
• Redis as storage
• Insane macro optimization boost


Fourth Iteration:
• Took 0.01428 seconds to classify 1 keyword
• Redis as storage
• Reworked classification algorithm
• Get the data first and compute later
• More memory usage, faster execution time
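
The slides do not say how the fourth iteration was reworked, so the following is only one way to read "get the data first and compute later": fetch every count the classifier needs in a single batched call, then do the arithmetic in memory. The `CountingStore` class is a hypothetical key-value store that counts round trips; Redis offers batched reads of this kind through commands such as MGET.

```python
from typing import Dict, List


class CountingStore:
    """Hypothetical key-value store that records how many round trips it served."""

    def __init__(self, data: Dict[str, int]) -> None:
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key: str) -> int:
        self.round_trips += 1
        return self._data.get(key, 0)

    def mget(self, keys: List[str]) -> List[int]:
        self.round_trips += 1          # one trip, however many keys
        return [self._data.get(key, 0) for key in keys]


def total_one_by_one(store: CountingStore, tokens: List[str]) -> int:
    # Compute while fetching: one network round trip per token.
    return sum(store.get(token) for token in tokens)


def total_batched(store: CountingStore, tokens: List[str]) -> int:
    # Get the data first (one trip, more memory), compute later.
    counts = store.mget(tokens)
    return sum(counts)


tokens = ["spicy", "noodles", "satay"]
data = {"noodles": 3, "spicy": 1}

slow_store = CountingStore(data)
fast_store = CountingStore(data)
slow_total = total_one_by_one(slow_store, tokens)
fast_total = total_batched(fast_store, tokens)
```

Both functions return the same total, but the batched version holds all counts in memory at once in exchange for a single round trip, matching the "more memory usage, faster execution time" bullet.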


Fifth Iteration:
• Reworked the trainer methods
• Created deTrain method to update data
• Created helpers to do keyword blacklists
• Consistent performance from CLI or HTTP


What we learned:
• Always be open to new things
• Geek Talk with peers from the industry
• Very talented people will always come up with smarter and
better ways to do something
• Decide: get smart or get smarter?
• Algorithms are the engine, but they mean nothing
without implementation
• Consider opening up source code for others to examine;
the smarter the population, the better the products we create
• Focus on USERS instead of technology


More insights below:

http://www.bango29.com/go/blog/2012/naive-bayes-classifier-revisited

Geekball
Every Tuesday, 17.00 – 19.00
Basket Hall C, Senayan

OUR PRODUCTS
Urbanesia’s product lineup

URBANESIA WINDOWS 8

http://urho.me/vkND6

URBANESIA ANDROID

http://urho.me/BSsqR

JAJAN
Jajan is open source; get the source code:
• Blackberry - https://github.com/Urbanesia/Jajan-Blackberry
• Android - https://github.com/Urbanesia/Jajan
• HTML5 - https://github.com/Urbanesia/jajan-html5

Platforms:
• Blackberry - https://appworld.blackberry.com/webstore/content/54742/
• Android - https://play.google.com/store/apps/details?id=com.bango.jajan
• iOS - https://itunes.apple.com/us/app/jajan/id527278768?mt=8
• HTML5 - https://jajan5.urbanesia.com/

URBANESIA BALI

http://urho.me/HPLT9

WHAT’S NEXT
Our third iteration of Urbanesia.com

WHAT’S NEXT

• A rework from scratch both in Product Design
and Technical Implementation
• Focusing more on users and our RICH content
• A social network useful for everyday city life
• Machine learning implementation for our
recommendation engine

WHAT’S NEXT

Live Beta opening soon!
Email dev@urbanesia.com for access

KEY TAKEAWAYS

• Empower people working with you
• Invest in company culture
• Focus on USERS, not technology
• Macro to Micro optimizations & scaling
• Be open to new ideas (things)
• Geek Talks over anything, like Basketball or Beer
• Good is not Great
• Whatever WORKS

THANK YOU
Email me: batista@bango29.com
Twitter: @tista
Github: tistaharahap
Blog: www.bango29.com
