David Arcos from Catchoom explained at NoSQLMatters Barcelona (6 Oct 2012) how the Catchoom Recognition Service (a SaaS platform for visual recognition) was implemented using Redis, among other tools. David argues that NoSQL is necessary for critical components of the service.
1. NoSQL matters in
Catchoom Recognition Service
David Arcos
david.arcos@catchoom.com | @DZPM
catchoom.com | @catchoom
2. 1) Introduction
2) What did we need?
3) How we built it
4) Advantages of NoSQL
5) Cool uses of NoSQL
6) Limits
7) Conclusion
David Arcos | @DZPM catchoom.com | @catchoom
3. Hi! I'm David Arcos
- Python/Django developer (>4yr)
- Web backend, distributed systems,
databases, scalability, security
- Team leader at Catchoom
- You can follow me at @DZPM
4. Catchoom technology recognizes an
object by searching through a large
collection of images in a fraction of a
second.
Catchoom targets application
developers and integrators.
5. Our customers are leaders in Augmented Reality
6. Visual Recognition:
"Identify an object in front of the camera by comparing it
to a huge collection of reference images"
7. Examples of recognized objects:
- CD/DVD and book covers
- Newspapers and magazines
- Logos and brands
- Posters
- Packaged goods
- Monuments and places
8. Catchoom Recognition Service:
- Cloud-based Visual Recognition (SaaS)
- RESTful API to integrate
- "Add VR features to your app/platform"
9. - Small team of 4 developers, doing Scrum
10. 1) Introduction
2) What did we need?
3) How we built it
4) Advantages of NoSQL
5) Cool uses of NoSQL
6) Limits
7) Conclusion
11. Minimum requirements:
- a public API for the end users to perform Visual
Recognition
- a private API for the customer to manage the
Collections and get statistics
- a nice website for the customer, providing the
functionality of both APIs
12. Must be flexible:
- One customer does Augmented Reality and
needs a 3D model (binary format) in the item
- Another one needs just the item id
- Our data model needs to allow everything
(structured and unstructured data)
13. Must be reliable:
- Images or data should never be lost
- Avoid single points of failure
- We need redundancy
14. Must be very fast:
"Layar has been using Catchoom's Visual Search technology since the
launch of Layar Vision, allowing users to quickly view the AR content placed
on top of images by just pointing their camera to the image.
We've benchmarked Catchoom's technology in 2011 against 3 of their main
competitors and found they had the best results both on speed and on
successful matches (including lowest false positives)"
Dirk Groten, CTO of Layar
15. 1) Introduction
2) What did we need?
3) How we built it
4) Advantages of NoSQL
5) Cool uses of NoSQL
6) Limits
7) Conclusion
17. The Panel:
- typical customer portal:
- manage your Collections, run Visual Recognition
- get usage statistics
- and configure the payment method :)
20. Mobile apps:
- for Android, iOS
- use the Visual Recognition API
- the code will be published
21. Data models:
- Collection: a set of items. Has at least one token.
- Item: has at least one Image. Has metadata.
- Image: you want several images if the item has different
sides, logos, flavours...
- Token: for authenticating the requests.
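As a rough illustration, the four models above could be sketched as Python dataclasses. The slides only state the relationships ("at least one token", "at least one Image"); every field name here (`value`, `url`, `name`, `metadata`) is an assumption, not Catchoom's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Token:
    """Authenticates API requests against a collection."""
    value: str  # hypothetical field name


@dataclass
class Image:
    """One reference picture of an item (a side, a logo, a flavour...)."""
    url: str  # hypothetical: where the reference image lives


@dataclass
class Item:
    """A recognizable object: at least one image, plus free-form metadata."""
    name: str
    images: List[Image]
    metadata: Any = None  # anything from JSON to a binary blob

    def __post_init__(self) -> None:
        if not self.images:
            raise ValueError("an Item needs at least one Image")


@dataclass
class Collection:
    """A set of items, accessed with at least one token."""
    name: str
    tokens: List[Token]
    items: List[Item] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.tokens:
            raise ValueError("a Collection needs at least one Token")
```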
22. Components:
- the platform is highly modular
- "Do one thing, and do it well"
- they pass JSON messages
- optimized hardware settings
23. - Frontend:
gets the API request
- Extractor:
extracts the visual points
- Collector:
message exchange
- Searcher:
looks for matches
24. Required NoSQL features:
- key-value storage
- cache
- message lists
- message pub/sub
- real-time analysis
What servers have we chosen?
27. 1) Introduction
2) What did we need?
3) How we built it
4) Advantages of NoSQL
5) Cool uses of NoSQL
6) Limits
7) Conclusion
28. Performance:
- Can't afford writing to disk, or querying slow databases
- With Redis, everything stays in memory
- One Visual Recognition query takes just 300 ms
29. Scalability:
- Need to scale different components, separately
- Load balancing using Redis Lists:
BLPOP: Remove and get
the first element in a list,
or block until one is available
- But focus on the bottlenecks!
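A minimal sketch of the list-based load balancing described above, assuming a redis-py style client (for example `redis.Redis()`) is passed in as `client`. The function names and the queue key are hypothetical; the slides do not show Catchoom's actual code.

```python
import json
from typing import Any, Optional


def enqueue(client: Any, queue: str, message: dict) -> None:
    """Append a JSON message to the tail of the list (RPUSH)."""
    client.rpush(queue, json.dumps(message))


def next_message(client: Any, queue: str, timeout: int = 5) -> Optional[dict]:
    """Block until a message is available (BLPOP) or the timeout expires.

    Several workers can call this on the same list; Redis hands each
    message to exactly one of them, which spreads the load.
    """
    reply = client.blpop(queue, timeout=timeout)
    if reply is None:  # timed out, nothing to do
        return None
    _key, raw = reply  # redis-py returns a (key, value) pair
    return json.loads(raw)
```

To scale one component, start more processes that loop on `next_message` for the same queue key; no extra coordination between them is needed.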
30. Unstructured data: query
- A query object has many optional parameters
- Each component can add/remove fields dynamically
- The schema changes between versions
- It doesn't fit in a SQL table
- We model the query in Redis as a JSON document
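One way to picture this, as a sketch only: the query is kept as a single JSON string under one key, and each component loads it, adds its own fields and writes it back. The `query:` key prefix, the function names and the field names in the usage note are assumptions; `client` is expected to behave like a redis-py client.

```python
import json
from typing import Any


def save_query(client: Any, query_id: str, query: dict) -> None:
    """Store the whole query as one JSON string (SET)."""
    client.set("query:" + query_id, json.dumps(query))


def load_query(client: Any, query_id: str) -> dict:
    """Load a query (GET); returns an empty dict if the key does not exist."""
    raw = client.get("query:" + query_id)
    return json.loads(raw) if raw is not None else {}


def add_fields(client: Any, query_id: str, **fields: Any) -> dict:
    """Let a component attach new fields without any schema change.

    Note: this read-modify-write is not atomic; the sketch assumes that
    only one component works on a given query at a time.
    """
    query = load_query(client, query_id)
    query.update(fields)
    save_query(client, query_id, query)
    return query
```

For instance, an extractor could call `add_fields(client, "q1", keypoints=1200)` and a later component `add_fields(client, "q1", matches=["item42"])`, with no migration in between.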
31. Unstructured data: metadata
- Metadata is optional and unstructured; it can be anything from JSON to a
binary blob
- It doesn't fit in a SQL table, and would be too slow
- Serve the data from Redis, and use S3 as a backup
- Warning: in the future, if we have huge metadata files,
Redis will run out of memory. We'll improve this approach
32. Availability:
- Avoid single points of failure. Replicate everything!
- Replicating a SQL server is painful
- Redis instances configured as Master/Slave
- When the master dies:
- promote a slave to be the new master
- reconfigure the other slaves to use this new master
- Redis Sentinel does this (beta)
33. 1) Introduction
2) What did we need?
3) How we built it
4) Advantages of NoSQL
5) Cool uses of NoSQL
6) Limits
7) Conclusion
34. Do real-time calculations:
- Usage statistics
- total, monthly, daily, hourly
- per image, item or collection
- Metric monitoring for internal use
- response times, queue size, etc
- QoS: enforce rate limiting
- max hits per minute
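The counters above map naturally onto Redis `INCR`. The following is a sketch under stated assumptions, not the talk's implementation: key names, function names and the fixed-window limiting strategy are all hypothetical, and `client` is expected to behave like a redis-py client.

```python
import time
from typing import Any, Optional


def record_hit(client: Any, collection_id: str,
               now: Optional[float] = None) -> None:
    """Increment total, monthly, daily and hourly counters for a collection."""
    t = time.gmtime(time.time() if now is None else now)
    base = "hits:" + collection_id
    for suffix in ("total",
                   time.strftime("%Y-%m", t),
                   time.strftime("%Y-%m-%d", t),
                   time.strftime("%Y-%m-%dT%H", t)):
        client.incr(base + ":" + suffix)


def allow_request(client: Any, token: str, max_per_minute: int,
                  now: Optional[float] = None) -> bool:
    """Fixed-window rate limit: at most max_per_minute hits per clock minute."""
    minute = int((time.time() if now is None else now) // 60)
    key = "rate:%s:%d" % (token, minute)
    count = client.incr(key)     # atomic; creates the key at 1 if missing
    if count == 1:
        client.expire(key, 120)  # let old windows disappear on their own
    return count <= max_per_minute
```

A fixed window is the simplest choice; it allows short bursts across a minute boundary, which may or may not be acceptable for a given QoS policy.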
35. Sorted Sets:
- To create indexes and filters
- For example, "Most recognized images" (sorted by hits)
- Updating the Sorted Set needs no reconsolidation:
ZADD Add one or more members to a sorted set,
or update its score if it already exists
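A possible shape for the "most recognized images" index, as a sketch. It uses `ZINCRBY` (a sibling of the `ZADD` command quoted above that adds to the existing score) and `ZREVRANGE`, with the argument order of redis-py 3.0 or later. The key name and function names are assumptions.

```python
from typing import Any, List, Tuple


def count_recognition(client: Any, image_id: str) -> None:
    """Add one hit to the image's score (the member is created if missing)."""
    client.zincrby("images:by_hits", 1, image_id)


def most_recognized(client: Any, limit: int = 10) -> List[Tuple[Any, float]]:
    """Return the top images as (member, score) pairs, highest score first."""
    return client.zrevrange("images:by_hits", 0, limit - 1, withscores=True)
```

Because Redis keeps the set ordered on every update, reading the ranking is a single range query with no separate aggregation step.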
36. Cache:
- Redis can stand in for memcached as a cache
- Cache everything:
- Sessions, metadata, etc
- ...although the website is internal: no bottleneck here
- Better focus on optimizing other stuff!
37. Volatile data:
- Redis can set an expiration time for a value
- Very easy for:
- implementing timeouts
- removing old queries
- adding temporary capping
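Removing old queries, for example, needs no cleanup job if each one is written with a time to live. A minimal sketch, assuming a redis-py style `client`; the key prefix, function names and the one-hour default are hypothetical.

```python
import json
from typing import Any, Optional


def store_temporary_query(client: Any, query_id: str, query: dict,
                          ttl_seconds: int = 3600) -> None:
    """Write the query with an expiration; Redis deletes it after the TTL."""
    client.set("query:" + query_id, json.dumps(query), ex=ttl_seconds)


def fetch_query(client: Any, query_id: str) -> Optional[dict]:
    """Return the query, or None if it has expired or never existed."""
    raw = client.get("query:" + query_id)
    return None if raw is None else json.loads(raw)
```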
38. Messages:
- Redis implements pub/sub and lists.
- Publish/Subscribe to a channel
- all components get the message
- use it for monitoring
- List: push/pop messages
- only one component gets the message
- use the blocking versions for load balancing
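The publish/subscribe half of this slide could look roughly as follows with a redis-py style `client`. The `monitoring` channel name, the message layout and both function names are assumptions made for illustration.

```python
import json
from typing import Any, List


def publish_metric(client: Any, name: str, value: float) -> int:
    """PUBLISH a metric; every current subscriber of the channel receives it.

    Returns the number of subscribers reached, as reported by Redis.
    """
    payload = json.dumps({"metric": name, "value": value})
    return client.publish("monitoring", payload)


def read_metrics(client: Any, max_messages: int) -> List[dict]:
    """Subscribe to the channel and collect the next max_messages metrics."""
    pubsub = client.pubsub()
    pubsub.subscribe("monitoring")
    collected: List[dict] = []
    for message in pubsub.listen():
        if message["type"] != "message":  # skip subscribe confirmations
            continue
        collected.append(json.loads(message["data"]))
        if len(collected) >= max_messages:
            break
    pubsub.close()
    return collected
```

Unlike a list, a channel does not store anything: a monitor that subscribes late simply misses earlier messages, which is acceptable for monitoring but not for work that must be processed exactly once.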
39. 1) Introduction
2) What did we need?
3) How we built it
4) Advantages of NoSQL
5) Cool uses of NoSQL
6) Limits
7) Conclusion
40. Django apps compatibility:
- we use Django and several contrib and external apps.
- ("Standing on the shoulders of giants")
- but there is no support for NoSQL in the Django ORM
- dropping SQL is not an option!
- we use MySQL, with South migrations.
41. 1) Introduction
2) What did we need?
3) How we built it
4) Advantages of NoSQL
5) Cool uses of NoSQL
6) Limits
7) Conclusion
42. Summary:
- We use a combination of SQL and NoSQL
- Using NoSQL was necessary to meet the requirements
- There are a lot of different uses for NoSQL
43. Recommendations:
- There is no silver bullet
- Use the best tool for each task
- But avoid unneeded complexity!
- Try Redis. Don't do a migration, just add it to your stack
44. Thanks for attending!
- Our beta will be ready soon.
Get a free trial at http://catchoom.com
- Contact me at
david.arcos@catchoom.com
- Questions?
Editor's notes
Looks easy?
(timestamps, the image index, debug info...)
Efficiency
Totals, per month, per day, per image, per item, per collection
Response times, queue size
Redis can stand in for memcached as a cache
Avoid hitting the db