Growing too quickly may sound like a nice problem to have, unless you are the one having it. A growing business can’t afford not to keep up with customer demand and availability. Don’t be left behind. Come learn how start-ups Chute and Euclid kept up with real-time user-generated data from over 3,000 apps and 2 TB of metadata and stayed ahead of retail peak-time traffic, all with AWS. Hear how they used all that data on their own growth to propel their business even further and deepen relationships with customers. Not planning for growth is just like not planning to grow!
4. How Euclid Works
We use Wi-Fi technology to turn in-store behavior into actionable insights
• Shopper carrying smartphone walks by or into store
• Wi-Fi AP detects smartphone MAC addresses (XX:XX:XX:XX:XX:XX)
• Euclid analyzes data for trends and insights
• Insights on customer acquisition, engagement and retention
5. Market Leader in Real World Analytics
• First to develop proprietary Wi-Fi based analytics
– Most advanced data analytics capabilities and experience in retail environments
– Backed by tier 1 investors: Series A led by NEA, Series B led by Benchmark Capital
• World-class executive team
– Co-founder of Google Analytics, founding team of ShopperTrak
– Executive experience from Google, SAP, Ariba and Tibco
• Experience with the world’s leading retailers
– Specialty retail, QSR, department store, big box, automotive, malls and more
• Largest data scale and rapidly accelerating adoption
– Recording >5B events per day
– Dataset with >100M unique devices (shoppers)
• Market leadership recognized by:
– Gartner Cool Vendor 2012; Idea Innovation Award Winner: Business Technology 2012
6. Euclid is a Data Company
As of October 2013, the Euclid Network:
• Covers over 600 shopping centers, malls, and street locations
• Processes 50 TB of raw data
• Collects over 30 GB of raw data daily
Acquire Data
• Reliable
• Durable
• Scalable
Process Data
• Efficient
• Flexible
• Scalable
• Versatile
Deliver Data
• Richness
• Sophistication
• Value
7. Euclid’s Challenges
Common Challenges
• Scaling
• Performance
• Cost effectiveness
• Removing the technical barriers to innovation
• “Failing fast”
Unique Challenges
• Recomputing the entire history of Euclid data!
– Need fast results
– Need a lot of computational power, sometimes more than 100x the regular daily compute needs
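A rough way to see where a figure like 100x can come from is to divide the total work in a full-history recompute by the time allowed for it. The sketch below is only arithmetic under an idealized assumption (the work parallelizes perfectly across days); the function names and all numbers are hypothetical and are not Euclid's.

```python
import math


def backfill_capacity_multiple(history_days: int,
                               daily_job_hours: float,
                               deadline_hours: float) -> float:
    """Capacity needed, as a multiple of regular daily capacity, to
    reprocess `history_days` of data within `deadline_hours`.

    Assumes each day of history costs the same as one regular daily run
    and that days can be processed fully in parallel (an idealization).
    """
    total_hours = history_days * daily_job_hours
    return total_hours / deadline_hours


def nodes_needed(regular_nodes: int, multiple: float) -> int:
    """Cluster size for the backfill, rounded up to whole nodes."""
    return math.ceil(regular_nodes * multiple)


# Hypothetical inputs: 600 days of history, a 2-hour daily job,
# and results wanted within 12 hours.
multiple = backfill_capacity_multiple(600, 2.0, 12.0)
backfill_nodes = nodes_needed(4, multiple)
print(multiple, backfill_nodes)
```

A burst of this size is the case where renting capacity for a few hours, rather than owning it, matters most.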
8. Euclid’s Use of AWS
Euclid started with AWS from Day One
- Amazon EC2, Amazon RDS, Amazon EMR, Amazon S3
- AWS Elastic Beanstalk
- Amazon Redshift
Heroku from Amazon Partner Network (APN)
12. Data Acquisition - Principles
• Log to an Amazon EBS volume: high I/O performance
• Keep it as “dumb” as possible: simple means reliable
• Fork data from disk to:
– Amazon S3 for batch processing
– Kafka messaging service for real-time processing
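The principles above can be sketched as a collector that only appends to a local file, plus a separate step that reads new records from disk and hands each one to two sinks. This is a minimal illustration, not Euclid's implementation: the record layout is made up, and the two sinks are plain Python lists standing in for an S3 uploader and a Kafka producer.

```python
import os
import tempfile
from typing import Callable, Iterable, List

Sink = Callable[[str], None]


def append_to_log(path: str, record: str) -> None:
    """Collector side: append one record, with no parsing or enrichment."""
    with open(path, "a", encoding="utf-8", newline="\n") as f:
        f.write(record + "\n")


def fork_from_disk(path: str, sinks: Iterable[Sink], offset: int = 0) -> int:
    """Read records written since `offset` and pass each to every sink.

    Returns the new byte offset so the caller can resume later.
    Only complete (newline-terminated) records are consumed.
    """
    sinks = list(sinks)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete record is present
    for line in data[:end].decode("utf-8").splitlines():
        for sink in sinks:
            sink(line)
    return offset + end


batch_sink: List[str] = []      # stand-in for an S3 upload (batch path)
realtime_sink: List[str] = []   # stand-in for a Kafka producer (real-time path)

log_path = os.path.join(tempfile.mkdtemp(), "events.log")
append_to_log(log_path, "2013-10-01T12:00:00Z ap-1 XX:XX:XX:XX:XX:XX")
append_to_log(log_path, "2013-10-01T12:00:05Z ap-2 XX:XX:XX:XX:XX:XX")
offset = fork_from_disk(log_path, [batch_sink.append, realtime_sink.append])
```

Because the file on disk is the source of truth, either sink can fail and be replayed from a saved offset without involving the collector.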
14. Data Processing - Pipeline
Raw Data → MapReduce (EMR) → two outputs:
• Product: dashboard, insights
• R&D: analytics
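To make the MapReduce stage concrete, here is a small in-memory sketch of the map, shuffle and reduce steps, counting unique devices per store per day. It is plain Python for illustration only; the deck's pipeline runs on Amazon EMR, and the event layout, store names and device IDs below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, str, str]   # (date, store_id, device_id): assumed layout
Key = Tuple[str, str]          # (date, store_id)


def map_phase(events: Iterable[Event]) -> Iterator[Tuple[Key, str]]:
    """Emit one (key, value) pair per raw event."""
    for date, store_id, device_id in events:
        yield (date, store_id), device_id


def shuffle(pairs: Iterable[Tuple[Key, str]]) -> Dict[Key, List[str]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Key, List[str]]) -> Dict[Key, int]:
    """Collapse each group to a count of distinct devices."""
    return {key: len(set(devices)) for key, devices in groups.items()}


raw_events = [
    ("2013-10-01", "store-a", "dev-1"),
    ("2013-10-01", "store-a", "dev-1"),   # repeat sighting, counted once
    ("2013-10-01", "store-a", "dev-2"),
    ("2013-10-01", "store-b", "dev-1"),
    ("2013-10-02", "store-a", "dev-3"),
]
unique_devices = reduce_phase(shuffle(map_phase(raw_events)))
```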
15. Pipeline – Dual Purposes
Two worlds, one platform
• Big Data Engineering – NoSQL
– Pig Latin with Amazon EMR (Java, Python UDFs)
– Workflows (Jenkins), shell scripting
• Analytics, Analysts, Business – SQL
– Excel
– Tableau
– Maybe some Python, etc.
16. Pipeline - Architecture
[Architecture diagram; recoverable labels:]
• Amazon S3, processed with MapReduce: Raw Data, Meta Data, 3rd Party Data, Aggr. Level 1 ... Aggr. Level n, Models, Algorithms
• SQL DB (MySQL, Redshift), queried with SQL: Some Raw Data, Meta Data, 3rd Party Data, Aggr. Level 1 ... Aggr. Level n
• Direct DB Load into the SQL DB
• Outputs: Product (dashboard, insights) via MySQL; Analytics; R&D (Models, Algorithms)
17. SQL: MySQL, Amazon Redshift, both by AWS
• Started with MySQL; Amazon Redshift preview in Jan 2013
• MySQL 1 TB limit vs Amazon Redshift PB scale
• Performance: night and day
– E.g., count distinct over 100M rows: 5 hours in MySQL, 2 minutes in Amazon Redshift
• Amazon Redshift: killer data warehouse
– Low cost
– No DBA!
– Easy integration
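For readers unfamiliar with the query behind the count-distinct comparison, the sketch below shows its general shape. SQLite is used only so the snippet runs anywhere; it is a stand-in, not MySQL or Amazon Redshift, and it says nothing about their relative performance. The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the SQL store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, device_id TEXT)")
conn.executemany(
    "INSERT INTO events (event_date, device_id) VALUES (?, ?)",
    [
        ("2013-10-01", "dev-1"),
        ("2013-10-01", "dev-1"),   # duplicate sighting of the same device
        ("2013-10-01", "dev-2"),
        ("2013-10-02", "dev-3"),
    ],
)

# COUNT(DISTINCT ...) is standard SQL; the same query shape is accepted
# by both MySQL and Amazon Redshift.
rows = conn.execute(
    """
    SELECT event_date, COUNT(DISTINCT device_id) AS unique_devices
    FROM events
    GROUP BY event_date
    ORDER BY event_date
    """
).fetchall()
conn.close()
```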
18. Pipeline - Monitoring
• System monitoring provided by AWS
• Workflow monitoring with Jenkins
– Failure notification
– Dependency management
• Data quality (including acquisition) monitoring
– Also uses Jenkins
– Scripts that check data at various stages
– Each script runs as a job in the Jenkins workflow
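A data quality script of the kind described above might look like the following sketch. It relies on one well-known Jenkins behavior: a build step that exits with a non-zero status is marked as failed, which in turn can trigger notification. The specific checks, thresholds and field names are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def check_volume(row_count: int, baseline: int, tolerance: float = 0.5) -> List[str]:
    """Flag a stage whose row count deviates from the baseline by more than `tolerance`."""
    if baseline <= 0:
        return ["baseline must be positive"]
    deviation = abs(row_count - baseline) / baseline
    if deviation > tolerance:
        return [f"row count {row_count} deviates {deviation:.0%} from baseline {baseline}"]
    return []


def check_required_fields(rows: Sequence[Dict[str, str]],
                          required: Sequence[str]) -> List[str]:
    """Flag rows where any required field is missing or empty."""
    problems = []
    for i, row in enumerate(rows):
        missing = [name for name in required if not row.get(name)]
        if missing:
            problems.append(f"row {i} missing {', '.join(missing)}")
    return problems


def run_checks(rows: Sequence[Dict[str, str]], baseline: int,
               required: Sequence[str]) -> int:
    """Run all checks, print problems, and return a process exit status."""
    problems = check_volume(len(rows), baseline) + check_required_fields(rows, required)
    for problem in problems:
        print("DATA QUALITY:", problem)
    return 1 if problems else 0


fields = ["device_id", "store_id"]
good_rows = [{"device_id": "dev-1", "store_id": "store-a"},
             {"device_id": "dev-2", "store_id": "store-a"}]
bad_rows = [{"device_id": "", "store_id": "store-a"}]

good_status = run_checks(good_rows, baseline=2, required=fields)
bad_status = run_checks(bad_rows, baseline=2, required=fields)
# In a real job the script would end with sys.exit(status) so that
# Jenkins sees the failure.
```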
20. AWS Benefits
• “Apps not Ops”: Euclid does not have or need an Ops team
• Scale up and down on demand
• Pay as we go
• Agile (innovations, time-to-market)
22. Data
● Real time analytics is hard
● Hadoop!
○ Sqoop imports SQL data to HDFS
○ Clojure
○ Scalding (github.com/twitter/scalding)
● Elasticsearch, Logstash
○ parse logs to track activity for customers
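The idea of parsing logs to track per-customer activity can be shown in a few lines. In the deck this job is done by Logstash feeding Elasticsearch; the plain-Python sketch below only illustrates the parsing and counting step. The access-log layout and the `/apps/<app_id>/` URL scheme are assumptions, not Chute's actual format.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the request and status part of a common access-log line.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')
# Hypothetical URL scheme in which the customer's app ID is in the path.
APP_PATH = re.compile(r"^/apps/(?P<app_id>[^/]+)/")


def activity_by_app(lines: Iterable[str]) -> Counter:
    """Count requests per app ID, skipping lines that do not parse."""
    counts: Counter = Counter()
    for line in lines:
        request = REQUEST.search(line)
        if request is None:
            continue
        app = APP_PATH.match(request.group("path"))
        if app is not None:
            counts[app.group("app_id")] += 1
    return counts


sample_lines = [
    '10.0.0.1 - - [01/Oct/2013:12:00:00 +0000] "POST /apps/app-1/media HTTP/1.1" 201 512',
    '10.0.0.2 - - [01/Oct/2013:12:00:01 +0000] "GET /apps/app-2/media HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Oct/2013:12:00:02 +0000] "GET /apps/app-1/media HTTP/1.1" 200 1024',
    'malformed line',
    '10.0.0.3 - - [01/Oct/2013:12:00:03 +0000] "GET /health HTTP/1.1" 200 2',
]
activity = activity_by_app(sample_lines)
```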
24. N number of EC2 instances
[Architecture diagram; recoverable labels:]
● ELB
● plugin front ends: varnish, logstash
● API
● Events Server: nginx, logstash
● Redis cluster
● ElasticSearch
● Kibana
25. Automation through DevOps
● Chute has 100 servers
○ Configured many manually
○ 82? of 100 now managed by Chef
● Whirr
● Sqoop and Cron to automate data import
● Route 53 with Chef for URLs
26. Uptime
● Architect applications to scale horizontally
○ AWS launches servers on demand
○ Spot and Reserved Instance pricing
● Keep services running with Chef
○ Chef makes it easy to wrap programs as a service on AWS
27. Monitoring
● New Relic
○ server resource monitoring
○ application monitoring
● logstash + kibana
○ elasticsearch backend
○ redis (cluster)
○ can monitor server logs