Using Task Queues and D3.js to build an analytics product on App Engine
1. Using Task Queues and D3.js to build an analytics product on App Engine
Warren Edwards
Founder, Waizee
2. TODAY
How Do We Handle All The Data?!?
Look at the Product
Why App Engine? Why D3.js?
Task Queues in App Engine
Code Samples
A Quiz!
Wrap Up
3. All of the data stored in the world is currently about 4 zettabytes, according to EMC.
At current growth rates, that data will surpass a yottabyte in 2030.
If each bit were represented by a grain of sand, a yottabyte would be about five percent of the mass of the Moon.
Data surpasses human readability.
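The growth claim on this slide can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming decimal units (1 yottabyte = 1000 zettabytes) and assuming the 4-zettabyte figure refers to 2013; the slides do not state the year, so the start year is a guess based on the December 2012 date of the cited report.

```python
# Implied annual growth if stored data grows from 4 ZB to 1 YB by 2030.
# Assumptions (not stated on the slide): start year 2013, decimal units.
START_ZB = 4.0
TARGET_ZB = 1000.0      # 1 yottabyte = 1000 zettabytes
START_YEAR = 2013       # assumed
TARGET_YEAR = 2030

years = TARGET_YEAR - START_YEAR
growth_factor = TARGET_ZB / START_ZB              # overall multiple
annual_rate = growth_factor ** (1.0 / years) - 1.0

print("Overall growth: %.0fx over %d years" % (growth_factor, years))
print("Implied annual growth: %.0f%%" % (annual_rate * 100))
```

Under these assumptions the data volume would have to grow by a bit under 40% every year; a different start year shifts the figure.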
9. Introducing Task Queues (TQ)
                                HTTP Request                      Task Queue
Full access to data store?      YES                               YES
Atomic transactions?            YES                               YES
Write in Python / Java / Go?    YES                               YES
Maximum request lifetime        30-60 sec                         Up to 10 min
Timing of execution             Immediate                         Up to 30 days
Retry                           Can be done manually, no policy   Automatic by policy
Concurrent requests             No set limit                      Limited by policy
Task Queues build on App Engine's HTTP Request but liberate server use from user interaction
10. Appeal of Task Queues
Well suited to number crunching
- Long-lived jobs for running analytics
- Full access to data store and messaging to build on your existing App Engine knowledge
- Save state back to Data Store and call the next task
Task Queues allow you to crunch data "while you wait"
11. Cascade Tasks in Queue for Multipass Processing
[Diagram: USER UPLOADS DATA -> Task #1 -> Task #2 -> . . . -> Task #N -> RESULT. Tasks pass parameters along the chain and save to the Data Store. Legend: Program Flow, Data Flow.]
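The cascade on this slide can be imitated in plain Python to show the shape of the idea: each task does one pass, saves its result, and enqueues the next task with only a small key as its parameter. This is a sketch under stated assumptions, not App Engine code. The `datastore` dict, the `queue` deque, and the `add_task`, `pass_one`, `pass_two`, and `run_queue` names are all hypothetical stand-ins.

```python
from collections import deque

datastore = {}          # stand-in for the Data Store
queue = deque()         # stand-in for a named task queue


def add_task(handler, params):
    """Enqueue a task: a handler plus the small parameters it needs."""
    queue.append((handler, params))


def pass_one(params):
    """First pass: compute a total, save it, then call the next task."""
    key = params['key']
    record = datastore[key]
    record['total'] = sum(record['raw'])
    add_task(pass_two, {'key': key})


def pass_two(params):
    """Second pass: build on the state saved by the first pass."""
    key = params['key']
    record = datastore[key]
    record['mean'] = record['total'] / float(len(record['raw']))


def run_queue():
    """Drain the queue in order, as a worker would."""
    while queue:
        handler, params = queue.popleft()
        handler(params)


# The upload lands in the data store; only its key travels with the task.
datastore['upload-1'] = {'raw': [3, 5, 10]}
add_task(pass_one, {'key': 'upload-1'})
run_queue()
print(datastore['upload-1'])
```

Passing only the key keeps task parameters small; the bulky intermediate state lives in the store between passes.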
13. Let's Look at a Code Sample
    from google.appengine.api import taskqueue
    import webapp2
Add Task into Queue
    taskqueue.add(queue_name='analyze', url='/work1',
                  params={'key': keyID})
Define Task with Retrieval of Parameters
    class WorkerTheFirst(webapp2.RequestHandler):
        def post(self):
            keyID = self.request.get('key')
Associate Task with Handler
    app = webapp2.WSGIApplication([('/', StartPage),
                                   ('/work1', WorkerTheFirst),
                                   ('.*', ErrPage)
                                  ])
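The route list on this slide ends with a catch-all pattern, and its position matters because routes are tried in order and the first match wins. The sketch below mimics that behavior with the standard `re` module; it illustrates the ordering rule and is not webapp2's implementation. The `ROUTES` list uses handler names as plain strings, and `match_route` is a hypothetical helper.

```python
import re

# Handler names as strings, standing in for the classes on the slide.
ROUTES = [
    ('/', 'StartPage'),
    ('/work1', 'WorkerTheFirst'),
    ('.*', 'ErrPage'),          # catch-all, deliberately last
]


def match_route(path, routes=ROUTES):
    """Return the handler for the first pattern matching the whole path."""
    for pattern, handler in routes:
        if re.match('^' + pattern + '$', path):
            return handler
    return None


print(match_route('/work1'))
print(match_route('/missing'))

# With the catch-all first, every path would land on the error page.
bad_order = [ROUTES[2], ROUTES[0], ROUTES[1]]
print(match_route('/work1', bad_order))
```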
14. One Task Calls Another
    class WorkerTheFirst(webapp2.RequestHandler):
        def post(self):
            keyID = self.request.get('key')
            #
            # First pass of work here
            #
            taskqueue.add(queue_name='analyze', url='/work2',
                          params={'key': keyID})

    class WorkerTheSecond(webapp2.RequestHandler):
        def post(self):
            keyID = self.request.get('key')
            #
            # Second pass of work here
            #

    app = webapp2.WSGIApplication([('/', StartPage),
                                   ('/work1', WorkerTheFirst),
                                   ('/work2', WorkerTheSecond),
                                   ('.*', ErrPage)
                                  ])
15. Setting up Task Queues in queue.yaml
    total_storage_limit: 120G   # Max for free apps is 500M
    queue:
    # Queue for analyzing the incoming data
    - name: analyze
      rate: 35/s
      retry_parameters:
        task_retry_limit: 5
        task_age_limit: 2h
    # Queue for user behavior heuristics
    - name: heuristic
      rate: 5/s
    # Queue for doing maintenance on the data store or site
    - name: kickoff
      rate: 5/s
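To make the retry setting in this configuration concrete, here is a simplified model of what a retry limit means: a failed task is run again until it succeeds or the allowed retries are used up. This is a sketch only. It ignores `task_age_limit`, backoff delays, and rate limiting, all of which affect real queues, and `run_with_retries` and `flaky_task` are hypothetical names.

```python
def run_with_retries(task, task_retry_limit):
    """Run task(); on failure, retry up to task_retry_limit more times.

    Returns the number of attempts made when the task succeeds.
    Re-raises the last error once the retries are used up.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return attempts
        except Exception:
            if attempts > task_retry_limit:
                raise


calls = {'n': 0}


def flaky_task():
    """Hypothetical task that fails twice, then succeeds."""
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('transient failure')


print(run_with_retries(flaky_task, task_retry_limit=5))
```

In this model a limit of 5 allows at most six attempts in total: the original run plus five retries.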
16. Core of the D3 Code
    var svg = d3.select('body')
        .append('svg')
        .attr('class', 'circles')
        .attr('width', width)
        .attr('height', height);
    svg.append('g').selectAll('circle')
        .data(data)
        .enter()
        .append('circle')
        // pan allows moving the whole graph slightly
        // to account for long text labels
        .attr('transform', 'translate(' + pan + ', 0)');
    svg.selectAll('circle')
        .attr('cx', function(d) { return x(d.v2); })
        .attr('cy', function(d) { return y(d.v1); })
        .attr('r', dot_out)  // dot_out scales the size
        .attr('fill', function(d) { return d.v4; });
17. Program Writes Its Own Code!
The same D3 code as on slide 16, annotated: axes are set heuristically by server software.
18. Titles Are Set By Heuristic Text Analysis
In Django template
    var title = '{{ pagetitle }}'
    var subtitle = '{{ pagesubtitle }}'
Rendered in JavaScript to the browser
    var title = 'Google+ Rating More Important Metric Than Star Rating'
    var subtitle = 'Survey of Productivity Apps in the Chrome Web Store, Nov 2012'
Labels were pulled heuristically from input - not hard coded!
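Because these titles come from input rather than from hand-written strings, a title containing an apostrophe or a backslash would break a single-quoted JavaScript literal. One possible safeguard, shown here as a sketch and not as what the product does, is to let the server emit the string as a JSON literal. `render_titles` and `PAGE_TEMPLATE` are hypothetical names; Django's own `escapejs` template filter is another route. Text placed inside an inline script block may need further escaping that this sketch does not cover.

```python
import json

# Hypothetical template: the quoting comes from json.dumps, not the template.
PAGE_TEMPLATE = "var title = %(title)s;\nvar subtitle = %(subtitle)s;"


def render_titles(title, subtitle):
    """Emit JavaScript variable declarations with safely quoted strings."""
    return PAGE_TEMPLATE % {
        'title': json.dumps(title),
        'subtitle': json.dumps(subtitle),
    }


print(render_titles(
    "Google+ Rating More Important Metric Than Star Rating",
    "Survey of Productivity Apps in the Chrome Web Store, Nov 2012",
))
```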
22. Choose the Correct Task Queue Call
A   taskqueue.add(queue_name='analyze', url='/work2',
                  param={key: keyID})
B   queue.add(queue='analyze', url='/work2',
              params={'key': keyID})
C   taskqueue.add(queue='analyze', url='/work2',
                  param={'key': keyID})
D   taskqueue.add(queue_name='analyze', url='/work2',
                  params={'key': keyID})
E   taskqueue.add(queue='analyze', url='/work2',
                  params={key: keyID})
Correct Answer is D
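The quiz options differ in three places: the module name, the keyword names, and whether the dictionary key is quoted. The sketch below runs all five against a hypothetical stub that accepts only the keyword names used in answer D. The stub is not the real `taskqueue.add` and does not reproduce its behavior; it only shows which spellings Python itself would reject under that assumption.

```python
import types


def add(queue_name=None, url=None, params=None):
    """Hypothetical stub: accepts only the keyword names used in answer D."""
    return {'queue_name': queue_name, 'url': url, 'params': params}


# Stand-in for the real module, so that the name `taskqueue` resolves.
taskqueue = types.SimpleNamespace(add=add)
keyID = 'abc123'  # example value


def outcome(call):
    """Run a zero-argument callable and report how it ended."""
    try:
        call()
        return 'ok'
    except (TypeError, NameError) as err:
        return type(err).__name__


options = {
    'A': lambda: taskqueue.add(queue_name='analyze', url='/work2',
                               param={key: keyID}),
    'B': lambda: queue.add(queue='analyze', url='/work2',
                           params={'key': keyID}),
    'C': lambda: taskqueue.add(queue='analyze', url='/work2',
                               param={'key': keyID}),
    'D': lambda: taskqueue.add(queue_name='analyze', url='/work2',
                               params={'key': keyID}),
    'E': lambda: taskqueue.add(queue='analyze', url='/work2',
                               params={key: keyID}),
}

for letter in sorted(options):
    print(letter, outcome(options[letter]))
```

The unquoted `key` in A and E is an undefined name, `queue` in B is not the imported module, and C uses keyword names the stub does not accept.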
23. TQ Open a World of Possibilities
You can send tasks to different versions of your app
- Automated test of new version of app before Go Live
You can access the queue's usage data
- Your app can monitor its own consumption of tasks through QueueStatistics class
Task Queue + Crowdsource = ???
- Software application instructing humans!
24. Task Queues allow "while you wait" processing
- Allow server tasks to run autonomously
- Cascade tasks for multistep processing
- Flexible functionality to create great products
Task Queues provide a great tool for automating the understanding of data
D3.js offers a flexible, stable tool for visualizing data
- Works nicely with automated scripting
- Lush visualizations but not pre-packaged
- Leverage huge traction in San Francisco
D3.js is the best platform for visualization using automated processing of data
25. Questions?
Do you have a passion for analytics? Let's talk!
warren@waizee.com
@campbellwarren
26. We are your number cruncher in
the cloud that understands your
data and shows you only what is
most important.
27. Data Sources
The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. IDC, sponsored by EMC. December 2012
Diameter and volume of the Moon: Wolfram Alpha
Yottabyte representation: Waizee calculations
2013 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms