This is my presentation from JSConf 2011. I am proposing a new Web protocol to improve performance across the Internet. It's based on a dual-band protocol layered over TCP/IP and UDP and is backward compatible with existing HTTP-based systems.
Notes on a High-Performance JSON Protocol
1. Toward A High-Performance JSON Protocol: Notes
JS.Conf, May 3rd, 2011, V 0.9
Presented By: Daniel Austin, Yahoo! Exceptional Performance
2. AGENDA
1 Introduction: Starting From Scratch
2 Protocol Design
3 Results & Current State
4 Where Do We Go From Here?
3. Exceptional Performance: What we do…
Create great Tools for users to optimize their pages, like YSlow.
Optimize the User Experience for Yahoo! users every day
Research on how to make the Web smarter and faster
4. Goals for Today’s Talk
Explain the goals and design for SCRATCH, and why we are excited about using JSON to make the Web faster and smarter
Describe our Experiments and what we’ve learned about protocol design, and where we are thinking of going next
Request Feedback from our colleagues for ideas and improvements!
5. Starting From SCRATCH
“We wanted to design a super-fast data protocol that would let us prioritize content and manage context while still working at scale…initially we ended up more or less redesigning TCP…
…then we tore it up and started all over again…that’s why we called it SCRATCH”
6. Elevator Pitch
SCRATCH is a new dual-band data protocol for the Web.
It’s designed to work together with HTTP/TCP as a control channel and use SCRATCH/UDP as its data channel.
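As a rough illustration of the data-channel half of this idea, the sketch below sends one JSON-encoded message as a single UDP datagram over loopback and decodes it on the other side. The HTTP/TCP control channel is not shown. The message fields (`type`, `seq`, `body`) and the helper names are assumptions made for this example and are not taken from SCRATCH itself.

```python
import json
import socket


def send_json_datagram(sock, addr, message):
    """Encode a dict as compact JSON and send it as one UDP datagram."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    sock.sendto(payload, addr)
    return len(payload)


def recv_json_datagram(sock, bufsize=2048):
    """Receive one datagram and decode it back into a dict."""
    payload, _sender = sock.recvfrom(bufsize)
    return json.loads(payload.decode("utf-8"))


def demo():
    # Receiver side of the data channel: bind to an ephemeral loopback port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Hypothetical message shape, for illustration only.
        message = {"type": "update", "seq": 1, "body": {"href": "/index.html"}}
        send_json_datagram(sender, receiver.getsockname(), message)
        return recv_json_datagram(receiver)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo())
```

Nothing here guarantees delivery or ordering; on a real network the receiver would have to tolerate missing datagrams.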
7. Goals for Scratch Data Channel [Work in Progress!]
• Fast
Bandwidth efficiency up 2x, to 50%
• Smart ‘semantic awareness’
Managed contexts for state, identity, etc. as first-class objects in the system
• Robust but lightweight
To target slow networks, mobile and tablet devices, low-bandwidth IoT chatter…
8. AGENDA
1 Introduction: Starting From Scratch
2 Protocol Design
3 Results & Current State
4 Where Do We Go From Here?
9. Distribution of Web Objects By Size & TCP Efficiency
TCP & Bandwidth Efficiency
• Slow for small objects
• Parallelism not uniform
• No context = redundancy
• Trades performance for reliability
• Not designed for small incremental changes
• Typically
W. Shi et al. / J. Parallel Distrib. Comput. 63 (2003) 963–980
11. Why UDP?
• Need for Speed
• Need more flexible, multipoint architectures
• Small messages, transient data
• Consistent ordering not required
• Use resend-don’t-retransmit strategy
• Already a significant amount of prior art
• Simple as possible (but no simpler)
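One possible reading of "resend-don't-retransmit" is that the sender never keeps old packets around: when a loss is reported, it sends the current value rather than a copy of the lost datagram, and the receiver discards anything older than what it already holds. The sketch below models that reading in memory, with no networking. The class names and the per-key version counter are assumptions for illustration, not the SCRATCH design.

```python
class LatestStateSender:
    """Keeps only the newest value per key; old packets are never stored."""

    def __init__(self):
        self._state = {}   # key -> (version, value)
        self._version = 0

    def update(self, key, value):
        """Record a new value and return the message that would be sent."""
        self._version += 1
        self._state[key] = (self._version, value)
        return {"key": key, "version": self._version, "value": value}

    def resend(self, key):
        """Answer a loss report with the current value, not the lost packet."""
        version, value = self._state[key]
        return {"key": key, "version": version, "value": value}


class Receiver:
    """Applies messages in any order, ignoring ones older than what it has."""

    def __init__(self):
        self.state = {}  # key -> (version, value)

    def apply(self, msg):
        current = self.state.get(msg["key"])
        if current is None or msg["version"] > current[0]:
            self.state[msg["key"]] = (msg["version"], msg["value"])
            return True
        return False


def demo():
    sender, receiver = LatestStateSender(), Receiver()
    first = sender.update("headline", "v1")   # pretend this datagram is lost
    sender.update("headline", "v2")           # and this one too
    receiver.apply(sender.resend("headline")) # loss reported: send current state
    stale_applied = receiver.apply(first)     # a late copy of v1 is ignored
    return receiver.state, stale_applied
```

Because only the latest state matters, consistent ordering is not needed: a late or duplicated datagram is simply dropped by the version check.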
12. The UDT Library
- Originally developed at UIUC
- Winner of multiple Supercomputing Challenge awards
- Provides full encapsulation, connection management, congestion control hooks
- 3rd generation code/design choice
- Code is robust, well-tested
- API similar to traditional BSD sockets
- Almost too much flexibility!
13. JSON – The Good Parts
Scratch Uses JSON as Its Data Layer Format. Why?
- Easy to encode/decode
- Available on all platforms (mobile, desktop…)
- True to Web semantics, human-understandable
- Compact and lightweight
It makes everything else a whole lot easier…
14. AGENDA
1 Intro: Starting From Scratch
2 Protocol Design
3 Results & Current State
4 Where Do We Go From Here?
16. Learnings from using AVRO/JSON
Pro
- Well-managed, current codebase
- Makes JSON more robust with well-defined types, grammar
- Self-contained schemas-as-metadata
- Hooks for SASL, lexical sorting
Con
- Code complexity, long learning curve
- Very RPC-centric (not bad but not what we wanted)
- Not many cons!
Example schema:
[{
  "type" : "record",
  "name" : "Cookie",
  "fields" : [ {
    "name" : "Name",
    "type" : "string"
  }, {
    "name" : "Value",
    "type" : "string"} ] …
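To show what "well-defined types" buys in practice, here is a minimal sketch that checks a JSON record against the visible part of the Cookie schema from the slide. It is deliberately not the Avro library: `conforms` is a hypothetical helper that understands only `record` schemas with `string` fields, and the schema on the slide is truncated, so only the two fields shown there are used.

```python
# The part of the Cookie schema that is visible on the slide.
COOKIE_SCHEMA = {
    "type": "record",
    "name": "Cookie",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "Value", "type": "string"},
    ],
}

# Only the one primitive type used on the slide is supported in this sketch.
_PRIMITIVES = {"string": str}


def conforms(record, schema):
    """Return True if `record` has exactly the schema's fields with matching types."""
    if schema.get("type") != "record" or not isinstance(record, dict):
        return False
    expected = {field["name"]: field["type"] for field in schema["fields"]}
    if set(record) != set(expected):
        return False
    for name, type_name in expected.items():
        if type_name not in _PRIMITIVES:
            raise ValueError("unsupported type in this sketch: " + type_name)
        if not isinstance(record[name], _PRIMITIVES[type_name]):
            return False
    return True
```

A receiver could run a check like this on each decoded message and drop anything that does not match before it reaches application code.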
17. Scratchpad Performance – 1st Pass Results
Test Setup
- 5 AWS global locations: US-, US-W, AP-S, AP-T, EU
- Circular buffer test: 1000 ‘Linkdef’ objects (1470 bytes padded)
- Also tested 35k text buffer (size of Yahoo! Front Page base HTML)
Results
                       SCRATCH (ms)   HTTP/TCP (ms)   dropped %
Update 1               338            2240            0.11
Update all             1281           N/A             0.11
Send base file (35k)   217            675             N/A
Compress & Send        114            480             N/A
[Chart: SCRATCH vs. HTTP/TCP, Response Time (ms); bars for HTTP/TCP, SCRATCH, Scratch/UDP]
18. Is SCRATCH Network-friendly?
Fewer Packets vs. More Updates
Throttling based on MTU, RTT
Metadata as 1st Class Object?
Well-defined endpoints and connection state establishment?
Handles smaller MTU sizes?
Nearest-node potential to reduce payloads
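The MTU questions above can be made concrete with a small sketch that splits a payload into pieces that each fit in one IPv4/UDP datagram for a given MTU. The 20-byte IPv4 header (no options) and 8-byte UDP header are standard sizes. The pacing rule, which spreads the chunks evenly over one RTT, is purely a hypothetical stand-in for "throttling based on MTU, RTT", and treating "35k" as 35 * 1024 bytes is also an assumption. A real protocol would need its own per-chunk header for reassembly, which is omitted here.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu):
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU too small for an IPv4/UDP datagram")
    return payload


def chunk_payload(data, mtu):
    """Split `data` into pieces that each fit in one datagram for this MTU."""
    size = max_udp_payload(mtu)
    return [data[i:i + size] for i in range(0, len(data), size)]


def pacing_interval(rtt_seconds, chunk_count):
    """Hypothetical pacing rule: spread the chunks evenly over one RTT."""
    return rtt_seconds / max(chunk_count, 1)


if __name__ == "__main__":
    base_file = b"x" * (35 * 1024)  # stand-in for the 35k base file
    for mtu in (1500, 576):
        chunks = chunk_payload(base_file, mtu)
        print(mtu, len(chunks), pacing_interval(0.1, len(chunks)))
```

A smaller MTU means more datagrams for the same file, which is the trade-off behind "Fewer Packets vs. More Updates".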
19. AGENDA
1 Intro: Starting From Scratch
2 Protocol Design
3 Results & Current State
4 Where Do We Go From Here?
20. Where Do We Go From Here?
When we first started, we were only trying to
make things go faster…we soon realized that
to really make the Web go faster, we had to
make it smarter as well…
21. Must Haves
Better Semantics
- Currently only 3 SCRATCH Schemas: Cookie, URI,
HTTPHeader
Resource Caching Encapsulation
- Should dynamically update IP of nearest copy
Encryption with SASL/SSL/TLS
- Difficult to make any type of encryption work over a
proxy
Native Compression (byte-pair, gzip)
- Byte-pair cheaper for mobile devices?
Node support
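For the compression item, a minimal sketch using gzip from the Python standard library shows the round trip for a JSON payload. Byte-pair compression is not shown. The sample records are invented for illustration, and the actual savings depend entirely on how repetitive the real data is.

```python
import gzip
import json


def pack(records):
    """Serialize records to compact JSON; return (raw bytes, gzipped bytes)."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return raw, gzip.compress(raw)


def unpack(blob):
    """Reverse of pack(): gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def sample_records(count=100):
    # Invented, highly repetitive header-like records (illustration only).
    return [
        {"Name": "Accept-Encoding", "Value": "gzip, deflate", "seq": i}
        for i in range(count)
    ]


if __name__ == "__main__":
    raw, packed = pack(sample_records())
    print(len(raw), len(packed))
```

For very small messages the gzip framing can outweigh the savings, so a sender would likely want to compare sizes before choosing the compressed form.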
22. Future Research
Improving Hypertext
Use SCRATCH to make links self-aware and
self-healing, multi-home and context-aware
Peer Caching
Use SCRATCH to update the browser cache
incrementally in a stateful way
Merging with the Internet of Things
Everyday objects emitting SCRATCH objects
and joining the Web…who knows?
23. In building, architecture is a noun –
in business, architecture is a verb.
R. Buckminster Fuller
THANK YOU
Questions?
Daniel Austin
daustin@yahoo-inc.com
@daniel_b_austin
Editor's Notes
One of the intended architectures has the user collecting data from multiple data source farms while connected to a single control channel. Scratch uses JSON as its data layer. Today we’ll mostly talk about SCRATCH D
We can achieve some easy wins on bandwidth efficiency up front by not sending duplicate headers with each request. The data channel will negotiate the accept types only once. By managing the user’s cache properly we will reduce content downloads, and with built-in compression we will be able to reach 50% easily. A lot of HTTP data is redundant. The Internet of Things will use something very similar to this when it arrives.
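The header argument in this note can be sketched as simple arithmetic, taking bandwidth efficiency to mean payload bytes divided by total bytes sent. That definition, the function names, and the byte counts in the example are all assumptions chosen for illustration; none of them are measurements from the talk.

```python
def efficiency(payload_bytes, overhead_bytes):
    """Fraction of bytes on the wire that are payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


def per_request_headers(requests, payload, headers):
    """Every request repeats the full set of headers."""
    return efficiency(requests * payload, requests * headers)


def headers_sent_once(requests, payload, headers):
    """Headers are negotiated once, then only payload is sent."""
    return efficiency(requests * payload, headers)


if __name__ == "__main__":
    # Hypothetical sizes: 200-byte payloads, 600 bytes of headers, 10 requests.
    print(per_request_headers(10, 200, 600))
    print(headers_sent_once(10, 200, 600))
```

With these made-up sizes, repeating headers leaves a quarter of the bytes as payload, while sending them once raises that to roughly three quarters; the gain shrinks as payloads grow relative to headers.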
When we looked around we found that a lot of work had already been done in this space and that we were standing on the shoulders of giants… All of these technologies are related to each other in one way or another and are fellow travelers toward a faster, smarter web, albeit at different levels of the stack. In designing SCRATCH, we borrowed liberally from any and all sources of good ideas, including these. I only discovered telehash recently, so they didn’t make it onto the slide, but they are doing something very interesting as well.
This is closer to the old OSI model than the newer ‘TCP/IP’ model that has no session or presentation layer as such. It’s not TCP.
Scratchpad is a simple client/server setup. These results are very preliminary: we are only testing delivery and per-node synchronization limits here, but the results are promising.
Thanks to Paul Querna we’re now getting some UDP support in Node; it’ll be necessary to do considerably more work before this is ready.