Toward A High-Performance JSON Protocol: Notes
JS.Conf, May 3rd, 2011
Presented By: Daniel Austin
Yahoo! Exceptional Performance
V 0.9
AGENDA
         1   Introduction: Starting From Scratch
         2   Protocol Design
         3   Results & Current State
         4   Where Do We Go From Here?
Exceptional Performance: What we do…
Create great Tools for users to optimize their pages, like
 YSlow.

Optimize the User Experience for Yahoo! users every
 day

Research on how to make the Web smarter and faster
Goals for Today’s Talk

               Explain the goals and design for SCRATCH, and why
                we are excited about using JSON to make the Web
                faster and smarter

               Describe our Experiments and what we’ve learned
                about protocol design, and where we are thinking of
                going next

               Request Feedback from our colleagues for ideas and
                improvements!



Starting From SCRATCH

 “We wanted to design a super-fast data
 protocol that would let us prioritize content and
 manage context while still working at
 scale…initially we ended up more or less
 re-designing TCP…
 …then we tore it up and started all over
 again…that’s why we called it SCRATCH”
Elevator Pitch



    SCRATCH is a new dual-band
    data protocol for the Web.
    It’s designed to work together
    with HTTP/TCP as a control
    channel and to use
    SCRATCH/UDP as its data
    channel.

Goals for Scratch Data Channel [Work in Progress!]

• Fast
    Bandwidth efficiency up by 2x to 50%

• Smart ‘semantic awareness’
   Managed contexts for state, identity, etc.
    as first-class objects in the system

• Robust but lightweight
    To target slow Networks, mobile and tablet
    devices, low-bandwidth IoT chatter…
Distribution of Web Objects By Size & TCP Efficiency


TCP & Bandwidth Efficiency
• Slow for small objects
• Parallelism not uniform
• No context = redundancy
• Trades performance for reliability
• Not designed for small incremental changes
• Typically


W. Shi et al. / J. Parallel Distrib. Comput. 63 (2003) 963–980
Fellow Travellers



    RakNet, EXI, Protocol Buffers, Argot/XPL, AVRO, Scratch, SPDY,
    YQL, Thrift, SCTP, RTP/RTCP
Why UDP?

•   Need for Speed
•   Need more flexible, multipoint architectures
•   Small messages, transient data
•   Consistent ordering not required
•   Use resend-don’t-retransmit strategy
•   Already a significant amount of prior art
•   Simple as possible (but no simpler)
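
The slides do not define "resend-don't-retransmit". One possible reading is that when an update is lost, the sender does not replay the stale datagram but sends the object's current value instead. The sketch below illustrates that reading only; the class, its method names, and the message fields are hypothetical and are not part of SCRATCH.

```python
class LatestStateSender:
    """Tracks only the newest value per key (hypothetical illustration)."""

    def __init__(self):
        self.state = {}    # key -> latest value
        self.version = {}  # key -> version of the latest value
        self.acked = {}    # key -> highest version the receiver confirmed

    def update(self, key, value):
        """Record a new value and return the message to send for it."""
        self.state[key] = value
        self.version[key] = self.version.get(key, 0) + 1
        return self._message(key)

    def ack(self, key, version):
        """Note that the receiver has seen `version` of `key`."""
        if version > self.acked.get(key, 0):
            self.acked[key] = version

    def pending(self):
        """Messages to resend: the current value of every unconfirmed key."""
        return [
            self._message(key)
            for key in sorted(self.state)
            if self.acked.get(key, 0) < self.version[key]
        ]

    def _message(self, key):
        return {"key": key, "v": self.version[key], "value": self.state[key]}


sender = LatestStateSender()
sender.update("headline", "first draft")
latest = sender.update("headline", "final")
to_resend = sender.pending()  # only the newest version is ever resent
```

Under this reading, a burst of updates to one key costs at most one resend, which suits the small, transient messages listed above.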
The UDT Library


- Originally developed at UIUC
- Winner of multiple Supercomputing Challenge
   awards
- Provides full encapsulation, connection
   management, congestion control hooks
- 3rd generation code/design choice
- Code is robust, well-tested
- API similar to traditional BSD sockets
- Almost too much flexibility!
JSON – The Good Parts



    Scratch Uses JSON as Its Data Layer Format. Why?

-   Easy to encode/decode
-   Available on all platforms (mobile, desktop…)
-   True to Web semantics, human-understandable
-   Compact and lightweight

It makes everything else a whole lot easier…
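
As a rough illustration of "easy to encode/decode" and "compact", here is a minimal sketch using Python's standard `json` module. The `link` object and its field names are invented for the example and are not a SCRATCH message format.

```python
import json


def encode(obj) -> bytes:
    """Serialize to compact UTF-8 JSON (no spaces after ',' or ':')."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def decode(payload: bytes):
    """Parse a UTF-8 JSON payload back into Python objects."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical payload; field names are assumptions for illustration.
link = {"uri": "http://www.yahoo.com/", "title": "Yahoo!", "rel": "home"}

wire = encode(link)
default_size = len(json.dumps(link).encode("utf-8"))
compact_size = len(wire)  # a few bytes smaller than the default encoding
```

The saving from compact separators is small per message, but it is free, and the payload stays readable on the wire.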
Looking at the Stack: UDP+JSON
Learnings from using AVRO/JSON


    Pro
    - Well-managed, current codebase
    - Makes JSON more robust with well-defined types, grammar
    - Self-contained schemas-as-metadata
    - Hooks for SASL, lexical sorting

    Con
    - Code complexity, long learning curve
    - Very RPC-centric (not bad, but not what we wanted)
    - Not many cons!

    Sample schema:
    [{
      "type" : "record",
      "name" : "Cookie",
      "fields" : [ {
        "name" : "Name",
        "type" : "string"
      }, {
        "name" : "Value",
        "type" : "string"} ] …
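
To show what "well-defined types" buys, the sketch below checks a JSON object against the Cookie record from this slide. It is a hand-rolled checker written for illustration, not the Avro library's API, and it assumes the record has only the two string fields visible on the slide (the slide's schema is truncated, so the real one may contain more).

```python
import json

# The Cookie record as far as the slide shows it; anything past the
# slide's "…" is unknown and deliberately left out.
COOKIE_SCHEMA = json.loads("""
{"type": "record", "name": "Cookie",
 "fields": [{"name": "Name",  "type": "string"},
            {"name": "Value", "type": "string"}]}
""")

# Only the primitive types this sketch understands (an assumption).
PRIMITIVES = {"string": str, "boolean": bool}


def conforms(record, schema) -> bool:
    """Return True if `record` has exactly the schema's fields, correctly typed."""
    if schema.get("type") != "record" or not isinstance(record, dict):
        return False
    field_names = {field["name"] for field in schema["fields"]}
    if set(record) != field_names:
        return False
    for field in schema["fields"]:
        expected = PRIMITIVES.get(field["type"])
        if expected is None or not isinstance(record[field["name"]], expected):
            return False
    return True


ok = conforms({"Name": "session", "Value": "abc123"}, COOKIE_SCHEMA)
missing_field = conforms({"Name": "session"}, COOKIE_SCHEMA)
wrong_type = conforms({"Name": "session", "Value": 42}, COOKIE_SCHEMA)
```

Because the schema is itself JSON, it can travel with the data as metadata, which is the "self-contained" point in the Pro list.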
Scratchpad Performance – 1st Pass Results


    Test Setup
    - 5 AWS global locations: US-, US-W, AP-S, AP-T, EU
    - Circular buffer test: 1000 ‘Linkdef’ objects (1470 bytes padded)
    - Also tested 35k text buffer (size of Yahoo! Front Page base HTML)

    Results
                            SCRATCH (ms)   HTTP/TCP (ms)   dropped %
    Update 1                         338            2240        0.11
    Update all                      1281             N/A        0.11
    Send base file (35k)             217             675         N/A
    Compress & Send                  114             480         N/A

    [Chart: SCRATCH vs. HTTP/TCP, response time (ms), bars for HTTP/TCP and Scratch/UDP, axis 0 to 800]



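
For a quick sense of scale, the sketch below computes the HTTP/TCP-to-SCRATCH response-time ratio for each row of the results. It assumes the table reads as four rows (Update 1, Update all, Send base file, Compress & Send), with no HTTP/TCP figure for "Update all"; the timings are copied from the slide, and the helper name is made up for the example.

```python
# (SCRATCH ms, HTTP/TCP ms) per test, as read from the slide.
# None marks the cell the slide gives as N/A.
RESULTS_MS = {
    "Update 1": (338, 2240),
    "Update all": (1281, None),
    "Send base file (35k)": (217, 675),
    "Compress & Send": (114, 480),
}


def speedup(scratch_ms, http_ms):
    """How many times faster SCRATCH was, or None when there is no baseline."""
    if http_ms is None:
        return None
    return round(http_ms / scratch_ms, 1)


ratios = {name: speedup(*timings) for name, timings in RESULTS_MS.items()}

for name, ratio in ratios.items():
    label = "n/a" if ratio is None else f"{ratio}x"
    print(f"{name:<22} {label}")
```

These are first-pass numbers from a small test, so the ratios are an indication rather than a benchmark.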
Is SCRATCH Network-friendly?

      Fewer Packets vs. More Updates
      Throttling based on MTU, RTT
      Metadata as 1st Class Object?
       Well-defined endpoints and
       connection state establishment?
      Handles smaller MTU sizes?
      Nearest-node potential to reduce
       payloads
Where Do We Go From Here?

   When we first started, we were only trying to
   make things go faster…we soon realized that
   to really make the Web go faster, we had to
   make it smarter as well…
Must Haves
Better Semantics
 - Currently only 3 SCRATCH Schemas: Cookie, URI,
  HTTPHeader
Resource Caching Encapsulation
  - Should dynamically update IP of nearest copy
Encryption with SASL/SSL/TLS
  - Difficult to make any type of encryption work over a
  proxy
Native Compression (byte-pair, gzip)
   - Byte-pair cheaper for mobile devices?
Node support
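
On the compression item, a minimal sketch of the gzip option using Python's standard `gzip` module. Byte-pair encoding is not shown. The payload is a made-up list of link objects, chosen to be repetitive the way real JSON payloads often are; it is not a SCRATCH schema.

```python
import gzip
import json


def pack(obj) -> bytes:
    """Compact JSON encoding followed by gzip compression."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def unpack(blob: bytes):
    """Reverse of pack(): gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical payload: 100 similar link objects.
links = [{"uri": f"http://example.com/page/{i}", "rel": "next"} for i in range(100)]

raw_size = len(json.dumps(links, separators=(",", ":")).encode("utf-8"))
packed = pack(links)
packed_size = len(packed)

print(f"raw {raw_size} bytes, gzip {packed_size} bytes")
```

Repeated keys and URL prefixes compress well, which is why JSON tends to benefit. Whether byte-pair is cheaper on mobile devices, as the slide asks, would need measuring on the target hardware.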
Future Research

Improving Hypertext
  Use SCRATCH to make links self-aware and
  self-healing, multi-home and context-aware
Peer Caching
  Use SCRATCH to update the browser cache
  incrementally in a stateful way
Merging with the Internet of Things
  Everyday objects emitting SCRATCH objects
  and joining the Web…who knows?
In building, architecture is a noun –
        in business, architecture is a verb.
                              R. Buckminster Fuller




THANK YOU
                                               Questions?
                                             Daniel Austin
                                    daustin@yahoo-inc.com
                                          @daniel_b_austin


Editor's Notes

  1. One of the intended architectures has the user collecting data from multiple data source farms while connected to a single control channel. Scratch uses JSON as its data layer. Today we’ll mostly talk about SCRATCH D
  2. We can achieve some easy wins on bandwidth efficiency up front by not sending duplicate headers with each request. The data channel will negotiate the accept types only once. By managing the user’s cache properly we will reduce content downloads, and with built-in compression we will be able to reach 50% easily. A lot of HTTP data is redundant. The Internet of Things will use something very similar to this when it arrives.
  3. When we looked around we found that a lot of work had already been done in this space and that we were standing on the shoulders of giants… All of these technologies are related to each other in one way or another and are fellow travelers toward a faster, smarter web, albeit at different levels of the stack. In designing SCRATCH, we borrowed liberally from any and all sources of good ideas, including these. I only discovered telehash recently, so it didn’t make it onto the slide, but they are doing something very interesting as well.
  4. This is closer to the old OSI model than the newer ‘TCP/IP’ model that has no session or presentation layer as such. It’s not TCP.
  5. Scratchpad is a simple client/server setup. These results are very preliminary: we are only testing delivery and per-node synchronization limits here, but the results are promising.
  6. Thanks to Paul Querna we’re now getting some UDP support in Node; it’ll be necessary to do considerably more work before this is ready.
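
Note 2 attributes part of the bandwidth gain to sending headers once rather than with every request. The sketch below works through that arithmetic. All sizes and counts are invented for illustration and are not measurements from SCRATCH; the function names are hypothetical.

```python
def bytes_on_wire(requests, body_bytes, header_bytes, headers_once):
    """Total bytes sent for `requests` messages of `body_bytes` each.

    headers_once=True models negotiating headers a single time per session;
    headers_once=False models repeating them on every request.
    """
    header_total = header_bytes if headers_once else header_bytes * requests
    return requests * body_bytes + header_total


def saving(requests, body_bytes, header_bytes):
    """Fraction of bytes saved by sending headers once instead of per request."""
    repeated = bytes_on_wire(requests, body_bytes, header_bytes, headers_once=False)
    once = bytes_on_wire(requests, body_bytes, header_bytes, headers_once=True)
    return 1 - once / repeated


# Made-up workload: 20 small updates with 700 bytes of headers each.
repeated_total = bytes_on_wire(20, 1470, 700, headers_once=False)
once_total = bytes_on_wire(20, 1470, 700, headers_once=True)
fraction_saved = saving(20, 1470, 700)
```

The saving grows as bodies get smaller relative to headers, which matches the focus on small objects and incremental updates in the talk.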