SlideShare ist ein Scribd-Unternehmen logo
1 von 58
Downloaden Sie, um offline zu lesen
Creating a Custom
Serialization Format
Scott Mansfield (@sgmansfield)
Senior Software Engineer
Netflix
What are we doing here?
1. Motivations
2. Queries
3. The Format
4. Performance
5. Future
1. Motivations
"The field is too in love with horribly
inefficient frameworks. Writing network
code and protocols is now considered
too low level for people."
- jnordwick (Hacker News)
Motivations
● Computers make meaning out of voltages
● Serialization is everywhere
○ Network protocols
○ Video encoding
○ Machine code
○ HTTP/2 headers
○ Hard drive communication
○ Video display
● Engineers should know what's inside the black box
Motivations
● JSON is the de facto serialization format
● Common pattern:
1. Get entire document
2. Inflate serialized data
3. Walk data structure & extract
● New pattern:
1. Query the document
2. Get only the data you need
3. Still need to inflate
Motivations
● Query capabilities over JSON documents
● Documents stored as a byte array
JSON Document (Augmented)
{
"null" : null,
"boolean" : true,
"integer" : 1,
"float" : 2.3,
"string" : "a string",
"array" : [4, 5, 6],
"map" : {"foo": 1}
}
2. Queries
Query Types
● Array Index
● Array Slice
● Array Iteration
● Map Access
● Map Keys
● Map Iteration
Array Index
Query: [2]
Result: 3
[1, 2, 3, 4, 5]
↑
Index 2
Array Slice
Query: [2:-1]
Result: [3, 4]
[1, 2, (3, 4), 5]
↑
Index 2 until 4
Array Iteration
Query: .a[] [0]
Result: [1,2,3,4,5]
[[1], [2], [3], [4], [5]]
↑ ↑ ↑ ↑ ↑
Index 0 of each list
Map Access (Single)
Query: .foo
Result: 3
{"foo": 3, "bar": 4}
↑
Key foo
Map Access (Multiple)
Query: .foo|bar
Result: {"foo":3, "bar":4}
{"foo":3, "bar":4, "baz":5}
↑ ↑
Key foo Key bar
Map Keys
Query: keys
Result: ["foo", "bar"]
{"foo": 3, "bar": 4}
↑ ↑
Map Keys
Map Iteration
Query: .m[] [0]
Result: {"foo": 3, "bar": 4}
{"foo": [3], "bar": [4]}
↑ ↑
Index 0 of each array value
Example
{"foo": {"k1": [3,4]},
"bar": {"k1": [5,6]}}
Query: .m[] .k1 [0]
Result: {"foo": 3, "bar": 5}
Example
{"foo": {"1":1, "2":2, "3":3},
"bar": {"4":4, "5":5, "6":6}}
Query: .m[] keys
Result: {"foo": ["1","2","3"],
"bar": ["4","5","6"] }
3. The Format
Types
Augmented JSON == JSON + integers
● Scalars
○ Null
○ Boolean
○ Integer (64 bit)
○ Float (64 bit)
○ String
● Composites
○ Array
○ Map
General Format
Every record starts with a single byte for the type:
int ...
Type Data
1 byte
Scalars
● Null
● Boolean
● Integer (64 bit)
● Float (64 bit)
● String
Scalar: Null
null
Type
1 byte
Scalar: Boolean
bool 1 or 0
Type Data
1 byte 1 byte
Scalar: Integer
int Little endian int64
Type Data
1 byte 8 bytes
Scalar: Integer (example)
4 = 0x0000_0000_0000_0004
int
Type Data
1 byte 8 bytes
04 00 00 00 00 00 00 00
Scalar: Float
float float64 as little endian uint64
Type Data
1 byte 8 bytes
Scalar: Float (example)
4.5 = 0x4012_0000_0000_0000
float
Type Data
1 byte 8 bytes
00 00 00 00 00 00 12 40
Scalar: String
string
Type Length
1 byte 4 bytes
Little endian
uint32
String contents
Data
length bytes
Scalar: String (example)
"Hello, Go!" Length: 10 = 0x0000_000A
string
Type __ Length __
1 byte 4 bytes
_______ Data ________
10 bytes
0A 000000 l l o , G o !H e
Composites
Recursive - contained data are defined by this same format
● Array
● Map
Composite: Array
array
Type Header
1 byte var bytes
array header array entries
Data
var bytes
Composite: Array - Header
numoffsets
var bytes
offlen
uvarint
numoffsets uints
of offlen length
(0,8)
numoffsets × offlen bytes1 byte
offsets
Composite: Array - Header offsets
2 or more offsets
Each offlen bytes
offset offsetoffsetoffsetoffset
Composite: Array - Data
1 or more records
Each var bytes
record recordrecordrecord
Composite: Empty Array
numoffsets
1 byte
uvarint (0)array
Type
1 byte
Composite: Array (example)
[true, false]
array
Type ___ Header ____
1 byte ‾‾‾‾‾‾ 5 bytes ‾‾‾‾‾‾
______ Data _______
‾‾‾‾‾‾‾‾‾ 4 bytes ‾‾‾‾‾‾‾‾‾
0 2 4 bool 1 bool 013
num
off
off
len
____ offsets ____ ___ record 2 ______ record 1 ___
Composite: Array (example, slicing)
[true, false]
array
Type ___ Header ____
1 byte ‾‾‾‾‾‾ 5 bytes ‾‾‾‾‾‾
______ Data _______
‾‾‾‾‾‾‾‾‾ 4 bytes ‾‾‾‾‾‾‾‾‾
0 2 4 bool 1 bool 0123
num
off
off
len
____ offsets ____ ___ record 2 ______ record 1 ___
Composite: Map
map
Type Header
1 byte var bytes
map header map entries
Data
var bytes
Composite: Map - Header
num recs
var bytes
offlen
uvarint
num recs
header records
(0,8)
∝num recs1 byte
header recordslenlen
(0,8)
1 byte
Composite: Map - Header
1 or more header records
Each 4 + offlen + lenlen bytes
record recordrecordrecord
Composite: Map - Data
1 or more records
Each var bytes
record recordrecordrecord
Composite: Map - Header Record
Intern ID
4 bytes
uint32 uintuint
offset length
offlen bytes lenlen bytes
Composite: Map - Interned Keys
● Map keys are assigned a unique uint32 ID
● IDs are shared by identical strings
● Forward and reverse mappings stored next to the data
● Example:
○ "true" → 1
○ "false" → 2
Composite: Map - Header
header records
1 1955217
Composite: Empty Map
num recs
1 byte
uvarint
(0)
map
Type
1 byte
Composite: Map (example)
{"false":false, "true":true}
map
Type _______ Header _______
1 byte ‾‾‾‾‾‾‾‾‾‾‾ 15 bytes ‾‾‾‾‾‾‾‾‾‾
1 1 012
#
rec
______ header records ______
___ Data ___
‾‾‾‾ 4 bytes ‾‾‾‾
bool
1
bool
0
record 2record 1
off
len
len
len __ record 1 __
2 2 2 2
__ record 2 __
"true" → 1
"false" → 2
4. Performance
How fast is it?
It depends
… on:
● How much data you ask for
● How complex the query is
● How many CPU's
● Speed of the underlying data storage
Scalars
Serialize
Type time/op
Null 64.3 ns ± 2%
Boolean 71.6 ns ± 1%
Int 75.7 ns ± 0%
Float 75.4 ns ± 1%
String 88.6 ns ± 1% "foobar"
Deserialize
Type time/op
Null 16.0 ns ± 1%
Boolean 23.9 ns ± 1%
Int 26.6 ns ± 1%
Float 27.1 ns ± 1%
String 70.1 ns ± 1%
Composites: Serialize
Type # elems time/op time/op (ns)
Array 0 115 ns ± 0% 115 ns
Array 1 273 ns ± 1% 273 ns
Array 10 900 ns ± 1% 900 ns
Array 100 5.42 µs ± 1% 5420 ns
Array 1000 43.7 µs ± 1% 43700 ns
Array 10000 453 µs ± 1% 453000 ns
Array 100000 5.35 ms ± 1% 5350000 ns
Array 1000000 54.0 ms ± 3% 54000000 ns
Map 0 87.2 ns ± 1% 87 ns
Map 1 608 ns ± 1% 608 ns
Map 10 3.39 µs ± 1% 3390 ns
Map 100 34.1 µs ± 1% 34100 ns
Map 1000 374 µs ± 0% 374000 ns
Map 10000 4.37 ms ± 1% 4370000 ns
Map 100000 58.7 ms ± 2% 58700000 ns
Map 1000000 866 ms ± 4% 866000000 ns
Composites: Deserialize
Type # elems time/op time/op (ns)
Array 0 136 ns ± 1% 136 ns
Array 1 201 ns ± 0% 201 ns
Array 10 588 ns ± 2% 588 ns
Array 100 4.05 µs ± 3% 4050 ns
Array 1000 38.1 µs ± 1% 38100 ns
Array 10000 380 µs ± 2% 380000 ns
Array 100000 3.81 ms ± 1% 3810000 ns
Array 1000000 39.9 ms ± 2% 39900000 ns
Map 0 158 ns ± 0% 158 ns
Map 1 361 ns ± 0% 361 ns
Map 10 1.97 µs ± 0% 1970 ns
Map 100 21.3 µs ± 0% 21300 ns
Map 1000 261 µs ± 1% 261000 ns
Map 10000 2.67 ms ± 1% 2670000 ns
Map 100000 38.3 ms ± 2% 38300000 ns
Map 1000000 757 ms ± 3% 757000000 ns
Composites: Queries
Type # elems time/op
Array Get 1 25.9 ns ± 7%
Array Get 10 26.4 ns ± 6%
Array Get 100 26.6 ns ± 6%
Array Get 1000 26.3 ns ± 6%
Array Get 10000 26.3 ns ± 8%
Array Get 100000 26.0 ns ± 4%
Array Get 1000000 26.2 ns ± 7%
Map Get 1 35.3 ns ± 1%
Map Get 10 64.7 ns ± 0%
Map Get 100 74.6 ns ± 1%
Map Get 1000 121 ns ± 1%
Map Get 10000 157 ns ± 0%
Map Get 100000 221 ns ± 2%
Map Get 1000000 375 ns ± 1%
Type # elems time/op
Array Slice 1 70.1 ns ± 1%
Array Slice 10 73.9 ns ± 4%
Array Slice 100 73.7 ns ± 3%
Array Slice 1000 73.0 ns ± 2%
Array Slice 10000 73.4 ns ± 3%
Array Slice 100000 75.6 ns ± 3%
Array Slice 1000000 73.4 ns ± 2%
Map Keys 1 662 ns ± 9%
Map Keys 10 2.11 µs ± 8%
Map Keys 100 17.4 µs ± 8%
Map Keys 1000 173 µs ± 8%
Map Keys 10000 2.28 ms ± 4%
Map Keys 100000 35.6 ms ± 5%
Map Keys 1000000 348 ms ± 7%
5. Future
In Progress & Future Work
● Replace simple scalar values
● Append to arrays
● Add new keys to a map
● Other ops (inc, dec, etc)
● Compression
Thank You
@sgmansfield
smansfield@netflix.com
techblog.netflix.com
Creating a Custom Serialization Format for Fast Queries and Small Size

Weitere ähnliche Inhalte

Was ist angesagt?

The Ring programming language version 1.3 book - Part 14 of 88
The Ring programming language version 1.3 book - Part 14 of 88The Ring programming language version 1.3 book - Part 14 of 88
The Ring programming language version 1.3 book - Part 14 of 88Mahmoud Samir Fayed
 
The Ring programming language version 1.4.1 book - Part 6 of 31
The Ring programming language version 1.4.1 book - Part 6 of 31The Ring programming language version 1.4.1 book - Part 6 of 31
The Ring programming language version 1.4.1 book - Part 6 of 31Mahmoud Samir Fayed
 
10. Getting Spatial
10. Getting Spatial10. Getting Spatial
10. Getting SpatialFAO
 
The Ring programming language version 1.5.2 book - Part 22 of 181
The Ring programming language version 1.5.2 book - Part 22 of 181The Ring programming language version 1.5.2 book - Part 22 of 181
The Ring programming language version 1.5.2 book - Part 22 of 181Mahmoud Samir Fayed
 
Generating and Analyzing Events
Generating and Analyzing EventsGenerating and Analyzing Events
Generating and Analyzing Eventsztellman
 
5. R basics
5. R basics5. R basics
5. R basicsFAO
 
Bloom filter
Bloom filterBloom filter
Bloom filterwang ping
 
Opensource gis development - part 3
Opensource gis development - part 3Opensource gis development - part 3
Opensource gis development - part 3Andrea Antonello
 
11. Linear Models
11. Linear Models11. Linear Models
11. Linear ModelsFAO
 
Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?Data Con LA
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Jonathan Katz
 
Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashingDmitriy Selivanov
 
Create a correlation plot from joined tables and lag times
Create a correlation plot from joined tables and lag timesCreate a correlation plot from joined tables and lag times
Create a correlation plot from joined tables and lag timesDougLoqa
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTWsunnygleason
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Spark Summit
 
Procedural Content Generation with Clojure
Procedural Content Generation with ClojureProcedural Content Generation with Clojure
Procedural Content Generation with ClojureMike Anderson
 

Was ist angesagt? (20)

The Ring programming language version 1.3 book - Part 14 of 88
The Ring programming language version 1.3 book - Part 14 of 88The Ring programming language version 1.3 book - Part 14 of 88
The Ring programming language version 1.3 book - Part 14 of 88
 
The Ring programming language version 1.4.1 book - Part 6 of 31
The Ring programming language version 1.4.1 book - Part 6 of 31The Ring programming language version 1.4.1 book - Part 6 of 31
The Ring programming language version 1.4.1 book - Part 6 of 31
 
10. Getting Spatial
10. Getting Spatial10. Getting Spatial
10. Getting Spatial
 
The Ring programming language version 1.5.2 book - Part 22 of 181
The Ring programming language version 1.5.2 book - Part 22 of 181The Ring programming language version 1.5.2 book - Part 22 of 181
The Ring programming language version 1.5.2 book - Part 22 of 181
 
Generating and Analyzing Events
Generating and Analyzing EventsGenerating and Analyzing Events
Generating and Analyzing Events
 
5. R basics
5. R basics5. R basics
5. R basics
 
Bloom filter
Bloom filterBloom filter
Bloom filter
 
Opensource gis development - part 3
Opensource gis development - part 3Opensource gis development - part 3
Opensource gis development - part 3
 
11. Linear Models
11. Linear Models11. Linear Models
11. Linear Models
 
Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)
 
Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashing
 
Create a correlation plot from joined tables and lag times
Create a correlation plot from joined tables and lag timesCreate a correlation plot from joined tables and lag times
Create a correlation plot from joined tables and lag times
 
Scalar lenses-workshop
Scalar lenses-workshopScalar lenses-workshop
Scalar lenses-workshop
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTW
 
Smalltalk
SmalltalkSmalltalk
Smalltalk
 
PART 5: RASTER DATA
PART 5: RASTER DATAPART 5: RASTER DATA
PART 5: RASTER DATA
 
Fs2 - Crash Course
Fs2 - Crash CourseFs2 - Crash Course
Fs2 - Crash Course
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
 
Procedural Content Generation with Clojure
Procedural Content Generation with ClojureProcedural Content Generation with Clojure
Procedural Content Generation with Clojure
 

Ähnlich wie Creating a Custom Serialization Format for Fast Queries and Small Size

Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with RYanchang Zhao
 
01_introduction_lab.pdf
01_introduction_lab.pdf01_introduction_lab.pdf
01_introduction_lab.pdfzehiwot hone
 
Data Structures & Algorithms Coursework Assignment for Sem.docx
Data Structures & Algorithms Coursework Assignment for Sem.docxData Structures & Algorithms Coursework Assignment for Sem.docx
Data Structures & Algorithms Coursework Assignment for Sem.docxsimonithomas47935
 
Kaizen cso002 l1
Kaizen cso002 l1Kaizen cso002 l1
Kaizen cso002 l1asslang
 
Stream-based Data Synchronization
Stream-based Data SynchronizationStream-based Data Synchronization
Stream-based Data SynchronizationKlemen Verdnik
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep DiveAmazon Web Services
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
SQL (Basic to Intermediate Customized 8 Hours)
SQL (Basic to Intermediate Customized 8 Hours)SQL (Basic to Intermediate Customized 8 Hours)
SQL (Basic to Intermediate Customized 8 Hours)Edu4Sure
 
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLModeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLKostis Kyzirakos
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RStacy Irwin
 
Idea for ineractive programming language
Idea for ineractive programming languageIdea for ineractive programming language
Idea for ineractive programming languageLincoln Hannah
 
Oct8 - 131 slid
Oct8 - 131 slidOct8 - 131 slid
Oct8 - 131 slidTak Lee
 
Design and Analysis of Algorithms Lecture Notes
Design and Analysis of Algorithms Lecture NotesDesign and Analysis of Algorithms Lecture Notes
Design and Analysis of Algorithms Lecture NotesSreedhar Chowdam
 
Row Pattern Matching 12c MATCH_RECOGNIZE OOW14
Row Pattern Matching 12c MATCH_RECOGNIZE OOW14Row Pattern Matching 12c MATCH_RECOGNIZE OOW14
Row Pattern Matching 12c MATCH_RECOGNIZE OOW14stewashton
 

Ähnlich wie Creating a Custom Serialization Format for Fast Queries and Small Size (20)

Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Mat lab
Mat labMat lab
Mat lab
 
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
 
01_introduction_lab.pdf
01_introduction_lab.pdf01_introduction_lab.pdf
01_introduction_lab.pdf
 
R programming language
R programming languageR programming language
R programming language
 
Data Structures & Algorithms Coursework Assignment for Sem.docx
Data Structures & Algorithms Coursework Assignment for Sem.docxData Structures & Algorithms Coursework Assignment for Sem.docx
Data Structures & Algorithms Coursework Assignment for Sem.docx
 
Kaizen cso002 l1
Kaizen cso002 l1Kaizen cso002 l1
Kaizen cso002 l1
 
Stream-based Data Synchronization
Stream-based Data SynchronizationStream-based Data Synchronization
Stream-based Data Synchronization
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
SQL (Basic to Intermediate Customized 8 Hours)
SQL (Basic to Intermediate Customized 8 Hours)SQL (Basic to Intermediate Customized 8 Hours)
SQL (Basic to Intermediate Customized 8 Hours)
 
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQLModeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
Modeling and Querying Metadata in the Semantic Sensor Web: stRDF and stSPARQL
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Idea for ineractive programming language
Idea for ineractive programming languageIdea for ineractive programming language
Idea for ineractive programming language
 
Oct8 - 131 slid
Oct8 - 131 slidOct8 - 131 slid
Oct8 - 131 slid
 
1.2 matlab numerical data
1.2  matlab numerical data1.2  matlab numerical data
1.2 matlab numerical data
 
SISAP17
SISAP17SISAP17
SISAP17
 
Saga.lng
Saga.lngSaga.lng
Saga.lng
 
Design and Analysis of Algorithms Lecture Notes
Design and Analysis of Algorithms Lecture NotesDesign and Analysis of Algorithms Lecture Notes
Design and Analysis of Algorithms Lecture Notes
 
Row Pattern Matching 12c MATCH_RECOGNIZE OOW14
Row Pattern Matching 12c MATCH_RECOGNIZE OOW14Row Pattern Matching 12c MATCH_RECOGNIZE OOW14
Row Pattern Matching 12c MATCH_RECOGNIZE OOW14
 

Kürzlich hochgeladen

Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...121011101441
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESNarmatha D
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 

Kürzlich hochgeladen (20)

Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIES
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 

Creating a Custom Serialization Format for Fast Queries and Small Size

  • 1. Creating a Custom Serialization Format Scott Mansfield (@sgmansfield) Senior Software Engineer Netflix
  • 2. What are we doing here? 1. Motivations 2. Queries 3. The Format 4. Performance 5. Future
  • 4. "The field is too in love with horribly inefficient frameworks. Writing network code and protocols is now considered too low level for people." - jnordwick (Hacker News)
  • 5. Motivations ● Computers make meaning out of voltages ● Serialization is everywhere ○ Network protocols ○ Video encoding ○ Machine code ○ HTTP/2 headers ○ Hard drive communication ○ Video display ● Engineers should know what's inside the black box
  • 6. Motivations ● JSON is the de facto serialization format ● Common pattern: 1. Get entire document 2. Inflate serialized data 3. Walk data structure & extract ● New pattern: 1. Query the document 2. Get only the data you need 3. Still need to inflate
  • 7. Motivations ● Query capabilities over JSON documents ● Documents stored as a byte array
  • 8. JSON Document (Augmented) { "null" : null, "boolean" : true, "integer" : 1, "float" : 2.3, "string" : "a string", "array" : [4, 5, 6], "map" : {"foo": 1} }
  • 10. Query Types ● Array Index ● Array Slice ● Array Iteration ● Map Access ● Map Keys ● Map Iteration
  • 11. Array Index Query: [2] Result: 3 [1, 2, 3, 4, 5] ↑ Index 2
  • 12. Array Slice Query: [2:-1] Result: [3, 4] [1, 2, (3, 4), 5] ↑ Index 2 until 4
  • 13. Array Iteration Query: .a[] [0] Result: [1,2,3,4,5] [[1], [2], [3], [4], [5]] ↑ ↑ ↑ ↑ ↑ Index 0 of each list
  • 14. Map Access (Single) Query: .foo Result: 3 {"foo": 3, "bar": 4} ↑ Key foo
  • 15. Map Access (Multiple) Query: .foo|bar Result: {"foo":3, "bar":4} {"foo":3, "bar":4, "baz":5} ↑ ↑ Key foo Key bar
  • 16. Map Keys Query: keys Result: ["foo", "bar"] {"foo": 3, "bar": 4} ↑ ↑ Map Keys
  • 17. Map Iteration Query: .m[] [0] Result: {"foo": 3, "bar": 4} {"foo": [3], "bar": [4]} ↑ ↑ Index 0 of each array value
  • 18. Example {"foo": {"k1": [3,4]}, "bar": {"k1": [5,6]}} Query: .m[] .k1 [0] Result: {"foo": 3, "bar": 5}
  • 19. Example {"foo": {"1":1, "2":2, "3":3}, "bar": {"4":4, "5":5, "6":6}} Query: .m[] keys Result: {"foo": ["1","2","3"], "bar": ["4","5","6"] }
  • 21. Types Augmented JSON == JSON + integers ● Scalars ○ Null ○ Boolean ○ Integer (64 bit) ○ Float (64 bit) ○ String ● Composites ○ Array ○ Map
  • 22. General Format Every record starts with a single byte for the type: int ... Type Data 1 byte
  • 23. Scalars ● Null ● Boolean ● Integer (64 bit) ● Float (64 bit) ● String
  • 25. Scalar: Boolean bool 1 or 0 Type Data 1 byte 1 byte
  • 26. Scalar: Integer int Little endian int64 Type Data 1 byte 8 bytes
  • 27. Scalar: Integer (example) 4 = 0x0000_0000_0000_0004 int Type Data 1 byte 8 bytes 04 00 00 00 00 00 00 00
  • 28. Scalar: Float float float64 as little endian uint64 Type Data 1 byte 8 bytes
  • 29. Scalar: Float (example) 4.5 = 0x4012_0000_0000_0000 float Type Data 1 byte 8 bytes 00 00 00 00 00 00 12 40
  • 30. Scalar: String string Type Length 1 byte 4 bytes Little endian uint32 String contents Data length bytes
  • 31. Scalar: String (example) "Hello, Go!" Length: 10 = 0x0000_000A string Type __ Length __ 1 byte 4 bytes _______ Data ________ 10 bytes 0A 000000 l l o , G o !H e
  • 32. Composites Recursive - contained data are defined by this same format ● Array ● Map
  • 33. Composite: Array array Type Header 1 byte var bytes array header array entries Data var bytes
  • 34. Composite: Array - Header numoffsets var bytes offlen uvarint numoffsets uints of offlen length (0,8) numoffsets × offlen bytes1 byte offsets
  • 35. Composite: Array - Header offsets 2 or more offsets Each offlen bytes offset offsetoffsetoffsetoffset
  • 36. Composite: Array - Data 1 or more records Each var bytes record recordrecordrecord
  • 37. Composite: Empty Array numoffsets 1 byte uvarint (0)array Type 1 byte
  • 38. Composite: Array (example) [true, false] array Type ___ Header ____ 1 byte ‾‾‾‾‾‾ 5 bytes ‾‾‾‾‾‾ ______ Data _______ ‾‾‾‾‾‾‾‾‾ 4 bytes ‾‾‾‾‾‾‾‾‾ 0 2 4 bool 1 bool 013 num off off len ____ offsets ____ ___ record 2 ______ record 1 ___
  • 39. Composite: Array (example, slicing) [true, false] array Type ___ Header ____ 1 byte ‾‾‾‾‾‾ 5 bytes ‾‾‾‾‾‾ ______ Data _______ ‾‾‾‾‾‾‾‾‾ 4 bytes ‾‾‾‾‾‾‾‾‾ 0 2 4 bool 1 bool 0123 num off off len ____ offsets ____ ___ record 2 ______ record 1 ___
  • 40. Composite: Map map Type Header 1 byte var bytes map header map entries Data var bytes
  • 41. Composite: Map - Header num recs var bytes offlen uvarint num recs header records (0,8) ∝num recs1 byte header recordslenlen (0,8) 1 byte
  • 42. Composite: Map - Header 1 or more header records Each 4 + offlen + lenlen bytes record recordrecordrecord
  • 43. Composite: Map - Data 1 or more records Each var bytes record recordrecordrecord
  • 44. Composite: Map - Header Record Intern ID 4 bytes uint32 uintuint offset length offlen bytes lenlen bytes
  • 45. Composite: Map - Interned Keys ● Map keys are assigned a unique uint32 ID ● IDs are shared by identical strings ● Forward and reverse mappings stored next to the data ● Example: ○ "true" → 1 ○ "false" → 2
  • 46. Composite: Map - Header header records 1 1955217
  • 47. Composite: Empty Map num recs 1 byte uvarint (0) map Type 1 byte
  • 48. Composite: Map (example) {"false":false, "true":true} map Type _______ Header _______ 1 byte ‾‾‾‾‾‾‾‾‾‾‾ 15 bytes ‾‾‾‾‾‾‾‾‾‾ 1 1 012 # rec ______ header records ______ ___ Data ___ ‾‾‾‾ 4 bytes ‾‾‾‾ bool 1 bool 0 record 2record 1 off len len len __ record 1 __ 2 2 2 2 __ record 2 __ "true" → 1 "false" → 2
  • 50. How fast is it? It depends … on: ● How much data you ask for ● How complex the query is ● How many CPU's ● Speed of the underlying data storage
  • 51. Scalars Serialize Type time/op Null 64.3 ns ± 2% Boolean 71.6 ns ± 1% Int 75.7 ns ± 0% Float 75.4 ns ± 1% String 88.6 ns ± 1% "foobar" Deserialize Type time/op Null 16.0 ns ± 1% Boolean 23.9 ns ± 1% Int 26.6 ns ± 1% Float 27.1 ns ± 1% String 70.1 ns ± 1%
  • 52. Composites: Serialize Type # elems time/op time/op (ns) Array 0 115 ns ± 0% 115 ns Array 1 273 ns ± 1% 273 ns Array 10 900 ns ± 1% 900 ns Array 100 5.42 µs ± 1% 5420 ns Array 1000 43.7 µs ± 1% 43700 ns Array 10000 453 µs ± 1% 453000 ns Array 100000 5.35 ms ± 1% 5350000 ns Array 1000000 54.0 ms ± 3% 54000000 ns Map 0 87.2 ns ± 1% 87 ns Map 1 608 ns ± 1% 608 ns Map 10 3.39 µs ± 1% 3390 ns Map 100 34.1 µs ± 1% 34100 ns Map 1000 374 µs ± 0% 374000 ns Map 10000 4.37 ms ± 1% 4370000 ns Map 100000 58.7 ms ± 2% 58700000 ns Map 1000000 866 ms ± 4% 866000000 ns
  • 53. Composites: Deserialize Type # elems time/op time/op (ns) Array 0 136 ns ± 1% 136 ns Array 1 201 ns ± 0% 201 ns Array 10 588 ns ± 2% 588 ns Array 100 4.05 µs ± 3% 4050 ns Array 1000 38.1 µs ± 1% 38100 ns Array 10000 380 µs ± 2% 380000 ns Array 100000 3.81 ms ± 1% 3810000 ns Array 1000000 39.9 ms ± 2% 39900000 ns Map 0 158 ns ± 0% 158 ns Map 1 361 ns ± 0% 361 ns Map 10 1.97 µs ± 0% 1970 ns Map 100 21.3 µs ± 0% 21300 ns Map 1000 261 µs ± 1% 261000 ns Map 10000 2.67 ms ± 1% 2670000 ns Map 100000 38.3 ms ± 2% 38300000 ns Map 1000000 757 ms ± 3% 757000000 ns
  • 54. Composites: Queries Type # elems time/op Array Get 1 25.9 ns ± 7% Array Get 10 26.4 ns ± 6% Array Get 100 26.6 ns ± 6% Array Get 1000 26.3 ns ± 6% Array Get 10000 26.3 ns ± 8% Array Get 100000 26.0 ns ± 4% Array Get 1000000 26.2 ns ± 7% Map Get 1 35.3 ns ± 1% Map Get 10 64.7 ns ± 0% Map Get 100 74.6 ns ± 1% Map Get 1000 121 ns ± 1% Map Get 10000 157 ns ± 0% Map Get 100000 221 ns ± 2% Map Get 1000000 375 ns ± 1% Type # elems time/op Array Slice 1 70.1 ns ± 1% Array Slice 10 73.9 ns ± 4% Array Slice 100 73.7 ns ± 3% Array Slice 1000 73.0 ns ± 2% Array Slice 10000 73.4 ns ± 3% Array Slice 100000 75.6 ns ± 3% Array Slice 1000000 73.4 ns ± 2% Map Keys 1 662 ns ± 9% Map Keys 10 2.11 µs ± 8% Map Keys 100 17.4 µs ± 8% Map Keys 1000 173 µs ± 8% Map Keys 10000 2.28 ms ± 4% Map Keys 100000 35.6 ms ± 5% Map Keys 1000000 348 ms ± 7%
  • 56. In Progress & Future Work ● Replace simple scalar values ● Append to arrays ● Add new keys to a map ● Other ops (inc, dec, etc) ● Compression