John H Schulz
What is in all of those sstable files? Not just the data one, but all the rest too!
1 Bio and My Employer
2 Some background
3 The Test Table and data
4 The actual file contents
5 Conclusions and Other Stuff
© DataStax, All Rights Reserved.
Necessary Stuff
Nap time if you wish
BIO
Likes databases more than
cheese sandwiches with
peppered bacon and onions
Open source databases since
2003
Relational databases since
1984
Owned one of the first Apple IIs
in the neighborhood
Lives in Michigan near the
Detroit Zoo
About Pythian
11,400
Pythian currently manages
more than 11,400 systems.
400+
Pythian currently employs
more than 400 people in 200
cities in 35 countries
1997
Pythian was founded in 1997
Global Leader In IT Transformation And Operational Excellence
Unparalleled Expertise
• Top 5% in databases, applications, infrastructure, Big Data, Cloud, Data
Science, and DevOps
Unmatched Certifications
• 9 Oracle ACEs, 4 Oracle ACE Directors, 1 Oracle ACE Associate
• 6 Microsoft MVPs, 1 Microsoft Certified Master
• 5 Google Platform Qualified Developers
• 1 Cloudera Champion of Big Data
• 1 MongoDB Certified DBA Associate Level
• DataStax Certified Partner, 1 MVP, 3 Architect, 2 Certified administrators
Broad Technical Experience
• Oracle, Microsoft, MySQL, Oracle EBS, Hadoop, Cassandra, MongoDB,
virtualization, configuration management, monitoring, trending, and more.
Background
Some Terminology
Cassandra data directory structure
Some Terminology
Partition Key
The column(s) which are hashed to create a token mapped by the partitioner
Partition
The rows and columns which belong to a specific Partition Key
Clustering Key
The column(s) which, with the Partition Key define a row
Column
A specific entity with a type (can be user defined), a value, a version, a time to live, and a name
More on columns
A single column can stand on its own in an SSTable file
What is the minimum information needed to find and use a column?
Column Name
Data type
Cluster column values
Partition column values
Version (usually a timestamp)
Time to Live
Value
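The list above can be pictured as a single record. The sketch below is only an illustration: the class name, field names, and the sample timestamp are hypothetical and do not come from Cassandra's source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnCell:
    """Hypothetical record holding the minimum information listed above."""
    name: str                       # column name
    data_type: str                  # data type, e.g. "text" or "decimal"
    clustering: Tuple[str, ...]     # cluster column values
    partition_key: Tuple[str, ...]  # partition column values
    timestamp_us: int               # version, usually a write timestamp
    ttl_seconds: Optional[int]      # time to live, None if the cell never expires
    value: bytes                    # serialized value


# Illustrative values only; the timestamp is made up for the example.
cell = ColumnCell(
    name="field5",
    data_type="decimal",
    clustering=("CV3",),
    partition_key=("CV1", "CV2"),
    timestamp_us=1_470_000_000_000_000,
    ttl_seconds=None,
    value=b"127.40",
)
print(cell)
```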
The Cassandra data directory structure
data
key-spaces
Tables
backups
snapshots
SSTable Files
Example partial directory tree from Cassandra 2.1
[root@Snoopy-3 node1]# ls -lR data
data:
total 12
drwxr-xr-x. 3 root root 4096 Aug 19 13:59 stuff
drwxr-xr-x. 19 root root 4096 Aug 19 13:58 system
drwxr-xr-x. 4 root root 4096 Aug 19 13:58 system_traces
data/stuff:
total 4
drwxr-xr-x. 2 root root 4096 Aug 19 14:00 simplefields-bdd61590663611e69c3e1d84c92693ab
data/stuff/simplefields-bdd61590663611e69c3e1d84c92693ab:
total 36
SSTable files in 2.1.9
8 Aug 19 14:00 stuff-simplefields-ka-1-CRC.db
1044 Aug 19 14:00 stuff-simplefields-ka-1-Data.db
9 Aug 19 14:00 stuff-simplefields-ka-1-Digest.sha1
24 Aug 19 14:00 stuff-simplefields-ka-1-Filter.db
130 Aug 19 14:00 stuff-simplefields-ka-1-Index.db
4460 Aug 19 14:00 stuff-simplefields-ka-1-Statistics.db
116 Aug 19 14:00 stuff-simplefields-ka-1-Summary.db
79 Aug 19 14:00 stuff-simplefields-ka-1-TOC.txt
The SSTable files in Cassandra 3.0
8 Aug 19 10:42 mb-1-big-CRC.db
481 Aug 19 10:42 mb-1-big-Data.db
10 Aug 19 10:42 mb-1-big-Digest.crc32
24 Aug 19 10:42 mb-1-big-Filter.db
84 Aug 19 10:42 mb-1-big-Index.db
4799 Aug 19 10:42 mb-1-big-Statistics.db
80 Aug 19 10:42 mb-1-big-Summary.db
80 Aug 19 10:42 mb-1-big-TOC.txt
Parts of an SSTable file name
Pre 2.2
stuff-simplefields-ia-1-Data.db
Keyspace - Table name - Version - Counter - What it is
2.2 going forward
mb-1-big-Data.db
Version - Counter - Format - What it is
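The two naming schemes can be split apart mechanically. This is a minimal sketch, assuming the parts are separated by hyphens exactly as in the two example names; the function and pattern names are hypothetical, and temporary files (for example names containing "tmp") are not handled.

```python
import re
from typing import Dict

# 2.2 going forward: <version>-<counter>-<format>-<component>
NEW_STYLE = re.compile(
    r"^(?P<version>[a-z]{2})-(?P<counter>\d+)-(?P<format>[a-z]+)-(?P<component>.+)$"
)
# Pre 2.2: <keyspace>-<table>-<version>-<counter>-<component>
OLD_STYLE = re.compile(
    r"^(?P<keyspace>\w+)-(?P<table>\w+)-(?P<version>[a-z]{2})-(?P<counter>\d+)-(?P<component>.+)$"
)


def parse_sstable_name(filename: str) -> Dict[str, str]:
    """Return the named parts of an SSTable file name, or raise ValueError."""
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.match(filename)
        if match:
            return match.groupdict()
    raise ValueError(f"not a recognized SSTable file name: {filename}")


print(parse_sstable_name("stuff-simplefields-ia-1-Data.db"))
print(parse_sstable_name("mb-1-big-Data.db"))
```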
Data File Size Changed!!!
Cassandra 2.1
1044 Aug 19 14:00 stuff-simplefields-ka-1-Data.db
Cassandra 3.0
481 Aug 19 10:42 mb-1-big-Data.db
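To put a number on the change, the byte counts from the two directory listings can be compared component by component. The sizes below are copied from those listings; the dictionary and function names are only for illustration.

```python
# Component sizes in bytes, copied from the 2.1.9 and 3.0 listings.
sizes_21 = {"CRC.db": 8, "Data.db": 1044, "Digest": 9, "Filter.db": 24,
            "Index.db": 130, "Statistics.db": 4460, "Summary.db": 116, "TOC.txt": 79}
sizes_30 = {"CRC.db": 8, "Data.db": 481, "Digest": 10, "Filter.db": 24,
            "Index.db": 84, "Statistics.db": 4799, "Summary.db": 80, "TOC.txt": 80}


def ratio(component: str) -> float:
    """Size of the 3.0 component relative to the 2.1 component."""
    return sizes_30[component] / sizes_21[component]


for component in sizes_21:
    print(f"{component:14s} {sizes_21[component]:5d} -> {sizes_30[component]:5d}"
          f"  ({ratio(component):.0%} of the 2.1 size)")
```

For this tiny table the 3.0 Data file is a little under half the size of the 2.1 one, while Statistics grows slightly.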
A relational view of how the files are connected
[Diagram: boxes labeled DATA, Summary, Index, Filter (Yes/No), Statistics, Digest, Compression, TOC, CRC, and METADATA]
Metadata
• Partitioner
• Commit log location
• Field names and types
• Bloom filter chance
• Counter information
• Repair time
• Cardinality estimate
• Checksums
• Compression Algorithm
• Filenames
The Test Table and Data
My test table
CREATE TABLE stuff.simplefields (
field1 text,
field2 text,
field3 text,
field4 timestamp,
field5 decimal,
PRIMARY KEY ((field1, field2), field3)
) WITH CLUSTERING ORDER BY (field3 ASC)
AND compression = {};
Some test data to put in it
insert into simpleFields (field1, field2, field3, field4, field5)
values('FV1', 'FV2' , 'FV3', '2016-08-11 11:37:25.976', '127.00');
insert into simpleFields (field1, field2, field3, field4, field5)
values('GV1', 'GV2' , 'GV3', '2016-08-11 11:37:25.976', '127.10');
insert into simpleFields (field1, field2, field3, field4, field5)
values('AV1', 'AV2' , 'AV3', '2016-08-11 11:37:25.976', '127.20');
insert into simpleFields (field1, field2, field3, field4, field5)
values('BV1', 'BV2' , 'BV3', '2016-08-11 11:37:25.976', '127.30');
insert into simpleFields (field1, field2, field3, field4, field5)
values('CV1', 'CV2' , 'CV3', '2016-08-11 11:37:25.976', '127.40');
insert into simpleFields (field1, field2, field3, field4, field5)
values('CV1', 'CV2' , 'CV4', '2016-08-11 11:37:25.976', '127.50');
insert into simpleFields (field1, field2, field3, field4, field5)
values('CV1', 'CV2' , 'CV5', '2016-08-11 11:37:25.976', '127.60');
insert into simpleFields (field1, field2, field3, field4, field5)
values('CV1', 'CV2' , 'CV6', '2016-08-11 11:37:25.976', '127.70');
My Data in the Table
field1 | field2 | field3 | field4 | field5
--------+--------+--------+-------------------------+--------
CV1 | CV2 | CV3 | 2016-08-11 11:37:25.976 | 127.40
CV1 | CV2 | CV4 | 2016-08-11 11:37:25.976 | 127.50
CV1 | CV2 | CV5 | 2016-08-11 11:37:25.976 | 127.60
CV1 | CV2 | CV6 | 2016-08-11 11:37:25.976 | 127.70
FV1 | FV2 | FV3 | 2016-08-11 11:37:25.976 | 127.00
GV1 | GV2 | GV3 | 2016-08-11 11:37:25.976 | 127.10
BV1 | BV2 | BV3 | 2016-08-11 11:37:25.976 | 127.30
AV1 | AV2 | AV3 | 2016-08-11 11:37:25.976 | 127.20
How I got my clusters going
Using CCM – Cassandra Cluster Manager
https://github.com/pcmanus/ccm
Example of creating a cluster:
ccm create --vnodes --nodes=1 -v 2.1.9 --start ver-219
ccm node1 nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 196.38 KB 256 ? 6ecec1cd-424d-4173-a83f-aefe485f4e2c rack1
Clusters currently on my Machine
ver-12a ver-219 ver227 ver_33
ver-12a-multi ver-219-single ver-227-single
ver_20_11a ver-221 ver-301
test ver-2011-single ver_225 ver-308
ver-120-single ver-213-multi ver_225_6 ver-308-single
A peek inside the files
Quick overview of each File
CRC – The chunk size followed by one CRC per chunk of the Data file (a single CRC for a small Data file)
Compression – The name of the compression engine used
Data – The SSTable data stored in sorted order by Partition/Cluster Key
Digest – A single checksum of the whole Data file
Filter – The bloom filter – folded hashes of all key values in the Data file
Index – A list of all partition/Cluster Keys
Statistics – Hodgepodge of metadata about the SSTable
Summary – A list of some of the Partition/Cluster keys – Think second level index
TOC – Text list of the files
A peek inside the less interesting files
hexdump mb-1-big-CRC.db
0000000 0100 0000 c7ff 8b91
0000008
cat mb-1-big-Digest.crc32
4291269003
cat mb-1-big-TOC.txt
Filter.db
CRC.db
Data.db
Digest.crc32
Summary.db
TOC.txt
Statistics.db
Index.db
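The CRC.db and Digest.crc32 dumps above can be cross-checked against each other. This is a sketch under two assumptions that are mine, not the slides': plain `hexdump` printed 16-bit words on a little-endian host (so each word is byte-swapped), and CRC.db holds a 4-byte big-endian chunk size followed by one 4-byte CRC per chunk.

```python
import struct


def unswap_hexdump_words(words: str) -> bytes:
    """Undo the byte swap plain `hexdump` applies to 16-bit words on a little-endian host."""
    return b"".join(bytes.fromhex(word)[::-1] for word in words.split())


# Payload of mb-1-big-CRC.db as printed by hexdump (offset column dropped).
crc_db = unswap_hexdump_words("0100 0000 c7ff 8b91")
chunk_size, first_chunk_crc = struct.unpack(">II", crc_db)

# Content of mb-1-big-Digest.crc32 as printed by cat.
digest = int("4291269003")

print("chunk size:", chunk_size)
print("first chunk CRC:", first_chunk_crc)
print("matches digest:", first_chunk_crc == digest)
```

Because the 481-byte Data file fits in one chunk, the single chunk CRC and the whole-file digest come out as the same number.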
Cassandra 2.1 Index
0000000 0 f 0 003 C V 1 0 0 003 C V 2 0 0 0
0000010 0 0 0 0 0 0 0 0 0 0 0 f 0 003 F V
0000020 1 0 0 003 F V 2 0 0 0 0 0 0 0 001 340
0000030 0 0 0 0 0 f 0 003 G V 1 0 0 003 G V
0000040 2 0 0 0 0 0 0 0 002 m 0 0 0 0 0 f
0000050 0 003 B V 1 0 0 003 B V 2 0 0 0 0 0
0000060 0 0 002 372 0 0 0 0 0 f 0 003 A V 1 0
0000070 0 003 A V 2 0 0 0 0 0 0 0 003 207 0 0
0000080 0 0
Cassandra 3.0 Index
hexdump -c mb-1-big-Index.db
0000000 0 f 0 003 C V 1 0 0 003 C V 2 0 0 0
0000010 0 f 0 003 F V 1 0 0 003 F V 2 0 200 312
0000020 0 0 f 0 003 G V 1 0 0 003 G V 2 0 201
0000030 017 0 0 f 0 003 B V 1 0 0 003 B V 2 0
0000040 201 U 0 0 f 0 003 A V 1 0 0 003 A V 2
0000050 0 201 233 0
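The 3.0 Index dump above can be decoded into (partition key, Data file offset) pairs. This is a sketch based on my reading of the dump, not on Cassandra's source: I assume the `0 f` pairs are `\0 \f` with the backslashes lost (a 2-byte length of 12), each key component is a 2-byte length, the bytes, and a 0 terminator, and the position and the following size field are variable-length integers whose leading 1 bits count the extra bytes. All function names are hypothetical.

```python
import struct
from typing import List, Tuple


def composite_key(*parts: str) -> bytes:
    """Rebuild a composite partition key as it appears in the dump."""
    return b"".join(struct.pack(">H", len(p)) + p.encode() + b"\x00" for p in parts)


def split_composite(key: bytes) -> Tuple[str, ...]:
    parts, pos = [], 0
    while pos < len(key):
        (length,) = struct.unpack_from(">H", key, pos)
        parts.append(key[pos + 2:pos + 2 + length].decode())
        pos += 2 + length + 1  # skip the end-of-component byte
    return tuple(parts)


def read_unsigned_vint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one variable-length integer; return (value, next position)."""
    first, extra, mask = buf[pos], 0, 0x80
    while mask and first & mask:  # count leading 1 bits
        extra += 1
        mask >>= 1
    value = first & (0xFF >> (extra + 1))
    for i in range(extra):
        value = (value << 8) | buf[pos + 1 + i]
    return value, pos + 1 + extra


# The 84 bytes of mb-1-big-Index.db, rebuilt from the dump above.
INDEX_30 = b"".join(
    struct.pack(">H", len(key)) + key + bytes.fromhex(position) + b"\x00"
    for key, position in [
        (composite_key("CV1", "CV2"), "00"),
        (composite_key("FV1", "FV2"), "80ca"),
        (composite_key("GV1", "GV2"), "810f"),
        (composite_key("BV1", "BV2"), "8155"),
        (composite_key("AV1", "AV2"), "819b"),
    ]
)


def parse_index(buf: bytes) -> List[Tuple[Tuple[str, ...], int]]:
    entries, pos = [], 0
    while pos < len(buf):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        key = buf[pos + 2:pos + 2 + key_len]
        pos += 2 + key_len
        position, pos = read_unsigned_vint(buf, pos)
        extra_size, pos = read_unsigned_vint(buf, pos)  # 0 for these small partitions
        pos += extra_size
        entries.append((split_composite(key), position))
    return entries


entries = parse_index(INDEX_30)
for key, position in entries:
    print(key, hex(position))
```

The decoded offsets (0x0, 0xca, 0x10f, 0x155, 0x19b) line up with the places where each partition key starts in the 3.0 Data file dump.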
Cassandra 2.1 Summary
0000000 0 0 0 200 0 0 0 001 0 0 0 0 0 0 0 030
0000010 0 0 0 200 0 0 0 001 004 0 0 0 0 003 C V
0000020 1 0 0 003 C V 2 0 0 0 0 0 0 0 0 0
0000030 0 0 0 f 0 003 C V 1 0 0 003 C V 2 0
0000040 0 0 0 f 0 003 A V 1 0 0 003 A V 2 0
0000050 0 004 m m a p 0 0 0 001 0 0 0 0 0 0
0000060 0 0 0 004 m m a p 0 0 0 001 0 0 0 0
0000070 0 0 0 0
Cassandra 3.0 Summary
hexdump -c mb-1-big-Summary.db
0000000 0 0 0 200 0 0 0 001 0 0 0 0 0 0 0 030
0000010 0 0 0 200 0 0 0 001 004 0 0 0 0 003 C V
0000020 1 0 0 003 C V 2 0 0 0 0 0 0 0 0 0
0000030 0 0 0 f 0 003 C V 1 0 0 003 C V 2 0
0000040 0 0 0 f 0 003 A V 1 0 0 003 A V 2 0
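The 80-byte 3.0 Summary dump above can be read the same way. The layout here is my interpretation of the bytes and should be treated as an assumption: five big-endian header fields (which I label minimum index interval, entry count, body size, sampling level, and size at full sampling), a body holding little-endian entry offsets followed by each sampled key and its 8-byte position, and finally the first and last partition keys with 4-byte lengths. All names are hypothetical.

```python
import struct
from typing import Tuple


def composite_key(*parts: str) -> bytes:
    return b"".join(struct.pack(">H", len(p)) + p.encode() + b"\x00" for p in parts)


def split_composite(key: bytes) -> Tuple[str, ...]:
    parts, pos = [], 0
    while pos < len(key):
        (length,) = struct.unpack_from(">H", key, pos)
        parts.append(key[pos + 2:pos + 2 + length].decode())
        pos += 2 + length + 1
    return tuple(parts)


KEY_CV = composite_key("CV1", "CV2")
KEY_AV = composite_key("AV1", "AV2")

# The 80 bytes of mb-1-big-Summary.db, rebuilt from the dump above.
SUMMARY_30 = (
    bytes.fromhex("00000080" "00000001" "0000000000000018" "00000080" "00000001")
    + bytes.fromhex("04000000")   # offset of the only sampled entry (little-endian)
    + KEY_CV + bytes(8)           # sampled key and its position, which is zero here
    + struct.pack(">I", len(KEY_CV)) + KEY_CV   # first key
    + struct.pack(">I", len(KEY_AV)) + KEY_AV   # last key
)


def parse_summary(buf: bytes) -> dict:
    min_interval, count, body_size, sampling, full_size = struct.unpack_from(">IIqII", buf, 0)
    body = buf[24:24 + body_size]
    offsets = list(struct.unpack_from("<%dI" % count, body, 0)) + [body_size]
    sampled = []
    for start, end in zip(offsets, offsets[1:]):
        (position,) = struct.unpack_from("<q", body, end - 8)
        sampled.append((split_composite(body[start:end - 8]), position))
    pos, bounds = 24 + body_size, []
    for _ in range(2):  # first key, then last key
        (length,) = struct.unpack_from(">I", buf, pos)
        bounds.append(split_composite(buf[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return {"min_interval": min_interval, "count": count, "body_size": body_size,
            "sampled": sampled, "first_key": bounds[0], "last_key": bounds[1]}


summary = parse_summary(SUMMARY_30)
print(summary)
```

With only five partitions and an interval of 128, a single key is sampled, which is why the Summary is so short.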
Cassandra 3.0 Statistics Part Beginning
hexdump -c mb-1-big-Statistics.db
0000000 0 0 0 004 0 0 0 0 0 0 0 $ 0 0 0 001
0000010 0 0 0 Y 0 0 0 002 0 0 0 y 0 0 0 003
0000020 0 0 021 250 0 + o r g . a p a c h e
0000030 . c a s s a n d r a . d h t . M
0000040 u r m u r 3 P a r t i t i o n e
0000050 r ? 204 z 341 G 256 024 { 0 0 0 034 377 377 377
0000060 376 r 031 001 005 256 346 275 001 302 263 326 005 334 361 364
0000070 t 354 241 224 n 324 234 337 001 0 0 0 227 0 0 0
0000080 0 0 0 0 001 0 0 0 0 0 0 0 0 0 0 0
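The first 89 bytes of the Statistics dump above appear to hold a small table of contents followed by the validation metadata. The sketch below decodes them under my own assumptions: a 4-byte component count, then (type, offset) pairs of 4-byte integers, then at the first offset a length-prefixed partitioner class name and an 8-byte double for the bloom filter chance. The type labels validation, compaction, stats, and header are my mapping for types 0 to 3.

```python
import struct

# First 0x59 bytes of mb-1-big-Statistics.db, rebuilt from the dump above.
STATS_HEAD = (
    bytes.fromhex("00000004"
                  "00000000" "00000024"
                  "00000001" "00000059"
                  "00000002" "00000079"
                  "00000003" "000011a8")
    + bytes.fromhex("002b") + b"org.apache.cassandra.dht.Murmur3Partitioner"
    + bytes.fromhex("3f847ae147ae147b")
)

TYPE_NAMES = {0: "validation", 1: "compaction", 2: "stats", 3: "header"}  # assumed labels


def parse_stats_head(buf: bytes):
    (count,) = struct.unpack_from(">I", buf, 0)
    toc = {}
    for i in range(count):
        kind, offset = struct.unpack_from(">II", buf, 4 + 8 * i)
        toc[kind] = offset
    pos = toc[0]  # validation metadata
    (name_len,) = struct.unpack_from(">H", buf, pos)
    partitioner = buf[pos + 2:pos + 2 + name_len].decode()
    (fp_chance,) = struct.unpack_from(">d", buf, pos + 2 + name_len)
    return toc, partitioner, fp_chance


toc, partitioner, fp_chance = parse_stats_head(STATS_HEAD)
for kind, offset in toc.items():
    print(f"{TYPE_NAMES[kind]:10s} starts at offset {offset:#x}")
print(partitioner, fp_chance)
```

The validation block ends exactly where the table of contents says the next block begins (0x59), which supports this reading of the header.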
Cassandra 3.0 Statistics Tail end
00001230 70 65 29 01 28 6f 72 67 2e 61 70 61 63 68 65 2e |pe).(org.apache.|
00001240 63 61 73 73 61 6e 64 72 61 2e 64 62 2e 6d 61 72 |cassandra.db.mar|
00001250 73 68 61 6c 2e 55 54 46 38 54 79 70 65 00 02 06 |shal.UTF8Type...|
00001260 66 69 65 6c 64 34 28 6f 72 67 2e 61 70 61 63 68 |field4(org.apach|
00001270 65 2e 63 61 73 73 61 6e 64 72 61 2e 64 62 2e 6d |e.cassandra.db.m|
00001280 61 72 73 68 61 6c 2e 55 54 46 38 54 79 70 65 06 |arshal.UTF8Type.|
00001290 66 69 65 6c 64 35 28 6f 72 67 2e 61 70 61 63 68 |field5(org.apach|
000012a0 65 2e 63 61 73 73 61 6e 64 72 61 2e 64 62 2e 6d |e.cassandra.db.m|
000012b0 61 72 73 68 61 6c 2e 55 54 46 38 54 79 70 65 |arshal.UTF8Type|
Cassandra 2.1 Data First Part
0000000 0 f 0 003 C V 1 0 0 003 C V 2 0 177 377
0000010 377 377 200 0 0 0 0 0 0 0 0 t 0 003 C V
0000020 3 0 0 0 0 0 0 005 : p w w [ 341 0 0
0000030 0 0 0 017 0 003 C V 3 0 0 006 f i e l
0000040 d 4 0 0 0 005 : p w w [ 341 0 0 0 027
0000050 2 0 1 6 - 0 8 - 1 1 1 1 : 3 7
0000060 : 2 5 . 9 7 6 0 017 0 003 C V 3 0 0
0000070 006 f i e l d 5 0 0 0 005 : p w w [
0000080 341 0 0 0 006 1 2 7 . 4 0 0 t 0 003 C
0000090 V 4 0 0 0 0 0 0 005 : p w w n / 0
00000a0 0 0 0 0 017 0 003 C V 4 0 0 006 f i e
00000b0 l d 4 0 0 0 005 : p w w n / 0 0 0
00000c0 027 2 0 1 6 - 0 8 - 1 1 1 1 : 3
00000d0 7 : 2 5 . 9 7 6 0 017 0 003 C V 4 0
00000e0 0 006 f i e l d 5 0 0 0 005 : p w w
00000f0 n / 0 0 0 006 1 2 7 . 5 0 0 t 0 003
0000100 C V 5 0 0 0 0 0 0 005 : p w w 200 261
0000110 0 0 0 0 0 017 0 003 C V 5 0 0 006 f i
0000120 e l d 4 0 0 0 005 : p w w 200 261 0 0
0000130 0 027 2 0 1 6 - 0 8 - 1 1 1 1 :
Cassandra 2.1 Data Second Part
0000140 3 7 : 2 5 . 9 7 6 0 017 0 003 C V 5
0000150 0 0 006 f i e l d 5 0 0 0 005 : p w
0000160 w 200 261 0 0 0 006 1 2 7 . 6 0 0 t 0
0000170 003 C V 6 0 0 0 0 0 0 005 : p w w 213
0000180 a 0 0 0 0 0 017 0 003 C V 6 0 0 006 f
0000190 i e l d 4 0 0 0 005 : p w w 213 a 0
00001a0 0 0 027 2 0 1 6 - 0 8 - 1 1 1 1
00001b0 : 3 7 : 2 5 . 9 7 6 0 017 0 003 C V
00001c0 6 0 0 006 f i e l d 5 0 0 0 005 : p
00001d0 w w 213 a 0 0 0 006 1 2 7 . 7 0 0 0
00001e0 0 f 0 003 F V 1 0 0 003 F V 2 0 177 377
00001f0 377 377 200 0 0 0 0 0 0 0 0 t 0 003 F V
0000200 3 0 0 0 0 0 0 005 : p w v 271 d 0 0
0000210 0 0 0 017 0 003 F V 3 0 0 006 f i e l
0000220 d 4 0 0 0 005 : p w v 271 d 0 0 0 027
0000230 2 0 1 6 - 0 8 - 1 1 1 1 : 3 7
0000240 : 2 5 . 9 7 6 0 017 0 003 F V 3 0 0
0000250 006 f i e l d 5 0 0 0 005 : p w v 271
0000260 d 0 0 0 006 1 2 7 . 0 0 0 0 0 f 0
Cassandra 2.1 Data Third Part
0000270 003 G V 1 0 0 003 G V 2 0 177 377 377 377 200
0000280 0 0 0 0 0 0 0 0 t 0 003 G V 3 0 0
0000290 0 0 0 0 005 : p w w : 255 0 0 0 0 0
00002a0 017 0 003 G V 3 0 0 006 f i e l d 4 0
00002b0 0 0 005 : p w w : 255 0 0 0 027 2 0 1
00002c0 6 - 0 8 - 1 1 1 1 : 3 7 : 2 5
00002d0 . 9 7 6 0 017 0 003 G V 3 0 0 006 f i
00002e0 e l d 5 0 0 0 005 : p w w : 255 0 0
00002f0 0 006 1 2 7 . 1 0 0 0 0 f 0 003 B V
0000300 1 0 0 003 B V 2 0 177 377 377 377 200 0 0 0
0000310 0 0 0 0 0 t 0 003 B V 3 0 0 0 0 0
0000320 0 005 : p w w Q 234 0 0 0 0 0 017 0 003
0000330 B V 3 0 0 006 f i e l d 4 0 0 0 005
0000340 : p w w Q 234 0 0 0 027 2 0 1 6 - 0
0000350 8 - 1 1 1 1 : 3 7 : 2 5 . 9 7
0000360 6 0 017 0 003 B V 3 0 0 006 f i e l d
0000370 5 0 0 0 005 : p w w Q 234 0 0 0 006 1
0000380 2 7 . 3 0 0 0 0 f 0 003 A V 1 0 0
0000390 003 A V 2 0 177 377 377 377 200 0 0 0 0 0 0
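The start of the 2.1 dump can be decoded by hand to show how much is repeated per cell. This is a sketch from my reading of the first 139 bytes, with assumptions stated plainly: `0 f` and `0 t` are `\0 \f` and `\0 \t` (2-byte lengths 12 and 9), the partition key is followed by a 4-byte and an 8-byte deletion field, and each cell is a 2-byte name length, a composite name, a flag byte, an 8-byte timestamp in microseconds, and a 4-byte value length. Function names are hypothetical.

```python
import struct
from datetime import datetime, timezone
from typing import Tuple


def component(raw: bytes) -> bytes:
    """One composite component: 2-byte length, bytes, end-of-component byte."""
    return struct.pack(">H", len(raw)) + raw + b"\x00"


def split_composite(name: bytes) -> Tuple[str, ...]:
    parts, pos = [], 0
    while pos < len(name):
        (length,) = struct.unpack_from(">H", name, pos)
        parts.append(name[pos + 2:pos + 2 + length].decode())
        pos += 2 + length + 1
    return tuple(parts)


TS = bytes.fromhex("00053a7077775be1")  # ": p w w [ 341" with its two leading bytes
KEY = component(b"CV1") + component(b"CV2")


def cell(name: bytes, value: bytes) -> bytes:
    return (struct.pack(">H", len(name)) + name + b"\x00" + TS
            + struct.pack(">i", len(value)) + value)


# First 0x8b bytes of the 2.1 Data file, rebuilt from the dump above.
DATA_21 = (
    struct.pack(">H", len(KEY)) + KEY
    + bytes.fromhex("7fffffff" "8000000000000000")          # no partition-level delete
    + cell(component(b"CV3") + component(b""), b"")          # row marker
    + cell(component(b"CV3") + component(b"field4"), b"2016-08-11 11:37:25.976")
    + cell(component(b"CV3") + component(b"field5"), b"127.40")
)


def parse_partition(buf: bytes):
    (key_len,) = struct.unpack_from(">H", buf, 0)
    key = split_composite(buf[2:2 + key_len])
    pos = 2 + key_len
    local_deletion_time, marked_for_delete_at = struct.unpack_from(">iq", buf, pos)
    pos += 12
    cells = []
    while pos < len(buf):
        (name_len,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        if name_len == 0:  # end of partition
            break
        name = split_composite(buf[pos:pos + name_len])
        pos += name_len + 1  # skip the flag byte
        timestamp, value_len = struct.unpack_from(">qi", buf, pos)
        pos += 12
        cells.append((name, timestamp, buf[pos:pos + value_len]))
        pos += value_len
    return key, local_deletion_time, marked_for_delete_at, cells


key, local_deletion_time, marked_for_delete_at, cells = parse_partition(DATA_21)
written = datetime.fromtimestamp(cells[0][1] // 1_000_000, tz=timezone.utc)
print(key, written.isoformat())
for name, timestamp, value in cells:
    print(name, timestamp, value)
```

Every cell repeats the clustering value, the field name, and a full 8-byte timestamp, which is where most of the 1044 bytes go.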
Cassandra 3.0 Data Part A
0000000 0 f 0 003 C V 1 0 0 003 C V 2 0 177 377
0000010 377 377 200 0 0 0 0 0 0 0 $ 0 003 C V 3
0000020 $ 032 270 221 b 027 2 0 1 6 - 0 8 - 1 1
0000030 1 1 : 3 7 : 2 5 . 9 7 6 b 006 1
0000040 2 7 . 4 0 $ 0 003 C V 4 % + 300 D a
0000050 b 027 2 0 1 6 - 0 8 - 1 1 1 1 :
0000060 3 7 : 2 5 . 9 7 6 b 006 1 2 7 . 5
0000070 0 $ 0 003 C V 5 % , 300 N 234 b 027 2 0
0000080 1 6 - 0 8 - 1 1 1 1 : 3 7 : 2
0000090 5 . 9 7 6 b 006 1 2 7 . 6 0 $ 0 003
00000a0 C V 6 % , 300 [ ` b 027 2 0 1 6 - 0
00000b0 8 - 1 1 1 1 : 3 7 : 2 5 . 9 7
00000c0 6 b 006 1 2 7 . 7 0 001 0 f 0 003 F V
00000d0 1 0 0 003 F V 2 0 177 377 377 377 200 0 0 0
00000e0 0 0 0 0 $ 0 003 F V 3 # 032 0 b 027 2
00000f0 0 1 6 - 0 8 - 1 1 1 1 : 3 7 :
0000100 2 5 . 9 7 6 b 006 1 2 7 . 0 0 001 0
0000110 f 0 003 G V 1 0 0 003 G V 2 0 177 377 377
Cassandra 3.0 Data Part B
0000120 377 200 0 0 0 0 0 0 0 $ 0 003 G V 3 $
0000130 032 221 346 b 027 2 0 1 6 - 0 8 - 1 1
0000140 1 1 : 3 7 : 2 5 . 9 7 6 b 006 1 2
0000150 7 . 1 0 001 0 f 0 003 B V 1 0 0 003 B
0000160 V 2 0 177 377 377 377 200 0 0 0 0 0 0 0 $
0000170 0 003 B V 3 $ 032 255 036 b 027 2 0 1 6 -
0000180 0 8 - 1 1 1 1 : 3 7 : 2 5 . 9
0000190 7 6 b 006 1 2 7 . 3 0 001 0 f 0 003 A
00001a0 V 1 0 0 003 A V 2 0 177 377 377 377 200 0 0
00001b0 0 0 0 0 0 $ 0 003 A V 3 $ 032 240 > b
00001c0 027 2 0 1 6 - 0 8 - 1 1 1 1 : 3
00001d0 7 : 2 5 . 9 7 6 b 006 1 2 7 . 2 0
00001e0 001
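The first row of the 3.0 dump can be decoded in a similar way. This is a sketch of my own interpretation of the first 69 bytes, and it handles only what this row needs: I assume `$` (0x24) is a row flag byte meaning "has timestamp" and "has all columns", `b` is `\b` (0x08, a cell flag meaning "use the row timestamp"), and lengths, sizes, and the timestamp delta are variable-length integers. The field names in the result are hypothetical.

```python
import struct
from typing import Tuple


def read_unsigned_vint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one variable-length integer; return (value, next position)."""
    first, extra, mask = buf[pos], 0, 0x80
    while mask and first & mask:  # leading 1 bits give the number of extra bytes
        extra += 1
        mask >>= 1
    value = first & (0xFF >> (extra + 1))
    for i in range(extra):
        value = (value << 8) | buf[pos + 1 + i]
    return value, pos + 1 + extra


def component(raw: bytes) -> bytes:
    return struct.pack(">H", len(raw)) + raw + b"\x00"


KEY = component(b"CV1") + component(b"CV2")

# First 0x45 bytes of the 3.0 Data file, rebuilt from the dump above.
DATA_30 = (
    struct.pack(">H", len(KEY)) + KEY
    + bytes.fromhex("7fffffff" "8000000000000000")   # no partition-level delete
    + bytes.fromhex("24" "00" "03") + b"CV3"         # row flags, clustering header, clustering value
    + bytes.fromhex("24" "1a" "b891")                # row size, previous size, timestamp delta
    + bytes.fromhex("08" "17") + b"2016-08-11 11:37:25.976"
    + bytes.fromhex("08" "06") + b"127.40"
)


def decode_first_row(buf: bytes) -> dict:
    (key_len,) = struct.unpack_from(">H", buf, 0)
    pos = 2 + key_len + 12                    # skip key and deletion fields
    flags = buf[pos]
    assert flags == 0x24, "sketch only handles this flag combination"
    pos += 1
    _clustering_header, pos = read_unsigned_vint(buf, pos)
    length, pos = read_unsigned_vint(buf, pos)
    clustering = buf[pos:pos + length].decode()
    pos += length
    row_size, pos = read_unsigned_vint(buf, pos)
    row_end = pos + row_size
    prev_size, pos = read_unsigned_vint(buf, pos)
    ts_delta, pos = read_unsigned_vint(buf, pos)
    values = []
    while pos < row_end:
        assert buf[pos] == 0x08, "sketch only handles cells that reuse the row timestamp"
        length, pos = read_unsigned_vint(buf, pos + 1)
        values.append(buf[pos:pos + length])
        pos += length
    return {"clustering": clustering, "row_size": row_size, "prev_size": prev_size,
            "ts_delta": ts_delta, "values": values}


row = decode_first_row(DATA_30)
print(row)
```

Compared with the 2.1 layout, the clustering value appears once per row, no field names appear at all, and the timestamp shrinks to a two-byte delta shared by both cells.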
Bloom Filter contents 2.1 and 3.0
hexdump stuff-simplefields-ka-1-Filter.db
0000000 0000 0500 0000 0200 0032 2090 3002 0009
0000010 1803 0110 0022 40c0
hexdump mb-1-big-Filter.db
0000000 0000 0500 0000 0200 1808 5068 8822 0019
0000010 1012 1100 0002 4180
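Both Filter.db dumps above are 24 bytes and can be summarized with a few lines. This sketch assumes, as my own reading, that plain `hexdump` byte-swapped each 16-bit word on a little-endian host, and that the file is a 4-byte hash count, a 4-byte count of 64-bit words, and then the bit set itself.

```python
import struct
from typing import Tuple


def unswap_hexdump_words(words: str) -> bytes:
    """Undo the byte swap plain `hexdump` applies to 16-bit words on a little-endian host."""
    return b"".join(bytes.fromhex(word)[::-1] for word in words.split())


def summarize_filter(words: str) -> Tuple[int, int, int]:
    """Return (hash count, number of 64-bit words, number of bits set)."""
    raw = unswap_hexdump_words(words)
    hash_count, word_count = struct.unpack_from(">II", raw, 0)
    bitset = raw[8:8 + 8 * word_count]
    bits_set = sum(bin(byte).count("1") for byte in bitset)
    return hash_count, word_count, bits_set


filter_21 = summarize_filter(
    "0000 0500 0000 0200 0032 2090 3002 0009 1803 0110 0022 40c0")
filter_30 = summarize_filter(
    "0000 0500 0000 0200 1808 5068 8822 0019 1012 1100 0002 4180")
print("2.1:", filter_21)
print("3.0:", filter_30)
```

Five partition keys hashed five times each can set at most 25 of the 128 bits, and both files come in just under that.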
Conclusions, References and Questions
Conclusions
• Cassandra 3.0 SSTable format is considerably more compact
• In Cassandra 3.0 field names are no longer stored in the data file; field IDs are
used instead
• TTL and Timestamps are grouped
• TTL and Timestamp Differences are stored as deltas
• Sorted String Tables are sorted…. By Partition Key Token Value
• The Statistics file is full of a bunch of different stuff
• Data, Summary, Filter and Statistics have important useful stuff
References
http://distributeddatastore.blogspot.com/2013/08/cassandra-sstable-storage-format.html
https://github.com/scylladb/scylla/wiki/Storage-On-Disk-File-Formats
http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
QUESTIONS?
In version "ka", the contents of the Statistics component file are split into three types of metadata:
validation, compaction, and stats. Validation metadata is used to validate the SSTable and includes the
partitioner and bloom filter fp chance fields. Compaction metadata includes ancestor information, which
is also available in older formats, and a new field called the cardinality estimator. The cardinality
estimator is used to efficiently pre-allocate bloom filter space in a merged compaction file by estimating
how much the input SSTables overlap. Stats metadata contains the rest of the information available in the
old formats plus two additional fields. The first is a flag to track the presence of local/remote counter
shards, and the other stores the repair time.
© DataStax, All Rights Reserved. 43

More Related Content

What's hot

Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
DataStax
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problems
Acunu
 

What's hot (20)

Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
Real time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesosReal time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesos
 
Maximum Overdrive: Tuning the Spark Cassandra Connector
Maximum Overdrive: Tuning the Spark Cassandra ConnectorMaximum Overdrive: Tuning the Spark Cassandra Connector
Maximum Overdrive: Tuning the Spark Cassandra Connector
 
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics PlatformHow We Used Cassandra/Solr to Build Real-Time Analytics Platform
How We Used Cassandra/Solr to Build Real-Time Analytics Platform
 
Spark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureSpark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and Future
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on fire
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into Cassandra
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-Cases
 
Cassandra Community Webinar: Apache Cassandra Internals
Cassandra Community Webinar: Apache Cassandra InternalsCassandra Community Webinar: Apache Cassandra Internals
Cassandra Community Webinar: Apache Cassandra Internals
 
Understanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problemsUnderstanding Cassandra internals to solve real-world problems
Understanding Cassandra internals to solve real-world problems
 
Zero to Streaming: Spark and Cassandra
Zero to Streaming: Spark and CassandraZero to Streaming: Spark and Cassandra
Zero to Streaming: Spark and Cassandra
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
 

Viewers also liked

What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
DataStax
 
Building Killr Applications with DSE
Building Killr Applications with DSEBuilding Killr Applications with DSE
Building Killr Applications with DSE
DataStax
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
DataStax
 
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesGive sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
DataStax
 

Viewers also liked (20)

Webinar: Transforming Customer Experience Through an Always-On Data Platform
Webinar: Transforming Customer Experience Through an Always-On Data PlatformWebinar: Transforming Customer Experience Through an Always-On Data Platform
Webinar: Transforming Customer Experience Through an Always-On Data Platform
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
 
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?Can My Inventory Survive Eventual Consistency?
Can My Inventory Survive Eventual Consistency?
 
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
Bloor Research & DataStax: How graph databases solve previously unsolvable bu...
 
Building Killr Applications with DSE
Building Killr Applications with DSEBuilding Killr Applications with DSE
Building Killr Applications with DSE
 
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
 
Webinar: Fighting Fraud with Graph Databases
Webinar: Fighting Fraud with Graph DatabasesWebinar: Fighting Fraud with Graph Databases
Webinar: Fighting Fraud with Graph Databases
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
 
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
 
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesGive sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
 
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
 
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
Cassandra Summit 2014: Reading Cassandra SSTables Directly for Offline Data A...
 
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
 
Webinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph DatabasesWebinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph Databases
 
2015 Internet Trends Report
2015 Internet Trends Report2015 Internet Trends Report
2015 Internet Trends Report
 

Similar to What is in All of Those SSTable Files Not Just the Data One but All the Rest Too! (John Schulz, The Pythian Group) | Cassandra Summit 2016

Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
cookie1969
 
VirtaThon 2011 - Mining the AWR
VirtaThon 2011 - Mining the AWRVirtaThon 2011 - Mining the AWR
VirtaThon 2011 - Mining the AWR
Kristofferson A
 

Similar to What is in All of Those SSTable Files Not Just the Data One but All the Rest Too! (John Schulz, The Pythian Group) | Cassandra Summit 2016 (20)

SequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational DatabaseSequoiaDB Distributed Relational Database
SequoiaDB Distributed Relational Database
 
Distributed Computing for Everyone
Distributed Computing for EveryoneDistributed Computing for Everyone
Distributed Computing for Everyone
 
Backups
BackupsBackups
Backups
 
R data-import, data-export
R data-import, data-exportR data-import, data-export
R data-import, data-export
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest Too! (John Schulz, The Pythian Group) | Cassandra Summit 2016

  • 1. John H Schulz What is in all of those sstable files? Not just the data one, but all the rest too!
  • 2. 1 Bio and My Employer
    2 Some background
    3 The Test Table and data
    4 The actual file contents
    5 Conclusions and Other Stuff
    © DataStax, All Rights Reserved.
  • 4. BIO
    Likes databases more than cheese sandwiches with peppered bacon and onions
    Open source databases since 2003
    Relational databases since 1984
    Owned one of the first Apple IIs in the neighborhood
    Lives in Michigan near the Detroit Zoo
  • 5. About Pythian
    11,400: Pythian currently manages more than 11,400 systems.
    400+: Pythian currently employs more than 400 people in 200 cities in 35 countries.
    1997: Pythian was founded in 1997.
    Global Leader In IT Transformation And Operational Excellence
    Unparalleled Expertise
    • Top 5% in databases, applications, infrastructure, Big Data, Cloud, Data Science, and DevOps
    Unmatched Certifications
    • 9 Oracle ACEs, 4 Oracle ACE Directors, 1 Oracle ACE Associate
    • 6 Microsoft MVPs, 1 Microsoft Certified Master
    • 5 Google Platform Qualified Developers
    • 1 Cloudera Champion of Big Data
    • 1 MongoDB Certified DBA Associate Level
    • DataStax Certified Partner, 1 MVP, 3 Architect, 2 Certified administrators
    Broad Technical Experience
    • Oracle, Microsoft, MySQL, Oracle EBS, Hadoop, Cassandra, MongoDB, virtualization, configuration management, monitoring, trending, and more.
  • 8. Some Terminology
    Partition Key: The column(s) which are hashed to create a token mapped by the partitioner
    Partition: The rows and columns which belong to a specific Partition Key
    Clustering Key: The column(s) which, together with the Partition Key, define a row
    Column: A specific entity with a type (can be user defined), a value, a version, a time to live and a name
  • 9. More on columns
    A single column can stand on its own in an SSTable file.
    What is the minimum information needed to find and use a column?
    • Column Name
    • Data type
    • Cluster column values
    • Partition column values
    • Version (usually a timestamp)
    • Time to Live
    • Value
  • 10. The Cassandra data directory structure
    data > keyspaces > tables > SSTable files, backups, snapshots
  • 11. Example partial directory tree from Cassandra 2.1
    [root@Snoopy-3 node1]# ls -lR data
    data:
    total 12
    drwxr-xr-x. 3 root root 4096 Aug 19 13:59 stuff
    drwxr-xr-x. 19 root root 4096 Aug 19 13:58 system
    drwxr-xr-x. 4 root root 4096 Aug 19 13:58 system_traces
    data/stuff:
    total 4
    drwxr-xr-x. 2 root root 4096 Aug 19 14:00 simplefields-bdd61590663611e69c3e1d84c92693ab
    data/stuff/simplefields-bdd61590663611e69c3e1d84c92693ab:
    total 36
  • 12. SSTable files in 2.1.9
    8 Aug 19 14:00 stuff-simplefields-ka-1-CRC.db
    1044 Aug 19 14:00 stuff-simplefields-ka-1-Data.db
    9 Aug 19 14:00 stuff-simplefields-ka-1-Digest.sha1
    24 Aug 19 14:00 stuff-simplefields-ka-1-Filter.db
    130 Aug 19 14:00 stuff-simplefields-ka-1-Index.db
    4460 Aug 19 14:00 stuff-simplefields-ka-1-Statistics.db
    116 Aug 19 14:00 stuff-simplefields-ka-1-Summary.db
    79 Aug 19 14:00 stuff-simplefields-ka-1-TOC.txt
  • 13. The SSTable files Cassandra 3.0
    8 Aug 19 10:42 mb-1-big-CRC.db
    481 Aug 19 10:42 mb-1-big-Data.db
    10 Aug 19 10:42 mb-1-big-Digest.crc32
    24 Aug 19 10:42 mb-1-big-Filter.db
    84 Aug 19 10:42 mb-1-big-Index.db
    4799 Aug 19 10:42 mb-1-big-Statistics.db
    80 Aug 19 10:42 mb-1-big-Summary.db
    80 Aug 19 10:42 mb-1-big-TOC.txt
  • 14. Parts of an SSTable file name
    Pre 2.2: stuff-simplefields-ia-1-Data.db
      Keyspace (stuff), table name (simplefields), version (ia), counter (1), what it is (Data)
    2.2 going forward: mb-1-big-Data.db
      Version (mb), counter (1), SSTable format name (big), what it is (Data)
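The two naming layouts above can be told apart mechanically. The following is a minimal sketch, assuming file names follow exactly the two patterns shown on the slide; the function name `parse_sstable_name` and the dictionary keys are illustrative choices, not part of Cassandra.

```python
import re

# 2.2 and later: <version>-<counter>-<format>-<Component>.<ext>
NEW_NAME = re.compile(
    r"^(?P<version>[a-z]{2})-(?P<counter>\d+)-(?P<format>[a-z]+)-"
    r"(?P<component>[A-Za-z]+)\.(?P<ext>\w+)$"
)
# Before 2.2: <keyspace>-<table>-<version>-<counter>-<Component>.<ext>
OLD_NAME = re.compile(
    r"^(?P<keyspace>[^-]+)-(?P<table>[^-]+)-(?P<version>[a-z]{2})-"
    r"(?P<counter>\d+)-(?P<component>[A-Za-z]+)\.(?P<ext>\w+)$"
)


def parse_sstable_name(filename):
    """Split an SSTable component file name into its labelled parts."""
    for layout, pattern in (("2.2+", NEW_NAME), ("pre-2.2", OLD_NAME)):
        match = pattern.match(filename)
        if match:
            parts = match.groupdict()
            parts["counter"] = int(parts["counter"])
            parts["layout"] = layout
            return parts
    raise ValueError("not a recognised SSTable file name: " + filename)


if __name__ == "__main__":
    for name in ("stuff-simplefields-ka-1-Data.db", "mb-1-big-Digest.crc32"):
        print(name, "->", parse_sstable_name(name))
```

Because the newer names no longer carry the keyspace and table, a tool using this approach would have to take those from the directory path instead.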
  • 15. Data File Size Changed!!!
    Cassandra 2.1: 1044 Aug 19 14:00 stuff-simplefields-ka-1-Data.db
    Cassandra 3.0: 481 Aug 19 10:42 mb-1-big-Data.db
  • 16. A relational view of how the files are connected
    [Diagram: Data, Summary, Index, Filter (Yes/No), Statistics, Digest, Compression, TOC, CRC, Metadata]
  • 17. Metadata
    • Partitioner
    • Commit log location
    • Field names and types
    • Bloom filter chance
    • Counter information
    • Repair time
    • Cardinality estimate
    • Checksums
    • Compression Algorithm
    • Filenames
  • 18. The Test Table and Data
  • 19. My test table
    CREATE TABLE stuff.simplefields (
        field1 text,
        field2 text,
        field3 text,
        field4 timestamp,
        field5 decimal,
        PRIMARY KEY ((field1, field2), field3)
    ) WITH CLUSTERING ORDER BY (field3 ASC)
        AND compression = {};
  • 20. Some test data to put in it
    insert into simpleFields (field1, field2, field3, field4, field5) values('FV1', 'FV2' , 'FV3', '2016-08-11 11:37:25.976', '127.00');
    insert into simpleFields (field1, field2, field3, field4, field5) values('GV1', 'GV2' , 'GV3', '2016-08-11 11:37:25.976', '127.10');
    insert into simpleFields (field1, field2, field3, field4, field5) values('AV1', 'AV2' , 'AV3', '2016-08-11 11:37:25.976', '127.20');
    insert into simpleFields (field1, field2, field3, field4, field5) values('BV1', 'BV2' , 'BV3', '2016-08-11 11:37:25.976', '127.30');
    insert into simpleFields (field1, field2, field3, field4, field5) values('CV1', 'CV2' , 'CV3', '2016-08-11 11:37:25.976', '127.40');
    insert into simpleFields (field1, field2, field3, field4, field5) values('CV1', 'CV2' , 'CV4', '2016-08-11 11:37:25.976', '127.50');
    insert into simpleFields (field1, field2, field3, field4, field5) values('CV1', 'CV2' , 'CV5', '2016-08-11 11:37:25.976', '127.60');
    insert into simpleFields (field1, field2, field3, field4, field5) values('CV1', 'CV2' , 'CV6', '2016-08-11 11:37:25.976', '127.70');
  • 21. My Data in the Table
     field1 | field2 | field3 | field4                  | field5
    --------+--------+--------+-------------------------+--------
     CV1    | CV2    | CV3    | 2016-08-11 11:37:25.976 | 127.40
     CV1    | CV2    | CV4    | 2016-08-11 11:37:25.976 | 127.50
     CV1    | CV2    | CV5    | 2016-08-11 11:37:25.976 | 127.60
     CV1    | CV2    | CV6    | 2016-08-11 11:37:25.976 | 127.70
     FV1    | FV2    | FV3    | 2016-08-11 11:37:25.976 | 127.00
     GV1    | GV2    | GV3    | 2016-08-11 11:37:25.976 | 127.10
     BV1    | BV2    | BV3    | 2016-08-11 11:37:25.976 | 127.30
     AV1    | AV2    | AV3    | 2016-08-11 11:37:25.976 | 127.20
  • 22. How I got my clusters going
    Using CCM (Cassandra Cluster Manager): https://github.com/pcmanus/ccm
    Example of creating a cluster:
    ccm create --vnodes --nodes=1 -v 2.1.9 --start ver-219
    ccm node1 nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    -- Address Load Tokens Owns Host ID Rack
    UN 127.0.0.1 196.38 KB 256 ? 6ecec1cd-424d-4173-a83f-aefe485f4e2c rack1
  • 23. Clusters currently on my Machine ver-12a ver-219 ver227 ver_33 ver-12a-multi ver-219-single ver-227-single ver_20_11a ver-221 ver-301 test ver-2011-single ver_225 ver-308 ver-120-single ver-213-multi ver_225_6 ver-308-single
  • 24. A peek inside the files
  • 25. Quick overview of each File
    CRC: A single CRC of the Data file
    Compression: The name of the compression engine used
    Data: The SSTable data stored in sorted order by Partition/Cluster Key
    Digest: One or more CRCs, one per 2GB segment of the Data file
    Filter: The bloom filter, folded hashes of all key values in the Data file
    Index: A list of all Partition/Cluster Keys
    Statistics: Hodgepodge of metadata about the SSTable
    Summary: A list of some of the Partition/Cluster Keys (think second-level index)
    TOC: Text list of the files
  • 26. A peek inside the less interesting files
    hexdump mb-1-big-CRC.db
    0000000 0100 0000 c7ff 8b91
    0000008
    cat mb-1-big-Digest.crc32
    4291269003
    cat mb-1-big-TOC.txt
    Filter.db
    CRC.db
    Data.db
    Digest.crc32
    Summary.db
    TOC.txt
    Statistics.db
    Index.db
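The CRC.db and Digest.crc32 outputs above can be cross-checked by hand. This sketch assumes the dump was taken on a little-endian machine, so that each 4-digit group printed by plain `hexdump` is a byte-swapped 16-bit word, and that the file holds two big-endian 32-bit integers. Reading the first integer as a chunk size is an interpretation, not something the slide states; the helper name is made up for the example.

```python
import struct


def hexdump_words_to_bytes(words):
    """Undo plain hexdump's display: "0100" stands for the byte pair 00 01."""
    out = bytearray()
    for word in words.split():
        out += struct.pack("<H", int(word, 16))
    return bytes(out)


# The 8 bytes of mb-1-big-CRC.db as printed on the slide.
crc_db = hexdump_words_to_bytes("0100 0000 c7ff 8b91")

# Assumed layout: two big-endian 32-bit integers.
first_int, checksum = struct.unpack(">iI", crc_db)

# The decimal text stored in mb-1-big-Digest.crc32.
digest = int("4291269003")

print("first integer:", first_int)
print("checksum     :", checksum)
print("equals digest:", checksum == digest)
```

For this tiny 481-byte Data file the checksum in CRC.db is the same number as the digest, which is what one would expect if the first integer (65536) is a chunk size and the whole file fits in a single chunk.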
  • 27. Cassandra 2.1 Index
    0000000 0 f 0 003 C V 1 0 0 003 C V 2 0 0 0
    0000010 0 0 0 0 0 0 0 0 0 0 0 f 0 003 F V
    0000020 1 0 0 003 F V 2 0 0 0 0 0 0 0 001 340
    0000030 0 0 0 0 0 f 0 003 G V 1 0 0 003 G V
    0000040 2 0 0 0 0 0 0 0 002 m 0 0 0 0 0 f
    0000050 0 003 B V 1 0 0 003 B V 2 0 0 0 0 0
    0000060 0 0 002 372 0 0 0 0 0 f 0 003 A V 1 0
    0000070 0 003 A V 2 0 0 0 0 0 0 0 003 207 0 0
    0000080 0 0
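To make the 2.1 Index dump above easier to read, here is a hedged decoding sketch. The bytes are re-typed from the dump on the assumption that the backslashes of `hexdump -c` escapes were lost in transcription, so `0 f` is read as `\0 \f` (0x00 0x0c). The entry layout used here (2-byte key length, composite key, 8-byte Data file position, 4-byte trailing size field) is inferred from the dump and from the 130-byte file size; the function names are illustrative.

```python
import struct

# Five entries re-typed from the dump: key length, two key components,
# 8-byte position in Data.db, 4-byte size of any extra per-partition index.
INDEX_21 = bytes.fromhex(
    "000c" "000343563100" "000343563200" "0000000000000000" "00000000"
    "000c" "000346563100" "000346563200" "00000000000001e0" "00000000"
    "000c" "000347563100" "000347563200" "000000000000026d" "00000000"
    "000c" "000342563100" "000342563200" "00000000000002fa" "00000000"
    "000c" "000341563100" "000341563200" "0000000000000387" "00000000"
)


def read_composite(key):
    """Split a composite key into components (u16 length, bytes, 1 end byte)."""
    parts, pos = [], 0
    while pos < len(key):
        (size,) = struct.unpack_from(">H", key, pos)
        parts.append(key[pos + 2:pos + 2 + size].decode())
        pos += 2 + size + 1
    return parts


def read_index_21(buf):
    """Return (partition key components, Data.db offset) for every entry."""
    entries, pos = [], 0
    while pos < len(buf):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        key = buf[pos + 2:pos + 2 + key_len]
        data_pos, extra = struct.unpack_from(">qi", buf, pos + 2 + key_len)
        entries.append((read_composite(key), data_pos))
        pos += 2 + key_len + 8 + 4 + extra
    return entries


if __name__ == "__main__":
    for key, offset in read_index_21(INDEX_21):
        print(key, hex(offset))
```

The decoded offsets (0x0, 0x1e0, 0x26d, 0x2fa, 0x387) line up with the places in the 2.1 Data file dump where each partition key begins.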
  • 28. Cassandra 3.0 Index
    hexdump -c mb-1-big-Index.db
    0000000 0 f 0 003 C V 1 0 0 003 C V 2 0 0 0
    0000010 0 f 0 003 F V 1 0 0 003 F V 2 0 200 312
    0000020 0 0 f 0 003 G V 1 0 0 003 G V 2 0 201
    0000030 017 0 0 f 0 003 B V 1 0 0 003 B V 2 0
    0000040 201 U 0 0 f 0 003 A V 1 0 0 003 A V 2
    0000050 0 201 233 0
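The 3.0 Index dump above is shorter than the 2.1 one because the position is no longer a fixed 8 bytes. The sketch below decodes it under the assumption that the position and the trailing size field are variable-length integers in which the number of leading 1 bits of the first byte gives the number of extra bytes. The bytes are re-typed from the dump (reading `0 f` as 0x00 0x0c), and the helper names are invented for the example.

```python
import struct

INDEX_30 = bytes.fromhex(
    "000c" "000343563100" "000343563200" "00" "00"
    "000c" "000346563100" "000346563200" "80ca" "00"
    "000c" "000347563100" "000347563200" "810f" "00"
    "000c" "000342563100" "000342563200" "8155" "00"
    "000c" "000341563100" "000341563200" "819b" "00"
)


def read_unsigned_vint(buf, pos):
    """Decode one variable-length integer; return (value, next position)."""
    first = buf[pos]
    extra = 0
    while extra < 8 and first & (0x80 >> extra):
        extra += 1
    value = first & (0xFF >> extra)
    for i in range(extra):
        value = (value << 8) | buf[pos + 1 + i]
    return value, pos + 1 + extra


def read_index_30(buf):
    """Return (first key component, Data.db offset) for every entry."""
    entries, pos = [], 0
    while pos < len(buf):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        key = buf[pos + 2:pos + 2 + key_len]
        (comp_len,) = struct.unpack_from(">H", key, 0)
        first_component = key[2:2 + comp_len].decode()
        data_pos, pos = read_unsigned_vint(buf, pos + 2 + key_len)
        extra, pos = read_unsigned_vint(buf, pos)
        pos += extra
        entries.append((first_component, data_pos))
    return entries


if __name__ == "__main__":
    for name, offset in read_index_30(INDEX_30):
        print(name, hex(offset))
```

Under these assumptions each entry shrinks from 26 bytes to 16 or 17, which accounts for the drop from 130 to 84 bytes between the two Index files.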
  • 29. Cassandra 2.1 Summary
    0000000 0 0 0 200 0 0 0 001 0 0 0 0 0 0 0 030
    0000010 0 0 0 200 0 0 0 001 004 0 0 0 0 003 C V
    0000020 1 0 0 003 C V 2 0 0 0 0 0 0 0 0 0
    0000030 0 0 0 f 0 003 C V 1 0 0 003 C V 2 0
    0000040 0 0 0 f 0 003 A V 1 0 0 003 A V 2 0
    0000050 0 004 m m a p 0 0 0 001 0 0 0 0 0 0
    0000060 0 0 0 004 m m a p 0 0 0 001 0 0 0 0
    0000070 0 0 0 0
  • 30. Cassandra 3.0 Summary
    hexdump -c mb-1-big-Summary.db
    0000000 0 0 0 200 0 0 0 001 0 0 0 0 0 0 0 030
    0000010 0 0 0 200 0 0 0 001 004 0 0 0 0 003 C V
    0000020 1 0 0 003 C V 2 0 0 0 0 0 0 0 0 0
    0000030 0 0 0 f 0 003 C V 1 0 0 003 C V 2 0
    0000040 0 0 0 f 0 003 A V 1 0 0 003 A V 2 0
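The 80 bytes of the 3.0 Summary dump above can be split into a header, one sampled entry, and the first and last keys of the SSTable. The sketch below is an interpretation rather than a specification: the field labels (`min_index_interval`, `entry_count`, `entries_size`, `sampling_level`, `size_at_full_sampling`) and the little-endian reading of the offset table and position are assumptions that happen to fit the dump.

```python
import struct

SUMMARY_30 = bytes.fromhex(
    "00000080" "00000001" "0000000000000018" "00000080" "00000001"  # header
    "04000000"                                                      # offsets
    "000343563100" "000343563200" "0000000000000000"                # entry
    "0000000c" "000343563100" "000343563200"                        # first key
    "0000000c" "000341563100" "000341563200"                        # last key
)

HEADER = struct.Struct(">iiqii")


def read_composite(key):
    """Split a composite key into components (u16 length, bytes, 1 end byte)."""
    parts, pos = [], 0
    while pos < len(key):
        (size,) = struct.unpack_from(">H", key, pos)
        parts.append(key[pos + 2:pos + 2 + size].decode())
        pos += 2 + size + 1
    return parts


def read_summary(buf):
    (min_index_interval, entry_count, entries_size,
     sampling_level, size_at_full_sampling) = HEADER.unpack_from(buf, 0)
    base = HEADER.size
    # Offsets are relative to `base`; assumed little-endian here.
    offsets = struct.unpack_from("<%di" % entry_count, buf, base)
    bounds = list(offsets[1:]) + [entries_size]
    entries = []
    for start, end in zip(offsets, bounds):
        key = buf[base + start:base + end - 8]
        (index_pos,) = struct.unpack_from("<q", buf, base + end - 8)
        entries.append((read_composite(key), index_pos))
    pos = base + entries_size
    keys = []
    for _ in range(2):  # first key, then last key, each with a u32 length
        (key_len,) = struct.unpack_from(">I", buf, pos)
        keys.append(read_composite(buf[pos + 4:pos + 4 + key_len]))
        pos += 4 + key_len
    return {
        "min_index_interval": min_index_interval,
        "entry_count": entry_count,
        "entries_size": entries_size,
        "sampling_level": sampling_level,
        "size_at_full_sampling": size_at_full_sampling,
        "entries": entries,
        "first_key": keys[0],
        "last_key": keys[1],
    }


if __name__ == "__main__":
    for field, value in read_summary(SUMMARY_30).items():
        print(field, "=", value)
```

With only five partitions and a sampling interval of 128, a single entry is sampled, which is why the Summary holds just the CV1/CV2 key plus the first and last keys.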
  • 31. Cassandra 3.0 Statistics Part Beginning
    hexdump -c mb-1-big-Statistics.db
    0000000 0 0 0 004 0 0 0 0 0 0 0 $ 0 0 0 001
    0000010 0 0 0 Y 0 0 0 002 0 0 0 y 0 0 0 003
    0000020 0 0 021 250 0 + o r g . a p a c h e
    0000030 . c a s s a n d r a . d h t . M
    0000040 u r m u r 3 P a r t i t i o n e
    0000050 r ? 204 z 341 G 256 024 { 0 0 0 034 377 377 377
    0000060 376 r 031 001 005 256 346 275 001 302 263 326 005 334 361 364
    0000070 t 354 241 224 n 324 234 337 001 0 0 0 227 0 0 0
    0000080 0 0 0 0 001 0 0 0 0 0 0 0 0 0 0 0
  • 32. Cassandra 3.0 Statistics Tail end
    00001230 70 65 29 01 28 6f 72 67 2e 61 70 61 63 68 65 2e |pe).(org.apache.|
    00001240 63 61 73 73 61 6e 64 72 61 2e 64 62 2e 6d 61 72 |cassandra.db.mar|
    00001250 73 68 61 6c 2e 55 54 46 38 54 79 70 65 00 02 06 |shal.UTF8Type...|
    00001260 66 69 65 6c 64 34 28 6f 72 67 2e 61 70 61 63 68 |field4(org.apach|
    00001270 65 2e 63 61 73 73 61 6e 64 72 61 2e 64 62 2e 6d |e.cassandra.db.m|
    00001280 61 72 73 68 61 6c 2e 55 54 46 38 54 79 70 65 06 |arshal.UTF8Type.|
    00001290 66 69 65 6c 64 35 28 6f 72 67 2e 61 70 61 63 68 |field5(org.apach|
    000012a0 65 2e 63 61 73 73 61 6e 64 72 61 2e 64 62 2e 6d |e.cassandra.db.m|
    000012b0 61 72 73 68 61 6c 2e 55 54 46 38 54 79 70 65 |arshal.UTF8Type|
  • 33. Cassandra 2.1 Data file, First Part
    0000000 0 f 0 003 C V 1 0 0 003 C V 2 0 177 377
    0000010 377 377 200 0 0 0 0 0 0 0 0 t 0 003 C V
    0000020 3 0 0 0 0 0 0 005 : p w w [ 341 0 0
    0000030 0 0 0 017 0 003 C V 3 0 0 006 f i e l
    0000040 d 4 0 0 0 005 : p w w [ 341 0 0 0 027
    0000050 2 0 1 6 - 0 8 - 1 1 1 1 : 3 7
    0000060 : 2 5 . 9 7 6 0 017 0 003 C V 3 0 0
    0000070 006 f i e l d 5 0 0 0 005 : p w w [
    0000080 341 0 0 0 006 1 2 7 . 4 0 0 t 0 003 C
    0000090 V 4 0 0 0 0 0 0 005 : p w w n / 0
    00000a0 0 0 0 0 017 0 003 C V 4 0 0 006 f i e
    00000b0 l d 4 0 0 0 005 : p w w n / 0 0 0
    00000c0 027 2 0 1 6 - 0 8 - 1 1 1 1 : 3
    00000d0 7 : 2 5 . 9 7 6 0 017 0 003 C V 4 0
    00000e0 0 006 f i e l d 5 0 0 0 005 : p w w
    00000f0 n / 0 0 0 006 1 2 7 . 5 0 0 t 0 003
    0000100 C V 5 0 0 0 0 0 0 005 : p w w 200 261
    0000110 0 0 0 0 0 017 0 003 C V 5 0 0 006 f i
    0000120 e l d 4 0 0 0 005 : p w w 200 261 0 0
    0000130 0 027 2 0 1 6 - 0 8 - 1 1 1 1 :
  • 34. Cassandra 2.1 Data file, Second Part
    0000140 3 7 : 2 5 . 9 7 6 0 017 0 003 C V 5
    0000150 0 0 006 f i e l d 5 0 0 0 005 : p w
    0000160 w 200 261 0 0 0 006 1 2 7 . 6 0 0 t 0
    0000170 003 C V 6 0 0 0 0 0 0 005 : p w w 213
    0000180 a 0 0 0 0 0 017 0 003 C V 6 0 0 006 f
    0000190 i e l d 4 0 0 0 005 : p w w 213 a 0
    00001a0 0 0 027 2 0 1 6 - 0 8 - 1 1 1 1
    00001b0 : 3 7 : 2 5 . 9 7 6 0 017 0 003 C V
    00001c0 6 0 0 006 f i e l d 5 0 0 0 005 : p
    00001d0 w w 213 a 0 0 0 006 1 2 7 . 7 0 0 0
    00001e0 0 f 0 003 F V 1 0 0 003 F V 2 0 177 377
    00001f0 377 377 200 0 0 0 0 0 0 0 0 t 0 003 F V
    0000200 3 0 0 0 0 0 0 005 : p w v 271 d 0 0
    0000210 0 0 0 017 0 003 F V 3 0 0 006 f i e l
    0000220 d 4 0 0 0 005 : p w v 271 d 0 0 0 027
    0000230 2 0 1 6 - 0 8 - 1 1 1 1 : 3 7
    0000240 : 2 5 . 9 7 6 0 017 0 003 F V 3 0 0
    0000250 006 f i e l d 5 0 0 0 005 : p w v 271
    0000260 d 0 0 0 006 1 2 7 . 0 0 0 0 0 f 0
  • 35. Cassandra 2.1 Data file, Third Part
    0000270 003 G V 1 0 0 003 G V 2 0 177 377 377 377 200
    0000280 0 0 0 0 0 0 0 0 t 0 003 G V 3 0 0
    0000290 0 0 0 0 005 : p w w : 255 0 0 0 0 0
    00002a0 017 0 003 G V 3 0 0 006 f i e l d 4 0
    00002b0 0 0 005 : p w w : 255 0 0 0 027 2 0 1
    00002c0 6 - 0 8 - 1 1 1 1 : 3 7 : 2 5
    00002d0 . 9 7 6 0 017 0 003 G V 3 0 0 006 f i
    00002e0 e l d 5 0 0 0 005 : p w w : 255 0 0
    00002f0 0 006 1 2 7 . 1 0 0 0 0 f 0 003 B V
    0000300 1 0 0 003 B V 2 0 177 377 377 377 200 0 0 0
    0000310 0 0 0 0 0 t 0 003 B V 3 0 0 0 0 0
    0000320 0 005 : p w w Q 234 0 0 0 0 0 017 0 003
    0000330 B V 3 0 0 006 f i e l d 4 0 0 0 005
    0000340 : p w w Q 234 0 0 0 027 2 0 1 6 - 0
    0000350 8 - 1 1 1 1 : 3 7 : 2 5 . 9 7
    0000360 6 0 017 0 003 B V 3 0 0 006 f i e l d
    0000370 5 0 0 0 005 : p w w Q 234 0 0 0 006 1
    0000380 2 7 . 3 0 0 0 0 f 0 003 A V 1 0 0
    0000390 003 A V 2 0 177 377 377 377 200 0 0 0 0 0 0
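One cell from the 2.1 Data file dump above is enough to see why that format is bulky. The sketch decodes the `field4` cell of row CV3, which begins at offset 0x32 of the dump. The layout used (2-byte name length, composite name, 1 flag byte, 8-byte timestamp in microseconds, 4-byte value length, value) is inferred from the dump, and the bytes are re-typed from it, so treat both as assumptions. The 23-byte text value, together with the `UTF8Type` entries in the Statistics tail, suggests the dumped table stored field4 as text rather than as the `timestamp` shown in the CREATE TABLE slide.

```python
import struct
from datetime import datetime, timedelta, timezone

CELL = bytes.fromhex(
    "000f"                   # cell name length: 15 bytes
    "0003435633" "00"        # clustering value "CV3" + end-of-component byte
    "00066669656c6434" "00"  # column name "field4" + end-of-component byte
    "00"                     # flags
    "00053a7077775be1"       # write timestamp, microseconds since the epoch
    "00000017"               # value length: 23 bytes
) + b"2016-08-11 11:37:25.976"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_cell(buf):
    """Decode one 2.1-style cell laid out as described above."""
    (name_len,) = struct.unpack_from(">H", buf, 0)
    name, pos, end = [], 2, 2 + name_len
    while pos < end:
        (size,) = struct.unpack_from(">H", buf, pos)
        name.append(buf[pos + 2:pos + 2 + size].decode())
        pos += 2 + size + 1
    flags, micros, value_len = struct.unpack_from(">BqI", buf, end)
    value_start = end + 13
    return {
        "name": name,
        "flags": flags,
        "written": EPOCH + timedelta(microseconds=micros),
        "value": buf[value_start:value_start + value_len].decode(),
        "size": value_start + value_len,
    }


if __name__ == "__main__":
    cell = decode_cell(CELL)
    print(cell["name"], cell["written"].isoformat(), cell["value"])
    print("bytes on disk:", cell["size"], "of which value:", len(cell["value"]))
```

Every cell repeats the clustering value, the column name and a full 8-byte timestamp, so more than half of this cell is overhead rather than data.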
  • 36. Cassandra 3.0 Data file, Part A
    0000000 0 f 0 003 C V 1 0 0 003 C V 2 0 177 377
    0000010 377 377 200 0 0 0 0 0 0 0 $ 0 003 C V 3
    0000020 $ 032 270 221 b 027 2 0 1 6 - 0 8 - 1 1
    0000030 1 1 : 3 7 : 2 5 . 9 7 6 b 006 1
    0000040 2 7 . 4 0 $ 0 003 C V 4 % + 300 D a
    0000050 b 027 2 0 1 6 - 0 8 - 1 1 1 1 :
    0000060 3 7 : 2 5 . 9 7 6 b 006 1 2 7 . 5
    0000070 0 $ 0 003 C V 5 % , 300 N 234 b 027 2 0
    0000080 1 6 - 0 8 - 1 1 1 1 : 3 7 : 2
    0000090 5 . 9 7 6 b 006 1 2 7 . 6 0 $ 0 003
    00000a0 C V 6 % , 300 [ ` b 027 2 0 1 6 - 0
    00000b0 8 - 1 1 1 1 : 3 7 : 2 5 . 9 7
    00000c0 6 b 006 1 2 7 . 7 0 001 0 f 0 003 F V
    00000d0 1 0 0 003 F V 2 0 177 377 377 377 200 0 0 0
    00000e0 0 0 0 0 $ 0 003 F V 3 # 032 0 b 027 2
    00000f0 0 1 6 - 0 8 - 1 1 1 1 : 3 7 :
    0000100 2 5 . 9 7 6 b 006 1 2 7 . 0 0 001 0
    0000110 f 0 003 G V 1 0 0 003 G V 2 0 177 377 377
  • 37. Cassandra 3.0 Data file, Part B
    0000120 377 200 0 0 0 0 0 0 0 $ 0 003 G V 3 $
    0000130 032 221 346 b 027 2 0 1 6 - 0 8 - 1 1
    0000140 1 1 : 3 7 : 2 5 . 9 7 6 b 006 1 2
    0000150 7 . 1 0 001 0 f 0 003 B V 1 0 0 003 B
    0000160 V 2 0 177 377 377 377 200 0 0 0 0 0 0 0 $
    0000170 0 003 B V 3 $ 032 255 036 b 027 2 0 1 6 -
    0000180 0 8 - 1 1 1 1 : 3 7 : 2 5 . 9
    0000190 7 6 b 006 1 2 7 . 3 0 001 0 f 0 003 A
    00001a0 V 1 0 0 003 A V 2 0 177 377 377 377 200 0 0
    00001b0 0 0 0 0 0 $ 0 003 A V 3 $ 032 240 > b
    00001c0 027 2 0 1 6 - 0 8 - 1 1 1 1 : 3
    00001d0 7 : 2 5 . 9 7 6 b 006 1 2 7 . 2 0
    00001e0 001
  • 38. Bloom Filter contents 2.1 and 3.0
    hexdump stuff-simplefields-ka-1-Filter.db
    0000000 0000 0500 0000 0200 0032 2090 3002 0009
    0000010 1803 0110 0022 40c0
    hexdump mb-1-big-Filter.db
    0000000 0000 0500 0000 0200 1808 5068 8822 0019
    0000010 1012 1100 0002 4180
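Both Filter.db dumps above share the same 8-byte prefix and differ only in the 16 bytes that follow. A hedged reading, sketched below, is that the prefix holds a hash count and a count of 64-bit words, followed by the filter bits. It assumes the dumps were taken on a little-endian machine (so plain `hexdump` shows byte-swapped 16-bit words); the field names are labels chosen for this example.

```python
import struct

FILTER_21 = "0000 0500 0000 0200 0032 2090 3002 0009 1803 0110 0022 40c0"
FILTER_30 = "0000 0500 0000 0200 1808 5068 8822 0019 1012 1100 0002 4180"


def hexdump_words_to_bytes(words):
    """Undo plain hexdump's display: "0500" stands for the byte pair 00 05."""
    out = bytearray()
    for word in words.split():
        out += struct.pack("<H", int(word, 16))
    return bytes(out)


def describe_filter(words):
    """Read the assumed header and count how many filter bits are set."""
    raw = hexdump_words_to_bytes(words)
    hash_count, word_count = struct.unpack_from(">ii", raw, 0)
    bits = raw[8:]
    return {
        "hash_count": hash_count,
        "word_count": word_count,
        "bit_capacity": len(bits) * 8,
        "set_bits": sum(bin(byte).count("1") for byte in bits),
    }


if __name__ == "__main__":
    print("2.1:", describe_filter(FILTER_21))
    print("3.0:", describe_filter(FILTER_30))
```

Five partition keys hashed five times each can set at most 25 bits, and the two files have 22 and 24 bits set, which is consistent with this reading. The filters hold different bit patterns for the same keys, so the bits were evidently not computed identically in the two versions.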
  • 40. Conclusions
    • The Cassandra 3.0 SSTable format is considerably more compact
    • In Cassandra 3.0, field names are no longer stored in the data file; field IDs are used instead
    • TTLs and timestamps are grouped
    • TTL and timestamp differences are stored as deltas
    • Sorted String Tables are sorted... by Partition Key token value
    • The Statistics file is full of a bunch of different stuff
    • Data, Summary, Filter and Statistics have important useful stuff
  • 42. QUESTIONS?
  • 43. In version "ka", the contents of the Statistics component file are split into three types of metadata: validation, compaction and stats. Validation metadata is used to validate the SSTable and includes the partitioner and bloom filter fp chance fields. Compaction metadata includes the ancestors information, which is also available in older formats, and a new field called the cardinality estimator. The cardinality estimator is used to pre-allocate bloom filter space efficiently in a merged compaction file by estimating how much the input SSTables overlap. Stats metadata contains the rest of the information available in old formats plus two additional fields: a flag that tracks the presence of local/remote counter shards, and the repair time.