SlideShare ist ein Scribd-Unternehmen logo
1 von 62
Downloaden Sie, um offline zu lesen
CHRONIX SPARK
TIME SERIES PROCESSING WITH SPARK
Dr. Josef Adersberger ( @adersberger)
TIME SERIES 101
TIME SERIES 101
WE`RE SURROUNDED BY TIME SERIES
▸ Operational data: Monitoring data, performance metrics, log events, …
▸ Data Warehouse: Dimension time
▸ Measured Me: Activity tracking, ECG, …
▸ Sensor telemetry: Sensor data, …
▸ Financial data: Stock charts, …
▸ Climate data: Temperature, …
▸ Web tracking: Clickstreams, …
TIME SERIES 101
TIME SERIES: BASIC TERMS
univariate time series multivariate time series multi-dimensional time
series (time series tensor)
time series setobservation
TIME SERIES 101
OPERATIONS ON TIME SERIES (EXAMPLES)
align
Time series Time series
Time series Scalar
diff downsampling outlier
min/max avg/med slope std-dev
OUR USE CASE
Monitoring Data Analysis 

of a business-critical,

worldwide distributed 

software system. Enable

root cause analysis and

anomaly detection.

> 1,000 nodes worldwide
> 10 processes per node
> 20 metrics per process

(OS, JVM, App-spec.)
Measured every second.
= about 6.3 trillions observations p.a.

Data retention: 5 yrs.
http://www.datasciencecentral.com
THE CHRONIX STACK
THE CHRONIX STACK
Core
Chronix Storage
Chronix Server
Chronix SparkChronixFormat
GrafanaChronix Analytics
Collection
Visualization
Chronix CollectorLogstash fluentd
jmx
collectd
ssh
Zeppelin
THE CHRONIX STACK
node
Distributed Data &

Data Retrieval
Distributed Processing
Result Processing data flow
icon credits to Nimal Raj (database), Arthur Shlain (console) and alvarobueno (takslist)
}
}
ChronixSparkChronixServer
USE CASE
CHRONIX ANALYTICS: EXPLORING MULTI-DIMENSIONAL TIME SERIES
USE CASE
CHRONIX ANALYTICS: ANOMALY DETECTION
Featuring Twitter Anomaly Detection (https://github.com/twitter/AnomalyDetection

and Yahoo EGDAS https://github.com/yahoo/egads
USE CASE
ZEPPELIN ON CHRONIX
https://github.com/ChronixDB/chronix.spark
EASY-TO-USE BIG TIME
SERIES DATA STORAGE &
PROCESSING ON SPARK
MISSION
MISSION
(as well as for data scientists)
CHRONIX SPARK
TIME SERIES MODEL
Set of univariate multi-dimensional numeric time series
▸ set … because it’s more flexible and better to parallelise if operations can
input and output multiple time series.
▸ univariate … because multivariate will introduce too much complexity (and
we have our set to bundle multiple time series).
▸ multi-dimensional … because the ability to slice & dice in the set of time
series is very convenient for a lot of use cases.
▸ numeric … because it’s the most common use case.
A single time series is identified by a combination of its non-temporal
dimensional values (e.g. unit “mem usage” + host “aws42” + process
“tomcat”)
CHRONIX SPARK
CHRONIX SPARK


ChronixRDD
ChronixSparkContext
‣ Represents a set of time series
‣ Distributed operations on sets of time series
‣ Creates ChronixRDDs
‣ Speaks with the Chronix Server (Solr)
CHRONIX SPARK
ChronixRDD
transform to a Dataset
extends
transform to a
DataFrame (SQL!)
the set characteristic: 

a JavaRDD of MetricTimeSeries
CHRONIX SPARK
SPARK APIS FOR DATA PROCESSING
RDD DataFrame Dataset
typed yes no yes
optimized medium highly highly
mature yes yes no
SQL no yes no
CHRONIX SPARK
THE MetricTimeSeries DATA TYPE
access all timestamps
access all observations as
stream
the multi-dimensionality:

get/set dimensions

(attributes)
access all numeric values

(univariate)
CHRONIX SPARK
THE OVERALL DATA MODEL
ChronixRDD
MetricTimeSeries
MetricObservation Dataset<MetricObservation>
Dataset<MetricTimeSeries>
DataFrame
toDataFrame()
toDataset()
toObservationsDataset()
CHRONIX SPARK
ChronixSparkContext
RDD on all time series matched by a SolrQuery:
/**

* @param query Solr query

* @param zkHost Zookeeper host

* @param collection the Solr collection of chronix time series data

* @param chronixStorage a ChronixSolrCloudStorage instance

* @return ChronixRDD of time series

*/

public ChronixRDD query(

final SolrQuery query,

final String zkHost,

final String collection,

final ChronixSolrCloudStorage chronixStorage) {
CHRONIX SPARK
SAMPLE CODE
//Create Chronix Spark context from a SparkContext / JavaSparkContext

ChronixSparkContext csc = new ChronixSparkContext(sc);



//Read data into ChronixRDD

SolrQuery query = new SolrQuery(

"metric:"java.lang:type=Memory/HeapMemoryUsage/used"");



ChronixRDD rdd = csc.query(query,

"localhost:9983", //ZooKeeper host

"chronix", //Solr collection for Chronix

new ChronixSolrCloudStorage());



//Calculate the overall min/max/mean of all time series in the RDD

double min = rdd.min();

double max = rdd.max();

double mean = rdd.mean();
DEMO TIME
‣ 8,707 time series with 76,983,735 observations
‣ one MacBook with 4 cores
https://github.com/ChronixDB/chronix.spark/tree/master/chronix-infrastructure-local
A TRIP TO



CHRONIX SPARK



WONDERLAND
CHRONIX SPARK WONDERLAND
‣ Data sharding
‣ Fast index-based queries and
aggregations
‣ Efficient storage format
‣ Heavy lifting distributed
processing
‣ Catalyst processing optimizer
‣ Post-processing on a smaller
set of time series (e.g. complex
analysis algorithms)
CHRONIX SPARK WONDERLAND
}
}
ChronixSparkChronixServer
… with a few custom extensions.
▸ Index machine.
▸ Powerful query language based on Lucene. Powerful aggregation
features (facets). E.g. groups way better than Spark.
CHRONIX SPARK WONDERLAND
ARCHITECTURE
Shard2
Solr Server
Zookeeper
Solr ServerSolr Server
Shard1
Zookeeper Zookeeper Zookeeper Cluster
Solr Cloud
Leader
Scale Out
Shard3
Replica8 Replica9
Shard5Shard4 Shard6 Shard8Shard7 Shard9
Replica2 Replica3 Replica5
Shards
Replicas
Collection
Replica4 Replica7 Replica1 Shard6
CHRONIX SPARK WONDERLAND
STORAGE FORMAT
TIME SERIES
‣ start: TimeStamp
‣ end: TimeStamp
‣ unit: String
‣ dimensions: Map<String, String>
‣ values: byte[]
TIME SERIES
‣ start: TimeStamp
‣ end: TimeStamp
‣ unit: String
‣ dimensions: Map<String, String>
‣ values: byte[]
TIME SERIES
‣ start: TimeStamp
‣ end: TimeStamp
‣ unit: String
‣ dimensions: Map<String, String>
‣ values: byte[]
▸ Chunking:

1 logical time series = n physical time
series all with the same identity
containing a fixed amount of
observations. 1 chunk = 1 solr document.
▸ Binary encoding of all

timestamp/value pairs. Delta-encoded
and bitwise compressed.
Logical
Physical
CHRONIX SPARK WONDERLAND
CHRONIX FORMAT: OPTIMAL CHUNK SIZE AND COMPRESSION CODEC
GZIP +
128
kBytes
Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger

Chronix: Efficient Storage and Query of Operational Time Series
International Conference on Software Maintenance and Evolution 2016 (submitted)
CHRONIX SPARK WONDERLAND
BENCHMARK: STORAGE DEMAND
Florian Lautenschlager,Michael Philippsen,Andreas Kumlehn,JosefAdersberger
Chronix:Efficient Storage and Query of Operational Time Series
International Conference on Software Maintenance and Evolution 2016 (submitted)
CHRONIX SPARK WONDERLAND
BENCHMARK: PERFORMANCE
Florian Lautenschlager,Michael Philippsen,Andreas Kumlehn,JosefAdersberger
Chronix:Efficient Storage and Query of Operational Time Series
International Conference on Software Maintenance and Evolution 2016 (submitted)
DISCLAIMER: BENCHMARK ONLY PERFORMED ON ONE NODE ONLY
CHRONIX SPARK WONDERLAND
}
}
ChronixSparkChronixServer
CHRONIX SPARK WONDERLAND
SolrDocument
Solr Shard
SolrDocument SolrDocument SolrDocument
Solr Shard
SolrDocument
TimeSeries TimeSeries TimeSeries TimeSeries TimeSeries
Partition Partition
ChronixRDD
Binary protocol
1 SolrDocument = 1 Chunk
1 Spark Partition = 1 Solr Shard
CHRONIX SPARK WONDERLAND
ChronixRDD CREATION: GET THE CHUNKS
public ChronixRDD queryChronixChunks(

final SolrQuery query,

final String zkHost,

final String collection,

final ChronixSolrCloudStorage<MetricTimeSeries> chronixStorage)
throws SolrServerException, IOException {



// first get a list of replicas to query for this collection

List<String> shards = chronixStorage.getShardList(zkHost, collection);



// parallelize the requests to the shards

JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(

(FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(

new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);

return new ChronixRDD(docs);

}
Figure out all
Solr shards
Query each shard in parallel and convert
SolrDocuments to MetricTimeSeries
CHRONIX SPARK WONDERLAND
ChronixRDD CREATION: JOIN THEM TOGETHER TO A LOGICAL TIME SERIES
public ChronixRDD joinChunks() {

JavaPairRDD<MetricTimeSeriesKey, Iterable<MetricTimeSeries>> groupRdd

= this.groupBy(MetricTimeSeriesKey::new);



JavaPairRDD<MetricTimeSeriesKey, MetricTimeSeries> joinedRdd

= groupRdd.mapValues((Function<Iterable<MetricTimeSeries>, MetricTimeSeries>) mtsIt -> {

MetricTimeSeriesOrdering ordering = new MetricTimeSeriesOrdering();

List<MetricTimeSeries> orderedChunks = ordering.immutableSortedCopy(mtsIt);

MetricTimeSeries result = null;

for (MetricTimeSeries mts : orderedChunks) {

if (result == null) {

result = new MetricTimeSeries

.Builder(mts.getMetric())

.attributes(mts.attributes()).build();

}

result.addAll(mts.getTimestampsAsArray(), mts.getValuesAsArray());

}

return result;

});



JavaRDD<MetricTimeSeries> resultJavaRdd =

joinedRdd.map((Tuple2<MetricTimeSeriesKey, MetricTimeSeries> mtTuple) -> mtTuple._2);



return new ChronixRDD(resultJavaRdd); }
group chunks
according
identity
join chunks to

logical time 

series
PERFORMANCE
PERFORMANCE
THE SECRET OF DISTRIBUTED PERFORMANCE
Rule 1: Be as close to the data as possible!

(CPU cache > memory > local disk > network)
Horizontal processing 

(distribution / parallelization)
Verticalprocessing

(divide&conquer)
Rule 2: Reduce data volume as early as possible! 

(as long as you don’t sacrifice parallelization)
Rule 3: Parallelize as much as possible! 

(max = #cores)
PERFORMANCE
THE RULES APPLIED
‣ Rule 1: Be as close to the data as possible!
1. Solr caching

2. Spark in-memory processing with activated RDD compression

3. Binary protocol between Solr and Spark

‣ Rule 2: Reduce data volume as early as possible!
‣ Efficient storage format (Chronix Format)

‣ Predicate pushdown to Solr (query)

‣ Group-by & aggregation pushdown to Solr (faceting within a query)

‣ Rule 3: Parallelize as much as possible!
‣ Scale-out on data-level with SolrCloud

‣ Scale-out on processing-level with Spark
codingvoding.tumblr.com
RULE 4: PREMATURE
OPTIMIZATION IS NOT EVIL 

IF YOU HANDLE BIG DATA
Josef Adersberger
PERFORMANCE
USING A JAVA PROFILER WITH A LOCAL CLUSTER
PERFORMANCE
HIGH-PERFORMANCE, LOW-OVERHEAD COLLECTIONS
PERFORMANCE
830 MB -> 360 MB

(- 57%)
unveiled wrong Jackson 

handling inside of SolrClient
PERFORMANCE
PROFILING ChronixRDD WITH PLAIN VANILLA SPARK
Watch out 

for branches!
Watch out 

for shuffling!
ROADMAP
ROADMAP
THINGS TO COME
see https://github.com/ChronixDB/chronix.spark/issues
v0.4

(06/16)
v0.5

(08/16)
v0.6

(10/16)
v1.0

(12/16)
More actions and
transformations
Bulk transfer Solr
request handler
Streaming access R wrapper
Reduce memory
overhead
Data locality (co-
location)
SparkML support
Custom Dataset
encoder
SolrRDD adapter
Incorporate alien
technology
Johannes
Josef
Lukas
Claudio
Johannes
Flaute
Cloud
THE CONTRIBUTORS
YOU!
TWITTER.COM/QAWARE - SLIDESHARE.NET/QAWARE
Thank you!
Questions?
josef.adersberger@qaware.de
@adersberger
https://github.com/ChronixDB/chronix.spark
BONUS SLIDES
THE COMPETITORS
THE COMPETITORS / ALTERNATIVES
THE COMPETITORS / ALTERNATIVES
▸ Small Time Series Data
▸ Matlab (Econometrics toolbox)
▸ Python (Pandas)
▸ R (zoo, xts)
▸ SAS (ETS)
▸ …
▸ Big Time Series Data
▸ influxDB
▸ Graphite
▸ OpenTSDB
▸ KairosDB
▸ Prometheus
▸ …
THE COMPETITORS / ALTERNATIVES
BIG DATA LANDSCAPE
https://github.com/qaware/big-data-landscape
THE COMPETITORS / ALTERNATIVES
CHRONIX RDD VS. SPARK-TS
▸ Spark-TS provides no specific time series storage it uses the Spark persistence
mechanisms instead. This leads to a less efficient storage usage and less possibilities
to perform performance optimizations via predicate pushdown.
▸ In contrast to Spark-TS Chronix does not align all time series values on one vector of
timestamps. This leads to greater flexibility in time series aggregation
▸ Chronix provides multi-dimensional time series as this is very useful for data
warehousing and APM.
▸ Chronix has support for Datasets as this will be an important Spark API in the near
future. But Chronix currently doesn’t support an IndexedRowMatrix for SparkML.
▸ Chronix is purely written in Java. There is no explicit support for Python and Scala yet.
▸ Chronix doesn not support a ZonedTime as this makes it way more complicated.
APACHE SPARK 101
CHRONIX SPARK WONDERLAND
ARCHITECTURE
APACHE SPARK
SPARK TERMINOLOGY (1/2)
▸ RDD: Has transformations and actions. Hides data partitioning &
distributed computation. References a set of partitions (“output
partitions”) - materialized or not - and has dependencies to
another RDD (“input partitions”). RDD operations are evaluated as
late as possible (when an action is called). As long as not being the
root RDD the partitions of an RDD are in memory but they can be
persisted by request.
▸ Partitions: (Logical) chunks of data. Default unit and level of
parallelism - inside of a partition everything is a sequential
operation on records. Has to fit into memory. Can have different
representations (in-memory, on disk, off heap, …)
APACHE SPARK
SPARK TERMINOLOGY (2/2)
▸ Job: A computation job which is launched when an action is called on a
RDD.
▸ Task: The atomic unit of work (function). Bound to exactly one partition.
▸ Stage: Set of Task pipelines which can be executed in parallel on one
executor.
▸ Shuffling: If partitions need to be transferred between executors. Shuffle
write = outbound partition transfer. Shuffle read = inbound partition
transfer.
▸ DAG Scheduler: Computes DAG of stages from RDD DAG. Determines
the preferred location for each task.

Weitere ähnliche Inhalte

Andere mochten auch

Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Hadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeHadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeInside Analysis
 
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...Spark Summit
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016StampedeCon
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
 

Andere mochten auch (7)

Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Hadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality ChallengeHadoop 2.0 - Solving the Data Quality Challenge
Hadoop 2.0 - Solving the Data Quality Challenge
 
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
New Directions in pySpark for Time Series Analysis: Spark Summit East talk by...
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 

Ähnlich wie Time Series Processing with Chronix Spark

Time Series Analysis
Time Series AnalysisTime Series Analysis
Time Series AnalysisQAware GmbH
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and SparkJosef Adersberger
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Lucidworks
 
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...NETWAYS
 
A Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrA Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrQAware GmbH
 
Chronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrChronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrFlorian Lautenschlager
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusQAware GmbH
 
Chronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockChronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockQAware GmbH
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupKarthik Ramasamy
 
Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Mathias Herberts
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesOleksii Diagiliev
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachAlexandre Rafalovitch
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Lucidworks
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DA...
AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DA...AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DA...
AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DA...Amazon Web Services
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...DataStax Academy
 
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf Conference
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Florian Lautenschlager
 

Ähnlich wie Time Series Processing with Chronix Spark (20)

Time Series Analysis
Time Series AnalysisTime Series Analysis
Time Series Analysis
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and Spark
 
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
Time Series Processing with Solr and Spark: Presented by Josef Adersberger, Q...
 
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
 
A Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrA Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache Solr
 
Chronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrChronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache Solr
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for Prometheus
 
The new time series kid on the block
The new time series kid on the blockThe new time series kid on the block
The new time series kid on the block
 
Chronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockChronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the Block
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - Meetup
 
Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DA...
AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DA...AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DA...
AWS re:Invent 2016: Cross-Region Replication with Amazon DynamoDB Streams (DA...
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
 
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017
 

Mehr von QAware GmbH

Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...QAware GmbH
 
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzFully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzQAware GmbH
 
Down the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureDown the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureQAware GmbH
 
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!QAware GmbH
 
Make Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringMake Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringQAware GmbH
 
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightDer Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightQAware GmbH
 
Was kommt nach den SPAs
Was kommt nach den SPAsWas kommt nach den SPAs
Was kommt nach den SPAsQAware GmbH
 
Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo QAware GmbH
 
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See... Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...QAware GmbH
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster QAware GmbH
 
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.QAware GmbH
 
Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!QAware GmbH
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s AutoscalingQAware GmbH
 
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPKontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPQAware GmbH
 
Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.QAware GmbH
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s AutoscalingQAware GmbH
 
Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.QAware GmbH
 
Per Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysPer Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysQAware GmbH
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster QAware GmbH
 
How to speed up Spring Integration Tests
How to speed up Spring Integration TestsHow to speed up Spring Integration Tests
How to speed up Spring Integration TestsQAware GmbH
 

Mehr von QAware GmbH (20)

Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
 
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzFully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
 
Down the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureDown the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile Architecture
 
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
 
Make Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringMake Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform Engineering
 
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightDer Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
 
Was kommt nach den SPAs
Was kommt nach den SPAsWas kommt nach den SPAs
Was kommt nach den SPAs
 
Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo
 
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See... Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
 
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
 
Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling
 
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPKontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
 
Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling
 
Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.
 
Per Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysPer Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API Gateways
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
 
How to speed up Spring Integration Tests
How to speed up Spring Integration TestsHow to speed up Spring Integration Tests
How to speed up Spring Integration Tests
 

Kürzlich hochgeladen

Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfnikeshsingh56
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformationAnnie Melnic
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etclalithasri22
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...ThinkInnovation
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...ThinkInnovation
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are successPratikSingh115843
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfNicoChristianSunaryo
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 

Kürzlich hochgeladen (16)

2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
Role of Consumer Insights in business transformation
Role of Consumer Insights in business transformationRole of Consumer Insights in business transformation
Role of Consumer Insights in business transformation
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
Predictive Analysis - Using Insight-informed Data to Plan Inventory in Next 6...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
Decision Making Under Uncertainty - Is It Better Off Joining a Partnership or...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Presentation of project of business person who are success
Presentation of project of business person who are successPresentation of project of business person who are success
Presentation of project of business person who are success
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Digital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdfDigital Indonesia Report 2024 by We Are Social .pdf
Digital Indonesia Report 2024 by We Are Social .pdf
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 

Time Series Processing with Chronix Spark

  • 1. CHRONIX SPARK TIME SERIES PROCESSING WITH SPARK Dr. Josef Adersberger ( @adersberger)
  • 3. TIME SERIES 101 WE`RE SURROUNDED BY TIME SERIES ▸ Operational data: Monitoring data, performance metrics, log events, … ▸ Data Warehouse: Dimension time ▸ Measured Me: Activity tracking, ECG, … ▸ Sensor telemetry: Sensor data, … ▸ Financial data: Stock charts, … ▸ Climate data: Temperature, … ▸ Web tracking: Clickstreams, …
  • 4. TIME SERIES 101 TIME SERIES: BASIC TERMS univariate time series multivariate time series multi-dimensional time series (time series tensor) time series setobservation
  • 5. TIME SERIES 101 OPERATIONS ON TIME SERIES (EXAMPLES) align Time series Time series Time series Scalar diff downsampling outlier min/max avg/med slope std-dev
  • 7. Monitoring Data Analysis 
 of a business-critical,
 worldwide distributed 
 software system. Enable
 root cause analysis and
 anomaly detection.
 > 1,000 nodes worldwide > 10 processes per node > 20 metrics per process
 (OS, JVM, App-spec.) Measured every second. = about 6.3 trillions observations p.a.
 Data retention: 5 yrs.
  • 8.
  • 10.
  • 11. THE CHRONIX STACK THE CHRONIX STACK Core Chronix Storage Chronix Server Chronix SparkChronixFormat GrafanaChronix Analytics Collection Visualization Chronix CollectorLogstash fluentd jmx collectd ssh Zeppelin
  • 12. THE CHRONIX STACK node Distributed Data &
 Data Retrieval Distributed Processing Result Processing data flow icon credits to Nimal Raj (database), Arthur Shlain (console) and alvarobueno (takslist) } } ChronixSparkChronixServer
  • 13. USE CASE CHRONIX ANALYTICS: EXPLORING MULTI-DIMENSIONAL TIME SERIES
  • 14. USE CASE CHRONIX ANALYTICS: ANOMALY DETECTION Featuring Twitter Anomaly Detection (https://github.com/twitter/AnomalyDetection
 and Yahoo EGDAS https://github.com/yahoo/egads
  • 17. EASY-TO-USE BIG TIME SERIES DATA STORAGE & PROCESSING ON SPARK MISSION
  • 18. MISSION (as well as for data scientists)
  • 19. CHRONIX SPARK TIME SERIES MODEL Set of univariate multi-dimensional numeric time series ▸ set … because it’s more flexible and better to parallelise if operations can input and output multiple time series. ▸ univariate … because multivariate will introduce too much complexity (and we have our set to bundle multiple time series). ▸ multi-dimensional … because the ability to slice & dice in the set of time series is very convenient for a lot of use cases. ▸ numeric … because it’s the most common use case. A single time series is identified by a combination of its non-temporal dimensional values (e.g. unit “mem usage” + host “aws42” + process “tomcat”)
  • 20. CHRONIX SPARK CHRONIX SPARK 
 ChronixRDD ChronixSparkContext ‣ Represents a set of time series ‣ Distributed operations on sets of time series ‣ Creates ChronixRDDs ‣ Speaks with the Chronix Server (Solr)
  • 21. CHRONIX SPARK ChronixRDD transform to a Dataset extends transform to a DataFrame (SQL!) the set characteristic: 
 a JavaRDD of MetricTimeSeries
  • 22. CHRONIX SPARK SPARK APIS FOR DATA PROCESSING RDD DataFrame Dataset typed yes no yes optimized medium highly highly mature yes yes no SQL no yes no
  • 23. CHRONIX SPARK THE MetricTimeSeries DATA TYPE access all timestamps access all observations as stream the multi-dimensionality:
 get/set dimensions
 (attributes) access all numeric values
 (univariate)
  • 24. CHRONIX SPARK THE OVERALL DATA MODEL ChronixRDD MetricTimeSeries MetricObservation Dataset<MetricObservation> Dataset<MetricTimeSeries> DataFrame toDataFrame() toDataset() toObservationsDataset()
  • 25. CHRONIX SPARK ChronixSparkContext RDD on all time series matched by a SolrQuery: /**
 * @param query Solr query
 * @param zkHost Zookeeper host
 * @param collection the Solr collection of chronix time series data
 * @param chronixStorage a ChronixSolrCloudStorage instance
 * @return ChronixRDD of time series
 */
 public ChronixRDD query(
 final SolrQuery query,
 final String zkHost,
 final String collection,
 final ChronixSolrCloudStorage chronixStorage) {
  • 26. CHRONIX SPARK SAMPLE CODE //Create Chronix Spark context from a SparkContext / JavaSparkContext
 ChronixSparkContext csc = new ChronixSparkContext(sc);
 
 //Read data into ChronixRDD
 SolrQuery query = new SolrQuery(
 "metric:"java.lang:type=Memory/HeapMemoryUsage/used"");
 
 ChronixRDD rdd = csc.query(query,
 "localhost:9983", //ZooKeeper host
 "chronix", //Solr collection for Chronix
 new ChronixSolrCloudStorage());
 
 //Calculate the overall min/max/mean of all time series in the RDD
 double min = rdd.min();
 double max = rdd.max();
 double mean = rdd.mean();
  • 27. DEMO TIME ‣ 8,707 time series with 76,983,735 observations ‣ one MacBook with 4 cores https://github.com/ChronixDB/chronix.spark/tree/master/chronix-infrastructure-local
  • 28. A TRIP TO
 
 CHRONIX SPARK
 
 WONDERLAND
  • 29. CHRONIX SPARK WONDERLAND ‣ Data sharding ‣ Fast index-based queries and aggregations ‣ Efficient storage format ‣ Heavy lifting distributed processing ‣ Catalyst processing optimizer ‣ Post-processing on a smaller set of time series (e.g. complex analysis algorithms)
  • 31. … with a few custom extensions. ▸ Index machine. ▸ Powerful query language based on Lucene. Powerful aggregation features (facets). E.g. groups way better than Spark.
  • 32. CHRONIX SPARK WONDERLAND ARCHITECTURE Shard2 Solr Server Zookeeper Solr ServerSolr Server Shard1 Zookeeper Zookeeper Zookeeper Cluster Solr Cloud Leader Scale Out Shard3 Replica8 Replica9 Shard5Shard4 Shard6 Shard8Shard7 Shard9 Replica2 Replica3 Replica5 Shards Replicas Collection Replica4 Replica7 Replica1 Shard6
  • 33. CHRONIX SPARK WONDERLAND STORAGE FORMAT TIME SERIES ‣ start: TimeStamp ‣ end: TimeStamp ‣ unit: String ‣ dimensions: Map<String, String> ‣ values: byte[] TIME SERIES ‣ start: TimeStamp ‣ end: TimeStamp ‣ unit: String ‣ dimensions: Map<String, String> ‣ values: byte[] TIME SERIES ‣ start: TimeStamp ‣ end: TimeStamp ‣ unit: String ‣ dimensions: Map<String, String> ‣ values: byte[] ▸ Chunking:
 1 logical time series = n physical time series all with the same identity containing a fixed amount of observations. 1 chunk = 1 solr document. ▸ Binary encoding of all
 timestamp/value pairs. Delta-encoded and bitwise compressed. Logical Physical
  • 34. CHRONIX SPARK WONDERLAND CHRONIX FORMAT: OPTIMAL CHUNK SIZE AND COMPRESSION CODEC GZIP + 128 kBytes Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger
 Chronix: Efficient Storage and Query of Operational Time Series International Conference on Software Maintenance and Evolution 2016 (submitted)
  • 35. CHRONIX SPARK WONDERLAND BENCHMARK: STORAGE DEMAND Florian Lautenschlager,Michael Philippsen,Andreas Kumlehn,JosefAdersberger Chronix:Efficient Storage and Query of Operational Time Series International Conference on Software Maintenance and Evolution 2016 (submitted)
  • 36. CHRONIX SPARK WONDERLAND BENCHMARK: PERFORMANCE Florian Lautenschlager,Michael Philippsen,Andreas Kumlehn,JosefAdersberger Chronix:Efficient Storage and Query of Operational Time Series International Conference on Software Maintenance and Evolution 2016 (submitted) DISCLAIMER: BENCHMARK ONLY PERFORMED ON ONE NODE ONLY
  • 38. CHRONIX SPARK WONDERLAND SolrDocument Solr Shard SolrDocument SolrDocument SolrDocument Solr Shard SolrDocument TimeSeries TimeSeries TimeSeries TimeSeries TimeSeries Partition Partition ChronixRDD Binary protocol 1 SolrDocument = 1 Chunk 1 Spark Partition = 1 Solr Shard
  • 39. CHRONIX SPARK WONDERLAND ChronixRDD CREATION: GET THE CHUNKS public ChronixRDD queryChronixChunks(
 final SolrQuery query,
 final String zkHost,
 final String collection,
 final ChronixSolrCloudStorage<MetricTimeSeries> chronixStorage) throws SolrServerException, IOException {
 
 // first get a list of replicas to query for this collection
 List<String> shards = chronixStorage.getShardList(zkHost, collection);
 
 // parallelize the requests to the shards
 JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(
 (FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(
 new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);
 return new ChronixRDD(docs);
 } Figure out all Solr shards Query each shard in parallel and convert SolrDocuments to MetricTimeSeries
  • 40. CHRONIX SPARK WONDERLAND ChronixRDD CREATION: JOIN THEM TOGETHER TO A LOGICAL TIME SERIES public ChronixRDD joinChunks() {
 JavaPairRDD<MetricTimeSeriesKey, Iterable<MetricTimeSeries>> groupRdd
 = this.groupBy(MetricTimeSeriesKey::new);
 
 JavaPairRDD<MetricTimeSeriesKey, MetricTimeSeries> joinedRdd
 = groupRdd.mapValues((Function<Iterable<MetricTimeSeries>, MetricTimeSeries>) mtsIt -> {
 MetricTimeSeriesOrdering ordering = new MetricTimeSeriesOrdering();
 List<MetricTimeSeries> orderedChunks = ordering.immutableSortedCopy(mtsIt);
 MetricTimeSeries result = null;
 for (MetricTimeSeries mts : orderedChunks) {
 if (result == null) {
 result = new MetricTimeSeries
 .Builder(mts.getMetric())
 .attributes(mts.attributes()).build();
 }
 result.addAll(mts.getTimestampsAsArray(), mts.getValuesAsArray());
 }
 return result;
 });
 
 JavaRDD<MetricTimeSeries> resultJavaRdd =
 joinedRdd.map((Tuple2<MetricTimeSeriesKey, MetricTimeSeries> mtTuple) -> mtTuple._2);
 
 return new ChronixRDD(resultJavaRdd); } group chunks according identity join chunks to
 logical time 
 series
  • 42. PERFORMANCE THE SECRET OF DISTRIBUTED PERFORMANCE Rule 1: Be as close to the data as possible!
 (CPU cache > memory > local disk > network) Horizontal processing 
 (distribution / parallelization) Verticalprocessing
 (divide&conquer) Rule 2: Reduce data volume as early as possible! 
 (as long as you don’t sacrifice parallelization) Rule 3: Parallelize as much as possible! 
 (max = #cores)
  • 43. PERFORMANCE THE RULES APPLIED ‣ Rule 1: Be as close to the data as possible! 1. Solr caching 2. Spark in-memory processing with activated RDD compression 3. Binary protocol between Solr and Spark
 ‣ Rule 2: Reduce data volume as early as possible! ‣ Efficient storage format (Chronix Format) ‣ Predicate pushdown to Solr (query) ‣ Group-by & aggregation pushdown to Solr (faceting within a query)
 ‣ Rule 3: Parallelize as much as possible! ‣ Scale-out on data-level with SolrCloud ‣ Scale-out on processing-level with Spark
  • 45. RULE 4: PREMATURE OPTIMIZATION IS NOT EVIL 
 IF YOU HANDLE BIG DATA Josef Adersberger
  • 46. PERFORMANCE USING A JAVA PROFILER WITH A LOCAL CLUSTER
  • 48. PERFORMANCE 830 MB -> 360 MB
 (- 57%) unveiled wrong Jackson 
 handling inside of SolrClient
  • 49. PERFORMANCE PROFILING ChronixRDD WITH PLAIN VANILLA SPARK Watch out 
 for branches! Watch out 
 for shuffling!
  • 51. ROADMAP THINGS TO COME see https://github.com/ChronixDB/chronix.spark/issues v0.4
 (06/16) v0.5
 (08/16) v0.6
 (10/16) v1.0
 (12/16) More actions and transformations Bulk transfer Solr request handler Streaming access R wrapper Reduce memory overhead Data locality (co- location) SparkML support Custom Dataset encoder SolrRDD adapter Incorporate alien technology
  • 53. TWITTER.COM/QAWARE - SLIDESHARE.NET/QAWARE Thank you! Questions? josef.adersberger@qaware.de @adersberger https://github.com/ChronixDB/chronix.spark
  • 56. THE COMPETITORS / ALTERNATIVES THE COMPETITORS / ALTERNATIVES ▸ Small Time Series Data ▸ Matlab (Econometrics toolbox) ▸ Python (Pandas) ▸ R (zoo, xts) ▸ SAS (ETS) ▸ … ▸ Big Time Series Data ▸ influxDB ▸ Graphite ▸ OpenTSDB ▸ KairosDB ▸ Prometheus ▸ …
  • 57. THE COMPETITORS / ALTERNATIVES BIG DATA LANDSCAPE https://github.com/qaware/big-data-landscape
  • 58. THE COMPETITORS / ALTERNATIVES CHRONIX RDD VS. SPARK-TS ▸ Spark-TS provides no specific time series storage it uses the Spark persistence mechanisms instead. This leads to a less efficient storage usage and less possibilities to perform performance optimizations via predicate pushdown. ▸ In contrast to Spark-TS Chronix does not align all time series values on one vector of timestamps. This leads to greater flexibility in time series aggregation ▸ Chronix provides multi-dimensional time series as this is very useful for data warehousing and APM. ▸ Chronix has support for Datasets as this will be an important Spark API in the near future. But Chronix currently doesn’t support an IndexedRowMatrix for SparkML. ▸ Chronix is purely written in Java. There is no explicit support for Python and Scala yet. ▸ Chronix doesn not support a ZonedTime as this makes it way more complicated.
  • 61. APACHE SPARK SPARK TERMINOLOGY (1/2) ▸ RDD: Has transformations and actions. Hides data partitioning & distributed computation. References a set of partitions (“output partitions”) - materialized or not - and has dependencies to another RDD (“input partitions”). RDD operations are evaluated as late as possible (when an action is called). As long as not being the root RDD the partitions of an RDD are in memory but they can be persisted by request. ▸ Partitions: (Logical) chunks of data. Default unit and level of parallelism - inside of a partition everything is a sequential operation on records. Has to fit into memory. Can have different representations (in-memory, on disk, off heap, …)
  • 62. APACHE SPARK SPARK TERMINOLOGY (2/2) ▸ Job: A computation job which is launched when an action is called on a RDD. ▸ Task: The atomic unit of work (function). Bound to exactly one partition. ▸ Stage: Set of Task pipelines which can be executed in parallel on one executor. ▸ Shuffling: If partitions need to be transferred between executors. Shuffle write = outbound partition transfer. Shuffle read = inbound partition transfer. ▸ DAG Scheduler: Computes DAG of stages from RDD DAG. Determines the preferred location for each task.