Stratio Meta 
An efficient distributed datahub with batch and 
streaming query capabilities 
Daniel Higuero 
Alvaro Agea 
dhiguero@stratio.com 
alvaro@stratio.com 
#CassandraSummit 2014
Stratio Crossdata 
Who are we? 
STRATIO 
• Stratio is a Big Data company
• Founded in 2013
• Commercially launched in 2014
• 50+ employees in Madrid
• Office in San Francisco
• Certified Spark distribution
We love… 
Cassandra 
• P2P architecture
• Read/write performance
• Fault tolerance
• Easy to deploy
• CQL 
Agenda
• Introduction
• Crossdata architecture
• Metadata management
• Streaming sources
• Full text search
• Spark and Crossdata
• ODBC
• The future
Introduction 
o Big Data analysis is commonly associated with batch processing
  • Users aiming to combine batch and stream processing have to rely on tailor-made architectures
o Users buy Big Data platforms, but
  • How do I start?
  • What is my entry point to the platform?
What do our clients demand?
o Easy deployment
o Easy administration
o Read/write performance
o Easy-to-learn query language
o Integration with BI tools
o Join operations
o Support for streaming sources
o Integration with other data stores
o Ability to query data without thinking about the schema (non-indexed data)
Crossdata 
o A new technology that:
  • Is not limited by the underlying datastore capabilities
  • Leverages Spark to perform non-natively supported operations
  • Supports batch and streaming queries
  • Supports multiple clusters and technologies
Our architecture 
Connecting to the outside world 
o Crossdata defines an IConnector extension interface
o Users can easily add new connectors to support
  • Different datastores
  • Different processing engines
  • Different versions
o Each connector defines its capabilities
Our planner will choose the best connector for each query
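As a rough illustration of capability-based connector selection, the sketch below assumes that each connector advertises the set of operations it supports and that the planner picks the cheapest connector covering everything a query needs. The Connector class, the capability names, and the cost field are hypothetical; they are not Crossdata's actual IConnector API, and the real planner may use different criteria.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Connector:
    """Hypothetical connector description (not the real IConnector interface)."""
    name: str
    capabilities: FrozenSet[str]  # operations this connector can execute
    cost: int                     # assumed relative cost; lower is preferred


def choose_connector(required: Iterable[str],
                     connectors: Iterable[Connector]) -> Optional[Connector]:
    """Return the cheapest connector that supports every required operation."""
    needed = frozenset(required)
    candidates = [c for c in connectors if needed <= c.capabilities]
    return min(candidates, key=lambda c: c.cost, default=None)


# Illustrative registry: a native driver for simple queries, Spark for the rest.
CONNECTORS = [
    Connector("native-cassandra", frozenset({"select", "filter_pk"}), cost=1),
    Connector("spark", frozenset({"select", "filter_pk", "join", "group_by"}), cost=10),
]
```

Under these assumptions a plain select goes to the native connector, a query that needs a join falls through to Spark, and a query that no connector can serve yields None.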
Query execution 
[Diagram: a query passes through Parsing, Validation, Planning, and Execution; labeled components: C*, Connector1, Connector2, Connector3]
Multi-cluster support 
o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores.
  • Multiple clusters can coexist to optimize platform performance
    - E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.
  • A table is saved in a single datastore
Logical and physical mapping 
SELECT * FROM app.users;
[Diagram: the App catalog (tables: Users, Test, old_users) mapped onto physical datastores (C* production, C* development, other datastores)]
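The logical-to-physical mapping can be pictured as a lookup from a qualified table name to the single cluster that stores it. The sketch below is only an illustration: the dictionary layout, the cluster names, and the assignment of each table to a cluster are assumptions, not Crossdata's metadata format.

```python
from typing import Dict, Tuple

# Hypothetical catalog metadata: (catalog, table) -> cluster that stores the table.
# Which table lives on which cluster is assumed here for illustration.
TABLE_LOCATIONS: Dict[Tuple[str, str], str] = {
    ("app", "users"): "cassandra_production",
    ("app", "test"): "cassandra_development",
    ("app", "old_users"): "other_datastore",
}


def resolve_cluster(qualified_name: str) -> str:
    """Map a logical name such as 'app.users' to the single cluster holding it."""
    catalog, table = qualified_name.lower().split(".", 1)
    try:
        return TABLE_LOCATIONS[(catalog, table)]
    except KeyError:
        raise LookupError(f"unknown table: {qualified_name}") from None
```

With this layout, the user writes app.users and never names a cluster; the catalog decides where the query is sent.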
Metadata Management
Metadata in the era of Schemaless NoSQL datastores 
o Some datastores are schemaless but our applications are not!
  • Flexible schemas vs. schemaless
  • Crossdata provides a Metadata manager that stores schemas for any datasource
    - Remember ODBC and those BI tools
Metadata management 
[Diagram: Metadata Manager, Connector, C* production, Metadata Store (Infinispan)]
o If the connector does not support metadata operations, those are skipped
o Updated metadata information is maintained among Crossdata servers using Infinispan
Streaming sources 
Managing streaming sources 
o Today's use cases expect some type of streaming datasource
  • Streaming data has an ephemeral nature
  • In Stratio Crossdata we defined the ephemeral table abstraction to work with streaming sources as classical RDBMS tables
[Diagram: a streaming source described by {schema:{col1:…},…} is exposed as a table with columns col1:text, col2:int, col3:int, col4:text and queried by Streaming_query0 … Streaming_queryn]
Streaming queries 
o Streaming queries are infinite by definition
  • A time window is defined to create a batch-like view of the rows ingested by the system in that period
  • The user launches queries specifying a processing time window
    - Crossdata provides methods to list and stop running streaming queries
Streaming queries: windows syntax 
SELECT fieldGroup,avg(Field2) 
FROM eph_table 
WITH WINDOW 5 minutes 
WHERE field1=100 AND field2>100 
GROUP BY fieldGroup;
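To make the window semantics concrete, here is a minimal sketch of what such a query computes, assuming fixed, non-overlapping windows and events that arrive as (timestamp in seconds, row) pairs. The function name and the row layout are hypothetical, and the slides do not say whether Crossdata windows are tumbling or sliding.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def window_averages(events: Iterable[Tuple[float, dict]],
                    window_seconds: float) -> List[Tuple[int, Dict[str, float]]]:
    """Group (timestamp, row) events into fixed windows and aggregate each one.

    Mirrors: SELECT fieldGroup, avg(field2) ... WHERE field1 = 100
             AND field2 > 100 GROUP BY fieldGroup, evaluated once per window.
    """
    # window index -> group key -> [running sum, row count]
    acc = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for ts, row in events:
        if row["field1"] == 100 and row["field2"] > 100:
            bucket = acc[int(ts // window_seconds)][row["fieldGroup"]]
            bucket[0] += row["field2"]
            bucket[1] += 1
    return [(w, {g: s / n for g, (s, n) in groups.items()})
            for w, groups in sorted(acc.items())]
```

Each window yields an ordinary result set, which is what lets an infinite stream be queried with table-style SQL.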
Joining batch and streaming 
SELECT * FROM demo.temporal
WITH WINDOW 10 secs
INNER JOIN demo.users
ON users.name = temporal.name;
[Diagram: the query is split into a streaming part (SELECT * FROM demo.temporal WITH WINDOW 10 secs) and a batch part (SELECT * FROM demo.users), combined by INNER JOIN ON users.name = temporal.name]
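One way to picture the batch/streaming join is as an ordinary hash join that runs once per window: the rows collected in the last 10 seconds are matched against the stored users table. The sketch below assumes both sides are lists of dictionaries that have a name key; the function name and the key-prefixing scheme are hypothetical and say nothing about how Crossdata executes the join internally.

```python
from typing import Dict, Iterable, List


def join_window_with_table(window_rows: Iterable[dict],
                           users: Iterable[dict]) -> List[dict]:
    """Inner-join one window of streaming rows with a batch table on `name`."""
    # Index the batch side once so each streaming row is a dictionary lookup.
    users_by_name: Dict[str, List[dict]] = {}
    for user in users:
        users_by_name.setdefault(user["name"], []).append(user)

    joined = []
    for row in window_rows:
        for user in users_by_name.get(row["name"], []):
            # Prefix keys so columns from both sides stay distinguishable.
            merged = {f"temporal.{k}": v for k, v in row.items()}
            merged.update({f"users.{k}": v for k, v in user.items()})
            joined.append(merged)
    return joined
```

Streaming rows with no matching user are dropped, as an inner join requires.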
Full text search 
Full text search with Lucene
o Clients request the ability to perform full text searches
o We have developed an integration between Lucene and Cassandra
o C* users can now enjoy all Lucene features:
  • Full text searches, range queries, fuzzy queries…
https://github.com/Stratio/stratio-cassandra
Stratio Lucene 2i 
[Diagram: C* nodes, each with its own Lucene index]
Full text search queries 
o With Crossdata, we simplify:
  • The creation syntax
  • The query syntax using the MATCH operator
CREATE FULLTEXT INDEX ON app.users(name,email);
SELECT * FROM app.users
WHERE email MATCH '*@stratio.com';
Spark & Stratio Crossdata
Why Spark? 
o Stratio Crossdata uses Spark to perform non-natively supported operations
o Spark brings several benefits over Hadoop
o In-memory processing
o RDD abstraction
o Simpler API
o Increased flexibility (e.g., no need for identity mapping)
What about Spark SQL? 
o Different approach to query execution
  • We only use Spark when it speeds up queries
    - Native drivers are faster for simple queries
    - Spark SQL has limited RDD sources
  • Avoid some Spark limitations
  • Several batch and streaming contexts in a single JVM (SPARK-2243)
Query approach 
[Diagram: SparkSQL approach: SparkSQL, Spark, Cassandra. Crossdata approach: Stratio Crossdata, Spark and native driver, Cassandra]
Our Cassandra-Spark integration 
o Project started in June 2013
  - With the objective of providing a method to interact with Cassandra from Spark
  - Initial approach based on the HadoopInputFormat interface
  - Current version uses the native DataStax Java driver
https://github.com/Stratio/stratio-deep
Our Cassandra-Spark integration 
o Benchmark in progress comparing our solution with the DataStax Spark driver
  • Results highly influenced by the split size
  • Initial results are promising for Stratio Spark Integration using DataStax default values
  • Group by: up to 40% faster
  • Join: up to 17% faster
  • Stay tuned for the benchmark publication!
Spark vs Lucene 2i 
[Chart: Time versus Records returned, for Spark and Lucene 2i]
ODBC 
Stratio Crossdata ODBC 
o Well-known interface standard (for BI tools, external apps, …)
o We have implemented it using the Simba SDK
o ODBC opens the full potential of Stratio Crossdata to the external world
o Currently tested with Tableau, QlikView and MS Excel
One ODBC for all datastores!
The future 
o Security 
o Query optimizer and smart query planner
o Leverage system statistics
o Support for UDFs
o Become an Apache project
https://github.com/Stratio/stratio-meta
We are looking for an Apache Champion 
Can you help us?
A wish list for Cassandra 
o Ability to stop running queries
o Interactive users are unpredictable
o Some exception paths are not clear or defined (e.g., secondary indexes)
o Distribute some of the operations currently performed on the coordinator
  • E.g., aggregations like count(*)

Weitere ähnliche Inhalte

Was ist angesagt?

Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Cedric CARBONE
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Brian O'Neill
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark Summit
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Databricks
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache SparkMammoth Data
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi KeynoteSpark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi KeynoteDatabricks
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaSpark Summit
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 
The How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkThe How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkLegacy Typesafe (now Lightbend)
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataSpark Summit
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Communityjeykottalam
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...Modern Data Stack France
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Databricks
 

Was ist angesagt? (20)

Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
Paris Spark Meetup (Feb2015) ccarbone : SPARK Streaming vs Storm / MLLib / Ne...
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi KeynoteSpark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
The How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkThe How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache Spark
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And Data
 
The BDAS Open Source Community
The BDAS Open Source CommunityThe BDAS Open Source Community
The BDAS Open Source Community
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
 

Andere mochten auch

Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model TreesStratio
 
Stratio platform overview v4.1
Stratio platform overview v4.1Stratio platform overview v4.1
Stratio platform overview v4.1Stratio
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] SparktaStratio
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scalaStratio
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosStratio
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + KerberosStratio
 
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Stratio
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scalaStratio
 
UNION BANCARIA EN LA UNION EUROPEA
UNION BANCARIA EN LA UNION EUROPEAUNION BANCARIA EN LA UNION EUROPEA
UNION BANCARIA EN LA UNION EUROPEARamiro Ojeda
 
El modelo europeo de reporting y el lenguaje XBRL - Ignacio Boixo
El modelo europeo de reporting y el lenguaje XBRL - Ignacio BoixoEl modelo europeo de reporting y el lenguaje XBRL - Ignacio Boixo
El modelo europeo de reporting y el lenguaje XBRL - Ignacio BoixoAsociación XBRL España
 
La Unión Bancaria Europea
La Unión Bancaria EuropeaLa Unión Bancaria Europea
La Unión Bancaria Europeakoball
 
Estándares en Unión Europea: Marco, Desafíos y Oportunidades - Francisco Garc...
Estándares en Unión Europea: Marco, Desafíos y Oportunidades - Francisco Garc...Estándares en Unión Europea: Marco, Desafíos y Oportunidades - Francisco Garc...
Estándares en Unión Europea: Marco, Desafíos y Oportunidades - Francisco Garc...Asociación XBRL España
 
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaOn-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaStratio
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving UpPaco Nathan
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?Paco Nathan
 
La translación del marco regulatorio Solvencia II al estándar XBRL - Aitor Az...
La translación del marco regulatorio Solvencia II al estándar XBRL - Aitor Az...La translación del marco regulatorio Solvencia II al estándar XBRL - Aitor Az...
La translación del marco regulatorio Solvencia II al estándar XBRL - Aitor Az...Asociación XBRL España
 

Andere mochten auch (20)

Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model Trees
 
Stratio platform overview v4.1
Stratio platform overview v4.1Stratio platform overview v4.1
Stratio platform overview v4.1
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] Sparkta
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scala
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelos
 
Stratio big data spain
Stratio   big data spainStratio   big data spain
Stratio big data spain
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + Kerberos
 
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
Primeros pasos con Spark - Spark Meetup Madrid 30-09-2014
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scala
 
UNION BANCARIA EN LA UNION EUROPEA
UNION BANCARIA EN LA UNION EUROPEAUNION BANCARIA EN LA UNION EUROPEA
UNION BANCARIA EN LA UNION EUROPEA
 
El modelo europeo de reporting y el lenguaje XBRL - Ignacio Boixo
El modelo europeo de reporting y el lenguaje XBRL - Ignacio BoixoEl modelo europeo de reporting y el lenguaje XBRL - Ignacio Boixo
El modelo europeo de reporting y el lenguaje XBRL - Ignacio Boixo
 
La Unión Bancaria Europea
La Unión Bancaria EuropeaLa Unión Bancaria Europea
La Unión Bancaria Europea
 
Presentacion
PresentacionPresentacion
Presentacion
 
Recuperación y Unión Bancaria Europea. Emilio Ontiveros
Recuperación y Unión Bancaria Europea. Emilio OntiverosRecuperación y Unión Bancaria Europea. Emilio Ontiveros
Recuperación y Unión Bancaria Europea. Emilio Ontiveros
 
Estándares en Unión Europea: Marco, Desafíos y Oportunidades - Francisco Garc...
Estándares en Unión Europea: Marco, Desafíos y Oportunidades - Francisco Garc...Estándares en Unión Europea: Marco, Desafíos y Oportunidades - Francisco Garc...
Estándares en Unión Europea: Marco, Desafíos y Oportunidades - Francisco Garc...
 
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, KibanaOn-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
On-the-fly ETL con EFK: ElasticSearch, Flume, Kibana
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Data Science Reinvents Learning?
Data Science Reinvents Learning?Data Science Reinvents Learning?
Data Science Reinvents Learning?
 
La translación del marco regulatorio Solvencia II al estándar XBRL - Aitor Az...
La translación del marco regulatorio Solvencia II al estándar XBRL - Aitor Az...La translación del marco regulatorio Solvencia II al estándar XBRL - Aitor Az...
La translación del marco regulatorio Solvencia II al estándar XBRL - Aitor Az...
 

Ähnlich wie Stratio CrossData: an efficient distributed datahub with batch and streaming query capabilities

Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSDataStax Academy
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014Mark Tabladillo
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
Incrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern AutomationIncrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern AutomationSean Chittenden
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Solution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorSolution Brief: Real-Time Pipeline Accelerator
Solution Brief: Real-Time Pipeline AcceleratorBlueData, Inc.
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
Kafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaKafka Connect & Streams - the ecosystem around Kafka
Kafka Connect & Streams - the ecosystem around KafkaGuido Schmutz
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Spark Summit
 
All Streams Ahead! ksqlDB Workshop ANZ
All Streams Ahead! ksqlDB Workshop ANZAll Streams Ahead! ksqlDB Workshop ANZ
All Streams Ahead! ksqlDB Workshop ANZconfluent
 
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014dhiguero
 
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014
Advanced search and Top-k queries in Cassandra - Cassandra Summit Europe 2014Andrés de la Peña
 
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014
Apache Cassandra For Java Developers - Why, What and How. LJC @ UCL October 2014Johnny Miller
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksDatabricks
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 

Stratio CrossData: an efficient distributed datahub with batch and streaming query capabilities

  • 1. Stratio Meta: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com). #CassandraSummit-2014
  • 2. Stratio Crossdata: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com).
  • 3. Who are we? Stratio is a Big Data company; founded in 2013; commercially launched in 2014; 50+ employees in Madrid; office in San Francisco; certified Spark distribution.
  • 4. We love… Cassandra: P2P architecture; read/write performance; fault tolerance; easy to deploy; CQL.
  • 5. Agenda: Introduction; Crossdata architecture; Metadata management; Streaming sources; Full text search; Spark and Crossdata; ODBC; The future.
  • 6. Introduction: Big Data analysis is commonly associated with batch processing, and users aiming to combine batch and stream processing have to rely on tailor-made architectures. Users buy Big Data platforms, but: How do I start? What is my entry point to the platform?
  • 7. What do our clients demand? Easy deployment; easy administration; read/write performance; easy-to-learn query language; integration with BI tools; join operations; support for streaming sources; integration with other data stores; ability to query data without thinking about the schema (non-indexed data).
  • 8. What do our clients demand? ✓ Easy deployment; ✓ easy administration; ✓ read/write performance; ✓ easy-to-learn query language. Still open: integration with BI tools; join operations; support for streaming sources; integration with other data stores; ability to query data without thinking about the schema (non-indexed data).
  • 9. What do our clients demand? ✓ Easy deployment; ✓ easy administration; ✓ read/write performance; ✓ easy-to-learn query language; ✓ integration with BI tools; ✓ join operations; ✓ support for streaming sources; ✓ integration with other data stores; ✓ ability to query data without thinking about the schema (non-indexed data).
  • 10. Crossdata: A new technology that is not limited by the underlying datastore capabilities; leverages Spark to perform non-natively supported operations; supports batch and streaming queries; supports multiple clusters and technologies.
  • 12. Connecting to the outside world: Crossdata defines an IConnector extension interface. Users can easily add new connectors to support different datastores, different processing engines, and different versions, where each connector defines its capabilities. Our planner will choose the best connector for each query.
  • 13. Query execution (diagram labels): Parsing, Validation, Planning, Execution; C*; Connector1, Connector2, Connector3. Our planner will choose the best connector for each query.
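The slides say that each connector declares its capabilities and that the planner picks the best connector per query, but they do not define "best". The following is a minimal sketch of capability-based selection, not Crossdata's actual planner: the `Connector` class, the capability names, and the `priority` field (lower is preferred, for example a native driver before Spark) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """Hypothetical stand-in for an IConnector implementation."""
    name: str
    capabilities: frozenset  # operations the connector declares it supports
    priority: int            # assumption: lower value means preferred


def choose_connector(required, connectors):
    """Return the preferred connector that supports every required operation."""
    candidates = [c for c in connectors if required <= c.capabilities]
    if not candidates:
        raise LookupError(f"no connector supports {sorted(required)}")
    return min(candidates, key=lambda c: c.priority)


# Two illustrative connectors for the same Cassandra cluster.
native = Connector("cassandra-native", frozenset({"SELECT", "FILTER_PK"}), priority=0)
spark = Connector(
    "cassandra-spark",
    frozenset({"SELECT", "FILTER_PK", "JOIN", "GROUP_BY"}),
    priority=1,
)

simple_choice = choose_connector({"SELECT"}, [native, spark]).name
join_choice = choose_connector({"SELECT", "JOIN"}, [native, spark]).name
print(simple_choice, join_choice)
```

Under these assumptions a plain `SELECT` goes to the native connector, while a query that needs a join falls through to the Spark-backed one.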
  • 14. Multi-cluster support: Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores. Multiple clusters can coexist to optimize platform performance (e.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.). Each table is stored in exactly one datastore.
  • 15. Logical and physical mapping: SELECT * FROM app.users; Diagram labels: App catalog (Users table, Test table, old_users table); C* production, C* development, other datastores.
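A rough sketch of the logical-to-physical mapping described on the two slides above: one logical catalog, with every table stored in exactly one datastore. The specific assignment of tables to clusters below is an assumption (the slide transcript lists the labels but not the arrows between them), and `CATALOG` and `resolve` are hypothetical names.

```python
# Hypothetical catalog: each logical table maps to exactly one physical datastore.
CATALOG = {
    ("app", "users"): "cassandra-production",
    ("app", "test"): "cassandra-development",
    ("app", "old_users"): "other-datastore",
}


def resolve(qualified_name):
    """Map 'catalog.table' to the single datastore that holds the table."""
    catalog, _, table = qualified_name.partition(".")
    try:
        return CATALOG[(catalog, table)]
    except KeyError:
        raise LookupError(f"unknown table {qualified_name!r}") from None


# SELECT * FROM app.users; would be routed to the datastore returned here.
target = resolve("app.users")
print(target)
```

The client only ever names `app.users`; which cluster answers is a catalog decision, so a table can in principle be moved without changing queries.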
  • 17. Metadata in the era of schemaless NoSQL datastores: Some datastores are schemaless, but our applications are not! Flexible schemas vs. schemaless. Crossdata provides a Metadata manager that stores schemas for any datasource. Remember ODBC and those BI tools.
  • 18. Metadata management (diagram labels): Connector; C* production; Metadata Store; Infinispan; Metadata Manager. Updated metadata information is maintained among Crossdata servers using Infinispan. If the connector does not support metadata operations, those are skipped.
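One way to picture the behaviour on the metadata management slide is sketched below: the manager always records the schema in its own store and forwards the operation to the connector only when the connector supports metadata operations. This is an illustration under stated assumptions, not Crossdata code: the class names and the `supports_metadata` flag are hypothetical, a plain dict stands in for the Infinispan-backed store, and the order of the two steps is a guess.

```python
class CassandraConnector:
    """Hypothetical connector that can apply schema changes to its datastore."""
    supports_metadata = True

    def __init__(self):
        self.tables = {}

    def create_table(self, name, schema):
        self.tables[name] = dict(schema)


class StreamingConnector:
    """Hypothetical connector without metadata operations."""
    supports_metadata = False


class MetadataManager:
    def __init__(self):
        # Stand-in for the metadata store shared among Crossdata servers.
        self.store = {}

    def create_table(self, name, schema, connector):
        """Record the schema; forward to the connector only if it supports it."""
        self.store[name] = dict(schema)
        if getattr(connector, "supports_metadata", False):
            connector.create_table(name, schema)
            return True
        return False  # metadata operation skipped for this connector


manager = MetadataManager()
cassandra = CassandraConnector()
forwarded = manager.create_table(
    "app.users", {"name": "text", "email": "text"}, cassandra
)
skipped = manager.create_table("app.clicks", {"name": "text"}, StreamingConnector())
print(forwarded, skipped)
```

Either way the schema ends up in the manager's store, which is what lets schema-dependent clients such as ODBC tools see a table definition even for a schemaless source.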
  • 20. Managing streaming sources: Nowadays, use cases expect some type of streaming datasource. Streaming data has an ephemeral nature. In Stratio Crossdata we defined the ephemeral table abstraction to work with streaming sources as classical RDBMS tables. Diagram labels: streaming source {schema:{col1:…},…}; col1:text, col2:int, col3:int, col4:text; Streaming_query0 … Streaming_queryn.
  • 21. Streaming queries: Streaming queries are infinite by definition. A time window is defined to create a batch-like view of the rows ingested by the system in that period. The user launches queries specifying a processing time window. Crossdata provides methods to list and stop running streaming queries.
  • 22. Streaming queries: windows syntax: SELECT fieldGroup, avg(Field2) FROM eph_table WITH WINDOW 5 minutes WHERE field1=100 AND field2>100 GROUP BY fieldGroup;
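To make the window semantics concrete, here is a small sketch of what the query on the windows syntax slide computes, written over an in-memory list instead of a real stream. It assumes non-overlapping (tumbling) five-minute windows keyed by event timestamp; the slides do not say whether Crossdata windows tumble or slide, and the tuple layout of `events` is made up for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # WITH WINDOW 5 minutes


def windowed_avg(events):
    """Average field2 per (window, fieldGroup) for rows passing the WHERE clause.

    events: iterable of (timestamp_seconds, field_group, field1, field2).
    """
    sums = defaultdict(lambda: [0.0, 0])  # (window index, group) -> [sum, count]
    for ts, group, field1, field2 in events:
        if field1 == 100 and field2 > 100:  # WHERE field1=100 AND field2>100
            acc = sums[(int(ts // WINDOW_SECONDS), group)]
            acc[0] += field2
            acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


events = [
    (10, "a", 100, 150),
    (20, "a", 100, 250),
    (30, "b", 100, 50),    # dropped: field2 is not > 100
    (40, "b", 7, 500),     # dropped: field1 != 100
    (310, "a", 100, 300),  # lands in the next five-minute window
]
result = windowed_avg(events)
print(result)
```

Each window yields an ordinary finite result set, which is the "batch-like view" the slides refer to.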
  • 23. Joining batch and streaming: SELECT * FROM demo.temporal WITH WINDOW 10 secs INNER JOIN demo.users ON users.name = temporal.name; The query is split into two parts, SELECT * FROM demo.temporal WITH WINDOW 10 secs and SELECT * FROM demo.users, whose results are combined with INNER JOIN ON users.name = temporal.name.
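The decomposition above can be sketched as a hash join between the rows collected in one window and the rows of the batch table. This is only an illustration of the idea: the row dictionaries, the `clicks` and `email` columns, and the function name are assumptions, and the slides do not say which join strategy Crossdata actually uses.

```python
def join_window_with_table(window_rows, users):
    """INNER JOIN one window of stream rows with a batch table on 'name'."""
    by_name = {}
    for user in users:  # build a hash index over the batch side
        by_name.setdefault(user["name"], []).append(user)
    return [
        {**user, **event}  # merged row; stream values win on column clashes
        for event in window_rows
        for user in by_name.get(event["name"], [])
    ]


users = [{"name": "ana", "email": "ana@example.com"}]  # demo.users (batch)
window = [  # rows of demo.temporal seen in one 10-second window
    {"name": "ana", "clicks": 3},
    {"name": "bob", "clicks": 1},  # no matching user, dropped by the inner join
]
joined = join_window_with_table(window, users)
print(joined)
```

The join would be re-evaluated for every window, so each window produces its own finite joined result.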
  • 25. Full text search: Clients request the ability to perform full text searches. We have developed an integration between Lucene and Cassandra. C* users can now enjoy all Lucene features: full text searches, range queries, fuzzy queries… https://github.com/Stratio/stratio-cassandra
  • 26. Stratio Lucene 2i (diagram): C* nodes, each paired with its own Lucene index.
  • 27. Full text search queries: With Crossdata, we simplify the creation syntax and the query syntax using the match operator: CREATE FULLTEXT INDEX ON app.users(name,email); SELECT * FROM app.users WHERE email MATCH '*@stratio.com';
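As a rough illustration of what the `MATCH '*@stratio.com'` predicate selects, the sketch below filters in-memory rows with Python's standard `fnmatch` wildcards. This only mimics the `*` wildcard semantics: a Lucene index evaluates such patterns against indexed terms rather than by scanning rows, and `fnmatch` additionally gives `[...]` a special meaning that is ignored here. The sample rows and the `match` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def match(value, pattern):
    """Case-sensitive wildcard match, standing in for the MATCH operator."""
    return fnmatchcase(value, pattern)


users = [
    {"name": "Daniel", "email": "dhiguero@stratio.com"},
    {"name": "Someone", "email": "someone@example.org"},
]

# SELECT * FROM app.users WHERE email MATCH '*@stratio.com';
hits = [u["name"] for u in users if match(u["email"], "*@stratio.com")]
print(hits)
```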
  • 29. Why Spark? Stratio Crossdata uses Spark to perform non-natively supported operations. Spark brings several benefits over Hadoop: in-memory processing; RDD abstraction; simpler API; increased flexibility (e.g., no need for identity mapping).
  • 30. What about Spark SQL? Different approach to query execution: we only use Spark when it speeds up queries (native drivers are faster for simple queries; Spark SQL has limited RDD sources). Avoid some Spark limitations: several batch and streaming contexts in a single JVM (SPARK-2243).
  • 31. Query approach (diagram): SparkSQL approach: SparkSQL, then Spark, then Cassandra. Crossdata approach: Stratio Crossdata, then either Spark or the native driver, then Cassandra.
  • 32. Our Cassandra-Spark integration: Project started in June 2013 with the objective of providing a method to interact with Cassandra from Spark. Initial approach based on the HadoopInputFormat interface. Current version uses the native DataStax Java driver. https://github.com/Stratio/stratio-deep
  • 33. Our Cassandra-Spark integration: Benchmark in progress comparing our solution with the DataStax Spark driver. Results are highly influenced by the split size. Initial results are promising for the Stratio Spark integration using DataStax default values: group by up to 40% faster; join up to 17% faster. Stay tuned for the benchmark publication!
  • 34. Spark vs Lucene 2i (chart): time against records returned, for Spark and Lucene 2i.
  • 36. Stratio Crossdata ODBC: Well-known interface standard (for BI tools, external apps, …). We have implemented it using Simba SDK. ODBC opens the full potential of Stratio Crossdata to the external world. Currently tested with Tableau, QlikView and MS Excel. One ODBC for all datastores!
  • 38. The future: Security; query optimizer and smart query planner; leverage system statistics; support for UDFs; become an Apache project. https://github.com/Stratio/stratio-meta
  • 39. We are looking for an Apache Champion. Can you help us?
  • 40. A wish list for Cassandra: Ability to stop running queries; interactive users are unpredictable; some exception paths are not clear or defined (e.g., secondary indexes); distribute some of the operations currently performed on the coordinator (e.g., aggregations like count(*)).
  • 41. Stratio Crossdata: An efficient distributed datahub with batch and streaming query capabilities. Daniel Higuero (dhiguero@stratio.com), Alvaro Agea (alvaro@stratio.com).