1. MAY 21 - 23, 2019
Gaylord National Resort & Convention Center Maryland
2. Implementation of a Big Data
Architecture for Real-Time Analytics
with DataStax Enterprise Graph,
Analytics and Search
Joseph Arriola
3. © DataStax, All Rights Reserved. Confidential
About me…
11+ 3 30+ 5 15+
4. Agenda
● Challenges
● Business Intelligence architectures in context
● What does the Big Data world offer?
● Tools based on the use case
● What did DataStax do?
● Where to start with DSE?
● Use Case - Building it
6. Challenges
● Create a real-time analytics architecture
● Without degrading the legacy transactional service
● Scalability: processing and storage
● Allow connections from enterprise Business Intelligence tools
7. Old Architecture
Diagram: Applications (read & write) and Analysis (read) share one database.
Challenges: real-time analytics, without degrading, scalability, allow connection
8. Business Intelligence Architecture
Diagram: Data Source (legacy systems, database) → Process (ETL) → Data Model & Storage (data warehouse) → OLAP (cube) → Data Presentation (dashboard, reports, ad-hoc queries)
1 Day of Delay
Challenges: real-time analytics, without degrading, scalability, allow connection
10. What does the Big Data world offer?
11. Tools based on the use case
Diagram: Lambda Architecture with a pipeline of real-time events, a Batch Layer, a Speed Layer, and a Serving Layer
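The Lambda Architecture named on this slide can be illustrated with a minimal Python sketch. It is not taken from the talk: the event shape and the `campaign` field are hypothetical, and counting events per campaign stands in for whatever a real batch or speed job would compute. The batch layer recomputes a view from all stored events, the speed layer counts only events that arrived since the last batch run, and the serving layer merges the two.

```python
from collections import Counter


def batch_view(events):
    """Batch layer: recompute counts from the full stored dataset."""
    return Counter(e["campaign"] for e in events)


class SpeedView:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["campaign"]] += 1


def serve(batch, speed):
    """Serving layer: merge the batch view with the real-time delta."""
    return batch + speed.counts


# Hypothetical data: three stored events, then two that arrive in real time.
stored = [{"campaign": "a"}, {"campaign": "a"}, {"campaign": "b"}]
speed = SpeedView()
for event in [{"campaign": "b"}, {"campaign": "c"}]:
    speed.on_event(event)

merged = serve(batch_view(stored), speed)
```

Queries read only the merged view, so they see recent events without waiting for the next batch recomputation.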
12. Tools based on the use case
Diagram: tools mapped to the Pipeline, Batch Layer, Speed Layer, and Serving Layer (layer labels repeated per tool)
14. What did DataStax do?
Apache Cassandra, Spark, Lucene, Solr, TinkerPop ® Apache Software Foundation
DSE OpsCenter
DataStax Studio / Drivers
15. Where to start with DSE?
● Recap Cassandra Topology
● Data replication
● Type of workload Datacenter
16. Recap Cassandra Topology
DC1 DC2
Cluster
● Node: A single instance
● Datacenter: A logical grouping of nodes
● Cluster: A logical grouping of data centers
17. Data replication
● Replication automatically handled
● SimpleStrategy
● NetworkTopologyStrategy
CREATE KEYSPACE keyspace_name
WITH REPLICATION = { 'class' : 'SimpleStrategy',
'replication_factor' : 3 };
18. Data replication
● Replication automatically handled
● SimpleStrategy
● NetworkTopologyStrategy
CREATE KEYSPACE keyspace_name
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy',
'DC1' : 3, 'DC2' : 3 };
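How the two replication strategies choose replicas can be sketched in Python. This is a simplified illustration under stated assumptions, not Cassandra's implementation: the six-node ring, its token values, and the node names are hypothetical, tokens are passed in directly (Cassandra derives them by hashing the partition key), and rack awareness is ignored. SimpleStrategy walks the ring clockwise and takes the first `rf` nodes; NetworkTopologyStrategy walks the same ring but fills a separate quota per datacenter.

```python
import bisect

# Hypothetical ring of (token, node, datacenter), sorted by token.
RING = [
    (0, "n1", "DC1"), (17, "n4", "DC2"), (33, "n2", "DC1"),
    (50, "n5", "DC2"), (67, "n3", "DC1"), (83, "n6", "DC2"),
]


def walk(token):
    """Yield ring entries clockwise, starting at the first node whose token >= token."""
    tokens = [t for t, _, _ in RING]
    start = bisect.bisect_left(tokens, token) % len(RING)
    for i in range(len(RING)):
        yield RING[(start + i) % len(RING)]


def simple_strategy(token, rf):
    """Take the next rf nodes on the ring, regardless of datacenter."""
    return [node for _, node, _ in walk(token)][:rf]


def network_topology_strategy(token, rf_per_dc):
    """Walk the ring and fill each datacenter's replica quota independently."""
    placed = {dc: [] for dc in rf_per_dc}
    for _, node, dc in walk(token):
        if dc in placed and len(placed[dc]) < rf_per_dc[dc]:
            placed[dc].append(node)
    return placed
```

With this ring, `simple_strategy(80, 3)` returns `['n6', 'n1', 'n4']`, which mixes datacenters, while `network_topology_strategy(80, {"DC1": 3, "DC2": 3})` places three replicas in each datacenter. That per-datacenter guarantee is why NetworkTopologyStrategy is the usual choice for multi-datacenter clusters.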
19. Type of workload Datacenter
● Transactional
CQL
SELECT id, artist_name FROM music.solr WHERE id = 123 LIMIT 10
● Queries are restricted by partition key
● There is no general-purpose
−GROUP BY (sum, avg, etc.)
−JOINS
−LIKE '%%'
https://docs.datastax.com/en/dse/6.7/cql/cql/cqlAbout.html
20. Type of workload Datacenter
● Transactional
● DSE Analytics
Spark Cassandra Connector
val result = sqlContext.sql("SELECT category, count(1) FROM demo GROUP BY category")
https://spark.apache.org/
https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/analytics/analyticsTOC.html
21. Type of workload Datacenter
● Transactional
● DSE Analytics
https://spark.apache.org/docs/latest/streaming-programming-guide.html
https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/spark/sparkStreamingIntro.html
22. Type of workload Datacenter
● Transactional
● DSE Analytics
● DSE Search
SELECT id, artist_name FROM music.solr WHERE solr_query =
'artist_name:Miles*' LIMIT 10
CQL + Solr_query
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/search/searchAbout.html
23. Type of workload Datacenter
● Transactional
● DSE Analytics
● DSE Search
● DSE SearchAnalytics
val result = sqlContext.sql("""SELECT artist_country, sum(1) FROM music.solr
  WHERE solr_query = 'artist_name:A*'
  GROUP BY artist_country""")
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/analytics/dseSearchAnalyticsOverview.html
24. Type of workload Datacenter
● Transactional
● DSE Analytics
● DSE Search
● DSE SearchAnalytics
● DSE Graph
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/graph/graphTOC.html
25. DataStax Enterprise Graph Analytics
https://academy.datastax.com/resources/ds332
26. DataStax Enterprise Graph Analytics
https://academy.datastax.com/resources/ds332
27. DataStax Enterprise Graph Analytics
https://academy.datastax.com/resources/ds332
28. DataStax Enterprise Graph Analytics
https://academy.datastax.com/resources/ds332
29. DataStax Enterprise Graph Analytics
https://academy.datastax.com/resources/ds332
30. Type of workload Datacenter
● Transactional
● DSE Analytics
● DSE Search
● DSE SearchAnalytics
● DSE Graph
/etc/default/dse
GRAPH_ENABLED=1
SPARK_ENABLED=1
SOLR_ENABLED=1
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/production/initializeDCPerType.html
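How the three flags relate to the workload types listed on this slide can be sketched in Python. The helper name `workloads` and the parsing rules are assumptions for illustration only and are not part of DSE. The sketch assumes that a node with no flag set is a transactional node and that enabling Spark and Solr together gives a SearchAnalytics node.

```python
def workloads(dse_defaults):
    """Map the *_ENABLED flags found in /etc/default/dse text to workload names."""
    flags = {}
    for line in dse_defaults.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" in line:
            key, value = line.split("=", 1)
            flags[key.strip()] = value.strip() == "1"

    spark = flags.get("SPARK_ENABLED", False)
    solr = flags.get("SOLR_ENABLED", False)
    graph = flags.get("GRAPH_ENABLED", False)

    result = []
    if spark and solr:
        result.append("SearchAnalytics")
    elif spark:
        result.append("Analytics")
    elif solr:
        result.append("Search")
    if graph:
        result.append("Graph")
    # Assumption: with no flag enabled the node serves transactional workloads only.
    return result or ["Transactional"]
```

Called on the three lines shown on the slide, the helper reports a node that combines SearchAnalytics and Graph, which matches the "Analytics + Search + Graph" datacenter used later in the proposed architecture.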
31. DSE - AlwaysOn SQL
● A high-availability service that responds to SQL queries
from JDBC and ODBC applications.
● It is built on top of the Spark SQL Thriftserver.
● DSE Graph data is also available via SQL in the form of
vertex tables and edge tables, just like in DseGraphFrames.
https://www.datastax.com/2018/05/introducing-alwayson-sql-for-dse-analytics
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/spark/alwaysOnSql.html?#alwaysOnSql__alwaysOnSqlEnabling
32. DSE - AlwaysOn SQL
● What do we need?
−A running datacenter with DSE Analytics nodes enabled.
−AlwaysOn SQL set up and enabled in the alwayson_sql_options
section of dse.yaml.
−The service started.
https://www.datastax.com/2018/05/introducing-alwayson-sql-for-dse-analytics
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/spark/alwaysOnSql.html?#alwaysOnSql__alwaysOnSqlEnabling
34. Use Case
● A telco company needs to implement a Big Data real-time
analytics architecture in order to monitor the effectiveness of
SMS campaigns.
● It is important to have a real-time dashboard and connections
available for performing custom analyses.
35. Proposed architecture – Transactional
36. Proposed architecture – Transactional
Central EDR
https://streamsets.com/
https://streamsets.com/documentation/datacollector/latest/help/index.html
37. Proposed architecture
Analytics + Search + Graph
ASG
DataStax Studio 2.0
CREATE KEYSPACE sms_campaigns
WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'DC_T' : 3, 'DC_ANLTCS' : 3 };
38. Proposed architecture
Analytics + Search + Graph
https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/studio/installStudio.html
https://www.datastax.com/2017/04/announcing-datastax-studio-2-0-a-powerful-developer-environment-for-datastax-enterprise
39. Proposed architecture
Analytics + Search + Graph
https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/studio/installStudio.html
DataStax Studio 2.0
ASG
AlwaysOn SQL
40. Proposed architecture - Real Time Analytics
ASG
41. Proposed architecture - Real Time Analytics
42. Proposed architecture - Real Time Analytics
ASG
43. Proposed architecture - Real Time Analytics
Real-Time Card
Real-Time and Historic Data Analysis
44. Proposed architecture - Real Time Analytics
https://www.pubnub.com/tutorials/microsoft-power-bi/streaming-business-data-to-dashboards/
45. Whole Architecture
ASG
AlwaysOn SQL
BI Tools
AD-HOC Queries
Data exploration
Real-Time Data
DseGraphFrames
Gremlin Query
Challenges: real-time analytics, without degrading, scalability, allow connection
46. Q&A
Joseph Arriola
jcarriolaa@gmail.com
jcarriola@solcomp.com
https://www.linkedin.com/in/jcarriolaa/
Editor's Notes
I have been working in IT for around 11 years, in different industries such as the public sector, banking, retail, and telecommunications. I started as a developer in Java, C++, and C#, and then moved into data warehousing. That allowed me to get started in the world of big data.
I am an Information Systems Engineer. I studied for a master's degree in business intelligence and also in philosophy.
In order to prepare myself for the world of big data, I have completed different online certifications, including "Apache Cassandra Professional" by DataStax.
I have been working for around 5 years in digital transformation and big data, implementing projects and providing consulting and training.
I am the founder of the Big Data Guatemala community, whose purpose is to make different technologies better known. My speaker profile already includes around 15 conferences in different countries such as Guatemala, El Salvador, Mexico, and, for the first time, the USA.
https://www.datastax.com/2018/05/introducing-alwayson-sql-for-dse-analytics
https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/spark/alwaysOnSql.html?#alwaysOnSql__alwaysOnSqlEnabling