SlideShare ist ein Scribd-Unternehmen logo
1 von 4
Downloaden Sie, um offline zu lesen
Software & Services Solution Brief 1
Real-Time Pipeline Accelerator
Speed is a key driver for an increasing number of analytics use cases, ranging
from fraud detection for financial transactions to Internet of Things (IoT) monitoring
with sensor-generated data. Immediate analysis of these new data streams in
real-time can bring tremendous value—whether in delivering competitive business
advantage, averting potential crises, or creating new revenue opportunities.
If your organization wants to enable rapid prototyping and development for
real-time analytics, there is now an on-premises solution to help you get started
quickly and easily.
Spark Streaming + Kafka + Cassandra
BlueData makes it easy to deploy Spark infrastructure and applications on-
premises. The BlueData EPIC™ software platform is purpose-built to simplify
and accelerate the deployment of Spark, Hadoop, and other tools for Big Data
analytics—leveraging Docker containers and virtualized infrastructure.
Our new Real-Time Pipeline Accelerator solution provides the software and
professional services you need for building data pipelines in a multi-tenant
environment for Spark Streaming, Kafka, and Cassandra. With this solution,
your data scientists will be able to create integrated pipelines to capture, analyze
(model/score), and store real-time data streams within a matter of minutes.
Now your data scientists and developers can focus on their use cases and
pipelines, without worrying about the infrastructure complexities of technologies
like Spark, Kafka, and Cassandra. And as your uses cases mature and expand
over time, you can use the BlueData EPIC platform to extend your pipelines with
complementary applications and frameworks.
Target Audience
•	Organizations looking to get started with real-time data pipelines
•	Organizations with existing data pipelines using Spark Streaming, Kafka,
and Cassandra that need a sandbox environment for Dev/QA/UAT
•	Data scientists, analysts, engineers, architects, and IT infrastructure teams
Get started with Spark Streaming, Kafka, and Cassandra for real-time data analytics.
BlueData™ provides a turnkey solution with software and services to accelerate your
deployment and build end-to-end real-time data pipelines within minutes.
www.bluedata.com
Software & Services Solution Brief
SOLUTION HIGHLIGHTS
	Accelerate the deployment
of your lab for real-time
analytics with Spark
Streaming, Kafka, and
Cassandra.
	Build real-time data
pipelines with a turnkey
environment for rapid
prototyping, development,
testing, and quality
assurance.
	Provide a standardized user
experience for creating
consistent and repeatable
pipelines, with support
for various stages of the
application lifecycle.
	Improve agility through
self-service, empowering
data scientists to spin up
new clusters in a matter of
minutes—with just a few
mouse clicks.
	Increase developer
productivity with a multi-
tenant sandbox for real-time
analytics, including Zeppelin
notebooks and other
JDBC-supported tools.
	1 year subscription
of BlueData EPIC
Enterprise software for
up to 60 physical cores
(approximately 5 nodes)
+ professional services.
Software & Services Solution Brief 2
What is a Real-Time Data Pipeline?
Today, there are continuous streams of data in large volumes being generated from financial markets, sensors,
machine logs, social media, mobile applications, and many other sources. Rather than staging this data and
analyzing it after it’s been stored, it’s becoming increasingly important to analyze the data in real-time as events
are happening—to make instant decisions and take immediate action. This data is often perishable and may lose its
operational value in a very short time frame. Speed is of the essence.
In the general sense, a data pipeline consists of all the steps necessary to capture, prepare, and process your data
for analysis. Building data pipelines for real-time analysis requires specific technologies and methodologies:
•	Real-time means you can’t afford to wait for days, hours, or even minutes before running your analysis and
generating insights–the tools need to support this fundamental requirement.
•	The technologies and frameworks for real-time analytics are different from existing enterprise systems and
batch-oriented data processing frameworks.
•	There are multiple components (both software and infrastructure) needed to capture and hold streaming data, process
and analyze multiple sets of streams in small batches, and store these valuable results for continuous analysis.
•	It is complex and time-consuming to assemble all the infrastructure and software required, and most
organizations lack the skills to deploy and wire together all of these components.
Implementing the Real-Time Trinity
For data scientists and developers working with real-time data pipelines, the stack of Spark-Kafka-Cassandra has
quickly emerged as the best place to start. This new trinity delivers on key requirements for real-time
data analytics:
•	Spark: a fast in-memory data processing engine, and the fastest growing Apache open source technology.
Spark Streaming is an extension of the core Spark API; it allows integration of real-time data from disparate
event streams.
•	Kafka: a messaging system to capture and publish streams of data. With Spark you can ingest data from
Kafka, filter that stream down to a smaller data set, augment the data, and then push that refined data set
to a persistent data store.
•	Cassandra: this data needs to be written to a scalable and resilient operational database like Cassandra for
persistence, easy application development, and real-time analytics.
These open source frameworks are powerful and well-suited for the requirements of building real-time data
pipelines. However, it may take weeks and even months for your team to get ramped up and started. For example,
you will likely need to train at least one or two team members on Spark, Kafka, and Cassandra (typically one week
of training or more, for each framework). You will need to build pipeline integrations between these different
frameworks and test them internally on your infrastructure (requiring at least another couple weeks or more).
Creating a Spark-Kafka-Cassandra lab for multiple data scientists and developers—to support rapid prototyping,
development, testing, and quality assurance—can be a challenging and expensive initiative. And as you begin to
add more use cases and users, you will need to expand the infrastructure, provision more hardware, and
integrate more tools.
BlueData’s Real-Time Pipeline Accelerator solution is designed to address these challenges—making it simple
and easy to get up and running with this new stack for real-time analytics.
www.bluedata.com
Software & Services Solution Brief 3
www.bluedata.com
Real-Time Pipeline Accelerator
With the new Real-Time Pipeline Accelerator solution from BlueData, your organization will have a ready-to-run,
multi-tenant lab environment for Spark Streaming, Kafka, and Cassandra.
The figure below illustrates an example sandbox environment for multiple data scientists and developers (tenants)
with the BlueData EPIC software platform running on shared, cost-effective infrastructure. BlueData provides an
easy-to-use interface and out-of-the-box support for Zeppelin notebooks, command line access, and other JDBC
tools–along with sample data and use cases for real-time pipelines. Developers and data scientists can self-
provision the key components for their real-time data pipelines in a matter of minutes–using pre-packaged Docker
images for Spark, Kafka, and Cassandra.
The key components of the Real-Time Pipeline Accelerator solution include:
•	1 year subscription license for BlueData EPIC Enterprise up to 60 physical cores (approximately 5 nodes)
•	Pre-configured Docker images for Spark, Kafka, and Cassandra
•	UI-based cluster manager to build real-time data pipelines with Spark Streaming, Kafka and Cassandra
•	Quickstart examples and sample documentation for two end-to-end data pipelines (e.g. log analysis)
•	5 days of remote professional services and consulting from BlueData experts
For pricing questions or additional information, contact sales@bluedata.com
Sandbox for multiple developers
Easy-to-use interface to spin
up instant clusters
Command line access and
web-based notebooks
Sample data and sample use
cases (e.g. log analysis)
Integrated Spark + Kafka +
Cassandra solution
Pre-packaged Docker images
BlueData software platform
Shared infrastructure
Real-Time Data Pipeline
Shared, Centrally Managed Infrastructure
Kafka Spark Streaming Cassandra
>
Software & Services Solution Brief 4
www.bluedata.com
Solution Benefits
BlueData’s Real-Time Pipeline Accelerator solution is designed to accelerate the deployment of data pipelines for
real-time analytics. Some of the benefits include:
•	Self-service agility. Users can spin up or spin down instant virtual clusters of Spark, Kafka, and Cassandra–
on-demand, within minutes.
•	Improved productivity. Developers can start being productive immediately and focus on their use cases.
With web-based Zeppelin notebooks, developers can easily share their datasets and analysis with other users.
•	Consistent and reproducible pipelines. A standardized user experience enables the creation of consistent
and repeatable data pipelines, with support for various stages of the application lifecycle.
•	Sample data pipelines. Sample datasets and use cases are available for immediate use. BlueData experts
will help deploy the Spark-Kafka-Cassandra stack and implement sample data pipelines.
•	Lower cost. Your organization can save up to 75% on server and storage infrastructure, with the ability to
run multiple virtual nodes on shared infrastructure.
•	Scalable. With the solution’s multi-tenant architecture, it’s easy to scale your lab environment and add more
users or infrastructure resources as your deployment grows.
•	Extensible. As your use cases mature and expand beyond the Spark-Kafka-Cassandra stack, BlueData
supports complementary applications and frameworks to extend your pipelines and integrate additional tools.
•	No lock-in. The solution stitches together open source components in a loosely coupled fashion, so there is
no vendor lock-in. The BlueData EPIC Platform is highly configurable, and designed to support a wide variety
of different Big Data use cases and applications.
With this solution, your organization will have a ready-to-run lab environment for rapid prototyping, development,
testing, and quality assurance with Spark Streaming, Kafka, and Cassandra. With help from the BlueData team,
you’ll also have two end-to-end real-time data pipelines as a starting point.
And with your Enterprise license of the BlueData EPIC software platform, you’ll have a multi-tenant infrastructure
platform that can be easily extended to additional Big Data uses cases and applications—for both data in motion and
data at rest—with support for Spark and Hadoop as well as leading business intelligence, analytics, visualization,
and data preparation tools.
To learn more about the BlueData EPIC software platform, visit www.bluedata.com
© 2016 bluedata

Weitere ähnliche Inhalte

Was ist angesagt?

Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
confluent
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
DataWorks Summit
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
DataWorks Summit
 

Was ist angesagt? (20)

Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn Lambda-less Stream Processing @Scale in LinkedIn
Lambda-less Stream Processing @Scale in LinkedIn
 
Stream Analytics
Stream Analytics Stream Analytics
Stream Analytics
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
 
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSetsWebinar: The Modern Streaming Data Stack with Kinetica & StreamSets
Webinar: The Modern Streaming Data Stack with Kinetica & StreamSets
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache Kafka
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Real-time Data Pipelines with SAP and Apache Kafka
Real-time Data Pipelines with SAP and Apache KafkaReal-time Data Pipelines with SAP and Apache Kafka
Real-time Data Pipelines with SAP and Apache Kafka
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on KubernetesApache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes
 
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan SaldichSpark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan Saldich
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 

Andere mochten auch

Kasvisten tuoteturvallisuus alkutuotannossa. opinnäytetyö Jemina Suikki
Kasvisten tuoteturvallisuus alkutuotannossa. opinnäytetyö Jemina SuikkiKasvisten tuoteturvallisuus alkutuotannossa. opinnäytetyö Jemina Suikki
Kasvisten tuoteturvallisuus alkutuotannossa. opinnäytetyö Jemina Suikki
Jemina Suikki
 
The Internal Environment, SWOT and Marketing Plan
The Internal Environment, SWOT and Marketing PlanThe Internal Environment, SWOT and Marketing Plan
The Internal Environment, SWOT and Marketing Plan
Chris Veenker
 

Andere mochten auch (16)

Exec. summary
Exec. summaryExec. summary
Exec. summary
 
Kasvisten tuoteturvallisuus alkutuotannossa. opinnäytetyö Jemina Suikki
Kasvisten tuoteturvallisuus alkutuotannossa. opinnäytetyö Jemina SuikkiKasvisten tuoteturvallisuus alkutuotannossa. opinnäytetyö Jemina Suikki
Kasvisten tuoteturvallisuus alkutuotannossa. opinnäytetyö Jemina Suikki
 
La Flexibilidad (SEBASTIAN AGUILAR GAJARDO)
La Flexibilidad (SEBASTIAN AGUILAR GAJARDO)La Flexibilidad (SEBASTIAN AGUILAR GAJARDO)
La Flexibilidad (SEBASTIAN AGUILAR GAJARDO)
 
「部品」としてのマイコン・半導体
「部品」としてのマイコン・半導体「部品」としてのマイコン・半導体
「部品」としてのマイコン・半導体
 
table of contents
table of contentstable of contents
table of contents
 
Los Carbohidratos (SEBASTIAN AGUILAR GAJARDO)
Los Carbohidratos (SEBASTIAN AGUILAR GAJARDO)Los Carbohidratos (SEBASTIAN AGUILAR GAJARDO)
Los Carbohidratos (SEBASTIAN AGUILAR GAJARDO)
 
「道具」としての 集積回路(LSI)
「道具」としての集積回路(LSI)「道具」としての集積回路(LSI)
「道具」としての 集積回路(LSI)
 
Defence of duress
Defence of duressDefence of duress
Defence of duress
 
Xaco futbol
Xaco futbolXaco futbol
Xaco futbol
 
The Internal Environment, SWOT and Marketing Plan
The Internal Environment, SWOT and Marketing PlanThe Internal Environment, SWOT and Marketing Plan
The Internal Environment, SWOT and Marketing Plan
 
CTOが仕事に対する新しい考え方を教えてくれた話
CTOが仕事に対する新しい考え方を教えてくれた話CTOが仕事に対する新しい考え方を教えてくれた話
CTOが仕事に対する新しい考え方を教えてくれた話
 
WSDM2016報告会−論文紹介(Understanding User Attention and Engagement in Online News R...
WSDM2016報告会−論文紹介(Understanding User Attention and Engagement in Online News R...WSDM2016報告会−論文紹介(Understanding User Attention and Engagement in Online News R...
WSDM2016報告会−論文紹介(Understanding User Attention and Engagement in Online News R...
 
Planes clase practica iv boxeo
Planes clase practica iv boxeoPlanes clase practica iv boxeo
Planes clase practica iv boxeo
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
親に知ってほしい受験勉強
親に知ってほしい受験勉強親に知ってほしい受験勉強
親に知ってほしい受験勉強
 

Ähnlich wie Solution Brief: Real-Time Pipeline Accelerator

StreamAnalytix - Multi-Engine Streaming Analytics Platform
StreamAnalytix - Multi-Engine Streaming Analytics PlatformStreamAnalytix - Multi-Engine Streaming Analytics Platform
StreamAnalytix - Multi-Engine Streaming Analytics Platform
Atul Sharma
 
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
VMware Tanzu
 

Ähnlich wie Solution Brief: Real-Time Pipeline Accelerator (20)

Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017
 
Business Growth Is Fueled By Your Event-Centric Digital Strategy
Business Growth Is Fueled By Your Event-Centric Digital StrategyBusiness Growth Is Fueled By Your Event-Centric Digital Strategy
Business Growth Is Fueled By Your Event-Centric Digital Strategy
 
StreamAnalytix - Multi-Engine Streaming Analytics Platform
StreamAnalytix - Multi-Engine Streaming Analytics PlatformStreamAnalytix - Multi-Engine Streaming Analytics Platform
StreamAnalytix - Multi-Engine Streaming Analytics Platform
 
Apache Kafka Use Cases_ When To Use It_ When Not To Use_.pdf
Apache Kafka Use Cases_ When To Use It_ When Not To Use_.pdfApache Kafka Use Cases_ When To Use It_ When Not To Use_.pdf
Apache Kafka Use Cases_ When To Use It_ When Not To Use_.pdf
 
Computaris builds analytics solution for large datacenter network traffic
Computaris builds analytics solution for large datacenter network trafficComputaris builds analytics solution for large datacenter network traffic
Computaris builds analytics solution for large datacenter network traffic
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
 
Confluent & Attunity: Mainframe Data Modern Analytics
Confluent & Attunity: Mainframe Data Modern AnalyticsConfluent & Attunity: Mainframe Data Modern Analytics
Confluent & Attunity: Mainframe Data Modern Analytics
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
Delivering the power of data using Spring Cloud DataFlow and DataStax Enterpr...
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with Spark
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
 
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
Big Data 2.0 - How Spark technologies are reshaping the world of big data ana...
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
 
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
Spark and Spark Streaming at Netfix-(Kedar Sedekar and Monal Daxini, Netflix)
 

Mehr von BlueData, Inc.

How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
BlueData, Inc.
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark Workloads
BlueData, Inc.
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
BlueData, Inc.
 

Mehr von BlueData, Inc. (18)

Introduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes MeetupIntroduction to KubeDirector - SF Kubernetes Meetup
Introduction to KubeDirector - SF Kubernetes Meetup
 
Dell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big DataDell EMC Ready Solutions for Big Data
Dell EMC Ready Solutions for Big Data
 
BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)BlueData and Hortonworks Data Platform (HDP)
BlueData and Hortonworks Data Platform (HDP)
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)BlueData EPIC datasheet (en Français)
BlueData EPIC datasheet (en Français)
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker Containers
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containers
 
Lessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark WorkloadsLessons Learned from Dockerizing Spark Workloads
Lessons Learned from Dockerizing Spark Workloads
 
BlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec SheetBlueData EPIC on AWS - Spec Sheet
BlueData EPIC on AWS - Spec Sheet
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Hadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperHadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White Paper
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab Accelerator
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 Overview
 
Big Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 TelcoBig Data Case Study: Fortune 100 Telco
Big Data Case Study: Fortune 100 Telco
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for Hadoop
 
Spark Infrastructure Made Easy
Spark Infrastructure Made EasySpark Infrastructure Made Easy
Spark Infrastructure Made Easy
 
BlueData Integration with Cloudera Manager
BlueData Integration with Cloudera ManagerBlueData Integration with Cloudera Manager
BlueData Integration with Cloudera Manager
 

Kürzlich hochgeladen

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Kürzlich hochgeladen (20)

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 

Solution Brief: Real-Time Pipeline Accelerator

  • 1. Software & Services Solution Brief 1 Real-Time Pipeline Accelerator Speed is a key driver for an increasing number of analytics use cases, ranging from fraud detection for financial transactions to Internet of Things (IoT) monitoring with sensor-generated data. Immediate analysis of these new data streams in real-time can bring tremendous value—whether in delivering competitive business advantage, averting potential crises, or creating new revenue opportunities. If your organization wants to enable rapid prototyping and development for real-time analytics, there is now an on-premises solution to help you get started quickly and easily. Spark Streaming + Kafka + Cassandra BlueData makes it easy to deploy Spark infrastructure and applications on- premises. The BlueData EPIC™ software platform is purpose-built to simplify and accelerate the deployment of Spark, Hadoop, and other tools for Big Data analytics—leveraging Docker containers and virtualized infrastructure. Our new Real-Time Pipeline Accelerator solution provides the software and professional services you need for building data pipelines in a multi-tenant environment for Spark Streaming, Kafka, and Cassandra. With this solution, your data scientists will be able to create integrated pipelines to capture, analyze (model/score), and store real-time data streams within a matter of minutes. Now your data scientists and developers can focus on their use cases and pipelines, without worrying about the infrastructure complexities of technologies like Spark, Kafka, and Cassandra. And as your uses cases mature and expand over time, you can use the BlueData EPIC platform to extend your pipelines with complementary applications and frameworks. Target Audience • Organizations looking to get started with real-time data pipelines • Organizations with existing data pipelines using Spark Streaming, Kafka, and Cassandra that need a sandbox environment for Dev/QA/UAT • Data scientists, analysts, engineers, architects, and IT infrastructure teams Get started with Spark Streaming, Kafka, and Cassandra for real-time data analytics. BlueData™ provides a turnkey solution with software and services to accelerate your deployment and build end-to-end real-time data pipelines within minutes. www.bluedata.com Software & Services Solution Brief SOLUTION HIGHLIGHTS Accelerate the deployment of your lab for real-time analytics with Spark Streaming, Kafka, and Cassandra. Build real-time data pipelines with a turnkey environment for rapid prototyping, development, testing, and quality assurance. Provide a standardized user experience for creating consistent and repeatable pipelines, with support for various stages of the application lifecycle. Improve agility through self-service, empowering data scientists to spin up new clusters in a matter of minutes—with just a few mouse clicks. Increase developer productivity with a multi- tenant sandbox for real-time analytics, including Zeppelin notebooks and other JDBC-supported tools. 1 year subscription of BlueData EPIC Enterprise software for up to 60 physical cores (approximately 5 nodes) + professional services.
  • 2. Software & Services Solution Brief 2 What is a Real-Time Data Pipeline? Today, there are continuous streams of data in large volumes being generated from financial markets, sensors, machine logs, social media, mobile applications, and many other sources. Rather than staging this data and analyzing it after it’s been stored, it’s becoming increasingly important to analyze the data in real-time as events are happening—to make instant decisions and take immediate action. This data is often perishable and may lose its operational value in a very short time frame. Speed is of the essence. In the general sense, a data pipeline consists of all the steps necessary to capture, prepare, and process your data for analysis. Building data pipelines for real-time analysis requires specific technologies and methodologies: • Real-time means you can’t afford to wait for days, hours, or even minutes before running your analysis and generating insights–the tools need to support this fundamental requirement. • The technologies and frameworks for real-time analytics are different from existing enterprise systems and batch-oriented data processing frameworks. • There are multiple components (both software and infrastructure) needed to capture and hold streaming data, process and analyze multiple sets of streams in small batches, and store these valuable results for continuous analysis. • It is complex and time-consuming to assemble all the infrastructure and software required, and most organizations lack the skills to deploy and wire together all of these components. Implementing the Real-Time Trinity For data scientists and developers working with real-time data pipelines, the stack of Spark-Kafka-Cassandra has quickly emerged as the best place to start. This new trinity delivers on key requirements for real-time data analytics: • Spark: a fast in-memory data processing engine, and the fastest growing Apache open source technology. Spark Streaming is an extension of the core Spark API; it allows integration of real-time data from disparate event streams. • Kafka: a messaging system to capture and publish streams of data. With Spark you can ingest data from Kafka, filter that stream down to a smaller data set, augment the data, and then push that refined data set to a persistent data store. • Cassandra: this data needs to be written to a scalable and resilient operational database like Cassandra for persistence, easy application development, and real-time analytics. These open source frameworks are powerful and well-suited for the requirements of building real-time data pipelines. However, it may take weeks and even months for your team to get ramped up and started. For example, you will likely need to train at least one or two team members on Spark, Kafka, and Cassandra (typically one week of training or more, for each framework). You will need to build pipeline integrations between these different frameworks and test them internally on your infrastructure (requiring at least another couple weeks or more). Creating a Spark-Kafka-Cassandra lab for multiple data scientists and developers—to support rapid prototyping, development, testing, and quality assurance—can be a challenging and expensive initiative. And as you begin to add more use cases and users, you will need to expand the infrastructure, provision more hardware, and integrate more tools. BlueData’s Real-Time Pipeline Accelerator solution is designed to address these challenges—making it simple and easy to get up and running with this new stack for real-time analytics. www.bluedata.com
  • 3. Software & Services Solution Brief 3 www.bluedata.com Real-Time Pipeline Accelerator With the new Real-Time Pipeline Accelerator solution from BlueData, your organization will have a ready-to-run, multi-tenant lab environment for Spark Streaming, Kafka, and Cassandra. The figure below illustrates an example sandbox environment for multiple data scientists and developers (tenants) with the BlueData EPIC software platform running on shared, cost-effective infrastructure. BlueData provides an easy-to-use interface and out-of-the-box support for Zeppelin notebooks, command line access, and other JDBC tools–along with sample data and use cases for real-time pipelines. Developers and data scientists can self- provision the key components for their real-time data pipelines in a matter of minutes–using pre-packaged Docker images for Spark, Kafka, and Cassandra. The key components of the Real-Time Pipeline Accelerator solution include: • 1 year subscription license for BlueData EPIC Enterprise up to 60 physical cores (approximately 5 nodes) • Pre-configured Docker images for Spark, Kafka, and Cassandra • UI-based cluster manager to build real-time data pipelines with Spark Streaming, Kafka and Cassandra • Quickstart examples and sample documentation for two end-to-end data pipelines (e.g. log analysis) • 5 days of remote professional services and consulting from BlueData experts For pricing questions or additional information, contact sales@bluedata.com Sandbox for multiple developers Easy-to-use interface to spin up instant clusters Command line access and web-based notebooks Sample data and sample use cases (e.g. log analysis) Integrated Spark + Kafka + Cassandra solution Pre-packaged Docker images BlueData software platform Shared infrastructure Real-Time Data Pipeline Shared, Centrally Managed Infrastructure Kafka Spark Streaming Cassandra >
  • 4. Software & Services Solution Brief 4 www.bluedata.com Solution Benefits BlueData’s Real-Time Pipeline Accelerator solution is designed to accelerate the deployment of data pipelines for real-time analytics. Some of the benefits include: • Self-service agility. Users can spin up or spin down instant virtual clusters of Spark, Kafka, and Cassandra– on-demand, within minutes. • Improved productivity. Developers can start being productive immediately and focus on their use cases. With web-based Zeppelin notebooks, developers can easily share their datasets and analysis with other users. • Consistent and reproducible pipelines. A standardized user experience enables the creation of consistent and repeatable data pipelines, with support for various stages of the application lifecycle. • Sample data pipelines. Sample datasets and use cases are available for immediate use. BlueData experts will help deploy the Spark-Kafka-Cassandra stack and implement sample data pipelines. • Lower cost. Your organization can save up to 75% on server and storage infrastructure, with the ability to run multiple virtual nodes on shared infrastructure. • Scalable. With the solution’s multi-tenant architecture, it’s easy to scale your lab environment and add more users or infrastructure resources as your deployment grows. • Extensible. As your use cases mature and expand beyond the Spark-Kafka-Cassandra stack, BlueData supports complementary applications and frameworks to extend your pipelines and integrate additional tools. • No lock-in. The solution stitches together open source components in a loosely coupled fashion, so there is no vendor lock-in. The BlueData EPIC Platform is highly configurable, and designed to support a wide variety of different Big Data use cases and applications. With this solution, your organization will have a ready-to-run lab environment for rapid prototyping, development, testing, and quality assurance with Spark Streaming, Kafka, and Cassandra. With help from the BlueData team, you’ll also have two end-to-end real-time data pipelines as a starting point. And with your Enterprise license of the BlueData EPIC software platform, you’ll have a multi-tenant infrastructure platform that can be easily extended to additional Big Data uses cases and applications—for both data in motion and data at rest—with support for Spark and Hadoop as well as leading business intelligence, analytics, visualization, and data preparation tools. To learn more about the BlueData EPIC software platform, visit www.bluedata.com © 2016 bluedata