Do you know how to use StreamSets Data Collector with Google Cloud Platform (GCP)? In this session we'll explain how YaloChat designed and implemented a streaming architecture that is sustainable, operable and scalable. Discover how we deployed Data Collector to integrate GCP components such as Pub/Sub and BigQuery to achieve DataOps in the cloud.
10. https://www.yalochat.com/
DataOps
"DataOps is a methodology that spans people, processes, tools, and services to enable
enterprises to rapidly, repeatedly, and reliably deliver production data from a vast array
of enterprise data sources to a vast array of enterprise data consumers."
Getting DataOps Right
DataOps Principles:
• Continually satisfy your customer
• Orchestrate
• Monitor quality and performance
https://www.dataopsmanifesto.org/
Some Steps to Implement DataOps
Add data and logic tests
Use a version control system
Branch and merge
Use multiple environments
Reuse & Containerize
Parameterize your processing
https://www.datakitchen.io/content/DataKitchen_dataops_cookbook.pdf
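Two of the steps above, "Parameterize your processing" and "Add data and logic tests", can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the pipeline from the talk: `PipelineParams`, `clean_events`, `check_logic`, `check_data` and the event field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineParams:
    """Hypothetical per-environment parameters (e.g. dev, staging, prod)."""
    environment: str
    required_fields: tuple
    max_text_length: int


def clean_events(events, params):
    """Split events into (valid, rejected) according to the parameters."""
    valid, rejected = [], []
    for event in events:
        missing = [f for f in params.required_fields if f not in event]
        too_long = len(str(event.get("text", ""))) > params.max_text_length
        if missing or too_long:
            rejected.append(event)
        else:
            valid.append(event)
    return valid, rejected


def check_logic(events, valid, rejected):
    """Logic test: the step neither loses nor duplicates events."""
    return len(valid) + len(rejected) == len(events)


def check_data(valid, params):
    """Data test: every accepted event carries all required fields."""
    return all(f in e for e in valid for f in params.required_fields)


# Made-up sample data, for illustration only.
dev = PipelineParams("dev", ("event_id", "timestamp"), max_text_length=20)
sample = [
    {"event_id": "1", "timestamp": "2020-01-01T00:00:00Z", "text": "hi"},
    {"event_id": "2", "text": "no timestamp"},
]
valid, rejected = clean_events(sample, dev)
```

Because the thresholds live in `PipelineParams` rather than in the function body, the same code can run in several environments with different settings, and the two checks can run against each of them.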
Manage pipelines from a central repository
View published pipelines, filter by type, drill down into pipeline config
Inspect pipeline version history
View and monitor status of pipelines
Compute Engine is an infrastructure-as-a-service offering:
we configure every aspect of the infrastructure and
manage our own resources. The service is billed by
resource usage.
https://cloud.google.com/docs/
Cloud Storage: an object storage system for archiving
unstructured data and large files (petabyte scale). It is a
managed service that integrates easily with the other
services of the Google Cloud Platform.
https://cloud.google.com/docs/
BigQuery: an interactive database for analyzing large volumes of
data with very fast response times. It manages the
infrastructure and resources automatically for fast
and efficient operation. It is billed for usage and
storage.
https://cloud.google.com/docs/
Cloud Pub/Sub brings the flexibility and reliability
of enterprise message-oriented middleware to the
cloud. It is a scalable, durable event ingestion and
delivery system that provides low-latency
messaging, helping developers quickly integrate
systems.
https://cloud.google.com/docs/
Small steps for a great result
Assessment
• Inventory of information sources
• Identify metrics
• Introduce BI terminology
Identified gaps
• Technical debt
• No defined processes
• Several separate, uncoordinated efforts
• Data silos
Define action points
• Implement data governance.
• Implement a DataOps framework.
• Define the right tools.
Align with the goals
• Prioritize according to the business
• Think in outcomes
• Small deliverables
Lessons
It represents a cultural change
Think of the Use Cases to achieve outcomes
Start developing Data Governance
Start with small sprints and goals, and then grow
StreamSets helps us achieve the goals
Look to automate Ops through the cloud
I recommend starting with SDC and evolving to Control Hub
Use Case
Implement an architecture that allows capturing events
in real time, keeping them in a message queue. It is
necessary to send the events to a historical repository
and to the Data Warehouse for further analysis.
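The flow described in this use case (capture events, hold them in a message queue, then deliver each one to both a historical repository and the Data Warehouse) can be sketched as below. This is a minimal sketch, not the architecture from the talk: it uses in-memory stand-ins from the Python standard library instead of any GCP client library, and `capture`, `fan_out` and the event fields are hypothetical names.

```python
import json
import queue


def capture(events, message_queue):
    """Producer side: serialize each incoming event and enqueue it."""
    for event in events:
        message_queue.put(json.dumps(event))


def fan_out(message_queue, archive, warehouse):
    """Consumer side: drain the queue, sending every event to both sinks."""
    delivered = 0
    while True:
        try:
            raw = message_queue.get_nowait()
        except queue.Empty:
            break
        archive.append(raw)                # historical repository keeps the raw payload
        warehouse.append(json.loads(raw))  # warehouse gets a parsed row for analysis
        delivered += 1
    return delivered


message_queue = queue.Queue()  # stand-in for the real message queue
archive = []                   # stand-in for the historical repository
warehouse = []                 # stand-in for the Data Warehouse table

# Made-up sample events, for illustration only.
capture(
    [
        {"event_id": "1", "type": "message_sent"},
        {"event_id": "2", "type": "message_read"},
    ],
    message_queue,
)
delivered = fan_out(message_queue, archive, warehouse)
```

Keeping the raw payload in the archive and a parsed row in the warehouse means the historical copy can be replayed later if the warehouse schema changes.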