This course teaches participants the following skills:
Design and build data processing systems on Google Cloud Platform
Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
Derive business insights from extremely large datasets using Google BigQuery
Train, evaluate, and predict with machine learning models using TensorFlow and Cloud ML
Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
Enable instant insights from streaming data
6. Data Services
Use: Process big datasets easily and at low cost with Google Cloud Dataproc,
an Apache Hadoop, Apache Spark, Apache Pig, and Apache Hive service.
Control: Keep costs down by quickly creating managed clusters of any size and
turning them off when you're done.
Integrate: Cloud Dataproc integrates across Google Cloud Platform products, giving
you a powerful and complete data processing platform.
8. Data Services
Dataflow is a unified programming model
and a managed service for developing
and executing a wide range of data
processing patterns including ETL, batch
computation, and continuous computation.
Cloud Dataflow frees you from operational
tasks like resource management and
performance optimization.
9. Data Services
These examples give you a sense of
the processing capabilities of Dataflow.
In the simple pipeline model, data is
read from a source into a PCollection,
transformed, and output. The pipeline
is a Directed Acyclic Graph (DAG).
In the multiple-transform pipeline,
data read from BigQuery is filtered into
two collections based on the initial
character of the name.
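The two pipeline shapes described above can be sketched in plain Python. This is not the Dataflow or Apache Beam SDK: a list stands in for a PCollection, and the sample names and the "A"/"D" split are hypothetical, since the slide only says the split is on the initial character of the name.

```python
from typing import Callable, Dict, Iterable, List

# A plain list stands in for a PCollection in this sketch.
Names = List[str]


def read_source() -> Names:
    """Stand-in for a source such as a BigQuery table (hypothetical data)."""
    return ["Alice", "Andrew", "Dana", "David", "Bob"]


def filter_by_initial(names: Iterable[str], initial: str) -> Names:
    """Transform: keep only the names that start with the given character."""
    return [name for name in names if name.startswith(initial)]


def simple_pipeline(transform: Callable[[str], str]) -> Names:
    """Simple model: source -> one transform -> output."""
    return [transform(name) for name in read_source()]


def branching_pipeline() -> Dict[str, Names]:
    """Multiple transforms: one input collection feeds two independent filters."""
    names = read_source()
    return {
        "A": filter_by_initial(names, "A"),
        "D": filter_by_initial(names, "D"),
    }


upper_names = simple_pipeline(str.upper)
branches = branching_pipeline()
```

Because each step only reads its input and produces a new collection, the steps form a graph with no cycles, which is what lets a managed service schedule and scale them independently.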
11. Data Services
BigQuery is Google's fully managed, petabyte-scale, low-cost
enterprise data warehouse for analytics. BigQuery
is serverless.
There is no infrastructure to manage and you don't need a
database administrator, so you can focus on analyzing data to
find meaningful insights using familiar SQL.
BigQuery is a powerful Big Data analytics platform used by all
types of organizations, from startups to Fortune 500 companies.
14. Cloud Pub/Sub
Cloud Pub/Sub is a fully managed, real-time messaging
service that allows you to send and receive messages
between independent applications.
Decouples the sender and receiver
Push/Pull Data Service
Asynchronous communications
Many benefits over direct communication
https://cloud.google.com/pubsub/architecture
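As a rough illustration of how a topic decouples sender and receiver, the following in-memory sketch models publish and pull. It does not use the Cloud Pub/Sub client library; the `Topic` class, its methods, and the subscription names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class Topic:
    """In-memory stand-in for a topic with named subscriptions."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}

    def subscribe(self, subscription: str) -> None:
        # Each subscription gets its own independent queue.
        self._queues.setdefault(subscription, deque())

    def publish(self, message: str) -> None:
        # The publisher never references a receiver; every subscription gets a copy.
        for queue in self._queues.values():
            queue.append(message)

    def pull(self, subscription: str, max_messages: int = 10) -> List[str]:
        # Subscribers fetch messages later, at their own pace (asynchronous).
        queue = self._queues[subscription]
        count = min(max_messages, len(queue))
        return [queue.popleft() for _ in range(count)]


topic = Topic()
topic.subscribe("billing")
topic.subscribe("audit")
topic.publish("order-1")
topic.publish("order-2")

billing_batch = topic.pull("billing", max_messages=1)
audit_batch = topic.pull("audit")
```

The publisher only knows the topic, so subscribers can be added, removed, or slowed down without any change on the sending side.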
15. Benefits of Cloud Pub/Sub
Scales globally
Low latency
Dynamic rate limiting
Availability
Durability - replicated storage of messages
Reliability
End-to-end reliability via application ACKs
Security
Encryption in motion and at rest
Maintenance
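The "end-to-end reliability via application ACKs" point can be illustrated with a small in-memory sketch: a message stays stored until the application acknowledges it. The class and function names are hypothetical, and redelivery is simplified to "every pull returns everything unacknowledged", which is an assumption of this sketch and not a description of the real service's timing.

```python
from typing import Callable, Dict, List, Tuple


class AckingSubscription:
    """Messages stay stored until the application acknowledges them."""

    def __init__(self) -> None:
        self._unacked: Dict[int, str] = {}
        self._next_id = 0

    def deliver(self, message: str) -> None:
        self._unacked[self._next_id] = message
        self._next_id += 1

    def pull(self) -> List[Tuple[int, str]]:
        # Every unacknowledged message is returned again on each pull.
        return sorted(self._unacked.items())

    def ack(self, ack_id: int) -> None:
        # Only an explicit ACK removes the message from storage.
        self._unacked.pop(ack_id, None)


def process(sub: AckingSubscription, handler: Callable[[str], None]) -> None:
    for ack_id, message in sub.pull():
        try:
            handler(message)
        except Exception:
            continue  # no ACK sent, so the message remains for a later pull
        sub.ack(ack_id)


attempts = {"b": 0}
handled: List[str] = []


def flaky_handler(message: str) -> None:
    # Fails once on message "b" to simulate a transient error.
    if message == "b" and attempts["b"] == 0:
        attempts["b"] += 1
        raise RuntimeError("transient failure")
    handled.append(message)


sub = AckingSubscription()
sub.deliver("a")
sub.deliver("b")
process(sub, flaky_handler)          # "a" is acknowledged, "b" fails
remaining_after_first = sub.pull()   # "b" is still stored
process(sub, flaky_handler)          # "b" succeeds on the second attempt
```

Acknowledging only after the handler succeeds is what makes the guarantee end to end: a crash between receipt and processing does not lose the message.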
18. Cloud Pub/Sub Use Cases
Balancing workloads in network clusters
Implementing asynchronous workflows
Distributing event notifications
Refreshing distributed caches
Logging to multiple systems
Data streaming from various processes or devices
Reliability improvement
20. Cloud Endpoints
Exposes an API that lets a mobile or web application front end
use cloud-based application services
Frees developers from writing wrappers to access App
Engine resources from a mobile or web client
24. Cloud Functions
Event-based microservices
● Fully managed, serverless,
secure
● Triggers
○ Cloud Pub/Sub, HTTP,
Cloud Storage
● Code
○ Deploy functions from a Cloud
Storage bucket, GitHub, or
Bitbucket repo
○ Written in JavaScript and run
in Node.js
● Stackdriver integration
25. Cloud Functions
Compare Cloud Functions with Cloud Endpoints. Cloud Endpoints exposes a set
of endpoints, or API functions, whereas Cloud Functions exposes a single endpoint.
The Cloud Endpoints backend is an App Engine backend, so you have a long-running
programming environment with full access to complex data and storage services.
In Cloud Functions, you have one single piece of code that accepts a limited input,
executes rapidly, produces some output, and exits.
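The contrast can be sketched conceptually as follows. The slides state that Cloud Functions are written in JavaScript; Python is used here only to show the shape of each style, and every name below (`make_thumbnail_name`, `ApiBackend`, the `items.*` methods) is hypothetical and not part of either product's API.

```python
from typing import Callable, Dict, List


def make_thumbnail_name(event: Dict[str, str]) -> Dict[str, str]:
    """Function style: one entry point, small input, quick result, then exit."""
    name = event.get("name", "")
    return {"status": "ok", "thumbnail": f"thumb-{name}"}


class ApiBackend:
    """Endpoints style: a long-running backend exposing several API methods."""

    def __init__(self) -> None:
        # State lives as long as the backend does, across many requests.
        self._items: Dict[str, str] = {}
        self._routes: Dict[str, Callable[..., object]] = {
            "items.insert": self._insert,
            "items.get": self._get,
            "items.list": self._list,
        }

    def _insert(self, key: str, value: str) -> str:
        self._items[key] = value
        return key

    def _get(self, key: str) -> str:
        return self._items[key]

    def _list(self) -> List[str]:
        return sorted(self._items)

    def call(self, method: str, *args: str) -> object:
        # One backend dispatches to many API methods.
        return self._routes[method](*args)


result = make_thumbnail_name({"name": "cat.png"})

backend = ApiBackend()
backend.call("items.insert", "k", "v")
fetched = backend.call("items.get", "k")
listed = backend.call("items.list")
```

The function keeps nothing between invocations, while the backend object holds data across calls, mirroring the long-running environment described above.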