2. almo
Google Developer Program Lead PAN EU
@davilagrau
https://www.linkedin.com/in/aleonar
https://www.instagram.com/davilagrau
https://github.com/almo
https://www.facebook.com/davilagrau
3. Lucas Käldström’s Motivation
● Kubernetes’ Dev Community
○ committer
○ maintainer
● Main motivations
○ Learning Google’s
technologies
○ Developing Open Source
○ Re-use old/cheap HW
4. Photo by Ken Treloar on Unsplash
● Hyperparameter tuning
○ Auto ML
● Scaling on QPS
○ ML API
● Ensemble learning
● Data Parallelism
Model Parallelism?
Don’t ask, don’t tell
9. Kubernetes
● Kubernetes is an open-source system
for automating deployment, scaling,
and management of containerized
applications.
● Horizontal scaling
● Service discovery and load balancing
● Self-healing
11. TensorFlow
● TensorFlow is an open source
software library for numerical
computation using data flow
graphs.
● TensorFlow has APIs available in
C++, Python, Java and Go.
● TensorFlow also has bindings
for: C#, Haskell, Julia, Ruby, Rust,
and Scala.
● TensorFlow Lite is TensorFlow’s
lightweight solution for mobile
and embedded devices
Raspberry Pi version
is coming soon!
15. Setting up the master
Settings
● OS: either install Docker on Raspbian, or
○ Download and flash HypriotOS v1.7.1 from
https://goo.gl/y9Jyzd
● Setting up Kubernetes repositories
○ Key / Source
Diagram: master node running kubeadm
Commands
● apt-get update && apt-get install -y kubeadm
● echo `cat /boot/cmdline.txt` cgroup_enable=cpuset > /boot/cmdline.txt
● swapoff -a # Note: needed from Kubernetes 1.8 on, where the kubelet refuses to start with swap enabled
● kubeadm init --pod-network-cidr 10.244.0.0/16
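The `echo ... > /boot/cmdline.txt` command above appends the flag again every time it is run. A minimal Python sketch of an idempotent alternative, assuming the kernel command line is a single line of space-separated options; the helper name `add_kernel_flag` and the sample line are made up for illustration:

```python
def add_kernel_flag(cmdline, flag="cgroup_enable=cpuset"):
    """Return the kernel command line with `flag` present exactly once."""
    options = cmdline.split()
    if flag not in options:
        options.append(flag)
    return " ".join(options)


if __name__ == "__main__":
    # Hypothetical sample; the real line lives in /boot/cmdline.txt.
    sample = "console=tty1 root=/dev/mmcblk0p2 rootwait"
    print(add_kernel_flag(sample))
```

Writing the result back to /boot/cmdline.txt still needs root, and the change only takes effect after a reboot.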
16. Setting up the node (each)
Settings
● OS: either install Docker on Raspbian, or
○ Download and flash HypriotOS v1.7.1 from
https://goo.gl/y9Jyzd
● Setting up Kubernetes repositories
○ Key / Source
Commands
● apt-get update && apt-get install -y kubeadm
● echo `cat /boot/cmdline.txt` cgroup_enable=cpuset > /boot/cmdline.txt
● swapoff -a # Note: needed from Kubernetes 1.8 on, where the kubelet refuses to start with swap enabled
● kubeadm join --token=XXXXX Master-IP
Diagram: Node #2 running kubelet and Docker
17. Setting the network
Flannel: flannel is a virtual network that
gives a subnet to each host for use with
container runtimes
“Platforms like Google's Kubernetes assume that
each container (pod) has a unique, routable IP inside
the cluster. The advantage of this model is that it
reduces the complexity of doing port mapping”
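To make "a subnet to each host" concrete, here is a small sketch that carves the pod network used earlier (10.244.0.0/16) into one subnet per host. It assumes one /24 per host and hands them out in order; flannel's real allocator may pick different subnets, and the host names are placeholders:

```python
import ipaddress


def host_subnets(cluster_cidr, hosts, prefix=24):
    """Assign one subnet of the cluster network to each host, in order."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix)
    return {host: str(subnet) for host, subnet in zip(hosts, subnets)}


if __name__ == "__main__":
    # Hypothetical host names for a three-board cluster.
    plan = host_subnets("10.244.0.0/16", ["master", "node1", "node2"])
    for host, subnet in plan.items():
        print(host, subnet)
```

Because every pod gets an address from its host's subnet, pods on different boards can reach each other directly, with no port mapping.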
20. Parallelization Strategies
Distributed TensorFlow
Explicit (device block): TensorFlow will insert
the appropriate data transfers between the jobs.
import tensorflow as tf

# Pin these ops to the first CPU device.
with tf.device("/cpu:0"):
    a = tf.Variable(3.0)
    b = tf.Variable(3.0)
    c = a * b
Parallelization strategies:
● In-graph replication
● Between-graph replication
● Asynchronous training
● Synchronous training
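The difference between synchronous and asynchronous training can be shown without TensorFlow. This is a toy simulation in plain Python, not the TensorFlow implementation: it fits a single weight `w` to data from y = 3x, split across two hypothetical workers. All function names are made up for illustration.

```python
def grad(w, x, y):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


def shard_grad(w, shard):
    """Mean gradient over one worker's shard of (x, y) pairs."""
    return sum(grad(w, x, y) for x, y in shard) / len(shard)


def sync_step(w, shards, lr):
    """Synchronous: all workers use the same w; one averaged update is applied."""
    grads = [shard_grad(w, shard) for shard in shards]
    return w - lr * sum(grads) / len(grads)


def async_step(w, shards, lr):
    """Asynchronous (simulated): workers read w at the same time, then each
    applies its own update without waiting for the others."""
    stale_w = w  # every worker computed its gradient from this older value
    for shard in shards:
        w -= lr * shard_grad(stale_w, shard)
    return w


if __name__ == "__main__":
    # Toy data for y = 3 * x, split across two hypothetical workers.
    shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
    w_sync = w_async = 0.0
    for _ in range(100):
        w_sync = sync_step(w_sync, shards, lr=0.01)
        w_async = async_step(w_async, shards, lr=0.01)
    print(w_sync, w_async)
```

Synchronous training waits for the slowest worker on every step. Asynchronous training never waits, but its updates are computed from stale parameters.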
TensorFlow Serving
It might also be
a function
21. TensorFlow Cluster
A TensorFlow "cluster" is a set of "tasks" that participate
in the distributed execution of a TensorFlow graph.
Steps:
1. Create a tf.train.ClusterSpec that describes all of the
tasks in the cluster. This should be the same for each
task.
2. Create a tf.train.Server, passing the
tf.train.ClusterSpec to the constructor, and identifying
the local task with a job name and task index.
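The two steps above can be sketched with the TensorFlow 1.x API that the slides use. The host names and port are placeholders and `start_server` is a made-up helper. On TensorFlow 2.x the same server class should be reachable as `tf.compat.v1.train.Server`.

```python
# Hypothetical cluster layout: one parameter server and two workers.
CLUSTER = {
    "ps": ["node0:2222"],
    "worker": ["node1:2222", "node2:2222"],
}


def start_server(job_name, task_index):
    """Start the in-process TensorFlow server for one task of CLUSTER."""
    import tensorflow as tf  # imported lazily; assumes TensorFlow 1.x

    # Step 1: the same ClusterSpec is built on every task.
    cluster = tf.train.ClusterSpec(CLUSTER)
    # Step 2: job_name and task_index identify the local task.
    return tf.train.Server(cluster, job_name=job_name, task_index=task_index)
```

A parameter server task would typically call `start_server("ps", 0).join()` and block there, while each worker task passes `server.target` to its training session.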
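25. Sharding Variables across Multiple Parameter Servers
# replica_device_setter places variables on the PS tasks (pass cluster=... in practice).
with tf.device(tf.train.replica_device_setter()):
    ...  # build the model here
# The chief (task 0) handles initialization and checkpoints.
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=(FLAGS.task_index == 0),
                                       checkpoint_dir="/tmp/train_logs",
                                       hooks=hooks) as mon_sess:
    ...  # run training steps here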
Diagram: parameter servers (PS) on Node #0, Node #1 and Node #2
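By default, `tf.train.replica_device_setter` spreads variables over the parameter server tasks in round-robin order. The sketch below imitates that placement in plain Python to show the resulting sharding; it is not TensorFlow's own code, and the variable names are made up:

```python
def place_variables(var_names, num_ps):
    """Map each variable to a parameter-server device, round-robin."""
    return {
        name: "/job:ps/task:%d" % (i % num_ps)
        for i, name in enumerate(var_names)
    }


if __name__ == "__main__":
    # Hypothetical variables of a two-layer model, sharded over 3 PS tasks.
    placement = place_variables(["w1", "b1", "w2", "b2"], num_ps=3)
    for name, device in placement.items():
        print(name, device)
```

Round-robin balances the number of variables per server, not their size, so one very large variable can still overload a single parameter server.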
26. Docker Image
CPU only, e.g. Raspberry Pi
● TensorFlow 1.1 https://goo.gl/URUpko
● Official TensorFlow Lite for Raspberry Pi,
coming soon! https://goo.gl/viqtuQ
● resin/rpi-raspbian + TensorFlow 1.4,
coming soon!
28. Pod Controllers: Stateful Sets (PS & Workers)
● Manages the deployment and
scaling of a set of Pods.
● Provides guarantees about the
ordering and uniqueness of
these Pods.
● StatefulSet manages Pods that
are based on an identical
container spec.
● StatefulSet maintains a sticky
identity for each of their Pods.
Diagram: StatefulSet #0 and StatefulSet #1, with PS pods on Node #0 and Node #1 and Worker pods on Node #0 and Node #1
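The sticky identity matters for TensorFlow because every task must know the address of every other task in advance. A StatefulSet pod keeps a stable name of the form `<statefulset>-<ordinal>`, and a headless Service makes it resolvable as `<pod>.<service>` inside the namespace. This sketch builds the dictionary that would be passed to `tf.train.ClusterSpec` from those names; the names `tf-ps` and `tf-worker` and port 2222 are assumptions for illustration:

```python
def statefulset_hosts(name, service, replicas, port=2222):
    """Stable host:port addresses of the pods in one StatefulSet."""
    return ["%s-%d.%s:%d" % (name, i, service, port) for i in range(replicas)]


def cluster_dict(num_ps, num_workers):
    """Cluster description with one StatefulSet per TensorFlow job."""
    return {
        "ps": statefulset_hosts("tf-ps", "tf-ps", num_ps),
        "worker": statefulset_hosts("tf-worker", "tf-worker", num_workers),
    }


if __name__ == "__main__":
    print(cluster_dict(num_ps=2, num_workers=2))
```

Each pod can derive its own task index from the ordinal at the end of its hostname, so the same container spec works for every replica.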
34. Kubernetes: orchestrating Raspberry Pi images
to deploy a TensorFlow server.
Description:
Structure:
- Use case: IoT / processing information
- Architecture: Kubernetes + TensorFlow + Raspberry Pi
- Python
- Introduction to Kubernetes (Ansible?) (Laura)
- Master / Slave architecture (Laura)
- Container description: labelling and pod matching
(Laura)
- Configuration (Laura)
- Load balancer (Laura)
- round robin HTTP request (Laura)
- Monitoring load of the replicas (Laura)
- Failure tolerance (Laura)
- Introduction to TensorFlow
- Introduction to TensorFlow / Machine Learning
- Computation Graph
- Introduction to TensorFlow Server
- Development of Use Case