"Awesomeness of Container Orchestras - Data Crunching in a Company Builder", Martin Held, Data Scientist at FinLeap
Watch more from Data Natives Tel Aviv 2016 here: http://bit.ly/2hw1MY0
Visit the conference website to learn more: http://telaviv.datanatives.io/
About the Author:
Martin first encountered the great potential of data analysis when he saw how it can be used to significantly improve cancer diagnostics. Motivated by this early experience, he decided to pursue a PhD in the field, setting out to investigate the dynamics of molecular systems using fancy statistical modelling techniques. After finishing his PhD he moved on to a big pharmaceutical corporation to facilitate the automation of statistical analysis and reporting of pre-clinical research experiments. His next career step took him into a more dynamic environment: an organisation whose mission is to build and incubate startups. There he now works on different venture projects tackling a wide range of data-related questions, ranging from click-through rate optimisation to microservice-based data processing.
5. How we did it
WebUI Backend
Service A Service B
Monolithic Application (typically RoR)
SQL
Data
6. Some Learnings
• finding developers is hard, finding developers experienced in a specific tech stack is even harder
• integrating data science like functionality is not straightforward
• scaling the monolithic application can be challenging
7. How we do it these days
WebUI
SQL
APIGateway
MsgBroker
Backend
Service A
Service B
we plan and implement containerised microservice architectures
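A minimal sketch of how two services can stay decoupled by talking only through a message broker, as in the architecture above. Everything here is an assumption for illustration: `InMemoryBroker`, the topic names and the two service functions are hypothetical, and a real setup would run each service in its own container and use a real broker product.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Toy, single-process stand-in for a real message broker (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver synchronously to every handler registered for the topic.
        for handler in list(self._subscribers[topic]):
            handler(message)


broker = InMemoryBroker()
results: List[dict] = []


def service_a(message: dict) -> None:
    # Service A computes something and hands the result on via the broker.
    broker.publish("scored", {"id": message["id"], "score": len(message["text"])})


def service_b(message: dict) -> None:
    # Service B only knows the "scored" topic, not who produced the message.
    results.append(message)


broker.subscribe("requests", service_a)
broker.subscribe("scored", service_b)

# The backend behind the API gateway only needs to know the topic name.
broker.publish("requests", {"id": 1, "text": "hello"})
print(results)
```

Because the services share only topic names and message shapes, each one could be written in a different language or scaled separately.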
8. This allows us
• multilingual applications
• better utilisation of existing dev resources
• broader talent pool for recruitment
• plug-in data science solutions as service
• scaling
  • tech wise - "scale bottleneck services"
  • business wise - dedicated teams for different tasks
• reuse of services (authentication etc.)
10. A Docker Container
Container 1 Container 2 Container 3
- package applications with their dependencies
- more lightweight than virtual machines (they share the OS)
- run on any computer, any infrastructure, any cloud
11. Life in a (DataScientist) Container
FROM ubuntu:14.04
ENV PYTHONPATH /opt/caffe/python
# Add caffe binaries to path
ENV PATH $PATH:/opt/caffe/.build_release/tools
# Get dependencies
RUN apt-get update && apt-get install -y bc cmake curl gcc-4.6 g++-4.6 wget
# Use gcc 4.6
RUN update-alternatives --install /usr/bin/cc cc /usr/bin/gcc-4.6 30 &&
# Clone the Caffe repo
RUN cd /opt && git clone https://github.com/BVLC/caffe.git
# Build Caffe core
RUN cd /opt/caffe && cp Makefile.config.example Makefile.config &&
# Add ld-so.conf so it can find libcaffe.so
ADD caffe-ld-so.conf /etc/ld.so.conf.d/
# Run ldconfig again (not sure if needed)
RUN ldconfig
# Install python deps
RUN cd /opt/caffe &&
12. Life in a (DataScientist) Container
RUN wget -O models/deploy.prototxt https://raw.githubusercontent.com/BVLC/caff
deploy.prototxt
13. Life in a (DataScientist) Container
Use Ubuntu base image
Install OS dependencies (g++, python, git, fortran, curl, etc.)
Clone Caffe repo (open source Deep Learning Lib)
Build Caffe Core
Install Python Dependencies
Build Caffe Python bindings
Add Model and Source Code
Specify Execution Command
14. Life in a (DataScientist) Container
Image Recognition Al
fish, aquarium, child
16. The Orchestra
Scheduling
place and start container on host(s) offering required
resources
Service Discovery + Registration
allow containers to communicate with each other and
the rest of the world
Implement Resilience
e.g. auto restart containers in case of failure
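The three orchestration duties listed on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyOrchestrator`, `Host` and `ContainerSpec` are hypothetical names, placement uses a simple first-fit rule, and no real orchestration tool's API is being reproduced.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    free_cpu: float
    free_mem_mb: int
    containers: List[str] = field(default_factory=list)


@dataclass
class ContainerSpec:
    name: str
    cpu: float
    mem_mb: int


class ToyOrchestrator:
    """Hypothetical in-memory model of scheduling, discovery and restarts."""

    def __init__(self, hosts: List[Host]) -> None:
        self.hosts = hosts
        self.specs: Dict[str, ContainerSpec] = {}
        self.registry: Dict[str, str] = {}  # container name -> host name

    def schedule(self, spec: ContainerSpec) -> Optional[str]:
        # Scheduling: first host that offers the required resources wins.
        for host in self.hosts:
            if host.free_cpu >= spec.cpu and host.free_mem_mb >= spec.mem_mb:
                host.free_cpu -= spec.cpu
                host.free_mem_mb -= spec.mem_mb
                host.containers.append(spec.name)
                self.specs[spec.name] = spec
                self.registry[spec.name] = host.name  # registration
                return host.name
        return None  # no host has enough free resources

    def discover(self, name: str) -> Optional[str]:
        # Service discovery: look up where a container currently runs.
        return self.registry.get(name)

    def handle_failure(self, name: str) -> Optional[str]:
        # Resilience: free the failed container's resources, then reschedule.
        spec = self.specs.pop(name)
        host_name = self.registry.pop(name)
        host = next(h for h in self.hosts if h.name == host_name)
        host.containers.remove(name)
        host.free_cpu += spec.cpu
        host.free_mem_mb += spec.mem_mb
        return self.schedule(spec)


orchestrator = ToyOrchestrator([Host("host-1", 1.0, 1024), Host("host-2", 4.0, 8192)])
placed_api = orchestrator.schedule(ContainerSpec("api", 0.5, 512))
placed_fx = orchestrator.schedule(ContainerSpec("feature-extractor", 2.0, 4096))
restarted_on = orchestrator.handle_failure("feature-extractor")
print(placed_api, placed_fx, restarted_on)
```

The small `api` container fits on the first host, while the larger `feature-extractor` only fits on the second, both initially and after the simulated failure.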
24. Reverse Image Search
Crawler: Product Pictures, Metadata -> Product Picture, Raw Metadata
Message Broker
Feature Extractor: Picture Features
Metadata Parser: Raw Metadata -> Structured Metadata
StorageSink: Store Picture Features, Structured Data
Nearest Neighbor Search: Query Features -> Similar Pictures
API: Query Picture -> Similar Pictures
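The nearest neighbor search step can be sketched as a brute-force lookup over stored feature vectors. This is an illustrative assumption, not the implementation from the talk: the feature values and product IDs are made up, Euclidean distance is one possible similarity measure, and a production service would likely use an index rather than scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(
    query: Sequence[float],
    store: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k stored pictures closest to the query, nearest first."""
    scored = [(picture_id, euclidean(query, feats)) for picture_id, feats in store.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


# Hypothetical picture features, as the StorageSink might have stored them.
feature_store = {
    "shoe-red": [0.9, 0.1, 0.0],
    "shoe-blue": [0.8, 0.2, 0.1],
    "lamp": [0.0, 0.9, 0.7],
}

# Hypothetical features extracted from the query picture.
query_features = [0.85, 0.1, 0.0]
similar = nearest_neighbors(query_features, feature_store, k=2)
print([picture_id for picture_id, _ in similar])
```

The API would pass the query picture through the same feature extractor used for the crawled pictures, so that query and stored vectors are comparable.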
25. to take home
containers are a great tool to package code with all its dependencies and make it usable by others
they allow us to develop scalable, plug-in ready data science solutions
complex, scalable and resilient architectures for the masses
26. Thank you for your attention!
www.linkedin.com/in/martin-held