@ITCAMPRO #ITCAMP17 Community Conference for IT Professionals
Scaling Face Recognition
with Big Data
Bogdan BOCȘE
Solutions Architect & Co-founder VisageCloud
https://VisageCloud.com
https://www.linkedin.com/in/bogdanbocse/
https://twitter.com/bocse
Agenda
• How to learn?
• What to learn?
• Defining learning objectives
• How to scale learning?
• Gotchas
• VisageCloud
–Architecture
–Use Cases
Objectives
• What questions to ask before writing the code?
• How to look at the data before feeding it to the machine?
• What is the state of the art regarding ML?
• What frameworks to use?
• What are the common traps to avoid?
• How to design for scale?
HOW TO LEARN?
How to Learn?
Vision
• Convolutional Neural Networks
• Inception Paper
NLP
• Word2Vec
• GloVe: Global Vectors for Word Representation
Generic
• Classification
• Prediction
Convolutional Neural Networks: Big Picture
Convolutional Neural Networks: Components
• Pooling / Max Pooling
• Convolution
• Fully Connected Activation
– Activation Function, e.g. ReLU
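As a rough illustration of how the components on this slide fit together, here is a minimal sketch in plain Python with no framework. The 5×5 image, the 2×2 edge kernel, and the function names are assumptions made for the example, not material from the talk; real networks learn their kernels and rely on optimized libraries.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the core of a convolution layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Activation function: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(w)
        ]
        for i in range(h)
    ]


# Illustrative input: a dark left side and a bright right side (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Illustrative kernel that responds to dark-to-bright transitions from left to right.
edge_kernel = [[-1, 1], [-1, 1]]

features = max_pool(relu(conv2d(image, edge_kernel)))
print(features)
```

On this input the pooled map is 2×2, and only its left column is non-zero, because that is where the edge falls.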
Taking a Step Back: The Math
• Learning is an optimization problem
–Find parameters of a system (neural network) that minimize a fixed error function
–Not unlike planning orbital paths
• Defining the network architecture
• Defining the training algorithm
–Stochastic Gradient Descent
• With momentum
• With noise
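To make "learning is an optimization problem" concrete, the sketch below fits a line y ≈ w·x + b with stochastic gradient descent plus momentum. The toy data set, the learning rate, the momentum value, and all names are assumptions chosen for illustration; a neural network has far more parameters, but the update rule has the same shape.

```python
import random


def sgd_momentum(data, lr=0.01, momentum=0.9, epochs=200, seed=0):
    """Minimize the squared error of y ~ w*x + b, one sample at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0      # parameters to learn
    vw, vb = 0.0, 0.0    # velocity terms that carry the momentum
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)          # "stochastic": visit samples in random order
        for x, y in samples:
            err = (w * x + b) - y
            grad_w, grad_b = err * x, err   # gradient of 0.5 * err**2
            vw = momentum * vw - lr * grad_w
            vb = momentum * vb - lr * grad_b
            w += vw
            b += vb
    return w, b


# Noise-free samples of y = 3x + 1 (illustrative data).
data = [(x, 3.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = sgd_momentum(data)
print(w, b)
```

With these settings the result should land close to w = 3 and b = 1. The momentum term keeps part of the previous step, which smooths the path and can help the search roll through shallow dips.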
Frameworks
• DeepLearning4j
– Independent company
– Java interface with C-bindings for performance
• TensorFlow
– Python & C++ API
– Developed by Google
– Compatible with TPU
• Torch
– Developed by Facebook
– Written in LuaJIT, with Python bindings
WHAT TO LEARN?
Data Sets
• Public data sets
–Labeled Faces in the Wild (LFW)
–YouTube Faces
–Kaggle
• Private data sets
• Build your own
–Outsourcing: Mechanical Turk
–Crowdsourcing: reCAPTCHA model
Preparing Data
Clean data
• Cropping
• Structure
• Homogeneity
• Normalization
• Histograms
• Filtering
Preparing Data
• Machine learning is not magic
• If you can’t understand the data, a machine probably won’t either
• Preprocessing makes the difference between good and bad results
• Applying filters, normalization, and anomaly detection is computationally inexpensive
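Two of the inexpensive preprocessing steps mentioned here, normalization and histogram-based correction, can be sketched in a few lines. The example treats an image as a flat list of 8-bit grayscale values; the sample pixels and function names are assumptions for illustration, and a production pipeline would normally use an image library instead.

```python
def normalize(pixels):
    """Rescale intensities linearly to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def histogram(pixels, levels=256):
    """Count how many pixels fall on each integer intensity level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def equalize(pixels, levels=256):
    """Histogram equalization: remap intensities through the cumulative distribution."""
    counts = histogram(pixels, levels)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:          # a flat image has nothing to spread out
        return list(pixels)
    return [round((cdf[p] - cdf_min) / (total - cdf_min) * (levels - 1))
            for p in pixels]


# Illustrative low-contrast patch: every value sits between 50 and 60.
dark = [50, 50, 52, 54, 54, 56, 58, 60]
normalized = normalize(dark)
equalized = equalize(dark)
print(normalized)
print(equalized)
```

Both functions run in time linear in the number of pixels, which is negligible next to the cost of training or evaluating a network.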
DEFINING LEARNING OBJECTIVES
Defining learning objectives
• Supervised
–Classification
–Scoring and regression
–Identification
• Unsupervised
–Clustering
Classification
• Projecting input onto a fixed set of classes
• “Don’t use a cannon to kill a fly”
–Support Vector Machines
• Linear
• Radial Basis Functions
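A support vector machine compares samples through a kernel function, and the two variants named on this slide differ only in that function. The sketch below shows both kernels in plain Python; the `gamma` parameter and the function names are assumptions for illustration, and training the classifier itself is left to a library.

```python
import math


def linear_kernel(x, y):
    """Plain dot product: similarity grows with alignment of the two vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function: 1.0 for identical points, decaying with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


print(linear_kernel((1, 2), (3, 4)))
print(rbf_kernel((0, 0), (1, 0)), rbf_kernel((0, 0), (3, 4)))
```

The linear kernel suits classes that a hyperplane can separate; the radial basis function lets the decision boundary curve around clusters, at the cost of one more parameter to tune.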
Identification
• Embedding
–Projecting input (image) onto a vector space with a known property
• Triplet Loss Function
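The triplet loss can be written down directly. The sketch below uses the common formulation with squared Euclidean distances and a margin; the 2D embeddings and the margin of 0.2 are assumptions chosen for readability, since real face embeddings have many more dimensions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by the margin."""
    d_pos = squared_distance(anchor, positive)   # same person
    d_neg = squared_distance(anchor, negative)   # different person
    return max(0.0, d_pos - d_neg + margin)


anchor, positive = (0.0, 0.0), (0.1, 0.0)
easy = triplet_loss(anchor, positive, negative=(1.0, 0.0))   # negative already far away
hard = triplet_loss(anchor, positive, negative=(0.3, 0.0))   # negative too close
print(easy, hard)
```

Training on such triplets pushes the embedding toward the "known property" mentioned above: images of the same face end up close together and images of different faces end up far apart.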
Clustering
• Splitting a set of items into non-overlapping subsets, based on item attributes
• Counting people in video streams
• Algorithms:
–Fixed threshold
–K-means
–Rank-order clustering
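The simplest algorithm in this list, the fixed threshold, can be sketched as a greedy pass over the embeddings: each one joins the first cluster whose representative is close enough, otherwise it starts a new cluster. The 2D points, the threshold of 0.5, and the choice of the first member as representative are assumptions for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_by_threshold(embeddings, threshold):
    """Greedy clustering; returns lists of indices into `embeddings`."""
    clusters = []
    for idx, emb in enumerate(embeddings):
        for cluster in clusters:
            representative = embeddings[cluster[0]]   # first member stands for the cluster
            if distance(emb, representative) <= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])    # no cluster was close enough
    return clusters


# Illustrative embeddings of faces seen in successive video frames.
faces = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
clusters = cluster_by_threshold(faces, threshold=0.5)
print(len(clusters), clusters)
```

Counting people then reduces to counting clusters. The result depends on the threshold and on the order of the input, which is one reason the slide also lists K-means and rank-order clustering.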
HOW TO SCALE LEARNING?
How to scale learning?
• Scaling training
– Requires shared memory space
– Vertical scaling
• GPU
• Soon-to-come: TPU (tensor processing unit)
• Scaling evaluation
– Shared nothing architecture
– Neural network/classifier rarely changes
– Load balancing pattern
– Partitioning data if needed
Ideas for horizontal scaling
• There is no “reduce” for neural networks
• Averaging weights/parameters
– Usually not a good idea
• Genetic algorithms
– Requires a lot of processing power
– Running independent iterations on different machines
– Crossover between weights/parameters of independently trained neural networks after each epoch
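The crossover idea can be sketched with weight vectors standing in for whole networks. In the example below, each candidate would in practice be trained independently on its own machine for one epoch before the crossover step; that training step is omitted, and the error function, population size, and names are assumptions made purely for illustration.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each weight is copied from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def next_generation(population, error, rng, survivors=2):
    """Keep the best candidates unchanged and refill the rest with their offspring."""
    ranked = sorted(population, key=error)
    parents = ranked[:survivors]
    children = [crossover(parents[0], parents[1], rng)
                for _ in range(len(population) - survivors)]
    return parents + children


rng = random.Random(42)
target = [0.5, -1.0, 2.0]     # hypothetical "ideal" weights, used only to score candidates


def error(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, target))


population = [[rng.uniform(-3, 3) for _ in target] for _ in range(6)]
best_before = min(error(p) for p in population)
for _ in range(10):
    population = next_generation(population, error, rng)
best_after = min(error(p) for p in population)
print(best_before, best_after)
```

Because the best candidates survive each round, the best error never gets worse. Unlike averaging, crossover keeps each weight intact from one parent, but it needs many candidates, which is where the processing cost comes from.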
GOTCHAS
The Curse of Dimensionality
• Our 2D and 3D intuition often fails in high dimensions
• Distances tend to become relatively “the same” as the number of dimensions increases
• Dimensionality reduction
– Embedding functions
– Principal component analysis
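The claim that distances become relatively "the same" can be checked with a small experiment: draw random points, measure their distance to a random query, and compare the spread of those distances to the nearest one. The uniform distribution, the point count, and the function name are assumptions for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point))))
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, relative_contrast(dim))
```

The printed contrast should shrink as the dimension grows: in 2D the farthest point is many times farther than the nearest, while in 1000 dimensions the two are nearly equidistant. This is why nearest-neighbour comparisons work better on compact embeddings than on raw pixels.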
Local Optima
• “The bottom of a valley is not necessarily the lowest point on Earth”
• Learning algorithms may get stuck in local optima
• Using momentum or some random noise reduces this possibility
• Using genetic algorithms can be even more robust, but it’s computationally expensive
Visualizing Local Optima
monkey saddle
Evaluation of Learning
“Based on state-of-the-art machine learning, our weather forecast system can predict tomorrow’s weather with 72% accuracy”
You get the same results by saying “it’s going to be the same as today”
Evaluation of Learning
• Don’t test on the data you train on
– Use a different data set
– Split the data sets you have
• Beware of data biases
– Confirmation bias
– Survivorship bias
– Selection bias
• Compare against a benchmark, even a dummy one
– Coin flip
– Linear algorithms
– “Same-as-before”
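Both habits, holding out test data and comparing against a dummy benchmark, take only a few lines. The sketch below shows a seeded hold-out split and a "same-as-before" baseline for a weather series; the weather sequence and the helper names are made up for illustration.

```python
import random


def split(items, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out a fraction of the items for testing only."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predictions, actual):
    """Share of predictions that match the observed values."""
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits / len(actual)


# Hold-out split of ten hypothetical sample ids.
train, test = split(range(10))

# "Same-as-before" baseline: tomorrow's weather is predicted to equal today's.
weather = ["sun", "sun", "sun", "rain", "rain", "sun",
           "sun", "sun", "sun", "rain", "rain"]
actual = weather[1:]
baseline = weather[:-1]
baseline_accuracy = accuracy(baseline, actual)   # 7 of 10 days match
print(len(train), len(test), baseline_accuracy)
```

A model only earns its keep if it beats this number on data it has not seen. For time series such as weather, a split by date is usually more appropriate than the shuffled split shown for the sample ids.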
Architecture and Use Cases
High Level Architecture
VisageCloud Production
• API Consumer (Customer Infrastructure)
• HAProxy (reverse proxy)
• Service (API Controller)
• Neural Networks (OpenCV, Dlib, Torch, pixie magic)
• Containers (Docker)
• Cassandra
• Image Storage (AWS S3)
Connection labels in the diagram: HTTPS, HTTP, CQL Binary
Processing Pipeline
Detect faces → Align faces → Preprocessing → Feature extraction → Feature comparison
Partitioning Data: Application Level Logic
• The collection
–Slice of data used together
–10K-100K records
• The Cache-Inside Pattern
–Loading / preloading collection in one application server
–Content based routing/balancing to maximize cache hits
–No logic in the database layer
–Requires periodic polling for updates
• Weaker consistency
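The cache-inside pattern can be sketched as two small pieces: a router that always sends the same collection to the same application server, and a per-server cache that loads a collection once and refreshes it when polled. The server names, the hash-based routing rule, and the loader callback are assumptions for illustration, not the VisageCloud implementation.

```python
import hashlib


def route(collection_id, servers):
    """Content-based routing: a stable hash of the collection id picks the server."""
    digest = hashlib.sha256(collection_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


class CollectionCache:
    """In-memory copy of the collections served by one application server."""

    def __init__(self, loader):
        self._loader = loader        # callable that reads a collection from the database
        self._collections = {}
        self.loads = 0               # number of database reads, kept for inspection

    def get(self, collection_id):
        if collection_id not in self._collections:
            self.refresh(collection_id)
        return self._collections[collection_id]

    def refresh(self, collection_id):
        """Called on first access and by the periodic poll for updates."""
        self._collections[collection_id] = self._loader(collection_id)
        self.loads += 1


servers = ["app-1", "app-2", "app-3"]
chosen = route("collection-42", servers)

# Hypothetical loader standing in for a database query.
cache = CollectionCache(loader=lambda cid: [cid + "-record"])
cache.get("collection-42")
cache.get("collection-42")       # served from memory, no second load
print(chosen, cache.loads)
```

Because routing is deterministic, repeated requests for a collection hit a warm cache. Between polls a server may serve slightly stale data, which is the weaker consistency noted on the slide.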
Partitioning Data: Application Level Logic
• Content-based balancing/routing
• Application Layer: Application, Application, Application
• Preload collection, poll for updates, write updates
• Cassandra (Database Layer): Cassandra Node, Cassandra Node, Cassandra Node, Cassandra Node
Partitioning Data: Database Level Logic
• Perform comparison logic in database
–User Defined Aggregate Functions
• Removes the need to move data around between application and database
• Harder to deploy/test
• Stronger consistency
Key Take-away
• It’s math, not magic
• If you don’t understand the data, neither will the machine
• Preprocessing makes the difference
• Test against a benchmark, any benchmark
• Evaluate first, scale later
Let’s keep in touch
Bogdan@VisageCloud.com
+(40) 724 714 234
https://www.linkedin.com/in/bogdanbocse/
https://twitter.com/bocse
Many thanks to our sponsors & partners!
Q & A

Weitere ähnliche Inhalte

Was ist angesagt?

Future of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldFuture of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldSrivatsan Srinivasan
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in AzureValdas Maksimavičius
 
Big data analytic platform
Big data analytic platformBig data analytic platform
Big data analytic platformJesse Wang
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...Dataconomy Media
 
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse..."Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...Dataconomy Media
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...Dataconomy Media
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InSnapLogic
 
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Donghui Zhang
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data PlatformAndrei Savu
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeDataWorks Summit
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Making Bank Predictive and Real-Time
Making Bank Predictive and Real-TimeMaking Bank Predictive and Real-Time
Making Bank Predictive and Real-TimeDataWorks Summit
 
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your life
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your lifeTop 10 ways BigInsights BigIntegrate and BigQuality will improve your life
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your lifeIBM Analytics
 
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...Sri Ambati
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopCCG
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015DataWorks Summit
 

Was ist angesagt? (20)

Future of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native worldFuture of Data Platform in Cloud Native world
Future of Data Platform in Cloud Native world
 
Semantic Data Management
Semantic Data ManagementSemantic Data Management
Semantic Data Management
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in Azure
 
Big data analytic platform
Big data analytic platformBig data analytic platform
Big data analytic platform
 
Ibm big data
Ibm big dataIbm big data
Ibm big data
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
 
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse..."Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
"Industrializing Machine Learning – How to Integrate ML in Existing Businesse...
 
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",..."From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
"From Big Data To Big Valuewith HPE Predictive Analytics & Machine Learning",...
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
 
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application code
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Making Bank Predictive and Real-Time
Making Bank Predictive and Real-TimeMaking Bank Predictive and Real-Time
Making Bank Predictive and Real-Time
 
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your life
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your lifeTop 10 ways BigInsights BigIntegrate and BigQuality will improve your life
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your life
 
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
H2O Machine Learning with KNIME Analytics Platform - Christian Dietz - H2O AI...
 
Destroying Data Silos
Destroying Data SilosDestroying Data Silos
Destroying Data Silos
 
Analytics in a Day Virtual Workshop
Analytics in a Day Virtual WorkshopAnalytics in a Day Virtual Workshop
Analytics in a Day Virtual Workshop
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
 

Ähnlich wie Scaling Face Recognition with Big Data

Scaling face recognition with big data - Bogdan Bocse
 Scaling face recognition with big data - Bogdan Bocse Scaling face recognition with big data - Bogdan Bocse
Scaling face recognition with big data - Bogdan BocseITCamp
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp
 
ITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
ITCamp 2019 - Mihai Tataran - Governing your Cloud ResourcesITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
ITCamp 2019 - Mihai Tataran - Governing your Cloud ResourcesITCamp
 
Azure SQL Database From A Developer's Perspective - Alex Mang
Azure SQL Database From A Developer's Perspective - Alex MangAzure SQL Database From A Developer's Perspective - Alex Mang
Azure SQL Database From A Developer's Perspective - Alex MangITCamp
 
From Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienFrom Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienITCamp
 
The fight for surviving in the IoT world
The fight for surviving in the IoT worldThe fight for surviving in the IoT world
The fight for surviving in the IoT worldRadu Vunvulea
 
The fight for surviving in the IoT world - Radu Vunvulea
The fight for surviving in the IoT world - Radu VunvuleaThe fight for surviving in the IoT world - Radu Vunvulea
The fight for surviving in the IoT world - Radu VunvuleaITCamp
 
ITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depthITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depthITCamp
 
Execution Plans in practice - how to make SQL Server queries faster - Damian ...
Execution Plans in practice - how to make SQL Server queries faster - Damian ...Execution Plans in practice - how to make SQL Server queries faster - Damian ...
Execution Plans in practice - how to make SQL Server queries faster - Damian ...ITCamp
 
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloAzure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloITCamp
 
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...ITCamp
 
It camp 2015 how to scale above clouds limits, radu vunvulea
It camp 2015   how to scale above clouds limits, radu vunvuleaIt camp 2015   how to scale above clouds limits, radu vunvulea
It camp 2015 how to scale above clouds limits, radu vunvuleaRadu Vunvulea
 
A new world of possibilities for contextual awareness with beacons - Dan Arde...
A new world of possibilities for contextual awareness with beacons - Dan Arde...A new world of possibilities for contextual awareness with beacons - Dan Arde...
A new world of possibilities for contextual awareness with beacons - Dan Arde...ITCamp
 
Blockchain for mere mortals - understand the fundamentals and start building ...
Blockchain for mere mortals - understand the fundamentals and start building ...Blockchain for mere mortals - understand the fundamentals and start building ...
Blockchain for mere mortals - understand the fundamentals and start building ...ITCamp
 
A new world of possibilities for contextual awareness with beacons
A new world of possibilities for contextual awareness with beaconsA new world of possibilities for contextual awareness with beacons
A new world of possibilities for contextual awareness with beaconsDan Ardelean
 
Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Diego Oppenheimer
 
Testing your PowerShell code with Pester - Florin Loghiade
Testing your PowerShell code with Pester - Florin LoghiadeTesting your PowerShell code with Pester - Florin Loghiade
Testing your PowerShell code with Pester - Florin LoghiadeITCamp
 
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...
ITCamp 2019 - Florin Loghiade -  Azure Kubernetes in Production - Field notes...ITCamp 2019 - Florin Loghiade -  Azure Kubernetes in Production - Field notes...
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...ITCamp
 
Enacting Scrum - What it takes to maximize the chances for a successful adopt...
Enacting Scrum - What it takes to maximize the chances for a successful adopt...Enacting Scrum - What it takes to maximize the chances for a successful adopt...
Enacting Scrum - What it takes to maximize the chances for a successful adopt...ITCamp
 
[Analyst Research Slides] Build vs. Buy: Finding the Best Path to Network Aut...
[Analyst Research Slides] Build vs. Buy: Finding the Best Path to Network Aut...[Analyst Research Slides] Build vs. Buy: Finding the Best Path to Network Aut...
[Analyst Research Slides] Build vs. Buy: Finding the Best Path to Network Aut...Enterprise Management Associates
 

Ähnlich wie Scaling Face Recognition with Big Data (20)

Scaling face recognition with big data - Bogdan Bocse
 Scaling face recognition with big data - Bogdan Bocse Scaling face recognition with big data - Bogdan Bocse
Scaling face recognition with big data - Bogdan Bocse
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
 
ITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
ITCamp 2019 - Mihai Tataran - Governing your Cloud ResourcesITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
ITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
 
Azure SQL Database From A Developer's Perspective - Alex Mang
Azure SQL Database From A Developer's Perspective - Alex MangAzure SQL Database From A Developer's Perspective - Alex Mang
Azure SQL Database From A Developer's Perspective - Alex Mang
 
From Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienFrom Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines Kergosien
 
The fight for surviving in the IoT world
The fight for surviving in the IoT worldThe fight for surviving in the IoT world
The fight for surviving in the IoT world
 
The fight for surviving in the IoT world - Radu Vunvulea
The fight for surviving in the IoT world - Radu VunvuleaThe fight for surviving in the IoT world - Radu Vunvulea
The fight for surviving in the IoT world - Radu Vunvulea
 
ITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depthITCamp 2018 - Damian Widera U-SQL in great depth
ITCamp 2018 - Damian Widera U-SQL in great depth
 
Execution Plans in practice - how to make SQL Server queries faster - Damian ...
Execution Plans in practice - how to make SQL Server queries faster - Damian ...Execution Plans in practice - how to make SQL Server queries faster - Damian ...
Execution Plans in practice - how to make SQL Server queries faster - Damian ...
 
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea SaltarelloAzure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
Azure tales: a real world CQRS and ES Deep Dive - Andrea Saltarello
 
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
The Fine Art of Time Travelling - Implementing Event Sourcing - Andrea Saltar...
 
It camp 2015 how to scale above clouds limits, radu vunvulea
It camp 2015   how to scale above clouds limits, radu vunvuleaIt camp 2015   how to scale above clouds limits, radu vunvulea
It camp 2015 how to scale above clouds limits, radu vunvulea
 
A new world of possibilities for contextual awareness with beacons - Dan Arde...
A new world of possibilities for contextual awareness with beacons - Dan Arde...A new world of possibilities for contextual awareness with beacons - Dan Arde...
A new world of possibilities for contextual awareness with beacons - Dan Arde...
 
Blockchain for mere mortals - understand the fundamentals and start building ...
Blockchain for mere mortals - understand the fundamentals and start building ...Blockchain for mere mortals - understand the fundamentals and start building ...
Blockchain for mere mortals - understand the fundamentals and start building ...
 
A new world of possibilities for contextual awareness with beacons
A new world of possibilities for contextual awareness with beaconsA new world of possibilities for contextual awareness with beacons
A new world of possibilities for contextual awareness with beacons
 
Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"Algorithm Marketplace and the new "Algorithm Economy"
Algorithm Marketplace and the new "Algorithm Economy"
 
Testing your PowerShell code with Pester - Florin Loghiade
Testing your PowerShell code with Pester - Florin LoghiadeTesting your PowerShell code with Pester - Florin Loghiade
Testing your PowerShell code with Pester - Florin Loghiade
 
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...
ITCamp 2019 - Florin Loghiade -  Azure Kubernetes in Production - Field notes...ITCamp 2019 - Florin Loghiade -  Azure Kubernetes in Production - Field notes...
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...
 
Enacting Scrum - What it takes to maximize the chances for a successful adopt...
Enacting Scrum - What it takes to maximize the chances for a successful adopt...Enacting Scrum - What it takes to maximize the chances for a successful adopt...
Enacting Scrum - What it takes to maximize the chances for a successful adopt...
 
[Analyst Research Slides] Build vs. Buy: Finding the Best Path to Network Aut...
[Analyst Research Slides] Build vs. Buy: Finding the Best Path to Network Aut...[Analyst Research Slides] Build vs. Buy: Finding the Best Path to Network Aut...
[Analyst Research Slides] Build vs. Buy: Finding the Best Path to Network Aut...
 

Mehr von Bogdan Bocse

Whatever your question is, math already has a map to the answer
Whatever your question is, math already has a map to the answerWhatever your question is, math already has a map to the answer
Whatever your question is, math already has a map to the answerBogdan Bocse
 
The Intelligence Wars -Neopolitics of so-called ”A.I.” in the Digital Post-tr...
The Intelligence Wars -Neopolitics of so-called ”A.I.” in the Digital Post-tr...The Intelligence Wars -Neopolitics of so-called ”A.I.” in the Digital Post-tr...
The Intelligence Wars -Neopolitics of so-called ”A.I.” in the Digital Post-tr...Bogdan Bocse
 
The deconstruction of the Chinese Room
The deconstruction of the Chinese Room The deconstruction of the Chinese Room
The deconstruction of the Chinese Room Bogdan Bocse
 
#SafeNet - COVID-19 Contact Tracing
#SafeNet - COVID-19 Contact Tracing#SafeNet - COVID-19 Contact Tracing
#SafeNet - COVID-19 Contact TracingBogdan Bocse
 
The Commoditization of Intelligence
The Commoditization of IntelligenceThe Commoditization of Intelligence
The Commoditization of IntelligenceBogdan Bocse
 
Computer Vision - The New Renaissance or 1983?
Computer Vision - The New Renaissance or 1983?Computer Vision - The New Renaissance or 1983?
Computer Vision - The New Renaissance or 1983?Bogdan Bocse
 
InfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition ArchitectureInfoEducatie - Face Recognition Architecture
InfoEducatie - Face Recognition ArchitectureBogdan Bocse
 
The VisageCloud Domain Model
The VisageCloud Domain ModelThe VisageCloud Domain Model
The VisageCloud Domain ModelBogdan Bocse
 
Training and Face Recognition in 5 Easy Steps with VisageCloud
Training and Face Recognition in 5 Easy Steps with VisageCloudTraining and Face Recognition in 5 Easy Steps with VisageCloud
Training and Face Recognition in 5 Easy Steps with VisageCloudBogdan Bocse
 
VisageCloud - Face Recognition meets Big Data.
VisageCloud - Face Recognition meets Big Data.VisageCloud - Face Recognition meets Big Data.
VisageCloud - Face Recognition meets Big Data.Bogdan Bocse
 
Agile Business Analysis - Certificate
Scaling Face Recognition with Big Data

  • 1. @ITCAMPRO #ITCAMP17 Community Conference for IT Professionals: Scaling Face Recognition with Big Data. Bogdan BOCȘE, Solutions Architect & Co-founder, VisageCloud. https://VisageCloud.com https://www.linkedin.com/in/bogdanbocse/ https://twitter.com/bocse
  • 2. Agenda • How to learn? • What to learn? • Defining learning objectives • How to scale learning? • Gotchas • VisageCloud – Architecture – Use Cases
  • 3. Objectives • What questions to ask before writing the code? • How to look at the data before feeding it to the machine? • What is the state of the art in ML? • Which frameworks to use? • What are the common traps to avoid? • How to design for scale?
  • 4. HOW TO LEARN?
  • 5. How to Learn? Vision • Convolutional Neural Networks • Inception paper. NLP • Word2Vec • GloVe: Global Vectors for Word Representation. Generic • Classification • Prediction
  • 6. Convolutional Neural Networks: Big Picture
  • 7. Convolutional Neural Networks: Components • Pooling / Max Pooling • Convolution • Fully Connected Activation – Activation function, e.g. ReLU
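The components named on slide 7 can be sketched in a few lines of plain Python. This is a minimal illustration only, not the implementation used in the talk: the function names, the 2x2 pooling window and the list-of-lists image format are assumptions made for the example.

```python
def relu(x):
    """Rectified linear unit: negative inputs become zero."""
    return x if x > 0.0 else 0.0


def convolve2d_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode).

    As in most CNN libraries, this is technically a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]
```

A typical layer chains the three: convolve, apply `relu` element-wise, then pool to shrink the feature map.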
  • 8. Taking a Step Back: The Math • Learning is an optimization problem – Find the parameters of a system (neural network) that minimize a fixed error function – Not unlike planning orbital paths • Defining the network architecture • Defining the training algorithm – Stochastic Gradient Descent • With momentum • With noise
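To make "learning is an optimization problem" concrete, here is a hedged sketch of stochastic gradient descent with momentum on a toy problem: finding the single parameter `w` that minimizes the squared error to a list of samples (the minimizer is their mean). The function name, learning rate, momentum value and step count are illustrative assumptions, not values from the talk.

```python
import random


def sgd_momentum(samples, lr=0.001, momentum=0.9, steps=5000, seed=0):
    """Minimize the mean of (w - x)^2 over the samples, one sample per step."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        x = rng.choice(samples)        # "stochastic": one random sample
        grad = 2.0 * (w - x)           # gradient of (w - x)^2 w.r.t. w
        velocity = momentum * velocity - lr * grad
        w += velocity                  # momentum carries past small bumps
    return w


estimate = sgd_momentum([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because each step sees only one sample, `estimate` hovers near the mean of the samples rather than landing on it exactly; that per-step noise is the "stochastic" part.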
  • 9. Frameworks • DeepLearning4j – Independent company – Java interface with C bindings for performance • TensorFlow – Python & C++ API – Developed by Google – Compatible with TPU • Torch – Developed by Facebook – Written in LuaJIT, with Python bindings
  • 10. WHAT TO LEARN?
  • 11. Data Sets • Public data sets – Labeled Faces in the Wild (LFW) – YouTube Faces – Kaggle • Private data sets • Build your own – Outsourcing: Mechanical Turk – Crowdsourcing: reCAPTCHA model
  • 12. Preparing Data • Clean data • Cropping • Structure • Homogeneity • Normalization • Histograms • Filtering
  • 13. Preparing Data • Machine learning is not magic • If you can’t understand the data, a machine probably won’t either • Preprocessing often makes the difference between good and poor results • Applying filters, normalization, and anomaly detection is computationally inexpensive
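As one example of how cheap preprocessing can be, the sketch below rescales grayscale pixel intensities to the range [0, 1] (min-max normalization). The function name and the flat-list pixel format are assumptions for illustration; the slides do not say which normalization VisageCloud applies.

```python
def min_max_normalize(pixels):
    """Rescale intensities linearly so the darkest is 0.0 and the brightest 1.0."""
    lowest, highest = min(pixels), max(pixels)
    if highest == lowest:
        # A flat image carries no contrast; avoid dividing by zero.
        return [0.0 for _ in pixels]
    span = highest - lowest
    return [(p - lowest) / span for p in pixels]
```

For instance, `min_max_normalize([50, 100, 150])` maps the three intensities to 0.0, 0.5 and 1.0, so images shot under different lighting end up on a comparable scale.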
  • 14. DEFINING LEARNING OBJECTIVES
  • 15. Defining learning objectives • Supervised – Classification – Scoring and regression – Identification • Unsupervised – Clustering
  • 16. Classification • Projecting input onto a fixed set of classes • “Don’t use a cannon to kill a fly” – Support Vector Machines • Linear • Radial Basis Functions
  • 17. Identification • Embedding – Projecting input (image) onto a vector space with a known property • Triplet Loss Function
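The slide names the triplet loss without giving a formula. A common formulation, assumed here, compares squared Euclidean distances between embeddings: the anchor should be closer to a positive (same person) than to a negative (different person) by at least a margin. The margin of 0.2 and the function names are illustrative choices.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther than the positive by the margin."""
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )
```

Minimizing this loss over many triplets gives the vector space its "known property": embeddings of the same face cluster together, so identification reduces to a distance comparison.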
  • 18. Clustering • Splitting a set of items into non-overlapping subsets, based on item attributes • Counting people in video streams • Algorithms: – Fixed threshold – K-means – Rank-order clustering
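Of the three algorithms listed, fixed-threshold clustering is the simplest to sketch. The greedy variant below is one possible reading, not necessarily the one used in the talk: each vector joins the first cluster whose first member lies within the threshold, otherwise it starts a new cluster. Function and variable names are assumptions.

```python
import math


def threshold_cluster(vectors, threshold):
    """Greedy clustering; returns clusters as lists of indices into `vectors`."""
    clusters = []
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            representative = vectors[cluster[0]]  # first member stands for the cluster
            if math.dist(vector, representative) <= threshold:
                cluster.append(index)
                break
        else:
            # No existing cluster was close enough: open a new one.
            clusters.append([index])
    return clusters


embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
people_count = len(threshold_cluster(embeddings, threshold=1.0))
```

Applied to face embeddings taken from video frames, the number of clusters gives a rough head count, which is the "counting people" use case on the slide. Note that the result depends on the order of the input.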
  • 19. HOW TO SCALE LEARNING?
  • 20. How to scale learning? • Scaling training – Requires shared memory space – Vertical scaling • GPU • Soon-to-come: TPU (tensor processing unit) • Scaling evaluation – Shared-nothing architecture – Neural network/classifier rarely changes – Load balancing pattern – Partitioning data if needed
  • 21. Ideas for horizontal scaling • There is no “reduce” for neural networks • Averaging weights/parameters – Usually not a good idea • Genetic algorithms – Requires a lot of processing power – Running independent iterations on different machines – Crossover between weights/parameters of independently trained neural networks after each epoch
  • 22. GOTCHAS
  • 23. The Curse of Dimensionality • Our 2D and 3D intuition often fails in high dimensions • Distances tend to become relatively “the same” as the number of dimensions increases • Dimensionality reduction – Embedding functions – Principal component analysis
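The claim that distances become "the same" can be checked with a small experiment. The sketch below, under the assumption of points drawn uniformly from the unit hypercube, measures how much farther the farthest point is than the nearest one, relative to the nearest. The function name and sample size are illustrative.

```python
import math
import random


def relative_contrast(dimensions, points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    query = [rng.random() for _ in range(dimensions)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dimensions)])
        for _ in range(points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


low_dim = relative_contrast(2)
high_dim = relative_contrast(1000)
```

In two dimensions the nearest neighbour is many times closer than the farthest point; in a thousand dimensions the contrast collapses, which is why raw high-dimensional data is usually reduced (by an embedding or PCA) before distances are compared.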
  • 24. Local Optima • “The bottom of a valley is not necessarily the lowest point on Earth” • Learning algorithms may get stuck in local optima • Using momentum or some random noise reduces this possibility • Using genetic algorithms can be even more robust, but it’s computationally expensive
  • 25. Visualizing Local Optima (figure: monkey saddle)
  • 26. Evaluation of Learning • “Based on state-of-the-art machine learning, our weather forecast system can predict tomorrow’s weather with 72% accuracy” • You get the same results by saying “it’s going to be the same as today”
  • 27. Evaluation of Learning • Don’t test on the data you train on – Use a different data set – Split the data sets you have • Beware of data biases – Confirmation bias – Survivorship bias – Selection bias • Compare against a benchmark, even a dummy one – Coin flip – Linear algorithms – “Same-as-before”
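The “same-as-before” benchmark takes only a few lines to compute, which is the point of the slide: a model's accuracy means little until it is compared with a dummy like this. The weather sequence below is made-up illustration data, and the function names are assumptions.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual values."""
    matches = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return matches / len(actuals)


def same_as_before_baseline(history):
    """Predict each day as a copy of the day before it."""
    return history[:-1]


# Hypothetical observations, one label per day.
observed = ["sun", "sun", "sun", "rain", "rain", "sun", "sun", "sun"]
baseline_predictions = same_as_before_baseline(observed)
baseline_accuracy = accuracy(baseline_predictions, observed[1:])
```

On this toy sequence the dummy is right on 5 of 7 days. A trained forecaster would need to clearly beat that figure, on data it was not trained on, before its accuracy claim says anything.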
  • 28. Architecture and Use Cases
  • 29. High Level Architecture (diagram: VisageCloud Production) • API Consumer (Customer Infrastructure) • HAProxy (reverse proxy) • Service (API Controller) • Neural Networks (OpenCV, Dlib, Torch, pixie magic) • Containers (Docker) • Cassandra • Image Storage (AWS S3) • Connection labels: HTTPS, HTTP, CQL Binary
  • 30. Processing Pipeline: Detect faces → Align faces → Pre-processing → Feature extraction → Feature comparison
  • 31. Partitioning Data: Application Level Logic • The collection – Slice of data used together – 10K-100K records • The Cache-Inside Pattern – Loading / preloading collection in one application server – Content-based routing/balancing to maximize cache hits – No logic in the database layer – Requires periodic polling for updates • Weaker consistency
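Content-based routing for the Cache-Inside Pattern can be sketched as a stable hash of the collection identifier. This is an assumption about how such routing might be done, not VisageCloud's actual balancer logic; the server names and the `route_collection` function are hypothetical.

```python
import hashlib


def route_collection(collection_id, servers):
    """Map a collection to one server so its cache is reused on every request."""
    # A cryptographic hash is stable across processes and restarts,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(collection_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[bucket]


servers = ["app-1", "app-2", "app-3"]  # hypothetical application servers
target = route_collection("customer-42/employees", servers)
```

Every request for the same collection lands on the same application server, which already holds that collection in memory. Plain modulo hashing reshuffles most collections when a server is added or removed; consistent hashing is a common way to limit that.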
  • 32. Partitioning Data: Application Level Logic (diagram) • Application Layer: three Application instances • Cassandra (Database Layer): four Cassandra nodes • Content-based balancing/routing • Preload collection • Poll for updates • Write updates
  • 33. Partitioning Data: Application Level Logic • Perform comparison logic in database – User Defined Aggregate Functions • Removes the need to move data around between application and database • Harder to deploy/test • Stronger consistency
  • 34. Key Take-away • It’s math, not magic • If you don’t understand the data, neither will the machine • Preprocessing makes the difference • Test against a benchmark, any benchmark • Evaluate first, scale later
  • 35. Let’s keep in touch: Bogdan@VisageCloud.com +(40) 724 714 234 https://www.linkedin.com/in/bogdanbocse/ https://twitter.com/bocse
  • 36. Many thanks to our sponsors & partners! (Platinum, Gold, Silver, Partners, Powered by)
  • 37. Q & A