Hi everyone!
My name is Hagay Lupesko, I’m an engineering manager with Amazon AI, and I focus on Deep Learning Systems. Deep Learning Systems is an umbrella term describing the systems used to build, train and run deep learning models.
In this talk, we're going to have an introduction to Deep Learning, an exciting field in ML with high-impact applications and use cases. I'm assuming this is a new topic for you; that may not hold for some of you, but we can adjust the talk as we go forward.
A bit about myself: I’m an engineering manager, did my undergrad right here in BGU CS, and later did my Master’s in TAU. I’ve built software for a variety of domains, including machine vision, 3D modeling, audio streaming and large-scale web systems. I’ve touched and learned lots of domains, and this is a fun part about our field - software is everywhere, so throughout one’s career you learn about many different domains and businesses. As I mentioned, I am now in Amazon AI, based in the heart of Silicon Valley, working on deep learning systems.
OK, let’s get going. We'll start with a brief intro to deep learning to understand what it is and why it matters.
With a show of hands – How many of you have experience with ML? know what Deep Learning is? How many have ever implemented a neural network? How many have deployed one to production?
Let’s start with AI. AI is an active research area dating back to at least the 1950s, if not earlier, investigating the various aspects of enabling machines to mimic, and surpass, human intelligence. Alan Turing, a computing pioneer, captured the essence of AI by moving from the philosophical question “can machines think?” to the more practical question “can machines do what we humans can?”
ML is a subset of AI, and is really a different programming paradigm. Traditional programming, as mostly taught at schools, is about us humans programming rules, and the machine executing these rules on data to provide answers. ML takes in data and answers, and constructs the rules by itself. This is closer to how humans learn from experience.
So ML is the set of techniques that enables machines to learn rules from data, without being explicitly programmed. ML is really an umbrella term that includes algorithms like decision trees, SVM and also neural networks.
This takes us to Deep Learning. DL is a subset of ML, a technique inspired by the human brain – or neurons to be more exact – that uses interconnected artificial neurons to learn from samples.
So, how is Deep Learning different from Machine Learning? Why does it deserve a category of its own?
There are a few key ways in which DL differs from other ML techniques.
Automated feature learning – with classical ML, when you go about solving a problem, you need to identify the important features, write the code to extract these features, and then feed them to the learning algorithm. In problems with high dimensionality this is very difficult and time consuming to do, and it tends not to transfer well between domains. With DL, this is mostly not needed - the neural network takes care of identifying the features itself – which greatly simplifies the work for us humans.
Data – DL tends to require lots of data, typically much more than other ML techniques. ImageNet, as an example, is a database of labeled images used for training vision models such as image classifiers. It consists of more than 14M images. What is even more interesting is that DL tends to work better the more data you feed in for training. This is different from most other ML techniques, whose performance plateaus at some point.
Computationally intensive – DL is very compute-intensive, for training but also for inference. Training a modern network can take days or even weeks, depending on the size of the model, and one feed-forward pass through a modern DNN can take billions of FLOPs.
Generic architecture – DL, or more specifically DNNs, have an architecture that works effectively across different problem domains such as Vision, NLP and more.
A bit about why Deep Learning is a big deal
Whether you are aware or not - Deep Learning is already applied in many domains today, and the list is growing, and so is the impact on our lives.
If you look at the breadth of AI applied within Amazon alone, you can see DL in the retail website within personalization and recommendations, you can see it optimizing Amazon’s logistics, you have probably noticed the boom of voice-enabled personal assistants, and you may have heard that Amazon drones also rely on deep learning, just as other autonomous vehicle tech relies on it. And of course the list goes on.
Beyond the growing usage of DL in applications and devices around us, there is another interesting aspect to deep learning, and that is how well it does compared to the dominant species on this planet: us!
One of the first areas where Deep Learning was able to demonstrate state-of-the-art results was the domain of Machine Vision. A classical problem in that domain is Object Classification: given an image, identify the most prominent object in that image out of a set of pre-defined classes. A DNN presented in 2012 by Alex Krizhevsky was able to leap-frog the best-known algorithm to date by over 30%. That was really a major leap, and since then, every year the best algorithms for Object Classification, and many other Vision tasks, are based on Deep Learning, with results that keep getting better. A 2017 research paper by Geirhos et al. shows that DNNs already outperform humans in Object Classification – a task we humans have been programmed by evolution to specialize in. The paper also shows that human vision holds up better than DNNs when noise is introduced – it may make you feel better; it worked for me.
AlexNet paper: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Humans vs DNNs paper: https://arxiv.org/pdf/1706.06969.pdf
Now, to wrap up this introduction of why Deep Learning is a very significant piece of new technology, let’s take a look at a demo, published by Nvidia, showing self-driving technology that relies on Deep Learning for detection and classification.
Talk about:
3D detection of objects
Classification of objects: cars, humans, separation lines
All done in real time
Imagine how this will change people's lives when self driving cars are a reality: commute to work will be different, real estate will be different, elderly care will be different...
So at the base of Deep Learning there is the artificial neural network, and at the base of that there are artificial neurons.
Artificial neurons are inspired by the human brain’s neuron cells. These cells are abundant in the brain, we have ~100B of them, and they receive, process and transmit information through electrical and chemical signals. They are connected to one another via synapses, of which we have ~1 quadrillion (that’s 1,000 trillion) – and form neural networks that are responsible for much of our brain and spinal cord activity.
Artificial neurons are only inspired by real neurons. In fact, the AN construct is pretty simple:
We have inputs coming into the AN, and each input has a weight parameter assigned to it. The AN computes a linear combination of the input vector and the weight vector, and the resulting scalar is fed into a non-linear function that then spits out another scalar. The non-linear function is very important, since it enables the overall network to model non-linear relationships.
That’s it. This output, in turn, becomes one of the inputs of another neuron.
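To make this concrete, here is a minimal NumPy sketch of a single artificial neuron. This is an illustration only – the input and weight values are made up, and the sigmoid is just one common choice of non-linear activation:

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Linear combination of inputs and weights plus a bias,
    # fed through the non-linear activation function
    return sigmoid(np.dot(x, w) + b)

x = np.array([0.5, -1.0, 2.0])   # input vector
w = np.array([0.1, 0.4, -0.2])   # one weight per input
b = 0.05                         # bias term
y = neuron(x, w, b)              # a single scalar output in (0, 1)
```

Without the non-linearity, stacking many such neurons would collapse into one big linear function – which is exactly why the activation matters.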
Now, to create an artificial neural network, we simply stack neurons into layers, and interconnect layers into networks, where each neuron in a given layer is connected to every neuron in the next layer.
Our input layer takes in our inputs x1 to xn, and the output layer produces the output y. The layers between the input and output are called “Hidden Layers”, and this is also why this is called “Deep Learning” - because modern networks that effectively solve problems are also very “deep” - i.e. they have lots of hidden layers.
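Stacking fully-connected layers can be sketched in a few lines of NumPy. This is a toy illustration with made-up layer sizes and random, untrained weights – not the actual demo code:

```python
import numpy as np

def relu(z):
    # A common activation for hidden layers
    return np.maximum(z, 0.0)

def forward(x, layers):
    # Feed the input through each fully-connected layer in turn:
    # every neuron in a layer sees every output of the previous layer.
    for W, b in layers:
        x = relu(x @ W + b)
    return x

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 8)), np.zeros(8)),  # input layer: 4 inputs -> 8 neurons
    (rng.normal(size=(8, 8)), np.zeros(8)),  # hidden layer: 8 -> 8
    (rng.normal(size=(8, 1)), np.zeros(1)),  # output layer: 8 -> 1
]
y = forward(np.ones(4), layers)              # a single scalar prediction
```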
Now, what is remarkable is that it was proven that, under some assumptions on the activation function, a neural network with just one hidden layer can approximate any function f(x) – this is known as the Universal Approximation Theorem.
However, the tough part in Neural Networks is not building them, it is training them…
The difficult part is training the network, so we can find the right weights that will approximate the function modeling the problem we are trying to solve.
We start with the “Forward Pass”, in which we take a sample from our labeled input data, feed it through the network to get the inference, or prediction result.
We then do the “Backwards Pass”, also called “Backprop”, where we calculate the loss, i.e. how badly the network did compared to the “Ground Truth” – the label of the sample input data – and then back-propagate the loss across the network, finding the gradient of each weight to identify the direction of the error.
We then update the weights across the network, in the direction opposite to the gradient, and by an amount that is typically a fraction of the gradient – this fraction is called the “Learning Rate”.
The Backwards Pass is where learning happens. Through repeated iterations, we leverage the gradient to drive down the loss, until we converge to a low error rate.
So learning is really an optimization problem.
We’re constantly updating the network weights to decrease the loss function and find a minimum.
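As a toy illustration of this optimization loop, here is gradient descent learning the weights of a simple linear model in NumPy – a deliberately simplified stand-in for full backprop through a deep network, with made-up data:

```python
import numpy as np

# Toy problem: learn the weights of y = 2*x1 + 3*x2 from samples.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))          # 100 labeled samples
y_true = X @ np.array([2.0, 3.0])      # the "Ground Truth" labels

w = np.zeros(2)                        # start with arbitrary weights
lr = 0.1                               # the "Learning Rate"

losses = []
for _ in range(50):
    y_pred = X @ w                     # forward pass: get predictions
    err = y_pred - y_true
    loss = np.mean(err ** 2)           # how badly we did vs ground truth
    grad = 2 * X.T @ err / len(X)      # backward pass: gradient of loss wrt w
    w -= lr * grad                     # step opposite the gradient
    losses.append(loss)
```

After a few dozen iterations the loss shrinks and the weights converge close to the true values of 2 and 3 – the same mechanics, scaled up, train a deep network.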
So now that we understand at a high level how DNNs are built, operate and are trained, let’s talk a bit about some of the problems.
While DNNs have been pretty successful so far, they do pose some challenges:
The learning algorithm is based on Gradient Descent. However, sometimes networks suffer from gradients that either die (vanishing) or increase exponentially (exploding) – which requires tuning the network or the activation function.
Sometimes your network will converge to a local minimum – you need to tune your hyperparameters to help it escape.
Overfitting happens when your model learns your specific training examples, and does not generalize. It is a common problem.
Most real-world network training will require you to tweak the hyperparameters, such as the Learning Rate, to get optimal learning – this is mostly an art, not a science.
Networks need lots and lots of data for training – if you don’t have it, it will be hard to train one.
Modern networks need strong and expensive GPUs to train quickly – otherwise you are looking at days or more for modern, cutting edge networks.
Once you have a good network at hand – you have no idea how it works or how it makes decisions! This can be a problem in domains such as healthcare, aviation or self-driving cars.
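The vanishing-gradient problem from the list above can be seen numerically: the sigmoid’s derivative is at most 0.25, so by the chain rule the gradient signal shrinks roughly geometrically as it propagates back through many sigmoid layers. This is a simplified sketch that ignores the weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Back-propagating through 20 stacked sigmoid activations:
# each layer multiplies the gradient by the local derivative,
# which is at most 0.25 -- so the signal all but disappears.
z = 0.0       # pre-activation value at each layer (best case for sigmoid)
grad = 1.0    # gradient arriving at the output layer
for layer in range(20):
    s = sigmoid(z)
    grad *= s * (1 - s)   # chain rule: multiply by the local derivative
```

After 20 layers the gradient is below 1e-10, so the early layers barely learn – one reason activations like ReLU became popular.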
In this lab, we will use Apache MXNet as our deep learning framework. With a show of hands: who is familiar with MXNet? So just a bit of background on MXNet:
It is an Apache open source project. People sometimes think it is an “Amazon Project” but it is not. It is truly open source, decisions are made by the community. However, it is true that AWS is contributing a lot to the project.
It is a framework for building, training and using DNNs for inference. Similar to TF, PyTorch, etc.
It originated in academia, at CMU and the University of Washington.
AWS adopted MXNet in late 2016 as its “DL framework of choice” – there’s a nice blog post by AWS CTO Werner Vogels explaining this in more detail. A lot of it is about scalability and MXNet being good for production use.
So what is SageMaker, in a nutshell?
It is a fully managed platform that makes it super easy and fast to develop your models, from abstract ideas all the way to production.
Let’s look at what this means.
The three main SageMaker workflow pillars are: (1) Building, (2) Training, (3) Hosting.
OK, now let’s do a demo of actually building and training a neural network.
We’ll try to solve a problem known as “Sentiment Analysis” – analyzing the sentiment in text.
We’ll want to write a neural network, that takes in a user movie review, and classifies it as either “Positive” or “Negative”.
We’ll be using:
Apache MXNet and the Gluon API for the network and training
Stanford’s Large Movie Review Dataset, which contains 50K labeled user movie reviews from IMDB (http://ai.stanford.edu/~amaas/data/sentiment/ )
Stanford’s GloVe word vector representations – learning these is by itself a major task, so we will leverage them in our network instead of learning them ourselves (https://nlp.stanford.edu/projects/glove/)
Running the demo:
$ cd ~/code/aws-sentiment-analysis-mxnet-gluon
$ jupyter notebook
Change to Python 3
Start running…
What is sentiment analysis? An NLP application that classifies text or speech into some specified sentiment.
Natural language is processed – something on social media, product reviews, or customer feedback – and the text is mapped to a positive or negative sentiment.
Areas where this is used: what are people saying about your brand on Twitter or Facebook.
In the 2016 elections, there were a lot of projects analyzing sentiment from tweets.
Feedback from customers in call centers.
And there can be many more use cases.
Overall pipeline of implementing sentiment analysis.
First we need data which is labeled as positive or negative. We’ll first talk about the dataset that was used.
Then we’ll talk about how to translate the words that we get as inputs into a format that is easy for analysis.
Then we’ll have a look at the neural network model that was trained.
We’ll also look at the code for these 3 steps.
We’re using a dataset of movie reviews from Rotten Tomatoes. It was used in one of the papers at Stanford.
So there were close to 12,000 sentences labeled as positive or negative; approximately half were positive and half negative.
Out of these, 9,000 were picked for training and 2,000 for testing.
One-hot vectors from a vocabulary.
Problem: relations between words are unknown, so the model cannot generalize.
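A quick NumPy sketch of the problem with one-hot vectors: every pair of distinct words has zero similarity, so “good” and “great” look as unrelated as “good” and “bad”. The tiny vocabulary here is a made-up toy example:

```python
import numpy as np

vocab = ["good", "great", "bad", "movie"]

def one_hot(word, vocab):
    # A vector of zeros with a single 1 at the word's vocabulary index
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

# Dot product of any two distinct one-hot vectors is always 0:
# the representation carries no notion of word similarity.
sim = np.dot(one_hot("good", vocab), one_hot("great", vocab))
```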
Take a feature and capture the word’s relation to this feature.
Featurized Representation of Words.
Embeddings are features learned from a large text corpus.
Usually we use standard pre-trained embeddings, e.g. fastText or GloVe.
Using these Embeddings, we are actually transferring the learning.
Using embeddings, we can get by with a smaller training dataset.
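Contrast this with the one-hot case: with embeddings, similar words get similar vectors. Here is a NumPy sketch with hypothetical 3-dimensional vectors (real GloVe embeddings have 50-300 dimensions and are learned from large corpora; these values are made up for illustration):

```python
import numpy as np

# Hypothetical toy embeddings -- NOT real GloVe vectors.
embeddings = {
    "good":  np.array([0.9, 0.1, 0.2]),
    "great": np.array([0.8, 0.2, 0.3]),
    "bad":   np.array([-0.9, 0.1, 0.2]),
}

def cosine(a, b):
    # Cosine similarity: 1 for same direction, -1 for opposite
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Unlike one-hot vectors, similar words now have similar vectors,
# so a model trained on "good" can generalize to "great".
sim_pos = cosine(embeddings["good"], embeddings["great"])  # high
sim_neg = cosine(embeddings["good"], embeddings["bad"])    # low
```

This geometric structure is what lets a sentiment model trained on fewer examples still handle words it rarely saw during training.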
Thank you for listening. I hope you learned about deep learning systems and serving, and had a good time.
MXNet and Model Server are open source - feel free to try them out and file issues. We’re also hiring aggressively, so if you have talented friends who want to be part of the DL revolution - feel free to refer them and talk to us!
Thank you!