Come along with me on a survey of a couple of Google's machine learning APIs: Cloud Vision and Natural Language. Learn how to leverage preexisting models without needing to worry about the complexity of selecting a learning algorithm and running training.
Then, for good measure, we explore the mechanisms behind setting up a basic neural network to classify MNIST digits so that you can run other people's TensorFlow models to classify your own examples.
To follow along with the examples, you will want to check out my GitHub repository at https://github.com/mrcity/mlworkshop/ for example code and setup instructions, including how to run an existing TensorFlow model from your own code in order to classify examples.
3. Today’s Mission
Touch on ML API offerings
Explore Google’s RESTful ML APIs
Cloud Vision
Natural Language
TensorFlow (Not a RESTful API but still cool)
Allocate time to play, develop ideas
Have good conversations, network
Find learning partner or group
@SWebCEO +StephenWylie #MachineLearning
4. Before We Start…
Hopefully you followed instructions on
https://github.com/mrcity/mlworkshop/
Get access to the APIs
Install TensorFlow
5. What is Machine Learning?
Programming computers to deduce things from data…
Conclusions
Patterns
Objects in images
…using generic mathematical methods
No advance knowledge of trends in data
Lots of algorithms available
The process can create beautiful constructs
6. Who’s using ML?
Chat bots
Self-driving cars
Pepper the robot
MarI/O
Document recognition & field extraction
7. Machine Learning Tools Ecosystem
APIs you interface with
HP, Amazon, Microsoft, IBM, Google, Facebook’s Caffe on mobile & Web
Software you use
Orange (U of Ljubljana, Slovenia)
Weka (U of Waikato, New Zealand)
Hardware you compile programs to run on
nVidia GPUs with CUDA, DGX-1 supercomputer
Movidius neural compute stick
Amazon DeepLens camera
8. Google’s ML APIs In Particular
Google Play Services
Mobile Vision API
RESTful ML API Services
Cloud Vision, Cloud Speech, Natural Language, Translation
Job Discovery*, DialogFlow, Cloud Video Intelligence
Cloud ML Engine
Local ML Services
TensorFlow, TensorFlow Lite
TensorFlow Serving
^ Pre-defined models
v User-defined models
* private beta
10. Detect Faces, Parse Barcodes, Segment Text
Availability (from the slide's matrix):
FACE API: Native Android, Native iOS, RESTful API
BARCODE API: Native Android, Native iOS
TEXT API: Native Android, Native iOS, RESTful API
11. What do you see in that cloud?
Breaks down into more features than just FACE, BARCODE, and TEXT:
From https://cloud.google.com/vision/docs/requests-and-responses
Feature Type: Description
LABEL_DETECTION: Execute Image Content Analysis on the entire image and return labels
TEXT_DETECTION: Perform Optical Character Recognition (OCR) on text within an image
FACE_DETECTION: Detect faces within the image
LANDMARK_DETECTION: Detect geographic landmarks within the image
LOGO_DETECTION: Detect company logos within the image
SAFE_SEARCH_DETECTION: Determine safe search properties of the image
IMAGE_PROPERTIES: Compute image properties, including dominant colors
12. What do you see in that cloud?
Beta API 1.1 offers additional features:
Note Google does not guarantee any SLAs, deprecation policies, or future
backward compatibility with these services.
Feature Type: Description
IMAGE_PROPERTIES: Predict crop hints for the image, in addition to previous data
WEB_DETECTION: Detect news, events, or celebrities within an image, then search Google Images for similar photos
DOCUMENT_TEXT_DETECTION: Optimize the text parser for dense OCR (e.g. documents), rather than sparse text within an image
13. Cloud Vision APIs
Can simultaneously detect multiple features
Features billed individually per use on image
No Barcode feature
Simple JSON request/response format
Submit image from Cloud Storage or in Base64
Returns 0 or more annotations by confidence
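As a rough illustration of the request format above, here is a minimal Python sketch that builds an images:annotate body with a Base64-encoded image and two features; the image bytes, maxResults value, and API key are placeholders for illustration, not values from the talk:

```python
import base64
import json

# Build an images:annotate request body for two features at once.
# (Sketch only: substitute real image bytes and your API key before sending.)
def build_annotate_request(image_bytes):
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": 5},
                {"type": "FACE_DETECTION"},
            ],
        }]
    }

body = build_annotate_request(b"\x89PNG...fake bytes for illustration")
print(json.dumps(body)[:60])
# POST this JSON to:
# https://vision.googleapis.com/v1/images:annotate?key={YOUR_API_KEY}
```

The response comes back as a matching JSON structure with zero or more annotations per requested feature, ordered by confidence.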
14. For Your Eyes Only
No OAuth required for Cloud Vision
Make requests using API Key
POST https://vision.googleapis.com/v1/images:annotate?key={YOUR_API_KEY}
Easy to script using Service Account
15. Response types
Feature: Returns
Label: Description of the picture's contents; confidence score
Text, Logo: Text contents or logo owner name; bounding polygon containing the text or logo; [Text only] locale (language); [Logo only] confidence score
Face: Bounding polygon and rotational characteristics of the face; positions of various characteristics such as eyes, ears, lips, chin, forehead; confidence scores for exhibiting joy, sorrow, anger, or surprise
Landmark: Description of the landmark and confidence score; bounding polygon of the recognized landmark in the picture
Safe Search: Likelihood that the image contains adult or violent content, is a spoof, or contains graphic medical imagery
16. Response types
Feature: Returns
Image properties: Array of dominant RGB colors within the image, ordered by fraction of pixels; crop hints by confidence and "importanceFraction"
Web Entities and Pages: News, events, celebrities, or other labels found within an image; similar images from Google Image Search; website URLs containing matching images
Document Text: List of nested objects of types Page > Block > Paragraph > Word > Symbol; bounding box of the recognized text
Good example of the Document Text hierarchy at https://cloud.google.com/vision/docs/detecting-fulltext#vision-document-text-detection-python
18. Mobile Vision vs. Cloud Vision
Mobile Vision is for Native Android
Free; no usage quotas
Handles more data processing
Can utilize camera video
Takes advantage of hardware
20. Natural Language API: Analyze Any ASCII
Parses text for parts of speech
Discovers entities like organizations, people, locations
Analyzes text sentiment
Use Speech, Vision, Translate APIs upstream
Works with English, Spanish, or Japanese
21. Sample NL API Request
From https://cloud.google.com/natural-language/docs/basics
Notes on the sample request JSON shown on the slide: one field is optional and can be guessed automatically; another is optional but recommended; use only ONE of "content" or "gcsContentUri"
Pick one of:
https://language.googleapis.com/v1/documents:analyzeEntities
https://language.googleapis.com/v1/documents:analyzeEntitySentiment
https://language.googleapis.com/v1/documents:analyzeSentiment
https://language.googleapis.com/v1/documents:analyzeSyntax
https://language.googleapis.com/v1/documents:classifyText
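A minimal sketch of one such request body in Python, with field names taken from the NL API basics page; the content string is just a stand-in:

```python
import json

# Request body for documents:analyzeSentiment (illustrative content).
request_body = {
    "document": {
        "type": "PLAIN_TEXT",
        "language": "en",          # optional: can be guessed automatically
        "content": "I love this workshop!",
    },
    "encodingType": "UTF8",        # optional, but recommended
}
print(json.dumps(request_body, indent=2))
# POST to https://language.googleapis.com/v1/documents:analyzeSentiment?key={YOUR_API_KEY}
```

Swap the endpoint name to run any of the other four query types against the same document object.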
22. Combined Sample NL API Request
From https://cloud.google.com/natural-language/docs/basics
Note on the sample request JSON shown on the slide: one field is optional and can be guessed automatically
Make request to https://language.googleapis.com/v1/documents:annotateText
26. Making Predictions With Google
Build “trained” model or use “hosted” model
Hosted models (all demos):
Language identifier
Tag categorizer (tags such as android, appengine, chrome, youtube)
Sentiment predictor
Trained models:
Submit attributes and labels for each example
Need at least six examples
Store examples in Cloud Storage
27. Don’t Model Trains; Train Your Model
Train API against data
prediction.trainedmodels.insert
Send prediction query
prediction.trainedmodels.predict
Update the model
prediction.trainedmodels.update
Other CRUD operations: list, get, delete
28. Don’t Model Trains; Train Your Model
Insert query requires:
id
modelType
storageDataLocation
Don’t forget: poll for status updates
29. Permissions To Make Predictions
OAuth is required for Predictions
Easy to script using Service Account
Or, get Web app credentials:
https://console.developers.google.com/apis/credentials
31. Smartly Auto Fill Google Sheets
Add-on for Google Spreadsheets
No previously trained model needed
Use data with partially-labeled
examples
Specify column to auto-fill
Wait... a... long... time...
34. About TensorFlow
Offline library for large-scale numerical computation
Think of a graph:
Nodes represent mathematical operations
Edges represent tensors flowing between them
Excellent at building deep neural networks
Softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
ReLU(x) = max(0, x)
35. Tense About Tensors?
Think about MNIST handwritten digits
Each digit image is 28 × 28 pixels (784 in total)
There are 10 classes: the digits 0-9
36. Tense About Tensors?
Define an input tensor of shape (any batch size, 784)
x = tf.placeholder(tf.float32, shape=[None, 784])
Define a target output tensor of shape (any batch size, 10)
y_ = tf.placeholder(tf.float32, shape=[None, 10])
Define weights matrix (784×10) and biases vector (10-D)
37. One-Hot: Cool To the Touch
Load the input data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
One-hot?!
Think about encoding categorical features:
US = 0, UK = 1, India = 2, Canada = 3, …
This implies ordinal properties and confuses learners
Break encoding into Booleans:
This is where the 10-D target output tensor comes from
US = [1, 0, 0, 0]
UK = [0, 1, 0, 0]
Etc…
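The encoding above can be sketched in plain Python; the helper name one_hot is my own, not from the slides:

```python
# Minimal one-hot encoder matching the country example on the slide.
def one_hot(index, width):
    """Return a list with a 1 at `index` and 0 elsewhere."""
    vec = [0] * width
    vec[index] = 1
    return vec

countries = ["US", "UK", "India", "Canada"]
encoded = {name: one_hot(i, len(countries)) for i, name in enumerate(countries)}
print(encoded["US"])   # [1, 0, 0, 0]
print(encoded["UK"])   # [0, 1, 0, 0]

# MNIST labels work the same way: the digit 3 becomes a 10-D vector.
print(one_hot(3, 10))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Because no category's vector is "closer" to another, the learner cannot mistake the encoding for an ordinal relationship.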
39. TensorFlow Data Structures – Variables
Values (i.e. model parameters) inside nodes
Used and modified by learning process
W (weights to scale inputs by), b (bias to add to scaled value)
Need to be initialized with:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.global_variables_initializer()
40. Training a Dragon, if the Dragon is a Model
Your Simple Model:
y = tf.matmul(x, W) + b
Cross-entropy: distance between guess & correct answer
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
Gradient descent: minimize cross-entropy
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
Learning rate: 0.5
H_y'(y) = −Σ_i y'_i log(y_i)
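To make the formula concrete, here is a small plain-Python sketch of cross-entropy against a one-hot label; the example distributions are invented for illustration:

```python
import math

# H_{y'}(y) = -sum_i y'_i * log(y_i): cost of the guess y against one-hot label y'.
def cross_entropy(label, guess):
    return -sum(l * math.log(g) for l, g in zip(label, guess) if l > 0)

label = [0, 0, 1]              # true class is index 2
good  = [0.05, 0.05, 0.90]     # confident, correct guess
bad   = [0.60, 0.30, 0.10]     # confident, wrong guess

print(round(cross_entropy(label, good), 3))  # small cost
print(round(cross_entropy(label, bad), 3))   # much larger cost
```

A confidently wrong guess is penalized far more heavily than a confidently correct one, which is exactly the gradient signal the optimizer needs.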
41. Dragon Get Wiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiings!
Start a Session
Run global_variables_initializer
Run training for 1000 steps
sess = tf.InteractiveSession()  # installs itself as the default session, so .run()/.eval() work below
sess.run(init)
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_step.run(feed_dict={x: batch_xs, y_: batch_ys})
Expensive to use all training data at once!
Pick 100 random samples each step
42. Test Flight Evaluation
Compare labels between guess y and correct y_
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
Cast each Boolean result into either a 0 or 1, then average it
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Print the final figure
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
44. Future Talks, Other Talks
Follow me if you want to hear these!
Build a Neural Network in Python with NumPy
Build a Neural Network with nVidia CUDA
Elsewhere,
Mapmaking with Google Maps API, Polymer, and
Firebase
The Process Of Arcade Game ROM Hacking
45. More Resources
Google’s “Googly Eyes” Android app [Mobile Vision API]
https://github.com/googlesamples/android-vision/tree/master/visionSamples/googly-eyes
Quick, Draw! Google classification API for sketches
https://quickdraw.withgoogle.com/
Making Android Apps With Intelligence, by Margaret Maynard-Reid (video + slides)
https://realm.io/news/360andev-margaret-maynard-reid-making-android-apps-with-intelligence/
Some machine learning algorithms work better for certain types of problems than others. However, basically all of them start out with a blank slate, or perhaps random intermediate values, and then analyze the dataset to find which attributes of an example provide the most information gain toward making the correct classification on the training data. Learning happens iteratively, converging over time on an ever more accurate answer.
Chat bots need machine learning in order to parse the conversation into its grammatical components, and to understand the context of the conversation so it can come up with an appropriate response.
Pepper the robot is used for a variety of emotion-based tasks such as teaching, helping kids with autism stay focused, and training people to make standard responses to events if they are not socially adjusted.
I hate to say this, because my mom works in Accounts Payable, but eventually her job will be automated by a system that can automatically scan an invoice, look for the amount due, and facilitate the payment over a payment network electronically. Document recognition is important for banks too, as statements which prove income for purposes of granting a loan come in many different forms.
HPE Haven – text analysis, speech recognition, image recognition, face detection, recommendation engine, knowledge graph analysis
Amazon – industry-standard ML algorithms on top of your own data sets with listeners to re-evaluate upon new data (kind of like the now-deprecated Google Cloud Prediction API)
Amazon also just rolled out SageMaker, which is a cloud-based service to help users visualize data and select the best ML algorithm, then deploy it at scale for the inference stage
Amazon Rekognition image & video analysis that ties in with the DeepLens camera
Azure APIs – language analytics, face & emotion & explicit content detection, speech recognition, and recommendation APIs
IBM Watson
Facebook – Ported Caffe for several platforms to run light & efficiently
Things above the line are pre-defined models that Google has already tuned for you to give what they feel are the best results. Things below the line are where you have to provide the data and possibly the tuning yourself in order to get the best outcome.
Tensorflow Serving falls in Local because you have to set up the parameters of the infrastructure yourself, whereas Cloud ML Engine will size the model inference environment for you automatically. However, TF Serving can also be run on any hardware you have, not just inside GCP.
The native implementations are brought to you by the Mobile Vision API. The RESTful API which is called Cloud Vision does not feature barcode reading capabilities since that’s something a device should be able to handle on its own natively.
Safe search includes Adult, Medical, Spoof, and Violence.
No Barcode feature because why should you send images over the network just to find out if it’s a barcode when chances are it’s not.
Before we get the response, there’s one more detail we need to form our request: the authentication.
Google provides several ways to allow use of the Machine Learning APIs, and the exact ways you implement them depend on what level of user data you seek to access in your application.
The Cloud Vision API does not require you to pass along OAuth prompts to your users. You can simply make a request by finding your browser API key (usually starting with the letters AIza) and then appending that to your query string. However, this precludes you from using Google’s handy dandy API libraries for your programming language of choice. (Then again, if you’re using ALGOL 68, SmallTalk, or 6502 assembly, maybe it’s your only option.)
To use Google’s libraries, you need to at least set up a Service Account. This type of authentication scheme is usually used for servers communicating to other servers and requires minimal end user intervention. Also, typical controls you might find on accounts in a Google Apps domain do not apply to service accounts, therefore you could be inadvertently granting your users the chance to do something dangerous like share documents and data outside of your domain if you are not careful with the permissions you grant them.
Nevertheless, to save ourselves some time and trouble, we are going to use the basic Service Account approach for the sake of this demo. The instructions on how to create an appropriate service worker are on this workshop’s GitHub repo.
importanceFraction is the “fraction of importance of this salient region with respect to the original image.” Not really sure how they come to that conclusion.
This demo consists of running the Googly Eyes Node.js application provided in my GitHub repo. With some files in my Google Cloud Storage bucket, I could refer to that storage bucket name and the file name to load up files with human faces in them, and then this program would use the parameters returned by the Face Detection API and do some mathematics to superimpose googly eyes on top of the image using HTML5 Canvas.
One would presume the native libraries would be taking advantage of lower-level calls to get to the bare metal of the phone and make the analysis faster.
In the past, Sentiment Analysis was only available for English text. The Changelog https://cloud.google.com/natural-language/release-notes from 11/15/16 indicates that Japanese & Spanish support is available on sentiment analysis, but their How-To guide for “Analyzing Sentiment” indicates only English is supported. What’s interesting is when an English string is scored, then sent through Translate, and then you see totally different sentiment scores.
There are client libraries for C#, Go, Java, and other languages, plus RPC calls, available for interaction with the NL API. For RESTful calls, this is the syntax to use if you wish to just run one particular NL API query at a time.
analyzeEntities gives you phrases in the text that are known entities, such as persons, locations, or organizations, and references the saliency of the entity to the article as well as how many times it was mentioned.
analyzeSentiment gives you sentiment scores for the entire text body, and analyzeEntitySentiment will give you sentiment scores for text associated with each entity and its mentions. More on this in the next slide.
analyzeSyntax gives you all the parts of speech and grammatical considerations for each word.
classifyText will give you an array of potential categories for the text as well as the confidence of that category classification.
In this construct with the “annotateText” endpoint invoked, specify in the “features” object which of the three types of NLP queries you want to run.
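A hedged sketch of such an annotateText body in Python; the flag names follow the public NL API docs, and the content string is a stand-in:

```python
import json

# annotateText request: the "features" object selects which analyses to run
# in a single call (sketch only; substitute your own document content).
body = {
    "document": {"type": "PLAIN_TEXT", "content": "Google was founded in Menlo Park."},
    "features": {
        "extractSyntax": True,
        "extractEntities": True,
        "extractDocumentSentiment": True,
    },
    "encodingType": "UTF8",
}
print(json.dumps(body)[:40])
# POST to https://language.googleapis.com/v1/documents:annotateText?key={YOUR_API_KEY}
```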
The score (formerly known as polarity) is a [-1, 1] scale of the emotional sentiment (tone) of the article, from negative/scathing/blistering to positive/gushing/cheerful.
The magnitude of the article is a scale of how emotional the document comes across. Since each expression in the text contributes to the magnitude, longer articles will have a higher magnitude, regardless of whether the tone of each individual emotional word in the text is positive or negative in sentiment.
The small snippet of JSON here is now contained within an object called “documentSentiment”. A sibling of this object in the returned JSON structure is called “sentences”, which is an array consisting of objects containing the text and offset of each sentence, plus a “sentiment” object containing the individual magnitude and score for that piece.
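To illustrate that structure, here is a hand-written Python dict shaped like such a response (the values are invented, not real API output), and how you might walk it:

```python
# Illustrative analyzeSentiment response: "documentSentiment" for the whole
# text, plus a sibling "sentences" array with per-sentence sentiment.
response = {
    "documentSentiment": {"score": 0.4, "magnitude": 1.9},
    "language": "en",
    "sentences": [
        {"text": {"content": "The keynote was fantastic.", "beginOffset": 0},
         "sentiment": {"score": 0.9, "magnitude": 0.9}},
        {"text": {"content": "The wifi was dreadful.", "beginOffset": 27},
         "sentiment": {"score": -0.8, "magnitude": 0.8}},
    ],
}

overall = response["documentSentiment"]
print("overall score:", overall["score"], "magnitude:", overall["magnitude"])
for s in response["sentences"]:
    print(s["text"]["content"], "->", s["sentiment"]["score"])
```

Note how two strongly opposed sentences can still add up to a high document magnitude while the overall score lands near neutral.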
In this demo, I simply navigated to the Web sandbox at https://cloud.google.com/natural-language/ to run various strings of text through the tool.
Types of models:
- Regression: estimate a numeric answer based on the examples along a continuous curve, akin to a formula that you would solve for
- Categorization: group possible answers into buckets that don’t have a clear distance to other buckets
Here again, we need to discuss authentication, because it’s a little bit stricter for the Prediction API than for Cloud Vision.
Users must be authenticated into the Prediction API in order to use it. We can still use Service Accounts to authenticate ourselves into the app, but you have to specify a few more settings when creating them, as specified in this workshop’s GitHub repository.
However, to push the authentication screen to your end users, go into the API Manager (at console.*developers*.google.com), open up the Credentials page, and create an OAuth Client ID credential. It will guide you through steps based on the type of devices your application is targeting, and then use the client secrets file it creates when you are defining the client object in your application. Depending on the language you are using, the objects you create to authenticate a user through a Web-based service or a standalone application might look different than that used for Service Accounts.
In this example, I took the data set from Problem Set #2 from my old Machine Learning homework from Fall 2010 (at http://users.eecs.northwestern.edu/~ahu340/eecs349-ps2/ ), and loaded in the train.csv file into my Google Cloud Storage Bucket. I had to massage the CSV file into the exact format it was looking for. This includes things like removing all double quotes and changing the outcome of the example to a text value rather than a numeric value so it would classify the outcome as a categorization rather than a regression, since we probably don’t want to force the values of “0” for no and “1” for yes to be the regression we solve for.
After loading in the training data, I took some of the classification examples and pasted them into the Prediction API sample app running locally, downloadable from this workshop’s GitHub repo.
If you want to try this in API Explorer, break out the attributes of each example into JSON in the following format (note if copying the following to watch for unintended fancy double quotes):
{
"input": {
"csvInstance": [
"104042",
"0",
"77",
"m",
"s",
"0",
"9",
"7",
"523",
"428824",
"24",
"115315"
]
}
}
The ReLU (rectified linear unit) function is a popular function to use inside perceptrons.
We will describe the Softmax function in great detail later.
The old way, initialize_all_variables(), should have been removed back in March 2017.
The softmax function is one that attempts to normalize the vector of guesses so that all the outputs sum to exactly 1, thus converting the output of guesses into a probability distribution. It also serves to multiplicatively increase the weight given to any particular outcome per additional unit of evidence, but also reduce that weight with each unit of evidence withheld. Thus, it is often used to represent categorical distributions. After all, we are categorizing pictures of digits as 0 through 9; that doesn’t mean they will exist linearly in space away from each other, or even in the same order, if you were to look at the values of each attribute that makes the system think it’s one particular digit compared to another.
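A quick plain-Python sketch of the normalization described above (subtracting the max first is a standard numerical-stability trick, not something from the slides):

```python
import math

# Softmax(x)_i = e^(x_i) / sum_j e^(x_j): turns raw scores into a probability distribution.
def softmax(scores):
    m = max(scores)                          # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(sum(probs))  # sums to 1 (up to float rounding)
# Adding one unit of evidence to a score multiplies its unnormalized weight by e.
```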
Cross-entropy is used as a function to compute the cost of encoding a particular event drawn from a set. The less information required to make the distinction, the better. In this case, we can also represent how far apart the guess is from the correct answer on the training set. Cross-entropy helps perform learning faster by avoiding slowdowns that could be encountered by other functions like the quadratic cost function, where learning is impaired when it gets close to the correct answer.
Incidentally, the formula for cross-entropy has changed slightly and has been updated in this presentation. It was noted in the comments of the example code that the previous way could become numerically unstable, and so it has been changed to what is listed here. This also simplifies the model because the new function will handle taking the softmax of y.
The first code snippet takes the vectors and compares the first dimension of each element between y and y_. It then generates a new vector equal to the length of y (or y_ for that matter) and each result is a Boolean as to whether the things being compared were equal or not.
The second code snippet converts each Boolean into a 0 or 1, and then takes the average of all the elements in the vector to give you accuracy as a fraction.
The third code snippet actually causes the accuracy to be calculated. Note this used to be print(sess.run(accuracy, feed_dict=…)) and it’s still like that in the code examples, but the way listed here is in the latest documentation and it works fine.
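The same three steps can be sketched in plain Python without TensorFlow; the toy predictions here are invented for illustration:

```python
# Argmax each prediction, compare to the label's argmax,
# cast the booleans to 0/1, then average to get accuracy.
def argmax(vec):
    return max(range(len(vec)), key=vec.__getitem__)

predictions = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]]
labels      = [[0, 1, 0],       [0, 1, 0],       [0, 0, 1]]

correct = [argmax(p) == argmax(l) for p, l in zip(predictions, labels)]
accuracy = sum(1.0 if c else 0.0 for c in correct) / len(correct)
print(correct)   # [True, False, True]
print(accuracy)  # 2/3
```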
In this demo, I launched TensorFlow through a Docker instance and ran my Python code, which uses a previously generated model to classify one of the images represented in one of the datasets.
If you happened to terminate a previously-running Docker instance, you can reactivate it and get into it again with the following commands on the command line:
docker start `docker ps -q -l`
docker attach `docker ps -q -l`
Note there are several versions of Margaret’s talk available to be found on the Internet, including one from Big Android BBQ 2016 that’s a couple months newer and might have slightly different content (like how this slide deck is different from November’s deck).