Machine Learning in a Flash: An Introduction to Natural Language Processing
Kory Becker
June 2017, http://primaryobjects.com
Sponsored by
AI !== Machine Learning
Logical AI, Symbolic, Knowledge-based
Pattern Recognition, Representation
Inference, Common Sense, Planning
Heuristics, Ontology, Artificial Life, Genetic
Machine Learning, Statistics
Machine Learning Algorithms
Supervised:
Linear Regression
Logistic Regression
Support Vector Machines
Neural Networks
Unsupervised:
K-means Clustering
Principal Component Analysis (Dimensionality Reduction)
Linear Regression
Logistic Regression
Logistic Regression: Linear Classification
Support Vector Machine: Non-Linear Classification
Support Vector Machine: Gaussian Kernel
Pop Quiz!
Question 1: Supervised or Unsupervised?
 You are designing an agent for The Matrix.
 Its task is to classify people who are threats to the system.
 Feature Set:
 Age
 IQ
 Level of Education
 # of Times They Watched the Movie The Matrix
 Training Set of 100,000 people: 50k threats, 50k non-threats
Question 2: Supervised or Unsupervised?
 You are designing the brain of a battle robot.
 Its primary attack is hand-to-hand combat. Your task is to find the most effective move combos.
 Feature Set:
 # of Kicks
 # of Punches
 # of Head-butts
 # of Leg Sweeps
 Training Set of 100,000 winning battles
Natural Language Processing
Convert text into a numerical representation
Find commonalities within data (Clustering)
Make predictions from data (Classification)
Category, Popularity, Sentiment, Relationships
Bag of Words Model
Corpus
Cats like to chase mice.
Dogs like to eat big bones.
Create a Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Digitize Text
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Vector Length = 8
Classify Documents (eating)
Cats like to chase mice.
1 1 1 1 0 0 0 0 → 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 → 1
Predict on New Data
Cats like to chase mice.
1 1 1 1 0 0 0 0 → 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 → 1
Bats eat bugs.
0 0 0 0 0 1 0 0 → ?
Predict on New Data
Cats like to chase mice.
1 1 1 1 0 0 0 0 → 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 → 1
Bats eat bugs.
0 0 0 0 0 1 0 0 → 1
Does it Really Work?
> data
[1] "Cats like to chase mice." "Dogs like to eat big bones."
> train
big bone cat chase dog eat like mice y
1 0 0 1 1 0 0 1 1 0
2 1 1 0 0 1 1 1 0 1
> predict(fit, newdata = train)
[1] 0 1
> data2
[1] "Bats eat bugs."
> test
big bone cat chase dog eat like mice
1 0 0 0 0 0 1 0 0
> predict(fit, newdata = test)
[1] 1
Document-Term Matrix
100% Training Accuracy
Test Case: Success!
Source code: https://goo.gl/UxjPBs


Editor's Notes

  1. Hi everyone! I'm really excited to be presenting one of my favorite topics: artificial intelligence. Specifically, I thought it would be interesting to present a “crash course” on machine learning, which is a small subset of AI. In this presentation, we’ll go over a handful of really quick machine learning algorithms. We’ll cover the difference between unsupervised and supervised learning, classification, clustering, and a little bit of natural language processing to classify sentences as being about “eating”. Sound like fun? Let’s get started!
  2. I want to briefly start this presentation off by just clearing up some media hype that has been steadily growing over the past year or two, surrounding what AI is and all of the amazing things it's going to do. The news likes to say things like "chatbots are going to take over everyone's jobs, machine learning is changing everything, etc". Not to belittle machine learning, as it truly is an amazing branch of AI that has made significant leaps and bounds in accuracy over the past few years (largely due to massive online datasets, increased computing speed, and deep learning). However, AI is not just machine learning. There is a lot more to it! AI encompasses many different branches. There is logical AI, which deals with representing knowledge as logical sentences. There is Symbolic AI (also called Classical AI), which uses human-readable representations of problems (my STRIPS planning library is an example of this). There is knowledge-based AI like the Cyc database, pattern recognition such as image recognition of cats, dogs, and the CIFAR dataset. There is also AI planning (see http://stripsfiddle.herokuapp.com for an example of this, where I demonstrate AI for solving Starcraft build orders! How cool is that?). There are heuristics like A* Search. But the focus of this presentation will be specifically on “machine learning”.
  3. Machine learning is a statistics-based approach to artificial intelligence. It focuses on algorithms that provide a distinct and measurable learning component. This focus on “measurability” is what makes machine learning so appealing, as we can understand whether an algorithm is actually “learning” anything. Machine learning consists of two areas: supervised and unsupervised algorithms. Supervised algorithms largely deal with classification, and include regression, support vector machines, and neural networks, just to name a few. Unsupervised algorithms deal with clustering: finding similarities within large sets of data and grouping accordingly. Examples of unsupervised learning algorithms include k-means and PCA.
  4. One of the most basic machine learning algorithms is linear regression. In its simplest form, this algorithm can dictate a trend line through data, allowing you to predict values for data when given specific features as input. For example, looking at the chart above, imagine we’re trying to predict home sale prices in your neighborhood. The x-axis represents the square footage of a house, while the y-axis represents the price. The blue dots are houses and the red line is a linear regression. Notice how as the square footage for a house increases, so too does the sale price. The linear regression plot shows this as the red trend line. Now, imagine you have a completely new house with a particular square footage and you want to predict the sale price. You could look at the linear regression plot to guess what a sale price might be, based upon other houses with similar features.
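The trend-line idea in the note above can be sketched with a one-variable least-squares fit. The talk's charts are images (not reproduced here), so the square-footage and price numbers below are made up for illustration; only the technique matches the slide.

```python
# Toy version of the house-price example: fit y = slope * x + intercept
# by the closed-form least-squares solution. Data is illustrative only.
sqft  = [1000, 1500, 2000, 2500, 3000]
price = [200, 290, 410, 500, 590]   # sale prices in $1000s, roughly linear

n = len(sqft)
mx, my = sum(sqft) / n, sum(price) / n
slope = sum((x - mx) * (y - my) for x, y in zip(sqft, price)) / \
        sum((x - mx) ** 2 for x in sqft)
intercept = my - slope * mx

# Predict the sale price of a new 1750 sq ft house from the trend line.
print(round(slope * 1750 + intercept, 1))
```

Just as the note describes, a new house's price is read off the fitted line from its square footage.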
  5. Another basic machine learning algorithm is logistic regression. This is a classification algorithm and is often the de-facto “go to” algorithm when initially trying to classify data. In the above chart, imagine that we have college students that we’re trying to classify as to whether they might be hired at our company. We have a bunch of historical data from college students, including 2 exams, and the result of whether they were hired (the blue circles) or not (the x’s). This could just as easily be cancer diagnosis or any other classification topic, but we’ll go with new hires here. Can we determine whether a student will get hired, based upon these two exam scores? Now, you could probably eyeball the data and see a rough boundary in the data that you might be able to use to classify the students. This is where logistic regression comes in.
  6. Logistic regression divides the data into an optimal decision boundary. As you can see in the above chart, it draws a diagonal line through the data, which separates students that were hired versus those that were not. If you look at the data points on each side of the line, you can see how this separation is pretty good at halving the hires versus non-hires according to their exam scores. So, based on this result, we can probably predict on a completely new student whether that would be hired or not, by plotting their point on the chart from their exam scores, and seeing which side of the boundary they lie upon. This is specifically called “linear classification”, because we’re predicting a yes/no or 0/1 classification for a data point. There is also multi-class classification, which can label data according to any number of classifications (for example, classifying images to a corresponding digit 0-9, types of fruit, categories for a movie, etc.).
  7. Support vector machines are another form of classification, specifically for non-linear classification. The above chart might represent cancer diagnoses. The red x’s represent positives, while the blue circles negatives. Now, we could probably draw a straight line through this data, using logistic regression, and get some degree of accuracy in diagnosing cancer in these patients. However, a better fit (and higher accuracy) could probably be achieved through a support vector machine. In the chart above, you can see how an SVM is able to draw a non-linear classification circle around the group of data. We can then predict on a new patient as to whether they have cancer or not, by seeing if they fall within the classification boundary.
  8. Support vector machines are pretty powerful for classification. You can use different kernels to classify the data in different ways. The above chart shows an example of a Gaussian kernel, which uses a sort-of concentric circle approach to finding the optimal boundary in the classification of data. In this case, all of the white circles will be classified under one topic, while the black circles in the other topic. You can adjust the Gaussian kernel values to shrink or expand the boundary to fine-tune accuracy.
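The Gaussian kernel mentioned above is just a similarity function between two points; the "shrink or expand the boundary" knob is its sigma parameter. A minimal Python sketch (the talk doesn't give this formula explicitly, but this is the standard RBF kernel):

```python
import math

def gaussian_kernel(x1, x2, sigma=1.0):
    # Similarity decays with squared distance; sigma controls how fast.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2 * sigma ** 2))

print(gaussian_kernel([0, 0], [0, 0]))            # 1.0 (identical points)
print(round(gaussian_kernel([0, 0], [1, 1]), 3))  # 0.368
```

A larger sigma makes distant points look more similar, which expands the decision boundary; a smaller sigma tightens it.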
  9. Let’s try a quick quiz.
  10. Supervised or Unsupervised? You are designing an agent for The Matrix. Its task is to classify people who are threats to the system. The feature set includes: age, IQ, level of education, the number of times they’ve watched the movie “The Matrix”. The training set consists of 100,000 people, divided into 50k threats and 50k non-threats. Answer: Supervised Reason: You can train a classification algorithm, such as logistic regression or a neural network, by providing the 4 features as input, with a single output of 0 or 1 – corresponding to threat or non-threat. With an equally split training set, there is a better chance of accuracy.
  11. Supervised or Unsupervised? You are designing the brain of a battle robot. Its primary attack is hand-to-hand combat. Your task is to find the most effective move combos. The feature set includes: the number of kicks, the number of punches, the number of head-butts, and the number of leg sweeps. The training set consists of 100,000 winning battles and their associated moves. Answer: Unsupervised Reason: Since we’re looking for move combinations (i.e., sets of moves that were used in winning battles), we can use an unsupervised clustering algorithm that can group the data and identify common move patterns. From these clusters, we can identify winning move combinations.
  12. Natural Language Processing The most basic form of natural language processing is to simply convert text into a numerical representation. This gives you an array of numbers. So, each document becomes a same-sized array of numbers. With this, you can apply machine learning algorithms, such as clustering and classification. This allows you to build unique insights into a set of documents, determining characteristics like category, popularity, sentiment, and relationships. This is the same type of processing that many popular online machine learning APIs use to classify data. For example, IBM Watson, Microsoft, Amazon, and Google, all include NLP APIs for working with data.
  13. Bag of Words Model Let’s take a look at a quick example. Here are two documents: “Cats like to chase mice.” and “Dogs like to eat big bones”. We’re going to try to categorize these documents as being about “eating”. To do this, we’ll build a bag-of-words model and then apply a classification algorithm. Now, the first thing to note is that the two documents are of different lengths. If you think about it, most documents will practically always be of different lengths. This is fine, because after we digitize the corpus, you’ll see that the resulting data fits neatly within same-sized vectors.
  14. Create a Dictionary So, the first step is to create a dictionary from our corpus. First, we apply a stemming algorithm on the corpus. This will remove the stop-word “to”. Next, we find each unique term and add it to our dictionary. You can see the resulting list on the right-side of this slide. Our dictionary contains 8 terms.
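The dictionary-building step above can be sketched in a few lines. The talk's actual code is in R; this Python version is illustrative only, assuming a simple whitespace/letters tokenizer and a one-word stop list ("to"), which reproduces the 8-term dictionary on the slide.

```python
import re

corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
stop_words = {"to"}

def tokenize(text):
    # Lowercase, strip punctuation, split into words, drop stop words.
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in stop_words]

# Assign each unique term the next available index, in order of appearance.
dictionary = {}
for doc in corpus:
    for term in tokenize(doc):
        if term not in dictionary:
            dictionary[term] = len(dictionary)

print(dictionary)
# {'cats': 0, 'like': 1, 'chase': 2, 'mice': 3,
#  'dogs': 4, 'eat': 5, 'big': 6, 'bones': 7}
```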
  15. Digitize Text With our dictionary created, we can now digitize the documents. Since our dictionary has 8 terms, each document will be encoded into a vector of length 8. This ensures that all documents end up having the same length. This makes it easier to process with machine learning algorithms. Let’s look at the first document. We’ll take the first term in the dictionary and see if it exists in the first document. The term is “cats”, which does indeed exist in the first document. Therefore, we’ll set a 1 as the first bit. The next term is “like”. Again, it exists in the first document, so we’ll set a 1 as the next bit. This repeats until we see the term “dogs”. This does not exist in the first document, so we set a “0”. Finally, we run through all terms in the dictionary and end up with a vector of length 8 for the first document. We repeat the same steps for the second document, going through each term in the dictionary and checking if it exists in the document.
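The term-by-term presence check described above is exactly a binary encoding over the dictionary. A Python sketch (again, the talk uses R; the dictionary below is copied from the slide):

```python
import re

# Dictionary from the slide; insertion order fixes the vector layout.
dictionary = {"cats": 0, "like": 1, "chase": 2, "mice": 3,
              "dogs": 4, "eat": 5, "big": 6, "bones": 7}

def vectorize(text):
    # 1 if the dictionary term appears in the document, else 0.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if term in tokens else 0 for term in dictionary]

print(vectorize("Cats like to chase mice."))     # [1, 1, 1, 1, 0, 0, 0, 0]
print(vectorize("Dogs like to eat big bones."))  # [0, 1, 0, 0, 1, 1, 1, 1]
```

Every document, whatever its length, comes out as a vector of length 8, which is what makes the encoding machine-learning friendly.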
  16. Classify Documents (Eating) Once the data is digitized, we can classify the documents with regard to “eating”. Since the first document is about chasing mice, maybe playing, we’ll assign a 0. It doesn’t really have to do with eating. The second document is clearly about eating. So, we’ll assign it a 1. At this point, we can train the data with logistic regression, a neural network, a support vector machine, etc.
  17. Predict on New Data Once our model has finished training, we can try predicting on new data to see if it’s classified correctly. Here you can see we have a new document, “Bats eat bugs.”. This document has never been seen by our machine learning algorithm yet. We want to try and categorize it as being about “eating” or not. We’ll first digitize the document, just like we did with our training corpus. In this case, we only have 1 term found in the dictionary.
  18. Predict on New Data The machine learning algorithm is probably going to find a relationship with this particular bit, highlighted in red above. This bit corresponds to the term “eat”, and is found in the training document that was classified as 1 for the category “eating”. Based on this similarity, our model is probably going to predict our new document as … ?
  19. Predict on New Data So this is the general idea behind natural language processing. Now, we didn’t have to classify just on “eating”. We could have just as easily classified based upon sentiment. In fact, this is a common method for performing sentiment analysis with machine learning. (Another non-machine learning method for sentiment analysis is using the AFINN word-list approach). This was a very basic example of natural language processing. In a real-world case, you could have tens of thousands of documents, with perhaps, multiple classifications. There are also various ways to encode the corpus, such as the count of the term within the sentence, tf-idf, and more.
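The slides use binary presence/absence encoding; the note above mentions alternatives such as raw counts and tf-idf. A minimal tf-idf sketch over the same two-document corpus (one common unsmoothed variant; the talk doesn't specify a formula):

```python
import math
import re

corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
docs = [re.findall(r"[a-z]+", d.lower()) for d in corpus]

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)          # term frequency in this doc
    df = sum(term in d for d in docs)        # document frequency
    idf = math.log(len(docs) / df)           # rarer terms weigh more
    return tf * idf

# Terms appearing in every document ("like", "to") get idf = 0,
# so tf-idf automatically down-weights uninformative words.
weights = {t: round(tf_idf(t, docs[0]), 3) for t in docs[0]}
print(weights)
```

Unlike the binary encoding, tf-idf already encodes a crude notion of which terms are informative, without any stop-word list.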
  20. Does it Really Work? Here is an actual example in R. The code takes the original sentences from this example and builds a document-term-matrix. Notice how the 1’s and 0’s align perfectly with what we’ve seen in the previous slides. The order of the terms is a little different, but otherwise the values are the same. The ‘y’ column is the classification (eating). We train on the data using a generalized linear model, with 100% accuracy. It’s only 2 training cases, so it’s not all that difficult to train. You can see the results of training when we call “predict”. It outputs the same ‘y’ values as the training data. We then run the model on our test sentence, that the AI has never seen before, and call “predict”. It outputs a 1, which is correct, since this sentence is indeed about “eating”. There is a link to the source code in this slide https://goo.gl/UxjPBs for anyone that is curious and wants to try running it.
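The R example trains a generalized linear model; as a dependency-free stand-in, the sketch below trains a simple perceptron (not the talk's glm) on the same two binary vectors and predicts on the unseen sentence. The dictionary order follows the slides (cats, like, chase, mice, dogs, eat, big, bones).

```python
X = [[1, 1, 1, 1, 0, 0, 0, 0],   # "Cats like to chase mice."    -> 0
     [0, 1, 0, 0, 1, 1, 1, 1]]   # "Dogs like to eat big bones." -> 1
y = [0, 1]

w, b = [0.0] * 8, 0.0
for _ in range(10):                   # a few epochs is plenty for 2 samples
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0
        for j in range(8):            # perceptron update, only on error
            w[j] += (yi - pred) * xi[j]
        b += (yi - pred)

# "Bats eat bugs." digitized: only the "eat" bit is set.
test = [0, 0, 0, 0, 0, 1, 0, 0]
pred = 1 if sum(wj * xj for wj, xj in zip(w, test)) + b >= 0 else 0
print(pred)  # 1 -> classified as "eating"
```

As in the slides, the shared "eat" feature between the second training document and the test sentence is what drives the prediction to 1.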