1. The document discusses the history and development of artificial intelligence and machine learning, from early concepts in probability and statistics in the 18th century to modern algorithms and applications.
2. It outlines important early milestones like the McCulloch-Pitts neural network model from 1943 and the Turing Test in 1950. Major algorithms like perceptron and modern frameworks like TensorFlow are also mentioned.
3. The text advocates for applying machine learning to solve real-world business problems by understanding the problem domain, acquiring relevant data, selecting an appropriate algorithm, and iterating through the problem-solving process.
5. “man’s dependence on probability was simply a consequence of imperfect
knowledge. A being who could follow every particle in the universe, and who had
unbounded powers of calculation, would be able to know the past and to predict
the future with perfect certainty”
- Pierre-Simon Laplace, A Philosophical Essay on Probabilities (1825)
7. “Statistics is about gathering data and working out what the numbers can tell us.
From the earliest farmer estimating whether he had enough grain to last the
winter to the scientists of the Large Hadron Collider confirming the probable
existence of new particles, people have always been making inferences from data.
Statistical tools like the mean or average summarise data, and standard
deviations measure how much variation there is within a set of numbers.
Frequency distributions - the patterns within the numbers or the shapes they
make when drawn on a graph - can help predict future events.
Knowing how sure or how uncertain your estimates are is a key part of statistics”
- Julian Champkin, Significance Magazine
11. 1957: Perceptron
• The M-P model was a simple function with multi-dimensional input and
binary output
• The perceptron has two layers of nodes
• Weights and thresholds were not all identical
• Outputs are in {-1, 1} rather than {0, 1}
• Adds an extra input that represents the bias (sometimes called theta)
• Most important, it has a learning rule
• It was a machine that could take input and create output
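A minimal sketch may make the learning rule concrete. This is an illustrative reconstruction, not Rosenblatt's original formulation: outputs are in {-1, +1}, a bias input stands in for the threshold, and the weights are nudged toward each misclassified example. The AND-style data and learning rate are made up.

```python
# Perceptron sketch: output is sign(w.x + bias) in {-1, +1}, and the
# learning rule moves the weights toward each misclassified example.

def predict(weights, bias, x):
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s >= 0 else -1

def train(samples, labels, lr=0.1, epochs=20):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if predict(weights, bias, x) != y:   # misclassified example
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias

# Learn a linearly separable AND-like function with {-1, +1} labels
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [-1, -1, -1, 1]
w, b = train(samples, labels)
print([predict(w, b, x) for x in samples])  # [-1, -1, -1, 1]
```

For linearly separable data like this, the perceptron convergence theorem guarantees the loop eventually stops making updates.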
12. 1970s: AI Winter
• 1973: UK Parliament commissioned Sir James Lighthill to evaluate the state
of AI research in the United Kingdom
• “Computers have been oversold… Indeed, it is big business… Continuous
failures occurred in language translation, image recognition, human
speech, handwritten letters, and so on… A robot can only mimic a certain
range of human activities… Specialised problems are best treated by
specialised methods rather than generalised intelligence… The general
purpose robot is a mirage”
– Sir James Lighthill
• The Lighthill report led to the near-complete dismantling of AI research in
England.
• The assessment, coupled with slow progress, contributed to a loss of
confidence and a drop in resources for AI research
14. Image source: “You and AI – The History, Capabilities and Frontiers of AI” YouTube
23. Automate the analysis
• Manual analysis is tedious
• Bandicoot is an open-source Python toolbox
used to analyze mobile phone metadata
• Bandicoot computes behavioural indicators
• It stratifies the data between weekday and
weekend, or day and night
• The strategy is to generate features that an
algorithm can process to identify behaviour;
for example, a 2015 study titled "Predicting
Gender from Mobile Phone Metadata" did
exactly this
• Learning algorithms use features for
prediction and clustering tasks – decide
which features can predict what
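As a rough illustration of the weekday/weekend stratification idea (hand-rolled, not bandicoot's actual API), the sketch below splits hypothetical call records into strata and computes a mean-duration indicator for each:

```python
# Stratify call records by weekday vs weekend and compute a
# mean-duration indicator per stratum, mimicking how bandicoot-style
# features are generated. The (timestamp, seconds) records are made up.
from datetime import datetime
from statistics import mean

records = [
    ("2015-03-02 09:15", 120),  # Monday
    ("2015-03-03 18:40", 300),  # Tuesday
    ("2015-03-07 11:05", 60),   # Saturday
    ("2015-03-08 20:30", 180),  # Sunday
]

def stratify(records):
    strata = {"weekday": [], "weekend": []}
    for ts, duration in records:
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M").weekday()  # Mon=0 .. Sun=6
        key = "weekend" if day >= 5 else "weekday"
        strata[key].append(duration)
    return {k: mean(v) for k, v in strata.items() if v}

print(stratify(records))  # weekday mean 210s, weekend mean 120s
```

Each stratum mean becomes one feature; repeating this across many record types and splits is how a toolbox arrives at hundreds of indicators per user.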
24. Machine Learning
• Bandicoot generates 1400 indicators. The next question is: “Can we do
something useful with these indicators (variables, features)?” For example,
can mobile phone data answer the global development call?
• The SAS Institute (2016) defines machine learning as “a method of data
analysis that automates analytical model building. Using algorithms that
iteratively learn from data, machine learning allows computers to find
hidden insights without being explicitly programmed where to look.” There
are two main classes of machine learning algorithms:
• Unsupervised Learning: Infer a function to describe a hidden structure or similarity
of patterns in unlabelled data
• Supervised Learning: Provide not only a set of features (xi for i = 1,…,N) but also a set
of labels (yi for i = 1,…,N), where each yi is the label corresponding to xi. One uses the
pairs to learn a function f that can be used to predict the unknown target value of some
input vector: y = f(x)
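The supervised setup y = f(x) can be sketched with a deliberately simple learner. Here f is a 1-nearest-neighbour rule over hypothetical feature/label pairs; nothing below is specific to the deck's tooling.

```python
# Supervised learning in miniature: from labelled pairs (X_i, y_i),
# build a function f so that y = f(x) on new inputs. f here is a
# 1-nearest-neighbour rule; the data are hypothetical.

def fit_1nn(X, y):
    def f(query):
        # predict the label of the closest training example
        def dist(a):
            return sum((ai - qi) ** 2 for ai, qi in zip(a, query))
        i = min(range(len(X)), key=lambda i: dist(X[i]))
        return y[i]
    return f

X = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.5, 8.2]]  # features
y = ["low", "low", "high", "high"]                     # labels
f = fit_1nn(X, y)
print(f([1.1, 1.0]), f([8.1, 7.9]))  # low high
```

Dropping the labels y and asking only which points cluster together would turn the same data into an unsupervised problem.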
25. What is Learning?
• Can we extract answers to meaningful questions from vast amounts of data?
• How susceptible are customers to marketing?
• What is the probability of a person using our new service?
• Which members of a community are most at risk in an epidemic outbreak?
• Deriving a theorem or law directly is difficult: these problems are complex and require
measurements of large amounts of data over time
• What is learning? Specifying a model f that can extract the regularities of the problem
– with an appropriate objective (loss) function to optimize
• Learning (or fitting) the model essentially means finding the optimal parameters of
the model structure, using the provided input and target data – fitting the model to
perform well on given or seen data (training data)
• However, our primary goal is for the model to perform well on unseen data
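The seen/unseen distinction can be illustrated with a toy contrast between a model that merely memorises its training data and one that has captured the underlying regularity; the house-size/price numbers are made up.

```python
# Seen vs unseen data: a model that memorises the training set scores
# perfectly on it but says nothing about new inputs, while a model that
# learned the regularity generalises. Hypothetical data: price = 3 * size.

train_set = {50: 150, 80: 240, 100: 300}   # seen (training) examples
test_set = {60: 180, 90: 270}              # unseen examples

def memorised(size):
    return train_set.get(size)             # pure lookup, no generalisation

def fitted(size):
    return 3 * size                        # the learned regularity

print(all(memorised(s) == p for s, p in train_set.items()))  # True: perfect on seen data
print(any(memorised(s) == p for s, p in test_set.items()))   # False: useless on unseen data
print(all(fitted(s) == p for s, p in test_set.items()))      # True: generalises
```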
26. Problem
• Think of one to three business problems that are unsolved or could be solved
better – problems that are:
• Complicated
• Require learning from data
• Sufficiently self-contained
Once you know that a problem fits in the ML domain, two further important
questions to answer are:
• Q: Does the right data exist for the problem? Where does it come
from? Is the data feed for the machine sufficient to solve the problem?
• Q: Which ML model makes more sense for the problem?
source: HBR
28. Cast the use case (problem)
• As an ML problem:
1. What is being predicted?
2. What data is needed?
• As software:
1. What is the API for the problem during prediction?
2. Who will use this service? How are they doing it today?
• As a Data problem:
1. What data are we analyzing?
2. What data are we predicting?
3. What data are we reacting to?
source: Google Coursera ML course
29. Journey
1. Understand AI
‐ Short-term courses
‐ Events
‐ Blogs
2. Follow a master
3. Find a problem
4. Problem fits in ML domain
5. Data Strategy
6. Design Thinking
31. About 70% of the brain's cortical activity deals with visually related information; vision is
the widest of the gates into the human brain, while others such as hearing, touch, and taste are
narrower channels. Vision is like a highway with eight traffic lanes, and the other senses
are like sidewalks on both sides. If you cannot deal with visual information, the whole AI
system is an empty shell. It can only do symbolic reasoning, such as playing chess and
proving theorems, but cannot enter the real world. Computer vision is like a door-opening
spell for AI. The door is inside, and if you fail to open it, there is no way to study AI in the
real world.
- Songchun Zhu, Professor of Statistics and Computer Science at the University of
California, Los Angeles
32. First AI Project – Two recommendations
“Not all AI projects are created equal. Some can provide incremental
improvements and are good places to start whereas some provide competitive
advantage”
- Bern Elliot, Gartner
“Build intelligence to solve one business problem. Use the intelligence and
experience for everything else”
- Demis Hassabis, DeepMind
33. AI Startup/Project
Horizontal
- Very science driven
- Solve one fundamental problem
- Serve many industries
- Example: NLP
- Players: Google, Facebook,
Amazon, Baidu, Microsoft, and
DeepMind
Vertical
- Customer segmentation &
targeting
- Solve the problem of a specific
customer
- Success depends on democratized
base technology and a strong
community around customers of
the technology
Ref - https://www.techinasia.com/talk/vertical-horizontal-ai-startup
40. Linear Regression
• Regression – find an equation that fits the data
• The learning step is function estimation
• Reducing the error is gradient descent
• Supervised learning
• Input training data:
• Input, x – size of house
• Output, y – price of house
• m – number of training examples
• Build a hypothesis (predict y for a given x) over the input data: hθ(x) = θ0 + θ1x
42. • If α is too small, gradient descent can be slow
• If α is too large, gradient descent can overshoot the
minimum
• The derivative is the slope of the cost function J
• Using calculus, the derivative term works out to:
θj := θj − α (1/m) Σi=1..m (hθ(x(i)) − y(i)) xj(i)
• Batch: each step uses all training examples
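Putting the update rule together, here is a small batch gradient descent sketch for hθ(x) = θ0 + θ1x; the learning rate, step count, and toy data are illustrative choices.

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x: each step
# uses all m training examples (hence "batch"). alpha and the data
# below are illustrative.

def gradient_descent(xs, ys, alpha=0.05, steps=5000):
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # partial derivatives of the cost J w.r.t. each parameter
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]   # e.g. house size
ys = [3.0, 5.0, 7.0, 9.0]   # price, generated from y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # ≈ 1.0 2.0
```

Shrinking alpha slows convergence; enlarging it past a data-dependent limit makes the iterates diverge, which is exactly the overshoot the slide warns about.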
43. Elementary Algebra
• If you recall from elementary algebra,
the equation for a line is y = mx + b
• An alternative to gradient descent is an
algebraic equation that calculates the
minimum of the cost function directly
• To calculate the linear regression line
y = a + bx, use:
b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², a = ȳ − bx̄
• In the language of AP Statistics, we may see
the equation written as: ŷ = b0 + b1x
• In machine learning, it is referred to as the
hypothesis: hθ(x) = θ0 + θ1x
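The algebraic alternative to gradient descent can be sketched in a few lines using the standard least-squares formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − bx̄; the data are illustrative.

```python
# Closed-form least-squares fit of y = a + b*x: no iteration, the
# slope and intercept come straight from the data.

def least_squares(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 1 + 2x
a, b = least_squares(xs, ys)
print(a, b)  # 1.0 2.0
```

On this toy data the closed form recovers the same line gradient descent converges to; the closed form is exact but, unlike gradient descent, it scales poorly once there are very many features.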
44. Multiple variables
Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
n = number of features
x(i) = input (features) of the ith training example
xj(i) = value of feature j in the ith training example
Hypothesis: hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn
45. For convenience of notation, define x0 = 1. The features then form a vector (x0, x1, …, xn) in Rn+1, and the parameters
likewise form a vector (θ0, θ1, …, θn) in Rn+1.
The hypothesis equation can be written as: hθ(x) = θ0x0 + θ1x1 + … + θnxn = θTx
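The θTx form with x0 = 1 can be sketched directly; the parameter values and the example row (size, bedrooms) are invented for illustration.

```python
# Vectorised hypothesis h(x) = theta^T x, with x0 = 1 prepended so
# theta0 acts as the intercept.

def hypothesis(theta, x):
    x = [1.0] + list(x)                           # define x0 = 1
    return sum(t * xi for t, xi in zip(theta, x))

theta = [80.0, 0.1, 25.0]   # [theta0, weight per sq ft, weight per bedroom]
x = [2104.0, 5.0]           # features of one example: size, bedrooms
print(hypothesis(theta, x)) # 80 + 0.1*2104 + 25*5 ≈ 415.4
```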
48. References
• The Lighthill Debate (1973)
• You and AI – The History, Capabilities and Frontiers of AI
• MIT SA+P Big Data and Social Analytics
• An Easy Introduction to Artificial Intelligence, Machine Learning and
Deep Learning
• Scala and Spark for Big Data and Machine Learning
• Machine Learning — Andrew Ng, Stanford University
In Europe, the 17th century was an important time for quantitative studies of diseases, population, and wealth, including the work done by John Graunt
More examples: https://towardsdatascience.com/mcculloch-pitts-model-5fdf65ac5dd1
Image Source: https://www.pbs.org/newshour/science/short-history-ai-schooling-humans-games
In 1997, Deep Blue exploited increasing computing power to perform large-scale searches of potential moves – 200 million moves per second
Ref - https://www.youtube.com/watch?v=NRYLPmy8V1k
AI is a broad term for the field in which human intelligence is simulated in machines. ML is the term applied to systems that learn from experience (data).