3. AI/ML/DL
● Artificial Intelligence (AI) is a broad field of
study dedicated to complex problem solving.
● Machine Learning (ML) is usually considered
a subfield of AI. ML is a data-driven
approach focused on creating algorithms that
can learn from data without
being explicitly programmed.
● Deep Learning (DL) is a subfield of ML focused
on deep neural networks (NN) able to
automatically learn hierarchical
representations.
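To make the "learning from data without being explicitly programmed" point concrete, here is a toy sketch (hypothetical data and function names) that fits a one-dimensional threshold classifier from labeled examples instead of hand-coding the rule:

```python
# Toy illustration of "learning from data": the decision rule is not
# hand-coded; its parameter (a threshold) is fitted from labeled examples.

def fit_threshold(xs, ys):
    """Learn a decision threshold as the midpoint between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    return 1 if x >= threshold else 0

xs = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]   # feature values
ys = [0,   0,   0,   1,   1,   1]     # labels
t = fit_threshold(xs, ys)             # the learned parameter (0.55 here)
print(predict(t, 0.95))               # → 1
```

The same fit/predict pattern, scaled up to millions of parameters, is what ML libraries and deep learning frameworks implement.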
9. Image recognition quality on ImageNet dataset
Human quality is estimated as a ~5.1% error rate on this dataset (0.051)
From Lex Fridman's slides: https://selfdrivingcars.mit.edu/
21. New kid on the block: GAN
https://www.technologyreview.com/lists/technologies/2018/
22. Example: Generating images by GAN
Progressive Growing of GANs for Improved Quality, Stability, and Variation,
https://github.com/tkarras/progressive_growing_of_gans
https://www.youtube.com/watch?v=XOxxPcy5Gr4
23. GAN rapid evolution
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
https://arxiv.org/abs/1802.07228
28. What’s with the Big Picture?
https://www.engadget.com/2018/01/23/photo-stitch-ai-fail-the-big-picture/
30. Still some issues exist: Reasoning
Deep learning is mainly about perception, but there is a lot of inference involved in
everyday human reasoning.
● Neural networks lack common sense
● Cannot find information by inference
● Cannot explain the answer
○ It could be a must-have requirement in
some areas, e.g. law or medicine.
○ GDPR is coming
The most fruitful approach is likely to be a hybrid
neural-symbolic system, a topic of active research
right now.
38. Deep Learning and NLP
Variety of tasks:
● Classification: language detection, genre and topic detection,
positive/negative sentiment analysis, authorship detection, …
● Fact extraction: people and company names, geography, prices, dates,
product names, …
● Language modeling, part-of-speech tagging
● Key phrase extraction
● Finding synonyms
● Machine translation
● Search (written and spoken)
● Question answering
● Dialog systems
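As a toy illustration of the first task above, language detection can be sketched as picking the language whose stopwords overlap the text the most (a hypothetical simplification; real systems use character n-gram or neural models):

```python
# Toy language detector: classify a text by stopword overlap.
# (Hypothetical word lists, for illustration only.)

STOPWORDS = {
    "en": {"the", "is", "and", "of"},
    "de": {"der", "ist", "und", "von"},
}

def detect_language(text):
    words = set(text.lower().split())
    # pick the language whose stopword set overlaps the text the most
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

print(detect_language("The cat is on the mat"))    # → en
print(detect_language("Der Hund ist von Berlin"))  # → de
```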
42. Example: Legal document analysis / NDA
https://www.prnewswire.com/news-releases/artificial-intelligence-more-accurate-than-lawyers-for-reviewing-contracts-new-study-reveals-300603781.html
“The highest performing lawyer in the
study achieved 94% accuracy -
matching the AI - while the lowest
performing lawyer achieved an average
67% accuracy. The challenge took the
LawGeex AI 26 seconds to complete,
compared to an average of 92 minutes
for the lawyers. The longest time taken
by a lawyer to complete the test was
156 minutes, and the shortest time was
51 minutes.”
43. Example: Legal document analysis / Privacy policies
https://www.wired.com/story/polisis-ai-reads-privacy-policies-so-you-dont-have-to/
“In about 30 seconds, Polisis can read
a privacy policy it's never seen before
and extract a readable summary,
displayed in a graphic flow chart, of
what kind of data a service collects,
where that data could be sent, and
whether a user can opt out of that
collection or sharing.”
49. Still many problems with chatbots
http://www.eweek.com/big-data-and-analytics/state-of-chatbots-in-2018-rapidly-moving-into-the-mainstream
Key findings from the PointSource survey include:
● When AI is present, nearly half (49 percent) of consumers are already willing
to shop more frequently, 34 percent will spend more money and 38 percent will
share their experiences with friends and family.
● 51 percent of consumers still anticipate frustrations around chatbots not
understanding what they’re looking for; 44 percent question the accuracy
of the information chatbots provide.
● More than half (54 percent) of consumers would still prefer to talk to a
customer service representative.
● If a customer is on hold with a customer service rep, 34 percent of customers
want to switch to a chatbot after 5 minutes have passed. However, 59
percent get frustrated if a chatbot doesn’t resolve their inquiry in that same
time.
51. DL/Multi-modal Learning
Deep Learning models are becoming multi-modal: they use two or more modalities
simultaneously, e.g.:
● Image caption generation: images + text
● Search Web by an image: images + text
● Video describing: the same but added time dimension
● Visual question answering: images + text
● Speech recognition: audio + video (lip motion)
● Image classification and navigation: RGB-D (color + depth)
It will become possible to match different modalities to each other easily.
52. Example: Caption Generation (text from an image)
http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
53. Example: NeuralTalk and Walk
Ingredients:
● https://github.com/karpathy/neuraltalk2
Project for learning Multimodal Recurrent Neural Networks that describe
images with sentences
● Webcam/notebook
Result:
● https://vimeo.com/146492001
56. Example: Image generation from text
AttnGAN: Fine-Grained Text to Image Generation with
Attentional Generative Adversarial Networks, https://arxiv.org/abs/1711.10485
57. Example: Code generation from an image
pix2code: Generating Code from a Graphical User Interface Screenshot,
https://arxiv.org/abs/1705.07962
58. SketchCode: Go from idea to HTML in 5 seconds
Automated front-end development using deep learning
https://blog.insightdatascience.com/automated-front-end-development-using-deep-learning-3169dd086e82
60. Speech Recognition: Word Error Rate (WER) [2017]
“Google’s speech recognition technology now has a 4.9% word error rate” (2017)
https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/
Microsoft “It can now transcribe human speech with a 5.1% error rate”
http://uk.businessinsider.com/microsofts-speech-recognition-5-1-error-rate-human-level-accuracy-2017-8
IBM. “The company has reached a 5.5 percent word error rate that's nearly on par
with humans.”
https://www.engadget.com/2017/03/10/ibm-speech-recognition-accuracy-record/
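The word error rate quoted in these results is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words; a minimal implementation:

```python
# Word Error Rate via dynamic-programming edit distance over words.

def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

# one deleted word out of six reference words → WER of 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

So a "5.1% word error rate" means roughly one word in twenty is transcribed incorrectly.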
61. Speech Recognition: Lip Reading
“This lip reading performance beats a professional lip reader on videos from BBC
television, and we also demonstrate that visual information helps to improve
speech recognition performance even when the audio is available.”
Lip Reading Sentences in the Wild, https://arxiv.org/abs/1611.05358
“To the best of our knowledge, LipNet is the first end-to-end sentence-level
lipreading model that simultaneously learns spatiotemporal visual features and a
sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy in
sentence-level, overlapped speaker split task, outperforming experienced human
lipreaders and the previous 86.4% word-level state-of-the-art accuracy.“
LipNet: End-to-End Sentence-level Lipreading, https://arxiv.org/abs/1611.01599
62. Case: Amazon Echo
Amazon Alexa is in more than 20 million devices. The vast majority of these are in the
Amazon Echo portfolio.
https://www.voicebot.ai/2017/10/27/bezos-says-20-million-amazon-alexa-devices-sold/
63. Case: Skype Live Translation
Translating voice calls and video calls in 8 languages and instant messages in over 50.
https://www.skype.com/en/features/skype-translator/
64. Case: Google Pixel Buds
Google packed its headphones (in combination with the Pixel 2) with the power to
translate between 40 languages, literally in real-time. The company has finally done
what science fiction and countless Kickstarters have been promising us, but failing
to deliver on, for years. This technology could fundamentally change how we
communicate across the global community.
https://www.engadget.com/2017/10/04/google-pixel-buds-translation-change-the-world/
65. Speech Synthesis: Tacotron 2 (Google, 2017)
● “Our approach does not use complex linguistic and acoustic features as input. Instead, we generate
human-like speech from text using neural networks trained using only speech examples and
corresponding text transcripts.”
https://research.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html
66. Speech Synthesis: Deep Voice 3 (Baidu, 2017)
● “Deep Voice 3 introduces a completely novel neural network architecture for speech synthesis. This
novel architecture trains an order of magnitude faster, allowing us to scale over 800 hours of
training data and synthesize speech from over 2,400 voices, which is more than any other
previously published text-to-speech model.”
http://research.baidu.com/deep-voice-3-2000-speaker-neural-text-speech/
67. But the same problem with adversarial examples...
Did you hear that? Adversarial Examples Against Automatic Speech Recognition
https://arxiv.org/abs/1801.00554
71. Car control
Meet the 26-Year-Old Hacker Who Built a
Self-Driving Car... in His Garage
https://www.youtube.com/watch?v=KTrgRYa2wbI
72. Car driving
https://www.youtube.com/watch?v=YuyT2SDcYrU
“Actually a “Perception to Action” system. The visual perception and control
system is a Deep learning architecture trained end to end to transform pixels
from the cameras into steering angles. And this car uses regular color cameras,
not LIDARS like the Google cars. It is watching the driver and learns.”
73. Example: Sensorimotor Deep Learning
“In this project we aim to develop deep learning techniques that can be deployed
on a robot to allow it to learn directly from trial-and-error, where the only
information provided by the teacher is the degree to which it is succeeding at the
current task.”
http://rll.berkeley.edu/deeplearningrobotics/
80. ML in datacenters
“We’ve managed to reduce the amount of energy we use for cooling by up to 40 percent.”
https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/
81. Device Placement with Reinforcement Learning
Device Placement Optimization with Reinforcement Learning
https://arxiv.org/abs/1706.04972
83. Examples
- Improving ML algorithms: Device placement, Architecture search, Optimizer
search, Ensembling, ...
- Optimizing indexes in DB (The Case for Learned Index Structures,
https://arxiv.org/abs/1712.01208)
- Improving datacenter efficiency: optimize cooling, optimize virtual machine
placement, ...
- …
Computer systems are filled with heuristics that work well “in the general
case”. But they generally don’t adapt to the actual usage pattern and don’t
take the available context into account.
We can use ML anywhere we currently use a heuristic to make a decision!
See Jeff Dean talk at NIPS 2017
http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
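As a toy version of the learned-index idea cited above, one can replace binary search with a linear model that predicts a key's position in a sorted array and then corrects the residual error locally. This is an illustrative sketch only (it assumes the key is present; the paper itself uses staged models rather than a single linear fit):

```python
# Learned index sketch: fit position = a * key + b over a sorted key array,
# then repair the small prediction error with a local scan.

def fit_linear(keys):
    """Least-squares fit of position against key over the sorted keys."""
    n = len(keys)
    xs, ys = keys, list(range(n))
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def lookup(keys, model, key):
    a, b = model
    # predicted position, clamped into the valid index range
    pos = min(max(int(round(a * key + b)), 0), len(keys) - 1)
    # local correction: scan from the prediction to the exact slot
    while keys[pos] < key:
        pos += 1
    while keys[pos] > key:
        pos -= 1
    return pos

keys = [2 * i for i in range(100)]  # perfectly linear key distribution
model = fit_linear(keys)
print(lookup(keys, model, 42))      # → 21
```

On smooth key distributions the model lands close to the true slot, so the correction step touches only a few entries, which is the intuition behind replacing O(log n) search with a learned predictor.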
84. Examples
Compilers: instruction scheduling, register allocation, loop nest parallelization
strategies, …
Networking: TCP window size decisions, backoff for retransmits, data
compression, ...
Operating systems: process scheduling, buffer cache insertion/replacement, file
system prefetching, …
Job scheduling systems: which tasks/VMs to co-locate on same machine, which
tasks to pre-empt, ...
ASIC design: physical circuit layout, test case selection, …
See Jeff Dean talk at NIPS 2017
http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
87. No dataset — no deep learning
Deep learning requires a lot of data (otherwise simpler models could perform better).
But sometimes you have no dataset…
Nonetheless, several approaches are available:
● Transfer learning
● Data augmentation
● Mechanical Turk
● Unsupervised pre-training
● Moving towards one-shot and zero-shot learning
● …
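Data augmentation, one of the workarounds above, in its simplest form generates extra training examples with label-preserving transforms such as horizontal flips (a pure-Python sketch; real pipelines use libraries such as torchvision):

```python
# Minimal data augmentation: double a tiny image dataset with
# horizontally flipped copies. Labels are unchanged by the transform.

def hflip(image):
    """Horizontally flip an image given as a list of pixel rows."""
    return [list(reversed(row)) for row in image]

def augment(dataset):
    """Return the dataset plus a flipped copy of every example."""
    return dataset + [(hflip(img), label) for img, label in dataset]

tiny = [([[1, 0], [0, 1]], "cat")]
bigger = augment(tiny)
print(len(bigger))    # → 2
print(bigger[1][0])   # → [[0, 1], [1, 0]]
```

Random crops, rotations, and noise work the same way: each transform must preserve the label while changing the input, effectively multiplying the dataset size.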
90. Data & Models vs. Code
Almost the same state-of-the-art code is available to the whole market.
Currently the real differentiator is data or trained models (a derivative of
the data). Using publicly available code/algorithms with unique data, it is
possible to create a better-quality model than with highly specialized code
and public data.
There is a space for a new type of infrastructure
● Data and algorithm marketplaces
● Model marketplaces and model repositories
● AutoML (already appearing)
● Model management
● Model quality evaluation
● ...
92. Still some issues exist: Computing power
DL requires a lot of computation. Without a cluster or GPU machines,
much more time is required.
● Currently GPUs (mostly NVIDIA) are the only practical choice
● FPGAs/ASICs are coming into this field (Google TPU gen. 2, Bitmain Sophon,
Intel 2018+). The situation resembles the path of Bitcoin mining
● Neuromorphic computing is on the rise (IBM TrueNorth, Intel, memristors, etc.)
● Quantum computing can benefit machine learning as well (but it probably won’t
be a desktop or in-house server solution)
98. Personal Supercomputers
● NVIDIA DGX-1 Server ($149,000)
Performance: 1000 TFLOPS FP16, 125 TFLOPS FP32
* NVIDIA DGX-2 (16 TESLA V100, 2 PFLOPS FP16) is just announced
● DeepLearning11 ($16,500, contains 10x NVIDIA GeForce GTX 1080 Ti)
Performance: 100 TFLOPS FP32
● NVIDIA Titan V gaming card ($3,000): 6.9 TFLOPS FP64 (note: FP64, not the
usually reported FP16 performance)
○ Corresponds to the best supercomputer in the world in 2001–2002 (IBM ASCI
White, with a 7.226 TFLOPS peak speed) and to a supercomputer in 500th place
(still a serious machine) on the TOP500 list of November 2007 (when the
entry level to the list was 5.9 TFLOPS)
● For comparison: the Huawei Mate 10 smartphone with its Kirin 970 Neural
Network Processing Unit delivers 1.92 TFLOPS FP16
○ The top-performing supercomputer of 1997 had similar performance (though
in FP64)
https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664
99. AI at the edge
● NVidia Jetson TK1/TX1/TX2
○ 192/256/256 CUDA Cores
○ 64/64/128-bit 4/4/6-core ARM CPU, 2/4/8 GB memory
○ Xavier is coming
● Tablets, Smartphones
○ Qualcomm Snapdragon 845
○ Apple A11 Bionic
○ Huawei Kirin 970
● Raspberry Pi 3 (1.2 GHz 4-core)
● Movidius Neural Compute Stick
100. References:
Hardware for Deep Learning series of posts:
https://blog.inten.to/hardware-for-deep-learning-current-state-and-trends-51c01ebbb6dc
● Part 1: Introduction and Executive summary
● Part 2: CPU
● Part 3: GPU
● Part 4: FPGA
● Part 5: ASIC
● Part 6: Mobile AI
● Part 7: Neuromorphic computing
● Part 8: Quantum computing
103. AI changes the landscape of threats
● Expansion of existing threats
○ The costs of attacks are lowered
■ Set of actors who can carry out attacks expands
■ The rate and scale of attacks can increase
■ The set of potential targets can expand
● Introduction of new threats
○ AI systems can complete tasks that would otherwise be impractical for
humans
○ Exploiting vulnerabilities of AI systems
● Change to the typical character of threats
○ Attacks can be especially effective
○ Finely targeted
○ Difficult to attribute
104. Many other issues exist as well
● Unintentional forms of AI misuse like algorithmic bias
● Indirect threats: mass unemployment, or other second- or third-order effects
from the deployment of AI technology
● System-level threats that would come from the dynamic interaction between
non-malicious actors, e.g. “race to the bottom” on AI safety
● Existential risks from human-level AI
● Unclear regulation