BERT is a technique for pre-training deep bidirectional representations from unlabeled text using a Transformer encoder. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of natural language processing tasks, including question answering and text classification. The presentation gives an overview of what BERT is, how it works through pre-training and fine-tuning, and the tasks it can be applied to, such as sentence classification, question answering, and named entity recognition.
3. Why Use BERT?
• Train state-of-the-art models in about 30 minutes¹
• Multilingual
• Can be fine-tuned to a wide variety of tasks
¹ On a single Cloud TPU
4. What can BERT be used for?
1. Sentence Pair Classification Tasks
2. Single Sentence Classification Tasks
3. Question Answering Tasks
4. Single Sentence Tagging Tasks
5. Sentence Pair Classification Tasks
E.g., determine whether two texts are semantically similar:
https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs
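As a rough illustration (not shown in the slides), BERT handles a sentence pair by packing both sentences into a single input sequence and classifying from the [CLS] token. The minimal sketch below uses the Hugging Face transformers port of BERT rather than the google-research/bert code linked later; the checkpoint name and label meanings are illustrative assumptions.

# Sketch of BERT sentence pair classification using the Hugging Face
# "transformers" port of BERT (an assumption; not the original
# google-research/bert code the deck links to).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g. duplicate / not duplicate
)

question_a = "How can I learn to play the guitar?"
question_b = "What is the best way to learn guitar?"

# Both sentences are packed into one input: [CLS] A [SEP] B [SEP]
inputs = tokenizer(question_a, question_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Note: with an untrained classification head these scores are meaningless;
# the head must first be fine-tuned on labeled pairs (e.g. Quora question pairs).
print(logits.softmax(dim=-1))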
11. How to use BERT?
Step 1: Pre-training
Predict missing words
Predict sentence relationship
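Pre-training uses two self-supervised objectives: predicting masked (missing) words and predicting whether one sentence follows another. For a flavor of the first objective, here is a hedged sketch that loads an already pre-trained checkpoint through the Hugging Face transformers library (an assumption; the slides show no code) and fills in a masked token.

# Sketch of the "predict missing words" (masked language model) objective,
# using a released pre-trained checkpoint via Hugging Face transformers.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the position of the [MASK] token and read off the most likely word.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected to print something like "paris"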
12. How to use BERT?
Step 2: Fine-tune to specific NLP task
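Fine-tuning adds a small task-specific output layer on top of the pre-trained encoder and trains the whole network briefly on labeled data. The sketch below is a minimal training loop, again assuming the Hugging Face transformers port; the two-example dataset and hyperparameters are toy values for illustration only.

# Minimal fine-tuning sketch: pre-trained BERT + one classification head,
# trained end-to-end on a toy labeled dataset (library, data, and
# hyperparameters are illustrative assumptions).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["this movie was great", "what a waste of time"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative (toy labels)

batch = tokenizer(texts, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # a few epochs is typically enough when fine-tuning
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"epoch {epoch}: loss = {outputs.loss.item():.4f}")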
13. References
• Papers
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
• Attention Is All You Need
• Code
• https://github.com/google-research/bert
• Video
• TDLS: BERT, Pretrained Deep Bidirectional Transformers for Language Understanding (algorithm)
• Blog Post
• The Illustrated BERT, ELMo, and co.