5. Rectified Linear Units
Backpropagation involves repeated multiplication with the derivative of the activation function
→ Problem if the derivative is always smaller than 1: the gradient vanishes!
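To see why, a toy NumPy sketch (not from the slides): the gradient reaching early layers is a product of per-layer activation derivatives — at most 0.25 per layer for sigmoid, but exactly 1 on ReLU's active path.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # sigmoid'(x) = s(x) * (1 - s(x)), bounded by 0.25
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU'(x) = 1 where x > 0, else 0
    return (x > 0).astype(float)

rng = np.random.default_rng(0)
pre_activations = rng.normal(size=20)  # one pre-activation per layer

# Product of 20 per-layer derivative factors:
sig_product = np.prod(sigmoid_grad(pre_activations))            # <= 0.25**20
relu_product = np.prod(relu_grad(np.abs(pre_activations)))      # 1.0 on active units

print(f"sigmoid: {sig_product:.2e}")   # vanishingly small after 20 layers
print(f"ReLU:    {relu_product:.2e}")  # stays 1.0 on the active path
```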
21. Data Set
Facebook posts from media organizations:
– CNN, MSNBC, NYTimes, The Guardian, Buzzfeed, Breitbart, Politico, The Wall Street Journal, Washington Post, Baltimore Sun
Measure sentiment as “reactions”
22.
Title                                                                   | Org       | Like  | Love | Wow | Haha | Sad | Angry
Poll: Clinton up big on Trump in Virginia                               | CNN       | 4176  | 601  | 17  | 211  | 11  | 83
It's a fact: Trump has tiny hands. Will this be the one that sinks him? | Guardian  | 595   | 17   | 17  | 225  | 2   | 8
Donald Trump Explains His Obama-Founded-ISIS Claim as ‘Sarcasm’         | NYTimes   | 2059  | 32   | 284 | 1214 | 80  | 2167
Can hipsters stomach the unpalatable truth about avocado toast?         | Guardian  | 3655  | 0    | 396 | 44   | 773 | 69
Tim Kaine skewers Donald Trump's military policy                        | MSNBC     | 1094  | 111  | 6   | 12   | 2   | 26
Top 5 Most Antisemitic Things Hillary Clinton Has Done                  | Breitbart | 1067  | 7    | 134 | 35   | 22  | 372
17 Hilarious Tweets About Donald Trump Explaining Movies                | Buzzfeed  | 11390 | 375  | 16  | 4121 | 4   | 5
25. It's a fact: Trump has tiny hands.
[Architecture diagram: the Title + Message is embedded (EMBEDDING_DIM=300) and fed through Conv (128) x 10, a stack of ResNet Blocks, and MaxPooling; the News Org (e.g. The Guardian) enters as a 1-of-K vector; both are merged through two Dense layers into a reaction-% output.]
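A possible reconstruction of the diagram in Keras; vocabulary size, sequence length, number of orgs, ResNet depth, and dense width are assumptions, not values from the slides.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Assumed sizes (only EMBEDDING_DIM=300 and the 6 reactions come from the slides):
VOCAB, SEQ_LEN, EMBEDDING_DIM, N_ORGS, N_REACTIONS = 20000, 100, 300, 10, 6

text_in = layers.Input(shape=(SEQ_LEN,), name="title_message")
org_in = layers.Input(shape=(N_ORGS,), name="news_org")  # 1-of-K org vector

x = layers.Embedding(VOCAB, EMBEDDING_DIM)(text_in)
x = layers.Conv1D(128, 10, padding="same", activation="relu")(x)  # Conv (128) x 10

def resnet_block(t):
    # Two convolutions plus a skip connection, as in a basic ResNet block.
    y = layers.Conv1D(128, 3, padding="same", activation="relu")(t)
    y = layers.Conv1D(128, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([t, y]))

for _ in range(2):  # slide shows "ResNet Block ... ResNet Block"; depth assumed
    x = resnet_block(x)

x = layers.GlobalMaxPooling1D()(x)           # MaxPooling over the sequence
x = layers.Concatenate()([x, org_in])        # merge text features with the org
x = layers.Dense(128, activation="relu")(x)  # Dense
out = layers.Dense(N_REACTIONS, activation="softmax")(x)  # Dense -> reaction %

model = Model([text_in, org_in], out)
```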
26. Cherry-picked predicted response distribution*
Sentence                | Org       | Love | Haha | Wow | Sad | Angry
Trump wins the election | Guardian  | 3%   | 9%   | 7%  | 32% | 49%
Trump wins the election | Breitbart | 58%  | 30%  | 8%  | 1%  | 3%
*Your mileage may vary. By a lot. I mean it.
28. Initialization
● Break symmetry:
– Never ever initialize all your weights to the same value
● Let the initialization depend on the activation function:
– ReLU/PReLU → He Normal
– sigmoid/tanh → Glorot Normal
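Both schemes reduce to choosing the standard deviation of a zero-mean Gaussian from the layer's fan-in/fan-out; a minimal NumPy sketch (formulas from He et al. 2015 and Glorot & Bengio 2010):

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    # For ReLU/PReLU layers: std = sqrt(2 / fan_in)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def glorot_normal(fan_in, fan_out, rng):
    # For sigmoid/tanh layers: std = sqrt(2 / (fan_in + fan_out))
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = he_normal(512, 256, rng)  # every entry differs, so symmetry is broken
```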
29. Choose an adaptive optimizer
Source: Alec Radford
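As a sketch of what "adaptive" means, here is one Adam update step (Kingma & Ba) in NumPy, applied to a toy quadratic; the learning rate and objective are illustrative only:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # The per-parameter step size adapts to running gradient statistics.
    m = b1 * m + (1 - b1) * grad        # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)           # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy objective f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```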
30. Choose the right model size
● Start small and keep adding layers
– Check if test error keeps going down
● Cross-validate over the number of units
● You want to be able to overfit
Y. Bengio (2012), Practical recommendations for gradient-based training of deep architectures
31. Don't be scared of overfitting
● If your model can't overfit, it also can't learn enough
● So, check that your model can overfit:
– If not, make it bigger
– If so, get more data and/or regularize
Source: Wikipedia
33. Size of data set
● Just get more data already
● Augment data:
– Textual replacements
– Word vector perturbation
– Noise Contrastive Estimation
● Semi-supervised learning:
– Adapt word embeddings to your domain
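Word vector perturbation can be as simple as adding Gaussian noise to the embedded batch, so the model sees slightly different inputs each epoch; the noise scale below is an assumption:

```python
import numpy as np

def perturb_embeddings(embedded_batch, scale=0.01, rng=None):
    # Augmentation: jitter each word vector with small Gaussian noise.
    rng = rng or np.random.default_rng()
    return embedded_batch + rng.normal(0.0, scale, size=embedded_batch.shape)

rng = np.random.default_rng(0)
batch = rng.normal(size=(32, 20, 300))  # (batch, words, EMBEDDING_DIM)
augmented = perturb_embeddings(batch, scale=0.01, rng=rng)
```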
36. Monitor your model
Training and validation accuracy:
– Is there a large gap?
– Does the training accuracy increase while the validation accuracy decreases?
41. Hyperparameter optimization
Friends don't let friends do a full grid search!
– Use a smart strategy like Bayesian optimization or particle swarm optimization (Spearmint, SMAC, Hyperopt, Optunity)
– Even random search often beats grid search
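Why random search beats grid search at equal budget (Bergstra & Bengio 2012): with 16 trials a 4x4 grid probes only 4 distinct values per hyperparameter, while random search probes 16. A toy sketch — the hyperparameter names and ranges are made up:

```python
import random

random.seed(0)
budget = 16

# Grid search: 4 learning rates x 4 layer widths = 16 trials,
# but only 4 distinct values of each hyperparameter.
grid_trials = [(lr, units) for lr in (1e-4, 1e-3, 1e-2, 1e-1)
                           for units in (64, 128, 256, 512)]

# Random search with the same budget: 16 distinct values of each.
random_trials = [(10 ** random.uniform(-4, -1),   # log-uniform learning rate
                  random.randint(64, 512))        # hidden units
                 for _ in range(budget)]

distinct_grid_lrs = len({lr for lr, _ in grid_trials})      # 4
distinct_random_lrs = len({lr for lr, _ in random_trials})  # 16
```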