Improving Language Modeling using Densely Connected Recurrent Neural Networks
Contact: Frederic.godin@ugent.be – www.fredericgodin.com – @frederic_godin
IDLAB, GHENT UNIVERSITY - IMEC
Fréderic Godin, Joni Dambre and Wesley De Neve
MOTIVATION
Skip or residual connections are only sporadically used when stacking LSTMs. Densely connecting all layers with skip connections, however, is very successful in convolutional neural networks.

RESEARCH QUESTION
What if we add a skip connection between every output and every input of every layer in a recurrent neural network?

ARCHITECTURE
[Figure: the word embedding e_t feeds the first LSTM layer; the second LSTM layer receives the concatenation (e_t, h_1,t); a fully connected output layer receives the concatenation (e_t, h_1,t, h_2,t) and produces y_t.]

EXPERIMENTS

Model                                Hidden states  # Layers  # Params  Perplexity
Stacked LSTM (Zaremba et al., 2014)  650            2         20M       82.7
                                     1500           2         66M       78.4
Stacked LSTM                         200            3         5M        108.8
                                     350            2         9M        87.9
Densely Connected LSTM               200            2         9M        80.4
                                     200            3         11M       78.5
                                     200            4         14M       76.9

CONCLUSION
Densely connecting all layers substantially improves language modeling performance. We use six times fewer parameters to obtain the same result as a stacked LSTM.
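The dense connectivity pattern described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: plain tanh RNN cells stand in for LSTM cells, and all dimensions, weight initializations, and variable names are illustrative. The point it shows is that layer l receives the concatenation of the embedding e_t and all lower hidden states, and the output layer sees e_t plus every layer's hidden state.

```python
import numpy as np

# Simplified sketch of a densely connected recurrent stack.
# Assumption: tanh RNN cells instead of LSTM cells; sizes are arbitrary.
rng = np.random.default_rng(0)
emb_dim, hidden, n_layers, vocab = 8, 16, 3, 10

# Layer l's input width grows with each densely connected lower layer.
W_in = [rng.standard_normal((emb_dim + l * hidden, hidden)) * 0.1
        for l in range(n_layers)]
W_rec = [rng.standard_normal((hidden, hidden)) * 0.1 for _ in range(n_layers)]
# Output layer receives e_t concatenated with all n_layers hidden states.
W_out = rng.standard_normal((emb_dim + n_layers * hidden, vocab)) * 0.1

def step(e_t, h_prev):
    """One time step; h_prev is the list of per-layer previous hidden states."""
    inputs = e_t                    # running concatenation [e_t, h_1,t, ...]
    h_new = []
    for l in range(n_layers):
        h = np.tanh(inputs @ W_in[l] + h_prev[l] @ W_rec[l])
        h_new.append(h)
        inputs = np.concatenate([inputs, h])  # dense skip connection
    logits = inputs @ W_out         # output sees e_t and every h_l,t
    return h_new, logits

h = [np.zeros(hidden) for _ in range(n_layers)]
h, logits = step(rng.standard_normal(emb_dim), h)
print(logits.shape)  # (10,)
```

In a stacked LSTM, by contrast, layer l would receive only h_{l-1,t}; the growing input width here is exactly the dense connectivity the poster proposes.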