[DL輪読会]Are Pre-trained Convolutions Better than Pre-trained Transformers? (2021)
Deep Learning JP
2021/06/04 Deep Learning JP: http://deeplearning.jp/seminar-2/
1.
DEEP LEARNING JP [DL Papers] http://deeplearning.jp/
"Are Pre-trained Convolutions Better than Pre-trained Transformers?" (2021)
Itsuki Okimura, Matsuo Lab, B4
2.
Agenda
1. Bibliographic information
2. Overview
3. Problem statement
4. Prior work
5. Proposed method
6. Experimental results
7. Discussion
8. Summary
3.
1 Bibliographic information
• Title: Are Pre-trained Convolutions Better than Pre-trained Transformers?
• Source: arXiv (https://arxiv.org/abs/2105.03322)
• Authors: Yi Tay, Mostafa Dehghani, Jai Gupta, et al., Google Research
• Reason for selection: a challenge to the currently dominant Transformer architecture
4.
2 Overview
• Compares a CNN-based pre-trained model, obtained by replacing the Transformer's self-attention layers with convolution layers, against conventional pre-trained models
• Across seven downstream tasks, claims the CNN-based pre-trained model matches or outperforms conventional pre-trained models
• Also points out that CNN-based pre-training has advantages over conventional Transformer-based pre-training in runtime and scalability
• Argues that pre-training and the Transformer architecture should be discussed separately
5.
3 Problem statement
• In recent NLP, pre-trained models such as BERT, GPT-n, and T5 have appeared
• Almost no recent pre-trained models are based on anything other than the Transformer
Q: Can architectures with different inductive biases benefit from pre-training in the same way?
(Slide callouts: in NLP, pre-trained models and the Transformer architecture are discussed in the same breath; the experiments use CNNs, which are computationally efficient, operate locally, and are non-recurrent)
6.
4 Prior work
• Building on prior work applying a separate CNN to each feature dimension (depthwise convolution), Lightweight convolution further reduces parameters by sharing the CNN weights across channel groups, and Dynamic convolution extends it by computing the CNN weights dynamically at each time step
• Achieves high machine-translation accuracy without self-attention (at the time, 3rd place on WMT En-De BLEU)
Pay Less Attention with Lightweight and Dynamic Convolutions (ICLR 2019) https://arxiv.org/pdf/1901.10430.pdf
7.
4 Prior work
• Depthwise convolution: a convolution with independent parameters for each channel
$\mathrm{DepthwiseConv}(X, W_{c,:}, i, c) = \sum_{j=1}^{k} W_{c,j} \cdot X_{i+j-\lceil (k+1)/2 \rceil,\, c}$
Depthwise convolution: https://qiita.com/koreyou/items/328fa92a1d3a7e680376#fn4
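The depthwise convolution above can be sketched in plain Python (a minimal sketch of my own; `X` is a sequence of `d`-dimensional feature vectors, `W` holds one length-`k` kernel per channel, and zero padding at the boundaries is an assumption the slide does not state):

```python
import math

def depthwise_conv(X, W):
    """Depthwise convolution: each channel c is convolved with its own
    length-k kernel W[c], centred on position i (zero padding assumed)."""
    n, d = len(X), len(X[0])
    k = len(W[0])
    centre = math.ceil((k + 1) / 2)  # matches the i + j - ceil((k+1)/2) index
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for c in range(d):
            s = 0.0
            for j in range(1, k + 1):
                t = i + j - centre
                if 0 <= t < n:  # positions outside the sequence contribute 0
                    s += W[c][j - 1] * X[t][c]
            out[i][c] = s
    return out
```

With `k = 3` and a kernel of `[0, 1, 0]` for every channel, the output reproduces the input, which is a quick sanity check that the centring index is right.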
8.
4 Prior work
• Lightweight convolution: splits the channels into H groups and performs a depthwise convolution with parameters shared within each group, with the kernel weights softmax-normalised
$\mathrm{LightweightConv}(X, W_{\lceil cH/d \rceil,:}, i, c) = \sum_{j=1}^{k} \mathrm{softmax}(W_{\lceil cH/d \rceil,:})_{j} \cdot X_{i+j-\lceil (k+1)/2 \rceil,\, c}$
Lightweight convolution: https://qiita.com/koreyou/items/328fa92a1d3a7e680376#fn4
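Lightweight convolution only changes how the kernels are obtained: channels are bucketed into H groups that share one softmax-normalised kernel each. A minimal sketch under the same conventions as above (names and 0-indexed group lookup are mine, not the paper's):

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def lightweight_conv(X, W, H):
    """W has one length-k kernel per group (H groups over d channels);
    channel c uses group c*H//d, and each kernel is softmax-normalised."""
    n, d = len(X), len(X[0])
    k = len(W[0])
    centre = math.ceil((k + 1) / 2)
    kernels = [softmax(w) for w in W]  # shared, normalised kernels
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for c in range(d):
            g = c * H // d  # group index for channel c
            s = 0.0
            for j in range(1, k + 1):
                t = i + j - centre
                if 0 <= t < n:
                    s += kernels[g][j - 1] * X[t][c]
            out[i][c] = s
    return out
```

Because the kernel is softmax-normalised, interior outputs are weighted averages of the inputs, so a constant input stays constant away from the boundaries.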
9.
4 Prior work
• Dynamic convolution: computes the Lightweight convolution parameters dynamically from the input features
$\mathrm{DynamicConv}(X, i, c) = \mathrm{LightweightConv}(X, f(X_i)_{h,:}, i, c)$, where $f(X_i) = \sum_{c=1}^{d} W_{h,j,c} \, X_{i,c}$
Dynamic convolution: https://qiita.com/koreyou/items/328fa92a1d3a7e680376#fn4
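Dynamic convolution computes the kernel at each position from that position's own features via a learned linear map, then applies it as above. A sketch with a single shared kernel (the k-by-d projection matrix `F` plays the role of f; this simplified single-group form is my assumption):

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def dynamic_conv(X, F):
    """At each position i the length-k kernel is f(X_i) = F @ X_i
    (F is k x d), softmax-normalised and shared across channels."""
    n, d = len(X), len(X[0])
    k = len(F)
    centre = math.ceil((k + 1) / 2)
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        # position-dependent kernel computed from the current feature vector
        raw = [sum(F[j][c] * X[i][c] for c in range(d)) for j in range(k)]
        kern = softmax(raw)
        for c in range(d):
            s = 0.0
            for j in range(1, k + 1):
                t = i + j - centre
                if 0 <= t < n:
                    s += kern[j - 1] * X[t][c]
            out[i][c] = s
    return out
```

Unlike self-attention, the kernel depends only on the current position, not on all pairwise interactions, which is where the O(N) cost comes from.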
10.
4 Prior work
• Dilated convolution: a convolution whose kernel taps are spaced apart
$\mathrm{DilatedConv}(X, W_{c,:}, i, c) = \sum_{j=1}^{k} W_{c,j} \cdot X_{i+r(j-\lceil (k+1)/2 \rceil),\, c}$ with dilation rate $r$
Dilated convolution
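Dilated convolution inserts gaps of size r between the kernel taps, widening the receptive field without adding parameters. A sketch under the same conventions (the exact centring was garbled on the slide, so this spacing is my reconstruction):

```python
import math

def dilated_conv(X, W, r):
    """Depthwise convolution whose k taps are r positions apart."""
    n, d = len(X), len(X[0])
    k = len(W[0])
    centre = math.ceil((k + 1) / 2)
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for c in range(d):
            s = 0.0
            for j in range(1, k + 1):
                t = i + r * (j - centre)  # taps spaced r apart around i
                if 0 <= t < n:
                    s += W[c][j - 1] * X[t][c]
            out[i][c] = s
    return out
```

With `r = 1` this reduces to the plain depthwise convolution; with `r = 2` and `k = 3` each output sees positions i-2, i, i+2.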
11.
5 Proposed method
• Replaces the Transformer's Q, K, V projections with GLU (gated linear unit) layers and the self-attention layers with convolution layers, then pre-trains with a seq2seq objective
• The convolution used is one of Lightweight convolution or Dynamic convolution (filter size 7 each), or Dilated convolution (12 layers with filter sizes [4, 4, 7, 7, 15, 15, 15, 15, 31, 31, 31])
• The loss is optimised with token-level cross-entropy
(Slide callout: a pre-trained model with a CNN architecture)
12.
6 Experimental results
• Prepares both T5-based convolutional models and Transformer models, each with and without pre-training
• Pre-training uses the Colossal Cleaned Common Crawl corpus (C4) for 524k steps with a batch size of 128
• Fine-tunes on seven tasks: toxicity detection (Civil Comments, Wiki Toxic), sentiment classification (IMDb, SST-2, S140), topic classification (AGNews), and question classification (TREC)
• Investigates the effect of pre-training from each model's downstream scores with and without it
13.
6 Experimental results
• Across the seven tasks, which span a wide range of domains:
(1) convolutions without pre-training are competitive and frequently outperform Transformers without pre-training
(2) pre-trained convolutions outperform pre-trained Transformers on six of the seven tasks
14.
6 Experimental results
• (3) Among the pre-trained convolutional models, Dilated convolution and Dynamic convolution outperform Lightweight convolution
• (4) The model that performs (relatively) well without pre-training does not necessarily perform best after pre-training
15.
7 Discussion
• Tasks that model relations between multiple sentences are difficult
  - possibly because there is no mechanism comparable to self-attention for capturing long-range dependencies
(Example) SQuAD, a reading-comprehension task that generates the correct answer given a paragraph and a question:
  - pre-trained Transformer: F1 90%
  - pre-trained CNN: F1 70%
MultiNLI, which judges the entailment relation between two sentences:
  - pre-trained Transformer: accuracy 84%
  - pre-trained CNN: accuracy 75%
  - augmenting the encoder with a cross-attention layer over the two sentences recovers accuracy to 83%
* The authors suggest a dual encoder would help, but changing the encoder architecture for individual tasks seems questionable (presenter's view)
(Slide callout: tasks the model struggles with)
16.
7 Discussion
• For sequence length N, self-attention costs $O(N^2)$, whereas convolution costs $O(N)$
• Convolutions are not only consistently faster (even for short sequences) but also scale better than Transformers
• FLOPs efficiency does not degrade as sequences grow longer
(Slide callout: training does not slow down as sequences get longer)
17.
7 Discussion
- Strengths: runtime and scalability
- Weakness: difficulty modelling relations across multiple sequences of sentences
The claim is not that CNN-based architectures must replace Transformer-based ones, but that architectures should be explored with a broader set of options
Pre-training and architecture should be discussed separately
(Slide callout: summary of the discussion)
18.
8 Summary
• Compared a CNN-based pre-trained model, obtained by replacing the Transformer's self-attention layers with convolution layers, against conventional pre-trained models
• Across seven downstream tasks, the CNN-based pre-trained model is claimed to match or outperform conventional pre-trained models
• CNN-based pre-training also has advantages over conventional Transformer-based pre-training in runtime and scalability
• Pre-training and the Transformer architecture should be discussed separately
19.
Impressions
• Seems strong for classification, but likely to struggle on a broader range of tasks
• What would happen with more layers, so that the receptive field covers the entire input sequence?
• Despite being weak at capturing relations between multiple sentences, existing work does reasonably well at summarization, which is curious
  -> existing pre-trained models may be neglecting the document's overall tree structure when summarizing
20.
DEEP LEARNING JP [DL Papers]
"Are Pre-trained Convolutions Better than Pre-trained Transformers? (2021)"
Itsuki Okimura, Matsuo Lab, B4
http://deeplearning.jp/