1
DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
InfoBERT: Improving Robustness of Language Models from
An Information Theoretic Perspective
Kazutoshi Shinoda, Aizawa Lab
Bibliographic Information
• InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective
• Authors: Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li,
Jingjing Liu
• Affiliations: University of Illinois at Urbana-Champaign, Microsoft Dynamics 365
AI Research, Virginia Tech
• ICLR 2021
2
Overview
• Contributions
– Proposes InfoBERT, which improves the adversarial robustness of language models from an information-theoretic perspective
– With theoretical justification, refines local and global features via two mutual-information (MI)-based regularizers that can be applied to both standard training and adversarial training across diverse NLP tasks
– InfoBERT achieves SOTA on multiple adversarial NLI and QA datasets without sacrificing accuracy on benign datasets
3
※ Standard training: training on the original training data only
Adversarial training: training on the original training data plus adversarial examples
Concrete Example
4
• Adversarial attack on QA (Jia and Liang, 2017)
Preliminaries
• The adversarial examples considered here are the following (the slide's formal definition is in the figure; a common formulation is sketched below)
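The slide's formal definition is an image not captured in this transcript. As a hedged placeholder only, a common way to formalize embedding-space adversarial training (the setting FreeLB-style baselines in this work also assume) is the min-max objective below; the paper's exact constraint set may differ.

```latex
% Adversarial training as a min-max problem over bounded perturbations \delta
% added to the word embeddings (illustrative formulation, not verbatim from the slide)
\min_{\theta}\;
  \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\,\max_{\lVert \delta \rVert \le \epsilon}
        \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big)\Big]
```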
5
InfoBERT
• Propose two regularizers
– Information Bottleneck (IB) as a Regularizer
– Anchored Feature Regularizer
6
Information Bottleneck as a Regularizer
• Input sentence: X, sequence of word embeddings: T, output label: Y, BERT classifier: q(y|t)
7
• Lower bound on I(Y; T) (Barber and Agakov, 2003)
– q is a variational approximation
– Maximizing this right-hand side is equivalent to minimizing the task loss H(Y|T)
• Computing the MI exactly is intractable, so an upper bound on the second term I(X; T) (Cheng et al., 2020) is used to obtain a lower bound on the IB objective
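The equations these notes annotate are slide images missing from the transcript. Restating the standard bounds being cited (generic notation that may not match the paper exactly):

```latex
% Information Bottleneck objective to be maximized, with trade-off weight \beta
\mathcal{L}_{\mathrm{IB}} \;=\; I(Y;T) \;-\; \beta\, I(X;T)

% Barber--Agakov variational lower bound on the first term;
% q_\psi(y\mid t) is the classifier, and since H(Y) is constant,
% maximizing the bound is the same as minimizing the task cross-entropy
I(Y;T) \;\ge\; H(Y) \;+\; \mathbb{E}_{p(y,t)}\big[\log q_\psi(y\mid t)\big]

% CLUB-style variational upper bound on the second term (Cheng et al., 2020)
I(X;T) \;\le\; \mathbb{E}_{p(x,t)}\big[\log q_\theta(t\mid x)\big]
        \;-\; \mathbb{E}_{p(x)}\,\mathbb{E}_{p(t)}\big[\log q_\theta(t\mid x)\big]
```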
Information Bottleneck as a Regularizer
• Since the cost of computing I(X; T) becomes prohibitive as sentence length grows, the lower bound L_IB of the IB objective is maximized instead
8
– First term: the task loss
– Second term: suppresses the complexity of T induced by noise in X (a minimal sketch of this loss shape follows below)
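A minimal PyTorch-style sketch of what an IB-regularized loss of this shape looks like: the task cross-entropy plus a β-weighted estimate of the upper bound on I(X; T), here using a CLUB-style Gaussian estimator. All names are hypothetical and only illustrate the structure; this is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def club_upper_bound(mu, logvar, t):
    """Sample-based CLUB-style estimate of an upper bound on I(X; T).

    mu, logvar: outputs of a variational net q(t|x), evaluated on the batch
    t:          features actually produced from x (same shape as mu)
    """
    # log q(t_i | x_i) for matched (positive) pairs, up to an additive constant
    pos = -0.5 * (((t - mu) ** 2) / logvar.exp() + logvar).sum(dim=-1)
    # log q(t_j | x_i) for mismatched (negative) pairs via a random permutation
    perm = torch.randperm(t.size(0), device=t.device)
    neg = -0.5 * (((t[perm] - mu) ** 2) / logvar.exp() + logvar).sum(dim=-1)
    return (pos - neg).mean()

def ib_regularized_loss(logits, labels, mu, logvar, t, beta=0.1):
    # First term: task loss (variational lower bound on I(Y; T))
    task = F.cross_entropy(logits, labels)
    # Second term: penalize the estimated upper bound on I(X; T),
    # suppressing the complexity of T induced by noise in X
    return task + beta * club_upper_bound(mu, logvar, t)
```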
Why does maximizing the IB objective lead to adversarial robustness?
• The performance gap when a benign sentence X is turned into an adversarial sentence X' is bounded from above as follows.
9
– Maximizing the IB objective acts on this part of the bound
– The other part shows that combining adversarial training with IB maximization leads to further gains in adversarial robustness
Anchored Feature Regularizer
Intuition
• We want to extract features that carry information that is both useful for the task and robust
• To that end, we first want to find local features that are either not useful or not robust and remove them
– Not robust = words on which word-level attacks succeed (e.g., "not"?)
– Not useful = words whose modification has no effect on accuracy (e.g., stopwords)
10
At the word level, removing the features whose influence on performance under perturbation is (1) too large or (2) too small leaves features that are reasonably useful for the task and robust (a selection sketch follows below)
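A minimal sketch of this word-level filtering, assuming the sensitivity measure is the norm of the loss gradient with respect to each token embedding; the quantile thresholds and names are illustrative, not the paper's exact settings.

```python
import torch

def select_anchored_indices(loss, token_embeds, low_q=0.5, high_q=0.95):
    """Pick tokens whose sensitivity is neither too high (non-robust,
    easy to attack) nor too low (uninformative, e.g. stopwords).

    token_embeds: (seq_len, hidden) embeddings that require grad
    low_q/high_q: illustrative quantile thresholds
    """
    # Sensitivity = norm of the loss gradient w.r.t. each token embedding
    grads = torch.autograd.grad(loss, token_embeds, retain_graph=True)[0]
    norms = grads.norm(dim=-1)
    lo = torch.quantile(norms, low_q)
    hi = torch.quantile(norms, high_q)
    keep = (norms >= lo) & (norms <= hi)
    return keep.nonzero(as_tuple=True)[0]   # indices of anchored features
```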
Anchored Feature Regularizer
• Maximize the mutual information between these moderately useful, robust word features Ti and the global feature Z (the [CLS] embedding)
• The final objective function is as follows
• For the last MI term, its InfoNCE lower bound (van den Oord et al., 2018) is maximized instead (see the sketch below)
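The InfoNCE bound for this last term can be written as a contrastive loss between each anchored token representation Ti and the global [CLS] vector Z, treating the other sentences in the batch as negatives. A minimal sketch under that assumption (the paper's critic/scoring function may differ):

```python
import math
import torch
import torch.nn.functional as F

def infonce_lower_bound(local_feats, global_feats, temperature=0.1):
    """InfoNCE estimate of I(T_i; Z) for a batch of (anchored token, [CLS]) pairs.

    local_feats:  (batch, hidden) anchored token representations T_i
    global_feats: (batch, hidden) [CLS] representations Z of the same sentences
    """
    local = F.normalize(local_feats, dim=-1)
    glob = F.normalize(global_feats, dim=-1)
    scores = local @ glob.t() / temperature              # (batch, batch) similarities
    targets = torch.arange(local.size(0), device=local.device)
    # InfoNCE lower bound = log(batch size) - cross-entropy against the diagonal
    return math.log(local.size(0)) - F.cross_entropy(scores, targets)
```

The final objective then combines the IB-regularized task loss above with a weighted sum of such InfoNCE terms over the selected anchored tokens.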
11
Experimental Setup
• Datasets
– Adversarial NLI (ANLI) (Nie
et al., 2020)
– Adversarial SQuAD (Jia and
Liang, 2017)
– TextFooler (Jin et al., 2020)
12
• Models
– InfoBERT applied to BERT-large and RoBERTa-large
• Baselines
– FreeLB (Zhu et al., 2020)
– SMART (Jiang et al., 2020)
– ALUM (Liu et al., 2020)
Results
13
• Vanilla RoBERTa and BERT are not robust
• Adversarial training is effective
• InfoBERT improves robustness under both standard and adversarial training
Results
14
Experiments with different training data
• SOTA on ANLI
Results
15
Robustness against adversarial examples generated by TextFooler
• Almost no impact on benign-dataset performance
• The benefit of combining InfoBERT with adversarial training, shown theoretically in Theorem 3.2, is also demonstrated empirically
• BERT-large accuracy, which had dropped to 0%, improves substantially
Results
16
Experiments on QA as well
• InfoBERT is the strongest in most settings
• Standard training + InfoBERT is more robust than adversarial training + FreeLB
Ablation
17
The effectiveness of both regularizers is confirmed
Summary
• Proposed InfoBERT, a training method that adds two theoretically grounded regularizers, and improved adversarial robustness on multiple NLI and QA datasets with little sacrifice in benign-dataset accuracy
• Also showed, both theoretically and empirically, that combining adversarial training with InfoBERT is effective
18