Kyonggi Univ. AI Lab.
RETHINKING ATTENTION WITH PERFORMERS
2021.1.4
정규열
Artificial Intelligence Lab
Kyonggi University

Index
• Background
• FAVOR
• EXPERIMENTS
• Conclusion
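
Background
• The attention mechanism used in the Transformer is computationally expensive.
• The excessive computation reduces efficiency.
• A method to reduce the computation is therefore needed.
• FAVOR is introduced.
• It first reduces the computational cost of attention.
• To that end, a new kernel method is proposed that plays the role of softmax.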
• How the time complexity is improved
[Figure: existing attention vs. proposed attention]

FAVOR
FAVOR - Improving Attention
• Standard attention
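$$
Q = \begin{bmatrix}
q_{11} & q_{12} & q_{13} & \cdots & q_{1d} \\
q_{21} & q_{22} & q_{23} & \cdots & q_{2d} \\
q_{31} & q_{32} & q_{33} & \cdots & q_{3d} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
q_{L1} & q_{L2} & q_{L3} & \cdots & q_{Ld}
\end{bmatrix} \in \mathbb{R}^{L \times d},
\qquad
K = \begin{bmatrix}
k_{11} & k_{12} & k_{13} & \cdots & k_{1d} \\
k_{21} & k_{22} & k_{23} & \cdots & k_{2d} \\
k_{31} & k_{32} & k_{33} & \cdots & k_{3d} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
k_{L1} & k_{L2} & k_{L3} & \cdots & k_{Ld}
\end{bmatrix} \in \mathbb{R}^{L \times d}
$$
$$
K^\top = \begin{bmatrix}
k_{11} & k_{21} & k_{31} & \cdots & k_{L1} \\
k_{12} & k_{22} & k_{32} & \cdots & k_{L2} \\
k_{13} & k_{23} & k_{33} & \cdots & k_{L3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
k_{1d} & k_{2d} & k_{3d} & \cdots & k_{Ld}
\end{bmatrix} \in \mathbb{R}^{d \times L}
$$
$$
Q K^\top : (L \times d)(d \times L) = L \times L
$$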
Time complexity: $O(L^2 d)$
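The quadratic cost above can be made concrete with a minimal sketch in plain Python (lists of rows, standard library only). The helper names `matmul`, `transpose`, `softmax_rows`, `standard_attention` and `score_multiplications` are hypothetical, and the division by sqrt(d) is the usual Transformer convention, assumed here because the slide does not show it.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def standard_attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V, building the full L x L score matrix."""
    d = len(q[0])
    scores = matmul(q, transpose(k))  # L x L, needs L * L * d multiplications
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]  # assumed scaling
    return matmul(softmax_rows(scaled), v)  # L x d

def score_multiplications(L, d):
    """Scalar multiplications needed to form Q K^T."""
    return L * L * d

# Toy input with L = 3 tokens and d = 2 features.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = standard_attention(q, k, v)
print(len(out), len(out[0]))
```

Doubling L quadruples `score_multiplications(L, d)`, which is the O(L^2 d) behaviour stated on the slide.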
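• Improving the time complexity: the trick
  • Standard attention: $A = \mathrm{softmax}(Q, K)$
  • Proposed method: $A = \mathrm{Kernel}(Q, K)$
$$\mathrm{Kernel}(Q, K) = \mathbb{E}\left[\phi(Q)^\top \phi(K)\right]$$
$\phi$: mapping ($d \to r$), applied to each query and key vector
$Q$: $L \times d$
$Q^\top$: $d \times L$
$\phi(Q^\top)$: $r \times L$
$\phi(Q^\top)^\top$: $L \times r$
$Q' = \phi(Q^\top)^\top$, and likewise $K' = \phi(K^\top)^\top$
$$\mathrm{Attention} = \mathrm{Kernel}(Q, K)\,V = Q' (K')^\top V = Q' \left((K')^\top V\right)$$
Computing $(K')^\top V$ first avoids forming the $L \times L$ matrix, so the cost drops from $O(L^2 d)$ to $O(L r d)$.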
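The reordering trick relies only on associativity of matrix multiplication, which the following sketch checks numerically. It is a minimal illustration, not the Performer implementation: `feature_map` is a hypothetical stand-in for the mapping from d to r (an elementwise exponential, so r = d here), normalisation is left out as on the slide, and `multiplication_counts` is an assumed helper that counts scalar multiplications for each ordering.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def feature_map(x):
    """Hypothetical stand-in for phi: elementwise exp, so every feature is positive."""
    return [math.exp(value) for value in x]

def kernel_attention_quadratic(q, k, v):
    """(Q' K'^T) V: forms the L x L matrix first."""
    q_prime = [feature_map(row) for row in q]  # L x r
    k_prime = [feature_map(row) for row in k]  # L x r
    a = matmul(q_prime, transpose(k_prime))    # L x L
    return matmul(a, v)                        # L x d

def kernel_attention_linear(q, k, v):
    """Q' (K'^T V): never forms an L x L matrix."""
    q_prime = [feature_map(row) for row in q]  # L x r
    k_prime = [feature_map(row) for row in k]  # L x r
    kv = matmul(transpose(k_prime), v)         # r x d
    return matmul(q_prime, kv)                 # L x d

def multiplication_counts(L, d, r):
    """Scalar multiplications for the (quadratic, linear) orderings."""
    quadratic = L * L * r + L * L * d
    linear = r * L * d + L * r * d
    return quadratic, linear

q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
k = [[0.2, 0.1], [-0.3, 0.4], [0.5, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slow = kernel_attention_quadratic(q, k, v)
fast = kernel_attention_linear(q, k, v)
print(multiplication_counts(1000, 64, 128))
```

Both orderings return the same matrix up to floating-point rounding; only the count of multiplications differs, and the gap grows linearly with L.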
• A kernel that plays the role of softmax (sin-cos)
Softmax kernel
This method has very high variance.
  • Softmax outputs are always positive.
  • However, the method above can also produce negative values.
  • Stable convergence is therefore difficult.
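The formula for the sin-cos kernel did not survive on this slide, so the sketch below assumes the standard trigonometric random-feature construction: phi(x) = exp(|x|^2 / 2) / sqrt(m) * [cos(w_i . x), sin(w_i . x)] with Gaussian w_i, whose inner product estimates exp(x . y). The function names and the hand-picked `omegas` are hypothetical and serve only to show that the estimate can turn negative.

```python
import math
import random

def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def sample_omegas(m, d, seed=0):
    """Draw m random projection vectors from a standard Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

def trig_features(x, omegas):
    """Assumed sin-cos feature map: exp(|x|^2/2)/sqrt(m) * [cos(w.x)..., sin(w.x)...]."""
    m = len(omegas)
    scale = math.exp(dot(x, x) / 2.0) / math.sqrt(m)
    projections = [dot(w, x) for w in omegas]
    cos_part = [scale * math.cos(p) for p in projections]
    sin_part = [scale * math.sin(p) for p in projections]
    return cos_part + sin_part

def trig_softmax_estimate(x, y, omegas):
    """Estimate exp(x . y) as the inner product of the sin-cos features."""
    return dot(trig_features(x, omegas), trig_features(y, omegas))

# A single hand-picked projection: cos(pi) = -1, so the estimate is negative
# even though the true value exp(x . y) = exp(0) = 1 is positive.
negative_case = trig_softmax_estimate([1.0, 0.0], [0.0, 0.0], [[math.pi, 0.0]])

# A random draw, for comparison with the true kernel value.
x, y = [0.3, 0.2], [0.1, -0.4]
estimate = trig_softmax_estimate(x, y, sample_omegas(256, 2))
print(negative_case < 0.0, estimate, math.exp(dot(x, y)))
```

Because the cosine and sine terms change sign, individual estimates can fall below zero, which matches the slide's remark that this method leaves the positive range of softmax.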
• Proposed kernel method: Positive
The variance becomes smaller, which makes stable convergence easier.
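The slide's formula is not recoverable, so this sketch assumes the positive random-feature map phi(x) = exp(-|x|^2 / 2) / sqrt(m) * [exp(w_1 . x), ..., exp(w_m . x)] with Gaussian w_i. It follows from E[exp(w . z)] = exp(|z|^2 / 2) that the inner product of two such feature vectors estimates exp(x . y). All names are hypothetical.

```python
import math
import random

def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def sample_omegas(m, d, seed=0):
    """Draw m random projection vectors from a standard Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

def positive_features(x, omegas):
    """Assumed positive feature map: exp(w.x - |x|^2/2) / sqrt(m) for each w."""
    m = len(omegas)
    shift = dot(x, x) / 2.0
    return [math.exp(dot(w, x) - shift) / math.sqrt(m) for w in omegas]

def positive_softmax_estimate(x, y, omegas):
    """Estimate exp(x . y); every term in the sum is strictly positive."""
    return dot(positive_features(x, omegas), positive_features(y, omegas))

omegas = sample_omegas(2000, 2)
x, y = [0.3, 0.2], [0.1, -0.4]
estimate = positive_softmax_estimate(x, y, omegas)
print(estimate, math.exp(dot(x, y)))

# When y = -x the random parts cancel, so the estimate equals exp(x . y)
# up to floating-point rounding, whatever the sampled omegas are.
opposite = positive_softmax_estimate(x, [-0.3, -0.2], omegas)
print(opposite, math.exp(-dot(x, x)))
```

Since every feature is an exponential, the estimate can never be negative. The y = -x case also illustrates why the variance is small exactly where the true kernel value is small.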

EXPERIMENTS
• Computation speed comparison
[Figure: forward-pass and backward-pass speed]
Computation is faster than with the Transformer.
• Accuracy comparison across kernel methods
The results confirm that the positive method is stable.
• Accuracy comparison with the original Transformer
Compared with the original Transformer, accuracy is better and convergence is faster.
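
Conclusion
• The goal is to reduce the computational cost of the existing Transformer.
• This ultimately requires modifying the attention computation.
• A trick is used to reduce the computation.
• In that case the original softmax function can no longer be used.
• A kernel method that can play a role similar to softmax is proposed.
• The positive method outperforms the sin-cos method.
• The result outperforms the original Transformer in both computational cost and accuracy.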
