3 a Consider the selfattention operation SX defined as.pdf

A

3. (a) Consider the self-attention operation S(X) defined as follows AS(X)t,:=1XWqWkTXT= softmax(At,:)XWv Where XRTDin,WvRDinDk,WkRDinDk,WqRDinDk.Din being the input dimensionality and T a sequence (or set) length (b) (15 points) Show that for any permutation matrix P,PS(X)=S(PX) (c) (5 points) We would like to use S(X) and a linear operation to construct a permutation invariant function G(X) that outputs the same feature representation for any ordering of the input sequence (hence, it is invariant to the input's arrangement). What linear operation can we use? Specifically find an example of a vector w such that G(X)=wTS(X) satisfies the condition G(X)=G(PX) for all permutation matrix P. (d) (15 points) Consider a batched ZRBTD that is for example the output of a self-attention layer. Implement a nn.module or python function using functionalize that takes Z and applies a "Position wise feedforward network" layer, H(x)=relu(W1x+ b) which outputs a tensor of size RBTD as commonly used in transformer models. Implement this in two ways (1) using nn. Linear and (2) using nn. Conv1d with padding 0 , stride 1 and kernel size 1 . Validate your implementations match with an assert statement. You may select a random Z with the size B,T,D of your choosing for validating your implementation..

3. (a) Consider the self-attention operation S(X) defined as follows AS(X)t,:=1XWqWkTXT=
softmax(At,:)XWv Where XRTDin,WvRDinDk,WkRDinDk,WqRDinDk.Din being the input
dimensionality and T a sequence (or set) length (b) (15 points) Show that for any permutation
matrix P,PS(X)=S(PX) (c) (5 points) We would like to use S(X) and a linear operation to construct
a permutation invariant function G(X) that outputs the same feature representation for any ordering
of the input sequence (hence, it is invariant to the input's arrangement). What linear operation can
we use? Specifically find an example of a vector w such that G(X)=wTS(X) satisfies the condition
G(X)=G(PX) for all permutation matrix P. (d) (15 points) Consider a batched ZRBTD that is for
example the output of a self-attention layer. Implement a nn.module or python function using
functionalize that takes Z and applies a "Position wise feedforward network" layer, H(x)=relu(W1x+
b) which outputs a tensor of size RBTD as commonly used in transformer models. Implement this
in two ways (1) using nn. Linear and (2) using nn. Conv1d with padding 0 , stride 1 and kernel size
1 . Validate your implementations match with an assert statement. You may select a random Z
with the size B,T,D of your choosing for validating your implementation.

Recomendados

(α ψ)- Construction with q- function for coupled fixed point von
(α   ψ)-  Construction with q- function for coupled fixed point(α   ψ)-  Construction with q- function for coupled fixed point
(α ψ)- Construction with q- function for coupled fixed pointAlexander Decker
273 views10 Folien
Explanation on Tensorflow example -Deep mnist for expert von
Explanation on Tensorflow example -Deep mnist for expertExplanation on Tensorflow example -Deep mnist for expert
Explanation on Tensorflow example -Deep mnist for expert홍배 김
7.7K views14 Folien
Dfa h11 von
Dfa h11Dfa h11
Dfa h11Rajendran
683 views43 Folien
Introduction to R programming von
Introduction to R programmingIntroduction to R programming
Introduction to R programmingAlberto Labarga
981 views58 Folien
Turing machine von
Turing machineTuring machine
Turing machineKanis Fatema Shanta
288 views17 Folien

Más contenido relacionado

Similar a 3 a Consider the selfattention operation SX defined as.pdf

Disjoint sets von
Disjoint setsDisjoint sets
Disjoint setsCore Condor
5K views32 Folien
Code of the multidimensional fractional pseudo-Newton method using recursive ... von
Code of the multidimensional fractional pseudo-Newton method using recursive ...Code of the multidimensional fractional pseudo-Newton method using recursive ...
Code of the multidimensional fractional pseudo-Newton method using recursive ...mathsjournal
5 views7 Folien
Code of the multidimensional fractional pseudo-Newton method using recursive ... von
Code of the multidimensional fractional pseudo-Newton method using recursive ...Code of the multidimensional fractional pseudo-Newton method using recursive ...
Code of the multidimensional fractional pseudo-Newton method using recursive ...mathsjournal
61 views7 Folien
Random Variable von
Random Variable Random Variable
Random Variable Abhishek652999
3 views4 Folien
matlab-basic-functions-reference.pdf von
matlab-basic-functions-reference.pdfmatlab-basic-functions-reference.pdf
matlab-basic-functions-reference.pdfShafqatNabiMughal
19 views4 Folien
matlab-basic-functions-reference.pdf von
matlab-basic-functions-reference.pdfmatlab-basic-functions-reference.pdf
matlab-basic-functions-reference.pdfKennethPhiri5
11 views4 Folien

Similar a 3 a Consider the selfattention operation SX defined as.pdf(20)

Code of the multidimensional fractional pseudo-Newton method using recursive ... von mathsjournal
Code of the multidimensional fractional pseudo-Newton method using recursive ...Code of the multidimensional fractional pseudo-Newton method using recursive ...
Code of the multidimensional fractional pseudo-Newton method using recursive ...
mathsjournal5 views
Code of the multidimensional fractional pseudo-Newton method using recursive ... von mathsjournal
Code of the multidimensional fractional pseudo-Newton method using recursive ...Code of the multidimensional fractional pseudo-Newton method using recursive ...
Code of the multidimensional fractional pseudo-Newton method using recursive ...
mathsjournal61 views
matlab-basic-functions-reference.pdf von KennethPhiri5
matlab-basic-functions-reference.pdfmatlab-basic-functions-reference.pdf
matlab-basic-functions-reference.pdf
KennethPhiri511 views
Code of the Multidimensional Fractional Quasi-Newton Method using Recursive P... von mathsjournal
Code of the Multidimensional Fractional Quasi-Newton Method using Recursive P...Code of the Multidimensional Fractional Quasi-Newton Method using Recursive P...
Code of the Multidimensional Fractional Quasi-Newton Method using Recursive P...
mathsjournal34 views
(DL hacks輪読)Bayesian Neural Network von Masahiro Suzuki
(DL hacks輪読)Bayesian Neural Network(DL hacks輪読)Bayesian Neural Network
(DL hacks輪読)Bayesian Neural Network
Masahiro Suzuki18.1K views
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems von Amr E. Mohamed
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and SystemsDSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
DSP_FOEHU - MATLAB 01 - Discrete Time Signals and Systems
Amr E. Mohamed8.6K views
Fixed points theorem on a pair of random generalized non linear contractions von Alexander Decker
Fixed points theorem on a pair of random generalized non linear contractionsFixed points theorem on a pair of random generalized non linear contractions
Fixed points theorem on a pair of random generalized non linear contractions
Alexander Decker355 views
Recurrent and Recursive Networks (Part 1) von sohaib_alam
Recurrent and Recursive Networks (Part 1)Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)
sohaib_alam645 views
void mysort(vector & v)- Write a function called mysort that rearrange.docx von rtodd575
void mysort(vector & v)- Write a function called mysort that rearrange.docxvoid mysort(vector & v)- Write a function called mysort that rearrange.docx
void mysort(vector & v)- Write a function called mysort that rearrange.docx
rtodd5752 views
statistical computation using R- an intro.. von Kamarudheen KV
statistical computation using R- an intro..statistical computation using R- an intro..
statistical computation using R- an intro..
Kamarudheen KV196 views
Fixed Point Theorm In Probabilistic Analysis von iosrjce
Fixed Point Theorm In Probabilistic AnalysisFixed Point Theorm In Probabilistic Analysis
Fixed Point Theorm In Probabilistic Analysis
iosrjce113 views
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis von Amr E. Mohamed
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier AnalysisDSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
DSP_FOEHU - MATLAB 02 - The Discrete-time Fourier Analysis
Amr E. Mohamed7.6K views
Application of the Transpositions matrix for obtaining n-dimensional rotation... von AI Publications
Application of the Transpositions matrix for obtaining n-dimensional rotation...Application of the Transpositions matrix for obtaining n-dimensional rotation...
Application of the Transpositions matrix for obtaining n-dimensional rotation...
AI Publications7 views
ch9.pdf von KavS14
ch9.pdfch9.pdf
ch9.pdf
KavS145 views

Más de acutekcorp

3 Consider two random variables X and Y whose relationship .pdf von
3 Consider two random variables X and Y whose relationship .pdf3 Consider two random variables X and Y whose relationship .pdf
3 Consider two random variables X and Y whose relationship .pdfacutekcorp
3 views1 Folie
3 Consider two periodic tasks with computation times C11C.pdf von
3 Consider two periodic tasks with computation times C11C.pdf3 Consider two periodic tasks with computation times C11C.pdf
3 Consider two periodic tasks with computation times C11C.pdfacutekcorp
3 views1 Folie
3 Consider a random variable X with the below probability m.pdf von
3 Consider a random variable X with the below probability m.pdf3 Consider a random variable X with the below probability m.pdf
3 Consider a random variable X with the below probability m.pdfacutekcorp
3 views1 Folie
3 Alice and Bob buy a bag of apples which contains 10 good.pdf von
3 Alice and Bob buy a bag of apples which contains 10 good.pdf3 Alice and Bob buy a bag of apples which contains 10 good.pdf
3 Alice and Bob buy a bag of apples which contains 10 good.pdfacutekcorp
2 views1 Folie
3 Consider a stochastic process Xt governed by the equation.pdf von
3 Consider a stochastic process Xt governed by the equation.pdf3 Consider a stochastic process Xt governed by the equation.pdf
3 Consider a stochastic process Xt governed by the equation.pdfacutekcorp
2 views1 Folie
3 Compute insample forecasts of the monthly return data u.pdf von
3 Compute insample forecasts of the monthly return data u.pdf3 Compute insample forecasts of the monthly return data u.pdf
3 Compute insample forecasts of the monthly return data u.pdfacutekcorp
3 views1 Folie

Más de acutekcorp(20)

3 Consider two random variables X and Y whose relationship .pdf von acutekcorp
3 Consider two random variables X and Y whose relationship .pdf3 Consider two random variables X and Y whose relationship .pdf
3 Consider two random variables X and Y whose relationship .pdf
acutekcorp3 views
3 Consider two periodic tasks with computation times C11C.pdf von acutekcorp
3 Consider two periodic tasks with computation times C11C.pdf3 Consider two periodic tasks with computation times C11C.pdf
3 Consider two periodic tasks with computation times C11C.pdf
acutekcorp3 views
3 Consider a random variable X with the below probability m.pdf von acutekcorp
3 Consider a random variable X with the below probability m.pdf3 Consider a random variable X with the below probability m.pdf
3 Consider a random variable X with the below probability m.pdf
acutekcorp3 views
3 Alice and Bob buy a bag of apples which contains 10 good.pdf von acutekcorp
3 Alice and Bob buy a bag of apples which contains 10 good.pdf3 Alice and Bob buy a bag of apples which contains 10 good.pdf
3 Alice and Bob buy a bag of apples which contains 10 good.pdf
acutekcorp2 views
3 Consider a stochastic process Xt governed by the equation.pdf von acutekcorp
3 Consider a stochastic process Xt governed by the equation.pdf3 Consider a stochastic process Xt governed by the equation.pdf
3 Consider a stochastic process Xt governed by the equation.pdf
acutekcorp2 views
3 Compute insample forecasts of the monthly return data u.pdf von acutekcorp
3 Compute insample forecasts of the monthly return data u.pdf3 Compute insample forecasts of the monthly return data u.pdf
3 Compute insample forecasts of the monthly return data u.pdf
acutekcorp3 views
3 Con base en todo lo que sabemos sobre los mundos terrestr.pdf von acutekcorp
3 Con base en todo lo que sabemos sobre los mundos terrestr.pdf3 Con base en todo lo que sabemos sobre los mundos terrestr.pdf
3 Con base en todo lo que sabemos sobre los mundos terrestr.pdf
acutekcorp2 views
3 Consider a large population with so many subpopulations t.pdf von acutekcorp
3 Consider a large population with so many subpopulations t.pdf3 Consider a large population with so many subpopulations t.pdf
3 Consider a large population with so many subpopulations t.pdf
acutekcorp2 views
3 Calculate the geostrophic wind at an altitude of 95km a.pdf von acutekcorp
3 Calculate the geostrophic wind at an altitude of 95km a.pdf3 Calculate the geostrophic wind at an altitude of 95km a.pdf
3 Calculate the geostrophic wind at an altitude of 95km a.pdf
acutekcorp2 views
3 Calculate the Following and Explain their significance .pdf von acutekcorp
3 Calculate the Following and Explain their significance  .pdf3 Calculate the Following and Explain their significance  .pdf
3 Calculate the Following and Explain their significance .pdf
acutekcorp2 views
3 Based on the tracing of current below measured through a .pdf von acutekcorp
3 Based on the tracing of current below measured through a .pdf3 Based on the tracing of current below measured through a .pdf
3 Based on the tracing of current below measured through a .pdf
acutekcorp3 views
3 Bees Buttermilk Corporation seleccin de proveedores Bas.pdf von acutekcorp
3 Bees Buttermilk Corporation seleccin de proveedores Bas.pdf3 Bees Buttermilk Corporation seleccin de proveedores Bas.pdf
3 Bees Buttermilk Corporation seleccin de proveedores Bas.pdf
acutekcorp6 views
3 An astronomer takes 100 independent measurements X1X2.pdf von acutekcorp
3 An astronomer takes 100 independent measurements X1X2.pdf3 An astronomer takes 100 independent measurements X1X2.pdf
3 An astronomer takes 100 independent measurements X1X2.pdf
acutekcorp3 views
3 An application of the sampling distribution of the sample.pdf von acutekcorp
3 An application of the sampling distribution of the sample.pdf3 An application of the sampling distribution of the sample.pdf
3 An application of the sampling distribution of the sample.pdf
acutekcorp8 views
3 According to HealthyChildorg 4 Missing two days a month.pdf von acutekcorp
3 According to HealthyChildorg 4 Missing two days a month.pdf3 According to HealthyChildorg 4 Missing two days a month.pdf
3 According to HealthyChildorg 4 Missing two days a month.pdf
acutekcorp2 views
3 Al comentar sobre el desempeo decepcionante de The Great.pdf von acutekcorp
3 Al comentar sobre el desempeo decepcionante de The Great.pdf3 Al comentar sobre el desempeo decepcionante de The Great.pdf
3 Al comentar sobre el desempeo decepcionante de The Great.pdf
acutekcorp3 views
3 According to a company news release during the third qua.pdf von acutekcorp
3 According to a company news release during the third qua.pdf3 According to a company news release during the third qua.pdf
3 According to a company news release during the third qua.pdf
acutekcorp2 views
3 Aadakilerden hangisi azalan marjinal kontrol yasasn tanml.pdf von acutekcorp
3 Aadakilerden hangisi azalan marjinal kontrol yasasn tanml.pdf3 Aadakilerden hangisi azalan marjinal kontrol yasasn tanml.pdf
3 Aadakilerden hangisi azalan marjinal kontrol yasasn tanml.pdf
acutekcorp2 views
3 A perfectly competitive firm has the following total cost.pdf von acutekcorp
3 A perfectly competitive firm has the following total cost.pdf3 A perfectly competitive firm has the following total cost.pdf
3 A perfectly competitive firm has the following total cost.pdf
acutekcorp2 views
3 A recent study done on college students found that the me.pdf von acutekcorp
3 A recent study done on college students found that the me.pdf3 A recent study done on college students found that the me.pdf
3 A recent study done on college students found that the me.pdf
acutekcorp2 views

Último

Volf work.pdf von
Volf work.pdfVolf work.pdf
Volf work.pdfMariaKenney3
75 views43 Folien
Meet the Bible von
Meet the BibleMeet the Bible
Meet the BibleSteve Thomason
76 views80 Folien
MercerJesse3.0.pdf von
MercerJesse3.0.pdfMercerJesse3.0.pdf
MercerJesse3.0.pdfjessemercerail
92 views6 Folien
MIXING OF PHARMACEUTICALS.pptx von
MIXING OF PHARMACEUTICALS.pptxMIXING OF PHARMACEUTICALS.pptx
MIXING OF PHARMACEUTICALS.pptxAnupkumar Sharma
117 views35 Folien
Career Building in AI - Technologies, Trends and Opportunities von
Career Building in AI - Technologies, Trends and OpportunitiesCareer Building in AI - Technologies, Trends and Opportunities
Career Building in AI - Technologies, Trends and OpportunitiesWebStackAcademy
41 views44 Folien
Mineral nutrition and Fertilizer use of Cashew von
 Mineral nutrition and Fertilizer use of Cashew Mineral nutrition and Fertilizer use of Cashew
Mineral nutrition and Fertilizer use of CashewAruna Srikantha Jayawardana
53 views107 Folien

Último(20)

Career Building in AI - Technologies, Trends and Opportunities von WebStackAcademy
Career Building in AI - Technologies, Trends and OpportunitiesCareer Building in AI - Technologies, Trends and Opportunities
Career Building in AI - Technologies, Trends and Opportunities
WebStackAcademy41 views
Six Sigma Concept by Sahil Srivastava.pptx von Sahil Srivastava
Six Sigma Concept by Sahil Srivastava.pptxSix Sigma Concept by Sahil Srivastava.pptx
Six Sigma Concept by Sahil Srivastava.pptx
Sahil Srivastava40 views
When Sex Gets Complicated: Porn, Affairs, & Cybersex von Marlene Maheu
When Sex Gets Complicated: Porn, Affairs, & CybersexWhen Sex Gets Complicated: Porn, Affairs, & Cybersex
When Sex Gets Complicated: Porn, Affairs, & Cybersex
Marlene Maheu108 views
11.28.23 Social Capital and Social Exclusion.pptx von mary850239
11.28.23 Social Capital and Social Exclusion.pptx11.28.23 Social Capital and Social Exclusion.pptx
11.28.23 Social Capital and Social Exclusion.pptx
mary850239409 views
EILO EXCURSION PROGRAMME 2023 von info33492
EILO EXCURSION PROGRAMME 2023EILO EXCURSION PROGRAMME 2023
EILO EXCURSION PROGRAMME 2023
info33492181 views
Nelson_RecordStore.pdf von BrynNelson5
Nelson_RecordStore.pdfNelson_RecordStore.pdf
Nelson_RecordStore.pdf
BrynNelson546 views
Pharmaceutical Inorganic Chemistry Unit IVMiscellaneous compounds Expectorant... von Ms. Pooja Bhandare
Pharmaceutical Inorganic Chemistry Unit IVMiscellaneous compounds Expectorant...Pharmaceutical Inorganic Chemistry Unit IVMiscellaneous compounds Expectorant...
Pharmaceutical Inorganic Chemistry Unit IVMiscellaneous compounds Expectorant...
Ms. Pooja Bhandare194 views
Monthly Information Session for MV Asterix (November) von Esquimalt MFRC
Monthly Information Session for MV Asterix (November)Monthly Information Session for MV Asterix (November)
Monthly Information Session for MV Asterix (November)
Esquimalt MFRC98 views
Guidelines & Identification of Early Sepsis DR. NN CHAVAN 02122023.pptx von Niranjan Chavan
Guidelines & Identification of Early Sepsis DR. NN CHAVAN 02122023.pptxGuidelines & Identification of Early Sepsis DR. NN CHAVAN 02122023.pptx
Guidelines & Identification of Early Sepsis DR. NN CHAVAN 02122023.pptx
Niranjan Chavan38 views
Education of marginalized and socially disadvantages segments.pptx von GarimaBhati5
Education of marginalized and socially disadvantages segments.pptxEducation of marginalized and socially disadvantages segments.pptx
Education of marginalized and socially disadvantages segments.pptx
GarimaBhati540 views
Parts of Speech (1).pptx von mhkpreet001
Parts of Speech (1).pptxParts of Speech (1).pptx
Parts of Speech (1).pptx
mhkpreet00143 views

3 a Consider the selfattention operation SX defined as.pdf

  • 1. 3. (a) Consider the self-attention operation S(X) defined as follows AS(X)t,:=1XWqWkTXT= softmax(At,:)XWv Where XRTDin,WvRDinDk,WkRDinDk,WqRDinDk.Din being the input dimensionality and T a sequence (or set) length (b) (15 points) Show that for any permutation matrix P,PS(X)=S(PX) (c) (5 points) We would like to use S(X) and a linear operation to construct a permutation invariant function G(X) that outputs the same feature representation for any ordering of the input sequence (hence, it is invariant to the input's arrangement). What linear operation can we use? Specifically find an example of a vector w such that G(X)=wTS(X) satisfies the condition G(X)=G(PX) for all permutation matrix P. (d) (15 points) Consider a batched ZRBTD that is for example the output of a self-attention layer. Implement a nn.module or python function using functionalize that takes Z and applies a "Position wise feedforward network" layer, H(x)=relu(W1x+ b) which outputs a tensor of size RBTD as commonly used in transformer models. Implement this in two ways (1) using nn. Linear and (2) using nn. Conv1d with padding 0 , stride 1 and kernel size 1 . Validate your implementations match with an assert statement. You may select a random Z with the size B,T,D of your choosing for validating your implementation.