Self-attention and multi-head attention in "Attention Is All You Need"

A霸天下

Categories: Artificial Intelligence, attention. Tags: tensorflow, deep learning

Article link: https://blog.csdn.net/qq_43534932/article/details/104082747

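Preface

Attention is becoming increasingly popular in speech recognition, and both soft attention and hard attention are widely used. Starting today, I will reproduce the attention algorithms from top-conference papers one at a time. Today's topic is self-attention.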

self-attention

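[Figure from the paper]
The figure above is the classic illustration from the paper.
The formula used is the paper's scaled dot-product attention:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

In other words, three quantities Q, K, and V are introduced and combined through the series of operations in the formula above.
Code: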

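import math

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers)

length = 50      # number of frames
input_dim = 39   # MFCC feature dimension

# Input data: [batch, length, input_dim]
x = tf.placeholder(tf.float32, [None, length, input_dim])


def self_attention(x, hidden_layer, head=1):
    # `head` is not used in the single-head case.
    # A convolution with kernel size 1 projects every frame to Q, K and V in one pass.
    x = tf.layers.conv1d(x, hidden_layer * 3, 1, strides=1, padding='same')
    Q, K, V = tf.split(x, 3, axis=2)  # each: [batch, length, hidden_layer]
    # Scaled dot-product scores: [batch, length, length]
    scores = tf.matmul(Q, K, transpose_b=True) / math.sqrt(hidden_layer)
    # Normalize over the key axis so each query's weights sum to 1.
    weights = tf.nn.softmax(scores, axis=-1)
    # Weighted sum of the values: [batch, length, hidden_layer]
    return tf.matmul(weights, V)

tf.split separates Q, K, and V. Q is then multiplied by the transpose of K, the scores are scaled by sqrt(hidden_layer) and passed through softmax, and the resulting weights are multiplied with V. This gives the output of single-head attention.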
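
To make the computation concrete without TensorFlow, here is a minimal sketch of the paper's scaled dot-product attention on plain Python lists. The helper names `softmax` and `scaled_dot_product_attention` and the toy values of Q, K, and V are assumptions made for illustration only; they do not come from the original post.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q, K: [length][d_k]; V: [length][d_v]; returns [length][d_v]."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)  # one weight per frame, summing to 1
        # Weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output


# Toy input: 3 frames, d_k = d_v = 2 (values chosen only for illustration).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = scaled_dot_product_attention(Q, K, V)
print(out)
```

Each output row is a weighted average of the rows of V, because the softmax weights for a given query are non-negative and sum to 1.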
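With the single-head version in place, how do we make it multi-head?
Let's first look at what the paper says:
[Figure from the paper]
The approach taken here is to split the last dimension of the input into h parts, apply attention to each part, and concatenate the results at the end. A sentence in the paper also supports this reading.

The paper uses h=8; here we use 5.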

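def multi_head_attention(x, head, output_channel):
    # Split the last dimension into `head` equal parts;
    # the channel count of x must be divisible by `head`.
    parts = tf.split(x, head, axis=2)
    # Run single-head attention (hidden size 32) on each part.
    outputs = [self_attention(part, 32, 1) for part in parts]
    # Concatenate the heads along the channel axis.
    merged = tf.concat(outputs, axis=2)
    # Mix the heads with a final convolution of kernel size 1.
    return tf.layers.conv1d(merged, output_channel, 1, strides=1, padding='same')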

That completes a fairly simple implementation of multi-head attention.
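
The split-and-concatenate step can also be sketched on plain Python lists, which makes the shape requirement easy to see. The helpers `split_heads` and `concat_heads` below are hypothetical names introduced for this illustration, and the toy input uses 10 channels so that it divides evenly into 5 heads.

```python
def split_heads(x, head):
    """Split [length][channels] into `head` pieces of [length][channels // head]."""
    channels = len(x[0])
    if channels % head != 0:
        raise ValueError("channels must be divisible by head")
    size = channels // head
    return [[frame[i * size:(i + 1) * size] for frame in x]
            for i in range(head)]


def concat_heads(parts):
    """Concatenate per-head outputs along the last dimension."""
    length = len(parts[0])
    return [[v for part in parts for v in part[t]] for t in range(length)]


# Toy input: 2 frames with 10 channels, split into 5 heads of 2 channels each.
x = [list(range(10)), list(range(10, 20))]
parts = split_heads(x, 5)
print(len(parts), len(parts[0]), len(parts[0][0]))  # prints: 5 2 2
restored = concat_heads(parts)
```

Because the split must be even, a 39-dimensional MFCC input cannot be divided into 5 heads directly; in practice the features would first need to be projected to a channel count that is a multiple of the number of heads.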