java_crocodile - CSDN Blog

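Background: Short utterances do not yield enough frames to learn phonetic information, and the similarity between the same sentence spoken by different speakers can exceed the similarity between different sentences spoken by the same speaker. Implementation: The acoustic stem takes 40-dimensional Fbank features as input, and the phonetic stem takes 100-dimensional ASR bottleneck features. The two stems are linked by a couple stem that sits between them. The input to the first couple-stem layer is the concatenation of the Fbank features and the ASR bottleneck features. The input to each subsequent layer is the acoustic stem, couple stem, phonetic st…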

2021-09-14 15:29:12
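The stem wiring described above can be sketched per frame in plain Python. This is a minimal sketch and not the paper's code. The layer callables, the names `concat` and `run_stems`, and the assumption that couple-stem layer l consumes the layer l-1 outputs of all three stems are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]
Layer = Callable[[Frame], Frame]  # hypothetical per-frame layer


def concat(*parts: Frame) -> Frame:
    """Concatenate feature vectors along the feature dimension."""
    out: Frame = []
    for part in parts:
        out.extend(part)
    return out


def run_stems(
    fbank: Frame,
    bnf: Frame,
    acoustic: Sequence[Layer],
    couple: Sequence[Layer],
    phonetic: Sequence[Layer],
) -> Tuple[Frame, Frame, Frame]:
    """Run the acoustic, couple and phonetic stems side by side (assumed wiring)."""
    assert len(fbank) == 40 and len(bnf) == 100  # 40-dim Fbank, 100-dim ASR BNF
    # First couple-stem layer: concatenation of Fbank and bottleneck features.
    c = couple[0](concat(fbank, bnf))
    a = acoustic[0](fbank)
    p = phonetic[0](bnf)
    for layer in range(1, len(couple)):
        # Later couple-stem layers: previous outputs of all three stems (assumed).
        c = couple[layer](concat(a, c, p))
        a = acoustic[layer](a)
        p = phonetic[layer](p)
    return a, c, p


identity: Layer = lambda x: list(x)  # hypothetical placeholder layer
a, c, p = run_stems([0.0] * 40, [0.0] * 100, [identity] * 2, [identity] * 2, [identity] * 2)
print(len(a), len(c), len(p))  # 40 280 100
```

With identity placeholders, the couple stem grows from 140 dimensions (40 + 100) to 280 (40 + 140 + 100). In a real model, each layer would project back to its own output size.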

[Original] D-TDNN

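Implementation: Each D-TDNN layer is structured as follows. It begins with an FNN-based bottleneck layer. Let g be the output size of the TDNN layer (the growth rate). The bottleneck layer's output size is then 2g. Finally, the input of the D-TDNN layer is concatenated with the output of the TDNN layer. The network as a whole is divided into 5 parts:
1. Layer 1: sets the initial number of channels.
2. Layers 2-8: frame offset of 1, learning local features.
3. Layers 9-21: frame offset of 3, learning long-term dependence.
4…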

2021-09-14 15:26:21
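The dense connection in a D-TDNN layer can be illustrated with a per-frame simplification. In this simplification, the bottleneck outputs 2g values, the TDNN outputs g values, and the layer returns its input concatenated with the TDNN output. This is only a sketch. A real TDNN layer looks at a window of neighbouring frames. The lambda stand-ins and the example sizes (128 input channels, g = 64) are hypothetical.

```python
from typing import Callable, List

Vec = List[float]


def dtdnn_layer(x: Vec, bottleneck: Callable[[Vec], Vec],
                tdnn: Callable[[Vec], Vec], g: int) -> Vec:
    """One densely connected TDNN layer (per-frame simplification)."""
    h = bottleneck(x)          # FNN bottleneck: output size 2g
    assert len(h) == 2 * g
    y = tdnn(h)                # TDNN layer: output size g (growth rate)
    assert len(y) == g
    return x + y               # concatenate the layer input with the TDNN output


def block_channels(in_channels: int, g: int, num_layers: int) -> List[int]:
    """Channel count after each layer: every layer adds g channels."""
    return [in_channels + i * g for i in range(num_layers + 1)]


g = 1
bottleneck = lambda v: [sum(v)] * (2 * g)   # hypothetical stand-in for the FNN
tdnn = lambda h: h[:g]                      # hypothetical stand-in for the TDNN
print(dtdnn_layer([1.0, 2.0], bottleneck, tdnn, g))  # [1.0, 2.0, 3.0]
print(block_channels(128, 64, 3))                     # [128, 192, 256, 320]
```

Because every layer appends g new channels, the width grows linearly with depth. This is why the growth rate is the main knob for model size.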

[Original] AutoSpeech: Neural Architecture Search for Speaker Recognition

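Background: Classic CNNs may not be well suited to speaker recognition. This paper proposes a network architecture search method to find the most suitable network. Implementation: search space. The network is composed of multiple cells, each structured as follows. Each x_i represents a tensor, and each edge represents an operation o_ij(·). Each cell contains 2 input nodes, 4 intermediate nodes, and 1 output node. For the k-th cell, input x0 is the output of cell k-2 and input x1 is the output of cell k-1. For the intermedia…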

2021-09-13 14:55:00
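The summary breaks off before the intermediate nodes are described. The sketch below therefore assumes the DARTS-style rule that AutoSpeech builds on. Under that rule, each intermediate node sums o_ij(x_i) over all earlier nodes i, and the output node concatenates the intermediate nodes. Tensors are plain lists, and the identity operations are hypothetical placeholders for the candidate operations.

```python
from typing import Callable, Dict, List, Tuple

Tensor = List[float]
Op = Callable[[Tensor], Tensor]


def elementwise_sum(tensors: List[Tensor]) -> Tensor:
    return [sum(values) for values in zip(*tensors)]


def cell_forward(x0: Tensor, x1: Tensor, ops: Dict[Tuple[int, int], Op],
                 num_intermediate: int = 4) -> Tensor:
    """x0: output of cell k-2, x1: output of cell k-1."""
    nodes = [x0, x1]                         # the 2 input nodes
    for j in range(2, 2 + num_intermediate):
        # Intermediate node j sums o_ij(x_i) over all earlier nodes i < j (assumed rule).
        nodes.append(elementwise_sum([ops[(i, j)](nodes[i]) for i in range(j)]))
    out: Tensor = []
    for node in nodes[2:]:                   # output node: concatenate intermediate nodes
        out.extend(node)
    return out


edges = [(i, j) for j in range(2, 6) for i in range(j)]
ops: Dict[Tuple[int, int], Op] = {e: (lambda t: list(t)) for e in edges}  # identity ops
print(len(edges))                            # 14
print(cell_forward([1.0], [2.0], ops))       # [3.0, 6.0, 12.0, 24.0]
```

With 2 input nodes and 4 intermediate nodes, a cell has 14 searchable edges. Stacking cells would call `cell_forward` with the outputs of cells k-2 and k-1. A real implementation also needs a preprocessing step to match tensor shapes, which is omitted here.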

[Original] Utterance-Level Aggregation For Speaker Recognition In The Wild

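This paper uses NetVLAD to aggregate frame-level features into an utterance-level representation. "In the wild" here refers to utterances longer than 4 s. Implementation: the frame-level features produced by a Thin ResNet are aggregated into an utterance-level representation by NetVLAD. The network input is in R^(257×T×1), and the output becomes R^(1×T/32×512). NetVLAD outputs a K×D matrix V, where K is the number of clusters and D is the dimension of each cluster's vector. The first term represents the weight of a frame's feature for cluster k, and the second term represents its residual from the cluster center. Finally, each row of V (one per cluster) is L2-normalized and the rows are concatenated. In…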

2021-08-02 20:11:46
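The aggregation formula itself did not survive in the summary. The sketch below assumes the standard NetVLAD form V(k, j) = Σ_t a_k(x_t) · (x_t(j) - c_k(j)), with a_k(x_t) = exp(w_k·x_t + b_k) / Σ_k' exp(w_k'·x_t + b_k'). In this form, the soft assignment a_k is the "first term" and the residual to center c_k is the "second term". The weights, biases and centers would be learned in a real model. Here they are placeholders, and no extra global normalization is applied after concatenation.

```python
import math
from typing import List

Vec = List[float]


def netvlad(frames: List[Vec], centers: List[Vec], weights: List[Vec],
            biases: List[float]) -> Vec:
    """Aggregate T frame vectors (dim D) into a K*D utterance-level vector."""
    K, D = len(centers), len(centers[0])
    V = [[0.0] * D for _ in range(K)]
    for x in frames:
        # First term: soft-assignment weight of this frame to each cluster k.
        logits = [sum(w * xi for w, xi in zip(weights[k], x)) + biases[k] for k in range(K)]
        m = max(logits)                      # subtract max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        for k in range(K):
            a_k = exps[k] / total
            # Second term: residual between the frame and cluster center c_k.
            for j in range(D):
                V[k][j] += a_k * (x[j] - centers[k][j])
    out: Vec = []
    for row in V:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero
        out.extend(v / norm for v in row)                 # L2-normalize each cluster row
    return out


print(netvlad([[1.0, 0.0]], centers=[[0.0, 0.0], [1.0, 1.0]],
              weights=[[0.0, 0.0], [0.0, 0.0]], biases=[0.0, 0.0]))
# [1.0, 0.0, 0.0, -1.0]
```

Because the residuals are summed over all frames, the output size K×D does not depend on T. This is what makes variable-length utterances map to a fixed-size vector.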

[Original] ECAPA-TDNN

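Implementation: ECAPA-TDNN consists of three parts. 1-Dimensional Squeeze-Excitation Res2Blocks: the frame layers of the conventional x-vector consider only 15 frames of context, whereas we want them to take global information into account, so Squeeze-Excitation (SE) blocks are used. The first step is the squeeze operation, in which the frame-level features are averaged over time. The input features have shape [N, C, L], where N is the batch size, L is the number of frames, and C is the number of channels. Averaging then…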

2021-08-02 15:43:26
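The squeeze step above, followed by the usual SE excitation, can be sketched for a single example of shape [C, L]. Process each of the N examples in a batch the same way. The excitation part is an assumption, because the summary stops after the squeeze: a bottleneck affine layer with ReLU, then an affine layer with a sigmoid that yields one gate per channel. The weight matrices in the usage line are placeholders.

```python
import math
from typing import List

Matrix = List[List[float]]


def squeeze(x: Matrix) -> List[float]:
    """x has shape [C, L]; average each channel over the L frames."""
    return [sum(channel) / len(channel) for channel in x]


def affine(W: Matrix, v: List[float], b: List[float]) -> List[float]:
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]


def se_block(x: Matrix, W1: Matrix, b1: List[float], W2: Matrix, b2: List[float]) -> Matrix:
    z = squeeze(x)                                               # squeeze: one value per channel
    h = [max(0.0, v) for v in affine(W1, z, b1)]                 # bottleneck + ReLU
    s = [1.0 / (1.0 + math.exp(-v)) for v in affine(W2, h, b2)]  # per-channel gates in (0, 1)
    return [[s_c * v for v in channel] for s_c, channel in zip(s, x)]  # rescale each channel


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]                          # C=2 channels, L=3 frames
print(squeeze(x))                                               # [2.0, 5.0]
print(se_block(x, [[0.0, 0.0]], [0.0], [[0.0], [0.0]], [0.0, 0.0]))
# [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]]
```

The gate for each channel depends on the time average over the whole utterance. This is how the block injects global context into frame-level features that otherwise see only a limited window.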

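qq_41048571's blog

[Original] Spring Logging Configuration

[Original] Common Spring Annotations

[Original] Chapter 14: Class Loading

[Original] Chapter 13: Garbage Collectors

[Original] Chapter 12: Garbage Collection Concepts

[Original] Chapter 11: Garbage Collection Algorithms

[Original] Chapter 11: StringTable

[Original] Chapter 10: Execution Engine

[Original] Chapter 9: Object Instantiation

[Original] Chapter 8: Method Area

[Original] Chapter 7: Heap

[Original] Chapter 6: Native Method Stack

[Original] Chapter 5: Virtual Machine Stack

[Original] Chapter 4: Program Counter (PC Register)

[Original] Chapter 3: Runtime Data Area Overview and Threads

[Original] Chapter 2: Class Loading Subsystem

[Original] Chapter 1: Introduction to the JVM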

[Original] PacNet

[Original] D-TDNN

[Original] AutoSpeech: Neural Architecture Search for Speaker Recognition

[Original] Utterance-Level Aggregation For Speaker Recognition In The Wild

[Original] ECAPA-TDNN

[Original] Channel Invariant Speaker Embedding Learning with Joint Multi-Task and Adversarial Training

[Original] SAP (Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification)

[Original] ASP (Attentive Statistics Pooling for Deep Speaker Embedding)

[Original] j-vector (Multi-Task Learning for Text-dependent Speaker Verification)

[Original] Deep Speaker: an End-to-End Neural Speaker Embedding System

[Original] Triplet Loss (End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances)

[Original] Triplet Loss (TristouNet: Triplet Loss For Speaker Turn Embedding)

[Original] x-vector (X-Vectors: Robust DNN Embeddings for Speaker Recognition)

[Original] x-vector (Deep Neural Network-Based Speaker Embeddings For End-to-End Speaker Verification)

[Original] d-vector (End-to-End Text-Dependent Speaker Verification)

[Original] d-vector (Deep neural networks for small footprint text-dependent speaker verification)

[Original] Python study notes: 'collections.OrderedDict' object has no attribute 'eval'
