IEEE SLT 2021 | Alpha-mini Speech Challenge (ASC)

Introduction

Human-robot speech interaction (HRSI) is an indispensable skill for humanoid robots, and robots produced by UBTECH are equipped with intelligent voice interaction functions. As a global high-tech enterprise that integrates artificial intelligence and humanoid robot research and development, platform software development and application, and product sales, UBTECH has always been committed to researching and developing smooth, efficient and friendly HRSI technology, enabling every robot to listen and speak.

As the first link in the HRSI chain, keyword spotting (KWS), also known as wake-up word detection, directly determines the experience of all subsequent interactions. Meanwhile, accurate sound source location (SSL) provides essential cues for downstream beamforming, speech enhancement and speech recognition algorithms. In home environments, the following sources of interference pose great challenges to HRSI: 1) various types of noise from TVs, radios, other electrical appliances and human talkers, 2) echoes from the robot's own loudspeaker, 3) room reverberation, and 4) noise from the robot's mechanical movements (mechanical noise for short). These interferences greatly complicate KWS and SSL, so robust algorithms are in high demand.

UBTECH Technology Co., Ltd., Northwestern Polytechnical University, Idiap Research Institute, Peking University and AISHELL Foundation jointly organize the Alpha-mini Speech Challenge (ASC). Alpha-mini is a robot produced by UBTECH, equipped with an intelligent speech interaction module based on a 4-microphone array. As a flagship challenge event of the 2021 IEEE Spoken Language Technology (SLT) Workshop, ASC will provide participants with labelled audio data recorded by Alpha-mini in real rooms, covering abundant indoor noise, echo and reverberation. It aims to promote research on realistic HRSI scenarios and to provide a common benchmark for KWS, SSL and related speech tasks.

The official challenge description paper can be downloaded from here.

@inproceedings{ASC2021,
    title     = {IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines},
    author    = {Fu, Yihui and Yao, Zhuoyuan and He, Weipeng and Wu, Jian and Wang, Xiong and Yang, Zhanheng and
                 Zhang, Shimin and Xie, Lei and Huang, Dongyan and Bu, Hui and Motlicek, Petr and Odobez, Jean-Marc},
    booktitle = {{IEEE SLT 2021}},
    address   = {Shenzhen, China},
    year      = {2021},
    month     = jan,
}

Code for the baseline systems is available at: https://github.com/nwpuaslp/ASC_baseline


Results

Table 1. Results of KWS Track

| Ranking | Team ID | Organization | FRR | FAR | Score (lower is better) |
|---|---|---|---|---|---|
| 1 | ASC_029 | MICL, School of Computer Science and Engineering, Nanyang Technological University, Singapore | 0.31 | 0.29 | 0.59 |
| 2 | ASC_018 | BATC Lab, Department of Electronic Engineering, Shanghai Jiao Tong University | 0.32 | 0.44 | 0.75 |
| Baseline (Deep KWS) | - | - | 0.55 | 0.25 | 0.81 |
| 3 | ASC_020 | - | 0.22 | 0.64 | 0.86 |
| 4 | ASC_016 | - | 0.14 | 0.74 | 0.88 |
| 5 | ASC_032 | - | 0.45 | 0.46 | 0.91 |
| 6 | ASC_014 | - | 0.06 | 0.91 | 0.97 |
| 7 | ASC_019 | - | 0.07 | 0.93 | 1.00 |
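The Score column of the KWS results matches FRR + FAR to within about 0.01 (for example, 0.22 + 0.64 = 0.86 for ASC_020), presumably with the small differences coming from rounding of the published FRR and FAR values. The sketch below assumes that relation. It also assumes, as a common convention rather than the official definition, that FRR is the fraction of keyword utterances the system misses and FAR is the fraction of non-keyword utterances that trigger a detection; the challenge description paper gives the authoritative formulas. The function names are hypothetical.

```python
from typing import Sequence


def frr_far(labels: Sequence[bool], detections: Sequence[bool]) -> tuple[float, float]:
    """Return (FRR, FAR) from per-utterance ground truth and system decisions.

    Assumed setup: labels[i] is True if utterance i contains the keyword,
    detections[i] is True if the system fired on utterance i.
    """
    if len(labels) != len(detections):
        raise ValueError("labels and detections must have the same length")
    positives = sum(labels)
    negatives = len(labels) - positives
    misses = sum(1 for y, d in zip(labels, detections) if y and not d)
    false_alarms = sum(1 for y, d in zip(labels, detections) if not y and d)
    frr = misses / positives if positives else 0.0
    far = false_alarms / negatives if negatives else 0.0
    return frr, far


def kws_score(frr: float, far: float) -> float:
    """Assumed ranking score: FRR + FAR (lower is better)."""
    return frr + far


if __name__ == "__main__":
    # Toy example: 4 keyword utterances followed by 4 non-keyword utterances.
    labels = [True, True, True, True, False, False, False, False]
    detections = [True, True, True, False, True, False, False, False]
    frr, far = frr_far(labels, detections)
    print(frr, far, kws_score(frr, far))  # 0.25 0.25 0.5
    # Reproduce the ASC_020 row: 0.22 + 0.64
    print(round(kws_score(0.22, 0.64), 2))  # 0.86
```

Summing the two error rates weights a missed wake-up and a false wake-up equally, which is consistent with how the table ranks teams with very different FRR/FAR trade-offs (compare ASC_014 and ASC_032).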

Table 2. Results of SSL Track

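| Ranking | Team ID | ACC10 (%) | ACC7.5 (%) | ACC5 (%) | MAE (°) | Score (higher is better) |
|---|---|---|---|---|---|---|
| Baseline | - | 27.00 | 18.93 | 11.45 | 66.40 | 18.73 |
| 1 | ASC_032 | 16.65 | 12.32 | 9.08 | 64.15 | 12.52 |
| 2 | ASC_004 | 12.65 | 9.40 | 6.89 | 74.13 | 9.38 |
| 3 | ASC_015 | 6.67 | 4.80 | 3.50 | 88.38 | 4.58 |
| 4 | ASC_027 | 6.02 | 4.29 | 3.14 | 88.65 | 4.07 |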
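The SSL results report accuracy at 10°, 7.5° and 5° tolerances and a mean absolute error in degrees. A minimal sketch of how such metrics are commonly computed, assuming the error is the shortest distance between two directions on the circle (so 359° against 1° is a 2° error) and that ACCk is the percentage of utterances whose error is at most k degrees. These definitions are assumptions, the function names are hypothetical, and the formula that combines them into the final Score is not reconstructed here.

```python
from typing import Sequence


def angular_error(pred_deg: float, ref_deg: float) -> float:
    """Shortest distance between two directions on a circle, in [0, 180] degrees."""
    diff = abs(pred_deg - ref_deg) % 360.0
    return min(diff, 360.0 - diff)


def ssl_metrics(preds: Sequence[float], refs: Sequence[float],
                tolerances: Sequence[float] = (10.0, 7.5, 5.0)) -> dict[str, float]:
    """Return ACC@k (percent) for each tolerance k, plus MAE (degrees)."""
    if len(preds) != len(refs) or not preds:
        raise ValueError("need equally sized, non-empty prediction/reference lists")
    errors = [angular_error(p, r) for p, r in zip(preds, refs)]
    # Keys are formatted to match the table headers: ACC10, ACC7.5, ACC5.
    metrics = {f"ACC{k:g}": 100.0 * sum(e <= k for e in errors) / len(errors)
               for k in tolerances}
    metrics["MAE"] = sum(errors) / len(errors)
    return metrics


if __name__ == "__main__":
    preds = [359.0, 90.0, 180.0, 45.0]
    refs = [1.0, 98.0, 186.0, 135.0]
    print(ssl_metrics(preds, refs))
    # {'ACC10': 75.0, 'ACC7.5': 50.0, 'ACC5': 25.0, 'MAE': 26.5}
```

Handling the wrap-around matters here because the system outputs directions on a 1° to 360° circle; a naive absolute difference would count a prediction of 359° for a source at 1° as a 358° error.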

Datasets

Table 1. Data to Release

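| Dataset | Subset | Duration (hrs) | Format | Scenario | Mic-loudspeaker distance (m) |
|---|---|---|---|---|---|
| Training | Keyword-Train | 9.4 | 16 kHz, 16-bit, single-channel wav | / | / |
| Training | Speech-Train | 146.1 | 16 kHz, 16-bit, single-channel wav | / | / |
| Training | Noise-Train | 60.0 | 16 kHz, 16-bit, single-channel wav | / | / |
| Training | Echo-Train | 28.5 | 16 kHz, 16-bit, single-channel wav | / | / |
| Training | Echo-Record | 3.0 | 16 kHz, 16-bit, six-channel wav | / | / |
| Training | Noise-Mech | 8.6 | 16 kHz, 16-bit, six-channel wav | / | / |
| Development | KWS-Dev | 7.5 | 16 kHz, 16-bit, six-channel wav | Keyword only; Keyword+Noise; Keyword+Echo; Keyword+Noise+Echo; Keyword+Echo+Mech | [2, 4] |
| Development | SSL-Dev | 20.0 | 16 kHz, 16-bit, six-channel wav | Speech only; Speech+Noise; Speech+Echo; Speech+Noise+Echo; Speech+Echo+Mech | [2, 4] |
| Evaluation | KWS-Eval, SSL-Eval | TBA | Same as Development | Same as Development | [2, 5] |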
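The released audio is 16 kHz, 16-bit wav, single-channel for most training subsets and six-channel for Echo-Record, Noise-Mech and the development and evaluation sets. A minimal sketch using only Python's standard `wave` and `struct` modules to split an interleaved 16-bit PCM file into per-channel sample lists. The demo file name is hypothetical, and the sketch makes no assumption about which microphone or reference signal each of the six channels carries. Note that `wave` reads plain PCM headers; some multichannel files use the WAVE_FORMAT_EXTENSIBLE header, which `wave` only accepts from Python 3.12 on, so a library such as `soundfile` may be more convenient for real data.

```python
import os
import struct
import tempfile
import wave


def read_multichannel_wav(path: str) -> tuple[int, list[list[int]]]:
    """Read a 16-bit PCM wav file and return (sample_rate, channels).

    channels[c] holds the integer samples of channel c.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        n_channels = wf.getnchannels()
        n_frames = wf.getnframes()
        sample_rate = wf.getframerate()
        raw = wf.readframes(n_frames)
    # wav PCM samples are little-endian signed 16-bit integers, interleaved by channel.
    samples = struct.unpack(f"<{n_frames * n_channels}h", raw)
    channels = [list(samples[c::n_channels]) for c in range(n_channels)]
    return sample_rate, channels


if __name__ == "__main__":
    # Write a tiny synthetic six-channel, 16 kHz, 16-bit file so the example runs on its own.
    frames = [[frame * 10 + c for c in range(6)] for frame in range(4)]
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo_six_channel.wav")  # hypothetical file name
        with wave.open(demo_path, "wb") as wf:
            wf.setnchannels(6)
            wf.setsampwidth(2)
            wf.setframerate(16000)
            wf.writeframes(b"".join(struct.pack("<6h", *f) for f in frames))
        rate, chans = read_multichannel_wav(demo_path)
    print(rate, len(chans), chans[0], chans[5])  # 16000 6 [0, 10, 20, 30] [5, 15, 25, 35]
```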

Keyword Spotting (KWS) Track

The data used in the KWS Track is shown in Table 2. Participants may use their own room impulse responses (RIRs), either collected or simulated, for data augmentation when training the KWS model. In addition, Echo-Record and Noise-Mech are provided as references for the echo time delay and the mechanical noise of Alpha-mini, respectively, and may also be used during training. KWS-Dev, SSL-Dev, KWS-Eval and SSL-Eval consist of six-channel recordings. Participants can use KWS-Dev and SSL-Dev directly, without any simulation, to optimize the model.
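Because participants may bring their own RIRs, a common way to simulate far-field training data is to convolve clean single-channel audio (for example from Keyword-Train) with an RIR and then add noise (for example from Noise-Train) at a chosen signal-to-noise ratio. A minimal NumPy sketch of that recipe; the function name, the SNR value and the toy signals are illustrative assumptions, not part of the challenge baseline.

```python
import numpy as np


def augment(clean: np.ndarray, rir: np.ndarray, noise: np.ndarray,
            snr_db: float) -> np.ndarray:
    """Reverberate `clean` with `rir`, then add `noise` at `snr_db` dB SNR."""
    if len(noise) == 0:
        raise ValueError("noise must be non-empty")
    # Convolve with the RIR and keep the original length.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return reverberant
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + scale * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000  # 16 kHz, matching the released data
    clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s toy "keyword"
    rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)  # toy decaying RIR
    rir /= np.max(np.abs(rir))
    noise = rng.standard_normal(sr // 2)  # 0.5 s toy noise, tiled to full length
    noisy = augment(clean, rir, noise, snr_db=5.0)
    print(noisy.shape)  # (16000,)
```

Measuring the SNR on the reverberant rather than the dry signal keeps the specified SNR meaningful regardless of how much energy a given RIR adds or removes.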

Data

Table 2: Data for Keyword Spotting (KWS) Track

| Train | Development | Evaluation |
|---|---|---|
| Keyword-Train, Speech-Train, Noise-Train, Echo-Train, Echo-Record, Noise-Mech | KWS-Dev, SSL-Dev | KWS-Eval, SSL-Eval |

Evaluation & Ranking


Sound Source Location (SSL) Track

The data that participants can use in the SSL Track is shown in Table 3. Participants may also use their own RIRs, either collected or simulated, for data augmentation when training the SSL model. In addition, Echo-Record and Noise-Mech are provided as references for the echo time delay and the mechanical noise of Alpha-mini, respectively, and may also be used during training. SSL-Dev and SSL-Eval consist of six-channel recordings. Participants can use SSL-Dev directly, without any simulation, to optimize the model.
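Echo-Record is described as a reference for the echo time delay. One standard way to measure such a delay, offered here as an assumption rather than the organizers' stated procedure, is to find the lag that maximizes the cross-correlation between the signal sent to the loudspeaker and the microphone recording. The sketch uses `numpy.correlate` on synthetic signals; which channels of Echo-Record hold the loudspeaker reference and the microphone signals is not specified here, so the inputs are hypothetical.

```python
import numpy as np


def estimate_delay(reference: np.ndarray, mic: np.ndarray, sample_rate: int) -> tuple[int, float]:
    """Return (lag_in_samples, lag_in_seconds) of `reference` inside `mic`.

    A positive lag means the echo in `mic` arrives after the reference.
    """
    # In "full" mode, output index k corresponds to a lag of k - (len(reference) - 1).
    corr = np.correlate(mic, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag, lag / sample_rate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    reference = rng.standard_normal(4000)  # toy loudspeaker signal
    true_delay = 160  # 10 ms at 16 kHz
    mic = np.concatenate([np.zeros(true_delay), 0.6 * reference])
    mic += 0.05 * rng.standard_normal(len(mic))  # small sensor noise
    lag, seconds = estimate_delay(reference, mic, sr)
    print(lag, seconds)  # 160 0.01
```

Direct correlation costs O(N·M); for long recordings an FFT-based correlation gives the same peak much faster.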

Data

Table 3: Data for Sound Source Location (SSL) Track

| Train | Development | Evaluation |
|---|---|---|
| Speech-Train, Noise-Train, Echo-Train, Echo-Record, Noise-Mech | SSL-Dev | SSL-Eval |

Evaluation & Ranking

Rules

The use of any data not provided by the challenge organizers (except RIRs) is strictly prohibited. Furthermore, SSL-Dev and Keyword-Train may not be used to train the SSL model. The challenge organizers will provide participants with the topology of the microphone array and loudspeaker, as well as the definition of the angle. There are no restrictions on system architecture, models, training techniques or time delay; however, we encourage participants to develop models with better performance and lower time delay. If two submitted systems achieve the same score, the one with the lower time delay will be ranked higher.
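The tie-breaking rule amounts to a two-key sort: the track score first, then the time delay. The direction of the first key differs between tracks, since lower is better for KWS and higher is better for SSL in the results tables. A minimal sketch; the `Submission` record, its fields and the team IDs are hypothetical, and how time delay is measured is left to the organizers' definition.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    team_id: str
    score: float
    delay_ms: float  # hypothetical: system time delay, measured as the organizers define it


def rank(submissions: list[Submission], higher_is_better: bool) -> list[Submission]:
    """Sort best-first by score; equal scores are ordered by lower delay."""
    def key(s: Submission) -> tuple[float, float]:
        primary = -s.score if higher_is_better else s.score
        return (primary, s.delay_ms)
    return sorted(submissions, key=key)


if __name__ == "__main__":
    kws = [Submission("A", 0.75, 120.0), Submission("B", 0.59, 300.0),
           Submission("C", 0.75, 80.0)]
    print([s.team_id for s in rank(kws, higher_is_better=False)])  # ['B', 'C', 'A']
```

Here "A" and "C" tie on score, so the lower-delay system "C" is placed first.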

Submission

SSL-Eval will not be released before the organizers notify participants of the results. Participants must provide the organizers with a Docker image containing a runnable SSL system. The executable in the image must accept the list of SSL-Eval data as input and output the SSL results, giving the direction of the speech source in the range 1° to 360°. Detailed technical guidance on using and submitting the Docker image will be provided later.
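Since the exact input and output formats were to be announced later, the entry-point sketch below is entirely hypothetical: it assumes a plain-text list with one wav path per line, writes one `path angle` pair per line, and uses a placeholder `predict_doa` in place of a real model. The one concrete detail it illustrates is mapping an arbitrary angle estimate onto the required 1° to 360° range, where 0° wraps to 360°.

```python
import sys
from pathlib import Path


def to_output_range(angle_deg: float) -> int:
    """Round to whole degrees (Python's round) and wrap into 1..360; 0 becomes 360."""
    return (round(angle_deg) - 1) % 360 + 1


def predict_doa(wav_path: str) -> float:
    """Hypothetical placeholder for a real SSL model; returns degrees."""
    return 0.0


def main(list_file: str, output_file: str) -> None:
    lines = Path(list_file).read_text().splitlines()
    paths = [line.strip() for line in lines if line.strip()]
    with open(output_file, "w") as out:
        for p in paths:
            out.write(f"{p} {to_output_range(predict_doa(p))}\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: run_ssl.py <wav_list.txt> <results.txt>", file=sys.stderr)
```

Usage (hypothetical script name): `python run_ssl.py ssl_eval_list.txt results.txt`.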


Organizing Committee

● Youjun Xiong, UBTECH Technology Co., Ltd.

● Lei Xie, Northwestern Polytechnical University

● Huan Tan, UBTECH Technology Co., Ltd.

● Dongyan Huang, UBTECH Technology Co., Ltd.

● Jean-Marc Odobez, Idiap Research Institute, Switzerland

● Petr Motlicek, Idiap Research Institute, Switzerland

● Weipeng He, Idiap Research Institute, Switzerland

● Yuexian Zou, Peking University

● Hui Bu, AISHELL Foundation

● Jian Wu, Northwestern Polytechnical University


Important Dates

September 27th, 2020: Registration due
September 30th, 2020: Release of the training and development sets
November 22nd, 2020: Deadline for participants to submit Docker images
December 6th, 2020: Organizers notify participants of the results
December 27th, 2020: Working note report deadline

Awards

KWS Track

  • First prize: An Alpha-mini robot

  • Second prize: An Iron Man MARK50 robot

  • Third prize: A Star Wars First Order Stormtrooper robot

SSL Track

  • First prize: An Alpha-mini robot

  • Second prize: An Iron Man MARK50 robot

  • Third prize: A Star Wars First Order Stormtrooper robot


MISC

Participants may enter either track and are also welcome to take part in both. More details on this challenge will be announced soon. The organizing committee reserves the right of final interpretation of the challenge. If you have any questions regarding this challenge, please email slt2021_asc@163.com.

IEEE SLT 2021 ASC Website
