2020 AI Future Cup (AI未来杯): Preparing the Dataset

1. Download the dataset on the server: create a working directory

(base) [LiMiao@gpu08 /]$ cd data
(base) [LiMiao@gpu08 data]$ ls
heshulin  liu_wang_data  lost+found  yangyang
(base) [LiMiao@gpu08 data]$ mkdir limiao
(base) [LiMiao@gpu08 data]$ ls
heshulin  limiao  liu_wang_data  lost+found  yangyang
(base) [LiMiao@gpu08 data]$ cd limiao
(base) [LiMiao@gpu08 limiao]$ mkdir develop_data
(base) [LiMiao@gpu08 limiao]$ ls
develop_data
(base) [LiMiao@gpu08 limiao]$ cd develop_data
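
The separate mkdir and cd steps above can be collapsed into one call. This is a minimal sketch, not the commands used in the session: `make_workdir` is a hypothetical helper name, and it assumes the same `limiao/develop_data` layout under whatever root directory is passed in.

```shell
#!/usr/bin/env bash
# make_workdir ROOT: create ROOT/limiao/develop_data, including any missing
# parent directories, and print the resulting path.
# The helper name is hypothetical; the directory names follow the session above.
make_workdir() {
    local dir="$1/limiao/develop_data"
    # -p creates parents as needed and does not fail if the directory exists.
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Example (on the server in this post the root would be /data):
#   cd "$(make_workdir /data)"
```

Because `mkdir -p` succeeds when the directory already exists, the helper is safe to run repeatedly.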

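Note: the first download attempt failed because the special characters in the URL were not escaped. The shell treats each unquoted & as a "run in background" operator, so wget only received the URL up to ?e=1589078612, and the server answered 401 Unauthorized: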

(base) [LiMiao@gpu08 develop_data]$ wget http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34
[1] 26822
[2] 26823
[3] 26824
(base) [LiMiao@gpu08 develop_data]$ --2020-05-09 10:46:36--  http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=1589078612
Resolving aidownload.futurelab.tv (aidownload.futurelab.tv)... 202.97.231.18, 116.117.158.58, 113.229.254.8, ...
Connecting to aidownload.futurelab.tv (aidownload.futurelab.tv)|202.97.231.18|:80... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.

[1]   Exit 6                  wget http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=1589078612
[2]-  Done                    token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=
[3]+  Done                    sign=e44d82eb806a436351e847fadd23b085

2. Escape each special character by putting a \ in front of it, and the download succeeds:

(base) [LiMiao@gpu08 develop_data]$ wget http://aidownload.futurelab.tv/2020af-sr-aishell2.zip\?e\=1589078612\&token\=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU\=\&sign\=e44d82eb806a436351e847fadd23b085\&t\=5eb66d34
--2020-05-09 10:57:21--  http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34
Resolving aidownload.futurelab.tv (aidownload.futurelab.tv)... 123.129.244.194, 101.72.205.202, 61.240.154.98, ...
Connecting to aidownload.futurelab.tv (aidownload.futurelab.tv)|123.129.244.194|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3911445147 (3.6G) [application/zip]
Saving to: ‘2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34’

100%[==========================================>] 3,911,445,147 9.58MB/s   in 6m 30s 

2020-05-09 11:03:52 (9.56 MB/s) - ‘2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34’ saved [3911445147/3911445147]

(base) [LiMiao@gpu08 develop_data]$ ls
19
2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d3
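
Escaping every character by hand is error-prone; wrapping the whole URL in single quotes should have the same effect. The sketch below uses a made-up placeholder URL (example.com and all parameter values are assumptions, not the real signed link) to show that both spellings produce one identical argument.

```shell
#!/usr/bin/env bash
# Placeholder URL with the same shape as the signed download link.
# The host and every parameter value here are made up for illustration.
quoted='http://example.com/data.zip?e=1&token=abc=&sign=def&t=123'
escaped=http://example.com/data.zip\?e\=1\&token\=abc\=\&sign\=def\&t\=123

# Each form reaches the program as a single, complete argument.
printf '%s\n' "$quoted"
printf '%s\n' "$escaped"

if [ "$quoted" = "$escaped" ]; then
    echo "identical"
fi
```

Inside single quotes the shell performs no interpretation at all, so &, ? and = are passed through literally.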

3. Rename the file:

(base) [LiMiao@gpu08 develop_data]$ mv 2020af-sr-aishell2.zip\?e\=1589078612\&token\=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk\:b2CLU2YtJFdsHSVKdC0XmmMe3tU\=\&sign\=e44d82eb806a436351e847fadd23b085\&t\=5eb66d34 2020af-sr-aishell2.zip
(base) [LiMiao@gpu08 develop_data]$ ls
19  2020af-sr-aishell2.zip
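
The rename step could probably be avoided by telling wget the output name up front with its `-O` option. A minimal sketch, assuming a quoted URL; `download_as` is a hypothetical helper name and the URL in the usage comment is a placeholder, not the real signed link.

```shell
#!/usr/bin/env bash
# download_as URL OUTPUT: fetch URL and save it under the name OUTPUT.
# The helper name is hypothetical; -O is wget's option for choosing the
# output file instead of deriving the name from the URL.
download_as() {
    local url=$1 out=$2
    wget -O "$out" "$url"
}

# Example (placeholder URL; keep it in single quotes so & is not interpreted):
#   download_as 'http://example.com/data.zip?e=1&token=abc' 2020af-sr-aishell2.zip
```

With this, the file lands directly as 2020af-sr-aishell2.zip and the query string never becomes part of the file name.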

4. Unzip the archive:

(base) [LiMiao@gpu08 develop_data]$ unzip 2020af-sr-aishell2.zip
Archive:  2020af-sr-aishell2.zip
   creating: AISHELL-2/
[2020af-sr-aishell2.zip] AISHELL-2/README.md password: 
password incorrect--reenter: 
  inflating: AISHELL-2/README.md     
   creating: AISHELL-2/iOS/
  inflating: AISHELL-2/iOS/AISHELL2-Data-Specification[ZH].docx  
   creating: AISHELL-2/iOS/data/
  inflating: AISHELL-2/iOS/data/spk_info.txt  
   creating: AISHELL-2/iOS/data/wav/
  inflating: AISHELL-2/iOS/data/wav/D1215.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1225.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1236.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1164.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1217.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1187.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1056.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1226.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1183.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1192.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1179.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1171.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1051.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1162.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1165.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1220.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1055.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1210.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1197.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D2164.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D2165.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1206.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1211.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1219.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1174.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1190.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1057.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1189.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D2166.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1059.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1196.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1194.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1199.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1049.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1198.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1227.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1168.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1163.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1205.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1053.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1180.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1218.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1238.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1200.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1240.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1188.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1237.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1181.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1061.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1048.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1167.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1172.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1186.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1052.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1222.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1223.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1224.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D2162.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1054.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1221.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1214.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1235.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1228.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1169.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1229.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1184.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D2161.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1202.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1178.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1170.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1185.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1193.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1212.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1232.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1234.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1208.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1231.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1060.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1050.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1173.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1058.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1239.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1191.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1182.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1062.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1233.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1241.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1177.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1175.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1204.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1213.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1230.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1209.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1203.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1201.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1207.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1161.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1195.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1216.tar.gz  
  inflating: AISHELL-2/iOS/data/wav/D1176.tar.gz  
  inflating: AISHELL-2/iOS/data/wav.scp  
  inflating: AISHELL-2/iOS/data/trans.txt  
  inflating: AISHELL-2/iOS/AISHELL2-Data-Specification[EN].docx  
  inflating: AISHELL-2/iOS/ChangeLog  
(base) [LiMiao@gpu08 develop_data]$ ls
19  2020af-sr-aishell2.zip  AISHELL-2
(base) [LiMiao@gpu08 develop_data]$ cd AISHELL-2
(base) [LiMiao@gpu08 AISHELL-2]$ ls
iOS  README.md
(base) [LiMiao@gpu08 AISHELL-2]$ cd iOS
(base) [LiMiao@gpu08 iOS]$ ls
AISHELL2-Data-Specification[EN].docx  ChangeLog
AISHELL2-Data-Specification[ZH].docx  data
(base) [LiMiao@gpu08 iOS]$ cd data
(base) [LiMiao@gpu08 data]$ ls
spk_info.txt  trans.txt  wav  wav.scp
(base) [LiMiao@gpu08 data]$ cd wav
(base) [LiMiao@gpu08 wav]$ ls
D1048.tar.gz  D1163.tar.gz  D1181.tar.gz  D1198.tar.gz  D1215.tar.gz  D1232.tar.gz
D1049.tar.gz  D1164.tar.gz  D1182.tar.gz  D1199.tar.gz  D1216.tar.gz  D1233.tar.gz
D1050.tar.gz  D1165.tar.gz  D1183.tar.gz  D1200.tar.gz  D1217.tar.gz  D1234.tar.gz
D1051.tar.gz  D1167.tar.gz  D1184.tar.gz  D1201.tar.gz  D1218.tar.gz  D1235.tar.gz
D1052.tar.gz  D1168.tar.gz  D1185.tar.gz  D1202.tar.gz  D1219.tar.gz  D1236.tar.gz
D1053.tar.gz  D1169.tar.gz  D1186.tar.gz  D1203.tar.gz  D1220.tar.gz  D1237.tar.gz
D1054.tar.gz  D1170.tar.gz  D1187.tar.gz  D1204.tar.gz  D1221.tar.gz  D1238.tar.gz
D1055.tar.gz  D1171.tar.gz  D1188.tar.gz  D1205.tar.gz  D1222.tar.gz  D1239.tar.gz
D1056.tar.gz  D1172.tar.gz  D1189.tar.gz  D1206.tar.gz  D1223.tar.gz  D1240.tar.gz
D1057.tar.gz  D1173.tar.gz  D1190.tar.gz  D1207.tar.gz  D1224.tar.gz  D1241.tar.gz
D1058.tar.gz  D1174.tar.gz  D1191.tar.gz  D1208.tar.gz  D1225.tar.gz  D2161.tar.gz
D1059.tar.gz  D1175.tar.gz  D1192.tar.gz  D1209.tar.gz  D1226.tar.gz  D2162.tar.gz
D1060.tar.gz  D1176.tar.gz  D1193.tar.gz  D1210.tar.gz  D1227.tar.gz  D2164.tar.gz
D1061.tar.gz  D1177.tar.gz  D1194.tar.gz  D1211.tar.gz  D1228.tar.gz  D2165.tar.gz
D1062.tar.gz  D1178.tar.gz  D1195.tar.gz  D1212.tar.gz  D1229.tar.gz  D2166.tar.gz
D1161.tar.gz  D1179.tar.gz  D1196.tar.gz  D1213.tar.gz  D1230.tar.gz
D1162.tar.gz  D1180.tar.gz  D1197.tar.gz  D1214.tar.gz  D1231.tar.gz
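
Before extracting, it may be worth confirming that every per-speaker archive arrived intact. The following is a sketch under assumptions: `check_archives` is a hypothetical helper, and it only relies on `gzip -t`, which tests the compressed stream without unpacking it.

```shell
#!/usr/bin/env bash
# check_archives DIR: test every *.tar.gz in DIR with gzip -t and print how
# many passed. Returns non-zero at the first corrupt archive.
# The helper name is hypothetical.
check_archives() {
    local dir=$1 archive count=0
    for archive in "$dir"/*.tar.gz; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$archive" ] || continue
        gzip -t "$archive" || { echo "corrupt: $archive" >&2; return 1; }
        count=$((count + 1))
    done
    echo "$count"
}

# Example: check_archives /data/limiao/develop_data/AISHELL-2/iOS/data/wav
```

For the listing above, the printed count should match the number of D*.tar.gz files shown by ls.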

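The per-speaker archives in wav still need to be extracted. To do this, create a script file that extracts them with a for loop; the detailed extraction process is described below.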
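
A minimal sketch of such a loop follows. It is not the author's actual script: `extract_all` is a hypothetical helper name, and it assumes each archive should be unpacked in place inside the wav directory.

```shell
#!/usr/bin/env bash
# extract_all DIR: unpack every *.tar.gz found in DIR into DIR itself.
# The helper name is hypothetical; stops at the first archive that fails.
extract_all() {
    local dir=$1 archive
    for archive in "$dir"/*.tar.gz; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$archive" ] || continue
        # -x extract, -z gunzip, -f archive file, -C target directory.
        tar -xzf "$archive" -C "$dir" || return 1
    done
}

# Example: extract_all /data/limiao/develop_data/AISHELL-2/iOS/data/wav
```

The archives are left in place after extraction, so they can be removed separately once the unpacked data has been checked.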

5. After extraction completes, the result looks like this:
[image]
