First, fix the second problem: Could not generate .cloupe file
martian.StageException: Could not generate .cloupe file:
This error means the CSV file is malformed. Inspect its raw bytes with od -c file.csv and look for:
- A missing newline character (\n) at the end of the file
- Non-word characters at the beginning of the file (e.g. the octal bytes 357 273 277)
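The two checks above can be scripted rather than read off a full dump. The following is a minimal sketch, not part of the original workflow: `check_csv` is a hypothetical helper name, and the carriage-return check is an extra that anticipates the `\r` bytes visible in the dump further down. It assumes bash and the standard `head`, `tail`, `od`, and `grep` tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report common byte-level problems in a CSV file.
check_csv() {
    local file="$1"
    local head_hex tail_hex

    # First three bytes as hex; a UTF-8 BOM is ef bb bf (octal 357 273 277).
    head_hex=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$head_hex" = "efbbbf" ]; then
        echo "BOM found"
    fi

    # Last byte as hex; a file that ends with a newline ends in 0a.
    tail_hex=$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$tail_hex" != "0a" ]; then
        echo "missing final newline"
    fi

    # Extra check (assumption): Windows line endings contain a carriage return.
    if grep -q $'\r' "$file"; then
        echo "CR found"
    fi
}
```

Calling `check_csv 14.csv` prints one line per problem found and prints nothing for a clean file.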
$ od -c 14.csv
0000000 357 273 277 s a m p l e _ i d , m o l
0000020 e c u l e _ h 5 \r \n 1 4 , / m n
0000040 t / s r a / 1 4 _ c o u n t _ o
0000060 u t / o u t s / m o l e c u l e
0000100 _ i n f o . h 5 \r \n 1 5 , / m n
0000120 t / s r a / 1 5 _ c o u n t _ o
0000140 u t / o u t s / m o l e c u l e
0000160 _ i n f o . h 5 \r \n 1 6 , / m n
0000200 t / s r a / 1 6 _ c o u n t _ o
0000220 u t / o u t s / m o l e c u l e
0000240 _ i n f o . h 5 \r \n 1 7 , / m n
0000260 t / s r a / 1 7 _ c o u n t _ o
0000300 u t / o u t s / m o l e c u l e
0000320 _ i n f o . h 5 \r \n \n \n
0000334
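# The first line starts with a BOM: 357 273 277
# Every line ends with a carriage return (\r)
The file begins with three extra bytes, 357 273 277 (octal for EF BB BF, the UTF-8 byte order mark).
The cause is that the CSV file was created on Windows, which added a BOM at the start. Deleting it fixes the problem.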
sed -i 's/\xEF\xBB\xBF//' 14.csv  # remove the BOM
od -c 14.csv
0000000 s a m p l e _ i d , m o l e c u
0000020 l e _ h 5 \n 1 4 , / m n t / s r
0000040 a / 1 4 _ c o u n t _ o u t / o
0000060 u t s / m o l e c u l e _ i n f
0000100 o . h 5 \n 1 5 , / m n t / s r a
0000120 / 1 5 _ c o u n t _ o u t / o u
0000140 t s / m o l e c u l e _ i n f o
0000160 . h 5 \n 1 6 , / m n t / s r a /
0000200 1 6 _ c o u n t _ o u t / o u t
0000220 s / m o l e c u l e _ i n f o .
0000240 h 5 \n 1 7 , / m n t / s r a / 1
0000260 7 _ c o u n t _ o u t / o u t s
0000300 / m o l e c u l e _ i n f o . h
0000320 5 \n \n \n \n
0000326
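Note that the dump above no longer contains \r, although the sed command shown removes only the BOM, so the carriage returns were presumably stripped in a separate step that is not recorded here. A minimal sketch that does both in one pass follows. `clean_csv` is a hypothetical helper name, and the sketch assumes GNU sed (for `-i` without a suffix and for the `\xHH` and `\r` escapes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip a leading UTF-8 BOM and trailing carriage
# returns from a CSV file, editing it in place. Assumes GNU sed.
clean_csv() {
    local file="$1"
    # LC_ALL=C makes sed match raw bytes regardless of the current locale.
    # 1s/^...//  removes the BOM only at the very start of line 1.
    # s/\r$//    removes a carriage return at the end of every line.
    LC_ALL=C sed -i -e '1s/^\xEF\xBB\xBF//' -e 's/\r$//' "$file"
}
```

After running `clean_csv 14.csv`, `od -c 14.csv` should show neither the 357 273 277 prefix nor any \r.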
Check the file
cat 14.csv
sample_id,molecule_h5
14,/mnt/sra/14_count_out/outs/molecule_info.h5
15,/mnt/sra/15_coun