介绍:
我得到了一个CSV文件,其中字段分隔符是管道分类(即|).
此文件具有预定义数量的字段(例如N).我可以通过读取CSV文件的标题来发现N的值,我们可以认为这是正确的.
问题:
一些字段错误地包含换行符,这使得该行看起来比所需的短(即,它具有M个字段,其中M
if [ $# -ne 1 ]
then
echo "Usage: $0 "
exit
fi
# get first line
first_line=$(head -n 1 $1)
# get number of fields
num_separators=$(echo "$first_line" | tr -d -c '|' | awk '{print length}')
cat $1 | awk -v numFields=$(( num_separators + 1 )) -F '|' '
{
totRecords = NF/numFields
# loop over lines
for (record=0; record < totRecords; record++) {
output = ""
# loop over fields
for (i=0; i
j = (numFields*record)+i+1
# replace newline with question mark
sub("\n", "?", $j)
output = output (i > 0 ? "|" : "") $j
}
print output
}
}
'
但是,换行符仍然存在.
我该如何解决这个问题?
CSV示例:
FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a
newline
Foo|Bar|Baz
预期产量:
FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a * newline
Foo|Bar|Baz
* I don't care about the replacement, it could be a space, a question mark, whatever except a newline or a pipe (which would create a new field)