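I wrote a test tool in Tcl/Tk that needs to read CSV files from a Tcl script. In a complex CSV file, however, a cell may contain commas, double quotes, newlines, newlines inside double quotes, and so on, which makes parsing difficult. Most of the examples I found online read the file one character at a time; after using them for a while, I found them somewhat slow. After some study I wrote my own Tcl code for reading CSV files, shown below: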
proc readCSV { channel { header 1 } { symbol , }} {
    set quote 0
    set final {}
    set cell {}
    set row_temp ""
    set cell_temp ""
    # "read -nonewline" is the documented form; it drops the trailing newline.
    set data [ split [ read -nonewline $channel ] "\n" ]
    foreach line $data {
        # An odd running count of double quotes means a quoted cell is still open,
        # so the record continues on the next physical line.
        incr quote [ regexp -all {"} $line ]
        if { $quote % 2 == 0 } {
            set quote 0
            append row_temp $line
            # Split on the configured separator, then re-join the pieces
            # that belong to a single quoted cell.
            foreach section [ split $row_temp $symbol ] {
                incr quote [ regexp -all {"} $section ]
                if { $quote % 2 == 0 } {
                    append cell_temp $section
                    # Strip the enclosing quotes, then turn each escaped "" into one ".
                    set cell_temp [ regsub {"(.*)"} $cell_temp {\1} ]
                    lappend cell [ string map [ list \"\" \" ] $cell_temp ]
                    set cell_temp ""
                    set quote 0
                } else {
                    append cell_temp $section$symbol
                }
            }
            lappend final $cell
            set cell {}
            set row_temp ""
        } else {
            append row_temp $line\n
        }
    }
    # Build an array from the parsed rows, or simply "return $final" here
    # if a list of rows is enough.
    set row [ llength $final ]
    set column [ llength [ lindex $final 0 ]]
    for { set i 0 } { $i < $row } { incr i } {
        for { set j 0 } { $j < $column } { incr j } {
            if { $header == 1 } {
                # Key: "<header text>,<row index>"; row 0 is the header row itself.
                set key [ lindex $final 0 $j ],$i
            } else {
                # Key: "<row index>,<column index>".
                set key $i,$j
            }
            set csvData($key) [ lindex $final $i $j ]
        }
    }
    return [ array get csvData ]
}
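The function returns the array contents as a key/value list (from array get), ready to be loaded with array set. By default the first row of the CSV file is treated as the header and the separator is ","; both can be changed.
It handles cells that contain commas (,), double quotes ("), and newlines (\n).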
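The core idea, counting double quotes to decide whether a record continues on the next line, can be shown in isolation. The following is a minimal sketch, not the author's exact code: the helper name joinRecords and the sample text are hypothetical, and it assumes quotes inside cells are escaped by doubling, so every complete record holds an even number of them.

```tcl
# Hypothetical helper (not from the original post): group physical lines
# into logical CSV records by tracking double-quote parity.
proc joinRecords { text } {
    set records {}
    set pending ""
    set quotes 0
    foreach line [split $text "\n"] {
        incr quotes [regexp -all {"} $line]
        if { $quotes % 2 == 0 } {
            # Quotes are balanced: the record ends on this line.
            lappend records $pending$line
            set pending ""
            set quotes 0
        } else {
            # Odd count: a quoted cell is still open, so keep the newline.
            append pending $line "\n"
        }
    }
    return $records
}

# Three logical records spread over four physical lines.
set text "id,note\n1,\"first line\nsecond line\"\n2,plain"
set records [joinRecords $text]
puts [llength $records]   ;# prints 3
```

A file that ends inside an unterminated quoted cell leaves text in pending that this sketch silently drops; a stricter reader might report that as an error.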
Example:
Print a cell addressed by header and row number:
set csv [ open c:/testcase.csv r ]
array set csvData [ readCSV $csv ]
close $csv
puts $csvData(Name,1) ;# assumes the first row contains a cell "Name"
Print a cell addressed by row number and column number:
set csv [ open c:/testcase.csv r ]
array set csvData [ readCSV $csv 0 ]
close $csv
puts $csvData(3,1)
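Because the keys are ordinary strings such as "Name,1", a whole column can be walked by incrementing the row part of the key. The sketch below uses a small hand-written key/value list as a stand-in for the return value of readCSV (the data is hypothetical), so it runs without a CSV file.

```tcl
# Hypothetical data standing in for what readCSV returns with a header row.
set flat [list Name,0 Name Name,1 Zhang_san Name,2 Li_si \
               Age,0 Age Age,1 13 Age,2 14]
array set csvData $flat

# Keys are plain strings of the form "header,row"; the comma is part of the key.
puts $csvData(Name,1)   ;# prints Zhang_san

# Collect one column in row order, skipping row 0 (the header itself).
set names {}
for { set i 1 } { [info exists csvData(Name,$i)] } { incr i } {
    lappend names $csvData(Name,$i)
}
puts $names             ;# prints Zhang_san Li_si
```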
Efficiency:
In testing, processing a 2000 x 4 test-case file takes about 100 ms.
-----------------------------------
CPU: Dual-Core 3.20GHz
Memory: 2G
System Type: 32bit
-----------------------------------
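The post does not say how the timings were taken. One common way to get comparable numbers is Tcl's built-in time command, sketched below; the workload proc is a hypothetical stand-in for a call such as readCSV on an open channel.

```tcl
# Hypothetical workload standing in for a call such as "readCSV $channel".
proc workload { } {
    set cells {}
    for { set i 0 } { $i < 8000 } { incr i } {
        # Un-escape doubled quotes, as a CSV reader would for each cell.
        lappend cells [string map [list \"\" \"] "cell \"\"$i\"\""]
    }
    return [llength $cells]
}

# "time" returns a string of the form "<n> microseconds per iteration".
set result [time { workload } 10]
set micros [lindex $result 0]
puts [format "%.1f ms per run" [expr { $micros / 1000.0 }]]
```

Averaging over several iterations, as the count argument does here, smooths out one-off costs such as bytecode compilation on the first call.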
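Tcl also has a dedicated package for CSV files, called csv (part of Tcllib), so I compared the two. When both return the parsed data as a list, readCSV is somewhat faster, and the gap widens as the file grows.
Using the csv package:
package require csv
package require struct::queue
set csv [ open c:/testcase.csv r ]
::struct::queue q
::csv::read2queue $csv q
close $csv
set final [ q peek [ q size ]]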
Capacity | readCSV | csv package | File size |
---|---|---|---|
2000*4 | 103ms | 170ms | 768KB |
2000*8 | 200ms | 335ms | 1534KB |
2000*16 | 382ms | 770ms | 3065KB |
2000*32 | 760ms | 2088ms | 6127KB |
2000*64 | 1501ms | 6411ms | 12252KB |
2000*128 | 2995ms | 21841ms | 24501KB |
Output:
The output data matches what Excel shows for the same CSV file.
Class version:
package require Itcl
itcl::class readCSV {
    # Per-object state; "common" would share the rows and the cursor across all objects.
    variable final {}
    variable anchor 1
    constructor { path } {
        set quote 0
        set cell {}
        set row_temp ""
        set cell_temp ""
        set channel [ open $path r ]
        set data [ split [ read -nonewline $channel ] "\n" ]
        close $channel
        foreach line $data {
            # An odd running quote count means the record continues on the next line.
            incr quote [ regexp -all {"} $line ]
            if { $quote % 2 == 0 } {
                set quote 0
                append row_temp $line
                foreach section [ split $row_temp , ] {
                    incr quote [ regexp -all {"} $section ]
                    if { $quote % 2 == 0 } {
                        append cell_temp $section
                        # Strip the enclosing quotes, then turn each escaped "" into one ".
                        set cell_temp [ regsub {"(.*)"} $cell_temp {\1} ]
                        lappend cell [ string map [ list \"\" \" ] $cell_temp ]
                        set cell_temp ""
                        set quote 0
                    } else {
                        append cell_temp $section,
                    }
                }
                lappend final $cell
                set cell {}
                set row_temp ""
            } else {
                append row_temp $line\n
            }
        }
    }
    # Return the cell at the given zero-based row and column.
    method getCell { row col } {
        return [ lindex $final $row $col ]
    }
    # Return the cell under the named header in the current row.
    method getValue { header } {
        # -exact avoids glob matching on headers that contain * ? or [.
        set col [ lsearch -exact [ lindex $final 0 ] $header ]
        return [ getCell $anchor $col ]
    }
    # Move the cursor to the next row, stopping at the last one.
    method next { } {
        if { [ done ] == 0 } {
            incr anchor
        }
    }
    # Move the cursor to the previous row, stopping at the first data row.
    method pre { } {
        if { $anchor > 1 } {
            incr anchor -1
        }
    }
    # Jump to the last row.
    method end { } {
        set anchor [ expr {[ llength $final ] - 1 }]
    }
    # Return 1 when the cursor is on the last row, otherwise 0.
    method done { } {
        return [ expr { $anchor == [ llength $final ] - 1 }]
    }
    # Jump back to the first data row.
    method reset { } {
        set anchor 1
    }
}
Example, using a file c:/csvfile.csv with the following contents:
Name | Age | Address |
---|---|---|
Zhang_san | 13 | Address1: 1. aaaaa 2. aaad "bbbb", 3. bacad, adfa"aaa". |
Li_si | 14 | Address2, xxxx aaaa" bbbbb"., |
Wang_wu | 15 | Address3 |
readCSV f c:/csvfile.csv
f getValue Name
output:
Zhang_san
f next
f getValue Name
output:
Li_si
f pre
f getValue Name
f end
f getValue Name
f getCell 1 0
output:
Zhang_san
Wang_wu
Zhang_san