Reading Complex CSV Files with a Tcl Script

Latest recommended article published on 2024-04-15 18:17:58

王桑的一天


Tags: csv, tcl, script, header, struct, test tool

Article link: https://blog.csdn.net/wn0112/article/details/7194441


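I wrote a test tool in Tcl/Tk that needs to read CSV files from a Tcl script. In a complex CSV file, however, a cell may contain commas, double quotes, or line breaks (including line breaks inside a quoted cell), which makes parsing difficult. Most of the examples I found online read the file one character at a time; after using them for a while I found them too slow. After some study I wrote my own Tcl code for reading CSV files: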

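proc readCSV { channel { header 1 } { symbol , }} {
	set quote 0
	set final {}
	# Read the whole file at once and split it into physical lines.
	set data [ split [ read -nonewline $channel ] "\n" ]
	foreach line $data {
		# An odd running count of double quotes means a quoted cell is still open.
		set quote [ expr { $quote + [ regexp -all \" $line ]}]
		if { $quote % 2 == 0 } {
			set quote 0
			append row_temp $line
			set cell {}
			# Split on the configured separator, not on a hard-coded comma.
			foreach section [ split $row_temp $symbol ] {
				set quote [ expr { $quote + [ regexp -all \" $section ]}]
				if { $quote % 2 == 0 } {
					append cell_temp $section
					# Strip the enclosing quotes, then unescape doubled quotes.
					set cell_temp [ regsub {"(.*)"} $cell_temp {\1} ]
					lappend cell [ string map [ list \"\" \" ] $cell_temp ]
					unset cell_temp
					set quote 0
				} else {
					# The separator was inside a quoted cell: put it back.
					append cell_temp $section$symbol
				}
			}
			lappend final $cell
			unset row_temp
		} else {
			# The line break was inside a quoted cell: keep it and read on.
			append row_temp $line\n
		}
	}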
	# generate array if needed, or return $final here
	set row [ llength $final ]
	set column [ llength [ lindex $final 0 ]]
	if { $header == 1 } {
		for { set i 0 } { $i < $row } { incr i } {		
			for { set j 0 } { $j < $column } { incr j } {
				set csvData([ lindex [ lindex $final 0 ] $j ],$i) [ lindex [ lindex $final $i ] $j ]
			}
		}
	} else {
		for { set i 0 } { $i < $row } { incr i } {		
			for { set j 0 } { $j < $column } { incr j } {
				set csvData($i,$j) [ lindex [ lindex $final $i ] $j ]
			}
		}
	}
	return [ array get csvData ]
}

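The function returns the array's contents as a key/value list (the result of array get), which the caller restores with array set. By default the first row of the CSV file is used as the header and the separator is ","; both can be changed.

It can handle cells that contain the separator (","), double quotes, and line breaks ("\n").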
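
The key step in readCSV is counting double quotes: while the running count is odd, a quoted cell is still open, so the next physical line belongs to the same record. Below is a minimal sketch of that step alone. The helper name joinRecords and the sample text are made up for illustration and are not part of the code above.

```tcl
# Hypothetical helper: merge physical lines into logical CSV records
# by checking whether the running count of double quotes is even.
proc joinRecords { text } {
	set records {}
	set pending ""
	set quote 0
	foreach line [ split $text "\n" ] {
		incr quote [ regexp -all {"} $line ]
		if { $quote % 2 == 0 } {
			# All quotes are balanced: the record is complete.
			lappend records $pending$line
			set pending ""
			set quote 0
		} else {
			# Still inside a quoted cell: keep the line break.
			append pending $line "\n"
		}
	}
	return $records
}

# Four physical lines; the second and third form one record.
set text "id,note\n1,\"first line\nsecond line\"\n2,plain"
puts [ llength [ joinRecords $text ] ]    ;# 3
```

Splitting each record into cells then repeats the same parity check on the pieces between separators.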

Example:

The following prints a cell addressed by header name and row number:

set csv [ open c:/testcase.csv r ]
array set csvData [ readCSV $csv ]
close $csv
puts $csvData(Name,1)    ;# assumes the first row has a cell containing "Name"

The following prints a cell addressed by row number and column number:

set csv [ open c:/testcase.csv r ]
array set csvData [ readCSV $csv 0 ]
close $csv
puts $csvData(3,1)
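
Because the function returns the result of array get, the caller receives a flat key/value list and rebuilds the array with array set. The sketch below shows that round trip and one way to walk a column in header mode. The cell values are hand-written sample data, not output from a real file.

```tcl
# Made-up data in the "header,row" key layout used in header mode;
# row 0 holds the header names themselves.
set flat {
	Name,0 Name      Age,0 Age
	Name,1 Zhang_san Age,1 13
	Name,2 Li_si     Age,2 14
}
array set csvData $flat

# Walk the data rows of one column until a key is missing.
set names {}
for { set i 1 } { [ info exists csvData(Name,$i) ] } { incr i } {
	lappend names $csvData(Name,$i)
}
puts $names    ;# Zhang_san Li_si
```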

Efficiency:
In testing, a 2000 x 4 test-case file took about 100 ms to process.

-----------------------------------

CPU: Dual-Core 3.20GHz

Memory: 2G

System Type: 32bit

-----------------------------------
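
Timings of this kind can be taken with Tcl's built-in time command, which runs a script a given number of times and reports the average in microseconds per iteration. The following is only a sketch with a stand-in workload: parseRow is a hypothetical proc, to be replaced by the real readCSV call on an open channel.

```tcl
# Stand-in workload; replace the body with the code to be measured.
proc parseRow { line } {
	return [ split $line , ]
}

# Run the call 1000 times; the result has the form
# "N microseconds per iteration".
set result [ time { parseRow "a,b,c,d" } 1000 ]
set usPerCall [ lindex $result 0 ]
puts [ format "%.3f ms per call" [ expr { $usPerCall / 1000.0 } ] ]
```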

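Tcl has a dedicated package for handling CSV files, called csv, so I compared its performance. When both return the processed data as a list, the function above is somewhat faster.

Using the csv package: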

package require csv
package require struct::queue

set csv [ open c:/testcase.csv r ]

::struct::queue q
::csv::read2queue $csv q
set final [ q peek [ q size ]]
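
For a single record that is already in memory, the same package also provides ::csv::split, which turns one CSV line into a Tcl list. A brief sketch, assuming Tcllib is installed; the sample line is made up.

```tcl
package require csv

# One CSV record with a quoted cell that contains the separator.
set line {Li_si,14,"Address2, xxxx"}
set cells [ ::csv::split $line ]
puts [ llength $cells ]      ;# 3
puts [ lindex $cells 2 ]     ;# Address2, xxxx
```

Note that ::csv::split works on one record at a time, so cells with embedded line breaks must be joined into a single record before the call.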

Capacity (rows*columns)	readCSV	csv package	File size
2000*4	103ms	170ms	768KB
2000*8	200ms	335ms	1534KB
2000*16	382ms	770ms	3065KB
2000*32	760ms	2088ms	6127KB
2000*64	1501ms	6411ms	12252KB
2000*128	2995ms	21841ms	24501KB

Output:

The output matches what Excel shows for the same CSV file.

As a class:

package require Itcl

itcl::class readCSV {
	# Per-object state; "common" would share the rows and cursor across all objects.
	variable final {}
	variable anchor 1
	constructor { path } {
		set quote 0
		set channel [ open $path r ]
		set data [ split [ read -nonewline $channel ] "\n" ]
		close $channel
		foreach line $data {
			# An odd running count of double quotes means a quoted cell is still open.
			set quote [ expr { $quote + [ regexp -all \" $line ]}]
			if { $quote % 2 == 0 } {
				set quote 0
				append row_temp $line
				set cell {}
				foreach section [ split $row_temp , ] {
					set quote [ expr { $quote + [ regexp -all \" $section ]}]
					if { $quote % 2 == 0 } {
						append cell_temp $section
						# Strip the enclosing quotes, then unescape doubled quotes.
						set cell_temp [ regsub {"(.*)"} $cell_temp {\1} ]
						lappend cell [ string map [ list \"\" \" ] $cell_temp ]
						unset cell_temp
						set quote 0
					} else {
						# The comma was inside a quoted cell: put it back.
						append cell_temp $section,
					}
				}
				lappend final $cell
				unset row_temp
			} else {
				# The line break was inside a quoted cell: keep it and read on.
				append row_temp $line\n
			}
		}
	}
	
	method getCell { row col } {
		return [ lindex [ lindex $final $row ] $col ]
	}
	
	method getValue { header } {
		set col [ lsearch [ lindex $final 0 ] $header ]
		return [ getCell $anchor $col ]
	}
	
	method next { } {
		if { [ done ] == 0 } {
			incr anchor
		}
	}
	
	method pre { } {
		if { $anchor > 1 } {
			incr anchor -1
		}
	}
	
	method end { } {
		set anchor [ expr {[ llength $final ]-1}]
	}
	
	method done { } {
		if { $anchor == [ expr {[ llength $final ]-1} ]} {
			return 1
		} else {
			return 0
		}
	}
	
	method reset { } {
		set anchor 1
	}
	
}

Sample data in c:/csvfile.csv, as displayed in Excel:

Name	Age	Address
Zhang_san	13	Address1: 1. aaaaa 2. aaad "bbbb", 3. bacad, adfa"aaa".
Li_si	14	Address2, xxxx aaaa" bbbbb".,
Wang_wu	15	Address3

Example:

readCSV f c:/csvfile.csv
f getValue Name

output:

Zhang_san

f next
f getValue Name

output:

Li_si

f pre
f getValue Name
f end
f getValue Name
f getCell 1 0

output:

Zhang_san

Wang_wu

Zhang_san
