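I have 265 CSV files containing more than 4 million records (rows) in total, and I need to run a search and replace across all of them. Below is a snippet of my PowerShell code, which takes 17 minutes to do this:

ForEach ($file in Get-ChildItem C:\temp\csv\*.csv)
{
    $content = Get-Content -path $file
    $content | foreach {$_ -replace $SearchStr, $ReplaceStr} | Set-Content $file
}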

Now I have the following Python code, which does the same thing but takes less than 1 minute to run:

import os, fnmatch

def findReplace(directory, find, replace, filePattern):
    # Walk the directory tree and visit every file matching filePattern
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            # Read the whole file into a single string
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            # Overwrite the file with the replaced text
            with open(filepath, "w") as f:
                f.write(s)

findReplace("c:/temp/csv", "Search String", "Replace String", "*.csv")

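Why is the Python approach so much more efficient? Is my PowerShell code simply inefficient, or is Python just a more powerful programming language when it comes to text manipulation?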
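
One thing worth noting when comparing the two snippets: they are not strictly equivalent. The PowerShell loop pushes each line through the pipeline and applies `-replace` (a regular-expression match) once per line, while the Python version reads each file as a single string and calls `str.replace` (a literal match) once per file. The sketch below puts both strategies side by side in Python so they can be timed on the same data. The function names `replace_whole_file` and `replace_line_by_line` and the UTF-8 encoding are assumptions made for illustration, not part of the original code.

```python
def replace_whole_file(filepath, find, replace, encoding="utf-8"):
    """Read the entire file as one string and replace in a single call."""
    # newline="" keeps the original line endings untouched
    with open(filepath, encoding=encoding, newline="") as f:
        text = f.read()
    with open(filepath, "w", encoding=encoding, newline="") as f:
        f.write(text.replace(find, replace))


def replace_line_by_line(filepath, find, replace, encoding="utf-8"):
    """Mirror the shape of the PowerShell loop: one replace call per line."""
    with open(filepath, encoding=encoding, newline="") as f:
        lines = f.readlines()
    with open(filepath, "w", encoding=encoding, newline="") as f:
        for line in lines:
            f.write(line.replace(find, replace))
```

Both functions should leave a file with identical contents; wrapping each call in `time.perf_counter()` on a copy of one of the CSV files would show how much of the gap comes from per-line processing rather than from the language itself.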