CSV single quotes and numbers: handling a CSV file with single-quoted strings that may even contain commas or single quotes...

I have a CSV file in which the text columns are wrapped in single quotes and the non-text columns are unquoted. The quoted text columns might themselves contain a comma or a single quote. I found a script online, but it doesn't handle this situation.

Is there a way to handle this in PowerShell?

Example:

123,678.89,'hello there1', 'xyz1@gmail.com', 'abc,nds'\n

123,678.89,'hello 'there2', 'xyz2@gmail.com', 'akiu-'nds'\n

Output:

123,678.89|hello there1|xyz1@gmail.com|abc,nds \n

123,678.89|hello 'there2|xyz2@gmail.com|akiu-'nds \n

Example 2:

123,6272,678.89,,,'hello ,there1',,,,'abc1','tw,es',,'xyz1@gmail.com',,,,,,'abc,nds1'\n

124,8272,928.89,,,,'hello 'there2',,,'abc2','twes',,,'xyz2@gmail.com',,'biej',,,'abc'nds2'\n

125,9272,328.89,,'hello 'there3',,'abc3',', outyi',,,,'xyz3@gmail.com',,,,,,'ahct','abc'nds3'\n

Output:

123|6272|678.89|||hello ,there1||||abc1|tw,es||xyz1@gmail.com||||||abc,nds1\n

124|8272|928.89||||hello 'there2|||abc2|twes|||xyz2@gmail.com||biej|||abc'nds2\n

125|9272|328.89||hello 'there3||abc3|, outyi||||xyz3@gmail.com||||||ahct|abc'nds3\n

Solution

This is similar to Kiran's answer. A couple of things need to change, so I don't think there is a one-size-fits-all solution; the changes have to be chained. The first handles the commas that are actual delimiters, and the second handles the special end-of-line character sequence.

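$path = "c:\temp\file.csv"

$newDelimiter = "|"

# First -replace: swap quote-adjacent commas for the new delimiter. Second -replace: drop the quote before the trailing literal \n.
(Get-Content $path) -replace "'\s*?,\s?'|,\s?'|'\s?,",$newDelimiter -replace "'\s*?\\n$","\n" | Set-Content $path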

I have a regex101 link that explains this in more detail. The first regex does most of the work, with three alternative matches. It effectively ignores quotes that stand by themselves. If the data contains a quote and comma combination inside a field, then I think it would be difficult to program this without more information.

'\s*?,\s?': A comma between two quotes, with optional whitespace on either side of the comma.

,\s?': A comma, then an optional space, followed by a quote.

'\s?,: A quote, then an optional space, followed by a comma.

A match of any of the above alternatives is replaced with $newDelimiter. The second regex just looks for '\n$, allowing for optional whitespace between the quote and the \n at the end of the line. This is how the last single quote is removed.
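To check the two chained replacements without touching a file, here is a minimal sketch that runs them on the two lines from the first example held in memory. The variable names $lines, $delimiterPattern, $lineEndPattern and $converted are made up for illustration, and it assumes, as in the question, that each line ends with the literal characters \n.

```powershell
# The two sample lines from the first example, kept in memory.
# In a double-quoted PowerShell string the backslash is not an escape
# character, so \n here is a literal backslash followed by n.
$lines = @(
    "123,678.89,'hello there1', 'xyz1@gmail.com', 'abc,nds'\n"
    "123,678.89,'hello 'there2', 'xyz2@gmail.com', 'akiu-'nds'\n"
)

$newDelimiter = "|"

# Same two patterns as in the answer, named only for readability.
$delimiterPattern = "'\s*?,\s?'|,\s?'|'\s?,"
$lineEndPattern = "'\s*?\\n$"

# -replace is applied to every element of the array; the second -replace
# runs on the result of the first.
$converted = $lines -replace $delimiterPattern, $newDelimiter -replace $lineEndPattern, "\n"

$converted
```

For these two inputs the result should be 123,678.89|hello there1|xyz1@gmail.com|abc,nds\n and 123,678.89|hello 'there2|xyz2@gmail.com|akiu-'nds\n. The comma between 123 and 678.89 survives because each of the three alternatives requires an adjacent quote; for the same reason, runs of commas between unquoted or empty fields, as in Example 2, are left untouched by this pattern and would need an additional step.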
