Roll Your Own? A Simple CSV Parser Example

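This article weighs writing your own parser against using a third-party library when processing CSV data. Using a simple 80-line C# CSV parser as an example, the author shows how to implement basic CSV parsing and compares the trade-offs of third-party libraries. The article also covers extensibility and the impact of future changes to the file specification, and provides an F# implementation. It concludes that for simple requirements, writing your own parser may be the better choice.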

Introduction

I was having a conversation with a coworker about using a third-party library vs. rolling your own, specifically for parsing comma-separated value (CSV) data or, more generally, any text data separated by a known delimiter.

This subject has been pretty much beaten to death, and there are definitely many libraries in every language under the sun to parse CSV files. My coworker brought this particular article to my attention: CSV Parsing in .NET Core, which is a useful example of some edge cases (which I ignore here), though certainly not all of them. The article references a couple of projects, CsvHelper and TinyCsvParser, and one can easily find several CSV parsers on Code Project:

That last link is quite comprehensive in its coverage of the parsers that Tomas Takac evaluated, and it has a great diagram of a finite state machine for parsing CSV data. Even more interesting (to me at least), he references RFC 4180, which documents the Common Format and MIME Type for Comma-Separated Values (CSV) Files. Pretty cool.

That said, having looked at a few myself and having some very basic requirements (really no edge cases), the question again came up as to whether to roll my own or use a library. Some of the discussion points were:

  • What if there's a bug in the roll-your-own code?
  • What if it needs to be extended?
  • What if the file specification changes?

My answers here are biased of course, but they still might be of some value:

What if there's a bug in the roll-your-own code?

Well, given that the actual parser is 80 lines long (not including extension methods and helper functions), how likely is it to have bugs? And there are unit tests, which actually amount to 3x more lines of code! My first reaction to rolling my own is always a question of complexity: how complex is the problem to be solved, and how complex are the third-party libraries out there that solve that problem? The fewer lines of code, the fewer bugs and the fewer unit tests one has to write. It cannot be guaranteed that a third-party library is bug free or has decent unit tests. Granted, this is all rather moot for non-edge cases anyway. Still, looking at the code for some of these libraries, in which the implementation runs to thousands of lines of code, I really can't justify adding that to my code base. The npm install, nuget install, and so forth are so easy that they preclude any thinking about what you're about to do. This is a really dangerous situation, especially when packages have hidden dependencies and you're suddenly installing tens or hundreds of additional, breakable dependencies.
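
To give a sense of how little code the no-edge-case version of the problem needs, here is a minimal sketch. It is written in Python for brevity, it is not the 80-line C# parser discussed above, and the name `parse_delimited` is hypothetical. It assumes the first non-blank line is a header and that fields never contain quotes, embedded delimiters, or embedded newlines, which are exactly the edge cases RFC 4180 covers and this sketch ignores.

```python
from typing import Dict, List


def parse_delimited(text: str, delimiter: str = ",") -> List[Dict[str, str]]:
    """Parse delimiter-separated text into a list of header-keyed records.

    Assumption: no quoted fields, no embedded delimiters, no embedded newlines.
    """
    # Drop blank lines so a trailing newline does not produce an empty record.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []

    # The first non-blank line supplies the field names.
    header = [name.strip() for name in lines[0].split(delimiter)]

    records = []
    # Line numbers are counted among non-blank lines, with the header as line 1.
    for line_number, line in enumerate(lines[1:], start=2):
        fields = [field.strip() for field in line.split(delimiter)]
        # Fail loudly on a malformed record instead of silently misaligning it.
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} fields, "
                f"got {len(fields)}"
            )
        records.append(dict(zip(header, fields)))
    return records


rows = parse_delimited("name,qty\nwidget,3\ngadget,5\n")
```

Because the delimiter is a parameter, the same function handles pipe- or tab-separated text. The field-count check is the one piece of validation worth keeping even in a sketch like this, since a misaligned record is the most likely failure when the input does not match the assumptions.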
