CSV File Example
Introduction
I was having a conversation with a coworker about using a third-party library vs. rolling your own, specifically regarding parsing comma-separated value (CSV) data or, more generally, any text data separated by a known delimiter.
This subject has been pretty much beaten to death, and there are definitely many libraries in every language under the sun to parse CSV files. My coworker brought this particular article to my attention: CSV Parsing in .NET Core, which is a useful example (which I ignore here) of some edge cases -- certainly not all of them. The article references a couple of projects, CsvHelper and TinyCsvParser, and one can easily find several CSV parsers on Code Project:
CSV File Parser (by #realJSOP!)
That last link is quite comprehensive in the parsers that Tomas Takac evaluated and has a great graph of the finite state machine used to parse CSV data. Even more interesting (to me at least), he references RFC 4180, which documents the Common Format and MIME Type for Comma-Separated Values (CSV) Files. Pretty cool.
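To make the kind of edge case that RFC 4180 covers a bit more concrete, here is a minimal sketch of a quoted field that contains the delimiter. The class name QuotedFieldDemo and the sample line are made up for illustration; the point is only that a plain split on the delimiter, which is what a very simple parser does, cuts the quoted field in two.

```csharp
using System;

public static class QuotedFieldDemo
{
    // Naive approach: split on every comma and ignore quoting entirely.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }

    public static void Main()
    {
        // Per RFC 4180, a field that contains a comma is enclosed in double quotes.
        string line = "Toyota,Corolla,\"Reliable, cheap to run\"";

        string[] fields = NaiveSplit(line);

        // The quoted comment is split at its inner comma: 4 fields instead of 3.
        Console.WriteLine(fields.Length); // prints 4
    }
}
```

If the data is produced by another program and never contains quoted delimiters, this limitation may never matter, which is the trade-off discussed in the rest of this article.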
That said, having looked at a few myself and having some very basic requirements (really no edge cases), the question again came up as to whether to roll my own or use a library. Some of the discussion points were:
- What if there's a bug in the roll-your-own code?
- What if it needs to be extended?
- What if the file specification changes?
My answers here are biased of course, but they still might be of some value:
What if there's a bug in the roll-your-own code?
Well, given that the actual parser is 80 lines long (not including extension methods and helper functions), how many bugs can it really have? And there are unit tests, which actually are 3x more lines of code! My first reaction to rolling my own is always a question of complexity: how complex is the problem to be solved, and how complex are the third-party libraries out there that solve that problem? The fewer lines of code, the fewer bugs and the fewer unit tests one has to write. It cannot be guaranteed that a third-party library is bug free or has implemented decent unit tests. Granted, this is all rather moot for non-edge cases anyway. Still, looking at the code for some of these libraries, in which the implementation runs to thousands of lines of code, I really, really can't justify adding that to my code base. Commands like npm install, nuget install, and so forth are so easy that they preclude any thinking about what you're about to do. This is really a dangerous situation, especially when packages have hidden dependencies and you're suddenly installing tens or hundreds of additional, breakable dependencies.
What if it needs to be extended?
Extensibility is easily implemented with virtual methods, assuming the implementation breaks the process apart into small steps. There are of course other mechanisms. I find this question, however, to be a specious argument, as it poses a future "what-if" scenario that is unrealistic given the scope of the requirements. Certainly, if the requirements say that all sorts of edge cases have to be handled, then yes, let's go for a third-party library. But I disagree with the argument that edge cases will suddenly crop up in data files that are produced by other programs, and in such a way as to break the parser.
What if the file specification changes?
This is definitely plausible, but it is not actually an issue for the parser. It is an issue for the class into which the data is parsed and for the mapping of properties to fields. So this is not a valid argument in my opinion.
Implementation Time!
So having just said all that, I put together a proof of concept. It's actually a good example to see how far programming languages have evolved, as this code is basically just a set of map, reduce, and filter operations. It took me about 30 minutes to write this code (compared to my writing the F# example, which took over 5 hours, as F# is not something I'm particularly well-skilled at!).
The Dataset
I decided to use a simplified version of the data in that article on CSV Parsing in .NET Core referenced earlier:
string data =
@"Make,Model,Type,Year,Cost,Comment
Toyota,Corolla,Car,1990,2000.99,A Comment";
You'll note here that I'm assuming there's a header.
The Class It Goes Into
At a minimum, I wanted the ability to rename properties that are mapped to fields in the CSV:
public enum AutomobileType
{
    None,
    Car,
    Truck,
    Motorbike
}

public class Automobile : IPopulatedFromCsv
{
    public string Make { get; set; }
    public string Model { get; set; }
    public AutomobileType Type { get; set; }
    public int Year { get; set; }

    [ColumnMap(Header = "Cost")]
    public decimal Price { get; set; }

    public string Comment { get; set; }
}
Note the ColumnMap attribute:
public class ColumnMapAttribute : Attribute
{
    public string Header { get; set; }
}
And for giggles, I require that any class that is going to be populated by the CSV parser implements IPopulatedFromCsv as a useful hint to the user.
The Code
Here's the core parser:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using Clifton.Core.Utils;

namespace SimpleCsvParser
{
    public class CsvParser
    {
        public virtual List<T> Parse<T>(string data, string delimiter)
            where T : IPopulatedFromCsv, new()
        {
            List<T> items = new List<T>();
            var lines = GetLines(data);
            var headerFields = GetFieldsFromHeader(lines.First(), delimiter);
            var propertyList = MapHeaderToProperties<T>(headerFields);
            lines.Skip(1).ForEach(l => items.Add(Populate<T>(l, delimiter, propertyList)));

            return items;
        }

        // Get all non-blank lines
        protected virtual IEnumerable<string> GetLines(string data)
        {
            var lines = data.Split(new char[] { '\r', '\n' })
                .Where(l => !String.IsNullOrWhiteSpace(l));

            return lines;
        }

        protected virtual T Create<T>() where T : IPopulatedFromCsv, new()
        {
            return new T();
        }

        protected virtual string[] GetFieldsFromHeader(string headerLine, string delimiter)
        {
            // The || allows blank headers if at least one header or delimiter exists.
            var fields = headerLine.Split(delimiter)
                .Select(f => f.Trim())
                .Where(f =>
                    !string.IsNullOrWhiteSpace(f) ||
                    !String.IsNullOrEmpty(headerLine)).ToArray();

            return fields;
        }

        protected virtual List<PropertyInfo> MapHeaderToProperties<T>(string[] headerFields)
        {
            var map = new List<PropertyInfo>();
            Type t = typeof(T);

            // Include null properties so these are skipped when parsing the data lines.
            headerFields
                .Select(f =>
                    (
                        f,
                        t.GetProperty(f,
                            BindingFlags.Instance |
                            BindingFlags.Public |
                            BindingFlags.IgnoreCase |
                            BindingFlags.FlattenHierarchy) ?? AliasedProperty(t, f)
                    )
                )
                .ForEach(fp => map.Add(fp.Item2));

            return map;
        }

        protected virtual PropertyInfo AliasedProperty(Type t, string fieldName)
        {
            var pi = t.GetProperties()
                .Where(p => p.GetCustomAttribute<ColumnMapAttribute>()
                    ?.Header
                    ?.CaseInsensitiveEquals(fieldName)
                    ?? false);

            Assert.That<CsvParserDuplicateAliasException>(pi.Count() <= 1,
                $"{fieldName} is aliased more than once.");

            return pi.FirstOrDefault();
        }

        protected virtual T Populate<T>(
            string line,
            string delimiter,
            List<PropertyInfo> props) where T : IPopulatedFromCsv, new()
        {
            T t = Create<T>();
            var fieldValues = line.Split(delimiter);

            // Unmapped fields will have a null property, and we also skip empty fields,
            // and trim the field value before doing the type conversion.
            props.ForEach(fieldValues,
                (p, v) => p != null && !String.IsNullOrWhiteSpace(v),
                (p, v) => p.SetValue(t, Converter.Convert(v.Trim(), p.PropertyType)));

            return t;
        }
    }
}
There's copious use of LINQ and of the null-conditional (?.) and null-coalescing (??) operators, which the rest of this article calls null continuation and null coalescence. They are used to map fields to properties, to map properties whose names are overridden with the ColumnMap attribute, and to handle cases where no property maps to a specific field.
Also note that all methods, even Create, are virtual so that they can be overridden for custom behaviors.
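As a rough illustration of what overriding one of these steps might look like, here is a self-contained sketch. LineReader is a hypothetical stand-in for CsvParser, reduced to a single virtual GetLines method (made public here only so the demo can call it directly), and skipping lines that start with '#' is an invented requirement, not something the article's parser does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's CsvParser, reduced to one virtual step.
public class LineReader
{
    // Get all non-blank lines.
    public virtual IEnumerable<string> GetLines(string data)
    {
        return data.Split(new[] { '\r', '\n' })
                   .Where(l => !String.IsNullOrWhiteSpace(l));
    }
}

// Custom behavior (an assumed requirement): also skip lines that start with '#'.
public class CommentSkippingLineReader : LineReader
{
    public override IEnumerable<string> GetLines(string data)
    {
        return base.GetLines(data)
                   .Where(l => !l.TrimStart().StartsWith("#", StringComparison.Ordinal));
    }
}

public static class OverrideDemo
{
    public static void Main()
    {
        string data = "Make,Model\n# a comment line\nToyota,Corolla";
        LineReader reader = new CommentSkippingLineReader();

        // The header line and one data line remain; the comment line is dropped.
        Console.WriteLine(reader.GetLines(data).Count()); // prints 2
    }
}
```

Because the derived class only replaces one small step, the rest of the parsing pipeline would stay untouched, which is the benefit of breaking the process into small virtual methods.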
The AliasedProperty Method
This piece of code is not entirely obvious. What it does is:
- Get all the public instance properties of the class we're populating.
- Try to get the custom attribute.
- Using null continuation, get the aliased header.
- Again using null continuation, compare the aliased header with the CSV header.
- Using null coalescence, return either the result of the comparison or return false to the where condition.
- Assert that 0 or 1 properties have that alias.
- Finally, FirstOrDefault is used because we now know we have only 0 or 1 items in the resulting enumeration, and we want to return null as a placeholder if, after all this, there is no property that is aliased to the CSV header field value.
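The steps above can be tried in isolation with a small sketch like the following. The Sample class and the FindAliased method are hypothetical names used only for this illustration; the sketch uses String.Equals with OrdinalIgnoreCase in place of the CaseInsensitiveEquals extension method and leaves out the duplicate-alias assertion.

```csharp
using System;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class ColumnMapAttribute : Attribute
{
    public string Header { get; set; }
}

// Hypothetical class for the demo: only Price carries an alias.
public class Sample
{
    public string Make { get; set; }

    [ColumnMap(Header = "Cost")]
    public decimal Price { get; set; }
}

public static class AliasDemo
{
    public static PropertyInfo FindAliased(Type t, string fieldName)
    {
        return t.GetProperties()
            .Where(p => p.GetCustomAttribute<ColumnMapAttribute>()       // null when there is no attribute
                ?.Header                                                  // null when the attribute or Header is null
                ?.Equals(fieldName, StringComparison.OrdinalIgnoreCase)   // a bool? at this point
                ?? false)                                                 // null collapses to false
            .FirstOrDefault();                                            // null when nothing matched
    }

    public static void Main()
    {
        Console.WriteLine(FindAliased(typeof(Sample), "cost")?.Name ?? "(none)"); // prints Price
        Console.WriteLine(FindAliased(typeof(Sample), "Make")?.Name ?? "(none)"); // prints (none)
    }
}
```

Note that "Make" yields no result here because this lookup considers aliases only; in the parser, the direct GetProperty lookup by name runs first and the alias lookup is the fallback.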
The Populate Method
This piece of code uses an extension method, ForEach, that:
- iterates both collections in lock step, and
- executes the action only when the condition is met.
Thus the code:
props.ForEach(fieldValues,
    (p, v) => p != null && !String.IsNullOrWhiteSpace(v),
    (p, v) => p.SetValue(t, Converter.Convert(v.Trim(), p.PropertyType)));
does the following:
- The two collections are props and fieldValues.
- The property must exist (remember, we use null as a placeholder).
- The value is not null or whitespace.
- When that condition is met, we use reflection to set the property value, using the Converter helper.
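For comparison, the same lock-step-with-filter idea can be sketched with the standard LINQ Zip operator instead of a custom ForEach overload. This is an alternative formulation, not the article's code; LockStepDemo, Pair, and the sample arrays are hypothetical names and data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LockStepDemo
{
    // Pairs each name with the value at the same index and keeps only the
    // pairs where the name is non-null and the value is not blank.
    public static List<string> Pair(IEnumerable<string> names, IEnumerable<string> values)
    {
        return names
            .Zip(values, (n, v) => new { Name = n, Value = v })
            .Where(x => x.Name != null && !String.IsNullOrWhiteSpace(x.Value))
            .Select(x => x.Name + "=" + x.Value.Trim())
            .ToList();
    }

    public static void Main()
    {
        var names = new[] { "A", null, "C" };   // null marks an unmapped column
        var values = new[] { "1", "2", " 3 " };

        Console.WriteLine(String.Join(";", Pair(names, values))); // prints A=1;C=3
    }
}
```

One behavioral difference worth knowing: Zip silently stops at the end of the shorter sequence, whereas the indexed ForEach overload shown later in this article would throw if the second collection has fewer items than the first.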
Using the Parser
class Program
{
    static void Main(string[] args)
    {
        string data =
@"Make,Model,Type,Year,Cost,Comment
Toyota,Corolla,Car,1990,2000.99,A Comment";

        CsvParser parser = new CsvParser();
        var recs = parser.Parse<Automobile>(data, ",");
        var result = JsonConvert.SerializeObject(recs);
        Console.WriteLine(result);
    }
}
And the output:
[{"Make":"Toyota","Model":"Corolla","Type":1,"Year":1990,"Price":2000.99,"Comment":"A Comment"}]
The JsonConvert call is my anti-example of using a huge third-party library simply to write out an array of instance values! Because I was lazy! See? I validate my own argument on the dangers of reaching for a third-party library when something much simpler will do!
Supporting Code
So there's a bunch of supporting code. We have extension methods:
public static class ExtensionMethods
{
    public static void ForEach<T>(this IEnumerable<T> collection, Action<T> action)
    {
        foreach (var item in collection)
        {
            action(item);
        }
    }

    // Double iteration -- both collections are iterated over
    // and are assumed to be of equal length
    public static void ForEach<T, U>(
        this IEnumerable<T> collection1,
        IList<U> collection2,
        Func<T, U, bool> where,
        Action<T, U> action)
    {
        int n = 0;

        foreach (var item in collection1)
        {
            U v2 = collection2[n++];

            if (where(item, v2))
            {
                action(item, v2);
            }
        }
    }

    public static string[] Split(this string str, string splitter)
    {
        return str.Split(new[] { splitter }, StringSplitOptions.None);
    }

    public static bool CaseInsensitiveEquals(this string a, string b)
    {
        return String.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
And we have a simple Assert static class:
public static class Assert
{
    public static void That<T>(bool condition, string msg) where T : Exception, new()
    {
        if (!condition)
        {
            var ex = Activator.CreateInstance(typeof(T), new object[] { msg }) as T;
            throw ex;
        }
    }
}
And we have this Converter class I wrote ages ago:
using System;
using System.ComponentModel; // TypeConverter, TypeDescriptor

public class Converter
{
    public static object Convert(object src, Type destType)
    {
        object ret = src;

        if ((src != null) && (src != DBNull.Value))
        {
            Type srcType = src.GetType();

            if ((srcType.FullName == "System.Object") ||
                (destType.FullName == "System.Object"))
            {
                ret = src;
            }
            else
            {
                if (srcType != destType)
                {
                    TypeConverter tcSrc = TypeDescriptor.GetConverter(srcType);
                    TypeConverter tcDest = TypeDescriptor.GetConverter(destType);

                    if (tcSrc.CanConvertTo(destType))
                    {
                        ret = tcSrc.ConvertTo(src, destType);
                    }
                    else if (tcDest.CanConvertFrom(srcType))
                    {
                        if (srcType.FullName == "System.String")
                        {
                            ret = tcDest.ConvertFromInvariantString((string)src);
                        }
                        else
                        {
                            ret = tcDest.ConvertFrom(src);
                        }
                    }
                    else
                    {
                        // If the target type is a base class of the source type,
                        // then we don't need to do any conversion.
                        if (destType.IsAssignableFrom(srcType))
                        {
                            ret = src;
                        }
                        else
                        {
                            // If no conversion exists, throw an exception.
                            throw new ConverterException("Can't convert from " +
                                src.GetType().FullName +
                                " to " +
                                destType.FullName);
                        }
                    }
                }
            }
        }
        else if (src == DBNull.Value)
        {
            if (destType.FullName == "System.String")
            {
                // convert DBNull.Value to null for strings.
                ret = null;
            }
        }

        return ret;
    }
}
Really, the last commit was Nov 26, 2015! It's so old that string interpolation with $"{}" didn't exist (or I wasn't aware of it, haha.)
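For the string-to-typed-value case that the CSV parser actually exercises, the heart of the Converter class is the destination type's TypeConverter. The following sketch isolates just that path; ConvertDemo and FromString are hypothetical names, and the enum is redeclared here only so the example stands on its own.

```csharp
using System;
using System.ComponentModel;

public enum AutomobileType { None, Car, Truck, Motorbike }

public static class ConvertDemo
{
    // Ask the destination type's TypeConverter to parse the text,
    // using the invariant culture so "2000.99" always means the same number.
    public static object FromString(string text, Type destType)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(destType);
        return converter.ConvertFromInvariantString(text);
    }

    public static void Main()
    {
        Console.WriteLine(FromString("1990", typeof(int)));           // prints 1990
        Console.WriteLine(FromString("Car", typeof(AutomobileType))); // prints Car

        // Compare rather than print, since decimal formatting depends on the current culture.
        Console.WriteLine((decimal)FromString("2000.99", typeof(decimal)) == 2000.99m); // prints True
    }
}
```

This is why the Type column in the sample data can hold the text Car and still end up as an AutomobileType value on the populated object.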
Unit Tests
So then I thought, well, let's write some unit tests, since this code is actually unit-testable:
public class NoAliases : IPopulatedFromCsv
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}

public class WithAlias : IPopulatedFromCsv
{
    public string A { get; set; }
    public string B { get; set; }

    [ColumnMap(Header = "C")]
    public string TheCField { get; set; }
}

public class WithDuplicateAlias : IPopulatedFromCsv
{
    public string A { get; set; }
    public string B { get; set; }

    [ColumnMap(Header = "C")]
    public string TheCField { get; set; }

    [ColumnMap(Header = "C")]
    public string Oops { get; set; }
}

public class NoMatchingField : IPopulatedFromCsv
{
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
    public string D { get; set; }
}

public class NoMatchingProperty : IPopulatedFromCsv
{
    public string A { get; set; }
    public string C { get; set; }
}

public class Disordered : IPopulatedFromCsv
{
    public string C { get; set; }
    public string B { get; set; }
    public string A { get; set; }
}
// public class NoMatching
/// <summary>
/// By inheriting from CsvParser, we have access to the protected methods.
/// </summary>
[TestClass]
public class CsvParserTests : CsvParser
{
    [TestMethod]
    public void BlankAndWhitespaceLinesAreSkippedTest()
    {
        GetLines("").Count().Should().Be(0);
        GetLines(" ").Count().Should().Be(0);
        GetLines("\r").Count().Should().Be(0);
        GetLines("\n").Count().Should().Be(0);
        GetLines("\r\n").Count().Should().Be(0);
    }

    [TestMethod]
    public void CRLFBothCreateNewLineTest()
    {
        GetLines("a\rb").Count().Should().Be(2);
        GetLines("a\nb").Count().Should().Be(2);
        GetLines("a\r\nb").Count().Should().Be(2);
    }

    [TestMethod]
    public void HeaderSplitByDelimiterTest()
    {
        GetFieldsFromHeader("a,b,c", ",").Count().Should().Be(3);
    }

    [TestMethod]
    public void SingleHeaderTest()
    {
        GetFieldsFromHeader("a", ",").Count().Should().Be(1);
    }

    [TestMethod]
    public void NoHeaderTest()
    {
        GetFieldsFromHeader("", ",").Count().Should().Be(0);
    }

    [TestMethod]
    public void EmptyHeaderFieldIsAllowedTest()
    {
        GetFieldsFromHeader("a,,c", ",").Count().Should().Be(3);
    }

    [TestMethod]
    public void BlankHeaderFieldsAreAllowedTest()
    {
        GetFieldsFromHeader(",,", ",").Count().Should().Be(3);
    }

    [TestMethod]
    public void HeaderIsTrimmedTest()
    {
        GetFieldsFromHeader("a ", ",")[0].Should().Be("a");
        GetFieldsFromHeader(" a", ",")[0].Should().Be("a");
    }

    [TestMethod]
    public void DirectMappingTest()
    {
        MapHeaderToProperties<NoAliases>
            (new string[] { "A", "B", "C" }).Count().Should().Be(3);
    }

    [TestMethod]
    public void AliasTest()
    {
        MapHeaderToProperties<WithAlias>
            (new string[] { "A", "B", "C" }).Count().Should().Be(3);
    }

    [TestMethod]
    public void CaseInsensitiveTest()
    {
        MapHeaderToProperties<NoAliases>
            (new string[] { "a", "b", "c" }).Count().Should().Be(3);
    }

    [TestMethod]
    public void AdditionalPropertyTest()
    {
        MapHeaderToProperties<NoMatchingField>
            (new string[] { "A", "B", "C" }).Count().Should().Be(3);
    }

    [TestMethod]
    public void MissingPropertyTest()
    {
        MapHeaderToProperties<NoMatchingProperty>
            (new string[] { "A", "B", "C" }).Count().Should().Be(3);
    }

    [TestMethod]
    public void FieldNotMappedTest()
    {
        var props = MapHeaderToProperties<NoMatchingProperty>(new string[] { "A", "B", "C" });
        props.Count().Should().Be(3);
        props[0].Should().NotBeNull();
        props[1].Should().BeNull();
        props[2].Should().NotBeNull();
    }

    [TestMethod]
    public void CaseInsensitiveAliasTest()
    {
        AliasedProperty(typeof(WithAlias), "c").Should().NotBeNull();
    }

    [TestMethod]
    public void PropertyNotFoundTest()
    {
        AliasedProperty(typeof(NoAliases), "D").Should().BeNull();
    }

    [TestMethod]
    public void DuplicateAliasTest()
    {
        this.Invoking((_) => AliasedProperty(typeof(WithDuplicateAlias), "C")).Should()
            .Throw<CsvParserDuplicateAliasException>()
            .WithMessage("C is aliased more than once.");
    }

    [TestMethod]
    public void PopulateFieldsTest()
    {
        var props = MapHeaderToProperties<NoAliases>(new string[] { "a", "b", "c" });
        var t = Populate<NoAliases>("1,2,3", ",", props);
        t.A.Should().Be("1");
        t.B.Should().Be("2");
        t.C.Should().Be("3");
    }

    [TestMethod]
    public void PopulatedFieldsAreTrimmedTest()
    {
        var props = MapHeaderToProperties<NoAliases>(new string[] { "a", "b", "c" });
        var t = Populate<NoAliases>("1 , 2, 3 ", ",", props);
        t.A.Should().Be("1");
        t.B.Should().Be("2");
        t.C.Should().Be("3");
    }

    [TestMethod]
    public void PopulateFieldNotMappedTest()
    {
        var props = MapHeaderToProperties<NoMatchingProperty>(new string[] { "a", "b", "c" });
        var t = Populate<NoMatchingProperty>("1,2,3", ",", props);
        t.A.Should().Be("1");
        t.C.Should().Be("3");
    }

    [TestMethod]
    public void OrderedTest()
    {
        var map = MapHeaderToProperties<NoAliases>(new string[] { "A", "B", "C" });
        map[0].Name.Should().Be("A");
        map[1].Name.Should().Be("B");
        map[2].Name.Should().Be("C");
    }

    [TestMethod]
    public void DisorderedTest()
    {
        // Field mapping should be independent of the order of the fields.
        var map = MapHeaderToProperties<Disordered>(new string[] { "A", "B", "C" });
        map[0].Name.Should().Be("A");
        map[1].Name.Should().Be("B");
        map[2].Name.Should().Be("C");
    }
}

Woohoo!
F# Implementation Without Generic Class
And then I decided to flog myself by rewriting this in F# and learning some things along the way. I wanted to try this because I really like the forward pipe operator (|>) in F#, as it lets you chain functions together using the output of one function as the input of another. I also "made the mistake" of not implementing a CsvParser class -- this led to discovering that generics apply only to members, not functions, but I discovered a really cool workaround. So here's my crazy F# implementation:
open System
open System.Reflection
open System.ComponentModel

type AutomobileType =
    | None = 0
    | Car = 1
    | Truck = 2
    | Motorbike = 3

// Regarding AllowNullLiteral: https://sergeytihon.com/2013/04/10/f-null-trick/
[<AllowNullLiteral>]
type ColumnMapAttribute(headerName) =
    inherit System.Attribute()
    let mutable header : string = headerName
    member x.Header with get() = header

type Automobile() =
    let mutable make : string = null
    let mutable model : string = null
    let mutable year : int = 0
    let mutable price : decimal = 0M
    let mutable comment : string = null
    let mutable atype : AutomobileType = AutomobileType.None

    member x.Make with get() = make and set(v) = make <- v
    member x.Model with get() = model and set(v) = model <- v
    member x.Type with get() = atype and set(v) = atype <- v
    member x.Year with get() = year and set(v) = year <- v
    member x.Comment with get() = comment and set(v) = comment <- v

    [<ColumnMap("Cost")>]
    member x.Price with get() = price and set(v) = price <- v

module String =
    let notNullOrWhiteSpace = not << System.String.IsNullOrWhiteSpace

// I would never have been able to figure this out.
// https://stackoverflow.com/a/32345373/2276361
type TypeParameter<'a> = TP

[<EntryPoint>]
let main _ =
    let mapHeaderToProperties(_ : TypeParameter<'a>) fields =
        let otype = typeof<'a>

        fields
        |> Array.map (fun f ->
            match otype.GetProperty(f,
                                    BindingFlags.Instance |||
                                    BindingFlags.Public |||
                                    BindingFlags.IgnoreCase |||
                                    BindingFlags.FlattenHierarchy) with
            | null ->
                match otype.GetProperties()
                      |> Array.filter (fun p ->
                          match p.GetCustomAttribute<ColumnMapAttribute>() with
                          | null -> false
                          // head::tail requires a list, not an array!
                          | a -> a.Header = f)
                      |> Array.toList with
                | head::_ -> head
                | [] -> null
            | a -> a
        )

    let convert(v, destType) =
        let srcType = v.GetType();
        let tcSrc = TypeDescriptor.GetConverter(srcType)
        let tcDest = TypeDescriptor.GetConverter(destType)

        if tcSrc.CanConvertTo(destType) then
            tcSrc.ConvertTo(v, destType)
        else
            if tcDest.CanConvertFrom(srcType) then
                if srcType.FullName = "System.String" then
                    tcDest.ConvertFromInvariantString(v)
                else
                    tcDest.ConvertFrom(v);
            else
                if destType.IsAssignableFrom(srcType) then
                    v :> obj
                else
                    null

    let populate(_ : TypeParameter<'a>)
                (line:string) (delimiter:char) (props: PropertyInfo[]) =
        let t = Activator.CreateInstance<'a>()
        let vals = line.Split(delimiter)

        for i in 0..vals.Length-1 do
            match (props.[i], vals.[i]) with
            | (p, _) when p = null -> ()
            | (_, v) when System.String.IsNullOrWhiteSpace v -> ()
            | (p, v) -> p.SetValue(t, convert(v.Trim(), p.PropertyType))

        t

    let data ="Make,Model,Type,Year,Cost,Comment\r\nToyota,Corolla,Car,1990,2000.99,A Comment";
    let delimiter = ',';
    let lines =
        data.Split [|'\r'; '\n'|]
        |> Array.filter (fun l -> String.notNullOrWhiteSpace l)
    let headerLine = lines.[0]
    let fields =
        headerLine.Split(delimiter)
        |> Array.map (fun l -> l.Trim())
        |> Array.filter (fun l ->
            String.notNullOrWhiteSpace l || String.notNullOrWhiteSpace headerLine)
    let props = mapHeaderToProperties(TP:TypeParameter<Automobile>) fields
    let recs =
        lines
        |> Seq.skip 1
        |> Seq.map (fun l -> populate(TP:TypeParameter<Automobile>) l delimiter props)
        |> Seq.toList

    printf "fields: %A\r\n" fields
    printfn "lines: %A\r\n" lines
    printfn "props: %A\r\n" props
    recs |> Seq.iter
        (fun r -> printfn "%A %A %A %A %A %A\r\n" r.Make r.Model r.Year r.Type r.Price r.Comment)

    0
And the output is:
fields: [|"Make"; "Model"; "Type"; "Year"; "Cost"; "Comment"|]
lines: [|"Make,Model,Type,Year,Cost,Comment";
"Toyota,Corolla,Car,1990,2000.99,A Comment"|]
props: [|System.String Make; System.String Model; AutomobileType Type; Int32 Year;
System.Decimal Price; System.String Comment|]
"Toyota" "Corolla" 1990 Car 2000.99M "A Comment"
It was a struggle not to use FirstOrDefault:
// Using FirstOrDefault (Enumerable lives in System.Linq, so this needs: open System.Linq)
fields
|> Array.map (fun f ->
    match otype.GetProperty(f,
                            BindingFlags.Instance |||
                            BindingFlags.Public |||
                            BindingFlags.IgnoreCase |||
                            BindingFlags.FlattenHierarchy) with
    | null ->
        otype.GetProperties()
        |> Array.filter (fun p ->
            match p.GetCustomAttribute<ColumnMapAttribute>() with
            | null -> false
            | a -> a.Header = f)
        |> Enumerable.FirstOrDefault
    | a -> a
)
But I persevered! A few things I learned:
- match is touchy; I spent a while figuring out why an array [] wouldn't work. My F# ignorance rears its ugly head.
- F# doesn't like nullable types, and C# is full of them, so I learned about the [<AllowNullLiteral>] attribute and its use!
- This type TypeParameter<'a> = TP was one gnarly piece of code to spoof functions into being "generic!" I don't take the credit for that -- see the code comments.
F# Implementation with SimpleCsvParser<'a>
So I decided to rework the implementation into a true generic class -- very easily done:
type SimpleCsvParser<'a>() =
    member __.mapHeaderToProperties fields =
        let otype = typeof<'a>

        fields
        |> Array.map (fun f ->
            match otype.GetProperty(f,
                                    BindingFlags.Instance |||
                                    BindingFlags.Public |||
                                    BindingFlags.IgnoreCase |||
                                    BindingFlags.FlattenHierarchy) with
            | null ->
                match otype.GetProperties()
                      |> Array.filter (fun p ->
                          match p.GetCustomAttribute<ColumnMapAttribute>() with
                          | null -> false
                          | a -> a.Header = f)
                      |> Array.toList with
                | head::_ -> head
                | [] -> null
            | a -> a
        )

    member __.populate (line:string) (delimiter:char) (props: PropertyInfo[]) =
        let t = Activator.CreateInstance<'a>()
        let vals = line.Split(delimiter)

        for i in 0..vals.Length-1 do
            match (props.[i], vals.[i]) with
            | (p, _) when p = null -> ()
            | (_, v) when System.String.IsNullOrWhiteSpace v -> ()
            | (p, v) -> p.SetValue(t, convert(v.Trim(), p.PropertyType))

        t

    member this.parse (data:string) (delimiter:char) =
        let lines =
            data.Split [|'\r'; '\n'|]
            |> Array.filter (fun l -> String.notNullOrWhiteSpace l)
        let headerLine = lines.[0]
        let fields =
            headerLine.Split(delimiter)
            |> Array.map (fun l -> l.Trim())
            |> Array.filter (fun l ->
                String.notNullOrWhiteSpace l || String.notNullOrWhiteSpace headerLine)
        let props = this.mapHeaderToProperties fields

        lines
        |> Seq.skip 1
        |> Seq.map (fun l -> this.populate l delimiter props)
        |> Seq.toList
Note that convert, being a general-purpose function, is not implemented in the class. Also note that the bizarre type TypeParameter<'a> = TP type definition isn't needed, as we use the class like this:
let parser = new SimpleCsvParser<Automobile>();
let recs2 = parser.parse data delimiter
recs2 |> Seq.iter
    (fun r -> printfn "%A %A %A %A %A %A\r\n" r.Make r.Model r.Year r.Type r.Price r.Comment)
Conclusion
As I mentioned, the C# code (excluding unit tests) took 30 minutes to write. The F# code, about 5 hours. A great learning experience!
But more to the point, there is really still the unanswered question of whether you want to roll your own or use a third-party library. My view is that if the requirement is simple and unlikely to change, rolling your own should at least be considered before reaching for a third-party library, especially if the library has one or more of the following qualities:
- Has a significantly larger code base because it's trying to handle all sorts of scenarios
- Has many dependencies on other libraries
- Does not have good test coverage
- Does not have good documentation
- Is not extensible
- It takes more time to evaluate libraries than to roll the code!
History
13th January, 2020: Initial version
Translated from: https://www.codeproject.com/Articles/5256497/Rolling-Your-Own-A-Simple-CSV-Parser-Example