ETL,是英文 Extract-Transform-Load 的缩写,用来描述将数据从来源端经过抽取(extract)、转换(transform)、加载(load)至目的端的过程。ETL一词较常用在数据仓库,但其对象并不限于数据仓库。ETL是构建数据仓库的重要一环,用户从数据源抽取出所需的数据,经过数据清洗,最终按照预先定义好的数据仓库模型,将数据加载到数据仓库中去。
1. 安装ChoETL
Cinchoo ETL可以从Nuget下载,在Nuget包管理器控制台使用GUI或者以下命令:
Install-Package ChoETL
将命名空间添加到程序:
using ChoETL;
2. 加载/写入文件
要加载/写入任何文件(CSV,FixedLength,Xml,JSON,KVP),只需使用适当的读取器/写入器组件来解析它.
CSV - ChoCSVReader / ChoCsvWriter
FixedLength - ChoFixedLengthReader / ChoFixedLengthWriter
XML - ChoXmlReader / ChoXmlWriter
JSON - ChoJSONReader / ChoJSONWriter
KVP - ChoKVPReader / ChoKVPWriter
3. CSV文件读写
3.1 新建一个csv文件,文件命名为"Emp.csv", 输入以下内容后保存文件.
1,Tom 2,Carl 3,Mark
3.2 Iterator(迭代器)方式加载
foreach (var e in new ChoCSVReader("Emp.csv")) { Console.WriteLine(e.ToStringEx()); }
3.3 Loop(循环)方式加载
var reader = new ChoCSVReader("Emp.csv"); object rec = null; while ((rec = reader.Read()) != null) { Console.WriteLine(rec.ToStringEx()); }
3.4 List<T>(泛型List)方式写入
List<ExpandoObject> objs = new List<ExpandoObject>(); dynamic rec1 = new ExpandoObject(); rec1.Id = 1; rec1.Name = "Mark"; objs.Add(rec1); dynamic rec2 = new ExpandoObject(); rec2.Id = 2; rec2.Name = "Jason"; objs.Add(rec2); using (var parser = new ChoCSVWriter("Emp.csv")) { parser.Write(objs); }
3.5 Object写入
using (var parser = new ChoCSVWriter("Emp.csv")) { dynamic rec1 = new ExpandoObject(); rec1.Id = 1; rec1.Name = "Mark"; parser.Write(item); dynamic rec1 = new ExpandoObject(); rec1.Id = 2; rec1.Name = "Jason"; parser.Write(item); }
4. FixedLength文件读写(例如:TSV)
4.1 新建一个txt文件,文件命名为"Emp.txt", 输入以下内容后(以一个或多个TAB键间隔)保存文件.
Id Name 1 Carl 2 Mark
4.2 Iterator(迭代器)方式加载
foreach (dynamic e in new ChoFixedLengthReader("Emp.txt").WithFirstLineHeader()) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
4.3 Loop(循环)方式加载
var reader = new ChoFixedLengthReader("Emp.txt").WithFirstLineHeader(); dynamic rec; while ((rec = reader.Read()) != null) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
4.4 Field specs方式加载
foreach (var e in new ChoFixedLengthReader("Emp.txt").WithFirstLineHeader().WithField("Id", 0, 8).WithField(8, 10)) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
4.5 POCO方式加载
public partial class Employee { [ChoFixedLengthRecordField(0, 8)] public int Id { get; set; } [ChoFixedLengthRecordField(8, 10)] public string Name { get; set; } } foreach (var e in new FixedLengthReader<Employee>("Emp.txt")) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
4.6 List<T>(泛型List)方式写入
List<ExpandoObject> objs = new List<ExpandoObject>(); dynamic rec1 = new ExpandoObject(); rec1.Id = 1; rec1.Name = "Mark"; objs.Add(rec1); dynamic rec2 = new ExpandoObject(); rec2.Id = 2; rec2.Name = "Jason"; objs.Add(rec2); using (var parser = new ChoFixedLengthWriter("Emp.txt").WithFirstLineHeader(). WithField("Id", 0, 8). WithField("Name", 8, 10)) { parser.Write(objs); }
4.7 Object写入
using (var parser = new ChoFixedLengthWriter("Emp.txt").WithFirstLineHeader(). WithField("Id", 0, 8). WithField("Name", 8, 10)) { dynamic rec1 = new ExpandoObject(); rec1.Id = 1; rec1.Name = "Mark"; parser.Write(item); dynamic rec1 = new ExpandoObject(); rec1.Id = 2; rec1.Name = "Jason"; parser.Write(item); }
5. XML文件读写
5.1 新建一个xml文件,文件命名为"Emp.xml", 输入以下内容后保存文件.
<Employees> <Employee Id='1'> <Name>Tom</Name> </Employee> <Employee Id='2'> <Name>Mark</Name> </Employee> </Employees>
5.2 Iterator(迭代器)方式加载
foreach (dynamic e in new ChoXmlReader("Emp.xml")) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
5.3 Loop(循环)方式加载
var reader = new ChoXmlReader("Emp.xml"); dynamic rec; while ((rec = reader.Read()) != null) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
5.4 POCO方式加载
public class Employee { public int Id { get; set; } public string Name { get; set; } } foreach (var e in new ChoXmlReader<Employee>("Emp.xml")) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
5.5 List<T>(泛型List)方式写入
List<ExpandoObject> objs = new List<ExpandoObject>(); dynamic rec1 = new ExpandoObject(); rec1.Id = 1; rec1.Name = "Mark"; objs.Add(rec1); dynamic rec2 = new ExpandoObject(); rec2.Id = 2; rec2.Name = "Jason"; objs.Add(rec2); using (var parser = new ChoXmlWriter("Emp.xml").WithXPath("Employees/Employee")) { parser.Write(objs); }
5.6 Object写入
using (var parser = new ChoXmlWriter("Emp.xml").WithXPath("Employees/Employee")) { dynamic rec1 = new ExpandoObject(); rec1.Id = 1; rec1.Name = "Mark"; parser.Write(item); dynamic rec1 = new ExpandoObject(); rec1.Id = 2; rec1.Name = "Jason"; parser.Write(item); }
5.7 POCO object写入
public class Employee { public int Id { get; set; } public string Name { get; set; } } List<Employee> objs = new List<Employee>(); objs.Add(new Employee() { Id = 1, Name = "Tom" }); objs.Add(new Employee() { Id = 2, Name = "Mark" }); using (var parser = new ChoXmlWriter<Employee>("Emp.xml").WithXPath("Employees/Employee")) { parser.Write(objs); }
6. JSON文件读写
6.1 新建一个json文件,文件命名为"Emp.json", 输入以下内容后保存文件.
[ { "Id": 1, "Name": "Mark" }, { "Id": 2, "Name": "Tom" } ]
6.2 Iterator(迭代器)方式加载
foreach (dynamic e in new ChoJSONReader("Emp.json")) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
6.3 Loop(循环)方式加载
var reader = new ChoJSONReader("Emp.json"); dynamic rec; while ((rec = reader.Read()) != null) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
6.4 POCO方式加载
public class Employee { public int Id { get; set; } public string Name { get; set; } } foreach (var e in new ChoJSONReader<Employee>("Emp.json")) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
6.5 List<T>(泛型List)方式写入
List<ExpandoObject> objs = new List<ExpandoObject>(); dynamic rec1 = new ExpandoObject(); rec1.Id = 1; rec1.Name = "Mark"; objs.Add(rec1); dynamic rec2 = new ExpandoObject(); rec2.Id = 2; rec2.Name = "Jason"; objs.Add(rec2); using (var parser = new ChoJSONWriter("Emp.json")) { parser.Write(objs); }
6.6 Object写入
using (var parser = new ChoJSONWriter("Emp.json")) { dynamic rec1 = new ExpandoObject(); rec1.Id = 1; rec1.Name = "Mark"; parser.Write(item); dynamic rec1 = new ExpandoObject(); rec1.Id = 2; rec1.Name = "Jason"; parser.Write(item); }
6.7 POCO object写入
public class Employee { public int Id { get; set; } public string Name { get; set; } } List<Employee> objs = new List<Employee>(); objs.Add(new Employee() { Id = 1, Name = "Tom" }); objs.Add(new Employee() { Id = 2, Name = "Mark" }); using (var parser = new ChoJSONWriter<Employee>("Emp.json")) { parser.Write(objs); }
7. Key-Value(KVP)文件读写
7.1 新建一个ics文件,文件命名为"sample.ics", 输入以下内容后保存文件.
BEGIN:VCALENDAR PRODID:-//Microsoft Corporation//Outlook 11.0 MIMEDIR//EN VERSION:2.0 METHOD:PUBLISH BEGIN:VEVENT ORGANIZER:MAILTO:eralper@nosuchemail.com DTSTART:20090306T140000Z DTEND:20090306T150000Z LOCATION:Meeting Room 1-D TRANSP:OPAQUE SEQUENCE:0 UID:040000008200E00074C5B7101A82E00800000000B09916C3F9ADC9010000000000000000100 0000099DFBC4731297248A2CAF0885C8C4856 DTSTAMP:20090326T080109Z DESCRIPTION:A short discussion on topics related with upgrade of HiPath ProCenter SDK Applications\n SUMMARY:Upgrade HiPath ProCenter SDK Applications PRIORITY:5 X-MICROSOFT-CDO-IMPORTANCE:1 CLASS:PUBLIC BEGIN:VALARM TRIGGER:-PT15M ACTION:DISPLAY DESCRIPTION:Reminder END:VALARM END:VEVENT END:VCALENDAR
7.2 Iterator(迭代器)方式加载
using (var r = new ChoKVPReader(@"sample.ics")) { r.Configuration.RecordStart = "BEGIN:VEVENT"; r.Configuration.RecordEnd = "END:VEVENT"; r.Configuration.IgnoreEmptyLine = true; r.Configuration.Comment = ";"; foreach (dynamic item in r) { Console.WriteLine(item.SUMMARY); } }
7.3 Loop(循环)方式加载
using (var r = new ChoKVPReader(@"sample.ics")) { r.Configuration.RecordStart = "BEGIN:VEVENT"; r.Configuration.RecordEnd = "END:VEVENT"; r.Configuration.IgnoreEmptyLine = true; r.Configuration.Comment = ";"; dynamic rec; while ((rec = r.Read()) != null) { Console.WriteLine(rec.SUMMARY); } }
7.4 POCO方式加载
public class Employee { public int Id { get; set; } public string Name { get; set; } } foreach (var e in new ChoJSONReader<Employee>("Emp.json")) { Console.WriteLine("Id: " + e.Id + " Name: " + e.Name); }
7.5 写入待补充
8. GitHub
https://github.com/Cinchoo/ChoETL