以分隔符分隔的文本文件_将固定宽度/定界符分隔的文本文件内容(输入文件)转换为另一个格式的文本文件(输出文件)。 使用什么方法?

以分隔符分隔的文本文件

Definitions:

定义:

Source System
The IT system that provide the original data.
Targeted System
The IT system that will received the output data from Source System.
Input File
Referring to the original data file that generated from the Source System.
Output File
Referring to the formatted Input File.
Source File
Referring to the Final Data File that will being used by the Targeted System.
File Convertor
The program that converts the Input File to Output File.
Template File
The configuration file that defines the rules how the Output File should be generated via the File Convertor.
Admin Executables / Template File Generator
The program that defines the configuration rules how the Output File should be generated via the File Convertor.
源系统
提供原始数据的IT系统。
目标系统
将从源系统接收输出数据的IT系统。
输入文件
引用从源系统生成的原始数据文件。
输出文件
参考格式化的输入文件。
源文件
引用目标系统将要使用的最终数据文件。
文件转换器
将输入文件转换为输出文件的程序。
模板文件
定义规则的配置文件,该规则应如何通过文件转换器生成输出文件。
管理员可执行文件/模板文件生成器
定义配置规则的程序应如何通过文件转换器生成输出文件。

The Issue

问题

Relating to this file transfer protocol, there are a few issues that need to be agreed on, especially to do with how the data is being transferred.

关于此文件传输协议,需要商定一些问题,尤其是与如何传输数据有关。

There are a few options available:

有一些可用的选项:

  1. Database to Database Integration Services - Data are being transferred seamlessly between Source System and Targeted System with authenticated credentials

    数据库到数据库集成服务-数据通过身份验证在源系统和目标系统之间无缝传输

  2. Data File Transfer - Data is first exported from the Source System as a file and then imported into the Target System

    数据文件传输-数据首先从源系统作为文件导出,然后导入目标系统

  3. Manual System Entries - Data is recreated manually via some sort of user interface in the Targeted System

    手动系统条目-通过目标系统中的某种用户界面手动重新创建数据

What we will focus on in this article is the second scenario, Data File Transfer.

我们将在本文中重点介绍第二种情况,即数据文件传输

A common problem is there is no option available in the Source System to directly generate the Source File needed by Targeted System.

一个常见的问题是,源系统中没有可用的选项来直接生成目标系统所需的源文件。

Figure 1

Hence, the Source File (Input File) needs to be further processed before it (Output File) can be recognized and used by the Targeted System.

因此,在目标系统可以识别和使用源文件(输入文件)之前,需要对其进行进一步处理。

Proposed Resolution

拟议决议

1. File Formats

1.文件格式

To ensure the data to be transferred successfully from the Source System to Targeted System, first we need to determine the file format of Input File.

为了确保将数据成功地从源系统传输到目标系统,首先我们需要确定输入文件的文件格式。

First of all, we need to determine what are the sections available in the Input File, most of the cases, there could be a Header section, a Detail section, with an optional Footer section. Each of the sections could be defined differently with different fixed width to define the data that being stored.

首先,我们需要确定输入文件中可用的部分,在大多数情况下,可能会有一个Header部分,Detail部分以及一个可选的Footer部分。 可以用不同的固定宽度不同地定义每个部分,以定义要存储的数据。

Similarly for the Output File, there could have a different format which come with a Header section, a Detail section or even with a Footer section, that's all depends on the Targeted System's requirement on what data format are accepted.

类似地,对于输出文件,页眉节,详细节甚至页脚节可能具有不同的格式,这完全取决于目标系统对接受哪种数据格式的要求。

The difference could be in term of:

区别可能在于:

i) Number of fields required

i)必填字段数

ii) The sequence of fields

ii)字段顺序

iii) The conversion of values

iii)价值转换

iv) The change of fixed width / delimiter

iv)固定宽度/定界符的变化

v) The conversion from fixed width to delimiter-separated, or vice versa

v)从固定宽度到定界符分隔的转换,反之亦然

vi) The use of Derived fields

vi)派生字段的使用

And hence, an Input & Output File Mapping documentation is useful to document all the requirements.

因此,输入和输出文件映射文档对于记录所有需求很有用。

2. File Converter

2.文件 转换器

A few things to consider:

需要考虑的几件事:

i) User interface consideration

i)用户界面注意事项

Do we need to build an User Interface to enable user to operate the program or we simply make it as simple as a Console Program or a Service that running at background?

我们是否需要构建一个用户界面以使用户能够运行该程序,还是仅仅使其像在后台运行的控制台程序或服务一样简单?

ii) Programming Languages and Tools

ii)编程语言和工具

What programming languages or tools to be used to generated the Output File?

哪些编程语言或工具可用于生成输出文件?

There are few options available, such as:

可用的选项很少,例如:

a) SQL Server Integration Services (SSIS)

a)SQL Server集成服务(SSIS)

   We can create a Script Task (in C# or VB.NET) and write the logic within.

我们可以创建一个脚本任务(在C#或VB.NET中)并在其中编写逻辑。

b) a Compiled Executables, VB Script, PowerShell script etc

b)编译的可执行文件,VB脚本,PowerShell脚本等

   Depends on the efficiency and convenience of developer, different programming languages and tools could be used to develop the File Convertor

根据开发人员的效率和便利性,可以使用不同的编程语言和工具来开发文件转换器。

iii) Delimiter and Text Qualifier

iii)分隔符和文本限定符

In case the Delimiter was being used in either Input File or Output File, do consider to use the Delimiter as unique as possible, which is different from the data content.

如果在输入文件或输出文件中使用了定界符,请考虑尽可能使用定界符,这与数据内容不同。

A Text Qualifier could be very useful to use in case inevitably the data also contains the character of delimiter as part of the content.

如果数据不可避免地包含定界符字符作为内容的一部分,则使用文本限定符可能非常有用。

iv) Other Consideration

iv)其他考虑

Other than that, we may think what features we should build into the File Converter, such as:

除此之外,我们可能会考虑应该在文件转换器中内置哪些功能,例如:

a) Data verification

a)数据验证

   To verify the data checksum, size, format, etc.

验证数据校验和,大小,格式等

b) Derived fields

b)派生字段

    To enable the File Converter to have capability to generate derived  fields based on certain logic, such as summation of fields, count of  records, etc.

使文件转换器能够基于某些逻辑(例如字段总和,记录计数等)生成派生字段。

3. Static or Dynamic Approach?

3.静态还是动态方法?

As what we can see from Figure in section: 1. File Formats above, it illustrates the Static mapping of Input File to Output File.

正如我们在以下部分的图中所看到的:1.上面的文件格式,它说明了输入文件到输出文件的静态映射。

The question that may arise is how can have a more dynamic approach to make it possible for the File Convertor to generate different Output File with different output formats?

可能出现的问题是,如何有一种更动态的方法来使文件转换器生成具有不同输出格式的不同输出文件?

In real programming world, it's really not a practical approach to hard code the logic in the codes. Therefore if there's a chance, the developers will try to make things to be configurable as possible as it can.

在实际的编程世界中,将代码中的逻辑硬编码实际上不是一种实用的方法。 因此,如果有机会,开发人员将尝试使事情尽可能地可配置。

I think to make this possible, we would need to have another Admin Executables (aka Template File Generator) in order to generate a Template File (we may also called it as a profile), so that when the File Converter starts to convert the Input File's content, it would actually read the configuration rules in the Template File and then do the conversion accordingly to generate the Output File.

我认为要使其成为可能,我们将需要另一个Admin Executables(又名模板文件生成器)才能生成模板文件(我们也可以将其称为配置文件),以便在文件转换器开始转换输入时文件的内容,它将实际读取模板文件中的配置规则,然后进行相应的转换以生成输出文件。

How can this Template File looks like?

该模板文件的外观如何?

Well, it depends on what programming languages and tools we are using. As a general guideline, it probably will be in file format such as XML or Config file.

好吧,这取决于我们使用的编程语言和工具。 作为一般准则,它可能采用XML或Config文件之类的文件格式。

Sample using XML:

使用XML的示例:

<?xml version="1.0"?>
<setting>
   <ver>1.0</ver>
   <input>
        <format>fixed width</format>
        <header>
            <code>header</code>
            <isdetail>false</isdetail>
            <field>
                <code>1</code>
                <name>Record Identifier</name>
                <type>Char</type>
                <size>34</size>
                <start>1</start>
                <end>34</end>
                <remarks></remarks>
            </field>
        </header>
        <header>
            <code>detail</code>
            <isdetail>true</isdetail>
            <field>
                <code>SubmitDate</code>
                <name>Submission Date</name>
                <type>Date</type>
                <format>YYYYMMDD</format>
                <size>8</size>
                <start>1</start>
                <end>8</end>
                <remarks></remarks>
            </field>            
            <field>
                <code>SerialNo</code>
                <name>Serial Number</name>
                <type>Numeric</type>
                <size>9</size>
                <start>9</start>
                <end>17</end>
                <remarks></remarks>
            </field>
            <field>
                <code>CustName</code>
                <name>Customer Name</name>
                <type>Char</type>
                <size>20</size>
                <start>18</start>
                <end>37</end>
                <remarks></remarks>
            </field>
        </header>
   </input>
   <output>
        <format>fixed width</format>
        <header>
            <code>AcctList</code>
            <remarks>This is the Account List</remarks>
            <printremarks>false</printremarks>
            <isdetail>true</isdetail>
            <field>
                <code>SubmitDate</code>
                <source>
                    <code>detail|SerialNo</code>
                </source>
                <trim>false</trim>                    
                <name>Submission Date</name>
                <type>Date</type>
                <format>MM/DD/YYYY</format>
                <size>10</size>
                <start>1</start>
                <end>10</end>
                <align>right</align>
                <remarks></remarks>
            </field>            
            <field>
                <code>AppCode</code>
                <source>
                    <code>detail|SerialNo</code>
                </source>                    
                <trim>true</trim>
                <formula></formula>                    
                <name>Application Code</name>
                <type>Numeric</type>
                <size>9</size>
                <start>11</start>
                <end>19</end>
                <align>left</align>
                <remarks></remarks>
            </field>
        </header>
        <header>
            <code>AcctTotal</code>
            <remarks>This is the Account Total</remarks>
            <printremarks>true</printremarks>
            <isdetail>false</isdetail>
            <field>
                <code>AppCode</code>
                <source>
                    <code>detail|SerialNo2</code>
                </source>
                <trim>true</trim>
                <formula>{SUM}(1)</formula>
                <name>Total Application</name>
                <type>Numeric</type>
                <format>#,#0</format>
                <size>7</size>
                <start>1</start>
                <end>7</end>
                <align>right</align>
                <remarks></remarks>
            </field>
        </header>
   </output>
</setting> 

To give a sample to reading above template using C# codes, we can customize this sample:

为了提供使用C#代码阅读上述模板的示例,我们可以自定义此示例:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using System.Xml;
using System.IO;


namespace MyFileConvertor
{
    class FileConvertor
    {
        private XmlDocument doc;

        public String TemplateFileVersion { get; set; }
        public Input Input { get; set; }
        public Output Output { get; set; }

        public FileConvertor(String TemplatePath)
        {
            doc = new XmlDocument();
            Input = new Input();
            Output = new Output();

            doc.Load(TemplatePath);
            TemplateFileVersion = doc.SelectSingleNode("setting/ver").InnerText;
            
            Input.Format = doc.SelectSingleNode("setting/input/format").InnerText;

            XmlNodeList inputHeaders = doc.SelectNodes("setting/input/header");
            foreach (XmlNode inputHeader in inputHeaders)
            {
                InputHeader hdr = new InputHeader();
                hdr.Code = inputHeader.SelectSingleNode("code").InnerText;
                hdr.isDetail = inputHeader.SelectSingleNode("isdetail").InnerText == "true" ? true : false;

                XmlNodeList inputFields = inputHeader.SelectNodes("field");
                foreach (XmlNode inputField in inputFields)
                {
                    Field fld = new Field();
                    fld.Code = inputField.SelectSingleNode("code").InnerText;
                    fld.Name = inputField.SelectSingleNode("name").InnerText;
                    fld.Type = inputField.SelectSingleNode("type").InnerText;
                    fld.Format = getNodeText(inputField.SelectSingleNode("format"));
                    fld.Size = Convert.ToInt32(inputField.SelectSingleNode("size").InnerText);
                    fld.Start = Convert.ToInt32(inputField.SelectSingleNode("start").InnerText);
                    fld.End = Convert.ToInt32(inputField.SelectSingleNode("end").InnerText);
                    fld.Remarks = inputField.SelectSingleNode("remarks").InnerText;
                    hdr.Fields.Add(fld);
                }
                Input.InputHeaders.Add(hdr);
            }

            Output.Format = doc.SelectSingleNode("setting/output/format").InnerText;

            XmlNodeList outputHeaders = doc.SelectNodes("setting/output/header");
            foreach (XmlNode outputHeader in outputHeaders)
            {
                OutputHeader hdr = new OutputHeader();
                hdr.Code = outputHeader.SelectSingleNode("code").InnerText;
                hdr.Remarks = outputHeader.SelectSingleNode("remarks").InnerText;
                hdr.PrintRemarks = outputHeader.SelectSingleNode("printremarks").InnerText;
                hdr.isDetail = outputHeader.SelectSingleNode("isdetail").InnerText == "true" ? true : false;

                XmlNodeList outputFields = outputHeader.SelectNodes("field");
                foreach (XmlNode outputField in outputFields)
                {
                    Field fld = new Field();
                    fld.Code = outputField.SelectSingleNode("code").InnerText;
                    fld.Name = outputField.SelectSingleNode("name").InnerText;
                    fld.Type = outputField.SelectSingleNode("type").InnerText;
                    fld.Format = getNodeText(outputField.SelectSingleNode("format"));
                    fld.Size = Convert.ToInt32(outputField.SelectSingleNode("size").InnerText);
                    fld.Start = Convert.ToInt32(outputField.SelectSingleNode("start").InnerText);
                    fld.End = Convert.ToInt32(outputField.SelectSingleNode("end").InnerText);
                    fld.Alignment = outputField.SelectSingleNode("align").InnerText;
                    fld.Remarks = outputField.SelectSingleNode("remarks").InnerText;

                    fld.Formula = getNodeText(outputField.SelectSingleNode("formula"));
                    fld.IsTrim = getNodeText(outputField.SelectSingleNode("trim")) == "true" ? true : false;

                    XmlNodeList outputFieldCodes = outputField.SelectNodes("source");
                    foreach (XmlNode outputFieldCode in outputFieldCodes)
                    {
                        Source src = new Source();
                        src.Codes.Add(outputFieldCode.SelectSingleNode("code").InnerText);
                        fld.Sources.Add(src);
                    }
                    
                    hdr.Fields.Add(fld);
                }
                Output.OutputHeaders.Add(hdr);
            }


        }
        public String getNodeText(XmlNode node)
        {
            try
            {
                return node.InnerText;
            }
            catch (Exception)
            {
                return "";
            }
        }
        public String getNodeText(XmlNode node, String defaultReturn)
        {
            try
            {
                return node.InnerText;
            }
            catch (Exception)
            {
                return defaultReturn;
            }
        }
        
    }

    class Input
    {
        public String Format { get; set; }
        public List<InputHeader> InputHeaders;

        public Input()
        {
            InputHeaders = new List<InputHeader>();
        }
    }

    class InputHeader
    {
        public String Code { get; set; }
        public Boolean isDetail { get; set; }
        public List<Field> Fields;

        public InputHeader()
        {
            Fields = new List<Field>();
        }
    }

    class Output
    {
        public String Format { get; set; }
        public List<OutputHeader> OutputHeaders;

        public Output()
        {
            OutputHeaders = new List<OutputHeader>();
        }
    }

    class OutputHeader
    {
        public String Code { get; set; }
        public String Remarks { get; set; }
        public String PrintRemarks { get; set; }
        public Boolean isDetail { get; set; }
        public List<Field> Fields;

        public OutputHeader()
        {
            Fields = new List<Field>();
        }
    }

    class Field
    {
        public String Code { get; set; }

        public Boolean IsTrim { get; set; }
        public String Formula { get; set; }
        
        public String Name { get; set; }
        public String Type { get; set; }
        public String Format { get; set; }
        public int Size { get; set; }
        public int Start { get; set; }
        public int End { get; set; }
        public String Alignment { get; set; }
        public String Remarks { get; set; }
        
        public List<Source> Sources;

        public Field()
        {
            Sources = new List<Source>();
        }
    }

    class Source
    {
        public List<String> Codes;
        public Source()
        {
            Codes = new List<string>();
        }
    }

} 

In general, we can build a much more complex rules in the Template File to meet the requirements.

通常,我们可以在模板文件中构建更复杂的规则来满足要求。

So, the preferred Programming Languages and Tools will play a crucial role to read the settings from the Template File, which eventually based on the logic within to generate the Output File.

因此,首选的编程语言和工具将扮演至关重要的角色,以从模板文件中读取设置,而模板文件最终将基于其中的逻辑来生成输出文件。

This may not be an easy task but it's worth a try :)

这可能不是一件容易的事,但是值得一试:)

Conclusion:

结论:

As a whole, we need to ensure the data accuracy and integrity are take into consideration.

总体而言,我们需要确保考虑到数据的准确性和完整性。

Some of other concerns:

其他一些问题:

1. We can build an application log to track the activities of during the file conversion process.

1.我们可以构建一个应用程序日志来跟踪文件转换过程中的活动。

2. We can embed the security algorithms to protect our data if necessary.

2.如果需要,我们可以嵌入安全算法来保护我们的数据。

3. Automate the whole process by integrate it as part of the ETL (Extract, Transform and Load) or Windows Scheduled Task process.

3.通过将其集成为ETL(提取,转换和加载)或Windows计划任务过程的一部分来自动化整个过程。

4. How the Template File to be updated/ stored if the File Converter had been deployed to many computer machines?

4.如果文件转换器已部署到许多计算机上,如何更新/存储模板文件?

More references: 更多参考:

Script Task

脚本任务

https://docs.microsoft.com/en-us/sql/integration-services/control-flow/script-task

https://docs.microsoft.com/zh-cn/sql/integration-services/control-flow/script-task

Text Import Wizard

文字汇入精灵

https://support.office.com/en-us/article/Text-Import-Wizard-c5b02af6-fda1-4440-899f-f78bafe41857?ui=en-US&rs=en-US&ad=US&fromAR=1

https://support.office.com/zh-CN/article/Text-Import-Wizard-c5b02af6-fda1-4440-899f-f78bafe41857?ui=zh-CN&rs=zh-CN&ad=US&fromAR=1

(Read Step 2 of 3 for explanation of Delimiter and Text Qualifier)

(阅读第2步(共3步)以了解分隔符和文本限定符)

翻译自: https://www.experts-exchange.com/articles/29620/Converting-a-fixed-width-delimiter-separated-text-file-content-Input-File-to-another-formatted-text-file-Output-File-What-approach-to-use.html

以分隔符分隔的文本文件

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值