Postprocessing AutoClosed SGML Tags with the SGMLReader

Chris Lovett's SGMLReader is an interesting and complex piece of work. It's more complex than my brain can hold, which is good, since he wrote it and not I. It can parse SGML documents such as HTML. However, it derives from XmlReader, so it tries (and succeeds) to look like an XmlReader, and that means it auto-closes tags. Remember that SGML doesn't require closing tags; specifically, elements of primitive/simple types can omit them.

Sometimes I need to parse an OFX 1.x document. OFX is a financial format, and version 1.x is SGML, like this:

<OFX>    
<SIGNONMSGSRQV1>
<SONRQ>   
 <DTCLIENT>20060128101000 
 <USERID>654321
 <USERPASS>123456
 <LANGUAGE>ENG 
  <FI>    
   <ORG>Corillian
   <FID>1001 
  </FI>
 <APPID>MyApp  
 <APPVER>0500  
</SONRQ>
...etc...
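A strict XML parser refuses this input outright, which is why a lenient reader like SgmlReader is needed in the first place. The minimal C# sketch below uses only System.Xml's built-in XmlReader (not SgmlReader) and feeds it a trimmed copy of the sample above. The closing OFX and SIGNONMSGSRQV1 tags are added so that only the missing simple-type end tags are at fault; the helper name IsWellFormedXml is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

// Trimmed copy of the OFX sample: DTCLIENT, USERID, ORG and FID have no end tags.
string ofxSgml =
    "<OFX><SIGNONMSGSRQV1><SONRQ>" +
    "<DTCLIENT>20060128101000<USERID>654321" +
    "<FI><ORG>Corillian<FID>1001</FI>" +
    "</SONRQ></SIGNONMSGSRQV1></OFX>";

// Hypothetical helper: walks every node with a default (strict) XmlReader.
// Any well-formedness error surfaces as an XmlException, whose message is returned.
bool IsWellFormedXml(string text, out string message)
{
    try
    {
        using XmlReader xml = XmlReader.Create(new StringReader(text));
        while (xml.Read()) { }
        message = "well-formed";
        return true;
    }
    catch (XmlException ex)
    {
        message = ex.Message;
        return false;
    }
}

IsWellFormedXml(ofxSgml, out string error);
Console.WriteLine(error);
```

XmlReader gives up at </FI> with an XmlException, because it expects the still-open FID element to be closed first.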

Notice that ORG, DTCLIENT, and all the other simple types have no end tags, but complex types like FI and SONRQ do. The SgmlReader class attempts to insert end tags automatically (to close each element) as I use the XmlReader.Read() method to move through the document. However, he can't figure out the right place for an end tag until he sees an end element go by. Then he says, oh, crap! There's </FI>! I need to empty my stack of start elements in reverse order. This is lovely for him, but it gives me a document that looks (in memory) like this:

<OFX>
<SIGNONMSGSRQV1>
<SONRQ>
  <DTCLIENT>20060128101000
    <USERID>654321
      <USERPASS>123456
        <LANGUAGE>ENG
          <FI>
            <ORG>Corillian
              <FID>1001
              </FID>
            </ORG>
          </FI>
        </LANGUAGE>
      </USERPASS>
    </USERID>
  </DTCLIENT>
...etc...
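The staircase above follows from a simple rule. Here is a simplified model of it in C#, assuming a reader with no DTD that keeps open elements on a stack and only closes them when it meets an explicit end tag, popping everything above that tag in reverse order. NaiveAutoClose and its regex tokenizer are hypothetical illustrations, not SgmlReader's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical model of DTD-less auto-closing: open elements sit on a stack and
// are only closed when an explicit end tag arrives, which unwinds everything above it.
string NaiveAutoClose(string sgml)
{
    var open = new Stack<string>();
    var output = new StringBuilder();
    foreach (Match m in Regex.Matches(sgml, @"<(/?)([A-Za-z0-9.]+)>|([^<]+)"))
    {
        if (m.Groups[3].Success)                       // text content
        {
            output.Append(m.Groups[3].Value.Trim());
        }
        else if (m.Groups[1].Value == "")              // start tag: just remember it
        {
            open.Push(m.Groups[2].Value);
            output.Append('<').Append(m.Groups[2].Value).Append('>');
        }
        else                                           // explicit end tag
        {
            string name = m.Groups[2].Value;
            if (!open.Contains(name)) continue;        // stray end tag: ignore it
            string popped;
            do                                         // close everything opened after it
            {
                popped = open.Pop();
                output.Append("</").Append(popped).Append('>');
            } while (popped != name);
        }
    }
    while (open.Count > 0)                             // close anything still open at the end
        output.Append("</").Append(open.Pop()).Append('>');
    return output.ToString();
}

string nested = NaiveAutoClose("<FI><ORG>Corillian<FID>1001</FI>");
Console.WriteLine(nested);
// prints <FI><ORG>Corillian<FID>1001</FID></ORG></FI>
```

ORG ends up containing FID because nothing tells the model that ORG was already complete when FID started; only the real </FI> triggers any closing at all.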

...which totally isn't the structure I'm looking for. I could write my own SgmlReader that knows more about OFX, but really, who has the time? So my buddy Paul Gomes and I did this.

NOTE: There's one special tag in OFX, MSGBODY, that is a simple type but always has an end tag, so we special-cased it. Notice also that we did all this WITHOUT changing the SgmlReader; it's just passed into the method as "reader".

```csharp
using System.Collections;   // Stack
using System.Diagnostics;   // Debug
using System.Xml;           // XmlWriter, XmlNodeType
using Sgml;                 // SgmlReader

// Member of the containing class:
protected internal static void AutoCloseElementsInternal(SgmlReader reader, XmlWriter writer)
{
    // Atomized via the reader's NameTable, so it compares cheaply with reader.LocalName.
    object msgBody = reader.NameTable.Add("MSGBODY");
    object previousElement = null;                  // most recent start tag we wrote
    Stack elementsWeAlreadyEnded = new Stack();     // simple elements we closed ourselves

    reader.Read();
    while (!reader.EOF)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                previousElement = reader.LocalName;
                writer.WriteStartElement(reader.LocalName);
                break;

            case XmlNodeType.Text:
                if (!string.IsNullOrEmpty(reader.Value))
                {
                    writer.WriteString(reader.Value.Trim());
                    // Text means previousElement is a simple type: close it right now,
                    // unless it is MSGBODY, which always carries its own end tag.
                    if (previousElement != null && !previousElement.Equals(msgBody))
                    {
                        writer.WriteEndElement();
                        elementsWeAlreadyEnded.Push(previousElement);
                    }
                }
                else
                {
                    // Debug.Assert(true, ...) could never fire; Fail reports the unexpected case.
                    Debug.Fail("Empty text node: big problems?");
                }
                break;

            case XmlNodeType.EndElement:
                // A synthetic end tag for an element we already closed: swallow it.
                if (elementsWeAlreadyEnded.Count > 0
                    && Object.ReferenceEquals(elementsWeAlreadyEnded.Peek(), reader.LocalName))
                {
                    elementsWeAlreadyEnded.Pop();
                }
                else
                {
                    writer.WriteEndElement();   // a real end tag from the document, e.g. </FI>
                }
                break;

            default:
                // WriteNode copies the node AND advances the reader,
                // so skip the Read() at the bottom of the loop.
                writer.WriteNode(reader, false);
                continue;
        }
        reader.Read();
    }
}
```

We store the name of the most recently written start tag. When we write out a node of type XmlNodeType.Text, we immediately write our own EndElement for that start tag and push its name onto a stack. Then, when SgmlReader starts auto-closing and sends us synthetic EndElements, we ignore each one whose name is already at the top of our stack. Any other EndElement is a real one from the document (like </FI>), so we pass it through to the writer.
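To check this bookkeeping without SgmlReader installed, the sketch below replays the same switch logic over a hand-written event list that stands in for what SgmlReader would emit for <FI><ORG>Corillian<FID>1001</FI>. The event list, the N helper and PostProcess are hypothetical; names are atomized through a System.Xml NameTable so that the ReferenceEquals test behaves as it would with a real reader.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Names go through a NameTable so equal names share one string instance,
// as they would coming from a real XmlReader.
var names = new NameTable();
string N(string s) => names.Add(s);

// Hypothetical event stream standing in for SgmlReader's output for
// <FI><ORG>Corillian<FID>1001</FI>; the </FID> and </ORG> events are synthetic.
var events = new List<(XmlNodeType Type, string Name, string Value)>
{
    (XmlNodeType.Element, N("FI"), ""),
    (XmlNodeType.Element, N("ORG"), ""),
    (XmlNodeType.Text, "", "Corillian"),
    (XmlNodeType.Element, N("FID"), ""),
    (XmlNodeType.Text, "", "1001"),
    (XmlNodeType.EndElement, N("FID"), ""),
    (XmlNodeType.EndElement, N("ORG"), ""),
    (XmlNodeType.EndElement, N("FI"), ""),
};

// Same bookkeeping as AutoCloseElementsInternal, applied to the event list.
string PostProcess(List<(XmlNodeType Type, string Name, string Value)> nodes)
{
    object msgBody = names.Add("MSGBODY");
    object? previousElement = null;
    var alreadyEnded = new Stack<object>();
    var output = new StringBuilder();
    var settings = new XmlWriterSettings
    {
        ConformanceLevel = ConformanceLevel.Fragment,
        OmitXmlDeclaration = true,
    };

    using (XmlWriter writer = XmlWriter.Create(output, settings))
    {
        foreach (var node in nodes)
        {
            switch (node.Type)
            {
                case XmlNodeType.Element:
                    previousElement = node.Name;
                    writer.WriteStartElement(node.Name);
                    break;
                case XmlNodeType.Text:
                    writer.WriteString(node.Value.Trim());
                    if (previousElement != null && !previousElement.Equals(msgBody))
                    {
                        writer.WriteEndElement();            // close the simple element now
                        alreadyEnded.Push(previousElement);  // and remember that we did
                    }
                    break;
                case XmlNodeType.EndElement:
                    if (alreadyEnded.Count > 0 && object.ReferenceEquals(alreadyEnded.Peek(), node.Name))
                        alreadyEnded.Pop();                  // synthetic: already closed
                    else
                        writer.WriteEndElement();            // genuine end tag such as </FI>
                    break;
            }
        }
    }
    return output.ToString();
}

Console.WriteLine(PostProcess(events));
// prints <FI><ORG>Corillian</ORG><FID>1001</FID></FI>
```

The synthetic </FID> and </ORG> events are swallowed because each matches the top of the stack, while </FI> finds the stack empty and closes FI in the writer.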

The resulting OFX document now looks like this:

<OFX>
<SIGNONMSGSRQV1>
 <SONRQ>
  <DTCLIENT>20060128101000</DTCLIENT>
  <USERID>411300</USERID>
  <USERPASS>123456</USERPASS>
  <LANGUAGE>ENG</LANGUAGE>
  <FI>
   <ORG>Corillian</ORG>
   <FID>1001</FID>
  </FI>
  <APPID>MyApp</APPID>
  <APPVER>0500</APPVER>
 </SONRQ>
...etc...

...and we can deal with it just like any other XML fragment; in our case, we simply let it continue on its way through the XmlReader/XmlWriter pipeline.
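As a small illustration of treating the result as ordinary XML, the cleaned-up SONRQ sample can be read back with a stock XmlReader. This sketch assumes the output was captured into a string (named cleaned here for illustration) and uses ConformanceLevel.Fragment, since the snippet is only part of a larger document and sibling top-level elements should also be accepted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// The cleaned-up SONRQ sample from above, now well-formed XML.
string cleaned =
    "<SONRQ><DTCLIENT>20060128101000</DTCLIENT><USERID>411300</USERID>" +
    "<USERPASS>123456</USERPASS><LANGUAGE>ENG</LANGUAGE>" +
    "<FI><ORG>Corillian</ORG><FID>1001</FID></FI>" +
    "<APPID>MyApp</APPID><APPVER>0500</APPVER></SONRQ>";

var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
var values = new Dictionary<string, string>();

using (XmlReader xml = XmlReader.Create(new StringReader(cleaned), settings))
{
    string current = "";
    while (xml.Read())
    {
        if (xml.NodeType == XmlNodeType.Element)
            current = xml.LocalName;          // remember which element we are inside
        else if (xml.NodeType == XmlNodeType.Text)
            values[current] = xml.Value;      // every text node here belongs to a simple element
    }
}

Console.WriteLine($"{values["ORG"]} / {values["FID"]}");   // prints Corillian / 1001
```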

Thanks to Craig Andera for the reminder about Object.ReferenceEquals(); it's nicer than elementsWeAlreadyEnded.Peek() == (object)reader.LocalName.
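Either form of reference comparison is only valid because names from the reader's NameTable are atomized: the table hands back one shared string instance per distinct name. The sketch below demonstrates this with System.Xml.NameTable directly; the variable names are illustrative.

```csharp
using System;
using System.Xml;

var nameTable = new NameTable();

// Add() returns the table's single shared instance for a given string value.
string first = nameTable.Add("MSGBODY");
string second = nameTable.Add(new string("MSGBODY".ToCharArray()));   // equal value, separate object going in

// Same characters, but never passed through the table.
string outsider = new string("MSGBODY".ToCharArray());

Console.WriteLine(object.ReferenceEquals(first, second));    // True: both atomized
Console.WriteLine((object)first == (object)second);          // True: the older cast-to-object form
Console.WriteLine(object.ReferenceEquals(first, outsider));  // False: different instances
Console.WriteLine(first == outsider);                        // True: string == compares values
```

This is also why the method atomizes "MSGBODY" through reader.NameTable rather than comparing against a plain string literal by reference.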

Source: https://www.hanselman.com/blog/postprocessing-autoclosed-sgml-tags-with-the-sgmlreader
