Parsing XML Files with PowerShell

Article link: https://blog.csdn.net/bihailan123/article/details/6455715

When you use Windows PowerShell for lightweight software test automation, one of the most common tasks is parsing data from XML files. For example, you may want to extract test case input and expected result data from an XML test cases file, or pull results data out of an XML test results file. Compared to parsing a flat text file, parsing most XML files is a bit tricky because of XML's hierarchical structure. There are several approaches you can take when parsing XML with PowerShell. In general, the most flexible technique is to read the entire XML file into memory as an XmlDocument object and then use methods such as SelectNodes(), SelectSingleNode(), GetAttribute(), and get_InnerXml() to parse the object in memory. Let me demonstrate with a typical example. Suppose you want to parse this dummy XML test case data file:

<?xml version="1.0" ?>
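<testCases>
  <testCase id="001">
    <inputs>
      <arg1 optional="no">3</arg1>
      <arg2>4</arg2>
    </inputs>
    <expected>7</expected>
  </testCase>
  <testCase id="002">
    <inputs>
      <arg1 optional="yes">5</arg1>
      <arg2>6</arg2>
    </inputs>
    <expected>11</expected>
  </testCase>
</testCases>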

The dummy file represents test case data for a hypothetical Sum() method. The PowerShell script listed further below parses the XML file and produces this output:

PS C:\XMLwithPowerShell> .\parseXML.ps1

Parsing file testCases.xml

Case ID = 001 Arg1 = 3 Optional = no Arg2 = 4 Expected value = 7
Case ID = 002 Arg1 = 5 Optional = yes Arg2 = 6 Expected value = 11

End parsing

The complete script is:

# parseXML.ps1

write-host "`nParsing file testCases.xml`n"
[System.Xml.XmlDocument] $xd = new-object System.Xml.XmlDocument
$file = resolve-path("testCases.xml")
$xd.load($file)

$nodelist = $xd.selectnodes("/testCases/testCase") # XPath is case sensitive
foreach ($testCaseNode in $nodelist) {
  $id = $testCaseNode.getAttribute("id")
  $inputsNode = $testCaseNode.selectSingleNode("inputs")
  $arg1 = $inputsNode.selectSingleNode("arg1").get_InnerXml()
  $optional = $inputsNode.selectSingleNode("arg1").getAttribute("optional")
  $arg2 = $inputsNode.selectSingleNode("arg2").get_InnerXml()
  $expected = $testCaseNode.selectSingleNode("expected").get_innerXml()
  #$expected = $testCaseNode.expected
  write-host "Case ID = $id Arg1 = $arg1 Optional = $optional Arg2 = $arg2 Expected value = $expected"
}

write-host "`nEnd parsing`n"

After the opening banner message, the next three statements of the script load the file testCases.xml into memory as an XmlDocument object:

[System.Xml.XmlDocument] $xd = new-object System.Xml.XmlDocument
$file = resolve-path("testCases.xml")
$xd.load($file)

I could have loaded the XML file in a single line like so:

[xml] $xd = get-content "./testCases.xml"
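The two loading styles can be compared directly. The following is a minimal sketch, assuming a throwaway file named loadDemo.xml in the temp directory (a hypothetical name and a cut-down document, used only so the sketch runs without testCases.xml being present):

```powershell
# Write a tiny sample document to a temp file (hypothetical file name and contents).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'loadDemo.xml'
Set-Content -Path $path -Value '<testCases><testCase id="001" /></testCases>'

# Style 1: create the XmlDocument explicitly, then call Load().
$xd1 = New-Object System.Xml.XmlDocument
$xd1.Load($path)

# Style 2: let the [xml] type accelerator convert the file's text.
[xml] $xd2 = Get-Content $path

# Both variables hold the same kind of object with the same content.
Write-Host $xd1.GetType().FullName
Write-Host $xd2.GetType().FullName
Write-Host ($xd1.OuterXml -eq $xd2.OuterXml)

# Clean up the temp file.
Remove-Item $path
```

Either variable can then be passed to the SelectNodes() call used in the rest of the script.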

The three-statement approach in the script has no technical advantage here, but it is somewhat more readable to an engineer with C# coding experience. Next I fetch all the testCase nodes into memory:

$nodelist = $xd.selectnodes("/testCases/testCase")

The SelectNodes() method accepts an XPath string which is case-sensitive. With the testCase nodes now in memory I can iterate through each node with a foreach loop. Alternatively I could have iterated using a for loop with an index variable (say $i) in conjunction with the Item() method. For each node, I first fetch the test case ID attribute:

$id = $testCaseNode.getAttribute("id")
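The index-based alternative and the case sensitivity of XPath mentioned above can both be checked with a short sketch. The sample data is inlined as a here-string so the block runs on its own; that shortcut and the variable names $ids and $wrongCase are illustrative assumptions, not part of the original script:

```powershell
# Inline sample data with the same shape as testCases.xml.
[xml] $xd = @'
<testCases>
  <testCase id="001"><inputs><arg1 optional="no">3</arg1><arg2>4</arg2></inputs><expected>7</expected></testCase>
  <testCase id="002"><inputs><arg1 optional="yes">5</arg1><arg2>6</arg2></inputs><expected>11</expected></testCase>
</testCases>
'@

$nodelist = $xd.SelectNodes("/testCases/testCase")

# Index-based iteration: Count is the number of nodes, Item($i) returns the i-th node.
$ids = @()
for ($i = 0; $i -lt $nodelist.Count; $i++) {
    $node = $nodelist.Item($i)
    $ids += $node.GetAttribute("id")
}
Write-Host "IDs: $($ids -join ', ')"

# A wrongly cased path matches no nodes; it does not raise an error.
$wrongCase = $xd.SelectNodes("/testcases/testcase")
Write-Host "Nodes matched by wrong-case path: $($wrongCase.Count)"
```

Because a mis-cased path silently returns an empty list, checking Count before the loop is a cheap way to catch a typo in the XPath string.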

I use the GetAttribute() method of the XmlElement class. Interestingly I could have written this instead:

$id = $testCaseNode.id

This alternative illustrates an important point. To make parsing XML easier in PowerShell than in C# or VB.NET, the designers of PowerShell decided to expose the attributes and child values of XML elements directly as properties. But because arbitrary XML data is surfaced as properties, PowerShell does not expose standard .NET Framework properties (such as InnerXml), since a name in the XML data could conflict with them. Note that PowerShell does expose standard .NET Framework methods such as GetAttribute(). Continuing in my script, next I grab the value of arg1 and its optional attribute:

$inputsNode = $testCaseNode.selectSingleNode("inputs")
$arg1 = $inputsNode.selectSingleNode("arg1").get_InnerXml()
$optional = $inputsNode.selectSingleNode("arg1").getAttribute("optional")
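The property-style access discussed earlier can be tried side by side with the method calls used in the script. This is a minimal sketch with a one-case sample inlined as a here-string (an illustrative shortcut; the script itself reads testCases.xml from disk), and the variable names are chosen only for this comparison:

```powershell
# Inline sample data: a single test case with the same shape as testCases.xml.
[xml] $xd = @'
<testCases>
  <testCase id="001"><inputs><arg1 optional="no">3</arg1><arg2>4</arg2></inputs><expected>7</expected></testCase>
</testCases>
'@

$testCaseNode = $xd.SelectSingleNode("/testCases/testCase")

# Method style, as in the script.
$idViaMethod = $testCaseNode.GetAttribute("id")
$arg1ViaMethod = $testCaseNode.SelectSingleNode("inputs/arg1").get_InnerXml()

# Property style: attributes and child elements are reachable by name.
$idViaProperty = $testCaseNode.id
$optionalViaProperty = $testCaseNode.inputs.arg1.optional
# A child element that holds only text comes back as a plain string.
$expectedViaProperty = $testCaseNode.expected

Write-Host "id: $idViaMethod / $idViaProperty"
Write-Host "arg1: $arg1ViaMethod (optional = $optionalViaProperty)"
Write-Host "expected: $expectedViaProperty"
```

Property style is shorter for simple lookups, while the method style keeps working when an XPath expression is needed to pick out the node.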

I use the SelectSingleNode() method to grab the single <inputs> node. Then, instead of the standard InnerXml property, which PowerShell does not expose, I call get_InnerXml(), the underlying accessor method that corresponds to the non-exposed InnerXml property. OK, but just how did I know about this get_InnerXml() method? As with many PowerShell scripting tasks, before writing my script I experimented by issuing interactive commands at the PowerShell prompt. For example, after interactively loading the XML file into memory (by typing the three loading statements from my script), I typed commands such as:

> $nodelist = $xd.selectnodes("/testCases/testCase")
> $firstnode = $nodelist.item(0)
> $inputs = $firstnode.selectSingleNode("inputs")
> $arg1 = $inputs.selectSingleNode("arg1")
> $arg1 | get-member | more
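The last command above pages through every member of the node. A narrower variant is sketched here, under the assumption that accessor methods may be left out of the default listing on some PowerShell versions (hence the -Force switch). The inline sample data and the *Inner* name filter are illustrative choices, not part of the original session:

```powershell
# Inline sample data so the sketch runs without testCases.xml.
[xml] $xd = '<testCases><testCase id="001"><inputs><arg1 optional="no">3</arg1><arg2>4</arg2></inputs><expected>7</expected></testCase></testCases>'
$arg1 = $xd.SelectSingleNode("/testCases/testCase/inputs/arg1")

# -Force includes members that Get-Member leaves out of its default listing.
# What appears can differ between PowerShell versions, so no output is shown here.
$arg1 | Get-Member -Force |
    Where-Object { $_.Name -like '*Inner*' } |
    Select-Object Name, MemberType

# Whatever the listing shows, the accessor method can be called directly.
$value = $arg1.get_InnerXml()
Write-Host "arg1 = $value"
```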

Using the get-member cmdlet is the key to discovering exactly which properties and methods are available on an object. The rest of the script should be reasonably self-explanatory because it uses the same coding techniques.

To summarize, although there are several ways to parse an XML file using PowerShell, a flexible approach is to use the XmlDocument class. After reading an XML file into memory as an XmlDocument object, you can select multiple nodes into a collection with the SelectNodes() method, grab a single node with the SelectSingleNode() method, retrieve an attribute using either the standard GetAttribute() method or the attribute's name, which PowerShell exposes as a property, and obtain an element value using the get_InnerXml() method.
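To tie the techniques back to the test automation scenario from the introduction, the parsed values might feed a small harness like the one below. This is only a sketch: the Sum function is a hypothetical stand-in for the method under test, the sample data is inlined so the block runs on its own, and the Pass/FAIL wording is an assumption rather than anything from the original script.

```powershell
# Hypothetical function under test; stands in for the Sum() method the data file targets.
function Sum([int] $a, [int] $b) { return $a + $b }

# Inline sample data with the same shape as testCases.xml.
[xml] $xd = @'
<testCases>
  <testCase id="001"><inputs><arg1 optional="no">3</arg1><arg2>4</arg2></inputs><expected>7</expected></testCase>
  <testCase id="002"><inputs><arg1 optional="yes">5</arg1><arg2>6</arg2></inputs><expected>11</expected></testCase>
</testCases>
'@

# Parse each case, call the function, and compare against the expected value.
$results = foreach ($tc in $xd.SelectNodes("/testCases/testCase")) {
    $arg1 = [int] $tc.SelectSingleNode("inputs/arg1").get_InnerXml()
    $arg2 = [int] $tc.SelectSingleNode("inputs/arg2").get_InnerXml()
    $expected = [int] $tc.SelectSingleNode("expected").get_InnerXml()
    $actual = Sum $arg1 $arg2
    if ($actual -eq $expected) { $verdict = "Pass" } else { $verdict = "FAIL" }
    "Case $($tc.GetAttribute('id')): $verdict"
}
$results
```

The [int] casts matter because get_InnerXml() returns text; without them the comparison would be between strings rather than numbers.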