Parsing XML Files with PowerShell

In the context of using Windows PowerShell for lightweight software test automation, one of the most common tasks you need to perform is parsing data from XML files. For example, you may want to extract test case input and expected result data from an XML test cases file, or you might want to pull out results data from an XML test results file. Compared to parsing a flat text file, parsing most XML files is a bit tricky because of XML's hierarchical structure. There are several approaches you can take when parsing XML with PowerShell. In general, the most flexible technique is to read the entire XML file into memory as an XmlDocument object and then use methods such as SelectNodes(), SelectSingleNode(), GetAttribute(), and get_InnerXml() to parse the object in memory. Let me demonstrate with a typical example. Suppose you want to parse this dummy XML test case data file:
 
<?xml version="1.0" ?>
<testCases>
 <testCase id="001">
  <inputs>
    <arg1 optional="no">3</arg1>
    <arg2>4</arg2>
  </inputs>
  <expected>7</expected>
 </testCase>
 <testCase id="002">
  <inputs>
    <arg1 optional="yes">5</arg1>
    <arg2>6</arg2>
  </inputs>
  <expected>11</expected>
 </testCase>
</testCases>
 
The dummy file represents test case data for a hypothetical Sum() method. A PowerShell script that parses the XML file produces this output:
 
PS C:/XMLwithPowerShell> ./parseXML.ps1
Parsing file testCases.xml
Case ID = 001 Arg1 = 3 Optional = no Arg2 = 4 Expected value = 7
Case ID = 002 Arg1 = 5 Optional = yes Arg2 = 6 Expected value = 11
End parsing
 
The complete script is:
 
# parseXML.ps1
write-host "`nParsing file testCases.xml`n"
[System.Xml.XmlDocument] $xd = new-object System.Xml.XmlDocument
$file = resolve-path("testCases.xml")
$xd.load($file)
$nodelist = $xd.selectnodes("/testCases/testCase") # XPath is case sensitive
foreach ($testCaseNode in $nodelist) {
  $id = $testCaseNode.getAttribute("id")
  $inputsNode = $testCaseNode.selectSingleNode("inputs")
  $arg1 = $inputsNode.selectSingleNode("arg1").get_InnerXml()
  $optional = $inputsNode.selectSingleNode("arg1").getAttribute("optional")
  $arg2 = $inputsNode.selectSingleNode("arg2").get_InnerXml()
  $expected = $testCaseNode.selectSingleNode("expected").get_innerXml()
  #$expected = $testCaseNode.expected 
  write-host "Case ID = $id Arg1 = $arg1 Optional = $optional Arg2 = $arg2 Expected value = $expected"
}
write-host "`nEnd parsing`n"
 
The first three statements of the script load file testCases.xml into memory as an XmlDocument object:
 
[System.Xml.XmlDocument] $xd = new-object System.Xml.XmlDocument
$file = resolve-path("testCases.xml")
$xd.load($file)
 
I could have loaded the XML file in a single line like so:
 
[xml] $xd = get-content "./testCases.xml"
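 
For quick experiments it can be handy to skip the file altogether. The following is a minimal sketch rather than part of the original script: it parses an inline here-string, once with the [xml] type accelerator and once with the LoadXml() method of XmlDocument. The variable names $xmlText, $xdInline, and $xdExplicit are my own choices for illustration.

```powershell
# Sketch: build an XmlDocument from an inline string instead of a file.
# $xmlText, $xdInline and $xdExplicit are illustrative names only.
$xmlText = @'
<?xml version="1.0" ?>
<testCases>
 <testCase id="001">
  <inputs>
    <arg1 optional="no">3</arg1>
    <arg2>4</arg2>
  </inputs>
  <expected>7</expected>
 </testCase>
</testCases>
'@

# Option 1: the [xml] type accelerator parses the string directly.
[xml] $xdInline = $xmlText

# Option 2: the explicit .NET route. LoadXml() parses a string,
# whereas Load() reads from a path or stream.
$xdExplicit = New-Object System.Xml.XmlDocument
$xdExplicit.LoadXml($xmlText)

# Both variables hold System.Xml.XmlDocument objects.
$xdInline.GetType().FullName
$xdExplicit.SelectNodes("/testCases/testCase").Count
```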
 
Using the three-statement approach in the script has no technical advantage but is somewhat more readable by an engineer with C# coding experience. Next I fetch all the testCase nodes into memory:
 
$nodelist = $xd.selectnodes("/testCases/testCase")
 
The SelectNodes() method accepts an XPath string which is case-sensitive. With the testCase nodes now in memory I can iterate through each node with a foreach loop. Alternatively I could have iterated using a for loop with an index variable (say $i) in conjunction with the Item() method. For each node, I first fetch the test case ID attribute:
 
$id = $testCaseNode.getAttribute("id")
 
I use the GetAttribute() method of the XmlElement class. Interestingly I could have written this instead:
 
$id = $testCaseNode.id
 
This alternative illustrates an important point. In an effort to make parsing XML with PowerShell easier than with C# or VB.NET, the designers of PowerShell decided to directly expose attributes and values of XML elements in the form of properties. But since arbitrary XML data is available as properties, PowerShell does not expose standard .NET Framework properties (such as InnerXml) because there could be a name conflict. Note that PowerShell does expose standard .NET Framework methods such as GetAttribute(). Continuing in my script, next I grab the values of arg1:
 
$inputsNode = $testCaseNode.selectSingleNode("inputs")
$arg1 = $inputsNode.selectSingleNode("arg1").get_InnerXml()
$optional = $inputsNode.selectSingleNode("arg1").getAttribute("optional")
 
I use the SelectSingleNode() method to grab the single <inputs> node. Now instead of using the standard InnerXml property, which PowerShell does not expose, I use the underlying get_InnerXml() accessor method, which corresponds to the non-exposed InnerXml property. OK, but just how did I know about this get_InnerXml() method? As with many PowerShell scripting tasks, before writing my script I had experimented by issuing interactive commands at the PowerShell prompt. For example, after interactively loading the XML file into memory (by typing the first three statements in my script), I typed commands such as:
 
> $nodelist = $xd.selectnodes("/testCases/testCase")
> $firstnode = $nodelist.item(0)
> $inputs = $firstnode.selectSingleNode("inputs")
> $arg1 = $inputs.selectSingleNode("arg1")
> $arg1 | get-member | more
 
Using the get-member cmdlet is the key to discovering exactly which properties and methods an object makes available. The rest of the script should be reasonably self-explanatory because it uses the same coding techniques.
 
To summarize, although there are several ways to parse an XML file using PowerShell, a flexible approach is to use the XmlDocument class. After reading an XML file into memory as an XmlDocument object, you can select multiple nodes into a collection using the SelectNodes() method, grab a single node using the SelectSingleNode() method, retrieve an attribute using either the standard GetAttribute() method or the name of the attribute, which PowerShell exposes as a property, and obtain an element value using the get_InnerXml() accessor method.
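 
The two alternatives mentioned in passing, iterating with an index and the Item() method, and reading values through the properties PowerShell exposes, can be combined into one short sketch. This is an illustration under my own assumptions, not the script from the article: the XML is inlined as a here-string so the snippet runs on its own, and $xmlText, $node, and $lines are names I introduced.

```powershell
# Sketch only: $xmlText, $node and $lines are illustrative names.
$xmlText = @'
<?xml version="1.0" ?>
<testCases>
 <testCase id="001">
  <inputs>
    <arg1 optional="no">3</arg1>
    <arg2>4</arg2>
  </inputs>
  <expected>7</expected>
 </testCase>
 <testCase id="002">
  <inputs>
    <arg1 optional="yes">5</arg1>
    <arg2>6</arg2>
  </inputs>
  <expected>11</expected>
 </testCase>
</testCases>
'@

[xml] $xd = $xmlText
$nodelist = $xd.SelectNodes("/testCases/testCase")   # XPath is case sensitive

$lines = @()
for ($i = 0; $i -lt $nodelist.Count; $i++) {
  $node = $nodelist.Item($i)               # Item() takes a zero-based index
  $id = $node.id                           # attribute exposed as a property
  $arg1 = $node.inputs.arg1.'#text'        # arg1 has an attribute, so its text sits under '#text'
  $optional = $node.inputs.arg1.optional
  $arg2 = $node.inputs.arg2                # text-only element comes back as a string
  $expected = $node.expected
  $lines += "Case ID = $id Arg1 = $arg1 Optional = $optional Arg2 = $arg2 Expected value = $expected"
}

$lines | ForEach-Object { Write-Host $_ }
```

The '#text' lookup is needed only for arg1: an element that carries an attribute comes back as an XmlElement rather than a plain string, whereas arg2 and expected contain nothing but text and are returned as strings directly.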