EfCom's Software Development Blog

05/09/2009: Working with XML in .NET

 

 

XML is a technology whose use is growing rapidly. Almost all web applications use XML as their main format for data exchange, and more and more desktop applications choose it as their main data format. Because of this growing popularity, Microsoft gives XML much better support in the new version of the .NET Framework.


In .NET 1.0, if you wanted to read an XML file, you would instantiate an XmlTextReader or an XmlValidatingReader object and then work with it through the abstract class XmlReader. In other words:
 
XmlReader reader = new XmlTextReader("file.xml");
 
While technically this is still an option, nowadays the smarter way is to use the static Create method that XmlReader provides:
 
XmlReader reader = XmlReader.Create("file.xml");
 
What is it good for? It may not look like much of a difference, but it is one. This change lets the .NET developers (that's right, I'm talking about Microsoft) quietly change the implementation behind XmlReader and hand you the reader best suited to your needs, without anything changing in your code.
 
But just a moment, you might say, there is a reason for the different XML-reading classes. What do you do if, say, you want to use a more sophisticated XML reader, one which verifies that the XML is compatible with a specific schema?
 
XmlReaderSettings
 
I won't write anything about XML schemas themselves, primarily because they have little to do with programming in the .NET Framework. I bring up the question because it gives me a good reason to introduce XmlReaderSettings, the new class that determines how XmlReader behaves today.
 
When you create an XmlReader, you can pass an XmlReaderSettings object as a parameter. This object holds all kinds of settings that influence the behavior of the XmlReader. For example:
 
XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreComments = true;
settings.IgnoreWhitespace = true;
settings.IgnoreProcessingInstructions = true;
settings.ValidationType = ValidationType.None;
settings.CheckCharacters = true;

 
And so on. You hand this object to XmlReader.Create simply as a parameter:
 
XmlReader reader = XmlReader.Create("file.xml", settings);
 
Naturally, all these settings have default values, so if you look carefully you will see that I did a few unnecessary things when setting the XmlReaderSettings values above: ValidationType.None and CheckCharacters = true are already the defaults.
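
To see what the Ignore settings actually do, here is a minimal sketch. The class name SettingsSketch, the method ReadNodes and the in-memory XML string are my own assumptions for illustration; only XmlReaderSettings and XmlReader.Create come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of System.Xml.
public static class SettingsSketch
{
    // Returns "NodeType:Name" for every node that Read() surfaces.
    public static List<string> ReadNodes(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = true;               // default is false
        settings.IgnoreWhitespace = true;             // default is false
        settings.IgnoreProcessingInstructions = true; // default is false

        List<string> seen = new List<string>();
        // A StringReader stands in for "file.xml" so the sketch is self-contained.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                seen.Add(reader.NodeType + ":" + reader.Name);
            }
        }
        return seen;
    }
}
```

With these settings, the comment, the processing instruction and the whitespace between elements never show up in the loop; only element and end-element nodes remain.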
 
Schema Validation
 
Here are a few lines on how to check that your XML file conforms to a schema. I need to add the following lines to the settings defined earlier:
 
settings.Schemas.Add(null, "schema.xsd");
settings.ValidationType = ValidationType.Schema;

 
and then read the XML file the usual way. The first parameter, null here, stands for the target namespace of the schema (the targetNamespace attribute that appears in the schema). If you want to avoid the trouble of specifying it, just pass null, and the right value will be taken from the schema itself.
 
Since you usually don't want schema problems to reach the user as raw errors, you can attach a method of your own to the ValidationEventHandler event on XmlReaderSettings and decide there what to do with each problem.
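
Here is a rough sketch of how the pieces fit together. The schema, the class name ValidationSketch and the method Validate are made up for the example, and the schema is loaded from a string rather than from schema.xsd so that the code runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper, not part of System.Xml.
public static class ValidationSketch
{
    // Assumed schema for the example: a single <Amount> element holding an integer.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='Amount' type='xs:int'/>" +
        "</xs:schema>";

    // Reads the document and returns every validation problem that was reported.
    public static List<string> Validate(string xml)
    {
        List<string> problems = new List<string>();

        XmlReaderSettings settings = new XmlReaderSettings();
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            // null: take the target namespace from the schema itself
            settings.Schemas.Add(null, schemaReader);
        }
        settings.ValidationType = ValidationType.Schema;

        // Our own handler collects the problems instead of letting an exception escape.
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            problems.Add(e.Severity + ": " + e.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as the reader advances
        }
        return problems;
    }
}
```

Calling ValidationSketch.Validate on a document whose Amount is not a number should return at least one entry, while a valid document should return an empty list.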
 
And what about the reading itself?
 
The reading itself is done in the same simple loop as before:
 
while (reader.Read())
{ }
reader.Close();

 
There is still something new here. Before, if we wanted to read non-textual data from the XML, we had to read it as a string and then cast or parse it, or use the relevant method of the System.Xml.XmlConvert class. Now we can simply use ReadContentAs[whatever], for example ReadContentAsDouble or ReadContentAsInt.
 
Another improvement concerns the cursor position. To perform these conversions, we first had to make sure the cursor was positioned on the text we wanted to convert and only then call the ReadContentAs method. That is, if we have the following element:
 
<Amount>2</Amount>
 
we would have to place the cursor on the 2 to read it. This is rather annoying, because we first have to make sure the cursor is on the Amount element and then move it forward to reach the 2, with code like:
 
if (reader.NodeType == XmlNodeType.Element && reader.Name == "Amount")
{
    reader.Read(); // move the cursor to the text of the element
    int i = reader.ReadContentAsInt();
}

 
Now we can read the content of the element directly and convert it in a single step:
 
if (reader.NodeType == XmlNodeType.Element && reader.Name == "Amount")
    reader.ReadElementContentAsInt();
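
One thing to watch for: ReadElementContentAsInt consumes the whole element and leaves the reader on the node that follows the closing tag. If you call it inside a plain while (reader.Read()) loop, the next Read() can step over an element that comes immediately afterwards. The sketch below is one way around that; the class name AmountSketch, the method SumAmounts and the sample XML are my own inventions for the example.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of System.Xml.
public static class AmountSketch
{
    // Adds up the values of all <Amount> elements in the document.
    public static int SumAmounts(string xml)
    {
        int total = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // skip to the root element
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Amount")
                {
                    // Reads the element and advances past </Amount>,
                    // so we must not call Read() again in this branch.
                    total += reader.ReadElementContentAsInt();
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return total;
    }
}
```

The loop advances the reader exactly once per iteration, either through ReadElementContentAsInt or through Read, so two Amount elements sitting side by side are both counted.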

 
And finally, keep in mind that XmlReader is useful for forward-only reading. If you want to move backwards and forwards through the XML file, it is probably better to load the file into an XmlDocument and then work on it with its more advanced navigational tools.
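
For comparison, here is a small sketch of the XmlDocument route. The class name DocumentSketch, the method ModuleIds and the sample XML are assumptions made up for the example; XmlDocument, LoadXml, SelectNodes and ParentNode are the real framework members.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of System.Xml.
public static class DocumentSketch
{
    // Returns "ParentName/ID" for every Module element in the document.
    public static List<string> ModuleIds(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // the whole document now lives in memory

        List<string> ids = new List<string>();
        // XPath query: every Module element anywhere in the tree
        foreach (XmlNode node in doc.SelectNodes("//Module"))
        {
            // ParentNode walks back up the tree, which XmlReader cannot do.
            ids.Add(node.ParentNode.Name + "/" + node.Attributes["ID"].Value);
        }
        return ids;
    }
}
```

The price of this freedom is memory: the whole file is loaded before you can query it, which is why XmlReader remains the better choice for large files that you only need to scan once.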
 
You may have run into other XML parsing tools. The most famous among them is MSXML (versions 2 to 4). They are all written in C++ as unmanaged code, which should be very efficient. The prevailing opinion is that System.Xml in .NET 2.0 comes very close to MSXML in efficiency and offers the same functionality.
 
Though one would have to look into it more thoroughly to draw conclusions about the performance of System.Xml in .NET 2.0, you are probably better off sacrificing some efficiency when working with XML files in exchange for the better stability that comes with managed code.
 
The support for XML in .NET has always been very broad, and in .NET 2.0 it has gotten even better. We could talk forever about navigating an XmlDocument that you have loaded into memory.
 
 
A bit more about navigation through an XmlDocument
 
ReadSubtree
 
We have seen that if we want to read a tree with XmlReader, we begin with:
 
XmlReader reader = XmlReader.Create("file.xml");
 
And then perform a loop:
 
while (reader.Read())
{ }

 
This simple, forward-only reading imposes severe limitations on us, but in most cases a more sophisticated reading object (like XmlDocument) is colossal overkill. Suppose, for example, that we want to perform actions on several children of a specific tree node at once. Creating several parallel XmlReaders would be an ugly solution. Besides, the most natural way to read a child is with a loop like the one above, but then we would keep reading the tree long after we had left the child. So should we add tests that check at every step whether we have reached the end element that closes the child? It is better to use ReadSubtree instead:
 
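while (reader.Read())
{
    // check the node type too: the name also matches the closing </Modules> tag
    if (reader.NodeType == XmlNodeType.Element && reader.Name == "Modules")
    {
        XmlReader moduleReader = reader.ReadSubtree();
        DealWithModules(moduleReader);
    }
}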

 
And we can deal with the subtree “Modules”:
 
public void DealWithModules(XmlReader moduleReader)
{
    while (moduleReader.Read())
    { }
}

 
moduleReader stops returning nodes when it reaches the end of the Modules subtree, which is a far more elegant solution.
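
Put together as something you can run, the idea looks roughly like this. The class name SubtreeSketch, the method ElementsInsideModules and the extra list parameter on DealWithModules are my own additions for the example. The subtree reader is wrapped in using, because the original reader is only guaranteed to be positioned on the closing tag of the subtree once the subtree reader has been closed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of System.Xml.
public static class SubtreeSketch
{
    // Returns the names of all elements found inside <Modules>, including Modules itself.
    public static List<string> ElementsInsideModules(string xml)
    {
        List<string> names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Modules")
                {
                    // Closing the subtree reader leaves the outer reader on </Modules>.
                    using (XmlReader moduleReader = reader.ReadSubtree())
                    {
                        DealWithModules(moduleReader, names);
                    }
                }
            }
        }
        return names;
    }

    private static void DealWithModules(XmlReader moduleReader, List<string> names)
    {
        // This loop ends by itself at the end of the subtree.
        while (moduleReader.Read())
        {
            if (moduleReader.NodeType == XmlNodeType.Element)
            {
                names.Add(moduleReader.Name);
            }
        }
    }
}
```

Elements that come after the closing Modules tag are never seen by DealWithModules; the outer loop picks them up once the subtree reader is done.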
 
 
 
Working with Attributes
 
You may have noticed something else: when you use its Read method, XmlReader does not go over attributes. That is, if our XML file contains the following lines:
 
<Folder Name="TESTER" ID="8" Public="True">
<Modules>
         <Module ID="212" OwnerID="?" FolderID="8" />

 
and we call Read on the XmlReader that is reading the file, we will first point at the element Folder, then the element Modules, and then the element Module. The XmlReader may stop at whitespace nodes, but it does not stop at attributes like “Name” or “ID”.
 
Fortunately, we can make it go over every attribute. First we check whether the current element has any attributes at all:
 
if (reader.AttributeCount > 0)
 
If it does, we can iterate over all of them, starting with the first one (MoveToNextAttribute then advances to each of the others):
 
reader.MoveToFirstAttribute();
 
We can also jump to a specific attribute with reader.MoveToAttribute, or get the value of any attribute on the current node directly with reader.GetAttribute, which may make you wonder why we would iterate over the attributes at all.
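
Here is a sketch of both approaches side by side. The class name AttributeSketch and the methods ListAttributes and FirstModuleId are hypothetical names chosen for the example; the reader members they call are the ones discussed above, plus MoveToNextAttribute and MoveToElement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of System.Xml.
public static class AttributeSketch
{
    // Iteration: returns "Element@Attribute=Value" for every attribute in the document.
    public static List<string> ListAttributes(string xml)
    {
        List<string> found = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.AttributeCount == 0)
                {
                    continue;
                }
                string element = reader.Name;
                reader.MoveToFirstAttribute();
                do
                {
                    found.Add(element + "@" + reader.Name + "=" + reader.Value);
                } while (reader.MoveToNextAttribute());
                reader.MoveToElement(); // step back from the last attribute to its element
            }
        }
        return found;
    }

    // Direct lookup: returns the ID of the first Module element, or null if there is none.
    public static string FirstModuleId(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Module")
                {
                    return reader.GetAttribute("ID"); // the cursor stays on the element
                }
            }
        }
        return null;
    }
}
```

Iteration is the tool for when you do not know the attribute names in advance; when you do know them, GetAttribute is shorter and leaves the cursor where it was.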

 

 

Thanks to Idan Zairman for this blog

 



 