Monday, 28 November 2011

Transforming XML documents with XSLT

The XSLT transformation pipeline involves four elements: the source XML document(s), the XSLT stylesheet(s), an XSLT processor (the template processing engine), and the resulting document(s).

Strictly speaking, an XSLT transformation doesn’t modify the source XML document; it uses it as input for creating other documents. The output can range from plain text files to PDF files: XSLT’s templating mechanism can emit XML, HTML or plain text directly, while PDF output is usually obtained by generating XSL Formatting Objects (XSL-FO) and rendering them with a separate FO processor.
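
As a small illustration of the plain text case, here is a minimal sketch. It assumes an input document whose root element ‘movies’ contains ‘movie’ elements, each with a ‘title’ child; that layout is an assumption made for the example. The stylesheet writes one title per line and no markup at all:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- method="text" serializes only text: no tags, no XML declaration -->
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- assumed input layout: movies/movie/title -->
        <xsl:for-each select="movies/movie">
            <xsl:value-of select="title"/>
            <!-- &#10; is a line feed, so each title ends its own line -->
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

The only change needed to switch between XML and text output is the method attribute of xsl:output; the templates themselves work the same way.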

Here’s an XSLT stylesheet which creates a new XML file based on movies.xml and adds to each movie an element representing its ‘value’ (its rating divided by its price):
 1 <?xml version="1.0" encoding="ISO-8859-1"?>
 2 <xsl:stylesheet version="1.0"
 3     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 4
 5 <xsl:output method="xml" version="1.0" indent="yes"/>
 6     <xsl:template match="/">
 7         <movies>
 8         <xsl:for-each select="movies/movie">
 9             <movie>
10             <xsl:copy-of select="./*"/>
11             <value><xsl:value-of select="rating div price"/></value>
12             </movie>
13         </xsl:for-each>
14         </movies>
15     </xsl:template>
16 </xsl:stylesheet>
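
For reference, the stylesheet expects an input shaped roughly like the sketch below. The element names come from the text, and the two titles are the ones mentioned in this post, but the rating and price values are invented purely for illustration:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Hypothetical sample input: rating and price values are made up -->
<movies>
    <movie>
        <title>Year One</title>
        <year>2009</year>
        <rating>5</rating>
        <price>10</price>
    </movie>
    <movie>
        <title>Apocalypto</title>
        <year>2006</year>
        <rating>8</rating>
        <price>4</price>
    </movie>
</movies>
```

With these made-up numbers, the first movie would get <value>0.5</value> (5 div 10) and the second <value>2</value> (8 div 4).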

Notes
-         this solution is not ideal, since some element names are hard-coded (‘movies’ and ‘movie’); ideally they would be taken from the source document
-         xsl:for-each is used on L08 to select each ‘movie’ node in turn, in the order they appear in the original XML file (‘document order’)
-         xsl:copy-of is used on L10 to create copies of all the child elements of the current node. This statement will copy the ‘title’, ‘year’, etc. elements for each ‘movie’ element.
-         on L11, the ‘value’ element is created. xsl:value-of is used to insert the result of dividing the value of the ‘rating’ element by the value of the ‘price’ element of the current node.
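
One possible way around the hard-coded names from the first note is the well-known ‘identity transform’ pattern. The sketch below is one option among several: it copies everything as-is and adds a ‘value’ element to any element that has both a ‘rating’ and a ‘price’ child, whatever that element or its parent is called. The names ‘rating’ and ‘price’ are still hard-coded, since the calculation depends on them.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" version="1.0" indent="yes"/>
    <!-- drop whitespace-only text nodes from the input so indent="yes" can re-indent cleanly -->
    <xsl:strip-space elements="*"/>

    <!-- identity template: copies every attribute and node unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- more specific template: any element with rating and price children
         is copied the same way, then gets a value element appended -->
    <xsl:template match="*[rating and price]">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
            <value><xsl:value-of select="rating div price"/></value>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

The second template wins over the identity template for matching elements because a pattern with a predicate has a higher default priority than node().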

Microsoft’s msxsl tool (http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21714) can be used to transform the original XML file into a new one, with the added <value> element:
 msxsl movies.xml transform.xsl -o new_movies.xml

Sorting is accomplished by using the xsl:sort element (http://www.w3.org/TR/xslt#sorting). xsl:sort can only appear as a child of an xsl:apply-templates or an xsl:for-each element, and inside xsl:for-each it must come before any other content. To sort the movies by title, we can simply add a line after L08 in the previous XSLT stylesheet:
9         <xsl:sort select="title"/>

The resulting XML document will have its ‘movie’ elements sorted alphabetically by their ‘title’ elements. So ‘Apocalypto’ will appear first, and ‘Year One’ last.
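
Sorting is not limited to text. As a sketch, assuming we would rather see the best ‘value’ first, xsl:sort can order by the computed expression itself. The data-type="number" attribute makes the comparison numeric instead of alphabetical, and order="descending" puts the highest value at the top:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" version="1.0" indent="yes"/>

    <xsl:template match="/">
        <movies>
        <xsl:for-each select="movies/movie">
            <!-- xsl:sort has to be the first child of xsl:for-each -->
            <xsl:sort select="rating div price" data-type="number" order="descending"/>
            <movie>
            <xsl:copy-of select="./*"/>
            <value><xsl:value-of select="rating div price"/></value>
            </movie>
        </xsl:for-each>
        </movies>
    </xsl:template>
</xsl:stylesheet>
```

Without data-type="number" the values would be compared as strings, so for example ‘10’ would sort before ‘2’.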