Hi.

I have an XML file which defines stops within travelcard zones. It is structured like this:

<Zones>
<Zone>
<ZoneID>Z1</ZoneID>
<ZoneName>Zone 1</ZoneName>
<ZoneArea>
<ZoneStopsIncluded>
<StopID>49000248</StopID>
<StopID>490000011E1</StopID>
<StopID>490000143E1</StopID>
...
<StopID>9400ZZLUWSM4</StopID>
</ZoneStopsIncluded>
<ZoneStopsExcluded>
<StopID>9300TWP</StopID>
</ZoneStopsExcluded>
</ZoneArea>
</Zone>
<Zone>
<ZoneID>Z2</ZoneID>
<ZoneName>Zone 2</ZoneName>
<ZoneArea>
<ZoneStopsIncluded>
<StopID>490000167E1</StopID>
<StopID>490000220E1</StopID>
...
</ZoneStopsIncluded>
</ZoneArea>
</Zone>
</Zones>


with additional, identically structured <Zone> elements for other zones as required. However, my data source contained duplicates, which have crept into the XML, and I now need to remove them. There are quite a lot, so I don't want to do it by hand.

I wrote this Muenchian transformation to do the job:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="k1" match="StopID" use="text()"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select=" @* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="StopID">
<xsl:if test="generate-id() = generate-id(key('k1', text())[1])">
<xsl:copy>
<xsl:apply-templates select=" @* | node()"/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>

This works, but it is overzealous. Because the key is built over every StopID in the document, it removes all but the first instance of a stop code across the whole file, which isn't quite right. The rules that I should be following are:

1. a stop can be in one or more zones
2. within a zone, a stop cannot be in both the "includes" list and the "excludes" list
3. within a zone, a stop cannot occur more than once in the "includes" or "excludes" list.

So by excluding all but the first instance, I am removing stops that rule (1) allows. However, if I leave the data as it is, I am breaking rule (3). Note that I do not have any examples of stops breaking rule (2), so I am less worried about that.
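
To make the intent concrete, here is a small made-up example of what I am after. STOP_A and STOP_B are hypothetical IDs invented for illustration and are not from my real data; the comments mark what I would expect to happen to each element.

```xml
<!-- Hypothetical input: STOP_A and STOP_B are made-up stop IDs -->
<Zones>
  <Zone>
    <ZoneID>Z1</ZoneID>
    <ZoneName>Zone 1</ZoneName>
    <ZoneArea>
      <ZoneStopsIncluded>
        <StopID>STOP_A</StopID> <!-- keep: first occurrence in Z1 -->
        <StopID>STOP_B</StopID> <!-- keep -->
        <StopID>STOP_A</StopID> <!-- remove: repeated within Z1, rule (3) -->
      </ZoneStopsIncluded>
    </ZoneArea>
  </Zone>
  <Zone>
    <ZoneID>Z2</ZoneID>
    <ZoneName>Zone 2</ZoneName>
    <ZoneArea>
      <ZoneStopsIncluded>
        <StopID>STOP_A</StopID> <!-- keep: a different zone, rule (1) -->
      </ZoneStopsIncluded>
    </ZoneArea>
  </Zone>
</Zones>
```

With my current stylesheet, the STOP_A in Z2 is dropped as well, because it is not the first STOP_A in the document, so Z2 ends up with an empty includes list.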

I have tried various combinations of nested xsl:if tests and generate-id() calls that use the ZoneID together with the StopID, but I have failed dismally. Can anyone offer any help?

Many thanks.
Stuart