Stripping (HTML) tags in XSLT

As there doesn’t seem to be any built-in function in XSLT for stripping tags from strings (e.g., to remove all markup from a piece of HTML-formatted text), people came up with a recursive template-based solution, which has been posted several times on the web (e.g., here). However, I found this approach hard to use when the string to be cleaned is already stored in a variable or is produced by an xsl:value-of statement. Therefore, I turned the existing template-based solution into a function-based one, which is a bit shorter and easier to use. Here it is:

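<xsl:function name="util:strip-tags">
  <xsl:param name="text"/>
  <xsl:choose>
    <!-- A tag starts here: keep the text before '<', then recurse on
         whatever follows the '>' that closes this tag. The nested
         substring-after makes sure a '>' before the '<' is not matched. -->
    <xsl:when test="contains($text, '&lt;')">
      <xsl:value-of select="concat(substring-before($text, '&lt;'),
        util:strip-tags(substring-after(substring-after($text, '&lt;'), '&gt;')))"/>
    </xsl:when>
    <!-- No '<' left: nothing to strip. -->
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>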

Note: Don’t forget to declare a namespace for this function and bind it to a prefix (util in the above code). A function defined with xsl:function must have a prefixed name.
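
For processors limited to XSLT 1.0, where xsl:function is not available, the recursive template-based form mentioned at the top would look roughly like the sketch below. It is a reconstruction for comparison, not copied from any particular post; the file name strip10.xsl and the content element are assumptions made for illustration.

strip10.xsl (hypothetical file name):

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<!-- Recursive named template: emits $text with everything between
     a '<' and the next '>' removed. -->
<xsl:template name="strip-tags">
  <xsl:param name="text"/>
  <xsl:choose>
    <xsl:when test="contains($text, '&lt;')">
      <!-- Keep the text in front of the tag ... -->
      <xsl:value-of select="substring-before($text, '&lt;')"/>
      <!-- ... and recurse on what follows the tag's closing '>'. -->
      <xsl:call-template name="strip-tags">
        <xsl:with-param name="text"
          select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="/">
  <!-- A template call cannot appear inside an XPath expression, so the
       result has to be captured in a variable before it can be used further.
       The content element is an assumption made for this sketch. -->
  <xsl:variable name="clean">
    <xsl:call-template name="strip-tags">
      <xsl:with-param name="text" select="string(/content)"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:value-of select="normalize-space($clean)"/>
</xsl:template>

</xsl:stylesheet>

The extra xsl:variable wrapper around the call is what makes the template form awkward when the result is needed inside another expression. With the function form, the call can be written directly inside a select attribute.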

UPDATE: From the comments I see that an example might be helpful here. Well, here it is:

example.xsl:

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:util="http://whatever">

<xsl:output method="text"/>

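<xsl:function name="util:strip-tags">
  <xsl:param name="text"/>
  <xsl:choose>
    <!-- A tag starts here: keep the text before '<', then recurse on
         whatever follows the '>' that closes this tag. The nested
         substring-after makes sure a '>' before the '<' is not matched. -->
    <xsl:when test="contains($text, '&lt;')">
      <xsl:value-of select="concat(substring-before($text, '&lt;'),
        util:strip-tags(substring-after(substring-after($text, '&lt;'), '&gt;')))"/>
    </xsl:when>
    <!-- No '<' left: nothing to strip. -->
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>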

<xsl:template match="/">
<xsl:value-of select="util:strip-tags(/content)"/>
</xsl:template>

</xsl:stylesheet>

input.xml:

<?xml version="1.0" encoding="UTF-8"?>
<content>
test <some><nice><tags>xyz</tags></nice></some> test
</content>

Now I use the Saxon XSLT processor to strip the tags (inside the content element) from the input file. Note that you might need to change the path to the JAR file to make this example work for you:
java -jar /usr/share/java/saxon.jar input.xml example.xsl

The output:


test xyz test
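
One caveat about the example above: the tags inside content are real XML elements, so the string value of /content already contains no '<' characters, and a plain xsl:value-of would print the same text. The function matters when the markup arrives as text, for instance as escaped HTML. The following sketch illustrates that case, with the string first stored in a variable. The file names escaped.xml and escaped.xsl and the variable name html are hypothetical.

escaped.xml (hypothetical file name):

<?xml version="1.0" encoding="UTF-8"?>
<content>test &lt;b&gt;xyz&lt;/b&gt; test</content>

escaped.xsl (hypothetical file name):

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:util="http://whatever">

<xsl:output method="text"/>

<!-- Removes everything between a '<' and the next '>' from $text.
     The search for '>' starts after the '<' that was just found. -->
<xsl:function name="util:strip-tags">
  <xsl:param name="text"/>
  <xsl:choose>
    <xsl:when test="contains($text, '&lt;')">
      <xsl:value-of select="concat(substring-before($text, '&lt;'),
        util:strip-tags(substring-after(substring-after($text, '&lt;'), '&gt;')))"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

<xsl:template match="/">
  <!-- After parsing, the escaped markup is an ordinary string:
       "test <b>xyz</b> test". It is stored in a variable first. -->
  <xsl:variable name="html" select="string(/content)"/>
  <!-- The function can be called directly on the variable. -->
  <xsl:value-of select="util:strip-tags($html)"/>
</xsl:template>

</xsl:stylesheet>

Run the same way as before, this should print test xyz test. Here the b tags are removed by the function itself, not by the XML parser.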


5 Responses to Stripping (HTML) tags in XSLT

  1. Raju says:

    Hi,

    Could you please provide the namespace information for the util which you have used above.

    Thanks,
    Raju

    • Here is an example of how to declare a namespace, define a function within it, and make some function calls: http://www.xml.com/pub/a/2003/09/03/trxml.html.

      Does this answer your question?

      • Raju says:

        Hi,

        I am getting the following error.

        “The following application error(s) occurred:
        Failed to render content because of an error java.lang.NoSuchMethodException: For extension function, could not find method org.apache.xml.utils.NodeVector.stripTags([ExpressionContext,] ). ”

        Thanks,
        Raju

        • I have updated my post. It now gives a complete example. Any other problem must be related to your specific XSLT processor. Please understand that I cannot help you with that.
