This topic explains how to apply and configure the HTML Cleanup tool, which identifies and reports HTML/XHTML coding and structural problems in the input and (optionally) transforms the code by fixing these problems and optionally converting the code into XHTML.

Understanding HTML Cleanup

The HTML Cleanup tool identifies and reports HTML/XHTML coding and structural problems in the input. Additionally, the tool can be used to transform the code by fixing these problems and optionally converting the code into XHTML. The transformed code is returned as an output.

By default, the HTML Cleanup tool is configured to clean HTML files. To have the tool convert code to XHTML, operate on HTML fragments, or operate on ASP, JSP, and PHP files, change the configuration settings as described in Configuring HTML Cleanup.

The tool can be chained to another test so that it operates on the browser requests that occur as web scenario steps execute. Alternatively, it can be used as a standalone test that operates on the file or text specified in the Input tab of the tool configuration panel.

HTML Cleanup supports HTML 4.01 character entities.
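
As a rough illustration of what an HTML 4.01 entity check looks like, the following Python sketch lists entity references whose names are not in Python's standard html.entities.name2codepoint table, which the Python documentation describes as covering HTML4 entity names. That this table matches the set SOAtest accepts is an assumption, and the function name unknown_entities is hypothetical.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# Matches named references such as &nbsp; (numeric references are ignored).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def unknown_entities(text):
    """Return the sorted names of entity references not defined in HTML 4."""
    names = set(ENTITY_RE.findall(text))
    return sorted(name for name in names if name not in name2codepoint)


if __name__ == "__main__":
    # &apos; is defined by XML and XHTML but not by HTML 4.01, so it is listed.
    print(unknown_entities("caf&eacute;&nbsp;&copy; &apos;quoted&apos; &bogus;"))
```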

As a test suite tool, HTML Cleanup allows you to identify and clean HTML problems as part of your functional test scenario. To identify HTML problems during static analysis, use the "Check HTML Well-Formedness" rule, which is in the Cleanup HTML category. This rule has the same customization options as the HTML Cleanup tool. Note that the static analysis option does not allow you to transform the code, that is, to fix the reported problems or convert the code into XHTML.

Configuring HTML Cleanup

You can customize the options described below.

Customizing Target Document Type

You can configure the type and level of cleanup performed by changing the options listed in the HTML Cleanup configuration panel’s Target Document Type field.

The following table describes the available modes:

Option | Description | Example

HTML Fragment

Cleans HTML fragments, but does not convert them to XHTML. In this mode, SOAtest:

  • Adds missing end tags and reports if a missing end tag was added for an unknown tag.
  • Sets default values for attributes (that is, those that are "true" by default).
  • Adds quotes around attribute values.
  • Checks for non-numerical values in attributes that require numerical values.
  • Removes orphaned end tags.
SOAtest does not address general structural issues in this mode. This is the default mode.

<html>
hello world
<table WIDTH=20>

is transformed into

<html>

hello world

<table WIDTH="20"></table></html>
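
The fragment-level checks listed for this mode can be approximated with Python's standard html.parser module. This is a minimal sketch, not SOAtest's implementation: FragmentChecker, check_fragment, and the short NUMERIC_ATTRS list are hypothetical names chosen for illustration, and the sketch only reports problems instead of rewriting the input.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML 4.01.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
# Hypothetical, deliberately short list of attributes expected to be numeric.
NUMERIC_ATTRS = {"border", "cellpadding", "cellspacing", "colspan", "rowspan"}


class FragmentChecker(HTMLParser):
    """Collects fragment-level problems; it does not rewrite the input."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # start tags still waiting for an end tag
        self.problems = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in NUMERIC_ATTRS and not (value or "").isdigit():
                self.problems.append(f"non-numeric value {value!r} for {name} on <{tag}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # e.g. the implicit end of <br/>
        if tag not in self.open_tags:
            self.problems.append(f"orphaned end tag </{tag}>")
            return
        # Anything opened after the matching start tag was never closed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"missing end tag </{top}>")

    def close(self):
        super().close()
        while self.open_tags:
            self.problems.append(f"missing end tag </{self.open_tags.pop()}>")


def check_fragment(source):
    checker = FragmentChecker()
    checker.feed(source)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    for problem in check_fragment("<html>\nhello world\n<table WIDTH=20>"):
        print(problem)
```

For the sample input shown for this mode, the sketch reports the two missing end tags that the cleaned-up output adds.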

HTML Document

Cleans complete HTML documents, but does not convert them to XHTML. In this mode, SOAtest:

  • Performs all HTML Fragment mode actions.
  • Fixes problems with the overall document structure by ensuring that the file satisfies normal HTML requirements:
      • Documents require <HTML> <HEAD> <TITLE> </TITLE> </HEAD> <BODY> </BODY> </HTML>
      • Framesets require <HTML> <HEAD> <TITLE> </TITLE> </HEAD> <FRAMESET> </FRAMESET> </HTML>

<html>
hello world
<table WIDTH=20>

is transformed into

<html><head><title></title></head><body>
hello world
<table WIDTH="20"></table></body></html>
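
The document-structure requirement can be illustrated with a small Python check that lists which of the required elements are absent from the input. This is a sketch under stated assumptions, not SOAtest's logic: StructureScanner and missing_structure are hypothetical names, and the check only looks for the presence of start tags, not their nesting or order.

```python
from html.parser import HTMLParser


class StructureScanner(HTMLParser):
    """Records the names of all start tags seen (HTMLParser lower-cases them)."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)


def missing_structure(source):
    """Return the required structural elements that the input lacks."""
    scanner = StructureScanner()
    scanner.feed(source)
    scanner.close()
    # Frameset documents use <frameset> where ordinary documents use <body>.
    container = "frameset" if "frameset" in scanner.seen else "body"
    required = ["html", "head", "title", container]
    return [tag for tag in required if tag not in scanner.seen]


if __name__ == "__main__":
    print(missing_structure("<html>\nhello world\n<table WIDTH=20>"))
```

For the sample input, the missing elements are the ones that appear in the transformed output: head, title, and body.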

XHTML Fragment

Cleans HTML fragments and converts them to XHTML. In this mode, SOAtest:

  • Performs all HTML Fragment mode actions.
  • Moves embedded scripts and style sheets to external files when necessary.
  • Adds missing attributes for various tags (for example, it adds a missing src attribute for IMG tags).
  • Ensures that all attributes are lower case.

<html>
hello world
<table WIDTH=20>

is transformed into

<html>
hello world
<table width="20"> </table></html>
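
Some of the transformations in this mode (lower-case attribute names, quoted values, explicit values for valueless attributes, added end tags, dropped orphaned end tags) can be sketched with Python's html.parser. The names XhtmlFragmentWriter and to_xhtml_fragment are hypothetical, and this is an illustration of the idea, not SOAtest's algorithm: comments, DOCTYPE declarations, and script or style content are not handled, and whitespace in the output may differ from what the tool produces.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag; written as <tag ... /> in XHTML.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class XhtmlFragmentWriter(HTMLParser):
    """Re-serializes a fragment with XHTML-style tags and attributes."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        rendered = "".join(
            # A valueless attribute such as CHECKED becomes checked="checked".
            f' {name}="{escape(value if value is not None else name, quote=True)}"'
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{rendered} />")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # orphaned end tag: drop it
        # Close everything opened after the matching start tag, then the tag itself.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape text.
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")


def to_xhtml_fragment(source):
    writer = XhtmlFragmentWriter()
    writer.feed(source)
    writer.close()
    return "".join(writer.out)


if __name__ == "__main__":
    print(to_xhtml_fragment("<html>\nhello world\n<table WIDTH=20>"))
```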

XHTML (DTD)

Cleans HTML documents and converts them to XHTML. In this mode, SOAtest:

  • Performs all XHTML Fragment and HTML Document mode actions.
  • Attempts to convert the document to XHTML that conforms to either the default DTD (the xhtml-transitional DTD from the W3C) or the DTD you specify in the DTD Public ID and System ID fields.
  • Adds a DOCTYPE declaration.
  • Adds an XML declaration (<?xml version="1.0"?>) at the beginning of the transformed source (if the Add XML Declaration option is enabled).

<html>
hello world
<table WIDTH=20>

is transformed into

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title />
</head><body>
hello world
<table width="20">
</table></body></html>
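
The prolog that this mode adds can be sketched in Python as follows. The function names add_prolog and is_well_formed, and the parameter names, are hypothetical; they mirror the DTD Public ID, System ID, and Add XML Declaration options described above but are not part of SOAtest. The well-formedness check relies on the standard xml.etree.ElementTree parser and assumes the document contains no named HTML entities such as &nbsp;, which that parser does not resolve.

```python
import xml.etree.ElementTree as ET

XML_DECLARATION = '<?xml version="1.0"?>'
# Identifiers of the W3C XHTML 1.0 Transitional DTD, as in the example output.
DEFAULT_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN"
DEFAULT_SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"


def add_prolog(xhtml, public_id=DEFAULT_PUBLIC_ID, system_id=DEFAULT_SYSTEM_ID,
               add_xml_declaration=True):
    """Prepend the optional XML declaration and a DOCTYPE to an XHTML document."""
    doctype = f'<!DOCTYPE html PUBLIC "{public_id}"\n"{system_id}">'
    lines = [XML_DECLARATION] if add_xml_declaration else []
    lines.extend([doctype, xhtml])
    return "\n".join(lines)


def is_well_formed(document):
    """Return True if the document parses as XML (the DTD itself is not fetched)."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    body = '<html><head><title /></head><body>hello world<table width="20"></table></body></html>'
    document = add_prolog(body)
    print(document)
    print(is_well_formed(document))
```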

Saving the Transformed Files

If you want SOAtest to save the files transformed by the HTML Cleanup tool, add a Write File tool output as follows:

  1. Right-click the HTML Cleanup tool node in the Test Case Explorer tab and choose Add Output. The Add Output dialog opens.
  2. In the Add Output dialog, select Transformed Source from the left pane and Write File from the right pane, then click Finish. A Transformed Source > Write File node is added to the HTML Cleanup node.
  3. (Optional) Customize the Write File tool as described in Write File.