This topic explains how to apply and configure the HTML Cleanup tool, which identifies and reports HTML/XHTML coding and structural problems in the input. The tool can also transform the code by fixing these problems and, optionally, converting it to XHTML.

Understanding HTML Cleanup

The HTML Cleanup tool identifies and reports HTML/XHTML coding and structural problems in the input. Additionally, the tool can be used to transform the code by fixing these problems and optionally converting the code into XHTML. The transformed code is returned as an output.

By default, the HTML Cleanup tool is configured to clean HTML files. If you want the tool to convert code to XHTML, operate on HTML fragments, or operate on ASP, JSP, and PHP files, change the configuration settings as described in Configuring HTML Cleanup.

The tool can be chained to another test so that it operates on the browser requests that occur as web scenario steps execute. Alternatively, it can be used as a standalone test that operates on the file or text specified in the Input tab of the tool configuration panel.

HTML Cleanup supports HTML 4.01 character entities.
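To illustrate what limiting support to the HTML 4.01 entity set means in practice, the following is a minimal sketch in Python. It is not part of SOAtest and does not reflect how the tool is implemented. It relies on the standard library's `html.entities.name2codepoint` table, which maps HTML 4 entity names to code points; the function name `unknown_entities` is hypothetical.

```python
import re
from html.entities import name2codepoint  # HTML 4 named entities
from typing import List

# Matches named entity references such as &nbsp; or &copy;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def unknown_entities(markup: str) -> List[str]:
    """Return named entity references that HTML 4.01 does not define."""
    return [name for name in ENTITY_RE.findall(markup)
            if name not in name2codepoint]


if __name__ == "__main__":
    print(unknown_entities("&copy; 2024 &nbsp; &foo;"))
```

Numeric references such as `&#160;` are deliberately left out of this sketch, since they do not depend on the named entity set.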

As a test suite tool, it allows you to identify and clean HTML problems as part of your functional test scenario. To identify HTML problems during static analysis, use the "Check HTML Well-Formedness" rule, which is in the Cleanup HTML category. This rule has the same customization options as the HTML Cleanup tool. Note that the static analysis option does not allow you to transform the code by fixing these problems or converting it to XHTML.

Configuring HTML Cleanup

You can customize the following options:

  • Show informational messages: With the default tool configuration, this option determines whether SOAtest reports the changes made during cleanup. If you changed the HTML Cleanup configuration so that it no longer sends a "transformed source" output to the Edit tool or another tool, this option determines whether SOAtest reports the changes that are required to make the designated transformation. Messages will be reported in a results window, and can be accessed via the Window menu.

  • Process ASP, JSP, PHP files: Determines whether SOAtest attempts to perform the specified target action on available ASP, JSP, and PHP files. Note that SOAtest will ignore ASP <% ... %>, JSP <% ... %>, and PHP <? ... ?> tags in those files—as well as ignore JSP scriptlets and custom actions that dynamically generate HTML attributes or attribute values—when this option is enabled.

  • Keep embedded scripts and styles: Determines whether SOAtest attempts to extract all scripts and styles (including those that contain special characters) to external files.

  • Target Document Type: Configures the type and level of cleanup performed. For more information on the available options, see Customizing Target Document Type.
  • Add XML Declaration: Determines whether SOAtest adds an XML declaration (<?xml version="1.0"?>) at the beginning of the transformed source. This option is only available in XHTML (DTD) mode.

  • Update IDs in DOCTYPE declaration: Determines whether SOAtest replaces the IDs of the DOCTYPE declaration (if the document already contains a DOCTYPE declaration) or adds the DOCTYPE declaration with IDs (if the document does not yet contain a DOCTYPE declaration). This option is only available in XHTML (DTD) mode.
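As a rough illustration of what ignoring server-side tags can look like when the Process ASP, JSP, PHP files option is enabled, the following Python sketch masks `<% ... %>` and `<? ... ?>` blocks before cleanup and restores them afterward. This is an assumed approach for illustration only, not SOAtest's actual mechanism; the function names and the placeholder format are hypothetical.

```python
import re

# Matches ASP/JSP blocks (<% ... %>) and PHP blocks (<? ... ?>).
SERVER_TAG = re.compile(r"<%.*?%>|<\?.*?\?>", re.DOTALL)


def mask_server_tags(source):
    """Replace server-side blocks with placeholders.

    Returns the masked text and the list of saved blocks.
    """
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return "\x00{}\x00".format(len(saved) - 1)

    return SERVER_TAG.sub(stash, source), saved


def restore_server_tags(masked, saved):
    """Put the saved server-side blocks back in place of the placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], masked)


if __name__ == "__main__":
    page = "<p><%= name %></p><?php echo 1; ?>"
    masked, blocks = mask_server_tags(page)
    print(blocks)
    print(restore_server_tags(masked, blocks) == page)
```

A pattern this simple would also match an XML declaration, so a real implementation would need to distinguish that case.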

Customizing Target Document Type

You can configure the type and level of cleanup performed by changing the options listed in the HTML Cleanup configuration panel’s Target Document Type field.

The available modes are described below. For each option, the description is followed by an example.

HTML Fragment

Cleans HTML fragments, but does not convert them to XHTML. In this mode, SOAtest:

  • Adds missing end tags and reports if a missing end tag was added for an unknown tag.
  • Sets default values for attributes (i.e., those that are "true" by default).
  • Adds quotes around attribute values.
  • Checks for non-numerical values in attributes that require numerical values.
  • Removes orphaned end tags.

SOAtest does not address general structural issues in this mode.

Note

This is the default mode.

<html>
hello world
<table WIDTH=20>

is transformed into

<html>

hello world

<table WIDTH="20"></table></html>

HTML Document

Cleans complete HTML documents, but does not convert them to XHTML. In this mode, SOAtest:

  • Performs all HTML Fragment mode actions.
  • Fixes problems with the overall document structure by ensuring that the file satisfies normal HTML requirements:
  • Documents require <HTML> <HEAD> <TITLE> </TITLE> </HEAD> <BODY> </BODY> </HTML>
  • Framesets require <HTML> <HEAD> <TITLE> </TITLE> </HEAD> <FRAMESET> </FRAMESET> </HTML>

<html>
hello world
<table WIDTH=20>

is transformed into

<html><head><title></title></head><body>
hello world
<table WIDTH="20"></table></body></html>

XHTML Fragment

Cleans HTML fragments and converts them to XHTML. In this mode, SOAtest:

  • Performs all HTML Fragment mode actions.
  • Moves embedded scripts and style sheets to external files when necessary.
  • Adds missing attributes for various tags (for example, it adds a missing src attribute for IMG tags).
  • Ensures that all attributes are lowercase.

<html>
hello world
<table WIDTH=20>

is transformed into

<html>
hello world
<table width="20"> </table></html>
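The fragment-level actions listed above (quoting attribute values, lowercasing attribute names, adding missing end tags, and removing orphaned end tags) can be sketched with Python's standard `html.parser` module. This is an illustrative approximation under stated assumptions, not SOAtest's implementation: the class `FragmentCleaner`, the function `clean_fragment`, and the `VOID_TAGS` set are hypothetical, and whitespace handling differs slightly from the tool's output shown in the example.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag in HTML 4.01 (assumed list).
VOID_TAGS = {"area", "base", "br", "col", "hr", "img",
             "input", "link", "meta", "param"}


class FragmentCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []      # pieces of the cleaned output
        self.open_tags = []  # stack of tags still awaiting an end tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        rendered = "".join(
            ' {}="{}"'.format(
                name, escape(value if value is not None else name, quote=True))
            for name, value in attrs
        )
        self.parts.append("<{}{}>".format(tag, rendered))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # orphaned end tag: drop it
        # Close anything left open inside the element being closed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.parts.append("</{}>".format(top))
            if top == tag:
                break

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&{};".format(name))

    def handle_charref(self, name):
        self.parts.append("&#{};".format(name))

    def handle_comment(self, data):
        self.parts.append("<!--{}-->".format(data))


def clean_fragment(markup):
    cleaner = FragmentCleaner()
    cleaner.feed(markup)
    cleaner.close()
    # Add end tags for everything still open at the end of the input.
    while cleaner.open_tags:
        cleaner.parts.append("</{}>".format(cleaner.open_tags.pop()))
    return "".join(cleaner.parts)


if __name__ == "__main__":
    print(clean_fragment("<html>\nhello world\n<table WIDTH=20>"))
```

Because `HTMLParser` lowercases names on its own, this sketch corresponds to the XHTML Fragment behavior; preserving the original attribute case, as HTML Fragment mode does, would require working from the raw start tag text instead.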

XHTML (DTD)

Cleans HTML documents and converts them to XHTML. In this mode, SOAtest:

  • Performs all XHTML Fragment and HTML Document mode actions.
  • Attempts to convert the document to XHTML that conforms to either the default DTD (the xhtml-transitional DTD from the W3C) or the DTD you specify in the DTD Public ID and System ID fields.
  • Adds a DOCTYPE declaration.
  • Adds an XML declaration (<?xml version="1.0"?>) at the beginning of the transformed source (if the Add XML Declaration option is enabled).

<html>
hello world
<table WIDTH=20>

is transformed into

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title />
</head><body>
hello world
<table width="20">
</table></body></html>
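The prolog handling in this mode (the optional XML declaration and the DOCTYPE declaration with its public and system IDs) can be sketched as follows. This is a simplified Python illustration, not SOAtest's implementation. The function `add_prolog` and its parameters are hypothetical; the default IDs are the XHTML 1.0 Transitional identifiers shown in the example above.

```python
import re

XML_DECLARATION = '<?xml version="1.0"?>'
DEFAULT_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN"
DEFAULT_SYSTEM_ID = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"

# Matches an existing DOCTYPE declaration and any whitespace after it.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>\s*", re.IGNORECASE)


def add_prolog(body, add_xml_declaration=True,
               public_id=DEFAULT_PUBLIC_ID, system_id=DEFAULT_SYSTEM_ID):
    """Prepend an XML declaration (optional) and a DOCTYPE to the markup.

    An existing DOCTYPE declaration is replaced rather than duplicated.
    """
    body = DOCTYPE_RE.sub("", body, count=1)
    lines = []
    if add_xml_declaration:
        lines.append(XML_DECLARATION)
    lines.append('<!DOCTYPE html PUBLIC "{}"\n"{}">'.format(public_id, system_id))
    lines.append(body)
    return "\n".join(lines)


if __name__ == "__main__":
    print(add_prolog("<html><head><title /></head><body>hello world</body></html>"))
```

Passing `add_xml_declaration=False` mirrors disabling the Add XML Declaration option, and supplying different `public_id` and `system_id` values mirrors filling in the DTD Public ID and System ID fields.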

Saving the Transformed Files


If you want SOAtest to save the files transformed by the HTML Cleanup tool, add a Write File tool output as follows:

  1. Right-click the HTML Cleanup tool node in the Test Case Explorer tab and select Add Output from the shortcut menu. The Add Output dialog opens.
  2. Select Transformed Source from the left pane and Write File from the right pane of the Add Output dialog, then click Finish. A Transformed Source > Write File node is added to the HTML Cleanup node.
  3. (Optional) Customize the Write File tool as described in Write File.