HTB 2.0 - HTML/XML/XSL Beautifier and Re-formatter


Author

   T.G. Schramer Consulting
   (949) 249-1824 
   lastimo@cox.net 

Files

HTB is a stand-alone command-line binary program and nothing more is required.

The following files are included in this package:

 htb_docs.html      -  This file.
 license.txt        -  License agreement.
 readme.txt         -  General info.
 linux
   htb              -  Linux kernel binary. (TBD)
 sun                
   htb              -  SunOS binary.
 windows
   htb.exe          -  Win32 binary, Win95, Win98, WinNT, Win2000, WinXP.
   htb.ico          -  Windows icon. Use with runhtb.bat or as desired.
   runhtb.bat       -  Batch file for WinNT/2000/XP to allow drag and drop
                        processing from Desktop.

Overview

HTB is a command-line program that reformats HTML and XML (including XHTML & XSL) source for consistency and better readability. Although cleaning up tagged files may seem straightforward, it is a surprisingly subjective problem that depends on which tags are used, their intent, and the complexity of the overall document. HTB draws on how HTML is actually used when rendering each tag, and extends the official HTML specification with commonly used legacy tags. Each tag is assigned one of nine logical behaviors, based not only on the HTML specification but also on how the tag is commonly used in the field. Because this subjectivity may not appeal to everyone, a comprehensive set of rendering options is available to suit other tastes and document styles.

License

HTB is free and may be freely distributed with adherence to the following license agreement:

License Agreement

By downloading this software, you indicate
that you agree with the terms of this agreement.

Important - Please read this agreement carefully.

Copyright:
This software program and any associated material are protected by copyright law. The HTB program is a proprietary product of T.G. Schramer Consulting. T.G. Schramer Consulting retains title to and ownership in the copyright of the HTB software program and the associated materials.

Redistribution and use of the HTB binary and accompanying documentation, with or without modification, are permitted provided that the following conditions are met:

  • Redistribution of documentation and/or HTB binary program must retain the above copyright notice, this list of conditions and the disclaimer below.

  • Redistribution must be free of charge whether stand-alone or part of a larger package such as a CD-ROM archive.

Disclaimer:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Installation

Simply copy the correct binary file to your system's program directory (c:\windows\system32, c:\win98\command, /usr/bin, etc.) or create a directory for HTB and run it directly from there. For Windows NT, 2000, and XP, you may optionally create a Desktop shortcut to "runhtb.bat" so that files can be processed by dragging and dropping them onto the shortcut. With this drag-and-drop method, reformatted files are placed alongside the originals but have the extension ".htb". Rename the files to have the correct extension, or associate the ".htb" extension with a Web browser to open them directly.

Execution

To run HTB, type "htb" from a command-line prompt. Running HTB without arguments displays a summary screen of the options. Specifying a single filename containing markup tags sends the re-formatted output to the screen. Specify a second file or redirect "standard output" to save the output to a new file. The original file is never altered. The single-character command-line options may be preceded with either "-" or "/" and may be combined into one argument. The order of the options is not important, but only the last value in a set of conflicting options takes effect. The default behavior may not provide the best cleanup, so try different command-line options to get the desired results. The option combinations -as, -ams, and -n have also given good results.
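
The option rules above (either prefix, letters combined into one argument, last conflicting value wins) can be illustrated with a small Python sketch. This is not HTB's own parser: the function name `parse_options` and the grouping of options into conflicting sets are assumptions made only for this example.

```python
# Sketch only: mimics the documented option syntax, not HTB's real parser.
# The conflict groups below are an assumption made for illustration.
CONFLICT_GROUPS = [set("lu"), set("cd"), set("abn"), set("0123456789")]


def parse_options(args):
    """Return (options, filenames) from a list of command-line arguments."""
    options = []
    filenames = []
    for arg in args:
        # Note: with the "/" prefix, a Unix path such as /tmp/a.html would
        # be mistaken for options; this sketch does not try to resolve that.
        if len(arg) > 1 and arg[0] in "-/":
            for ch in arg[1:]:
                # A later option replaces earlier ones from the same group.
                for group in CONFLICT_GROUPS:
                    if ch in group:
                        options = [o for o in options if o not in group]
                if ch not in options:
                    options.append(ch)
        else:
            filenames.append(arg)
    return options, filenames


print(parse_options(["-l", "-m", "-5", "index.html"]))
print(parse_options(["/lm5u", "index.html", "newindx.html"]))
```

In the second call the later "u" displaces the earlier "l", which is the "last value wins" rule in miniature.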

One other way to run HTB is with the new -f option. This forces HTB to run as a "filter" which reads from the "standard input" stream and sends results to the "standard output" stream. This allows HTB to be embedded into other programs and processes.

Errors & Verification

Modern browsers are amazingly forgiving of poorly written HTML. HTB usually reports the line number and text of offending tags and ignores them in the re-formatting process. Occasionally, when quoting is inconsistent, HTB stops re-formatting at the point where the problem was found. This reporting mechanism makes HTB useful as a simple syntax validator, even if the generated output is not saved. HTB does not attempt to correct invalid files, but in most cases it still does a good job of re-formatting them. If syntax correction is needed, try Tidy from W3C, which is a very good cleanup program. Tidy may even be used in combination with HTB using command-line piping and the HTB -f option.

All HTB error messages are sent to the "standard error" stream, which can be captured to a file using "standard error" redirection ("2>" or "2>>").

Example 1:

- Beautify myfile.htm and save the output to newfile.htm but save error messages in another file called error.txt by redirecting the "standard error" stream to a file.

htb myfile.htm newfile.htm 2> error.txt (error.txt will be created or overwritten)
htb myfile.htm newfile.htm 2>> error.txt (error.txt will be created or appended to)

Example 2:

- Correct syntax errors using Tidy from W3C and then beautify myfile.htm and save the output to newfile.htm. Save error messages from both programs to a file called error.txt by redirecting the "standard error" stream to a file.

tidy myfile.htm 2> error.txt | htb -f > newfile.htm 2>> error.txt

Version 2.0

HTB 2.0 is a major enhancement over 1.0, which has been available for several years. In addition to the options -a, -e, -f, -j, -r, -t, -x, -y, and -z, many bug fixes and enhancements have been added. Among them are correct handling of APPLET, OBJECT, SCRIPT & STYLE tags and much better handling of errors, comments, and nested TABLEs with their rows and cells. Many of the new options allow separation of HTML from other data in the document, such as text, comments, or non-HTML tags.

XML Support (including XHTML & XSL)

In the new hybrid world of Server Pages and XSL, HTB 2.0 was also expanded to support XML-compliant syntax, including XHTML & XSL, and to be forgiving of the custom markup tags often added by HTML extensions and third-party Web applications. XSL beautification is now fully supported, with logical rendering behaviors assigned to every element in the XSL 1.0 specification. Hybrid XHTML/XML documents can still be beautified with HTML case changes, because XML tags, which are likely case-sensitive, are handled independently of HTML tags, which are not. This special XML tag handling is applied automatically whenever an XML-compliant file is detected. It may also be forced on with the -x option for files that contain "well-formed" XML but do not strictly adhere to the XML specification. XML auto-detection extends to ASP and JSP files, although these formats are not strictly XML compliant. The -y option switches off special XML handling and treats all tags the same, whether HTML or not.

Default Behavior

When HTB is run without options, the default behavior is to indent in increments of 3 spaces, make all tags and their attributes upper case, and break tags that exceed the 80-column limit. (Note that the output will often still exceed 80 columns, since preserving whitespace integrity restricts where line breaks may be inserted.) Tag attribute values are always rendered within quotes, which corrects a common omission, and attributes are sorted within the tag. In addition, XML auto-detection is active. If an XML file is detected, non-HTML tags are assumed to be "well-formed" XML (including XSL) and the correct indenting is applied. Case changes are not applied to these unknown XML tags, since they are often case-sensitive. XML attributes are also kept in their original order within the tag instead of being sorted as HTML attributes are. If XML is not detected, unknown tags are ignored.

Before:
<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" botmargin="0" marginwidth="0" marginheight="0" link="#666666" vlink="#666666" alink="#000000">
<table width="800" border="0" cellpadding="0" cellspacing="0">
<tr>
<td colspan="2" width="196" bgcolor="cccccc" valign="top"><img src="/images/homepage/rev/logo_06.gif" width="196" height="63"></td>
<td bgcolor="cccccc" width="600" valign="top">
<table width="600" border="0" cellpadding="0" cellspacing="0" valign="top">
<tr>
<td valign="top" height="17" bgcolor="#CCCCCC"><img src="/images/homepage/rev/comp8_07.gif" width="600" height="17"></td>
</tr>
After:
<BODY ALINK="#000000" BGCOLOR="#FFFFFF" BOTMARGIN="0" LEFTMARGIN="0" 
      LINK="#666666" MARGINHEIGHT="0" MARGINWIDTH="0" TOPMARGIN="0" 
      VLINK="#666666">
<TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" WIDTH="800">
   <TR>
      <TD BGCOLOR="cccccc" COLSPAN="2" VALIGN="top" WIDTH="196"><IMG HEIGHT="63"
                                                                     SRC="/images/homepage/rev/logo_06.gif"
                                                                     WIDTH="196"></TD>
      <TD BGCOLOR="cccccc" VALIGN="top" WIDTH="600"> 
         <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0" VALIGN="top" 
                WIDTH="600">
            <TR>
               <TD BGCOLOR="#CCCCCC" HEIGHT="17" VALIGN="top"><IMG HEIGHT="17"
                                                                   SRC="/images/homepage/rev/comp8_07.gif"
                                                                   WIDTH="600"></TD>
            </TR>
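
The default attribute handling shown in this example (upper-case names, quoted values, sorted order) can be approximated in a few lines of Python. This is only a sketch of the documented behavior, not HTB's implementation; `normalize_tag` and `ATTR_RE` are hypothetical names, and the sketch handles plain opening tags only, with no line breaking or indenting.

```python
import re

# Matches name=value pairs (quoted or bare) and valueless attributes.
ATTR_RE = re.compile(r'([^\s=/>]+)(?:\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+))?')


def normalize_tag(tag):
    """Upper-case tag and attribute names, quote values, sort attributes."""
    match = re.match(r"<\s*([A-Za-z][\w:-]*)(.*)>\s*$", tag, re.S)
    if match is None:
        return tag  # not an opening tag; leave it untouched
    name, rest = match.group(1), match.group(2)
    attrs = []
    for found in ATTR_RE.finditer(rest):
        attr, value = found.group(1).upper(), found.group(2)
        if value is not None and value[:1] in ('"', "'"):
            value = value[1:-1]  # drop the original quotes
        attrs.append((attr, value))
    attrs.sort(key=lambda pair: pair[0])
    # Valueless attributes are kept as-is; values that themselves contain
    # a double quote are not handled by this sketch.
    rendered = "".join(
        f" {a}" if v is None else f' {a}="{v}"' for a, v in attrs
    )
    return f"<{name.upper()}{rendered}>"


print(normalize_tag('<td width=182 align=left bgcolor="#ffffff">'))
```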

Options

-a:   Multi-Attribute Tag Break

The -a command-line option causes all tags containing more than one attribute to be broken over multiple lines, each with a single attribute. The attributes are aligned vertically with the first attribute. A similar attribute break occurs by default, but only on tags exceeding the 80-column limit, and each line may then contain more than one attribute.

Before:

<BODY BGCOLOR="#FFFFFF" MARGINWIDTH="0" MARGINHEIGHT="0" LINK="#666666" VLINK="#666666" ALINK="#000000">
<TABLE WIDTH="800" BORDER="0" CELLPADDING="0" CELLSPACING="0">
<TR>
<TD COLSPAN="2" WIDTH="196" BGCOLOR="cccccc" VALIGN="top"><IMG SRC="/images/homepage/rev/logo_06.gif" WIDTH="196" HEIGHT="63"></TD>
<TD BGCOLOR="cccccc" WIDTH="600" VALIGN="top">
<TABLE WIDTH="600" BORDER="0" CELLPADDING="0" CELLSPACING="0" VALIGN="top">
<TR>
<TD VALIGN="top" HEIGHT="17" BGCOLOR="#CCCCCC"><IMG SRC="/images/homepage/rev/comp8_07.gif" WIDTH="600" HEIGHT="17"></TD>
</TR>
After:
<BODY ALINK="#000000"
      BGCOLOR="#FFFFFF"
      LINK="#666666"
      MARGINHEIGHT="0"
      MARGINWIDTH="0"
      VLINK="#666666">
<TABLE BORDER="0"
       CELLPADDING="0"
       CELLSPACING="0"
       WIDTH="800">
   <TR>
      <TD BGCOLOR="cccccc"
          COLSPAN="2"
          VALIGN="top"
          WIDTH="196"><IMG HEIGHT="63"
                           SRC="/images/homepage/rev/logo_06.gif"
                           WIDTH="196"></TD>
      <TD BGCOLOR="cccccc"
          VALIGN="top"
          WIDTH="600"> 
         <TABLE BORDER="0"
                CELLPADDING="0"
                CELLSPACING="0"
                VALIGN="top"
                WIDTH="600">
            <TR>
               <TD BGCOLOR="#CCCCCC"
                   HEIGHT="17"
                   VALIGN="top"><IMG HEIGHT="17"
                                    SRC="/images/homepage/rev/comp8_07.gif"
                                    WIDTH="600"></TD>
            </TR>
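
The alignment rule used in this example, where each continuation line starts in the column of the first attribute, can be sketched in Python as follows. The function name `break_attributes` and its parameters are assumptions for illustration; this is not HTB's own code.

```python
def break_attributes(name, attrs, indent=0):
    """Render a tag with one attribute per line, aligned on the first one.

    `attrs` is a list of (name, value) pairs and `indent` is the number
    of spaces in front of the tag.
    """
    if len(attrs) < 2:
        # Tags with fewer than two attributes are left on one line.
        joined = "".join(f' {a}="{v}"' for a, v in attrs)
        return " " * indent + f"<{name}{joined}>"
    opening = " " * indent + f"<{name} "
    pad = " " * len(opening)  # column of the first attribute
    parts = [f'{a}="{v}"' for a, v in attrs]
    return opening + ("\n" + pad).join(parts) + ">"


print(break_attributes(
    "TABLE",
    [("BORDER", "0"), ("CELLPADDING", "0"),
     ("CELLSPACING", "0"), ("WIDTH", "800")],
))
```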

-b:   All Attribute Tag Break

The -b command-line option causes all tag attributes to be broken on succeeding lines. The attributes are aligned vertically with the last character in the tag name.

Before:

<BODY BGCOLOR="#FFFFFF" MARGINWIDTH="0" MARGINHEIGHT="0" LINK="#666666" VLINK="#666666" ALINK="#000000">
<TABLE WIDTH="800" BORDER="0" CELLPADDING="0" CELLSPACING="0">
<TR>
<TD COLSPAN="2" WIDTH="196" BGCOLOR="cccccc" VALIGN="top"><IMG SRC="/images/homepage/rev/logo_06.gif" WIDTH="196" HEIGHT="63"></TD>
<TD BGCOLOR="cccccc" WIDTH="600" VALIGN="top">
<TABLE WIDTH="600" BORDER="0" CELLPADDING="0" CELLSPACING="0" VALIGN="top">
<TR>
<TD VALIGN="top" HEIGHT="17" BGCOLOR="#CCCCCC"><IMG SRC="/images/homepage/rev/comp8_07.gif" WIDTH="600" HEIGHT="17"></TD>
</TR>
After:
<BODY
    ALINK="#000000"
    BGCOLOR="#FFFFFF"
    LINK="#666666"
    MARGINHEIGHT="0"
    MARGINWIDTH="0"
    VLINK="#666666">
<TABLE
     BORDER="0"
     CELLPADDING="0"
     CELLSPACING="0"
     WIDTH="800">
   <TR>
      <TD
        BGCOLOR="cccccc"
        COLSPAN="2"
        VALIGN="top"
        WIDTH="196"><IMG
                       HEIGHT="63"
                       SRC="/images/homepage/rev/logo_06.gif"
                       WIDTH="196"></TD>
      <TD
        BGCOLOR="cccccc"
        VALIGN="top"
        WIDTH="600"> 
         <TABLE
              BORDER="0"
              CELLPADDING="0"
              CELLSPACING="0"
              VALIGN="top"
              WIDTH="600">
            <TR>
               <TD
                 BGCOLOR="#CCCCCC"
                 HEIGHT="17"
                 VALIGN="top"><IMG
                                HEIGHT="17"
                                SRC="/images/homepage/rev/comp8_07.gif"
                                WIDTH="600"></TD>
            </TR>

-c:   Add Carriage Returns

The -c command-line option adds an extra carriage return character to each output line of reformatted data. This allows Unix versions of HTB to create DOS/Windows compatible text files directly.

-d:   Omit Carriage Returns

The -d command-line option inhibits extra carriage return character output even if present in the source data. This allows the Windows version of HTB to create a Unix compatible text file directly. This is the default behavior and correctly creates a natively compatible format whether Unix or Windows.
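
The effect of the -c and -d options on line endings can be pictured with the following Python sketch. It is an illustration under assumptions, not HTB's code: the name `set_line_endings` is hypothetical, and treating a lone carriage return as a line break is a choice made for this example.

```python
def set_line_endings(text, carriage_returns=False):
    """Normalize line endings: CR+LF if carriage_returns is True, else LF."""
    # First reduce every line ending to a bare line feed (like -d) ...
    unix = text.replace("\r\n", "\n").replace("\r", "\n")
    # ... then add a carriage return before each line feed if asked (like -c).
    return unix.replace("\n", "\r\n") if carriage_returns else unix


sample = "<TR>\r\n<TD>\n</TD>\r\n</TR>\n"
print(repr(set_line_endings(sample)))
print(repr(set_line_endings(sample, carriage_returns=True)))
```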

-e:   Escaped Tag Conversion

The -e command-line option replaces the special markup characters "<", ">", and "&" with escape strings "&lt;", "&gt;", and "&amp;" respectively. Also, the tag sequence "<HTML><BODY><PRE>" is added to the beginning of the output data and the sequence "</PRE></BODY></HTML>" is appended to the end of the data. This creates an entirely new HTML document, which when viewed with a Web Browser, will appear as source instead of normal rendering. This is useful in creating markup tag documentation and is the mechanism used to create the examples in this document. Use in combination with the -k option to do the conversion without applying other reformatting options.
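
The conversion described above is simple enough to sketch in Python with the standard `html.escape` function. The wrapper name `escape_as_source` is an assumption for this example, and the sketch covers only the escaping and the added tag sequences, none of HTB's other reformatting.

```python
import html


def escape_as_source(markup):
    """Escape <, > and & and wrap the result so a browser shows the source."""
    # quote=False leaves quotation marks alone, matching the three
    # characters named in the description above.
    body = html.escape(markup, quote=False)
    return "<HTML><BODY><PRE>" + body + "</PRE></BODY></HTML>"


print(escape_as_source("<B>R&D</B>"))
```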

-f:   Run as Filter (use stdin & stdout)

The -f command-line option causes HTB to read from the "standard input" stream and write to "standard output". This makes HTB a filter program and allows it to be embedded within other stream-manipulation processes and programs, for example through command-line "piping". Other options may be combined with the filter option, but any file names specified on the command line are ignored.

Example Usage:

- Display only lines containing the text string, "hidden" in myfile.htm (most likely <INPUT TYPE="hidden"...> tags) and beautify them with HTB attribute break (-a option) using command-line piping...

findstr "hidden" myfile.htm | htb -af (Windows command-line)
grep -i hidden myfile.htm | htb -af (Unix command-line)

See the Errors & Verification section for another example of the HTB -f option when used in combination with the Tidy HTML cleanup program.
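
Because the filter mode uses only the standard streams, HTB can also be driven from another program. The Python sketch below shows one way this might look using the standard `subprocess` module. The helper name `run_filter` is hypothetical, and the demonstration uses a stand-in command so that it runs without HTB installed; the stand-in merely upper-cases its input and does not reproduce HTB's output.

```python
import subprocess
import sys


def run_filter(markup, command=("htb", "-af")):
    """Pipe `markup` through a filter command; return (stdout, stderr)."""
    result = subprocess.run(
        list(command),
        input=markup,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout, result.stderr


# Stand-in filter (an assumption for this demo): upper-cases its input,
# which is NOT what HTB does. Replace it with the default command to
# call a real HTB binary found on the PATH.
STAND_IN = (sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())")

output, errors = run_filter('<input type="hidden">', STAND_IN)
print(output)
```

Any messages the filter writes to "standard error" come back separately in the second return value, so they can be logged without mixing into the markup.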

-h:   Help Screen

The -h command-line option (or incomplete/invalid command-line options) will display the following Help Screen:

 htb - HTML/XML Beautifier 2.0, TG Schramer Consulting, lastimo@cox.net

  "htb" is a program to beautify HTML/XML files and has the following format:
              "htb -(options) <input filename> <new filename>"

 Options:
   a: Force break of all multi-attribute Tags with alignment on the 1st one
        (default for Tags going over 80 columns as whitespace permits).
   b: Force break of every Tag Attribute onto a new line with alignment on the
        last character in the Tag.
   c: Force extra Carriage Return character after each line (allows creation
        of DOS compatible file from Unix system).
   d: Never add extra Carriage Return character after each line (default).
   e: Escape Tag characters & create browser viewable source conversion
        (ie. "<" to "&lt;", ">" to "&gt;", etc.).
   f: Run as filter - read from standard input and write to standard output
        (any file names also specified are ignored).
   h: This screen.
   j: Join lines wherever possible and remove comments & extra whitespace.
        (overrides re-formatting options and compresses output).
   k: Keep current layout, just apply upper/lower case (overrides non-case 
        related options).
   l: Make Tag names lower case.
   m: Make Tag Attribute case the opposite of the Tag name.
   n: Never break Tag Attributes onto separate lines.
   r: Remove Non-HTML tags. HTML 4.01 and common legacy Tags remain
        (overrides x option).
   s: Remove tabs from SCRIPTS and indent using blanks. Scripts could look
        worse, but the tabs are gone. By default scripts are not changed.
   t: Strip all but plain text content from input. No tags or comments remain.
   u: Make Tag names upper case (default).
   x: Treat unknown tags as "well-formed" XML. Case changes & attribute 
        sorting only applied to known HTML tags (default if XML detected).
   y: Turn off XML detection (overrides x option, case changes go on all Tags).
   z: Remove stand-alone comments (not within SCRIPT, STYLE, etc).
 0-9: Use (number) of spaces for indenting (default = 3).

 Options may be combined into one argument and any order (ie. -l -m -5 = -lm5).
 If output file is not defined, re-formatted data is sent to "standard out".
 Defaults: Tags/Attributes upper case, break Tags over 80 & indent by 3.

 Examples:
 - Make Tags and Attributes lower case and use 4 for indenting:
                       "htb -l4 index.html newindx.html"

 - Defaults + no Tag breaking, remove comments & treat non-HTML tags as XML.
                       "htb -nxz index.html newindx.html"    

-j:   Join Lines - Compress Output

The -j command-line option removes all unnecessary whitespace and comments and joins the output lines together wherever possible. The result is totally "unbeautified" output, but the size is typically reduced by 10-40% for quicker transfer over the network. Use this option whenever performance is more important than readability.
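
A rough Python approximation of this kind of compression is shown below. It is a deliberately naive sketch, not HTB's algorithm: the names `join_lines` and `savings_percent` are hypothetical, and the code makes no attempt to protect whitespace-sensitive content such as PRE or SCRIPT blocks.

```python
import re


def join_lines(markup):
    """Drop comments and collapse every whitespace run to a single space."""
    # Naive: does not protect PRE, SCRIPT or STYLE content.
    without_comments = re.sub(r"<!--.*?-->", "", markup, flags=re.S)
    return re.sub(r"\s+", " ", without_comments).strip()


def savings_percent(original, compact):
    """Size reduction of `compact` relative to `original`, in percent."""
    return 100.0 * (len(original) - len(compact)) / len(original)


source = "<TR>\n   <TD> a </TD>\n<!-- note -->\n</TR>"
compact = join_lines(source)
print(compact)
print(round(savings_percent(source, compact), 1))
```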

-k:   Keep Layout - Case Changes Only

When the current indenting and appearance of your tagged document is acceptable, the -k command-line option may be used to change only the case of the tag names and attributes with no other changes applied.

Example:

- Keep the current layout of an HTML document, but change the tag attribute names to lower case (-m option, opposite of tag name case which by default is upper)...

htb -km myfile.html

Before:
<FORM ENCTYPE="multipart/form-data" NAME="coreform" METHOD="POST">
<INPUT TYPE="submit" VALUE="Submit Request"> 
<INPUT NAME="cgi" TYPE="button" VALUE="cgi2xml">cgi2xml 
<TABLE BORDER="5" CELLPADDING="5">
   <TR>
      <TD> <FONT COLOR="purple"> 
         <H4>Output formatting:</H4> </FONT>Debug: 
         <INPUT NAME="debug"><BR> 
         <BR> Filter: 
         <INPUT NAME="filter"><BR> Output: 
         <INPUT NAME="output"><BR> 
         <BR> Pagestart: 
         <INPUT SIZE="4" NAME="pagestart"><BR> Pagesize: 
         <INPUT SIZE="4" NAME="pagesize"><BR> 
      </TD>
   </TR>
</TABLE>
</FORM>
After:
<FORM enctype="multipart/form-data" name="coreform" method="POST">
<INPUT type="submit" value="Submit Request"> 
<INPUT name="cgi" type="button" value="cgi2xml">cgi2xml 
<TABLE border="5" cellpadding="5">
   <TR>
      <TD> <FONT color="purple"> 
         <H4>Output formatting:</H4> </FONT>Debug: 
         <INPUT name="debug"><BR> 
         <BR> Filter: 
         <INPUT name="filter"><BR> Output: 
         <INPUT name="output"><BR> 
         <BR> Pagestart: 
         <INPUT size="4" name="pagestart"><BR> Pagesize: 
         <INPUT size="4" name="pagesize"><BR> 
      </TD>
   </TR>
</TABLE>
</FORM>

-l:   Tag Names Lower Case

The -l command-line option changes all HTML tag names and their attributes to lower case. Combine with the -m (mixed case) option to keep the tag names lower case, but make the attribute names upper case.

Before:

<FORM ENCTYPE="multipart/form-data" NAME="coreform" METHOD="POST">
<INPUT TYPE="submit" VALUE="Submit Request"> 
<INPUT NAME="cgi" TYPE="button" VALUE="cgi2xml">cgi2xml 
<TABLE BORDER="5" CELLPADDING="5">
   <TR>
      <TD> <FONT COLOR="purple"> 
         <H4>Output formatting:</H4> </FONT>Debug: 
         <INPUT NAME="debug"><BR> 
         <BR> Filter: 
         <INPUT NAME="filter"><BR> Output: 
         <INPUT NAME="output"><BR> 
         <BR> Pagestart: 
         <INPUT SIZE="4" NAME="pagestart"><BR> Pagesize: 
         <INPUT SIZE="4" NAME="pagesize"><BR> 
      </TD>
   </TR>
</TABLE>
</FORM>
After:
<form enctype="multipart/form-data" method="POST" name="coreform">
<input type="submit" value="Submit Request"> 
<input name="cgi" type="button" value="cgi2xml">cgi2xml 
<table border="5" cellpadding="5">
   <tr>
      <td> <font color="purple"> 
         <h4>Output formatting:</h4> </font>Debug: 
         <input name="debug"><br> 
         <br> Filter: 
         <input name="filter"><br> Output: 
         <input name="output"><br> 
         <br> Pagestart: 
         <input name="pagestart" size="4"><br> Pagesize: 
         <input name="pagesize" size="4"><br> 
      </td>
   </tr>
</table>
</form>

-m:   Tag Attributes Opposite Case

The -m command-line option makes the tag attribute case the opposite of the tag name. Since the HTB default is to make tag names upper case, the addition of this option will make the tag attributes lower case. If combined with the -l option (lower case) the tag names will be lower case, and the tag attributes will be upper case. See the -k option for an example.
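
The interaction between the case options can be summarized in a short Python sketch. The function `apply_case` and its `lower` and `mixed` parameters are hypothetical names standing in for the -l and -m options; this illustrates the rule described above rather than HTB's own code.

```python
def apply_case(tag_name, attr_names, lower=False, mixed=False):
    """Return (tag_name, attr_names) cased as -l/-u and -m would case them.

    lower=False models the default (-u); mixed=True models -m.
    """
    tag_case = str.lower if lower else str.upper
    if mixed:
        # -m: attributes take the opposite case of the tag name.
        attr_case = str.upper if lower else str.lower
    else:
        attr_case = tag_case
    return tag_case(tag_name), [attr_case(a) for a in attr_names]


print(apply_case("Input", ["Name", "size"]))                    # default
print(apply_case("Input", ["Name", "size"], mixed=True))        # -m
print(apply_case("Input", ["Name", "size"], lower=True))        # -l
print(apply_case("Input", ["Name", "size"], lower=True, mixed=True))  # -lm
```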

-n:   Never Break Tags Between Lines

The -n command-line option cancels the default behavior of breaking tags which exceed the 80 column limit and keeps tags intact within a single line of output regardless of their length. This is often desirable, especially on XSL files.

-r:   Remove Non-HTML Tags

The -r command-line option strips from the output any tag that is not part of the HTML 4.01 specification or of a group of widely recognized, commonly used legacy tags. It's a convenient way to separate HTML from hybrid files like ASP, JSP, XSL or files containing custom tags. The stripped tags are reported along with any errors to "standard error".

Example:

- Remove all non-HTML tags from an XSL/XHTML file...

htb -r myfile.xsl

Before:
   <xsl:for-each select="ELEMENT/NODE1"> 
      <xsl:variable select="position()-1" name="vpos" /> 
      <TR VALIGN="top">
         <TD ALIGN="center"><FONT SIZE="1" FACE="Helvetica"><xsl:value-of select="$vpos" /></FONT> 
         </TD>
         <TD ALIGN="center"><FONT FACE="Helvetica"> 
            <INPUT NAME="ELEM{$vpos}" TYPE="text" VALUE="Element {$vpos}" /></FONT> 
         </TD>
         <TD ALIGN="center"><FONT FACE="Helvetica"> 
            <INPUT NAME="NUMB{$vpos}" TYPE="text" VALUE="2" /></FONT> 
         </TD>
         <TD ALIGN="center"><FONT FACE="Helvetica"> 
            <xsl:variable select="count(//NODE1[@id &gt; -1])" name="pcnt" /> 
            <xsl:variable name="selsize"> 
               <xsl:choose><xsl:when test="$pcnt &lt; 5"> 
                  <xsl:value-of select="$pcnt" /> 
               </xsl:when><xsl:otherwise> 
                  <xsl:value-of select="'5'" /> 
               </xsl:otherwise></xsl:choose> 
            </xsl:variable> 
            <SELECT SIZE="{$selsize}" NAME="VALU{$vpos}">
               <xsl:for-each select="//VALUE[@id &gt; -1]"> 
                  <OPTION VALUE="{@id}">
                  <xsl:value-of select="NAME" /></OPTION> 
               </xsl:for-each> 
            </SELECT></FONT> 
         </TD>
      </TR>
   </xsl:for-each> 
After:
   <TR VALIGN="top">
      <TD ALIGN="center"><FONT FACE="Helvetica" SIZE="1"></FONT> 
      </TD>
      <TD ALIGN="center"><FONT FACE="Helvetica"> 
         <INPUT NAME="ELEM{$vpos}" TYPE="text" VALUE="Element {$vpos}" /></FONT> 
      </TD>
      <TD ALIGN="center"><FONT FACE="Helvetica"> 
         <INPUT NAME="NUMB{$vpos}" TYPE="text" VALUE="2" /></FONT> 
      </TD>
      <TD ALIGN="center"><FONT FACE="Helvetica"> 
         <SELECT NAME="VALU{$vpos}" SIZE="{$selsize}">
            <OPTION VALUE="{@id}"></OPTION>
         </SELECT></FONT> 
      </TD>
   </TR>
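
The idea behind this example, keeping recognized HTML tags and dropping everything else, can be sketched in Python as below. The tag set `KNOWN_HTML` is a tiny illustrative subset chosen for this example (HTB's actual list is not reproduced here), and `remove_non_html` is a hypothetical name. The sketch also assumes that attribute values contain no literal "<" or ">" characters.

```python
import re

# A tiny illustrative subset; the real list (HTML 4.01 plus legacy tags)
# is much longer and is not reproduced here.
KNOWN_HTML = {"TABLE", "TR", "TD", "FONT", "INPUT", "SELECT", "OPTION"}

# Opening, closing and self-closing tags. Assumes no literal < or >
# inside attribute values.
TAG_RE = re.compile(r"</?\s*([A-Za-z][\w:.-]*)[^<>]*>")


def remove_non_html(markup, known=KNOWN_HTML):
    """Return (markup without unknown tags, list of the stripped tags)."""
    stripped = []

    def keep_or_drop(match):
        if match.group(1).upper() in known:
            return match.group(0)
        stripped.append(match.group(0))  # remember it for reporting
        return ""

    return TAG_RE.sub(keep_or_drop, markup), stripped


output, removed = remove_non_html(
    '<TD ALIGN="center"><xsl:value-of select="$vpos" /></TD>'
)
print(output)
print(removed)
```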

-s:   Remove Tabs from SCRIPTs

HTB automatically removes any tab characters found in the source document during the indenting process, but by default SCRIPTs are kept intact. To completely remove all tabs, specify the -s option, and tab characters found within SCRIPT elements will be replaced with sets of indented spaces. This could make the indented script statements look slightly worse and may require minor editing, but the beautified output is clear of any tab characters.

-t:   Convert to Plain Text

The -t command-line option strips all markup tags and comments and converts the input to plain text. All ASCII and ISO8859-1 HTML escape strings are converted back to the characters they represent. An attempt is made to compress extra whitespace, but in general the text will require additional re-formatting to be made presentable. Use this option to isolate the textual content within tagged documents (not necessarily HTML) for use in other documentation.
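
A comparable text extraction can be sketched with Python's standard `html.parser` module. This is an approximation of the behavior described above, not HTB's implementation: the names `_TextCollector` and `to_plain_text` are hypothetical, and the standard parser decodes the full HTML5 entity set rather than only ASCII and ISO8859-1.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text content; tags and comments are simply not recorded."""

    def __init__(self):
        # convert_charrefs=True turns escapes such as &amp; back into text.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def to_plain_text(markup):
    """Strip tags and comments, decode escapes, and compress whitespace."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return " ".join("".join(collector.chunks).split())


print(to_plain_text("<p>Tom &amp; Jerry<!-- x --> <b>&lt;3</b></p>"))
```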

-u:   Tag Names Upper Case

The -u command-line option changes all HTML tag names and their attributes to upper case. Since this is the default behavior of HTB, it is not required. Use the -m (mixed case) option to keep the tag names upper case, but make the attribute names lower case.

Before:

<form enctype="multipart/form-data" name="coreform" method="POST">
<input type="submit" value="Submit Request"> 
<input name="cgi" type="button" value="cgi2xml">cgi2xml 
<table border="5" cellpadding="5">
   <tr>
      <td> <font color="purple"> 
         <h4>Output formatting:</h4> </font>Debug: 
         <input name="debug"><br> 
         <br> Filter: 
         <input name="filter"><br> Output: 
         <input name="output"><br> 
         <br> Pagestart: 
         <input size="4" name="pagestart"><br> Pagesize: 
         <input size="4" name="pagesize"><br> 
      </td>
   </tr>
</table>
</form>
After:
<FORM ENCTYPE="multipart/form-data" METHOD="POST" NAME="coreform">
<INPUT TYPE="submit" VALUE="Submit Request"> 
<INPUT NAME="cgi" TYPE="button" VALUE="cgi2xml">cgi2xml 
<TABLE BORDER="5" CELLPADDING="5">
   <TR>
      <TD> <FONT COLOR="purple"> 
         <H4>Output formatting:</H4> </FONT>Debug: 
         <INPUT NAME="debug"><BR> 
         <BR> Filter: 
         <INPUT NAME="filter"><BR> Output: 
         <INPUT NAME="output"><BR> 
         <BR> Pagestart: 
         <INPUT NAME="pagestart" SIZE="4"><BR> Pagesize: 
         <INPUT NAME="pagesize" SIZE="4"><BR> 
      </TD>
   </TR>
</TABLE>
</FORM>

-x:   Unknown Tags are XML

HTB automatically detects XML compliant files and is able to apply reformatting to unknown tags since they meet the predictable behavior of the XML specification. If the input document is not strictly XML compliant, but does contain custom tagging which may be considered "well-formed" XML, the -x option may be used to apply XML handling on these otherwise ignored tags. If XML is detected, either automatically, or with the -x option, the tag case is NOT changed for these non-HTML tags, since they are often case-sensitive. Also, the attributes of unknown tags will remain in original order instead of being sorted as with HTML attributes. To turn off XML auto-detection and apply case changes and attribute sorting to all tags known and unknown, use the -y option.

Example:

- Make tag names and attributes lower case, never break tags, and treat unknown tags in an HTML file as well formed XML...

htb -lnx myfile.html

Before:
<TR><TD WIDTH=182 ALIGN=left BGCOLOR="#ffffff">
<NYT_HEADLINE>
<A

HREF="/onthisday/20020619.html"><FONT SIZE="3" FACE="times"><B>On June 19 ...<BR></B></FONT></A>
</NYT_HEADLINE>
<NYT_BYLINE>
<FONT SIZE="-1"></FONT>
</NYT_BYLINE>
<NYT_SUMMARY>
<FONT SIZE="-1">
<B>1964:</B> The Civil Rights Act of 1964 was approved.   (<A 
HREF="/onthisday/big/0619.html">See this front page.</A>) <BR>
<B>1903:</B> Lou Gehrig was born.  <A 
HREF="/onthisday/bday/0619.html">(Read about his life.)</A> <BR>
<B>1886:</B> Harper's Weekly featured a cartoon about the proposed annexation of Nova Scotia. <A 
HREF="/onthisday/harp/0619.html">(See the cartoon.)</A></FONT>
</TD></TR>
After:
<tr>
   <td align="left" bgcolor="#ffffff" width="182"> 
      <NYT_HEADLINE> 
         <a href="/onthisday/20020619.html"><font face="times" size="3"><b>On June 19 ...<br></b></font></a> 
      </NYT_HEADLINE> 
      <NYT_BYLINE> <font size="-1"></font> 
      </NYT_BYLINE> 
      <NYT_SUMMARY> <font size="-1"> <b>1964:</b> The Civil Rights Act of 1964 was approved. (<a href="/onthisday/big/0619.html">See this front page.</a>) 
         <br> <b>1903:</b> Lou Gehrig was born. 
         <a href="/onthisday/bday/0619.html">(Read about his life.)</a> 
         <br> <b>1886:</b> Harper's Weekly featured a cartoon about the proposed annexation of Nova Scotia. 
         <a href="/onthisday/harp/0619.html">(See the cartoon.)</a></font> 
   </td>
</tr>

-y:   Turn off XML detection

HTB automatically detects XML compliant files and treats the unknown tags differently than HTML tags. XML tags are indented as whitespace permits and case changes & attribute sorting are not applied. To turn off this default behavior and apply case changes & sorting to all tags known and unknown, specify the -y option.

Example:

- Never break tags, make all tags lower case whether HTML or not, and do not change indenting for unknown tags...

htb -lny myfile.html

Before:
<TR><TD WIDTH=182 ALIGN=left BGCOLOR="#ffffff">
<NYT_HEADLINE>
<A

HREF="/onthisday/20020619.html"><FONT SIZE="3" FACE="times"><B>On June 19 ...<BR></B></FONT></A>
</NYT_HEADLINE>
<NYT_BYLINE>
<FONT SIZE="-1"></FONT>
</NYT_BYLINE>
<NYT_SUMMARY>
<FONT SIZE="-1">
<B>1964:</B> The Civil Rights Act of 1964 was approved.   (<A 
HREF="/onthisday/big/0619.html">See this front page.</A>) <BR>
<B>1903:</B> Lou Gehrig was born.  <A 
HREF="/onthisday/bday/0619.html">(Read about his life.)</A> <BR>
<B>1886:</B> Harper's Weekly featured a cartoon about the proposed annexation of Nova Scotia. <A 
HREF="/onthisday/harp/0619.html">(See the cartoon.)</A></FONT>
</TD></TR>
After:
<tr>
   <td align="left" bgcolor="#ffffff" width="182"> 
      <nyt_headline> 
      <a href="/onthisday/20020619.html"><font face="times" size="3"><b>On June 19 ...<br></b></font></a> 
      </nyt_headline> 
      <nyt_byline> <font size="-1"></font> 
      </nyt_byline> 
      <nyt_summary> <font size="-1"> <b>1964:</b> The Civil Rights Act of 1964 was approved. (<a href="/onthisday/big/0619.html">See this front page.</a>) 
      <br> <b>1903:</b> Lou Gehrig was born. 
      <a href="/onthisday/bday/0619.html">(Read about his life.)</a> 
      <br> <b>1886:</b> Harper's Weekly featured a cartoon about the proposed annexation of Nova Scotia. 
      <a href="/onthisday/harp/0619.html">(See the cartoon.)</a></font> 
   </td>
</tr>

-z:   Remove Comments

The -z command-line option removes all stand-alone comments from the input data. This does not include JavaScript comments or comment blocks within APPLET, OBJECT, SCRIPT, and STYLE tags used to hide text from browsers. The revised output should render and function as the original. The -z option is useful in reducing tagged file sizes when the comment blocks are no longer needed, or in removing dead, commented-out sections within documents which tend to collect over time. The stripped comments are not lost, however. These are sent to the "standard error" stream and may be collected in another file for reference or for use in documentation by "standard error" redirection ("2>" or "2>>"). If "standard error" is not redirected, the stripped comments will be seen scrolling by on the screen. Use in combination with the -k option to strip comments without otherwise changing the document layout.

Example Usage:

- Beautify myfile.htm and save the output to newfile.htm but save the stripped comments in another file called comments.txt by redirecting the "standard error" stream to a file.

htb -z myfile.htm newfile.htm 2> comments.txt (comments.txt will be created or overwritten)
htb -z myfile.htm newfile.htm 2>> comments.txt (comments.txt will be created or appended to)
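
The distinction between stand-alone comments and comments inside SCRIPT, STYLE, APPLET, or OBJECT blocks can be illustrated with the Python sketch below. It is a simplified model under assumptions, not HTB's code: the names `remove_comments`, `PROTECTED_RE`, and `COMMENT_RE` are hypothetical, and protected blocks are located with a plain regular expression.

```python
import re
import sys

# Blocks whose contents are left alone, comments included.
PROTECTED_RE = re.compile(
    r"<(script|style|applet|object)\b.*?</\1\s*>", re.S | re.I)
COMMENT_RE = re.compile(r"<!--.*?-->", re.S)


def remove_comments(markup):
    """Return (markup without stand-alone comments, stripped comments)."""
    output, comments = [], []

    def strip(segment):
        comments.extend(COMMENT_RE.findall(segment))
        return COMMENT_RE.sub("", segment)

    position = 0
    for block in PROTECTED_RE.finditer(markup):
        output.append(strip(markup[position:block.start()]))
        output.append(block.group(0))  # protected block kept verbatim
        position = block.end()
    output.append(strip(markup[position:]))
    return "".join(output), comments


source = ("<P>a</P><!-- old -->"
          "<SCRIPT><!--\nvar x=1;\n//--></SCRIPT><!-- two -->")
cleaned, stripped = remove_comments(source)
print(cleaned)
for comment in stripped:
    # Report stripped comments on standard error, as described above.
    print(comment, file=sys.stderr)
```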


0-9:   Spaces for Indenting

A command-line option from 0 to 9 sets the number of spaces used for each level of indenting. Specifying 0 removes all indenting, and the tags will be shifted to the left. If not specified, the default is to indent by 3.