Thursday, July 15, 2010

Using html2ps

html2ps is a Perl script, available from http://user.it.uu.se/~jan/html2ps.html, that converts HTML to PostScript. The PostScript output can then be converted to PDF.

Prerequisites
Perl
Ghostscript
GSView [for viewing postscript files]
ImageMagick

Versions used
html2ps – 1.0 beta7
ActivePerl – 5.8.8.820
Ghostscript – 8.71
GSView – 4.9
ImageMagick – 6.6.2-3-Q16-windows-dll


Installation
  • Download the zip package from http://user.it.uu.se/~jan/html2ps.html
  • Extract the zip package to c:\
  • Rename the extracted folder to c:\html2ps
  • Download ghostscript from http://pages.cs.wisc.edu/~ghost/
  • Run the ghostscript installation file
  • Add ghostscript to the system path. Add both the bin and lib directories
  • Download the Windows binary release exe of ImageMagick from http://www.imagemagick.org/script/index.php
  • Install ImageMagick and specify that the ImageMagick installation path should be included in the system path
  • Ensure the perl bin directory is in the system path
  • Open a command prompt window
cd c:\html2ps
perl install
  • Accept the installation script defaults. When asked to enter the name of this directory, type c:\html2ps
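  • Once the install script has finished, edit the file c:\html2ps\html2ps. Replace the line $tmpname=$posix?POSIX::tmpnam():"h2p_$$"; with the following, which forces a plain temporary file name on Windows
$tmpname=$posix?POSIX::tmpnam():"h2p_$$";
# Match native Windows Perl only; m/win/i would also match "darwin" and "cygwin"
if($^O =~ m/^MSWin/i) {
  $tmpname="h2p_$$";
}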
  • Ghostscript and ImageMagick aren't required for html2ps itself to work, but some configuration parameters and documents may need them, or other additional libraries.
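
As a quick sanity check after the steps above, a small Python sketch like the following could report which tools are visible on the system path. The executable names (gswin32c for Ghostscript, convert for ImageMagick) are assumptions about a typical Windows install and are not taken from the html2ps documentation, so adjust them to match your machine.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED = ["perl"]
OPTIONAL = ["gswin32c", "convert"]


def find_tools(names, which=shutil.which):
    """Map each executable name to its resolved path, or None if not on the path."""
    return {name: which(name) for name in names}


def missing_tools(names, which=shutil.which):
    """Return the names that could not be found on the path."""
    return [name for name, path in find_tools(names, which).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED + OPTIONAL).items():
        print("%s: %s" % (name, path or "NOT FOUND on path"))
```

If perl shows up as not found, fix the system path before running perl install.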


Converting an html file to postscript
For example, to convert the html2ps user guide
cd c:\html2ps
perl html2ps -d -D -f sample -o test.ps html2ps.html


Converting the postscript file to pdf
Use ps2pdf, which comes with the ghostscript installation
ps2pdf test.ps test.pdf
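
If you run the two steps often, they can be chained from a script. The following is a minimal Python sketch that builds the same command lines shown above; the function names, the default working directory and the injectable run parameter are all hypothetical choices made for illustration. On Windows, ps2pdf may be a batch file, so the subprocess call might need adjusting (for example shell=True); treat that as something to verify on your setup.

```python
import subprocess


def html2ps_command(html_file, ps_file, config="sample", script="html2ps"):
    """Build the html2ps command line used in this post."""
    return ["perl", script, "-d", "-D", "-f", config, "-o", ps_file, html_file]


def ps2pdf_command(ps_file, pdf_file):
    """Build the ps2pdf command line used in this post."""
    return ["ps2pdf", ps_file, pdf_file]


def convert(html_file, pdf_file, workdir=r"c:\html2ps", run=subprocess.run):
    """Run html2ps then ps2pdf in workdir; return the intermediate .ps file name."""
    ps_file = pdf_file.rsplit(".", 1)[0] + ".ps"
    for cmd in (html2ps_command(html_file, ps_file), ps2pdf_command(ps_file, pdf_file)):
        # check=True makes a failing step raise instead of silently continuing
        run(cmd, cwd=workdir, check=True)
    return ps_file
```

Calling convert("html2ps.html", "test.pdf") would issue the two commands shown earlier, one after the other.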


Locating the table of contents at the beginning of the document
You can modify many aspects of the postscript file generated by html2ps by creating and editing a configuration file. The file "sample" is one such configuration file. You can make a copy of it and add your own modifications to customize the generated file. Review the html2ps user guide html document for the available configuration options. As an example, you can have the table of contents generated at the start of the document instead of at the end. Modify the file sample, changing the toc line in the option block to the following

option {
toc: hb;
}

Run html2ps and ps2pdf to confirm that the table of contents is now at the beginning of the document.

perl html2ps -d -D -f sample -o test.ps html2ps.html
ps2pdf test.ps test.pdf
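
If you maintain several configuration files, the toc edit can be scripted. This is a rough Python sketch that rewrites the first toc: setting in the text of a configuration file; the function name is hypothetical and the sample text in the usage note is only an illustration, not a copy of the real "sample" file.

```python
import re


def set_toc(config_text, value="hb"):
    """Replace the value of the first `toc:` setting.

    Returns (new_text, changed), where changed is False if no toc line was found.
    """
    new_text, count = re.subn(
        r"(\btoc\s*:\s*)[^;]*;",
        lambda m: m.group(1) + value + ";",
        config_text,
        count=1,
    )
    return new_text, count == 1
```

For example, set_toc("option {\n  toc: h;\n}\n") returns the same text with toc: hb; in place of the old line. Keep a backup of the configuration file before writing the result back.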


Correcting display of euro signs in pdf bookmarks
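If a table of contents is generated, the pdf bookmarks may be displayed with two euro signs at the beginning of the bookmark text after the postscript file is converted to pdf. html2ps writes two \240 characters (non-breaking spaces in ISO Latin 1) before the heading text, and that character code is shown as a euro sign in pdf bookmark text. To correct this, edit the file html2ps so that a full stop and a space (\56\40) are written instead. Replace the line $dh.="/h$nhd [($hind\\240\\240)($htxt)] D\n"; with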

$dh.="/h$nhd [($hind\\56\\40)($htxt)] D\n";

Replace the line $toc.="$hv NH le{$nref($hind\\240\\240)$hv C($htxt)$nref 1 TN()EA()BN}if\n"; with

$toc.="$hv NH le{$nref($hind\\56\\40)$hv C($htxt)$nref 1 TN()EA()BN}if\n";
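
To see what the octal escapes in these two lines stand for, here is a small Python sketch that decodes PostScript-style \ooo escapes into raw bytes. The helper name is hypothetical. The statement that byte 0xA0 maps to the euro sign in pdf bookmark text is based on my understanding of PDFDocEncoding and is not something the code itself checks.

```python
import re


def decode_octal_escapes(fragment):
    r"""Turn PostScript-style \ooo escapes in a bytes string into raw bytes."""
    return re.sub(
        rb"\\([0-7]{1,3})",
        # Keep only the low 8 bits, as a PostScript string holds single bytes
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),
        fragment,
    )


old = decode_octal_escapes(rb"\240\240")  # two bytes of value 0xA0
new = decode_octal_escapes(rb"\56\40")    # a full stop followed by a space

if __name__ == "__main__":
    print(old.hex(), repr(old.decode("latin-1")))
    print(new.hex(), repr(new.decode("ascii")))
```

In ISO Latin 1 the byte 0xA0 is a non-breaking space, which is why the original looks fine on paper but not in the bookmarks.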


Processing a file with images
If an html file contains images held locally and referenced with relative links, e.g. src="images/sample.png", use the base option (-b) when calling html2ps to provide the base url against which all relative image links are resolved.

perl html2ps -d -D -b file:///c:/some/path/ -f sample -o example.ps c:\some\path\example.html
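
The base url is simply the folder of the html file written as a file:/// url with forward slashes and a trailing slash. A minimal Python sketch for deriving it from a Windows path follows; the function name is hypothetical, and it makes no attempt to escape spaces or other special characters, since I have not checked how html2ps handles those.

```python
import ntpath


def base_url_for(html_path):
    """Return a file:/// url for the folder containing a Windows html file."""
    folder = ntpath.dirname(html_path)
    if not ntpath.isabs(folder):
        raise ValueError("expected an absolute Windows path, got %r" % html_path)
    # Forward slashes, exactly one trailing slash
    return "file:///" + folder.replace("\\", "/").rstrip("/") + "/"


if __name__ == "__main__":
    print(base_url_for(r"c:\some\path\example.html"))
```

The result can be passed as the value of -b in the command above.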


Changing the default look of hyperlinks
By default, the text of all links, both to internal document sections and to external web locations, is rendered in the final pdf surrounded by boxes. To have links rendered without the boxes, add a rule to the style sheet definitions in the configuration file used.

A:link { color: blue }
The color must be something other than black. In addition, when running html2ps, you'll need to add the -U parameter.

perl html2ps -d -D -U -f myconfig.txt -o example.ps example.html


Left aligning H1 elements
The example configuration file "sample" provided specifies that H1 elements are centred. If left aligning is required, remove the text-align: center portion of the H1 style rule.


Undesired blank pages
By default, an extra (empty) page is printed, when necessary, to ensure that the title page, the table of contents, and the document itself will start on odd pages. This is typically desirable for double sided printing. If this is not desired, add the extrapage flag to the @html2ps block of the configuration file, setting it to 0.

@html2ps {
extrapage: 0;
}


Starting new pages
You can have a page break inserted anywhere in the html text, e.g. before H1 elements. To do this, modify the source html and insert the comment <!--NewPage--> at each point where a new page should start.
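
For long documents, inserting the comment by hand gets tedious. The following Python sketch adds the marker before every opening H1 tag using a regular expression; the function name is hypothetical, and a regular expression is only a rough way to handle html, so check the result on your own files. Note that it also adds a marker before the first H1 and is not safe to run twice on the same text.

```python
import re

MARKER = "<!--NewPage-->"


def add_page_breaks(html, tag="h1"):
    """Insert the NewPage comment before each opening tag of the given name."""
    pattern = re.compile(r"<%s\b" % re.escape(tag), re.IGNORECASE)
    return pattern.sub(lambda m: MARKER + "\n" + m.group(0), html)


if __name__ == "__main__":
    print(add_page_breaks("<h1>One</h1><p>text</p><H1>Two</H1>"))
```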


Using CSS
html2ps ignores CSS contained in the html document. Instead, you define styles in the configuration file. Only a subset of CSS is supported by html2ps. This subset is outlined in the user guide, in the CSS2 blocks section.


Using custom colours
By default, html2ps only recognizes 16 colours. These are defined in the colour block of the html2ps file. To use additional colours, e.g. in CSS rules, edit your configuration file. In the @html2ps block, add a colour block listing the custom colours you use in the style rules, each as a name followed by its hexadecimal RGB value, e.g.
Colour{
brown: A52A2A;
}
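
If your colours start out as RGB triples, the block can be generated rather than typed. This is a small Python sketch; the function name is hypothetical and the block name is copied from the example above as a default, so change it if your configuration file spells it differently.

```python
def colour_block(colours, block_name="Colour"):
    """Format a name -> (r, g, b) mapping as a colour block for the config file."""
    lines = [block_name + "{"]
    for name, (r, g, b) in colours.items():
        for channel in (r, g, b):
            if not 0 <= channel <= 255:
                raise ValueError("channel out of range in %r" % name)
        # Two uppercase hex digits per channel, e.g. (165, 42, 42) -> A52A2A
        lines.append("%s: %02X%02X%02X;" % (name, r, g, b))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(colour_block({"brown": (165, 42, 42)}))
```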


The user guide that comes with html2ps explains all the options for modifying the look of the generated postscript file, and therefore of the pdf document eventually produced from it.
