colorize.py

colorize.py is a Python module which converts various kinds of source code into HTML. It adds <span class="type"> elements to the source code, where "type" is the type of code, e.g. keyword, comment, string, tag, ...

Download: colorize.py.gz

A simple Example

Suppose we have some (python) source which may look like:

# square of x
def sqr(x):
    return x*x

colorize.py will convert these lines into something like:

<span class="comment"># square of x</span>
<span class="keyword">def</span> <span class="def">sqr</span>(x):
    <span class="keyword">return</span> x*x

which will look like:

# square of x
def sqr(x):
    return x*x

depending on the css used.
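
The idea behind this conversion can be illustrated in a few lines of Python. The sketch below is not the code of colorize.py; it is a minimal, assumed implementation (the name tiny_colorize and its regular expressions are hypothetical) that only recognizes comments, two keywords and the name following "def".

```python
import html
import re

# Hypothetical token patterns; the real parsers in colorize.py cover far more.
TOKEN = re.compile(
    r"(?P<comment>#[^\n]*)"              # comment up to the end of the line
    r"|(?P<keyword>\b(?:def|return)\b)"  # a tiny subset of Python keywords
    r"|(?P<def>(?<=def )\w+)"            # the name right after "def "
)

def tiny_colorize(code):
    """Wrap recognized tokens in <span class="type"> elements."""
    escaped = html.escape(code, quote=False)  # protect <, > and & first

    def wrap(match):
        # match.lastgroup is the name of the alternative that matched
        return '<span class="%s">%s</span>' % (match.lastgroup, match.group())

    return TOKEN.sub(wrap, escaped)

source = "# square of x\ndef sqr(x):\n    return x*x\n"
print(tiny_colorize(source))
```

String literals are deliberately ignored here, so a "#" inside a string would be mistaken for a comment; a real colorizer needs a proper tokenizer for each language.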

Quick start

Download the source and name it "colorize.py" (you may want to put it into your python tree).

Use from the command line

To use colorize.py by itself, run it as

python colorize.py -c <source_code_file> <output_html_file>

This will create a standalone HTML file displaying only the source code of the input file. Options are available for displaying line numbers and end-of-line markers. For example,

python colorize.py -cEN sqr.py sqr.html

will produce HTML which looks like this:

   1# square of x$
   2def sqr(x):$
   3    return x*x$
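
The layout of this output follows from ordinary string formatting. The sketch below assumes the number template '%4d' and the end marker '$' (the documented defaults of the keywords lnt and ends); the helper name decorate_lines is hypothetical and not part of colorize.py.

```python
def decorate_lines(code, number=True, showends=True, lnt="%4d", ends="$"):
    """Prefix each line with its number and append an end-of-line marker."""
    out = []
    for n, line in enumerate(code.splitlines(), start=1):
        prefix = lnt % n if number else ""   # e.g. '   1' for the first line
        suffix = ends if showends else ""
        out.append(prefix + line + suffix)
    return "\n".join(out)

print(decorate_lines("# square of x\ndef sqr(x):\n    return x*x\n"))
```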

For a list of all options, use -h.

When writing documentation, it would be very cumbersome to put each piece of code into its own file, run the above command, and then cut and paste the output into the document. Therefore, colorize.py has some very basic escape commands, which allow switching between HTML, plain text or whatever and the source code to be colorized.

For example, the source code in the example above is escaped as follows (shown together with the surrounding text of the document):

will produce HTML which looks like this:

[=== code lang="py", showends=True, number=True ===]
# square of x
def sqr(x):
    return x*x
[=== end ===]

For a list of all options, use `-h`.

The unescaped parts of this data are converted into HTML using John Gruber's Markdown. To obtain the HTML for the above, run:

python colorize.py -f -m example.txt

To be able to execute this command, you will need to have markdown.py installed.

Using colorize.py as a module

If you are interested in converting a single piece of code, you can use the function colorize:

from colorize import colorize
html = colorize( lang='py', code=<your_code_string> )
# or read from file:
html = colorize( lang='py', code=<your_file>, file=True )

This function will always return a string containing colorized HTML, regardless of whether the source code is a string or is read from a file. The following keywords are allowed:

Keyword   Type     Default   Description
code      string   required  The source code; when the keyword file is True, the name of the file containing the source code.
file      bool     False     When True, the keyword code refers to a file name.
lang      string   'None'    The language the source code is written in, i.e. which parser/colorizer should be used.
spanAttr  string   'class'   Attribute of the <span> elements added during the conversion. Only 'class' and 'style' are allowed.
pre       bool     True      If True, put the code inside <pre> elements, so that whitespace is preserved and no <br /> is required.
showends  bool     False     If True, display $ at the end of each line.
number    bool     False     If True, number all output lines.
quite     bool     False     If True, do not output the warning "Warning: No newline at end of code".
br        string   '<br />'  The markup for hard line breaks; only has an effect when the keyword pre is False.
nbsp      integer  2         The number of '&nbsp;' strings used to replace a single space ' '.
ends      string   '$'       When showends=True, this string is displayed at the end of each line.
lnt       string   '%4d'     Template string for the line numbers when number=True.
ntab      integer  4         The number of spaces that replace each tab "\t" (before a possible replacement of ' ' by &nbsp;).
codeTmpl  string   see text  Template string for the entire code. The default value depends on the keyword pre: when pre=False, the default is '%s'; when pre=True, it is '<pre><code>%s</code></pre>'.
Table 1: keywords for function colorize.
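
How ntab, nbsp, br and codeTmpl interact when pre=False may be easier to see in code. The following is a sketch of the behaviour described in Table 1, not the module's actual source: the function name render_plain is hypothetical, HTML escaping and span insertion are left out, and placing the break markup between lines (rather than also after the last one) is an assumption.

```python
def render_plain(code, ntab=4, nbsp=2, br="<br />", code_tmpl="%s"):
    """Mimic the whitespace handling described for pre=False."""
    lines = []
    for line in code.splitlines():
        line = line.replace("\t", " " * ntab)      # tabs become ntab spaces first
        line = line.replace(" ", "&nbsp;" * nbsp)  # then each space becomes nbsp entities
        lines.append(line)
    body = (br + "\n").join(lines)  # assumption: br separates lines
    return code_tmpl % body         # wrap everything in the code template

print(render_plain("if x:\n\treturn x"))
```

With the defaults, a single tab therefore ends up as 4 * 2 = 8 non-breaking spaces.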

class Colorize

If you need to make several conversions and keep track of the various css keywords being used (not all code examples on one page have to be written in the same language or use the same color mapping), use the class Colorize:

from colorize import Colorize
x = Colorize( lang='py', showends=True ) # set some defaults
html1 = x( code=<code1_string>, showends=False )
html2 = x( code=<code2_string>, lang='xml' )
html3 = x( code=<file>, file=True )
css = x.css()

When an instance of the class Colorize is called as a function, its arguments have higher priority than the defaults which were set upon initialization of the instance. These default settings may always be changed using the method resetDefaults(). All these functions accept the keywords listed in Table 1.
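
The priority rule can be pictured as a dictionary merge. The class below is a hypothetical stand-in (OptionStore is not part of colorize.py) which only shows how keywords given in a call could override the stored defaults without changing them.

```python
class OptionStore:
    """Hypothetical illustration of defaults overridden per call."""

    def __init__(self, **defaults):
        self.defaults = dict(defaults)

    def __call__(self, **overrides):
        effective = dict(self.defaults)  # start from the instance defaults
        effective.update(overrides)      # keywords of this call win
        return effective

x = OptionStore(lang="py", showends=True)
print(x(showends=False))  # showends overridden for this call only
print(x(lang="xml"))      # lang overridden, showends taken from the defaults
```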

The method css() returns a string containing CSS data which may be used when presenting the HTML data. This string may look like:

.comment { color: #ae2020; }
.keyword { color: #9e20ef; }
.esc { 
    font-family: monospace;
    color: #777;
    background-color: #eff3cf;
    font-style: normal;
    font-weight: normal; 
    }
.def { color: #0000ff; }

depending on which languages and which styles have been used to generate the html strings. The method css accepts the following arguments:

Keyword  Type    Description
loc      string  If given, this string is prepended to each css keyword, e.g. with loc='div#id3 span' our css example would start as:
                     div#id3 span.comment { color: #ae2020; }
                     div#id3 span.keyword { ...
outfile  string  If given, the css code will be written into this file.
static   string  A string containing "static" css which will merely be added to the output.
Table 2: keywords for method css.
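
The effect of loc amounts to prefixing every selector. A minimal sketch, assuming the rules are held in a dictionary that maps class names to declarations (the function prefix_css and this data layout are hypothetical, not taken from colorize.py):

```python
def prefix_css(rules, loc=None):
    """Build CSS text from {class_name: declarations}, optionally prefixed by loc."""
    lines = []
    for name, declarations in rules.items():
        selector = "." + name
        if loc:
            selector = loc + selector  # e.g. 'div#id3 span' + '.comment'
        lines.append("%s { %s }" % (selector, declarations))
    return "\n".join(lines)

rules = {"comment": "color: #ae2020;", "keyword": "color: #9e20ef;"}
print(prefix_css(rules, loc="div#id3 span"))
```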

function fill

This function calls a parser which identifies the escaped sections of a document text and calls Colorize to perform the transformations for the code sections.

from colorize import fill
html = fill( <document_string> )

The unescaped sections may be transformed as well:

html = fill( <document_string>, <transformation_function> )

The second argument is an optional function which transforms the unescaped document sections. When running the program on the command line using '-f -m', this function is markdown. Any function-like object may be used here, e.g.

html = fill( <document_string>, lambda s: s.replace('py','Python') )
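
To make the division of labour concrete, here is a self-contained sketch of such a splitter. It is not the parser used by colorize.py: tiny_fill, its regular expression and the stand-in colorizer (which merely escapes the code and wraps it in <pre><code>) are assumptions made for illustration, and only the "code" command is recognized.

```python
import html
import re

# Matches "[=== code <attributes> ===]" ... "[=== end ===]" blocks.
CODE_BLOCK = re.compile(
    r"\[=== code(?P<attrs>.*?)===\]\n(?P<body>.*?)\[=== end ===\]",
    re.DOTALL,
)

def tiny_fill(document, transform=None, colorizer=None):
    """Apply transform to unescaped text and colorizer to escaped code."""
    if transform is None:
        transform = lambda s: s
    if colorizer is None:
        # stand-in for a real colorizer: escape and wrap, no highlighting
        colorizer = lambda code: "<pre><code>%s</code></pre>" % html.escape(code)
    out = []
    pos = 0
    for match in CODE_BLOCK.finditer(document):
        out.append(transform(document[pos:match.start()]))  # unescaped section
        out.append(colorizer(match.group("body")))          # code section
        pos = match.end()
    out.append(transform(document[pos:]))  # text after the last block
    return "".join(out)

doc = 'Some py text.\n[=== code lang="py" ===]\nprint(py)\n[=== end ===]\nMore py.\n'
print(tiny_fill(doc, lambda s: s.replace("py", "Python")))
```

Note that the transformation touches only the text outside the escaped block; the "py" inside the code section is left alone.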

Syntax

This section describes the escaping syntax of the document files and the commands available. There are two types of commands (A and B), plus comments (C):

A: Commands with start and end "tag", like:

[=== file showends=True, ntab=8 ===]
   example.py
[=== end ===]

B: Commands that don't include text, like:

[=== defaults lang="py", br='<br>', pre=False ===]

C: Comments:

[!=== This is a comment.  This will not appear in the output. ===]

Table 3 shows the commands available.

Command   Type  Attributes  What it does
code      A     Table 1*    Includes the source code.
file      A     Table 1*    Includes the source code found in the file.
defaults  B     Table 1*    Sets default values.
css       B     Table 2     Puts the CSS code into its place or writes the CSS into a file.
dump      A     None        Does nothing with the enclosed data; it is included as is in the resulting document.
* except the keywords "code" and "file"
Table 3: Commands available in document file.
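
Since the attributes of a command look like Python keyword arguments, one way to read them is to let Python's own parser do the work. The sketch below is an assumption about how such a line could be parsed, not a description of colorize.py's parser; parse_command and strip_comments are hypothetical helper names.

```python
import ast
import re

COMMAND = re.compile(r"\[=== (?P<name>\w+)(?P<attrs>.*?)===\]")
COMMENT = re.compile(r"\[!===.*?===\]", re.DOTALL)

def parse_command(line):
    """Return (command_name, attribute_dict) for a line like '[=== code ... ===]'."""
    match = COMMAND.match(line.strip())
    if match is None:
        raise ValueError("not a command line: %r" % line)
    # Parse the attributes as the keyword arguments of a dummy call f(...)
    call = ast.parse("f(%s)" % match.group("attrs").strip(), mode="eval").body
    attrs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return match.group("name"), attrs

def strip_comments(text):
    """Remove '[!=== ... ===]' comments from a document."""
    return COMMENT.sub("", text)

print(parse_command("[=== defaults lang=\"py\", br='<br>', pre=False ===]"))
print(parse_command("[=== end ===]"))
```

Using ast.literal_eval restricts the attribute values to plain literals (strings, numbers, True, False, None), so arbitrary expressions in a document are rejected rather than executed.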