Desktop search engine

In writing this search engine and its web interface, I had the following Objectives:

Learning about indexed text search.
Becoming more familiar in working with PostgreSQL and Python.

This search engine may also serve as a good starting point for building an search engine which can be added to an online web site. I have tried to keep the design of search engine an clean and simple as possible, avoiding too many features.

Requirements

Python 2.5
PostgreSQL 8.3
tsearch2: The text search module tsearch2 is included in the source distribution, but you may need to compile and install it separately.
psycopg2: Python-PostgreSQL database adapter

Installation

Create a database. You don't have to be superuser to do this (if the superuser has granted you the permission to create databases).

$ createdb dts

dts is the default name for the database used. Next, the superuser (usually called postgres) has to execute the script /contrib/tsearch2/tsearch2.sql in the PostgreSQL source package.

postgres: POSTGRES_SOURCE/contrib/tsearch2$ psql dts <tsearch2.sql

Now the user can start working with the tsearch2 functions. In order for our engine to work, we need to initialize the database dts. Conveniently, this can be done by:

$ ./engine.py --create

This will create a table and an index to the text search vectors in the database.

Everything is now setup for the scripts to run. First, we can check what the statistics option of engine.py works. You should see something like this:

$ ./engine.py -s
Statistics ...
Indexed files: 0
Total size of files: zero

If this works, one can start indexing files in directories:

$ ./engine.py -i /some/path/ /another/path/ ...

And run queries:

$ ./engine.py hello world

Web front end

The program webapp is a simple web interface for the search engine. The web interface launches its own web server when invoked. It is also possible to run the web application behind another web server using FastCGI. In order to do so, you'll need flup which a Web Server Gateway Interface (WSGI) for FastCGI. I've only tested the application with Firefox, but it uses only very basic HTML and CSS (no Ajax), so it should work with all browsers.

Download

Source: search.tar.gz