Date: August 26, 2006 (still under construction)

First, read the *README* to get an overview of what is available and how
to use it.

This supplements that information in particular areas.

(Also see the *I/O Libraries* section of the
*README*)

When building programs, it is often important to look at images. The function

```
pixWrite(char *filename, PIX *pix, l_int32 format)
```

writes an image to a file. See any of the programs in the prog directory for examples. You can display this image with a variety of programs, such as xv (which scales the image automatically to fit on the screen), display (which displays at full resolution), gqview (which displays at full resolution and allows easy zooming), and gimp (which is set up for image manipulation and displays by default at low resolution). For programmatic display with xv, we provide a function

```
pixDisplay(PIX *pix, l_int32 x, l_int32 y)
```

which scales the image to fit on the screen if necessary and then displays it with its upper-left (UL) corner at (x, y).

Images are read into memory from a file using

```
PIX *pixRead(char *filename)
```

For the supported file formats (PNG, JFIF_JPEG, TIFF with various compressions, PNM and BMP), any filename extension is ignored; the format is determined from the contents of the file itself.
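Determining the format "from the file itself" amounts to inspecting the leading bytes (the "magic number") of the file. The sketch below only illustrates that idea. The `ImageFormat` enum and `guess_format()` are hypothetical names, and this is not the library's own detection code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format codes for this sketch only. */
typedef enum { FMT_UNKNOWN, FMT_PNG, FMT_JPEG, FMT_TIFF, FMT_PNM, FMT_BMP } ImageFormat;

/* Guess the image format from the first 'n' bytes of a file in 'buf',
 * ignoring the filename extension entirely. */
ImageFormat guess_format(const unsigned char *buf, size_t n)
{
    static const unsigned char png_sig[8] = {0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};

    if (n >= 8 && memcmp(buf, png_sig, 8) == 0)
        return FMT_PNG;
    if (n >= 2 && buf[0] == 0xff && buf[1] == 0xd8)          /* JPEG SOI marker */
        return FMT_JPEG;
    if (n >= 4 && ((buf[0] == 'I' && buf[1] == 'I' && buf[2] == 42 && buf[3] == 0) ||
                   (buf[0] == 'M' && buf[1] == 'M' && buf[2] == 0 && buf[3] == 42)))
        return FMT_TIFF;                                      /* little- or big-endian TIFF */
    if (n >= 2 && buf[0] == 'P' && buf[1] >= '1' && buf[1] <= '6')
        return FMT_PNM;                                       /* P1..P6: PBM, PGM, PPM */
    if (n >= 2 && buf[0] == 'B' && buf[1] == 'M')
        return FMT_BMP;
    return FMT_UNKNOWN;
}
```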

**Regression test:** prog/ioformats.c

The `Pix` data structure is the internal (memory) representation of
images in this library. It is very simple, and is described in pix.h,
along with some of the flags and other data structures that are
associated with it. The field accessors for `Pix`, provided in
pix1.c, should ALWAYS be used. The `pixClone()` function is used to
get a new handle (pointer) to the same `Pix` data structure, without
actually copying the image data to a new array. The `pixDestroy(PIX
**)` function should be used on every handle you have — see the
comments in pix1.c at the `pixClone()` definition.

Throughout we use these definitions:

- bpp: bits/pixel (leptonica supports 1, 2, 4, 8, 16 and 32)
- ppi: pixels/inch (resolution of image relative to original scanned page)
- src: source image in image processing operation
- dest: destination image in image processing operation

A `Pix` can also have a colormap, and we support a number of
operations on colormaps in colormap.c. Colormapped images can have
depths of 1, 2, 4 or 8 bpp. Where appropriate, functions will handle
both colormapped and non-colormapped `Pix`. Functions that use
interpolation, such as grayscale or color area-mapping rotation, will
make a temporary image without the colormap, and use that to compute the
dest `Pix`, which will then not have a colormap. It should be noted
that except for in-place functions, the src `Pix` is never altered.

Except for RGB images, all pixels in a `Pix` are packed (the pixels
are represented as compactly as possible without compression). Each
raster line is 32-bit aligned. See the comments in pix.h that describe
the constraints and conventions for the image data representation. RGB
images are packed into 32 bits, leaving 8 bits for an alpha channel that
is not used.
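Here is the packing in concrete terms. At depth d, a raster line of width w occupies wpl = (w·d + 31)/32 32-bit words (integer division). Within each word, the pixels are assumed here to run from the most significant bits down. The helpers below are a sketch under that assumption and use hypothetical names. They are not the library's accessors, so consult pix.h for the actual conventions.

```c
#include <stdint.h>

/* Number of 32-bit words per raster line for width w and depth d. */
int words_per_line(int w, int d)
{
    return (w * d + 31) / 32;
}

/* Read pixel x of one raster line at depth d (1, 2, 4, 8, 16 or 32 bpp),
 * assuming pixels are packed MSB-first within each 32-bit word. */
uint32_t get_pixel(const uint32_t *line, int x, int d)
{
    int      bit   = x * d;                  /* bit offset of the pixel in the line */
    int      shift = 32 - d - (bit & 31);    /* distance of the pixel from the word's LSB */
    uint32_t mask  = (d == 32) ? 0xffffffffu : ((1u << d) - 1);
    return (line[bit >> 5] >> shift) & mask;
}

/* Write pixel x of one raster line, leaving the other pixels untouched. */
void set_pixel(uint32_t *line, int x, int d, uint32_t val)
{
    int       bit   = x * d;
    int       shift = 32 - d - (bit & 31);
    uint32_t  mask  = (d == 32) ? 0xffffffffu : ((1u << d) - 1);
    uint32_t *word  = &line[bit >> 5];
    *word = (*word & ~(mask << shift)) | ((val & mask) << shift);
}
```

Because d divides 32 for every supported depth, a pixel never straddles two words, which keeps each access to a single shift and mask.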

Rasterop is a fundamental imaging operation: it takes a rectangular region of one image, combines it with a rectangular region of a second image using one of 12 boolean operations, and writes the result into the second image. The 12 operations between two images are described in detail in rop.c.
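Here is a minimal sketch of a clipped rasterop. It assumes binary images stored one byte per pixel, which is not the library's packed layout. Each boolean operation is encoded as a 4-bit truth table. This encoding was chosen for the sketch and is not taken from rop.c. Of the 16 possible tables, 12 depend on the source. The other 4 (clear, set, leave dest, invert dest) ignore it.

```c
#include <stdint.h>

/* Truth-table encoding used only in this sketch: bit (2*s + d) of 'op' is
 * the result for source bit s and dest bit d. Combine with C operators,
 * e.g. OP_SRC | OP_DST, OP_SRC ^ OP_DST, ~OP_SRC & 0xF. */
#define OP_SRC 0xC
#define OP_DST 0xA

/* Combine the w x h rectangle of src at (sx, sy) into the rectangle of dst
 * at (dx, dy) using 'op'. Images are binary, one byte per pixel (0 or 1),
 * row-major. The rectangle is clipped to both images, so nothing outside
 * either image is read or written. Not safe if src and dst overlap. */
void rasterop(uint8_t *dst, int dw, int dh, int dx, int dy, int w, int h,
              int op, const uint8_t *src, int sw, int sh, int sx, int sy)
{
    /* Clip against the left and top edges of both images */
    if (sx < 0) { dx -= sx; w += sx; sx = 0; }
    if (sy < 0) { dy -= sy; h += sy; sy = 0; }
    if (dx < 0) { sx -= dx; w += dx; dx = 0; }
    if (dy < 0) { sy -= dy; h += dy; dy = 0; }
    /* Clip against the right and bottom edges */
    if (sx + w > sw) w = sw - sx;
    if (sy + h > sh) h = sh - sy;
    if (dx + w > dw) w = dw - dx;
    if (dy + h > dh) h = dh - dy;
    if (w <= 0 || h <= 0) return;

    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {
            int      s  = src[(sy + i) * sw + sx + j] & 1;
            uint8_t *pd = &dst[(dy + i) * dw + dx + j];
            int      d  = *pd & 1;
            *pd = (uint8_t)((op >> ((s << 1) | d)) & 1);   /* look up truth table */
        }
    }
}
```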

There are also in-place rasterops, where a rectangular region of a single image is painted according to its (shifted) values. The in-place rasterops can be used to translate a full image, or a vertical or horizontal band of the image. The latter are used to shear the image; e.g., a horizontal shear is implemented by shifting full-width bands horizontally, as described in shear.c. With in-place rasterops, one must be careful not to overwrite data that will be used later.

All rasterops operate on images of any depth, and they are automatically
clipped to the respective images to avoid illegal reads and writes.
They have a large number of uses, including a relatively fast
implementation of binary morphology (for 1 bpp images). For examples and
details, see also the writeup at *Rasterop (a.k.a. Bitblt)*.
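Binary morphology is one such use. Dilation by a structuring element is the OR of copies of the image, each translated by one element offset. It can therefore be built from a sequence of clipped OR rasterops. The sketch below writes those translated ORs out directly, again with one byte per pixel. `dilate_by_offsets()` is a hypothetical name, and this is not the library's morphology code.

```c
#include <stdint.h>
#include <string.h>

/* Dilate a w x h binary image (one byte per pixel, 0 or 1) by a structuring
 * element given as a list of (dx, dy) offsets. dst must not alias src. */
void dilate_by_offsets(uint8_t *dst, const uint8_t *src, int w, int h,
                       const int (*offsets)[2], int noffsets)
{
    memset(dst, 0, (size_t)w * h);
    for (int k = 0; k < noffsets; k++) {
        int ox = offsets[k][0], oy = offsets[k][1];
        /* OR in a copy of src translated by (ox, oy), clipped to the image */
        for (int y = 0; y < h; y++) {
            int sy = y - oy;
            if (sy < 0 || sy >= h) continue;
            for (int x = 0; x < w; x++) {
                int sx = x - ox;
                if (sx >= 0 && sx < w)
                    dst[y * w + x] |= src[sy * w + sx];
            }
        }
    }
}
```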

A large variety of efficient scaling functions can be found in
scale.c, many of which are described in *Image Scaling*. The generic
function, `pixScale()`, does the best job given the image type and the
scaling factors. The best upscaling is typically done with linear
interpolation, and the best downscaling is done either with a lowpass
filter followed by subsampling, or by area mapping. The former is a fast
anti-aliased approximation, particularly for small scaling factors
(i.e., large downscaling). The area mapping method integrates with
subpixel accuracy over the region of the src image that corresponds to
each dest pixel.
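For an integer reduction factor f, area mapping reduces to averaging each f x f block of src pixels into one dest pixel. The general method handles arbitrary factors by weighting partially covered src pixels. Below is a sketch of just the integer case for 8 bpp gray, one byte per pixel. `downscale_area_int()` is a hypothetical name.

```c
#include <stdint.h>

/* Downscale a w x h 8 bpp gray image by integer factor f using area mapping:
 * each dest pixel is the rounded mean of the f x f src block it covers.
 * Dest is (w / f) x (h / f); leftover src rows/columns are ignored. */
void downscale_area_int(uint8_t *dst, const uint8_t *src, int w, int h, int f)
{
    int      dw = w / f, dh = h / f;
    unsigned n  = (unsigned)(f * f);          /* src pixels per dest pixel */
    for (int i = 0; i < dh; i++) {
        for (int j = 0; j < dw; j++) {
            unsigned sum = 0;
            for (int y = i * f; y < (i + 1) * f; y++)
                for (int x = j * f; x < (j + 1) * f; x++)
                    sum += src[y * w + x];
            dst[i * dw + j] = (uint8_t)((sum + n / 2) / n);
        }
    }
}
```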

Some of the other fast scaling operations given in scale.c are:

- sampling: `pixScaleBySampling()`
- 2x and 4x linear interpolation upscaling for gray and color images: e.g., `pixScaleColorLI()`
- integer subsampling of RGB to gray or binary: e.g., `pixScaleRGBToGrayFast()`
- antialias lowpass filter downscaling: `pixScaleSmooth()`
- antialias area-mapping downscaling: `pixScaleAreaMap()`
- antialias downscaling from RGB to gray by 2x: `pixScaleRGBToGray2()`
- downscaling 1 bpp images to 8 bpp gray by several downscaling factors (2, 3, 4, 6, 8, 16): `pixScaleToGray()`
- binary scaling by pixel sampling: `pixScaleBinary()`
- mipmap pyramid downscaling 1 bpp images to 8 bpp gray: `pixScaleToGrayMipmap()`
- mipmap pyramid gray downscaling: `pixScaleMipmap()`
- gray upscaling by 2x or 4x, followed by binarization using a threshold: e.g., `pixScaleGray2xLIThresh()`
- gray upscaling by 2x or 4x, followed by binarization using dithering: e.g., `pixScaleGray2xLIDither()`
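The simplest entry in the list, scaling by sampling, picks for each dest pixel the src pixel that contains the dest pixel's center, with no averaging. Here is a sketch for arbitrary dest sizes, one byte per pixel. The name and the pixel-center convention are assumptions of this sketch and may not match `pixScaleBySampling()` exactly.

```c
#include <stdint.h>

/* Scale a w x h image (one byte per pixel) to dw x dh by sampling: each dest
 * pixel copies the src pixel that contains the dest pixel's center. */
void scale_by_sampling(uint8_t *dst, int dw, int dh,
                       const uint8_t *src, int w, int h)
{
    for (int i = 0; i < dh; i++) {
        int sy = (int)(((i + 0.5) * h) / dh);   /* src row under dest center */
        if (sy >= h) sy = h - 1;
        for (int j = 0; j < dw; j++) {
            int sx = (int)(((j + 0.5) * w) / dw);
            if (sx >= w) sx = w - 1;
            dst[i * dw + j] = src[sy * w + sx];
        }
    }
}
```

Nothing is averaged, so large downscaling by sampling aliases. The antialias lowpass and area-mapping functions in the list above exist to avoid this.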

Special fast scaling on binary images is also available, and is useful for image analysis of scanned binary text. Examples are:

- in binreduce.c, 2x reduction of 1 bpp images using either subsampling or rank filtering: e.g., `pixReduceRankBinary2()`
- in binexpand.c, power-of-2 replicative expansion of 1 bpp images: `pixExpandBinary()`
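Here is a minimal sketch of a 2x rank reduction under the usual definition. Each 2 x 2 block of the 1 bpp src becomes one dest pixel, which is ON when at least `level` of the four src pixels are ON. Level 1 therefore behaves like an OR and level 4 like an AND. The sketch stores one byte per pixel. `reduce_rank_binary2()` is a hypothetical name, and this is not the binreduce.c code.

```c
#include <stdint.h>

/* 2x rank reduction of a w x h binary image (one byte per pixel, 0 or 1).
 * A dest pixel is 1 if at least 'level' (1..4) of its 2x2 src block are 1.
 * Dest is (w / 2) x (h / 2); an odd last row or column is dropped. */
void reduce_rank_binary2(uint8_t *dst, const uint8_t *src, int w, int h, int level)
{
    int dw = w / 2, dh = h / 2;
    for (int i = 0; i < dh; i++) {
        const uint8_t *r0 = src + (2 * i) * w;   /* upper src row of the block */
        const uint8_t *r1 = r0 + w;              /* lower src row of the block */
        for (int j = 0; j < dw; j++) {
            int count = r0[2 * j] + r0[2 * j + 1] + r1[2 * j] + r1[2 * j + 1];
            dst[i * dw + j] = (uint8_t)(count >= level);
        }
    }
}
```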

**Some scaling scripts**:

prog/scaletest1.c: different general scaling functions

prog/scaletest2.c: multiple tests of scale-to-gray; color scaling tests.

prog/reducetest.c: rank binary cascade of up to four 2x reductions.

prog/expandtest.c: power-of-2 replicative expansion.

**Regression test:** prog/scaletest3.c

Rotation seems mundane, but there are in fact a large number of ways of
doing it, some of which are described in *Image Rotation*. The top-level
general rotator is `pixRotate()` in rotate.c. Here’s the description
from the source file:

> The general rotation pixRotate() does the best job for rotating about the image center. For 1 bpp, it uses shear; for others, it uses either shear or area mapping. If requested, it expands the output image so that no pixels are lost in the rotation, and this can be done on multiple successive shears without expanding beyond the maximum necessary size.

There are three other top-level rotation source files, each of which uses different methods for different purposes:

- rotateshear.c: This has the top-level `pixRotateShear()` to do rotation by either 2 or 3 shears about an arbitrary point. This is very fast, being implemented by a sequence of rasterops, and works for images of all depths, including colormapped. An in-place version is also implemented, using in-place rasterops to perform the in-place shear operations.
- rotateam.c: This has the top-level `pixRotateAM()` to do area mapping rotation about the image center for grayscale and color images. It also has a similar function, `pixRotateAMCorner()`, for rotating about the UL corner.
- rotateorth.c: This has the top-level functions for 90 and 180 degree rotation, `pixRotate90()` and `pixRotate180()`, along with LR and TB flipping, `pixRotateLR()` and `pixRotateTB()`, using LUTs when feasible.
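Orthogonal rotations are pure rearrangements of pixels, with no interpolation. Rotating a w x h image 90 degrees clockwise gives an h x w dest. Src pixel (x, y) lands at dest (h - 1 - y, x), with y increasing downward. Below is a straightforward sketch, one byte per pixel. `rotate90_cw()` is hypothetical and makes no use of LUTs.

```c
#include <stdint.h>

/* Rotate a w x h image (one byte per pixel) by 90 degrees clockwise.
 * dst must hold h x w pixels (dest width h, dest height w). */
void rotate90_cw(uint8_t *dst, const uint8_t *src, int w, int h)
{
    int dw = h;                                  /* dest width */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[x * dw + (h - 1 - y)] = src[y * w + x];
}
```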

**Some rotation scripts**:

prog/rotatetest1.c: selection of different rotations, including successive rotations with unwinding.

prog/rotateorthtest1.c: various orth rotations, with timing and other tests.

**Regression tests:**

prog/rotatetest2.c

prog/rotateorthtest2.c

Image shear is another special linear transform in the plane. It can be
used to approximate a continuous rotation, using either 2 shears (for
small angles) or 3 shears. Because it is implemented with rasterops, it
is very fast and works for all depths. For its use in rotation,
see *Image Rotation*.
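The 3-shear decomposition rests on the identity below. It is written for the rotation matrix acting on (x, y), with H a horizontal shear and V a vertical shear. The identity is exact for any angle with |θ| < π. The last line shows that two shears reproduce the rotation only to first order in θ, which is why the 2-shear version is reserved for small angles. Sign conventions flip when y increases downward, as it does in image coordinates.

```latex
H(a) = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}, \qquad
V(b) = \begin{pmatrix} 1 & 0 \\ b & 1 \end{pmatrix}

R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
          = H\!\left(-\tan\tfrac{\theta}{2}\right) V(\sin\theta)\, H\!\left(-\tan\tfrac{\theta}{2}\right)

H(-\tan\theta)\, V(\tan\theta)
          = \begin{pmatrix} 1 - \tan^2\theta & -\tan\theta \\ \tan\theta & 1 \end{pmatrix}
```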

Shear can be performed either with src and dest, or in-place. The latter
uses in-place rasterops. Vertical shear is used in the implementation of
the skew angle finder. The definition of the shear transform is given in
*Affine Transformations (and cousins)*.
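Here is a sketch of an in-place horizontal shear about the line y = yloc. Each row is shifted sideways by an amount proportional to its distance from that line. This is the band-shifting idea described above for in-place rasterops, applied one row at a time. Several details are assumptions of this sketch: the name `hshear_inplace()`, its sign convention, and the fill value for vacated pixels. Under this convention, rows below yloc move right for a positive slope, with y increasing downward. The function takes the slope (the tan of the shear angle) directly.

```c
#include <stdint.h>
#include <string.h>

/* Horizontal shear, in place, of a w x h image (one byte per pixel) about
 * the line y = yloc. Row y moves right by round((y - yloc) * slope) pixels;
 * pixels shifted in from outside the image are set to 'fill'. */
void hshear_inplace(uint8_t *img, int w, int h, int yloc, double slope, uint8_t fill)
{
    for (int y = 0; y < h; y++) {
        double   v     = (y - yloc) * slope;
        int      shift = (int)(v >= 0 ? v + 0.5 : v - 0.5);   /* round half away from 0 */
        uint8_t *row   = img + (size_t)y * w;
        if (shift >= w || shift <= -w) {
            memset(row, fill, (size_t)w);                     /* row moved entirely out */
        } else if (shift > 0) {
            memmove(row + shift, row, (size_t)(w - shift));   /* overlap-safe copy */
            memset(row, fill, (size_t)shift);
        } else if (shift < 0) {
            memmove(row, row - shift, (size_t)(w + shift));
            memset(row + w + shift, fill, (size_t)(-shift));
        }
    }
}
```

Using `memmove` rather than `memcpy` is what makes the overlapping copy within a row safe. It is a concrete case of the caution not to overwrite data that will be used later.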

**Some scripts using shear**:

prog/rotatetest1.c: includes timing for various rotations by shear.

prog/sheartest.c: various shear operations about arbitrary lines, both between src and dest and in-place.

Affine transforms are the most general linear transforms in a plane. They are specified by 3 corresponding points (i.e., 6 coefficients) in the two coordinate spaces. They can be implemented both in a pointwise fashion (with or without interpolation) and as a set of successive special linear transforms (translation, scaling, shear). We provide an example of the latter, but its use in applications is deprecated; in all situations you should use the pointwise transforms. See the code in affine.c for details.
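With 3 point correspondences, the 6 coefficients come from two 3 x 3 linear systems that share one matrix. Below is a sketch that solves them with Cramer's rule. It assumes the parametrization x' = c0·x + c1·y + c2, y' = c3·x + c4·y + c5. `affine_from_3pts()` is hypothetical, and affine.c may order its coefficients differently.

```c
/* Solve for affine coefficients mapping src points (x[i], y[i]) to dest
 * points (xp[i], yp[i]), i = 0..2, with x' = c[0]x + c[1]y + c[2] and
 * y' = c[3]x + c[4]y + c[5]. Returns 0 on success, or 1 if the three src
 * points are collinear (singular system). */
int affine_from_3pts(const double x[3], const double y[3],
                     const double xp[3], const double yp[3], double c[6])
{
    /* Determinant of the matrix with rows (x[i], y[i], 1) */
    double det = x[0] * (y[1] - y[2]) - y[0] * (x[1] - x[2])
               + (x[1] * y[2] - x[2] * y[1]);
    if (det == 0.0) return 1;
    for (int k = 0; k < 2; k++) {
        const double *r = (k == 0) ? xp : yp;   /* right-hand side */
        /* Cramer's rule: replace one column at a time by r */
        double da = r[0] * (y[1] - y[2]) - y[0] * (r[1] - r[2])
                  + (r[1] * y[2] - r[2] * y[1]);
        double db = x[0] * (r[1] - r[2]) - r[0] * (x[1] - x[2])
                  + (x[1] * r[2] - x[2] * r[1]);
        double dc = x[0] * (y[1] * r[2] - y[2] * r[1])
                  - y[0] * (x[1] * r[2] - x[2] * r[1])
                  + r[0] * (x[1] * y[2] - x[2] * y[1]);
        c[3 * k + 0] = da / det;
        c[3 * k + 1] = db / det;
        c[3 * k + 2] = dc / det;
    }
    return 0;
}
```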

Projective and bilinear transforms are more general, nonlinear, 4-point
transforms in the plane, and they are specified by 8 coefficients. The
implementations are in projective.c and bilinear.c,
respectively. Whereas affine transforms keep straight lines straight and
preserve parallel lines, projective transforms only keep straight lines
straight. And bilinear transforms do not even preserve straight
lines. Affine transforms project a 3-dimensional scene onto a plane at
infinity, whereas projective transforms view the 3-D scene at a finite
distance, so that lines that are parallel in the affine transform all
meet at a ‘vanishing point’ in the projective transform. For example,
projective transforms can remove “keystoning” in an object imaged by a
camera at close range. See *Affine Transformations (and cousins)* for details.
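For reference, the standard forms of the two 8-coefficient maps taking (x, y) to (x', y') are given below. The coefficient names p_i and b_i were chosen for this note and need not match the ordering used in projective.c and bilinear.c. Setting p_6 = p_7 = 0, or b_2 = b_6 = 0, reduces either map to an affine transform. The xy terms are what let the bilinear map bend straight lines.

```latex
\text{projective:}\quad
x' = \frac{p_0 x + p_1 y + p_2}{p_6 x + p_7 y + 1}, \qquad
y' = \frac{p_3 x + p_4 y + p_5}{p_6 x + p_7 y + 1}

\text{bilinear:}\quad
x' = b_0 x + b_1 y + b_2\, x y + b_3, \qquad
y' = b_4 x + b_5 y + b_6\, x y + b_7
```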

**Some scripts using 3- and 4-point transforms**:

prog/affinetest.c: basic affine transform tests, plus a comparison between pointwise and sequential implementations.

prog/projectivetest.c: compares sampled and interpolated projective transforms; i.e., `pixProjectiveSampled()` and `pixProjectiveInterpolated()`. Sampled transforms use, for each dest pixel, the closest pixel in the src, whereas interpolated transforms take a weighted average of four src pixels for each dest pixel. For 1 bpp images, only sampling can be used; for images with depth > 1, interpolation is slower but gives better results.

Likewise, prog/bilineartest.c compares sampled and interpolated bilinear transforms; i.e., `pixBilinearSampled()` and `pixBilinearInterpolated()`.
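Here is a sketch of the interpolated case for 8 bpp gray. The transform gives each dest pixel a (generally fractional) src location. The four surrounding src pixels are then averaged, weighted by the fractional offsets. This is bilinear *interpolation*, which is distinct from the bilinear *transform* despite the name. `sample_interpolated()` and its edge clamping are assumptions of this sketch, not the library's edge handling.

```c
#include <stdint.h>

/* Sample a w x h 8 bpp gray image at fractional location (fx, fy), with
 * pixel centers at integer coordinates, by bilinear interpolation of the
 * four nearest pixels. Locations outside the image are clamped to the edge. */
uint8_t sample_interpolated(const uint8_t *img, int w, int h, double fx, double fy)
{
    if (fx < 0) fx = 0;
    if (fy < 0) fy = 0;
    if (fx > w - 1) fx = w - 1;
    if (fy > h - 1) fy = h - 1;
    int    ix  = (int)fx, iy = (int)fy;          /* fx, fy >= 0, so truncation is floor */
    int    ix1 = (ix + 1 < w) ? ix + 1 : ix;
    int    iy1 = (iy + 1 < h) ? iy + 1 : iy;
    double ax  = fx - ix, ay = fy - iy;          /* fractional offsets in [0, 1) */
    double top = (1 - ax) * img[iy * w + ix] + ax * img[iy * w + ix1];
    double bot = (1 - ax) * img[iy1 * w + ix] + ax * img[iy1 * w + ix1];
    return (uint8_t)((1 - ay) * top + ay * bot + 0.5);   /* round to nearest */
}
```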

Bin morph ...


Block convolution is my term for a convolution, applied to a grayscale
image, using a rectangular kernel with constant value. For this case,
the so-called “integral image” formulation can be used to compute the
convolution in a time that is independent of the size of the convolving
kernel. To do this, it is necessary to precompute an accumulation matrix
from which each value in the dest can be computed by adding (or
subtracting) four entries in the matrix. For details, see
*Fast Convolution*. Using the same technique, it is also possible to
apply a rank order filter with a rectangular kernel to binary images,
again in a time independent of the size of the kernel.
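Here is a minimal sketch of the integral-image approach for 8 bpp gray, one byte per pixel. Entry (x, y) of the accumulation matrix holds the sum of all src pixels at or above and to the left of (x, y). Any rectangle sum therefore takes four lookups. Some choices are assumptions of this sketch rather than the library's behavior: the function names, clipping the kernel at the image boundary, and dividing by the number of pixels actually covered.

```c
#include <stdint.h>

/* Build the accumulation (integral) matrix:
 * acc[y*w + x] = sum of src[j*w + i] over all i <= x, j <= y. */
void build_accum(uint32_t *acc, const uint8_t *src, int w, int h)
{
    for (int y = 0; y < h; y++) {
        uint32_t rowsum = 0;
        for (int x = 0; x < w; x++) {
            rowsum += src[y * w + x];
            acc[y * w + x] = rowsum + (y > 0 ? acc[(y - 1) * w + x] : 0);
        }
    }
}

/* Sum of src over the inclusive rectangle [left, right] x [top, bottom],
 * from four entries of acc, in time independent of the rectangle size.
 * Unsigned wraparound in the intermediate steps is harmless because the
 * final result is a true, non-negative sum. */
uint32_t rect_sum(const uint32_t *acc, int w, int left, int top, int right, int bottom)
{
    uint32_t s = acc[bottom * w + right];
    if (left > 0) s -= acc[bottom * w + left - 1];
    if (top > 0) s -= acc[(top - 1) * w + right];
    if (left > 0 && top > 0) s += acc[(top - 1) * w + left - 1];
    return s;
}

/* Block convolution: each dest pixel is the rounded mean of the
 * (2*hw+1) x (2*hh+1) window centered on it, clipped to the image. */
void block_conv(uint8_t *dst, const uint32_t *acc, int w, int h, int hw, int hh)
{
    for (int y = 0; y < h; y++) {
        int top    = (y - hh < 0) ? 0 : y - hh;
        int bottom = (y + hh >= h) ? h - 1 : y + hh;
        for (int x = 0; x < w; x++) {
            int      left  = (x - hw < 0) ? 0 : x - hw;
            int      right = (x + hw >= w) ? w - 1 : x + hw;
            uint32_t n     = (uint32_t)((right - left + 1) * (bottom - top + 1));
            dst[y * w + x] = (uint8_t)((rect_sum(acc, w, left, top, right, bottom) + n / 2) / n);
        }
    }
}
```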

Conn comps ...