Leptonica 1.68
C Image Processing Library

baseline.c File Reference

Find text baselines; determine and remove local skew. More...

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "allheaders.h"

Go to the source code of this file.

Defines

#define DEBUG_PLOT   0

Functions

NUMApixFindBaselines (PIX *pixs, PTA **ppta, l_int32 debug)
PIXpixDeskewLocal (PIX *pixs, l_int32 nslices, l_int32 redsweep, l_int32 redsearch, l_float32 sweeprange, l_float32 sweepdelta, l_float32 minbsdelta)
l_int32 pixGetLocalSkewTransform (PIX *pixs, l_int32 nslices, l_int32 redsweep, l_int32 redsearch, l_float32 sweeprange, l_float32 sweepdelta, l_float32 minbsdelta, PTA **pptas, PTA **pptad)
NUMApixGetLocalSkewAngles (PIX *pixs, l_int32 nslices, l_int32 redsweep, l_int32 redsearch, l_float32 sweeprange, l_float32 sweepdelta, l_float32 minbsdelta, l_float32 *pa, l_float32 *pb)

Variables

static const l_int32 MIN_DIST_IN_PEAK = 35
static const l_int32 PEAK_THRESHOLD_RATIO = 20
static const l_int32 ZERO_THRESHOLD_RATIO = 100
static const l_int32 DEFAULT_SLICES = 10
static const l_int32 DEFAULT_SWEEP_REDUCTION = 2
static const l_int32 DEFAULT_BS_REDUCTION = 1
static const l_float32 DEFAULT_SWEEP_RANGE = 5.
static const l_float32 DEFAULT_SWEEP_DELTA = 1.
static const l_float32 DEFAULT_MINBS_DELTA = 0.01
static const l_float32 OVERLAP_FRACTION = 0.5
static const l_float32 MIN_ALLOWED_CONFIDENCE = 3.0

Detailed Description

Find text baselines; determine and remove local skew.

    Locate text baselines in an image
         NUMA     *pixFindBaselines()

    Projective transform to remove local skew
         PIX      *pixDeskewLocal()

    Determine local skew
         l_int32   pixGetLocalSkewTransform()
         NUMA     *pixGetLocalSkewAngles()

We have two apparently different functions here:
  - finding baselines
  - finding a projective transform to remove keystone warping
The function pixGetLocalSkewAngles() returns an array of angles,
one for each raster line, and the baselines of the text lines
should intersect the left edge of the image with that angle.

Definition in file baseline.c.


Define Documentation

#define DEBUG_PLOT   0

Definition at line 43 of file baseline.c.


Function Documentation

NUMA* pixFindBaselines ( PIX pixs,
PTA **  ppta,
l_int32  debug 
)

pixFindBaselines()

Input: pixs (1 bpp) &pta (<optional return>=""> pairs of pts corresponding to approx. ends of each text line) debug (usually 0; set to 1 for debugging output) Return: na (of baseline y values), or null on error

Notes: (1) Input binary image must have text lines already aligned horizontally. This can be done by either rotating the image with pixDeskew(), or, if a projective transform is required, by doing pixDeskewLocal() first. (2) Input null for &pta if you don't want this returned. The pta will come in pairs of points (left and right end of each baseline). (3) Caution: this will not work properly on text with multiple columns, where the lines are not aligned between columns. If there are multiple columns, they should be extracted separately before finding the baselines. (4) This function constructs different types of output for baselines; namely, a set of raster line values and a set of end points of each baseline. (5) This function was designed to handle short and long text lines without using dangerous thresholds on the peak heights. It does this by combining the differential signal with a morphological analysis of the locations of the text lines. One can also combine this data to normalize the peak heights, by weighting the differential signal in the region of each baseline by the inverse of the width of the text line found there. (6) There are various debug sections that can be turned on with the debug flag.

Definition at line 106 of file baseline.c.

References boxaDestroy(), boxaGetBoxGeometry(), boxaGetCount(), boxaSort(), boxaTransform(), ERROR_PTR, FALSE, FREE, GPLOT_POINTS, GPLOT_X11, gplotAddPlot(), gplotCreate(), gplotDestroy(), gplotMakeOutput(), gplotSimple1(), IFF_PNG, L_ABS, L_SORT_BY_Y, L_SORT_INCREASING, MIN_DIST_IN_PEAK, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetCount(), numaGetIArray(), numaGetIValue(), numaGetMax(), PEAK_THRESHOLD_RATIO, pixConnComp(), pixConvertTo32(), pixCountPixelsByRow(), pixDestroy(), pixDisplay(), pixGetHeight(), pixGetWidth(), pixMorphSequence(), pixRenderLineArb(), pixWrite(), PROCNAME, ptaAddPt(), ptaCreate(), ptaGetCount(), ptaGetIPt(), TRUE, x1, x2, y1, y2, and ZERO_THRESHOLD_RATIO.

Referenced by main().

PIX* pixDeskewLocal ( PIX pixs,
l_int32  nslices,
l_int32  redsweep,
l_int32  redsearch,
l_float32  sweeprange,
l_float32  sweepdelta,
l_float32  minbsdelta 
)

pixDeskewLocal()

Input: pixs nslices (the number of horizontal overlapping slices; must be larger than 1 and not exceed 20; use 0 for default) redsweep (sweep reduction factor: 1, 2, 4 or 8; use 0 for default value) redsearch (search reduction factor: 1, 2, 4 or 8, and not larger than redsweep; use 0 for default value) sweeprange (half the full range, assumed about 0; in degrees; use 0.0 for default value) sweepdelta (angle increment of sweep; in degrees; use 0.0 for default value) minbsdelta (min binary search increment angle; in degrees; use 0.0 for default value) Return: pixd, or null on error

Notes: (1) This function allows deskew of a page whose skew changes approximately linearly with vertical position. It uses a projective tranform that in effect does a differential shear about the LHS of the page, and makes all text lines horizontal. (2) The origin of the keystoning can be either a cheap document feeder that rotates the page as it is passed through, or a camera image taken from either the left or right side of the vertical. (3) The image transformation is a projective warping, not a rotation. Apart from this function, the text lines must be properly aligned vertically with respect to each other. This can be done by pre-processing the page; e.g., by rotating or horizontally shearing it. Typically, this can be achieved by vertically aligning the page edge.

Definition at line 295 of file baseline.c.

References ERROR_PTR, L_BRING_IN_WHITE, NULL, pixGetLocalSkewTransform(), pixProjectiveSampledPta(), PROCNAME, and ptaDestroy().

Referenced by main().

l_int32 pixGetLocalSkewTransform ( PIX pixs,
l_int32  nslices,
l_int32  redsweep,
l_int32  redsearch,
l_float32  sweeprange,
l_float32  sweepdelta,
l_float32  minbsdelta,
PTA **  pptas,
PTA **  pptad 
)

pixGetLocalSkewTransform()

Input: pixs nslices (the number of horizontal overlapping slices; must be larger than 1 and not exceed 20; use 0 for default) redsweep (sweep reduction factor: 1, 2, 4 or 8; use 0 for default value) redsearch (search reduction factor: 1, 2, 4 or 8, and not larger than redsweep; use 0 for default value) sweeprange (half the full range, assumed about 0; in degrees; use 0.0 for default value) sweepdelta (angle increment of sweep; in degrees; use 0.0 for default value) minbsdelta (min binary search increment angle; in degrees; use 0.0 for default value) &ptas (<return> 4 points in the source) &ptad (<return> the corresponding 4 pts in the dest) Return: 0 if OK, 1 on error

Notes: (1) This generates two pairs of points in the src, each pair corresponding to a pair of points that would lie along the same raster line in a transformed (dewarped) image. (2) The sets of 4 src and 4 dest points returned by this function can then be used, in a projective or bilinear transform, to remove keystoning in the src.

Definition at line 361 of file baseline.c.

References DEFAULT_BS_REDUCTION, DEFAULT_MINBS_DELTA, DEFAULT_SLICES, DEFAULT_SWEEP_DELTA, DEFAULT_SWEEP_RANGE, DEFAULT_SWEEP_REDUCTION, ERROR_INT, NULL, numaDestroy(), numaGetFValue(), pixGetHeight(), pixGetLocalSkewAngles(), pixGetWidth(), PROCNAME, ptaAddPt(), and ptaCreate().

Referenced by pixDeskewLocal().

NUMA* pixGetLocalSkewAngles ( PIX pixs,
l_int32  nslices,
l_int32  redsweep,
l_int32  redsearch,
l_float32  sweeprange,
l_float32  sweepdelta,
l_float32  minbsdelta,
l_float32 pa,
l_float32 pb 
)

pixGetLocalSkewAngles()

Input: pixs nslices (the number of horizontal overlapping slices; must be larger than 1 and not exceed 20; use 0 for default) redsweep (sweep reduction factor: 1, 2, 4 or 8; use 0 for default value) redsearch (search reduction factor: 1, 2, 4 or 8, and not larger than redsweep; use 0 for default value) sweeprange (half the full range, assumed about 0; in degrees; use 0.0 for default value) sweepdelta (angle increment of sweep; in degrees; use 0.0 for default value) minbsdelta (min binary search increment angle; in degrees; use 0.0 for default value) &a (<optional return>=""> slope of skew as fctn of y) &b (<optional return>=""> intercept at y=0 of skew as fctn of y) Return: naskew, or null on error

Notes: (1) The local skew is measured in a set of overlapping strips. We then do a least square linear fit parameters to get the slope and intercept parameters a and b in skew-angle = a * y + b (degrees) for the local skew as a function of raster line y. This is then used to make naskew, which can be interpreted as the computed skew angle (in degrees) at the left edge of each raster line. (2) naskew can then be used to find the baselines of text, because each text line has a baseline that should intersect the left edge of the image with the angle given by this array, evaluated at the raster line of intersection.

Definition at line 476 of file baseline.c.

References boxCreate(), boxDestroy(), DEFAULT_BS_REDUCTION, DEFAULT_MINBS_DELTA, DEFAULT_SLICES, DEFAULT_SWEEP_DELTA, DEFAULT_SWEEP_RANGE, DEFAULT_SWEEP_REDUCTION, ERROR_PTR, GPLOT_POINTS, GPLOT_X11, gplotAddPlot(), gplotCreate(), gplotDestroy(), gplotMakeOutput(), L_MAX, L_MIN, MIN_ALLOWED_CONFIDENCE, NULL, numaAddNumber(), numaCreate(), numaDestroy(), OVERLAP_FRACTION, pixClipRectangle(), pixDestroy(), pixFindSkewSweepAndSearch(), pixGetHeight(), pixGetWidth(), PROCNAME, ptaAddPt(), ptaCreate(), ptaDestroy(), ptaGetArrays(), ptaGetCount(), and ptaGetLinearLSF().

Referenced by main(), and pixGetLocalSkewTransform().


Variable Documentation

const l_int32 MIN_DIST_IN_PEAK = 35 [static]

Definition at line 47 of file baseline.c.

Referenced by pixFindBaselines().

const l_int32 PEAK_THRESHOLD_RATIO = 20 [static]

Definition at line 50 of file baseline.c.

Referenced by pixFindBaselines().

const l_int32 ZERO_THRESHOLD_RATIO = 100 [static]

Definition at line 51 of file baseline.c.

Referenced by pixFindBaselines().

const l_int32 DEFAULT_SLICES = 10 [static]

Definition at line 54 of file baseline.c.

Referenced by pixGetLocalSkewAngles(), and pixGetLocalSkewTransform().

const l_int32 DEFAULT_SWEEP_REDUCTION = 2 [static]

Definition at line 55 of file baseline.c.

Referenced by pixGetLocalSkewAngles(), and pixGetLocalSkewTransform().

const l_int32 DEFAULT_BS_REDUCTION = 1 [static]

Definition at line 56 of file baseline.c.

Referenced by pixGetLocalSkewAngles(), and pixGetLocalSkewTransform().

const l_float32 DEFAULT_SWEEP_RANGE = 5. [static]

Definition at line 57 of file baseline.c.

Referenced by pixGetLocalSkewAngles(), and pixGetLocalSkewTransform().

const l_float32 DEFAULT_SWEEP_DELTA = 1. [static]

Definition at line 58 of file baseline.c.

Referenced by pixGetLocalSkewAngles(), and pixGetLocalSkewTransform().

const l_float32 DEFAULT_MINBS_DELTA = 0.01 [static]

Definition at line 59 of file baseline.c.

Referenced by pixGetLocalSkewAngles(), and pixGetLocalSkewTransform().

const l_float32 OVERLAP_FRACTION = 0.5 [static]

Definition at line 62 of file baseline.c.

Referenced by pixGetLocalSkewAngles().

const l_float32 MIN_ALLOWED_CONFIDENCE = 3.0 [static]

Definition at line 65 of file baseline.c.

Referenced by pixGetLocalSkewAngles().

 All Data Structures Files Functions Variables Typedefs Enumerations Enumerator Defines