Dewarp scanned book pages by generating a vertical disparity array based on textlines.

#include <math.h>
#include "allheaders.h"

Defines
#define	DEBUG_TEXTLINE_CENTERS 0
#define	DEBUG_SHORT_LINES 0
Functions
L_DEWARP *	dewarpCreate (PIX *pixs, l_int32 pageno, l_int32 sampling, l_int32 minlines, l_int32 applyhoriz)
void	dewarpDestroy (L_DEWARP **pdew)
l_int32	dewarpBuildModel (L_DEWARP *dew, l_int32 debugflag)
PTAA *	pixGetTextlineCenters (PIX *pixs, l_int32 debugflag)
PTA *	pixGetMeanVerticals (PIX *pixs, l_int32 x, l_int32 y)
PTAA *	ptaaRemoveShortLines (PIX *pixs, PTAA *ptaas, l_float32 fract, l_int32 debugflag)
FPIX *	fpixBuildHorizontalDisparity (FPIX *fpixv, l_float32 factor, l_int32 *pextraw)
FPIX *	fpixSampledDisparity (FPIX *fpixs, l_int32 sampling)
l_int32	dewarpApplyDisparity (L_DEWARP *dew, PIX *pixs, l_int32 debugflag)
PIX *	pixApplyVerticalDisparity (PIX *pixs, FPIX *fpix)
PIX *	pixApplyHorizontalDisparity (PIX *pixs, FPIX *fpix, l_int32 extraw)
l_int32	dewarpMinimize (L_DEWARP *dew)
l_int32	dewarpPopulateFullRes (L_DEWARP *dew)
L_DEWARP *	dewarpRead (const char *filename)
L_DEWARP *	dewarpReadStream (FILE *fp)
l_int32	dewarpWrite (const char *filename, L_DEWARP *dew)
l_int32	dewarpWriteStream (FILE *fp, L_DEWARP *dew)
Variables
static const l_int32	L_DEFAULT_SAMPLING = 30
static const l_float32	DEFAULT_SLOPE_FACTOR = 2000.

Detailed Description

Dewarp scanned book pages by generating a vertical disparity array based on textlines.

    Create/destroy
        L_DEWARP      *dewarpCreate()
        void           dewarpDestroy()

    Build warp model
        l_int32        dewarpBuildModel()
        PTAA          *pixGetTextlineCenters()
        PTA           *pixGetMeanVerticals()
        PTAA          *ptaaRemoveShortLines()
        FPIX          *fpixBuildHorizontalDisparity()
        FPIX          *fpixSampledDisparity()

    Apply warping disparity array
        l_int32        dewarpApplyDisparity()
        PIX           *pixApplyVerticalDisparity()
        PIX           *pixApplyHorizontalDisparity()

    Stripping out data and populating full res disparity
        l_int32        dewarpMinimize()
        l_int32        dewarpPopulateFullRes()

    Serialized I/O
        L_DEWARP      *dewarpRead()
        L_DEWARP      *dewarpReadStream()
        l_int32        dewarpWrite()
        l_int32        dewarpWriteStream()

Basic functioning:
   PIX *pixb = "binarize"(pixs);
   L_DEWARP *dew = dewarpCreate(pixb, ...);
   dewarpBuildModel(dew, 0);
   dewarpApplyDisparity(dew, pixs, 0);
   // result is in dew->pixd;

Minimizing the data in a model by stripping out images,
numas, and full resolution disparity arrays:
   dewarpMinimize(dew);

Applying a model (stripped or not) to another image:
   dewarpApplyDisparity(dew, newpix, 0);

Description of the problem and the approach
-------------------------------------------

When a book page is scanned, there are several possible causes
for the text lines to appear to be curved:
 (1) A barrel (fish-eye) effect because the camera is at
     a finite distance from the page.  Take the normal from
     the camera to the page (the 'optic axis').  Lines on
     the page "below" this point will appear to curve upward
     (negative curvature); lines "above" this will curve downward.
 (2) Radial distortion from the camera lens.  Probably not
     a big factor.
 (3) Local curvature of the page into (or out of) the image
     plane (which is perpendicular to the optic axis).
     This has no effect if the page is flat.

The goal is to compute the "disparity" field, D(x,y), which
is actually a vector composed of the horizontal and vertical
disparity fields H(x,y) and V(x,y).  Each of these is a local
function that gives the amount each point in the image is
required to move in order to rectify the horizontal and vertical
lines.

Effects (1) and (2) can be compensated for by calibrating
the scene, using a flat page with horizontal and vertical lines.
Then H(x,y) and V(x,y) can be found as two (non-parametric) arrays
of values.  Suppose this has been done.  Then the remaining
distortion is due to (3).

Now, if we knew everywhere the angle between the perpendicular
to the paper and the optic axis (call it 'alpha'), the
actual shape of the page could in principle be found by integration,
and the remaining disparities, H(x,y) and V(x,y), could be
found.  But we don't know alpha.  If there are text lines on
the page, we can assume they should be horizontal, so we can
compute the vertical disparity, which is the local translation
required to make the text lines parallel to the rasters.

The basic question relating to (3) is this:

   Is it possible, using the shape of the text lines alone,
   to compute both the vertical and horizontal disparity fields?

The problem is to find H(x,y).  In an image with horizontal
text lines, the only vertical "lines" that we can infer are
perhaps the left and right margins.

Start with a simple case.  Suppose the binding is along a
vertical line, and the page curvature is independent of y.
Then if the page curves in toward the binding, there will be
a fractional foreshortening of that region in the x-direction, going
as the sine of the angle between the optic axis and the local
normal to the page.  For this situation, the horizontal
disparity is independent of y: H(x,y) == H(x).

Now consider V(x,0) and V(x,h), the vertical disparity along
the top and bottom of the image.  With a little thought you
can convince yourself that the local foreshortening,
as a function of x, is proportional to the difference
between the slope of V(x,0) and V(x,h).  The horizontal
disparity can then be computed by integrating the local foreshortening
over x.  Integration of the slope of V(x,0) and V(x,h) gives
the vertical disparity itself.  We have to normalize to h, the
height of the page.  So the very simple result is that

    H(x) ~ (V(x,0) - V(x,h)) / h         [1]

which is easily computed.  There is a proportionality constant
that depends on the ratio of h to the distance to the camera.
Can we actually believe this for the case where the bending
is independent of y?  I believe the answer is yes,
as long as you first remove the apparent distortion due
to the camera being at a finite distance.
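
As a minimal sketch of relation [1], assuming the top and bottom rows
of the vertical disparity are already available as arrays, the
horizontal disparity can be filled in one pass.  The function name
horiz_from_vert() and the parameter c (standing in for the unspecified
proportionality constant) are hypothetical and are not part of dewarp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of dewarp.c.
 * Fills h_disp[x] = c * (v_top[x] - v_bot[x]) / height, following [1].
 * v_top holds V(x,0), v_bot holds V(x,h), and c is an assumed
 * proportionality constant chosen by the caller. */
static void horiz_from_vert(const float *v_top, const float *v_bot,
                            size_t w, float height, float c,
                            float *h_disp)
{
    size_t x;

    assert(height > 0.0f);
    for (x = 0; x < w; x++)
        h_disp[x] = c * (v_top[x] - v_bot[x]) / height;
}
```

With c = 1 the result is the raw normalized difference; any real use
would have to calibrate c against the camera geometry.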

If you know the intersection of the optical axis with the page
and the distance to the camera, and if the page is perpendicular
to the optic axis, you can compute the horizontal and vertical
disparities due to (1) and (2) and remove them.  The resulting
distortion should be entirely due to bending (3), for which
the relation

    Hx(x) dx = C * ((Vx(x,0) - Vx(x, h))/h) dx         [2]

holds for each point in x (Hx and Vx are partial derivatives w/rt x).
Integrating over x, and using H(0) = 0, we get the result [1].
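
Written out, the integration step looks like this.  The final
reduction to [1] additionally assumes V(0,0) = V(0,h), which the text
does not state explicitly; without it a constant offset remains.

```latex
H(x) = \int_0^x H_x(x')\,dx'
     = \frac{C}{h} \int_0^x \bigl[ V_x(x',0) - V_x(x',h) \bigr]\,dx'
     = \frac{C}{h} \Bigl[ \bigl( V(x,0) - V(x,h) \bigr)
                        - \bigl( V(0,0) - V(0,h) \bigr) \Bigr]
```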

I believe this result holds differentially for each value of y, so
that in the case where the bending is not independent of y,
the expression (V(x,0) - V(x,h)) / h goes over to Vy(x,y).  Then

   H(x,y) = Integral(0,x) (Vyx(x,y) dx)         [3]

where Vyx() is the partial derivative of V w/rt both x and y.

There should be a simple mathematical relation between
the horizontal and vertical disparities for the situation
where the paper bends without stretching or kinking.
I was hoping that we would get a relation between H and V such
as Hx(x,y) ~ Vy(x,y), which would imply that H and V are real
and imaginary parts of a complex potential, each of which
satisfies the Laplace equation.  But then the gradients of the
two potentials would be orthogonal, and that does not appear to be the case.
Thus, the questions of proving the relations above (for small bending),
or finding a simpler relation between H and V than those equations,
remain open.  So far, we have only used [1] for the horizontal
disparity H(x).

In the version of the code that follows, we use text lines
to find V(x,y), and then, optionally, approximate H(x)
from the values V(x,0) and V(x,h), as described above.
The details are all in the code, but here is the basic outline.
We assume that in the plane perpendicular to the optic axis
(alpha = 0), horizontal and vertical lines have been rectified.
(If not, they can be rectified using the methods described below,
applied separately as steps (1,2,3) in the horizontal and
vertical directions.)

(1) Find lines going approximately through the center of the
    text in each text line.  Accept only lines that are
    close in length to the longest line.
(2) Generate a regular and highly subsampled vertical
    disparity field V(x,y).
(3) Interpolate this to generate a full resolution vertical
    disparity field.
(4) Optionally generate a full resolution horizontal disparity
    field, H(x).
(5) Apply the vertical dewarping, followed optionally by the
    horizontal dewarping.

Step (1) is clearly described by the code in pixGetTextlineCenters().

Steps (2) and (3) follow directly from the data in step (1),
and constitute the bulk of the work done in dewarpBuildModel().
Virtually all the noise in the data is smoothed out by doing
least-square quadratic fits, first horizontally to the data
points representing the text line centers, and then vertically.
The trick is to sample these lines on a regular grid.
First each horizontal line is sampled at equally spaced
intervals horizontally.  We thus get a set of points,
one in each line, that are vertically aligned, and
the data we represent is the vertical distance of each point
from the min or max value on the curve, depending on the
sign of the curvature component.  Each of these vertically
aligned sets of points constitutes a sampled vertical disparity,
and we do a LS quadratic fit to each of them, followed by
vertical sampling at regular intervals.  We now have a subsampled
grid of points, all equally spaced, giving at each point the local
vertical disparity.  Finally, the full resolution vertical disparity
is formed by interpolation.  All the least square fits do a
great job of smoothing everything out, as can be observed by
the contour maps that are generated for the vertical disparity field.
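
The least-squares quadratic fit used for smoothing can be sketched as
follows.  This is a generic normal-equations solution, not the
library's ptaGetQuadraticLSF(); the function name fit_quadratic() and
its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: least-squares fit of y = a*x^2 + b*x + c to
 * n points, solving the 3x3 normal equations by Cramer's rule.
 * Returns 0 on success, 1 if n < 3 or the system is singular. */
static int fit_quadratic(const double *x, const double *y, size_t n,
                         double *a, double *b, double *c)
{
    double sx = 0, sx2 = 0, sx3 = 0, sx4 = 0;
    double sy = 0, sxy = 0, sx2y = 0;
    double dn, det;
    size_t i;

    assert(x && y && a && b && c);
    if (n < 3) return 1;
    for (i = 0; i < n; i++) {
        double xi = x[i], x2 = xi * xi;
        sx += xi;  sx2 += x2;  sx3 += x2 * xi;  sx4 += x2 * x2;
        sy += y[i];  sxy += xi * y[i];  sx2y += x2 * y[i];
    }
    dn = (double)n;
    /* Normal equations:
     *   | sx4 sx3 sx2 | |a|   | sx2y |
     *   | sx3 sx2 sx  | |b| = | sxy  |
     *   | sx2 sx  n   | |c|   | sy   |   */
    det = sx4 * (sx2 * dn - sx * sx) - sx3 * (sx3 * dn - sx * sx2)
        + sx2 * (sx3 * sx - sx2 * sx2);
    if (det < 1e-12 && det > -1e-12) return 1;
    *a = (sx2y * (sx2 * dn - sx * sx) - sx3 * (sxy * dn - sx * sy)
        + sx2 * (sxy * sx - sx2 * sy)) / det;
    *b = (sx4 * (sxy * dn - sy * sx) - sx2y * (sx3 * dn - sx * sx2)
        + sx2 * (sx3 * sy - sx2 * sxy)) / det;
    *c = (sx4 * (sx2 * sy - sx * sxy) - sx3 * (sx3 * sy - sx2 * sxy)
        + sx2y * (sx3 * sx - sx2 * sx2)) / det;
    return 0;
}
```

The same routine serves both passes: first along each text line
(x is the column, y the line center), then along each vertically
aligned set of disparity samples.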

Step (4) is trivially done with the approximation described above.
Once V(x,y) and H(x,y) are derived, step (5) is done trivially.
For vertical dewarp, source pixels at the top and bottom image
boundaries are used whenever a request is made for a pixel that
is outside the image.  For horizontal dewarp, the dest image width
is increased to hold all transformed source pixels (remember,
in that step, the image is widened).
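
The boundary handling for the vertical dewarp can be sketched on a
plain 8 bpp byte array.  The sign convention used here,
dest(x,y) = src(x, y - V(x,y)), is an assumption made for
illustration, and apply_vertical() is a hypothetical name, not the
library's pixApplyVerticalDisparity().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch for an 8 bpp image stored row-major.
 * Assumed convention: dest(x, y) = src(x, y - V(x, y)), rounded to
 * the nearest row.  Requests above or below the image are clamped
 * to the top or bottom row of the source. */
static void apply_vertical(const unsigned char *src, const float *vdisp,
                           int w, int h, unsigned char *dst)
{
    int x, y;

    assert(w > 0 && h > 0);
    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++) {
            float fy = (float)y - vdisp[y * w + x];
            int ysrc = (int)(fy + (fy >= 0.0f ? 0.5f : -0.5f));
            if (ysrc < 0) ysrc = 0;           /* clamp to top row */
            if (ysrc > h - 1) ysrc = h - 1;   /* clamp to bottom row */
            dst[y * w + x] = src[ysrc * w + x];
        }
    }
}
```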

Definition in file dewarp.c.

Define Documentation

#define DEBUG_TEXTLINE_CENTERS 0

Definition at line 225 of file dewarp.c.

Referenced by dewarpBuildModel().

#define DEBUG_SHORT_LINES 0

Definition at line 226 of file dewarp.c.

Referenced by dewarpBuildModel().

Function Documentation

L_DEWARP* dewarpCreate	(	PIX *	pixs,
		l_int32	pageno,
		l_int32	sampling,
		l_int32	minlines,
		l_int32	applyhoriz
	)

dewarpCreate()

    Input:  pixs (1 bpp)
            pageno (page number)
            sampling (use -1 or 0 for default value; otherwise minimum of 5)
            minlines (minimum number of lines to accept; e.g., 10)
            applyhoriz (1 to estimate horiz disparity; 0 to skip)
    Return: dew (or null on error)

Notes:
    (1) The page number is typically 0-based. If scanned from a book,
        the even pages are usually on the left. Disparity arrays built
        for even pages should only be applied to even pages.
    (2) The sampling factor is for the disparity array. The number used
        is not critical; anything between 10 and 60 should be fine.
    (3) minlines is the minimum number of nearly full-length lines
        required to generate a vertical disparity array. Use a small
        number if you are willing to accept a questionable array.

Definition at line 261 of file dewarp.c.

References L_Dewarp::applyhoriz, CALLOC, ERROR_PTR, L_DEFAULT_SAMPLING, L_WARNING, L_Dewarp::minlines, NULL, L_Dewarp::nx, L_Dewarp::ny, L_Dewarp::pageno, pixClone(), pixGetDepth(), pixGetDimensions(), L_Dewarp::pixs, PROCNAME, and L_Dewarp::sampling.

Referenced by main().

void dewarpDestroy ( L_DEWARP ** pdew )

dewarpDestroy()

    Input:  &dew (<will be set to null before returning>)
    Return: void

Definition at line 309 of file dewarp.c.

References fpixDestroy(), FREE, L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, L_WARNING, L_Dewarp::nacurves, L_Dewarp::naflats, NULL, numaDestroy(), L_Dewarp::pixd, pixDestroy(), L_Dewarp::pixs, PROCNAME, L_Dewarp::samphdispar, and L_Dewarp::sampvdispar.

Referenced by main().

l_int32 dewarpBuildModel	(	L_DEWARP *	dew,
		l_int32	debugflag
	)

dewarpBuildModel()

    Input:  dew
            debugflag (1 for debugging output)
    Return: 0 if OK, 1 on error

Notes:
    (1) This is the basic function that builds the vertical disparity
        array, which allows determination of the src pixel in the input
        image corresponding to each dest pixel in the dewarped image.
    (2) The method is as follows:
        * Estimate the centers of all the long textlines and fit a
          LS quadratic to each one. This smooths the curves.
        * Sample each curve at a regular interval, find the y-value of
          the flat point on each curve, and subtract the sampled curve
          value from this value. This is the vertical disparity.
        * Fit a LS quadratic to each set of vertically aligned disparity
          samples. This smooths the disparity values in the vertical
          direction. Then resample at the same regular interval. We now
          have a regular grid of smoothed vertical disparity values.
        * Interpolate this grid to get a full resolution disparity map.
          This can be applied directly to the src image pixels to dewarp
          the image in the vertical direction, making all textlines
          horizontal.
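
The sampling step in note (2) can be sketched as below, assuming the
flat point is taken as the vertex of the fitted quadratic.  The helper
sample_disparity() is hypothetical; the library code may locate the
flat point differently (for example, from the sampled values).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: for the fitted curve y = a*x^2 + b*x + c,
 * write disp[k] = yflat - y(k * sampling) for k = 0 .. n-1, where
 * yflat is the y-value at the vertex (the flat point).
 * Returns 1 if a == 0 (a straight line has no flat point), else 0. */
static int sample_disparity(double a, double b, double c,
                            int sampling, size_t n, double *disp)
{
    double yflat;
    size_t k;

    assert(sampling > 0);
    if (a == 0.0) return 1;
    yflat = c - b * b / (4.0 * a);
    for (k = 0; k < n; k++) {
        double x = (double)k * (double)sampling;
        disp[k] = yflat - (a * x * x + b * x + c);
    }
    return 0;
}
```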

Definition at line 369 of file dewarp.c.

References L_Dewarp::applyhoriz, applyQuadraticFit(), DEBUG_SHORT_LINES, DEBUG_TEXTLINE_CENTERS, ERROR_INT, L_Dewarp::extraw, fpixBuildHorizontalDisparity(), fpixCreate(), fpixRenderContours(), fpixSampledDisparity(), fpixScaleByInteger(), fpixSetPixel(), FREE, L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, genTempFilename(), IFF_PNG, L_CLONE, L_INSERT, L_NOCOPY, L_SORT_INCREASING, L_Dewarp::nacurves, L_Dewarp::naflats, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetFArray(), numaGetFValue(), numaGetSortIndex(), numaSortByIndex(), numaWrite(), L_Dewarp::nx, nx, L_Dewarp::ny, ny, pixConvertTo32(), pixDestroy(), pixDisplay(), pixDisplayPtaa(), pixDisplayWithTitle(), pixGetTextlineCenters(), L_Dewarp::pixs, pixWriteTempfile(), PROCNAME, ptaaAddPta(), ptaaCreate(), ptaAddPt(), ptaaDestroy(), ptaaGetCount(), ptaaGetPt(), ptaaGetPta(), ptaaRemoveShortLines(), ptaaSortByIndex(), ptaaWrite(), ptaCreate(), ptaCreateFromNuma(), ptaDestroy(), ptaGetArrays(), ptaGetPt(), ptaGetQuadraticLSF(), ptaGetRange(), L_Dewarp::samphdispar, L_Dewarp::sampling, L_Dewarp::sampvdispar, L_Dewarp::success, and FillSeg::y.

Referenced by main().

PTAA* pixGetTextlineCenters	(	PIX *	pixs,
		l_int32	debugflag
	)

pixGetTextlineCenters()

    Input:  pixs (1 bpp)
            debugflag (1 for debug output)
    Return: ptaa (of center values of textlines)

Notes:
    (1) This in general does not have a point for each value of x,
        because there will be gaps between words. It doesn't matter
        because we will fit a quadratic to the points that we do have.

Definition at line 620 of file dewarp.c.

References boxaDestroy(), ERROR_PTR, L_CLONE, L_INSERT, L_SELECT_IF_BOTH, L_SELECT_IF_GT, NULL, pixaDestroy(), pixaDisplay(), pixaGetBoxGeometry(), pixaGetCount(), pixaGetPix(), pixaSelectBySize(), pixConnComp(), pixCreateTemplate(), pixDestroy(), pixDisplayPtaa(), pixDisplayWithTitle(), pixGetDepth(), pixGetDimensions(), pixGetMeanVerticals(), pixMorphSequence(), PROCNAME, ptaaAddPta(), and ptaaCreate().

Referenced by dewarpBuildModel(), and main().

PTA* pixGetMeanVerticals	(	PIX *	pixs,
		l_int32	x,
		l_int32	y
	)

pixGetMeanVerticals()

    Input:  pixs (1 bpp, single c.c.)
            x,y (location of UL corner of pixs with respect to page image)
    Return: pta (mean y-values in component for each x-value,
                 both translated by (x,y))
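
A minimal sketch of the per-column mean, using a plain byte mask
instead of a 1 bpp PIX.  The name mean_verticals() and the array-based
interface are hypothetical stand-ins for the PIX and PTA types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: mask is a w x h row-major array, nonzero is
 * foreground.  For each column that contains foreground, store the
 * page coordinates (x0 + x, y0 + mean y) in xs[] and ys[].
 * Returns the number of points written. */
static size_t mean_verticals(const unsigned char *mask, int w, int h,
                             int x0, int y0, float *xs, float *ys)
{
    size_t npts = 0;
    int x, y;

    assert(w > 0 && h > 0);
    for (x = 0; x < w; x++) {
        long sum = 0, count = 0;
        for (y = 0; y < h; y++) {
            if (mask[y * w + x]) { sum += y; count++; }
        }
        if (count == 0) continue;   /* empty column: no point for this x */
        xs[npts] = (float)(x0 + x);
        ys[npts] = (float)y0 + (float)sum / (float)count;
        npts++;
    }
    return npts;
}
```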

Definition at line 697 of file dewarp.c.

References ERROR_PTR, GET_DATA_BIT, NULL, pixGetData(), pixGetDepth(), pixGetDimensions(), pixGetWpl(), PROCNAME, ptaAddPt(), and ptaCreate().

Referenced by pixGetTextlineCenters().

PTAA* ptaaRemoveShortLines	(	PIX *	pixs,
		PTAA *	ptaas,
		l_float32	fract,
		l_int32	debugflag
	)

ptaaRemoveShortLines()

    Input:  pixs (1 bpp)
            ptaas (input lines)
            fract (minimum fraction of longest line to keep)
            debugflag
    Return: ptaad (containing only lines of sufficient length),
            or null on error
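
The selection rule can be sketched on an array of line lengths.  The
helper select_long_lines() is hypothetical and ignores the sorting and
debug display that the references suggest the library function also
performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: lengths[i] is the horizontal extent of line i.
 * Writes to keep[] the indices of lines whose length is at least
 * fract * (longest length) and returns how many were kept. */
static size_t select_long_lines(const float *lengths, size_t n,
                                float fract, size_t *keep)
{
    float maxlen = 0.0f;
    size_t i, nkeep = 0;

    assert(fract >= 0.0f && fract <= 1.0f);
    for (i = 0; i < n; i++)
        if (lengths[i] > maxlen) maxlen = lengths[i];
    for (i = 0; i < n; i++)
        if (lengths[i] >= fract * maxlen) keep[nkeep++] = i;
    return nkeep;
}
```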

Definition at line 743 of file dewarp.c.

References ERROR_PTR, L_CLONE, L_INSERT, L_SORT_DECREASING, L_WARNING, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetIValue(), numaGetSortIndex(), pixCopy(), pixDestroy(), pixDisplayPtaa(), pixDisplayWithTitle(), pixGetDepth(), pixGetDimensions(), PROCNAME, ptaaAddPta(), ptaaCreate(), ptaaGetCount(), ptaaGetPta(), ptaDestroy(), and ptaGetRange().

Referenced by dewarpBuildModel(), and main().

FPIX* fpixBuildHorizontalDisparity	(	FPIX *	fpixv,
		l_float32	factor,
		l_int32 *	pextraw
	)

fpixBuildHorizontalDisparity()

    Input:  fpixv (vertical disparity model)
            factor (conversion factor for vertical disparity slope;
                    use 0 for default)
            &extraw (<return> extra width to be added to dewarped pix)
    Return: fpixh, or null on error

Notes:
    (1) This takes the difference in vertical disparity at top and
        bottom of the image, and converts it to an assumed horizontal
        disparity.

Definition at line 818 of file dewarp.c.

References DEFAULT_SLOPE_FACTOR, ERROR_PTR, fpixCreate(), fpixGetData(), fpixGetDimensions(), fpixGetPixel(), fpixGetWpl(), L_NOCOPY, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetFArray(), numaGetMax(), and PROCNAME.

Referenced by dewarpBuildModel().

FPIX* fpixSampledDisparity	(	FPIX *	fpixs,
		l_int32	sampling
	)

fpixSampledDisparity()

    Input:  fpixs (full resolution disparity model)
            sampling (sampling factor)
    Return: fpixd (sampled disparity model), or null on error

Notes:
    (1) The input array is sampled at the right and top edges, and at
        every 'sampling' pixels horizontally and vertically.
    (2) The sampled array is constructed large enough to (a) cover fpixs
        and (b) have the sampled grid on all boundary pixels in fpixd.
        Having sampled pixels around the boundary simplifies
        interpolation.
    (3) There must be at least 3 sampled points horizontally and
        vertically.

Definition at line 892 of file dewarp.c.

References CALLOC, ERROR_PTR, fpixCreate(), fpixGetDimensions(), fpixGetPixel(), fpixSetPixel(), FREE, NULL, and PROCNAME.

Referenced by dewarpBuildModel().

l_int32 dewarpApplyDisparity	(	L_DEWARP *	dew,
		PIX *	pixs,
		l_int32	debugflag
	)

dewarpApplyDisparity()

    Input:  dew
            pixs (image to be modified; can be 1, 8 or 32 bpp)
            debugflag
    Return: 0 if OK, 1 on error

Notes:
    (1) This applies the vertical disparity array to the specified
        image. For src pixels above the image, we use the pixels in
        the first raster line.
    (2) This works with stripped models. If the full resolution
        disparity array(s) are missing, they are remade.

Definition at line 959 of file dewarp.c.

References L_Dewarp::applyhoriz, dewarpPopulateFullRes(), ERROR_INT, L_Dewarp::extraw, L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, IFF_PNG, NULL, pixApplyHorizontalDisparity(), pixApplyVerticalDisparity(), L_Dewarp::pixd, pixDestroy(), pixDisplayWithTitle(), pixWriteTempfile(), PROCNAME, and L_Dewarp::success.

Referenced by main().

PIX* pixApplyVerticalDisparity	(	PIX *	pixs,
		FPIX *	fpix
	)

pixApplyVerticalDisparity()

    Input:  pixs (1, 8 or 32 bpp)
            fpix (vertical disparity array)
    Return: pixd (modified by fpix), or null on error

Notes:
    (1) This applies the vertical disparity array to the specified
        image. For src pixels above the image, we use the pixels in
        the first raster line.

Definition at line 1016 of file dewarp.c.

References ERROR_PTR, fpixGetData(), fpixGetDimensions(), fpixGetWpl(), FREE, GET_DATA_BIT, GET_DATA_BYTE, GET_DATA_FOUR_BYTES, NULL, pixCreateTemplate(), pixGetData(), pixGetDimensions(), pixGetLinePtrs(), pixGetWpl(), PROCNAME, SET_DATA_BIT, and SET_DATA_BYTE.

Referenced by dewarpApplyDisparity().

PIX* pixApplyHorizontalDisparity	(	PIX *	pixs,
		FPIX *	fpix,
		l_int32	extraw
	)

pixApplyHorizontalDisparity()

    Input:  pixs (1, 8 or 32 bpp)
            fpix (horizontal disparity array)
            extraw (extra width added to pixd)
    Return: pixd (modified by fpix), or null on error

Notes:
    (1) This applies the horizontal disparity array to the specified
        image.

Definition at line 1105 of file dewarp.c.

References ERROR_PTR, fpixGetData(), fpixGetDimensions(), fpixGetWpl(), GET_DATA_BIT, GET_DATA_BYTE, NULL, pixCreate(), pixGetData(), pixGetDimensions(), pixGetWpl(), PROCNAME, SET_DATA_BIT, and SET_DATA_BYTE.

Referenced by dewarpApplyDisparity().

l_int32 dewarpMinimize ( L_DEWARP * dew )

dewarpMinimize()

    Input:  dew
    Return: 0 if OK, 1 on error

Notes:
    (1) This removes all data that is not needed for serialization.
        It keeps the subsampled disparity array(s), so the full
        resolution arrays can be reconstructed.
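
Reconstruction of a full resolution value from the subsampled grid can
be sketched with plain bilinear interpolation.  This is an
illustration only: interp_disparity() is a hypothetical helper, the
grid layout (node (i, j) at pixel (i * sampling, j * sampling)) is an
assumption, and the library's own interpolation may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: samp is an nx x ny row-major grid whose node
 * (i, j) is assumed to sit at pixel (i * sampling, j * sampling).
 * Returns the bilinearly interpolated value at pixel (x, y), which
 * must lie within the area covered by the grid. */
static float interp_disparity(const float *samp, int nx, int ny,
                              int sampling, int x, int y)
{
    int i, j;
    float fx, fy, top, bot;

    assert(sampling > 0 && nx >= 2 && ny >= 2);
    assert(x >= 0 && x <= (nx - 1) * sampling);
    assert(y >= 0 && y <= (ny - 1) * sampling);
    i = x / sampling;
    j = y / sampling;
    if (i > nx - 2) i = nx - 2;   /* keep i + 1 inside the grid */
    if (j > ny - 2) j = ny - 2;   /* keep j + 1 inside the grid */
    fx = (float)(x - i * sampling) / (float)sampling;
    fy = (float)(y - j * sampling) / (float)sampling;
    top = (1.0f - fx) * samp[j * nx + i] + fx * samp[j * nx + i + 1];
    bot = (1.0f - fx) * samp[(j + 1) * nx + i]
        + fx * samp[(j + 1) * nx + i + 1];
    return (1.0f - fy) * top + fy * bot;
}
```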

Definition at line 1198 of file dewarp.c.

References ERROR_INT, fpixDestroy(), L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, L_Dewarp::nacurves, L_Dewarp::naflats, numaDestroy(), L_Dewarp::pixd, pixDestroy(), L_Dewarp::pixs, and PROCNAME.