Leptonica 1.68
C Image Processing Library

Dewarp scanned book pages by generating a vertical disparity array based on textlines.
Create/destroy
    L_DEWARP   *dewarpCreate()
    void        dewarpDestroy()

Build warp model
    l_int32     dewarpBuildModel()
    PTAA       *pixGetTextlineCenters()
    PTA        *pixGetMeanVerticals()
    PTAA       *ptaaRemoveShortLines()
    FPIX       *fpixBuildHorizontalDisparity()
    FPIX       *fpixSampledDisparity()

Apply warping disparity array
    l_int32     dewarpApplyDisparity()
    PIX        *pixApplyVerticalDisparity()
    PIX        *pixApplyHorizontalDisparity()

Stripping out data and populating full res disparity
    l_int32     dewarpMinimize()
    l_int32     dewarpPopulateFullRes()

Serialized I/O
    L_DEWARP   *dewarpRead()
    L_DEWARP   *dewarpReadStream()
    l_int32     dewarpWrite()
    l_int32     dewarpWriteStream()

Basic functioning:
    Pix *pixb = "binarize"(pixs);
    L_Dewarp *dew = dewarpCreate(pixb, ...);
    dewarpBuildModel(dew, 0);
    dewarpApplyDisparity(dew, pixs, 0);  // result is in dew->pixd

Minimizing the data in a model by stripping out images, numas, and full resolution disparity arrays:
    dewarpMinimize(dew);

Applying a model (stripped or not) to another image:
    dewarpApplyDisparity(dew, newpix, 0);

Description of the problem and the approach

When a book page is scanned, there are several possible causes for the text lines to appear to be curved:
    (1) A barrel (fish-eye) effect because the camera is at a finite distance from the page. Take the normal from the camera to the page (the 'optic axis'). Lines on the page "below" this point will appear to curve upward (negative curvature); lines "above" this will curve downward.
    (2) Radial distortion from the camera lens. Probably not a big factor.
    (3) Local curvature of the page into (or out of) the image plane (which is perpendicular to the optic axis). This has no effect if the page is flat.

The goal is to compute the "disparity" field, D(x,y), which is actually a vector composed of the horizontal and vertical disparity fields H(x,y) and V(x,y). Each of these is a local function that gives the amount each point in the image is required to move in order to rectify the horizontal and vertical lines.
Effects (1) and (2) can be compensated for by calibrating the scene, using a flat page with horizontal and vertical lines. Then H(x,y) and V(x,y) can be found as two (nonparametric) arrays of values. Suppose this has been done. Then the remaining distortion is due to (3).

Now, if we knew everywhere the angle between the perpendicular to the paper and the optic axis (call it 'alpha'), the actual shape of the page could in principle be found by integration, and the remaining disparities, H(x,y) and V(x,y), could be found. But we don't know alpha. If there are text lines on the page, we can assume they should be horizontal, so we can compute the vertical disparity, which is the local translation required to make the text lines parallel to the rasters.

The basic question relating to (3) is this: Is it possible, using the shape of the text lines alone, to compute both the vertical and horizontal disparity fields?

The problem is to find H(x,y). In an image with horizontal text lines, the only vertical "lines" that we can infer are perhaps the left and right margins.

Start with a simple case. Suppose the binding is along a vertical line, and the page curvature is independent of y. Then if the page curves in toward the binding, there will be a fractional foreshortening of that region in the x-direction, going as the sine of the angle between the optic axis and the local normal to the page. For this situation, the horizontal disparity is independent of y: H(x,y) == H(x).

Now consider V(x,0) and V(x,h), the vertical disparity along the top and bottom of the image. With a little thought you can convince yourself that the local foreshortening, as a function of x, is proportional to the difference between the slope of V(x,0) and V(x,h). The horizontal disparity can then be computed by integrating the local foreshortening over x. Integration of the slope of V(x,0) and V(x,h) gives the vertical disparity itself. We have to normalize to h, the height of the page.
So the very simple result is that

    H(x) ~ (V(x,0) - V(x,h)) / h                        [1]

which is easily computed. There is a proportionality constant that depends on the ratio of h to the distance to the camera.

Can we actually believe this for the case where the bending is independent of y? I believe the answer is yes, as long as you first remove the apparent distortion due to the camera being at a finite distance. If you know the intersection of the optical axis with the page and the distance to the camera, and if the page is perpendicular to the optic axis, you can compute the horizontal and vertical disparities due to (1) and (2) and remove them. The resulting distortion should be entirely due to bending (3), for which the relation

    Hx(x) dx = C * ((Vx(x,0) - Vx(x,h)) / h) dx         [2]

holds for each point in x (Hx and Vx are partial derivatives w/rt x). Integrating over x, and using H(0) = 0, we get the result [1].

I believe this result holds differentially for each value of y, so that in the case where the bending is not independent of y, the expression (V(x,0) - V(x,h)) / h goes over to Vy(x,y). Then

    H(x,y) = Integral(0,x) (Vyx(x,y) dx)                [3]

where Vyx() is the partial derivative of V w/rt both x and y.

There should be a simple mathematical relation between the horizontal and vertical disparities for the situation where the paper bends without stretching or kinking. I was hoping that we would get a relation between H and V such as Hx(x,y) ~ Vy(x,y), which would imply that H and V are real and imaginary parts of a complex potential, each of which satisfies the Laplace equation. But then the gradients of the two potentials would be normal, and that does not appear to be the case. Thus, the questions of proving the relations above (for small bending), or finding a simpler relation between H and V than those equations, remain open. So far, we have only used [1] for the horizontal disparity H(x).
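Relations [1] and [2] can be illustrated with a minimal sketch. The function name horiz_disparity_from_edges(), the flat input arrays and the constant c are assumptions made for illustration; this is not the code in fpixBuildHorizontalDisparity(). The slopes Vx(x,0) and Vx(x,h) are approximated by unit-step differences, and the local foreshortening is accumulated starting from H(0) = 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Leptonica function).
 *   vtop[x] = V(x,0), vbot[x] = V(x,h): vertical disparity along the
 *   top and bottom of the image; h: page height; c: proportionality
 *   constant; hdisp[x] receives H(x). */
void horiz_disparity_from_edges(const double *vtop, const double *vbot,
                                size_t w, double h, double c,
                                double *hdisp)
{
    assert(w > 0 && h > 0.0);
    hdisp[0] = 0.0;                      /* boundary condition H(0) = 0 */
    for (size_t x = 1; x < w; x++) {
        /* Differences with dx = 1 stand in for Vx(x,0) and Vx(x,h) */
        double slope_top = vtop[x] - vtop[x - 1];
        double slope_bot = vbot[x] - vbot[x - 1];
        /* Relation [2]: local foreshortening, accumulated over x */
        hdisp[x] = hdisp[x - 1] + c * (slope_top - slope_bot) / h;
    }
}
```

With c = 1 the accumulated value equals (V(x,0) - V(x,h)) / h, up to the constant fixed by H(0) = 0, which is relation [1].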
In the version of the code that follows, we use text lines to find V(x,y), and then, optionally, approximate H(x) from the values V(x,0) and V(x,h), as described above.

The details are all in the code, but here is the basic outline. We assume that in the plane perpendicular to the optic axis (alpha = 0), horizontal and vertical lines have been rectified. (If not, they can be rectified using the methods described below, applied separately as steps (1,2,3) in the horizontal and vertical directions.)
    (1) Find lines going approximately through the center of the text in each text line. Accept only lines that are close in length to the longest line.
    (2) Generate a regular and highly subsampled vertical disparity field V(x,y).
    (3) Interpolate this to generate a full resolution vertical disparity field.
    (4) Optionally generate a full resolution horizontal disparity field, H(x).
    (5) Apply the vertical dewarping, followed optionally by the horizontal dewarping.

Step (1) is clearly described by the code in pixGetTextlineCenters().

Steps (2) and (3) follow directly from the data in step (1), and constitute the bulk of the work done in dewarpBuildModel(). Virtually all the noise in the data is smoothed out by doing least-squares quadratic fits, first horizontally to the data points representing the text line centers, and then vertically. The trick is to sample these lines on a regular grid. First each horizontal line is sampled at equally spaced intervals horizontally. We thus get a set of points, one in each line, that are vertically aligned, and the data we represent is the vertical distance of each point from the min or max value on the curve, depending on the sign of the curvature component. Each of these vertically aligned sets of points constitutes a sampled vertical disparity, and we do a LS quadratic fit to each of them, followed by vertical sampling at regular intervals.
We now have a subsampled grid of points, all equally spaced, giving at each point the local vertical disparity. Finally, the full resolution vertical disparity is formed by interpolation. All the least-squares fits do a great job of smoothing everything out, as can be observed in the contour maps that are generated for the vertical disparity field.

Step (4) is trivially done with the approximation described above.

Once V(x,y) and H(x,y) are derived, step (5) is done trivially. For vertical dewarp, source pixels at the top and bottom image boundaries are used whenever a request is made for a pixel that is outside the image. For horizontal dewarp, the dest image width is increased to hold all transformed source pixels (remember, in that step, the image is widened).
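The least-squares quadratic fit that does the smoothing in steps (2) and (3) can be sketched on its own. The function quadratic_lsf() below is a hypothetical stand-alone version written for illustration, not the Leptonica routine; it solves the 3x3 normal equations for y = a*x^2 + b*x + c with Cramer's rule and follows the "0 if OK, 1 on error" return convention used in this file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-alone fit (not the Leptonica implementation).
 * Fits y = a*x^2 + b*x + c to n points by solving the 3x3 normal
 * equations with Cramer's rule.  Returns 0 if OK, 1 on error. */
int quadratic_lsf(const double *x, const double *y, size_t n,
                  double *pa, double *pb, double *pc)
{
    double sx = 0, sx2 = 0, sx3 = 0, sx4 = 0, sy = 0, sxy = 0, sx2y = 0;
    assert(x && y && pa && pb && pc);
    if (n < 3) return 1;                 /* need at least 3 points */
    for (size_t i = 0; i < n; i++) {
        double x2 = x[i] * x[i];
        sx += x[i];  sx2 += x2;  sx3 += x2 * x[i];  sx4 += x2 * x2;
        sy += y[i];  sxy += x[i] * y[i];  sx2y += x2 * y[i];
    }
    double sn = (double)n;
    /* 2x2 minors shared by the determinants below */
    double m1 = sx2 * sn - sx * sx;
    double m2 = sx3 * sn - sx * sx2;
    double m3 = sx3 * sx - sx2 * sx2;
    double det = sx4 * m1 - sx3 * m2 + sx2 * m3;
    if (det == 0.0) return 1;            /* degenerate x values */
    *pa = (sx2y * m1 - sx3 * (sxy * sn - sx * sy)
           + sx2 * (sxy * sx - sx2 * sy)) / det;
    *pb = (sx4 * (sxy * sn - sx * sy) - sx2y * m2
           + sx2 * (sx3 * sy - sx2 * sxy)) / det;
    *pc = (sx4 * (sx2 * sy - sx * sxy) - sx3 * (sx3 * sy - sx2 * sxy)
           + sx2y * m3) / det;
    return 0;
}
```

Because only three coefficients are estimated from many points, gaps between words and small localization errors in the text line centers have little influence on the fitted curve.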
Definition in file dewarp.c.
#define DEBUG_TEXTLINE_CENTERS 0 
Definition at line 225 of file dewarp.c.
Referenced by dewarpBuildModel().
#define DEBUG_SHORT_LINES 0 
Definition at line 226 of file dewarp.c.
Referenced by dewarpBuildModel().
L_DEWARP* dewarpCreate  (  PIX *  pixs, 
l_int32  pageno,  
l_int32  sampling,  
l_int32  minlines,  
l_int32  applyhoriz  
) 
Input:  pixs (1 bpp)
        pageno (page number)
        sampling (use -1 or 0 for default value; otherwise minimum of 5)
        minlines (minimum number of lines to accept; e.g., 10)
        applyhoriz (1 to estimate horiz disparity; 0 to skip)
Return: dew (or null on error)
Notes:
    (1) The page number is typically 0-based. If scanned from a book, the even pages are usually on the left. Disparity arrays built for even pages should only be applied to even pages.
    (2) The sampling factor is for the disparity array. The number used is not critical; anything between 10 and 60 should be fine.
    (3) minlines is the minimum number of nearly full-length lines required to generate a vertical disparity array. Use a small number if you are willing to accept a questionable array.
Definition at line 261 of file dewarp.c.
References L_Dewarp::applyhoriz, CALLOC, ERROR_PTR, L_DEFAULT_SAMPLING, L_WARNING, L_Dewarp::minlines, NULL, L_Dewarp::nx, L_Dewarp::ny, L_Dewarp::pageno, pixClone(), pixGetDepth(), pixGetDimensions(), L_Dewarp::pixs, PROCNAME, and L_Dewarp::sampling.
Referenced by main().
void dewarpDestroy  (  L_DEWARP **  pdew  ) 
Input:  &dew (<will be set to null before returning>)
Return: void
Definition at line 309 of file dewarp.c.
References fpixDestroy(), FREE, L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, L_WARNING, L_Dewarp::nacurves, L_Dewarp::naflats, NULL, numaDestroy(), L_Dewarp::pixd, pixDestroy(), L_Dewarp::pixs, PROCNAME, L_Dewarp::samphdispar, and L_Dewarp::sampvdispar.
Referenced by main().
dewarpBuildModel()
Input:  dew
        debugflag (1 for debugging output)
Return: 0 if OK, 1 on error
Notes:
    (1) This is the basic function that builds the vertical disparity array, which allows determination of the src pixel in the input image corresponding to each dest pixel in the dewarped image.
    (2) The method is as follows:
        * Estimate the centers of all the long textlines and fit a LS quadratic to each one. This smooths the curves.
        * Sample each curve at a regular interval, find the y-value of the flat point on each curve, and subtract the sampled curve value from this value. This is the vertical disparity.
        * Fit a LS quadratic to each set of vertically aligned disparity samples. This smooths the disparity values in the vertical direction. Then resample at the same regular interval. We now have a regular grid of smoothed vertical disparity values.
        * Interpolate this grid to get a full resolution disparity map. This can be applied directly to the src image pixels to dewarp the image in the vertical direction, making all textlines horizontal.
Definition at line 369 of file dewarp.c.
References L_Dewarp::applyhoriz, applyQuadraticFit(), DEBUG_SHORT_LINES, DEBUG_TEXTLINE_CENTERS, ERROR_INT, L_Dewarp::extraw, fpixBuildHorizontalDisparity(), fpixCreate(), fpixRenderContours(), fpixSampledDisparity(), fpixScaleByInteger(), fpixSetPixel(), FREE, L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, genTempFilename(), IFF_PNG, L_CLONE, L_INSERT, L_NOCOPY, L_SORT_INCREASING, L_Dewarp::nacurves, L_Dewarp::naflats, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetFArray(), numaGetFValue(), numaGetSortIndex(), numaSortByIndex(), numaWrite(), L_Dewarp::nx, nx, L_Dewarp::ny, ny, pixConvertTo32(), pixDestroy(), pixDisplay(), pixDisplayPtaa(), pixDisplayWithTitle(), pixGetTextlineCenters(), L_Dewarp::pixs, pixWriteTempfile(), PROCNAME, ptaaAddPta(), ptaaCreate(), ptaAddPt(), ptaaDestroy(), ptaaGetCount(), ptaaGetPt(), ptaaGetPta(), ptaaRemoveShortLines(), ptaaSortByIndex(), ptaaWrite(), ptaCreate(), ptaCreateFromNuma(), ptaDestroy(), ptaGetArrays(), ptaGetPt(), ptaGetQuadraticLSF(), ptaGetRange(), L_Dewarp::samphdispar, L_Dewarp::sampling, L_Dewarp::sampvdispar, L_Dewarp::success, and FillSeg::y.
Referenced by main().
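The second bullet of the method (sample a fitted curve, find its flat point, subtract) can be sketched as follows. The function sample_vertical_disparity() and its sampling positions x = 0, sampling, 2*sampling, ... are assumptions for illustration and are not taken from dewarp.c. The flat point is approximated by the extreme sampled value: the minimum if the quadratic coefficient is positive, the maximum otherwise.

```c
#include <assert.h>

/* Hypothetical sketch (not a Leptonica function).
 * Samples y = a*x^2 + b*x + c at x = 0, sampling, 2*sampling, ... < w,
 * takes the extreme sampled value as the "flat point" (minimum if
 * a > 0, maximum otherwise) and stores (flat - y) in disp[].
 * disp must have room for (w - 1) / sampling + 1 values.
 * Returns the number of samples written. */
int sample_vertical_disparity(double a, double b, double c,
                              int w, int sampling, double *disp)
{
    assert(w > 0 && sampling > 0 && disp);
    int n = (w - 1) / sampling + 1;
    double yflat = 0.0;
    for (int i = 0; i < n; i++) {
        double x = (double)(i * sampling);
        double y = a * x * x + b * x + c;
        disp[i] = y;                     /* hold the curve value for now */
        if (i == 0 || (a > 0.0 ? y < yflat : y > yflat))
            yflat = y;
    }
    for (int i = 0; i < n; i++)
        disp[i] = yflat - disp[i];       /* vertical disparity sample */
    return n;
}
```

Each returned sample is zero at the flat point and grows in magnitude where the text line bends away from it.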
pixGetTextlineCenters()
Input:  pixs (1 bpp)
        debugflag (1 for debug output)
Return: ptaa (of center values of textlines)
Notes:
    (1) In general, the result does not have a point for each value of x, because there will be gaps between words. It doesn't matter, because we will fit a quadratic to the points that we do have.
Definition at line 620 of file dewarp.c.
References boxaDestroy(), ERROR_PTR, L_CLONE, L_INSERT, L_SELECT_IF_BOTH, L_SELECT_IF_GT, NULL, pixaDestroy(), pixaDisplay(), pixaGetBoxGeometry(), pixaGetCount(), pixaGetPix(), pixaSelectBySize(), pixConnComp(), pixCreateTemplate(), pixDestroy(), pixDisplayPtaa(), pixDisplayWithTitle(), pixGetDepth(), pixGetDimensions(), pixGetMeanVerticals(), pixMorphSequence(), PROCNAME, ptaaAddPta(), and ptaaCreate().
Referenced by dewarpBuildModel(), and main().
pixGetMeanVerticals()
Input:  pixs (1 bpp, single c.c.)
        x,y (location of UL corner of pixs with respect to page image)
Return: pta (mean y-values in component for each x-value, both translated by (x,y))
Definition at line 697 of file dewarp.c.
References ERROR_PTR, GET_DATA_BIT, NULL, pixGetData(), pixGetDepth(), pixGetDimensions(), pixGetWpl(), PROCNAME, ptaAddPt(), and ptaCreate().
Referenced by pixGetTextlineCenters().
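A minimal sketch of the column-wise mean described for pixGetMeanVerticals() is given below. It is a hypothetical illustration: it uses one byte per pixel rather than a packed 1 bpp image, and the name mean_verticals() and the output arrays are assumptions, not the Leptonica interface.

```c
#include <assert.h>

/* Hypothetical sketch (not the Leptonica implementation).
 * comp holds a w x h component, one byte per pixel (nonzero = foreground),
 * in row-major order.  (xoff, yoff) is its upper-left corner in page
 * coordinates.  For each column containing foreground, stores the page x
 * in px[] and the mean page y in py[]; both need room for w values.
 * Returns the number of points written. */
int mean_verticals(const unsigned char *comp, int w, int h,
                   int xoff, int yoff, double *px, double *py)
{
    assert(comp && px && py && w > 0 && h > 0);
    int npts = 0;
    for (int j = 0; j < w; j++) {
        int sum = 0, count = 0;
        for (int i = 0; i < h; i++) {
            if (comp[i * w + j]) {       /* foreground pixel in column j */
                sum += i;
                count++;
            }
        }
        if (count == 0) continue;        /* empty column: no point */
        px[npts] = xoff + j;
        py[npts] = yoff + (double)sum / count;
        npts++;
    }
    return npts;
}
```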
ptaaRemoveShortLines()
Input:  pixs (1 bpp)
        ptaas (input lines)
        fract (minimum fraction of longest line to keep)
        debugflag
Return: ptaad (containing only lines of sufficient length), or null on error
Definition at line 743 of file dewarp.c.
References ERROR_PTR, L_CLONE, L_INSERT, L_SORT_DECREASING, L_WARNING, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetIValue(), numaGetSortIndex(), pixCopy(), pixDestroy(), pixDisplayPtaa(), pixDisplayWithTitle(), pixGetDepth(), pixGetDimensions(), PROCNAME, ptaaAddPta(), ptaaCreate(), ptaaGetCount(), ptaaGetPta(), ptaDestroy(), and ptaGetRange().
Referenced by dewarpBuildModel(), and main().
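The selection rule implied by the fract parameter can be sketched in a few lines. The function keep_long_lines() is hypothetical; it works on a plain array of line lengths instead of a PTAA, and whether the real comparison is inclusive is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not a Leptonica function).
 * lengths[i] is the horizontal extent of text line i.  Indices of lines
 * at least fract times as long as the longest are written to kept[],
 * which must have room for n entries.  Returns the number kept. */
size_t keep_long_lines(const double *lengths, size_t n, double fract,
                       size_t *kept)
{
    assert(fract >= 0.0 && fract <= 1.0);
    double maxlen = 0.0;
    for (size_t i = 0; i < n; i++)
        if (lengths[i] > maxlen) maxlen = lengths[i];
    size_t nkept = 0;
    for (size_t i = 0; i < n; i++)
        if (lengths[i] >= fract * maxlen)   /* inclusive test (assumed) */
            kept[nkept++] = i;
    return nkept;
}
```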
fpixBuildHorizontalDisparity()
Input:  fpixv (vertical disparity model)
        factor (conversion factor for vertical disparity slope; use 0 for default)
        &extraw (<return> extra width to be added to dewarped pix)
Return: fpixh, or null on error
Notes:
    (1) This takes the difference in vertical disparity at top and bottom of the image, and converts it to an assumed horizontal disparity.
Definition at line 818 of file dewarp.c.
References DEFAULT_SLOPE_FACTOR, ERROR_PTR, fpixCreate(), fpixGetData(), fpixGetDimensions(), fpixGetPixel(), fpixGetWpl(), L_NOCOPY, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetFArray(), numaGetMax(), and PROCNAME.
Referenced by dewarpBuildModel().
fpixSampledDisparity()
Input:  fpixs (full resolution disparity model)
        sampling (sampling factor)
Return: fpixd (sampled disparity model), or null on error
Notes:
    (1) The input array is sampled at the right and top edges, and at every 'sampling' pixels horizontally and vertically.
    (2) The sampled array is constructed large enough to (a) cover fpixs and (b) have the sampled grid on all boundary pixels in fpixd. Having sampled pixels around the boundary simplifies interpolation.
    (3) There must be at least 3 sampled points horizontally and vertically.
Definition at line 892 of file dewarp.c.
References CALLOC, ERROR_PTR, fpixCreate(), fpixGetDimensions(), fpixGetPixel(), fpixSetPixel(), FREE, NULL, and PROCNAME.
Referenced by dewarpBuildModel().
dewarpApplyDisparity()
Input:  dew
        pixs (image to be modified; can be 1, 8 or 32 bpp)
        debugflag
Return: 0 if OK, 1 on error
Notes:
    (1) This applies the vertical disparity array to the specified image. For src pixels above the image, we use the pixels in the first raster line.
    (2) This works with stripped models. If the full resolution disparity array(s) are missing, they are remade.
Definition at line 959 of file dewarp.c.
References L_Dewarp::applyhoriz, dewarpPopulateFullRes(), ERROR_INT, L_Dewarp::extraw, L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, IFF_PNG, NULL, pixApplyHorizontalDisparity(), pixApplyVerticalDisparity(), L_Dewarp::pixd, pixDestroy(), pixDisplayWithTitle(), pixWriteTempfile(), PROCNAME, and L_Dewarp::success.
Referenced by main().
pixApplyVerticalDisparity()
Input:  pixs (1, 8 or 32 bpp)
        fpix (vertical disparity array)
Return: pixd (modified by fpix), or null on error
Notes:
    (1) This applies the vertical disparity array to the specified image. For src pixels above the image, we use the pixels in the first raster line.
Definition at line 1016 of file dewarp.c.
References ERROR_PTR, fpixGetData(), fpixGetDimensions(), fpixGetWpl(), FREE, GET_DATA_BIT, GET_DATA_BYTE, GET_DATA_FOUR_BYTES, NULL, pixCreateTemplate(), pixGetData(), pixGetDimensions(), pixGetLinePtrs(), pixGetWpl(), PROCNAME, SET_DATA_BIT, and SET_DATA_BYTE.
Referenced by dewarpApplyDisparity().
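The boundary rule for vertical dewarping (use the first or last raster line when the requested source row falls outside the image) can be sketched for an 8 bpp buffer. The function apply_vertical_disparity() is hypothetical, and the sign convention (source row = dest row minus disparity) is an assumption, not a statement about pixApplyVerticalDisparity().

```c
#include <assert.h>

/* Hypothetical 8 bpp sketch (not the Leptonica implementation).
 * For each dest pixel (x, y) the source row is taken as y - vdisp(x, y),
 * rounded to the nearest integer; this sign convention is an assumption.
 * Rows outside the image are clamped to the first or last raster line. */
void apply_vertical_disparity(const unsigned char *src, const float *vdisp,
                              int w, int h, unsigned char *dst)
{
    assert(src && vdisp && dst && w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            double t = (double)y - vdisp[y * w + x];
            /* round half away from zero without needing <math.h> */
            int ysrc = (t >= 0.0) ? (int)(t + 0.5) : (int)(t - 0.5);
            if (ysrc < 0) ysrc = 0;              /* above the image */
            if (ysrc > h - 1) ysrc = h - 1;      /* below the image */
            dst[y * w + x] = src[ysrc * w + x];
        }
    }
}
```

Mapping from dest to source, rather than the reverse, guarantees that every dest pixel receives a value.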
pixApplyHorizontalDisparity()
Input:  pixs (1, 8 or 32 bpp)
        fpix (horizontal disparity array)
        extraw (extra width added to pixd)
Return: pixd (modified by fpix), or null on error
Notes:
    (1) This applies the horizontal disparity array to the specified image.
Definition at line 1105 of file dewarp.c.
References ERROR_PTR, fpixGetData(), fpixGetDimensions(), fpixGetWpl(), GET_DATA_BIT, GET_DATA_BYTE, NULL, pixCreate(), pixGetData(), pixGetDimensions(), pixGetWpl(), PROCNAME, SET_DATA_BIT, and SET_DATA_BYTE.
Referenced by dewarpApplyDisparity().
dewarpMinimize()
Input:  dew
Return: 0 if OK, 1 on error
Notes:
    (1) This removes all data that is not needed for serialization. It keeps the subsampled disparity array(s), so the full resolution arrays can be reconstructed.
Definition at line 1198 of file dewarp.c.
References ERROR_INT, fpixDestroy(), L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, L_Dewarp::nacurves, L_Dewarp::naflats, numaDestroy(), L_Dewarp::pixd, pixDestroy(), L_Dewarp::pixs, and PROCNAME.
Referenced by main().
dewarpPopulateFullRes()
Input:  dew
Return: 0 if OK, 1 on error
Notes:
    (1) If the full resolution vertical (and, optionally, horizontal) disparity arrays do not exist, they are built from the subsampled ones.
Definition at line 1227 of file dewarp.c.
References ERROR_INT, fpixScaleByInteger(), L_Dewarp::fullhdispar, L_Dewarp::fullvdispar, PROCNAME, L_Dewarp::samphdispar, L_Dewarp::sampling, and L_Dewarp::sampvdispar.
Referenced by dewarpApplyDisparity().
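Rebuilding a full resolution array from a subsampled one amounts to interpolating between grid samples. The sketch below uses bilinear interpolation with an integer factor; upsample_bilinear() is a hypothetical function, and the output size factor * (n - 1) + 1 per dimension is an assumed layout in which every grid sample lands on a full resolution pixel. It is not a description of fpixScaleByInteger().

```c
#include <assert.h>

/* Hypothetical sketch (not the Leptonica implementation).
 * s is a ws x hs grid of sampled disparities; d must hold wd x hd floats
 * with wd = factor * (ws - 1) + 1 and hd = factor * (hs - 1) + 1, so that
 * every grid sample lands on a full resolution pixel (assumed layout). */
void upsample_bilinear(const float *s, int ws, int hs, int factor, float *d)
{
    assert(s && d && ws >= 2 && hs >= 2 && factor >= 1);
    int wd = factor * (ws - 1) + 1;
    int hd = factor * (hs - 1) + 1;
    for (int yd = 0; yd < hd; yd++) {
        int ys = yd / factor;
        if (ys > hs - 2) ys = hs - 2;          /* last row: fy becomes 1 */
        float fy = (float)(yd - ys * factor) / factor;
        for (int xd = 0; xd < wd; xd++) {
            int xs = xd / factor;
            if (xs > ws - 2) xs = ws - 2;      /* last column: fx becomes 1 */
            float fx = (float)(xd - xs * factor) / factor;
            const float *p = s + ys * ws + xs; /* upper-left grid sample */
            float top = (1.0f - fx) * p[0] + fx * p[1];
            float bot = (1.0f - fx) * p[ws] + fx * p[ws + 1];
            d[yd * wd + xd] = (1.0f - fy) * top + fy * bot;
        }
    }
}
```

Because the model was already smoothed by the least-squares fits, a simple interpolation like this adds no visible artifacts of its own, which is why only the subsampled arrays need to be stored.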
L_DEWARP* dewarpRead  (  const char *  filename  ) 
Input:  filename
Return: dew, or null on error
Definition at line 1256 of file dewarp.c.
References dewarpReadStream(), ERROR_PTR, fopenReadStream(), NULL, and PROCNAME.
Referenced by main().
L_DEWARP* dewarpReadStream  (  FILE *  fp  ) 
Input:  stream
Return: dew, or null on error
Notes:
    (1) The dewarp struct is stored in minimized format, with only subsampled disparity arrays.
Definition at line 1289 of file dewarp.c.
References L_Dewarp::applyhoriz, CALLOC, DEWARP_VERSION_NUMBER, ERROR_PTR, fpixReadStream(), NULL, L_Dewarp::nx, nx, L_Dewarp::ny, ny, L_Dewarp::pageno, PROCNAME, L_Dewarp::samphdispar, L_Dewarp::sampling, L_Dewarp::sampvdispar, L_Dewarp::success, and version.
Referenced by dewarpRead().
dewarpWrite()
Input:  filename
        dew
Return: 0 if OK, 1 on error
Definition at line 1341 of file dewarp.c.
References dewarpWriteStream(), ERROR_INT, fopenWriteStream(), NULL, and PROCNAME.
Referenced by main().
dewarpWriteStream()
Input:  stream (opened for "wb")
        dew
Return: 0 if OK, 1 on error
Definition at line 1371 of file dewarp.c.
References DEWARP_VERSION_NUMBER, ERROR_INT, fpixWriteStream(), L_Dewarp::nx, L_Dewarp::ny, L_Dewarp::pageno, PROCNAME, L_Dewarp::samphdispar, L_Dewarp::sampling, and L_Dewarp::sampvdispar.
Referenced by dewarpWrite().
const l_int32 L_DEFAULT_SAMPLING = 30 [static] 
Definition at line 233 of file dewarp.c.
Referenced by dewarpCreate().
const l_float32 DEFAULT_SLOPE_FACTOR = 2000. [static] 
Definition at line 234 of file dewarp.c.
Referenced by fpixBuildHorizontalDisparity().