date: | Aug 31, 2006 |
---|
The generation of connected component (c.c.) borders is a relatively simple application in Leptonica. There are many rendering machines that can convert outlines to binary images. PostScript and other page description languages use outline fonts for scalable rendering, to get smooth pixellated curves at any output device resolution. There are several different font formats in use, such as the original PostScript outline fonts, Microsoft’s TrueType fonts, and the open source FreeType fonts.
Borders of connected components can be generated by following either the border pixels or the “cracks” between border pixels of different colors. The first type are called chain codes; the second are crack codes. It is relatively simple to convert crack codes to chain codes, but we will use chain codes throughout. Borders must be followed consistently so that the inside of the c.c. is on either the left or right. We choose having the inside on the right side, so that outer borders are traversed cw and inner (hole) borders are traversed ccw.
After the chain code for the border(s) of a c.c. are determined, the following constructions are of interest:
Finding a compressed, serialized format for writing to a file.
Linking the outer border with all inner borders to form a single, connected chain for each c.c.
Rendering the outline as the original raster.
Let’s look at each of the operations in some detail. The source is in ccbord.c and ccbord.h, which should be consulted for further details.
We first find the bitmaps and bounding boxes for each 8-connected component. The border of an 8-connected component is represented by an 8-connected chain of pixels. Then for each c.c., the outer border is found by first adding a one-pixel border of OFF pixels to the PIX for the c.c., finding a first border pixel in the fg (i.e., an ON pixel next to an OFF pixel), and sequentially finding the next pixel on the border. We use a method described by Rosenfeld and Kak in Digital Picture Processing, Vol 2, pp. 219ff, Academic Press, 1982. The next pixel in an 8-connected border is found by looking in the 8 directions from the current pixel, starting with the pixel next in cw rotation from the previous border pixel. The search sweeps around until an ON pixel is found. The chain is terminated after the first pixel is reached again AND the next pixel would be the second pixel in the chain. This is necessary because pixels can be traversed on the chain multiple times (up to 4 times; consider a 3x3 plus sign).
Remember, the background to 8-connected foreground is 4-connected. The holes are found as follows. We take each minimum-sized image of an 8 c.c., and use 4-filling from the border. This fill stops when you hit the outside of the c.c. Then inverting the result, you get an image of the holes (if any) as 4-connected fg objects. Run the c.c. finder looking for these 4-connected objects. Then for each hole, the hole border in the fg of the original c.c. is found by first finding a pixel within the hole, then searching in the original c.c. for a border pixel, and then following the border to generate the chain code. Doing it this way, we are guaranteed to find all holes within components, and not to run into trouble in very complicated situations where there can be components within holes within components, etc, nested to any arbitrary depth.
We show a simple way to do this, that gives a typical compression on text images that is about 25% better on borders of connected components than using png on the original image. If there are halftones, you can expect worse compression than png. These outline representations should generally not be used on halftones, but they work well with text and unstippled line art.
The chain code can be represented as a sequence of directions for each step. There are 8 directions, so this requires 3 bits of information. Pack 2 steps into each byte. You can do better, putting 8 steps into 3 bytes, but gzip, which works on bytes, will be less successful at compressing such data, and we use gzip on the sequential step data. For each c.c. you also need to store the UL corner coordinates, and for each border (both outer and hole) you need to store the coordinates of the starting pixel. We also store the width and height of each c.c., but this is not necessary because it can be determined from the outer border. This data is collected in memory, using the byte buffer utility in bbuffer.c. It is then compressed in memory with gzip and written out to file. It can be read back, gunzip’d in memory, and parsed, with the construction of the CCBORDA data structure.
There are other ways to do this. For example, we can store run-lengths at each direction rather than single steps. We can compress using a Huffman code on run-lengths, rather than a universal code on bytes composed of 2-step direction codes. I adopted the approach here because it’s easy to implement using the byte buffer utility and the in-memory gzip functions.
Kris Popat suggested that it is useful to connect the hole borders with the outer border, so that there is a single border for each c.c. (In fact, this whole chain code representation is Kris’s suggestion.) To avoid having artifacts generated by the rendering machine, each cut path between inner and outer borders is traversed twice, in opposite directions, along exactly the same pixel path. The renderers are usually smart enough to figure out what is inside and what is outside.
The implementation is most simple if we find a path from each hole border to the outer border, making sure that it is fully contained within the fg of the c.c. We use again a bitmap of the c.c. under study. For each hole, we start at the center of the hole, making sure it is an OFF pixel, and march in one of 4 directions until we find an ON pixel, which must be on the hole border. We then continue in the same direction until we hit either an OFF pixel or an ON pixel at the edge of the bitmap. Check if the last ON pixel is on the outer border of the c.c. If it is, we have our cut path; if not, choose another of the four directions and repeat. This way we have four chances. Because the application is for text, where the most likely joining of characters is horizontal, we pick the first two directions vertically and the last two horizontally. If we fail in all four directions, we just lose a hole.
Then, once a cut path is located for each hole, we start on the outer border. When we come to a cut path, we take it, go around the hole, come back on the cut path to the outer border, and continue there. After the outer border has been completely traversed, we are finished.
For completeness, I note that we offer the following representations of the chain code in the CCBORDA data structure:
Separate outer and hole borders, in local coords in the c.c.
Separate outer and hole borders, in global coords
Separate outer and hole borders, in step (direction) codes
Single border with cut paths, in local coords in the c.c.
Single border with cut paths, in global coords. For this, you can have either all the points on the border, or just the turning points, so that a straight line between turning points intersects all the actual border pixels. For a typical image, using turning points reduces the number of stored points by a factor of about 2.
The standard method for filling outlines is scan line conversion. This sweeps a line across the image, noting when it goes through end points of oriented line segments, and keeping track of how many lines (and, optionally, their orientations) that it cuts. You can understand how this works intuitively by tracking across just one raster line, starting at the left edge. Suppose you start in bg. The first line you cross will be oriented up, and after crossing it you will be in fg. (This follows our convention that inside is on the right as you traverse the path.) For simple shapes, each line you cross will be oriented oppositely to the preceeding line, and you will toggle from fg <–> bg. In the general case there can be any number of lines oriented up or down, and a rule is needed to determine what to do when crossing. There are two common rules:
Nonzero winding number rule. Sum the crossings, using +1 if the path crosses oriented up and -1 if it is going down. If the sum is 0, you are in bg; otherwise, you are in fg.
Even-odd rule. Sum the crossings, independent of path orientation. If the sum is even, you are in bg; otherwise, you are in fg.
For simple shapes, these two rules give the same result, but for complex paths they will differ. Using the nonzero winding number rule, if you pass 2 lines both oriented up, the winding number is 2; you are still in fg and will remain there until the winding number has gone back to 0. Scan line conversion has not yet been implemented in Leptonica.
We provide instead two topological algorithms for filling the outlines, which are represented as separate outer and hole borders for each c.c. These are the functions ccbaDisplayImage1() and ccbaDisplayImage2(). Both algorithms are described in some detail at the top of ccbord.c. The algorithms are very similar, but Method 2 is a little simpler, so I describe it here.
Each 8-connected component is filled separately, and then rasterop’d into the destination image. For each c.c.,
Make a PIX with the border pixels in the c.c. in the fg, and with an added 1 pixel border of bg pixels. If w and h are the width and height of the c.c., the PIX has width w+2 and height h+2. This is the clipping mask.
Make a seed image of the same size, and for each border (both outer and holes) put one seed pixel outside the border. “Outside” is determined by our right-hand convention for borders. The 1 pixel border was added to each image to simplify the procedure for finding a seed outside the outer border of the c.c.; namely, by guaranteeing that those pixels are accessible as seeds.
Fill the seed image, using the border pixels as a clipping mask to stop each fill at the borders. This is done with pixSeedfillBinary(), inverting the clipping mask to make it a filling mask, and using 4-connected filling for the bg. We have now filled both the holes and the region outside the outer border of the c.c.
Invert the seed image, to get the properly filled c.c, still centered in the oversize (by 2 pixels) PIX.
Rasterop using XOR the filled c.c. (but not including the outer 1 pixel boundary) into the full dest image.
This is relatively fast, requiring only seedfill and rasterop as the basic bit-parallel functions.
You can run all these operations with prog/ccbordtest.c. Let R be the ratio of 8-c.c. to 4-c.c. For typical text, R is very close to 1. However, if halftones are present, R can be much smaller than 1. The following three page images were used to test different aspects of the border finding and rendering routines:
feyn.png. This is a typical scan of a text page, with R = 4305/4452. Many of the c.c. are multiple characters within a word that have been joined.
witten.png. This is a high quality scanned page, with R = 4972/5208. It has very few touching characters. However, it has a large stippled “O” that has many small holes, and our algorithm for joining interior hole borders with the outer border misses three of them.
rabi.png. This has a large, topologically complicated halftone pattern on the page. One of the c.c. in the halftone contains 351 holes! Most of the c.c. in the page are in the halftone: the ratio R = 21748/98981 is much less than 1. To reduce output when the outer and hole borders are joined into a single path, we set a maximum number of holes for any c.c, and if this number is exceeded, we assume the component is not text and no hole borders are connected. It will consequently be rasterized (filled) without holes. (By contrast, the topological renderer, which uses separate hole and outer borders, rasterizes the image correctly, regardless of the number of holes.)