Shift, scale, convolve, histograms, rank statistics, splitting distributions, extrema finding, threshold crossing, frequency analysis.

#include <math.h>
#include "allheaders.h"

Defines
#define	DEBUG_HISTO 0
#define	DEBUG_CROSSINGS 0
#define	DEBUG_FREQUENCY 0
Functions
NUMA *	numaErode (NUMA *nas, l_int32 size)
NUMA *	numaDilate (NUMA *nas, l_int32 size)
NUMA *	numaOpen (NUMA *nas, l_int32 size)
NUMA *	numaClose (NUMA *nas, l_int32 size)
NUMA *	numaTransform (NUMA *nas, l_float32 shift, l_float32 scale)
l_int32	numaWindowedStats (NUMA *nas, l_int32 wc, NUMA **pnam, NUMA **pnams, NUMA **pnav, NUMA **pnarv)
NUMA *	numaWindowedMean (NUMA *nas, l_int32 wc)
NUMA *	numaWindowedMeanSquare (NUMA *nas, l_int32 wc)
l_int32	numaWindowedVariance (NUMA *nam, NUMA *nams, NUMA **pnav, NUMA **pnarv)
NUMA *	numaConvertToInt (NUMA *nas)
NUMA *	numaMakeHistogram (NUMA *na, l_int32 maxbins, l_int32 *pbinsize, l_int32 *pbinstart)
NUMA *	numaMakeHistogramAuto (NUMA *na, l_int32 maxbins)
NUMA *	numaMakeHistogramClipped (NUMA *na, l_float32 binsize, l_float32 maxsize)
NUMA *	numaRebinHistogram (NUMA *nas, l_int32 newsize)
NUMA *	numaNormalizeHistogram (NUMA *nas, l_float32 area)
l_int32	numaGetStatsUsingHistogram (NUMA *na, l_int32 maxbins, l_float32 *pmin, l_float32 *pmax, l_float32 *pmean, l_float32 *pvariance, l_float32 *pmedian, l_float32 rank, l_float32 *prval, NUMA **phisto)
l_int32	numaGetHistogramStats (NUMA *nahisto, l_float32 startx, l_float32 deltax, l_float32 *pxmean, l_float32 *pxmedian, l_float32 *pxmode, l_float32 *pxvariance)
l_int32	numaGetHistogramStatsOnInterval (NUMA *nahisto, l_float32 startx, l_float32 deltax, l_int32 ifirst, l_int32 ilast, l_float32 *pxmean, l_float32 *pxmedian, l_float32 *pxmode, l_float32 *pxvariance)
l_int32	numaMakeRankFromHistogram (l_float32 startx, l_float32 deltax, NUMA *nasy, l_int32 npts, NUMA **pnax, NUMA **pnay)
l_int32	numaHistogramGetRankFromVal (NUMA *na, l_float32 rval, l_float32 *prank)
l_int32	numaHistogramGetValFromRank (NUMA *na, l_float32 rank, l_float32 *prval)
l_int32	numaDiscretizeRankAndIntensity (NUMA *na, l_int32 nbins, NUMA **pnarbin, NUMA **pnam, NUMA **pnar, NUMA **pnabb)
l_int32	numaGetRankBinValues (NUMA *na, l_int32 nbins, NUMA **pnarbin, NUMA **pnam)
l_int32	numaSplitDistribution (NUMA *na, l_float32 scorefract, l_int32 *psplitindex, l_float32 *pave1, l_float32 *pave2, l_float32 *pnum1, l_float32 *pnum2, NUMA **pnascore)
NUMA *	numaFindPeaks (NUMA *nas, l_int32 nmax, l_float32 fract1, l_float32 fract2)
NUMA *	numaFindExtrema (NUMA *nas, l_float32 delta)
l_int32	numaCountReversals (NUMA *nas, l_float32 minreversal, l_int32 *pnr, l_float32 *pnrpl)
l_int32	numaSelectCrossingThreshold (NUMA *nax, NUMA *nay, l_float32 estthresh, l_float32 *pbestthresh)
NUMA *	numaCrossingsByThreshold (NUMA *nax, NUMA *nay, l_float32 thresh)
NUMA *	numaCrossingsByPeaks (NUMA *nax, NUMA *nay, l_float32 delta)
l_int32	numaEvalBestHaarParameters (NUMA *nas, l_float32 relweight, l_int32 nwidth, l_int32 nshift, l_float32 minwidth, l_float32 maxwidth, l_float32 *pbestwidth, l_float32 *pbestshift, l_float32 *pbestscore)
l_int32	numaEvalHaarSum (NUMA *nas, l_float32 width, l_float32 shift, l_float32 relweight, l_float32 *pscore)
Variables
static const l_int32	BinSizeArray []
static const l_int32	NBinSizes = 24

Detailed Description

Shift, scale, convolve, histograms, rank statistics, splitting distributions, extrema finding, threshold crossing, frequency analysis.

    Morphological (min/max) operations
        NUMA        *numaErode()
        NUMA        *numaDilate()
        NUMA        *numaOpen()
        NUMA        *numaClose()

    Other transforms
        NUMA        *numaTransform()
        l_int32      numaWindowedStats()
        NUMA        *numaWindowedMean()
        NUMA        *numaWindowedMeanSquare()
        l_int32      numaWindowedVariance()
        NUMA        *numaConvertToInt()

    Histogram generation and statistics
        NUMA        *numaMakeHistogram()
        NUMA        *numaMakeHistogramAuto()
        NUMA        *numaMakeHistogramClipped()
        NUMA        *numaRebinHistogram()
        NUMA        *numaNormalizeHistogram()
        l_int32      numaGetStatsUsingHistogram()
        l_int32      numaGetHistogramStats()
        l_int32      numaGetHistogramStatsOnInterval()
        l_int32      numaMakeRankFromHistogram()
        l_int32      numaHistogramGetRankFromVal()
        l_int32      numaHistogramGetValFromRank()
        l_int32      numaDiscretizeRankAndIntensity()
        l_int32      numaGetRankBinValues()

    Splitting a distribution
        l_int32      numaSplitDistribution()

    Extrema finding
        NUMA        *numaFindPeaks()
        NUMA        *numaFindExtrema()
        l_int32      numaCountReversals()

    Threshold crossings and frequency analysis
        l_int32      numaSelectCrossingThreshold()
        NUMA        *numaCrossingsByThreshold()
        NUMA        *numaCrossingsByPeaks()
        l_int32      numaEvalBestHaarParameters()
        l_int32      numaEvalHaarSum()

  Things to remember when using the Numa:

  (1) The numa is a struct, not an array.  Always use accessors
      (see numabasic.c), never the fields directly.

  (2) The number array holds l_float32 values.  It can also
      be used to store l_int32 values.  See numabasic.c for
      details on using the accessors.

  (3) Occasionally, in the comments we denote the i-th element of a
      numa by na[i].  This is conceptual only -- the numa is not an array!

  Some general comments on histograms:

  (1) Histograms are the generic statistical representation of
      the data about some attribute.  Typically they're not
      normalized -- they simply give the number of occurrences
      within each range of values of the attribute.  This range
      of values is referred to as a 'bucket'.  For example,
      the histogram could specify how many connected components
      are found for each value of their width; in that case,
      the bucket size is 1.

  (2) In leptonica, all buckets have the same size.  Histograms
      are therefore specified by a numa of occurrences, along
      with two other numbers: the 'value' associated with the
      occupants of the first bucket and the size (i.e., 'width')
      of each bucket.  These two numbers then allow us to calculate
      the value associated with the occupants of each bucket.
      These numbers are fields in the numa, initialized to
      a startx value of 0.0 and a binsize of 1.0.  Accessors for
      these fields are functions numa*XParameters().  All histograms
      must have these two numbers properly set.

Definition in file numafunc2.c.

Define Documentation

#define DEBUG_HISTO 0

Definition at line 110 of file numafunc2.c.

#define DEBUG_CROSSINGS 0

Definition at line 111 of file numafunc2.c.

#define DEBUG_FREQUENCY 0

Definition at line 112 of file numafunc2.c.

Function Documentation

NUMA* numaErode	(	NUMA *	nas,
		l_int32	size
	)

numaErode()

Input: nas
       size (of sel; greater than 0, odd; origin implicitly in center)
Return: nad (eroded), or null on error

Notes:
  (1) The structuring element (sel) is linear, all "hits".
  (2) If size == 1, this returns a copy.
  (3) General comment. The morphological operations are equivalent to
      those that would be performed on a 1-dimensional fpix. However,
      because we have not implemented morphological operations on fpix,
      we do this here. Because it is only 1 dimensional, there is no
      reason to use the more complicated van Herk/Gil-Werman algorithm,
      and we do it by brute force.
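
As a rough illustration of the brute-force approach mentioned in note (3), the following standalone sketch erodes a plain float array with a linear, all-hits sel. It does not use the NUMA API. The name erode_1d and the choice to clip the window at the array ends are assumptions made for this example, not taken from numafunc2.c.

```c
#include <assert.h>

/* Hypothetical helper: brute-force 1-D erosion (sliding minimum).
 * size must be > 0 and odd; the sel origin is at its center.
 * Window positions that fall outside the array are ignored. */
void erode_1d(const float *src, float *dst, int n, int size)
{
    assert(size > 0 && size % 2 == 1);
    int hsize = size / 2;
    for (int i = 0; i < n; i++) {
        float minval = src[i];
        for (int j = i - hsize; j <= i + hsize; j++) {
            if (j < 0 || j >= n) continue;   /* clip at the ends */
            if (src[j] < minval) minval = src[j];
        }
        dst[i] = minval;
    }
}
```

Dilation would be the same loop with a running maximum in place of the minimum. With size == 1 the window holds only src[i], so the output is a copy of the input.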

Definition at line 138 of file numafunc2.c.

References CALLOC, ERROR_PTR, FREE, L_MIN, L_NOCOPY, L_WARNING, NULL, numaCopy(), numaCopyXParameters(), numaGetCount(), numaGetFArray(), numaMakeConstant(), PROCNAME, and size.

Referenced by main(), numaClose(), and numaOpen().

NUMA* numaDilate	(	NUMA *	nas,
		l_int32	size
	)

numaDilate()

Input: nas
       size (of sel; greater than 0, odd; origin implicitly in center)
Return: nad (dilated), or null on error

Notes:
  (1) The structuring element (sel) is linear, all "hits".
  (2) If size == 1, this returns a copy.

Definition at line 204 of file numafunc2.c.

References CALLOC, ERROR_PTR, FREE, L_MAX, L_NOCOPY, L_WARNING, NULL, numaCopy(), numaCopyXParameters(), numaGetCount(), numaGetFArray(), numaMakeConstant(), PROCNAME, and size.

Referenced by main(), numaClose(), and numaOpen().

NUMA* numaOpen	(	NUMA *	nas,
		l_int32	size
	)

numaOpen()

Input: nas
       size (of sel; greater than 0, odd; origin implicitly in center)
Return: nad (opened), or null on error

Notes:
  (1) The structuring element (sel) is linear, all "hits".
  (2) If size == 1, this returns a copy.

Definition at line 270 of file numafunc2.c.

References ERROR_PTR, L_WARNING, NULL, numaCopy(), numaDestroy(), numaDilate(), numaErode(), and PROCNAME.

Referenced by main().

NUMA* numaClose	(	NUMA *	nas,
		l_int32	size
	)

numaClose()

Input: nas
       size (of sel; greater than 0, odd; origin implicitly in center)
Return: nad (closed), or null on error

Notes:
  (1) The structuring element (sel) is linear, all "hits".
  (2) If size == 1, this returns a copy.
  (3) We add a border before doing this operation, for the same reason
      that we add a border to a pix before doing a safe closing.
      Without the border, a small component near the border gets
      clipped at the border on dilation, and can be entirely removed
      by the following erosion, violating the basic extensivity
      property of closing.

Definition at line 314 of file numafunc2.c.

References ERROR_PTR, L_WARNING, NULL, numaAddBorder(), numaCopy(), numaDestroy(), numaDilate(), numaErode(), numaRemoveBorder(), and PROCNAME.

Referenced by main().

NUMA* numaTransform	(	NUMA *	nas,
		l_float32	shift,
		l_float32	scale
	)

numaTransform()

Input: nas
       shift (add this to each number)
       scale (multiply each number by this)
Return: nad (with all values shifted and scaled), or null on error

Notes:
  (1) Each number is shifted before scaling.
  (2) This order is the opposite of that used for Box and Pta, where
      the scaling is applied first and then the shift.
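
The order of operations matters, so a small standalone sketch may help. It works on a plain float array rather than a NUMA, and the name transform_1d is an assumption made for this example.

```c
#include <assert.h>

/* Hypothetical helper following the documented order: shift, then scale. */
void transform_1d(const float *src, float *dst, int n, float shift, float scale)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        dst[i] = scale * (src[i] + shift);
}
```

For the input {1, 2, 3} with shift = 1 and scale = 2, this produces {4, 6, 8}, whereas scaling first and then shifting would give {3, 5, 7}.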

Definition at line 361 of file numafunc2.c.

References ERROR_PTR, NULL, numaAddNumber(), numaCreate(), numaGetCount(), numaGetFValue(), and PROCNAME.

l_int32 numaWindowedStats	(	NUMA *	nas,
		l_int32	wc,
		NUMA **	pnam,
		NUMA **	pnams,
		NUMA **	pnav,
		NUMA **	pnarv
	)

numaWindowedStats()

Input: nas (input numa)
       wc (half width of the window)
       &nam (<optional return> mean value in window)
       &nams (<optional return> mean square value in window)
       &pnav (<optional return> variance in window)
       &pnarv (<optional return> rms deviation from the mean)
Return: 0 if OK, 1 on error

Notes:
  (1) This is a high-level convenience function for calculating any or
      all of these derived arrays.
  (2) These statistical measures over the values in the rectangular
      window are:
        - average value: <x>  (nam)
        - average squared value: <x*x>  (nams)
        - variance: <(x - <x>)*(x - <x>)> = <x*x> - <x>*<x>  (nav)
        - square-root of variance: (narv)
      where the brackets < .. > indicate that the average value is to
      be taken over the window.
  (3) Note that the variance is just the mean square difference from
      the mean value; and the square root of the variance is the root
      mean square difference from the mean, sometimes also called the
      'standard deviation'.
  (4) Internally, use mirrored borders to handle values near the end
      of each array.
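
The relation between the three windowed quantities can be sketched on a plain float array as follows. This is not the library code: the helper names are hypothetical, and the particular mirroring convention (the edge sample is repeated, as in a1 a0 | a0 a1 ...) is an assumption for this example.

```c
#include <assert.h>

/* Hypothetical mirror indexing: ... a[1] a[0] | a[0] ... a[n-1] | a[n-1] a[n-2] ...
 * Valid as long as the border is no wider than the array (wc <= n). */
int mirror_index(int j, int n)
{
    if (j < 0) return -j - 1;
    if (j >= n) return 2 * n - 1 - j;
    return j;
}

/* Windowed mean, mean square and variance over a window of
 * width 2 * wc + 1 centered on each sample. */
void windowed_stats_1d(const float *a, int n, int wc,
                       float *mean, float *meansq, float *var)
{
    assert(wc >= 0 && wc <= n);
    float width = (float)(2 * wc + 1);
    for (int i = 0; i < n; i++) {
        float sum = 0.0f, sumsq = 0.0f;
        for (int j = i - wc; j <= i + wc; j++) {
            float v = a[mirror_index(j, n)];
            sum += v;
            sumsq += v * v;
        }
        mean[i] = sum / width;                     /* <x> */
        meansq[i] = sumsq / width;                 /* <x*x> */
        var[i] = meansq[i] - mean[i] * mean[i];    /* <x*x> - <x>*<x> */
    }
}
```

The rms deviation is the square root of each variance value. For the array {3, 0, 3, 6, 0} with wc = 1, the center sample sees the window {0, 3, 6}, giving a mean of 3, a mean square of 15 and a variance of 6.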

Definition at line 415 of file numafunc2.c.

References ERROR_INT, L_WARNING, numaDestroy(), numaGetCount(), numaWindowedMean(), numaWindowedMeanSquare(), numaWindowedVariance(), and PROCNAME.

Referenced by main().

NUMA* numaWindowedMean	(	NUMA *	nas,
		l_int32	wc
	)

numaWindowedMean()

Input: nas
       wc (half width of the convolution window)
Return: nad (after low-pass filtering), or null on error

Notes:
  (1) This is a convolution. The window has width = 2 * wc + 1.
  (2) We add a mirrored border of size wc to each end of the array.

Definition at line 464 of file numafunc2.c.

References CALLOC, ERROR_PTR, FREE, L_MIRRORED_BORDER, L_NOCOPY, L_WARNING, NULL, numaAddSpecifiedBorder(), numaDestroy(), numaGetCount(), numaGetFArray(), numaMakeConstant(), and PROCNAME.

Referenced by main(), and numaWindowedStats().

NUMA* numaWindowedMeanSquare	(	NUMA *	nas,
		l_int32	wc
	)

numaWindowedMeanSquare()

Input: nas
       wc (half width of the window)
Return: nad (containing windowed mean square values), or null on error

Notes:
  (1) The window has width = 2 * wc + 1.
  (2) We add a mirrored border of size wc to each end of the array.

Definition at line 519 of file numafunc2.c.

References CALLOC, ERROR_PTR, FREE, L_MIRRORED_BORDER, L_NOCOPY, L_WARNING, NULL, numaAddSpecifiedBorder(), numaDestroy(), numaGetCount(), numaGetFArray(), numaMakeConstant(), and PROCNAME.

Referenced by numaWindowedStats().

l_int32 numaWindowedVariance	(	NUMA *	nam,
		NUMA *	nams,
		NUMA **	pnav,
		NUMA **	pnarv
	)

numaWindowedVariance()

Input: nam (windowed mean values)
       nams (windowed mean square values)
       &pnav (<optional return> numa of variance -- the ms deviation from the mean)
       &pnarv (<optional return> numa of rms deviation from the mean)
Return: 0 if OK, 1 on error

Notes:
  (1) The numas of windowed mean and mean square are precomputed,
      using numaWindowedMean() and numaWindowedMeanSquare().
  (2) Either or both of the variance and square-root of variance are
      returned, where the variance is the average over the window of
      the mean square difference of the pixel value from the mean:
        <(x - <x>)*(x - <x>)> = <x*x> - <x>*<x>

Definition at line 582 of file numafunc2.c.

References ERROR_INT, L_NOCOPY, numaGetCount(), numaGetFArray(), numaMakeConstant(), and PROCNAME.

Referenced by numaWindowedStats().

NUMA* numaConvertToInt ( NUMA * nas )

numaConvertToInt()

Input: nas (source numa)
Return: nad (with all values rounded to the nearest integer), or null on error

Definition at line 638 of file numafunc2.c.

References ERROR_PTR, NULL, numaAddNumber(), numaCreate(), numaGetCount(), numaGetIValue(), and PROCNAME.

Referenced by numaMakeHistogram().

NUMA* numaMakeHistogram	(	NUMA *	na,
		l_int32	maxbins,
		l_int32 *	pbinsize,
		l_int32 *	pbinstart
	)

numaMakeHistogram()

Input: na
       maxbins (max number of histogram bins)
       &binsize (<return> size of histogram bins)
       &binstart (<optional return> start val of minimum bin; input NULL to force start at 0)
Return: na consisting of histogram of integerized values, or null on error

Notes:
  (1) This simple interface is designed for integer data. The bins are
      of integer width and start on integer boundaries, so the results
      on float data will not have high precision.
  (2) Specify the max number of input bins. Then binsize, the size of
      bins necessary to accommodate the input data, is returned. It is
      one of the sequence: {1, 2, 5, 10, 20, 50, ...}.
  (3) If &binstart is given, all values are accommodated, and the min
      value of the starting bin is returned. Otherwise, all negative
      values are discarded and the histogram bins start at 0.
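
One plausible way to pick a bin size from the sequence in note (2) is sketched below. The selection rule (smallest listed size that keeps the bin count at or below maxbins), the table contents beyond the values quoted above, and the names kBinSizes and choose_binsize are all assumptions for this example. The file's own BinSizeArray is not reproduced here.

```c
#include <assert.h>

/* Illustrative prefix of the {1, 2, 5, 10, 20, 50, ...} sequence. */
static const int kBinSizes[] = {1, 2, 5, 10, 20, 50, 100, 200, 500, 1000};
static const int kNumBinSizes = sizeof(kBinSizes) / sizeof(kBinSizes[0]);

/* Hypothetical rule: return the smallest listed bin size for which the
 * integer range [minval, maxval] fits in at most maxbins bins,
 * or 0 if none of the listed sizes is large enough. */
int choose_binsize(int minval, int maxval, int maxbins)
{
    assert(maxbins >= 1 && maxval >= minval);
    int range = maxval - minval + 1;
    for (int i = 0; i < kNumBinSizes; i++) {
        int nbins = (range + kBinSizes[i] - 1) / kBinSizes[i];  /* ceiling */
        if (nbins <= maxbins) return kBinSizes[i];
    }
    return 0;
}
```

Under this rule, integer values in [0, 99] with maxbins = 30 need a bin size of 5 (20 bins), because sizes 1 and 2 would require 100 and 50 bins.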

Definition at line 687 of file numafunc2.c.

References BinSizeArray, ERROR_PTR, NBinSizes, NULL, numaConvertToInt(), numaCreate(), numaDestroy(), numaGetCount(), numaGetIValue(), numaGetMax(), numaGetMin(), numaSetCount(), numaSetValue(), numaSetXParameters(), and PROCNAME.

Referenced by main(), and numaGetRankBinValues().

NUMA* numaMakeHistogramAuto	(	NUMA *	na,
		l_int32	maxbins
	)

numaMakeHistogramAuto()

Input: na (numa of floats; these may be integers)
       maxbins (max number of histogram bins; >= 1)
Return: na consisting of histogram of quantized float values, or null on error

Notes:
  (1) This simple interface is designed for accurate binning of both
      integer and float data.
  (2) If the array data is integers, and the range of integers is
      smaller than maxbins, they are binned as they fall, with
      binsize = 1.
  (3) If the range of data, (maxval - minval), is larger than maxbins,
      or if the data is floats, they are binned into exactly maxbins
      bins.
  (4) Unlike numaMakeHistogram(), these bins in general have
      non-integer location and width, even for integer data.

Definition at line 795 of file numafunc2.c.

References ERROR_PTR, L_MAX, L_MIN, NULL, numaAddNumber(), numaCreate(), numaGetCount(), numaGetFValue(), numaGetIValue(), numaGetMax(), numaGetMin(), numaHasOnlyIntegers(), numaSetCount(), numaSetValue(), numaSetXParameters(), and PROCNAME.

Referenced by main(), and numaGetStatsUsingHistogram().

NUMA* numaMakeHistogramClipped	(	NUMA *	na,
		l_float32	binsize,
		l_float32	maxsize
	)

numaMakeHistogramClipped()

Input: na
       binsize (typically 1.0)
       maxsize (of histogram ordinate)
Return: na (histogram of bins of size binsize, starting with the na[0] (x = 0.0) and going up to a maximum of x = maxsize, by increments of binsize), or null on error

Notes:
  (1) This simple function generates a histogram of values from na,
      discarding all values < 0.0 or greater than min(maxsize, maxval),
      where maxval is the maximum value in na. The histogram data is
      put in bins of size delx = binsize, starting at x = 0.0. We use
      as many bins as are needed to hold the data.

Definition at line 877 of file numafunc2.c.

References ERROR_PTR, L_MIN, maxsize, NULL, numaCreate(), numaGetCount(), numaGetFValue(), numaGetIValue(), numaGetMax(), numaSetCount(), numaSetValue(), numaSetXParameters(), and PROCNAME.

Referenced by main(), and numaQuantizeCrossingsByWidth().

NUMA* numaRebinHistogram	(	NUMA *	nas,
		l_int32	newsize
	)

numaRebinHistogram()

Input: nas (input histogram) newsize (number of old bins contained in each new bin) Return: nad (more coarsely re-binned histogram), or null on error

Definition at line 926 of file numafunc2.c.

References ERROR_PTR, NULL, numaAddNumber(), numaCreate(), numaGetCount(), numaGetIValue(), numaGetXParameters(), numaSetXParameters(), and PROCNAME.

NUMA* numaNormalizeHistogram	(	NUMA *	nas,
		l_float32	area
	)

numaNormalizeHistogram()

Input: nas (input histogram) area (target sum of all numbers in dest histogram; e.g., use area = 1.0 if this represents a probability distribution) Return: nad (normalized histogram), or null on error

Definition at line 975 of file numafunc2.c.

References ERROR_PTR, NULL, numaAddNumber(), numaCopyXParameters(), numaCreate(), numaGetCount(), numaGetFValue(), numaGetSum(), and PROCNAME.

Referenced by main(), numaGetRankBinValues(), numaMakeRankFromHistogram(), pixCompareRankDifference(), pixGetDifferenceStats(), and pixGetRankColorArray().

l_int32 numaGetStatsUsingHistogram	(	NUMA *	na,
		l_int32	maxbins,
		l_float32 *	pmin,
		l_float32 *	pmax,
		l_float32 *	pmean,
		l_float32 *	pvariance,
		l_float32 *	pmedian,
		l_float32	rank,
		l_float32 *	prval,
		NUMA **	phisto
	)

numaGetStatsUsingHistogram()

Input: na (an arbitrary set of numbers; not ordered and not a histogram)
       maxbins (the maximum number of bins to be allowed in the histogram; use 0 for consecutive integer bins)
       &min (<optional return> min value of set)
       &max (<optional return> max value of set)
       &mean (<optional return> mean value of set)
       &variance (<optional return> variance)
       &median (<optional return> median value of set)
       rank (in [0.0 ... 1.0]; median has a rank 0.5; ignored if &rval == NULL)
       &rval (<optional return> value in na corresponding to rank)
       &histo (<optional return> Numa histogram; use NULL to prevent)
Return: 0 if OK, 1 on error

Notes:
  (1) This is a simple interface for gathering statistics from a numa,
      where a histogram is used 'under the covers' to avoid sorting if
      a rank value is requested. In that case, by using a histogram we
      are trading accuracy for speed, because the values in na are
      quantized to the center of a set of bins.
  (2) If the median, other rank value, or histogram are not requested,
      the calculation is all performed on the input Numa.
  (3) The variance is the average of the square of the difference from
      the mean. The median is the value in na with rank 0.5.
  (4) There are two situations where this gives rank results with
      accuracy comparable to computing statistics directly on the input
      data, without binning into a histogram:
        (a) the data is integers and the range of data is less than
            maxbins, and
        (b) the data is floats and the range is small compared to
            maxbins, so that the binsize is much less than 1.
  (5) If a histogram is used and the numbers in the Numa extend over a
      large range, you can limit the required storage by specifying
      the maximum number of bins in the histogram. Use maxbins == 0 to
      force the bin size to be 1.
  (6) This optionally returns the median and one arbitrary rank value.
      If you need several rank values, return the histogram and use
      numaHistogramGetValFromRank(nah, rank, &rval) multiple times.

Definition at line 1054 of file numafunc2.c.

References ERROR_INT, NULL, numaDestroy(), numaGetCount(), numaGetFValue(), numaGetMax(), numaGetMin(), numaHistogramGetValFromRank(), numaMakeHistogramAuto(), and PROCNAME.

Referenced by main().

l_int32 numaGetHistogramStats	(	NUMA *	nahisto,
		l_float32	startx,
		l_float32	deltax,
		l_float32 *	pxmean,
		l_float32 *	pxmedian,
		l_float32 *	pxmode,
		l_float32 *	pxvariance
	)

numaGetHistogramStats()

Input: nahisto (histogram: y(x(i)), i = 0 ... nbins - 1)
       startx (x value of first bin: x(0))
       deltax (x increment between bins; the bin size; x(1) - x(0))
       &xmean (<optional return> mean value of histogram)
       &xmedian (<optional return> median value of histogram)
       &xmode (<optional return> mode value of histogram: xmode = x(imode), where y(xmode) >= y(x(i)) for all i != imode)
       &xvariance (<optional return> variance of x)
Return: 0 if OK, 1 on error

Notes: (1) If the histogram represents the relation y(x), the computed values that are returned are the x values. These are NOT the bucket indices i; they are related to the bucket indices by x(i) = startx + i * deltax
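
To make the distinction between bucket indices and x values concrete, here is a minimal standalone sketch that computes the mean, mode and variance of x from a plain array of bin counts. It is not the library implementation. The name histo_stats is hypothetical, the median is omitted, and ties for the mode are resolved to the first maximal bin as an assumption.

```c
#include <assert.h>

/* Hypothetical helper: mean, mode and variance of x for a histogram
 * y(x(i)), where x(i) = startx + i * deltax.
 * Returns 0 if OK, 1 if the histogram is empty. */
int histo_stats(const float *y, int n, float startx, float deltax,
                float *xmean, float *xmode, float *xvariance)
{
    assert(y != 0 && n > 0);
    float sum = 0.0f, moment = 0.0f, moment2 = 0.0f, ymax = y[0];
    int imode = 0;
    for (int i = 0; i < n; i++) {
        float x = startx + i * deltax;   /* bucket index -> x value */
        sum += y[i];
        moment += x * y[i];
        moment2 += x * x * y[i];
        if (y[i] > ymax) { ymax = y[i]; imode = i; }
    }
    if (sum <= 0.0f) return 1;
    *xmean = moment / sum;
    *xmode = startx + imode * deltax;
    *xvariance = moment2 / sum - (*xmean) * (*xmean);
    return 0;
}
```

For counts {0, 2, 0, 2} with startx = 10 and deltax = 2, the occupied x values are 12 and 16, so the mean is 14, the variance is 4 and the mode is 12. None of these equals a bucket index.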

Definition at line 1141 of file numafunc2.c.

References ERROR_INT, numaGetHistogramStatsOnInterval(), and PROCNAME.

Referenced by numaSplitDistribution().

l_int32 numaGetHistogramStatsOnInterval	(	NUMA *	nahisto,
		l_float32	startx,
		l_float32	deltax,
		l_int32	ifirst,
		l_int32	ilast,
		l_float32 *	pxmean,
		l_float32 *	pxmedian,
		l_float32 *	pxmode,
		l_float32 *	pxvariance
	)

numaGetHistogramStatsOnInterval()

Input: nahisto (histogram: y(x(i)), i = 0 ... nbins - 1)
       startx (x value of first bin: x(0))
       deltax (x increment between bins; the bin size; x(1) - x(0))
       ifirst (first bin to use for collecting stats)
       ilast (last bin for collecting stats; use 0 to go to the end)
       &xmean (<optional return> mean value of histogram)
       &xmedian (<optional return> median value of histogram)
       &xmode (<optional return> mode value of histogram: xmode = x(imode), where y(xmode) >= y(x(i)) for all i != imode)
       &xvariance (<optional return> variance of x)
Return: 0 if OK, 1 on error

Notes: (1) If the histogram represents the relation y(x), the computed values that are returned are the x values. These are NOT the bucket indices i; they are related to the bucket indices by x(i) = startx + i * deltax

Definition at line 1188 of file numafunc2.c.

References ERROR_INT, numaGetCount(), numaGetFValue(), and PROCNAME.

Referenced by numaGetHistogramStats().

l_int32 numaMakeRankFromHistogram	(	l_float32	startx,
		l_float32	deltax,
		NUMA *	nasy,
		l_int32	npts,
		NUMA **	pnax,
		NUMA **	pnay
	)

numaMakeRankFromHistogram()

Input: startx (xval corresponding to first element in nay)
       deltax (x increment between array elements in nay)
       nasy (input histogram, assumed equally spaced)
       npts (number of points to evaluate rank function)
       &nax (<optional return> array of x values in range)
       &nay (<return> rank array of specified npts)
Return: 0 if OK, 1 on error

Definition at line 1272 of file numafunc2.c.

References ERROR_INT, L_LINEAR_INTERP, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetCount(), numaGetFValue(), numaInterpolateEqxInterval(), numaNormalizeHistogram(), and PROCNAME.

Referenced by main().

l_int32 numaHistogramGetRankFromVal	(	NUMA *	na,
		l_float32	rval,
		l_float32 *	prank
	)

numaHistogramGetRankFromVal()

Input: na (histogram) rval (value of input sample for which we want the rank) &rank (<return> fraction of total samples below rval) Return: 0 if OK, 1 on error

Notes: (1) If we think of the histogram as a function y(x), normalized to 1, for a given input value of x, this computes the rank of x, which is the integral of y(x) from the start value of x to the input value. (2) This function only makes sense when applied to a Numa that is a histogram. The values in the histogram can be ints and floats, and are computed as floats. The rank is returned as a float between 0.0 and 1.0. (3) The numa parameters startx and binsize are used to compute x from the Numa index i.
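
The integral described in note (1) can be sketched on a plain array of bin counts as follows. This is an illustration only. The name rank_from_val, the convention that bin i covers [startx + i * binsize, startx + (i + 1) * binsize), and the linear interpolation inside the partial bin are assumptions for this example.

```c
#include <assert.h>

/* Hypothetical helper: fraction of samples below rval. */
float rank_from_val(const float *histo, int n, float startx, float binsize,
                    float rval)
{
    assert(n > 0 && binsize > 0.0f);
    float binval = (rval - startx) / binsize;   /* position in bin units */
    if (binval < 0.0f) return 0.0f;
    if (binval >= (float)n) return 1.0f;
    int ibin = (int)binval;
    float total = 0.0f, below = 0.0f;
    for (int i = 0; i < n; i++) {
        total += histo[i];
        if (i < ibin) below += histo[i];        /* full bins below rval */
    }
    if (total <= 0.0f) return 0.0f;             /* empty histogram */
    below += (binval - ibin) * histo[ibin];     /* partial bin */
    return below / total;
}
```

With four equally populated unit bins starting at 0, the value 2.5 has two full bins and half of a third below it, so its rank is 2.5 / 4 = 0.625.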

Definition at line 1340 of file numafunc2.c.

References ERROR_INT, numaGetCount(), numaGetFValue(), numaGetSum(), numaGetXParameters(), PROCNAME, and total.

Referenced by main().

l_int32 numaHistogramGetValFromRank	(	NUMA *	na,
		l_float32	rank,
		l_float32 *	prval
	)

numaHistogramGetValFromRank()

Input: na (histogram) rank (fraction of total samples) &rval (<return> approx. to the bin value) Return: 0 if OK, 1 on error

Notes: (1) If we think of the histogram as a function y(x), this returns the value x such that the integral of y(x) from the start value to x gives the fraction 'rank' of the integral of y(x) over all bins. (2) This function only makes sense when applied to a Numa that is a histogram. The values in the histogram can be ints and floats, and are computed as floats. The val is returned as a float, even though the buckets are of integer width. (3) The numa parameters startx and binsize are used to compute x from the Numa index i.
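
The inverse lookup in note (1) can be sketched in the same spirit: accumulate bin counts until the requested fraction of the total is reached, then interpolate inside that bin. The name val_from_rank, the clamping of rank to [0, 1] and the linear interpolation are assumptions for this example, not a description of the library code.

```c
#include <assert.h>

/* Hypothetical helper: x value below which a fraction 'rank' of the
 * samples lie, for bins of width binsize starting at startx. */
float val_from_rank(const float *histo, int n, float startx, float binsize,
                    float rank)
{
    assert(n > 0 && binsize > 0.0f);
    if (rank < 0.0f) rank = 0.0f;
    if (rank > 1.0f) rank = 1.0f;
    float total = 0.0f;
    for (int i = 0; i < n; i++) total += histo[i];
    float target = rank * total;     /* number of samples below the answer */
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++) {
        if (sum + histo[i] >= target) break;   /* target lies in bin i */
        sum += histo[i];
    }
    if (i == n) return startx + n * binsize;   /* guard against rounding */
    float fract = (histo[i] > 0.0f) ? (target - sum) / histo[i] : 0.0f;
    return startx + binsize * ((float)i + fract);
}
```

The result is a float even when the bins have integer width: for four equally populated unit bins starting at 0, a rank of 0.625 maps to x = 2.5.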

Definition at line 1409 of file numafunc2.c.

References ERROR_INT, L_WARNING, numaGetCount(), numaGetFValue(), numaGetSum(), numaGetXParameters(), PROCNAME, and total.

Referenced by main(), numaGetStatsUsingHistogram(), and pixGetRankValueMasked().

l_int32 numaDiscretizeRankAndIntensity	(	NUMA *	na,
		l_int32	nbins,
		NUMA **	pnarbin,
		NUMA **	pnam,
		NUMA **	pnar,
		NUMA **	pnabb
	)

numaDiscretizeRankAndIntensity()

Input: na (normalized histogram of probability density vs intensity)
       nbins (number of bins at which the rank is divided)
       &pnarbin (<optional return> rank bin value vs intensity)
       &pnam (<optional return> median intensity in a bin vs rank bin value, with nbins of discretized rank values)
       &pnar (<optional return> rank vs intensity; this is a cumulative norm histogram)
       &pnabb (<optional return> intensity at the right bin boundary vs rank bin)
Return: 0 if OK, 1 on error

Notes:
  (1) We are inverting the rank(intensity) function to get the
      intensity(rank) function at equally spaced values of rank
      between 0.0 and 1.0. We save integer values for the intensity.
  (2) We are using the word "intensity" to describe the type of array
      values, but any array of non-negative numbers will work.
  (3) The output arrays give the following mappings, where the input
      is a normalized histogram of array values:
        array values     -->  rank bin number  (narbin)
        rank bin number  -->  median array value in bin  (nam)
        array values     -->  cumulative norm = rank  (nar)
        rank bin number  -->  array value at right bin edge  (nabb)

Definition at line 1487 of file numafunc2.c.

References ERROR_INT, FALSE, L_MAX, L_MIN, L_WARNING_INT2, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetCount(), numaGetFValue(), numaGetIValue(), numaSetValue(), PROCNAME, and TRUE.

Referenced by main(), numaGetRankBinValues(), and pixGetRankColorArray().

l_int32 numaGetRankBinValues	(	NUMA *	na,
		l_int32	nbins,
		NUMA **	pnarbin,
		NUMA **	pnam
	)

numaGetRankBinValues()

Input: na (just an array of values)
       nbins (number of bins at which the rank is divided)
       &pnarbin (<optional return> rank bin value vs array value)
       &pnam (<optional return> median intensity in a bin vs rank bin value, with nbins of discretized rank values)
Return: 0 if OK, 1 on error

Notes:
  (1) Simple interface for getting a binned rank representation of an
      input array of values. This returns two mappings:
        array value      -->  rank bin number  (narbin)
        rank bin number  -->  median array value in each rank bin  (nam)

Definition at line 1624 of file numafunc2.c.

References ERROR_INT, L_MIN, L_WARNING_FLOAT, NULL, numaDestroy(), numaDiscretizeRankAndIntensity(), numaGetCount(), numaGetMax(), numaGetXParameters(), numaMakeHistogram(), numaNormalizeHistogram(), and PROCNAME.

Referenced by main().

l_int32 numaSplitDistribution	(	NUMA *	na,
		l_float32	scorefract,
		l_int32 *	psplitindex,
		l_float32 *	pave1,
		l_float32 *	pave2,
		l_float32 *	pnum1,
		l_float32 *	pnum2,
		NUMA **	pnascore
	)

numaSplitDistribution()

Input: na (histogram)
       scorefract (fraction of the max score, used to determine the range over which the histogram min is searched)
       &splitindex (<optional return> index for splitting)
       &ave1 (<optional return> average of lower distribution)
       &ave2 (<optional return> average of upper distribution)
       &num1 (<optional return> population of lower distribution)
       &num2 (<optional return> population of upper distribution)
       &nascore (<optional return> for debugging; otherwise use null)
Return: 0 if OK, 1 on error

Notes:
  (1) This function is intended to be used on a distribution of values
      that represent two sets, such as a histogram of pixel values for
      an image with a fg and bg, and the goal is to determine the
      averages of the two sets and the best splitting point.
  (2) The Otsu method finds a split point that divides the distribution
      into two parts by maximizing a score function that is the
      product of two terms:
        (a) the square of the difference of centroids, (ave1 - ave2)^2
        (b) fract1 * (1 - fract1)
      where fract1 is the fraction in the lower distribution.
  (3) This works well for images where the fg and bg are each
      relatively homogeneous and well-separated in color. However, if
      the actual fg and bg sets are very different in size, and the bg
      is highly varied, as can occur in some scanned document images,
      this will bias the split point into the larger "bump" (i.e.,
      toward the point where the (b) term reaches its maximum of 0.25
      at fract1 = 0.5). To avoid this, we define a range of values
      near the maximum of the score function, and choose the value
      within this range such that the histogram itself has a minimum
      value. The range is determined by scorefract: we include all
      abscissa values to the left and right of the value that
      maximizes the score, such that the score stays above
      (1 - scorefract) * maxscore. The intuition behind this
      modification is to try to find a split point that both has a
      high variance score and is at or near a minimum in the
      histogram, so that the histogram slope is small at the split
      point.
  (4) We normalize the score so that if the two distributions were of
      equal size and at opposite ends of the numa, the score would
      be 1.0.
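
The two-stage selection in notes (2) to (4) can be sketched on a plain array of bin counts. This is a simplified illustration, not the library code. The function names are hypothetical, the split convention (lower part = bins 0..s, upper part = bins s+1..n-1) is an assumption, and the score is recomputed from scratch for each candidate for clarity rather than speed.

```c
#include <assert.h>

/* Otsu-style score for splitting after bin s, normalized so that two
 * equal populations in bins 0 and n-1 score 1.0. */
double split_score(const double *h, int n, int s)
{
    double num1 = 0.0, num2 = 0.0, sum1 = 0.0, sum2 = 0.0;
    for (int i = 0; i < n; i++) {
        if (i <= s) { num1 += h[i]; sum1 += i * h[i]; }
        else        { num2 += h[i]; sum2 += i * h[i]; }
    }
    if (num1 == 0.0 || num2 == 0.0) return 0.0;
    double ave1 = sum1 / num1, ave2 = sum2 / num2;
    double fract1 = num1 / (num1 + num2);
    double norm = 4.0 / ((double)(n - 1) * (n - 1));
    return norm * fract1 * (1.0 - fract1) * (ave2 - ave1) * (ave2 - ave1);
}

/* Returns the split index: the position of the histogram minimum among
 * all split points around the best one whose score stays at or above
 * (1 - scorefract) * maxscore. */
int split_distribution(const double *h, int n, double scorefract)
{
    assert(n >= 2 && scorefract >= 0.0 && scorefract <= 1.0);
    double maxscore = 0.0;
    int best = 0;
    for (int s = 0; s < n - 1; s++) {
        double sc = split_score(h, n, s);
        if (sc > maxscore) { maxscore = sc; best = s; }
    }
    double minscore = (1.0 - scorefract) * maxscore;
    int lo = best, hi = best;
    while (lo > 0 && split_score(h, n, lo - 1) >= minscore) lo--;
    while (hi < n - 2 && split_score(h, n, hi + 1) >= minscore) hi++;
    int split = lo;
    for (int s = lo; s <= hi; s++)
        if (h[s] < h[split]) split = s;   /* prefer the histogram valley */
    return split;
}
```

For the bimodal histogram {1, 8, 1, 0, 1, 8, 1} with scorefract = 0.1, split points 1 through 4 all score within 10% of the maximum, and the sketch settles on index 3, the empty bin between the two bumps.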

Definition at line 1717 of file numafunc2.c.

References ERROR_INT, GPLOT_PNG, gplotSimple1(), L_MIN, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaGetCount(), numaGetFValue(), numaGetHistogramStats(), numaGetSum(), and PROCNAME.

Referenced by GenerateSplitPlot(), and pixSplitDistributionFgBg().

NUMA* numaFindPeaks	(	NUMA *	nas,
		l_int32	nmax,
		l_float32	fract1,
		l_float32	fract2
	)

numaFindPeaks()

Input: nas (source numa)
       nmax (max number of peaks to be found)
       fract1 (min fraction of peak value)
       fract2 (min slope)
Return: peak na, or null on error

Notes:
  (1) The returned na consists of sets of four numbers representing
      the peak, in the following order:
        left edge; peak center; right edge; normalized peak area

Definition at line 1862 of file numafunc2.c.

References ERROR_PTR, NULL, numaAddNumber(), numaCopy(), numaCreate(), numaDestroy(), numaGetCount(), numaGetFValue(), numaGetMax(), numaGetSum(), numaSetValue(), PROCNAME, and total.

NUMA* numaFindExtrema	(	NUMA *	nas,
		l_float32	delta
	)

numaFindExtrema()

Input: nas (input values) delta (relative amount to resolve peaks and valleys) Return: nad (locations of extrema), or null on error

Notes: (1) This returns a sequence of extrema (peaks and valleys). (2) The algorithm is analogous to that for determining mountain peaks. Suppose we have a local peak, with bumps on the side. Under what conditions can we consider those 'bumps' to be actual peaks? The answer: if the bump is separated from the peak by a saddle that is at least 500 feet below the bump. (3) Operationally, suppose we are looking for a peak. We are keeping the largest value we've seen since the last valley, and are looking for a value that is delta BELOW our current peak. When we find such a value, we label the peak, use the current value to label the valley, and then do the same operation in reverse (looking for a valley).
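
The peak/valley alternation in note (3) can be sketched as a small state machine over a plain float array. This is an illustration under assumptions, not the library code: the name find_extrema, the way the initial search direction is chosen (the first sample that differs from a[0] by at least delta), and returning indices rather than a numa are all choices made for this example.

```c
#include <assert.h>

/* Hypothetical helper: writes the indices of the extrema found to out
 * (which must hold at least n entries) and returns their number. */
int find_extrema(const float *a, int n, float delta, int *out)
{
    assert(delta > 0.0f);
    int i, count = 0;
    if (n < 2) return 0;
    /* The first sample that departs from a[0] by delta sets the direction. */
    for (i = 1; i < n; i++) {
        float d = a[i] - a[0];
        if (d >= delta || -d >= delta) break;
    }
    if (i == n) return 0;
    int seeking_peak = a[i] > a[0];
    int best = i;                 /* index of the running max (or min) */
    for (i = i + 1; i < n; i++) {
        if (seeking_peak) {
            if (a[i] > a[best]) best = i;
            else if (a[best] - a[i] >= delta) {
                out[count++] = best;   /* label the peak */
                seeking_peak = 0;
                best = i;              /* current value starts the valley */
            }
        } else {
            if (a[i] < a[best]) best = i;
            else if (a[i] - a[best] >= delta) {
                out[count++] = best;   /* label the valley */
                seeking_peak = 1;
                best = i;
            }
        }
    }
    return count;
}
```

For the input {0, 5, 4.5, 5.5, 0, 6} with delta = 2, the dip from 5 to 4.5 is too shallow to separate two peaks, so only the peak at index 3 and the valley at index 4 are reported.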

Definition at line 1972 of file numafunc2.c.

References ERROR_PTR, FALSE, L_ABS, NULL, numaAddNumber(), numaCreate(), numaGetCount(), numaGetFValue(), PROCNAME, and TRUE.

Referenced by main(), numaCountReversals(), numaCrossingsByPeaks(), and pixMeasureEdgeSmoothness().

l_int32 numaCountReversals	(	NUMA *	nas,
		l_float32	minreversal,
		l_int32 *	pnr,
		l_float32 *	pnrpl
	)

numaCountReversals()

Input: nas (input values) minreversal (relative amount to resolve peaks and valleys) &nr (<optional return> number of reversals) &nrpl (<optional return> reversal density: reversals/length) Return: 0 if OK, 1 on error

Notes: (1) The input numa can be generated from pixExtractAlongLine(). If so, the x parameters can be used to find the reversal frequency along a line.

Definition at line 2061 of file numafunc2.c.

References ERROR_INT, NULL, numaDestroy(), numaFindExtrema(), numaGetCount(), numaGetXParameters(), and PROCNAME.

Referenced by pixReversalProfile().

l_int32 numaSelectCrossingThreshold	(	NUMA *	nax,
		NUMA *	nay,
		l_float32	estthresh,
		l_float32 *	pbestthresh
	)

numaSelectCrossingThreshold()

Input: nax (<optional> numa of abscissa values; can be NULL) nay (signal) estthresh (estimated pixel threshold for crossing: e.g., for images, white <--> black; typ. ~120) &bestthresh (<return> robust estimate of threshold to use) Return: 0 if OK, 1 on error

Notes: (1) When a valid threshold is used, the number of crossings is a maximum, because none are missed. If no threshold intersects all the crossings, the crossings must be determined with numaCrossingsByPeaks(). (2) estthresh is an input estimate of the threshold that should be used. We compute the crossings with 41 thresholds (20 below and 20 above the estimate, plus the estimate itself). There is a range in which the number of crossings is a maximum. We return a threshold in the center of this stable plateau of crossings. This can then be used with numaCrossingsByThreshold() to get a good estimate of crossing locations.
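The plateau search in note (2) can be sketched in plain C. This is a simplified illustration, not the library code: the names count_crossings and select_crossing_threshold are hypothetical, the spacing between trial thresholds is passed in as an assumed step parameter (the notes do not state it), and a crossing is counted only on a strict sign change.

```c
#include <stddef.h>

#define NTHRESH 41   /* 20 below the estimate, the estimate, 20 above */

/* Count strict sign changes of (y - thresh) between neighboring samples. */
static int count_crossings(const float *y, size_t n, float thresh)
{
    size_t i;
    int count = 0;
    for (i = 1; i < n; i++) {
        if ((y[i - 1] - thresh) * (y[i] - thresh) < 0.0f)
            count++;
    }
    return count;
}

/* Hypothetical helper: returns 0 if OK, 1 on error. */
int select_crossing_threshold(const float *y, size_t n, float estthresh,
                              float step, float *pbest)
{
    int counts[NTHRESH];
    int i, center, maxcount = 0;
    int runstart = -1, beststart = 0, bestlen = 0;

    if (!y || !pbest || n < 2 || step <= 0.0f) return 1;

    /* Count crossings at each trial threshold and track the maximum. */
    for (i = 0; i < NTHRESH; i++) {
        counts[i] = count_crossings(y, n, estthresh + (float)(i - 20) * step);
        if (counts[i] > maxcount) maxcount = counts[i];
    }

    /* Find the longest contiguous run of thresholds giving maxcount.
     * The extra iteration at i == NTHRESH closes a run that reaches
     * the end of the array. */
    for (i = 0; i <= NTHRESH; i++) {
        int inrun = (i < NTHRESH && counts[i] == maxcount);
        if (inrun && runstart < 0) runstart = i;
        if (!inrun && runstart >= 0) {
            if (i - runstart > bestlen) {
                bestlen = i - runstart;
                beststart = runstart;
            }
            runstart = -1;
        }
    }

    /* Return the threshold at the center of that plateau. */
    center = beststart + (bestlen - 1) / 2;
    *pbest = estthresh + (float)(center - 20) * step;
    return 0;
}
```

If several plateaus share the maximum count, this sketch keeps the first of the longest ones; that tie-breaking rule is a choice made for the example.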

Definition at line 2121 of file numafunc2.c.

References ERROR_INT, FALSE, NULL, numaAddNumber(), numaCreate(), numaCrossingsByThreshold(), numaDestroy(), numaGetCount(), numaGetIValue(), numaGetMax(), numaGetMode(), numaWriteStream(), PROCNAME, and TRUE.

Referenced by pixExtractBarcodeCrossings().

NUMA* numaCrossingsByThreshold	(	NUMA *	nax,
		NUMA *	nay,
		l_float32	thresh
	)

numaCrossingsByThreshold()

Input: nax (<optional> numa of abscissa values; can be NULL) nay (numa of ordinate values, corresponding to nax) thresh (threshold value for nay) Return: nad (abscissa pts at threshold), or null on error

Notes: (1) If nax == NULL, we use startx and delx from nay to compute the crossing values in nad.
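Locating a crossing between two samples amounts to linear interpolation on the abscissa. The sketch below shows this in plain C under stated assumptions: it uses float arrays instead of NUMAs, the name crossings_by_threshold is hypothetical, startx and delx are passed explicitly to stand in for the x parameters stored in nay, and samples that lie exactly on the threshold are not reported.

```c
#include <stddef.h>

/* Hypothetical helper: writes interpolated crossing abscissas into out
 * (capacity at least n - 1) and returns how many were found.  If x is
 * NULL, the abscissa of sample i is taken as startx + i * delx. */
size_t crossings_by_threshold(const float *x, const float *y, size_t n,
                              float startx, float delx, float thresh,
                              float *out)
{
    size_t i, count = 0;

    if (!y || !out) return 0;
    for (i = 1; i < n; i++) {
        float x1 = x ? x[i - 1] : startx + (float)(i - 1) * delx;
        float x2 = x ? x[i] : startx + (float)i * delx;
        float d1 = y[i - 1] - thresh;
        float d2 = y[i] - thresh;
        if (d1 * d2 < 0.0f) {
            /* The samples straddle the threshold.  d1 - d2 equals
             * y[i-1] - y[i], so fract is the fraction of the interval
             * traversed before the line through them hits thresh. */
            float fract = d1 / (d1 - d2);
            out[count++] = x1 + fract * (x2 - x1);
        }
    }
    return count;
}
```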

Definition at line 2242 of file numafunc2.c.

References ERROR_PTR, L_ABS, NULL, numaAddNumber(), numaCreate(), numaGetCount(), numaGetFValue(), numaGetXParameters(), and PROCNAME.

Referenced by numaSelectCrossingThreshold(), and pixExtractBarcodeCrossings().

NUMA* numaCrossingsByPeaks	(	NUMA *	nax,
		NUMA *	nay,
		l_float32	delta
	)

numaCrossingsByPeaks()

Input: nax (<optional> numa of abscissa values) nay (numa of ordinate values, corresponding to nax) delta (parameter used to identify when a new peak can be found) Return: nad (abscissa pts at threshold), or null on error

Notes: (1) If nax == NULL, we use startx and delx from nay to compute the crossing values in nad.

Definition at line 2305 of file numafunc2.c.

References ERROR_PTR, L_ABS, L_INFO_INT, NULL, numaAddNumber(), numaCreate(), numaDestroy(), numaFindExtrema(), numaGetCount(), numaGetFValue(), numaGetIValue(), numaGetXParameters(), and PROCNAME.

l_int32 numaEvalBestHaarParameters	(	NUMA *	nas,
		l_float32	relweight,
		l_int32	nwidth,
		l_int32	nshift,
		l_float32	minwidth,
		l_float32	maxwidth,
		l_float32 *	pbestwidth,
		l_float32 *	pbestshift,
		l_float32 *	pbestscore
	)

numaEvalBestHaarParameters()

Input: nas (numa of non-negative signal values) relweight (relative weight of (-1 comb) / (+1 comb) contributions to the 'convolution'. In effect, the convolution kernel is a comb consisting of alternating +1 and -relweight.) nwidth (number of widths to consider) nshift (number of shifts to consider for each width) minwidth (smallest width to consider) maxwidth (largest width to consider) &bestwidth (<return> width giving largest score) &bestshift (<return> shift giving largest score) &bestscore (<optional return> convolution with "Haar"-like comb) Return: 0 if OK, 1 on error

Notes: (1) This does a linear sweep of widths, evaluating at nshift shifts for each width, computing the score from a convolution with a long comb, and finding the (width, shift) pair that gives the maximum score. The best width is the "half-wavelength" of the signal. (2) The convolving function is a comb of alternating values +1 and -1 * relweight, separated by the width and phased by the shift. This is similar to a Haar transform, except there the convolution is performed with a square wave. (3) The function is useful for finding the line spacing and strength of line signal from pixel sum projections. (4) The score is normalized to the size of nas divided by the number of half-widths. For image applications, the input is typically an array of pixel projections, so one should normalize by dividing the score by the image width in the pixel projection direction.

Definition at line 2420 of file numafunc2.c.

References ERROR_INT, numaEvalHaarSum(), and PROCNAME.

l_int32 numaEvalHaarSum	(	NUMA *	nas,
		l_float32	width,
		l_float32	shift,
		l_float32	relweight,
		l_float32 *	pscore
	)

numaEvalHaarSum()

Input: nas (numa of non-negative signal values) width (distance between +1 and -1 in convolution comb) shift (phase of the comb: location of first +1) relweight (relative weight of (-1 comb) / (+1 comb) contributions to the 'convolution'. In effect, the convolution kernel is a comb consisting of alternating +1 and -relweight.) &score (<return> convolution with "Haar"-like comb) Return: 0 if OK, 1 on error

Notes: (1) This does a convolution with a comb of alternating values +1 and -relweight, separated by the width and phased by the shift. This is similar to a Haar transform, except that for Haar, (a) the convolution kernel is symmetric about 0, so the relweight is 1.0, and (b) the convolution is performed with a square wave. (2) The score is normalized to the size of nas divided by twice the "width". For image applications, the input is typically an array of pixel projections, so one should normalize by dividing the score by the image width in the pixel projection direction. (3) To get a Haar-like result, use relweight = 1.0. For detecting signals where you expect every other sample to be close to zero, as with barcodes or filtered text lines, you can use relweight > 1.0.
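The comb sum described in notes (1) and (2) can be sketched in plain C. This is a minimal sketch that follows the parameter descriptions above: the first tooth, at shift, is taken as +1, and the sum is scaled by 2 * width / n. The name haar_comb_score and the use of a float array in place of a NUMA are assumptions for illustration, and the sign convention of the library code may differ.

```c
#include <stddef.h>

/* Hypothetical helper: returns 0 if OK, 1 on error. */
int haar_comb_score(const float *vals, size_t n, float width, float shift,
                    float relweight, float *pscore)
{
    size_t i, nsamp;
    float sum = 0.0f;

    if (!vals || !pscore) return 1;
    if (width <= 0.0f || shift < 0.0f || shift >= (float)n) return 1;
    if ((float)n < 2.0f * width) return 1;   /* need at least one full period */

    /* Number of comb teeth that fit between shift and the end. */
    nsamp = (size_t)(((float)n - shift) / width);
    for (i = 0; i < nsamp; i++) {
        size_t index = (size_t)(shift + (float)i * width);
        /* Teeth alternate +1, -relweight, +1, ... starting at shift. */
        float weight = (i % 2 == 0) ? 1.0f : -relweight;
        if (index >= n) break;   /* guard against float rounding */
        sum += weight * vals[index];
    }

    /* Normalize by n / (2 * width), the number of full comb periods. */
    *pscore = 2.0f * width * sum / (float)n;
    return 0;
}
```

For a signal that alternates between 1 and 0 on every sample, width = 1.0 and shift = 0.0 align the +1 teeth with the ones and give a score of 1.0; moving the shift by one sample aligns the negative teeth with the ones and the score becomes negative.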

Definition at line 2500 of file numafunc2.c.

References ERROR_INT, numaGetCount(), numaGetFValue(), and PROCNAME.

Referenced by numaEvalBestHaarParameters().

Variable Documentation

const l_int32 BinSizeArray[] [static]

Initial value:

 {2, 5, 10, 20, 50, 100, 200, 500, 1000,
                      2000, 5000, 10000, 20000, 50000, 100000, 200000,
                      500000, 1000000, 2000000, 5000000, 10000000,
                      200000000, 50000000, 100000000}

Definition at line 102 of file numafunc2.c.

Referenced by numaMakeHistogram().

const l_int32 NBinSizes = 24 [static]

Definition at line 106 of file numafunc2.c.

Referenced by numaMakeHistogram().
