Allantools depends on scipy because it uses scipy.stats.chi2.ppf() (the inverse of the chi-squared cumulative distribution function) in the code for confidence intervals.
When testing, scipy takes quite a long time to install, and the whole package seems a bit overkill for just this one function.
So I tried implementing it using just numpy. It sort of works, but not for corner cases where p is very close to 1.0 or k is a big number. I get, for example:

top = pow(x, s)*math.exp(-x)*pow(x,k)
OverflowError: (34, 'Numerical result out of range')
Maybe there's some trick that I am missing when numerically evaluating lower_gamma() using the series expansion...
<pre lang="python">
import scipy.stats
import math
import numpy

def bisection(function, k, p, a, b, tol):
    # http://code.activestate.com/recipes/578417-bisection-method-in-python/
    assert (function(a,k)-p)*(function(b,k)-p) < 0  # a, b must bracket root
    c = (a+b)/2.0
    while (b-a)/2.0 > tol:
        if (function(c, k)-p) == 0:
            return c
        elif (function(a,k)-p)*(function(c,k)-p) < 0:
            b = c
        else:
            a = c
        c = (a+b)/2.0
    return c

def lower_gamma(s, x):
    # lower incomplete gamma function
    # https://en.wikipedia.org/wiki/Incomplete_gamma_function#Evaluation_formulae
    g = 0
    last_g = 1.0
    done = False
    tol = 1.0e-6
    k = 0
    while not done:
        top = pow(x, s)*math.exp(-x)*pow(x, k)
        bot = numpy.prod([float(s+j) for j in range(k+1)])
        dg = float(top)/float(bot)
        if dg == float("Inf"):
            break
        g += dg
        k += 1
        if k > 100:  # get at least 100 terms in the sum
            if g == 0:
                break
            delta = abs(dg/g)
            if delta == float("Inf"):
                break
            if delta < tol:
                done = True
        last_g = g
    return g

def chi2_cdf(x, k):
    # chi-squared cumulative distribution function
    # cdf(x; k) = lower_gamma(k/2, x/2) / gamma(k/2)
    return lower_gamma(k/2.0, x/2.0) / math.gamma(k/2.0)

def chi2_ppf(p, k):
    # chi-squared percent point function (inverse of the cdf)
    # look for x such that p = chi2_cdf( x=chi2_ppf(p, k), k )
    tol = 1e-8
    lolim = 0
    hilim = k
    while (chi2_cdf(lolim, k)-p)*(chi2_cdf(hilim, k)-p) > 0:
        hilim *= 1.5
    return bisection(chi2_cdf, k, p, lolim, hilim, tol)

print "scipy cdf: ", scipy.stats.chi2.cdf(55, 33)
print "own cdf:   ", chi2_cdf(55, 33)

print "scipy ppf ", scipy.stats.chi2.ppf(0.4, 33)
print " own ppf  ", chi2_ppf(0.4, 33)

# test that we really found the inverse
print scipy.stats.chi2.cdf(scipy.stats.chi2.ppf(0.4, 33), 33)
print chi2_cdf(chi2_ppf(0.4, 33), 33)

# try to check the scipy function against our own function
# for some random input of (p, k)
for n in range(100):
    k = numpy.random.randint(20, 200)
    p = numpy.random.random()
    print k, p,
    a = scipy.stats.chi2.ppf(p, k)
    b = chi2_ppf(p, k)
    ok = numpy.isclose(a, b)
    if ok:
        print ok
    else:
        print ok, a, b
    assert ok
</pre>
Try computing dg using log-math. Not sure if this will render in a comment or not:
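(The snippet apparently did not render; below is a minimal sketch of what the log-math version could look like. The helper name lower_gamma_log and the stopping constants are assumptions, not the original code. Each term dg is assembled in the log domain and exponentiated once, so pow(x, k) never overflows.)

<pre lang="python">
import math

def lower_gamma_log(s, x, tol=1.0e-12, max_terms=10000):
    # same series as lower_gamma(), but each term is built in the log domain:
    # log(dg) = s*log(x) - x + k*log(x) - sum_{j=0..k} log(s+j)
    g = 0.0
    log_x = math.log(x)
    log_bot = 0.0  # running sum of log(s+j) for j = 0..k
    for k in range(max_terms):
        log_bot += math.log(s + k)
        log_dg = s*log_x - x + k*log_x - log_bot
        dg = math.exp(log_dg)  # stays finite even when pow(x, k) would overflow
        g += dg
        if g > 0 and dg/g < tol:
            break
    return g
</pre>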
However, this has some failures of the 'isclose' test, e.g. at
190 0.9999 False 271.179788314 271.221549805
(relative error .00015)
Full test program, adapted from yours, at https://emergent.unpythonic.net/files/sandbox/chi2_ppf_jepler.py
Boost discusses their implementation, which uses several different methods depending on the values of (what they call) x and a; may be worth a gander:
http://www.boost.org/doc/libs/1_35_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_gamma/igamma.html#math_toolkit.special.sf_gamma.igamma.implementation
Thanks! I will take a look at the links.
HTML tags 'pre lang="python"' around the code make it render using the WP-Syntax plugin.
You could maybe get away with exploiting the fact that the gamma function in the denominator of the pdf of a chi-squared random variable is just a normalizing constant. So, e.g., these three functions get me pretty close to the output of scipy.stats.chi2.ppf():
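(The three functions also appear not to have rendered; the sketch below is a reconstruction of the idea, with guessed names chi2_pdf_unnorm, chi2_cdf_grid and chi2_ppf_grid. Because the grid cdf is normalized by its own sum, the gamma-function constant cancels and never has to be evaluated.)

<pre lang="python">
import numpy as np

def chi2_pdf_unnorm(x, k):
    # chi-squared pdf without the 1/(2**(k/2) * gamma(k/2)) normalizing constant
    return x**(k/2.0 - 1.0) * np.exp(-x/2.0)

def chi2_cdf_grid(x, k):
    # cumulative sum normalized by the total: the constant cancels out
    p = chi2_pdf_unnorm(x, k)
    return np.cumsum(p) / np.sum(p)

def chi2_ppf_grid(p, k, binsize=0.01):
    # grid from near 0 out to mean + 10 standard deviations (mean=k, var=2k)
    x = np.arange(binsize, k + 10.0*np.sqrt(2.0*k), binsize)
    cdf = chi2_cdf_grid(x, k)
    i = min(np.searchsorted(cdf, p), len(x) - 1)  # clamp in case p sits above the last cdf value
    return x[i]
</pre>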
The "binsize" of x matters, both for speed and accuracy. And if you don't know what k is ahead of time, the fact that you have to define x is kind of a pain, though you can probably also exploit the fact that the expected value of a chi-squared random variable is k and the variance is 2k to define x over an appropriate range once k is known.
Here's hoping I used the "pre" tag correctly...
I have a feeling that the cdf(x, k) function is very flat at high x as we get closer to p=1.0. Conversely, ppf(p, k) is very steep as p approaches 1.0.
One improvement could be to change the bisection() function, which searches for the right value of x, so that it stops based on a tolerance check on p instead of on x. Maybe? Something like the sketch below.
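(A minimal sketch of that idea; bisection_p is a hypothetical variant of the bisection() above, stopping when the cdf value is within tol of p rather than when the x-interval is narrow.)

<pre lang="python">
def bisection_p(function, k, p, a, b, tol):
    # like bisection(), but the stopping test is on the function value:
    # stop when |function(c, k) - p| < tol, not when (b-a) is small
    assert (function(a, k) - p)*(function(b, k) - p) < 0  # a, b must bracket the root
    while True:
        c = (a + b)/2.0
        fc = function(c, k) - p
        if abs(fc) < tol or c == a or c == b:  # second test: interval can no longer shrink
            return c
        if (function(a, k) - p)*fc < 0:
            b = c
        else:
            a = c
</pre>

chi2_ppf() would then call bisection_p(chi2_cdf, k, p, lolim, hilim, tol) with the tolerance expressed in probability rather than in x, which should behave better where the cdf is very flat.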