#+title: Exact Math to the rescue - with Guile Scheme
#+BEGIN_ABSTRACT
I needed to calculate the probability that for every [[http://freenetproject.org][freenet]] user there are at least 70 others in a distance of at most 0.01. That needs [[http://en.wikipedia.org/wiki/Binomial_coefficient][binomial coefficients]] with n and k on the order of 4000. My old Python script failed me with an =OverflowError: integer division result too large for a float=. So I turned to [[http://gnu.org/s/guile][Guile Scheme]] and exact math.
#+END_ABSTRACT
* The challenge
I need the probability that within 4000 random numbers between 0 and 1, at least 70 are below 0.02.
Then I need the probability that within 4000 random numbers, at most 5 find less than 70 others to which the distance is at most 0.02.
Or more exactly: I need to find the right maximum length to replace the 0.02.
* The old script
I had a Python-script lying around which I once wrote for estimating [[http://1w6.org/english/statistical-constraints-design-rpgs-campaigns][the probability that a roleplaying group will have enough people to play in a given gaming night]].
It’s called [[https://bitbucket.org/ArneBab/1w6/src/57189e02b9f6/sonstiges/Skripte/spielfaehig.py][spielfaehig.py]] (german for “able to play”).
It just does this:
#+BEGIN_SRC python
from math import factorial
fac = factorial
def nük(n, k):
if k > n: return 0
return fac(n) / (fac(k)*fac(n-k))
def binom(p, n, k):
return nük(n, k) * p** k * (1-p)**(n-k)
def spielfähig(p, n, min_spieler):
try:
return sum([binom(p, n, k) for k in range(min_spieler, n+1)])
except ValueError: return 1.0
#+END_SRC
Now when I run this with p=0.02, n=4000 and min_spieler=70, it returns
: OverflowError: integer division result too large for a float
The reason is simple: There are some intermediate numbers which are much larger than what a float can represent.
* Solution with Guile
To fix this, I rewrote the script in Guile Scheme:
#+BEGIN_SRC scheme
#!/usr/bin/env guile-2.0
!#
(define-module (spielfaehig)
#:export (spielfähig))
(use-modules (srfi srfi-1)) ; for iota with count and start
(define (factorial n)
(if (zero? n) 1
(* n (factorial (1- n)))))
(define (nük n k)
(if (> k n) 0
(/ (factorial n)
(factorial k)
(factorial (- n k)))))
(define (binom p n k)
(* (nük n k)
(expt p k)
(expt (- 1 p) (- n k))))
(define (spielfähig p n min_spieler)
(apply +
(map (lambda (k) (binom p n k))
(iota (1+ (- n min_spieler)) min_spieler))))
#+END_SRC
To use this with [[https://www.gnu.org/software/guile/manual/html_node/Exactness.html#Exactness][exact math]], I just need to call it with p as exact number:
#+BEGIN_SRC scheme
(use-modules (spielfaehig))
(spielfähig #e.03 4000 70)
; ^ note the #e - this means to use an exact representation
; of the number
; To make Guile show a float instead of some huge division, just
; convert the number to an inexact representation before showing it.
(format #t "~A\n" (exact->inexact (spielfähig #e.03 4000 70)))
#+END_SRC
And that’s it. Automagic hassle-free exact math is at my fingertips.
It just works and uses less then 200 MiB of memory - even though the intermediate factorials return huge numbers. And huge means *huge*. It effortlessly handles numbers with a size on the order of 10^{8000}. That is 10 to the power of 8000 - a number with 8000 digits.
* The Answer
42! :)
The real answer is 0.0125: That’s the maximum length we need to choose for short links to get more than a 95% probability that in a network of 4000 nodes there are at most 5 nodes for which there are less than 70 peers with a distance of at most the maximum length.
If we can assume 5000 nodes, then 0.01 is enough. And since this is the number we directly got from an [[https://d6.gnutella2.info/freenet/USK@ZLwcSLwqpM1527Tw1YmnSiXgzITU0neHQ11Cyl0iLmk,f6FLo3TvsEijIcJq-X3BTjjtm0ErVZwAPO7AUd9V7lY,AQACAAE/fix-link-length/1/][analysis of our link length distribution]], it is the better choice, though it will mean that people with huge bandwidth cannot always max out their 100 connections.
* Conclusion
Most of the time, floats are OK. But there are the times when you simply need exact math.
In these situations *Guile Scheme is a lifesaver*.
Dear GNU Hackers, thank you for this masterpiece!
And if you were crazy enough to read till here, *Happy Hacking* to you!