Exact Math to the rescue - with Guile Scheme

I needed to calculate the probability that for every freenet user there are at least 70 others in a distance of at most 0.01. That needs binomial coefficients with n and k on the order of 4000. My old Python script failed me with an OverflowError: integer division result too large for a float. So I turned to Guile Scheme and exact math.

1 The challenge

I need the probability that within 4000 random numbers between 0 and 1, at least 70 are below 0.02.

Then I need the probability that within 4000 random numbers, at most 5 find less than 70 others to which the distance is at most 0.02.

Or more exactly: I need to find the right maximum length to replace the 0.02.

2 The old script

I had a Python-script lying around which I once wrote for estimating the probability that a roleplaying group will have enough people to play in a given gaming night.

It’s called spielfaehig.py (german for “able to play”).

It just does this:

from math import factorial
fac = factorial
def nük(n, k): 
   if k > n: return 0
   return fac(n) / (fac(k)*fac(n-k))

def binom(p, n, k): 
   return nük(n, k) * p** k * (1-p)**(n-k)

def spielfähig(p, n, min_spieler): 
   try: 
      return sum([binom(p, n, k) for k in range(min_spieler, n+1)])
   except ValueError: return 1.0

Now when I run this with p=0.02, n=4000 and minspieler=70, it returns

OverflowError: integer division result too large for a float

The reason is simple: There are some intermediate numbers which are much larger than what a float can represent.

3 Solution with Guile

To fix this, I rewrote the script in Guile Scheme:

#!/usr/bin/env guile-2.0
!#

(define-module (spielfaehig)
  #:export (spielfähig))
(use-modules (srfi srfi-1)) ; for iota with count and start

(define (factorial n)
  (if (zero? n) 1 
      (* n (factorial (1- n)))))

(define (nük n k)
  (if (> k n) 0
      (/ (factorial n) 
         (factorial k) 
         (factorial (- n k)))))

(define (binom p n k)
  (* (nük n k) 
     (expt p k) 
     (expt (- 1 p) (- n k))))

(define (spielfähig p n min_spieler) 
  (apply + 
         (map (lambda (k) (binom p n k)) 
              (iota (1+ (- n min_spieler)) min_spieler))))

To use this with exact math, I just need to call it with p as exact number:

(use-modules (spielfaehig))
(spielfähig #e.03 4000 70)
;           ^ note the #e - this means to use an exact representation
;                           of the number

; To make Guile show a float instead of some huge division, just
; convert the number to an inexact representation before showing it.
(format #t "~A\n" (exact->inexact (spielfähig #e.03 4000 70)))

And that’s it. Automagic hassle-free exact math is at my fingertips.

It just works and uses less then 200 MiB of memory - even though the intermediate factorials return huge numbers. And huge means huge. It effortlessly handles numbers with a size on the order of 108000. That is 10 to the power of 8000 - a number with 8000 digits.

4 The Answer

42! :)

The real answer is 0.0125: That’s the maximum length we need to choose for short links to get more than a 95% probability that in a network of 4000 nodes there are at most 5 nodes for which there are less than 70 peers with a distance of at most the maximum length.

If we can assume 5000 nodes, then 0.01 is enough. And since this is the number we directly got from an analysis of our link length distribution, it is the better choice, though it will mean that people with huge bandwidth cannot always max out their 100 connections.

5 Conclusion

Most of the time, floats are OK. But there are the times when you simply need exact math.

In these situations Guile Scheme is a lifesaver.

Dear GNU Hackers, thank you for this masterpiece!

And if you were crazy enough to read till here, Happy Hacking to you!

AnhangGröße
2014-07-21-Mo-exact-math-to-the-rescue-guile-scheme.org4.41 KB
Inhalt abgleichen
Willkommen im Weltenwald!
((λ()'Dr.ArneBab))



Beliebte Inhalte

sn.1w6.org news