Tue, 12/16/2014 - 16:42 - Draketo
## Table of Contents

- Intro
- The test
  - Numpy
  - Native lists
  - Array module
- The results
- Summary

# Intro

Easily answering the question: “How much space does this need?”

We recently had to find out whether a given dataset could be shared without complex trickery. So we took the easiest road and checked the memory requirements of the data structure.

If you have such a need, there’s always a first stop: Fire up the interpreter and try it out.

# The test

We created a three-dimensional numpy array of floats and then looked at the memory requirement of the process in the system monitor (conveniently bound to CTRL-ESC in KDE). By making the array big enough, we can ignore all constant costs and directly get the cost per stored value by dividing the total memory of the process by the number of values.

All our tests were run with Python 3.
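Reading the value off the system monitor can also be approximated from inside the interpreter. The following is a minimal sketch, not the method used for the numbers in this article: it assumes a Unix-like system where the standard `resource` module is available, and the helper names `peak_rss_bytes` and `bytes_per_value` are made up for illustration. It uses a much smaller list than the real test, so the constant interpreter overhead is still clearly visible in the result.

```python
import resource
import sys


def peak_rss_bytes():
    """Return the peak resident set size of this process in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux, but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def bytes_per_value(n_values):
    """Divide the peak process memory by the number of stored values."""
    return peak_rss_bytes() / n_values


n = 1_000_000
data = [float(i) for i in range(n)]  # hypothetical stand-in for the real dataset
print(f"{bytes_per_value(n):.1f} bytes per value, including interpreter overhead")
```

Note that this reports the peak, not the current memory use, so temporary copies made while building the data are counted as well.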

## Numpy

For numpy we create an array of random values cast to floats:

```
import numpy as np
# 100 * 100 * 10000 = 10^8 random values stored as 64-bit floats
a = np.array(np.random.random((100, 100, 10000)), dtype="float")
```

We also tested what happens when we use "f4" and "f2" instead of "float" as the dtype in numpy.
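As a rough cross-check, the raw payload of the array can be computed from the item size alone. The sketch below hard-codes the widths of the three dtypes (8, 4 and 2 bytes; `np.dtype(name).itemsize` reports the same numbers) so that it runs without numpy. The helper name `payload_mib` is hypothetical. Anything the system monitor shows beyond these values is interpreter and allocation overhead.

```python
# Width in bytes of one value for each dtype used in the test.
ITEM_SIZE = {"float": 8, "f4": 4, "f2": 2}

SHAPE = (100, 100, 10000)


def payload_mib(dtype, shape=SHAPE):
    """Raw data size in MiB of an array with the given dtype and shape."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * ITEM_SIZE[dtype] / 2**20


for name in ITEM_SIZE:
    print(f"{name:>5}: {payload_mib(name):6.1f} MiB")
```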

## Native lists

For the native lists, we use the same array but convert it to a list of lists of lists:

```
import numpy as np
# Convert every value to a native Python float inside nested lists.
a = [[[float(i) for i in j] for j in k]
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]
```

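## Array module

Instead of using the full-blown numpy, we can also turn the innermost list into an array from the array module of the standard library (numpy is used here only to generate the random data):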

```
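import array
import numpy as np
# Keep the two outer list levels, but store each innermost row as an array of C doubles.
a = [[array.array("d", [float(i) for i in j]) for j in k]
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]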
```
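The array module stores the raw C doubles back to back, which is why it lands close to numpy. A small sketch of this, using only the standard library (the variable names are illustrative, and the row is far shorter than in the real test):

```python
import array
import sys

n = 10_000
values = [float(i) for i in range(n)]

row = array.array("d", values)  # typecode "d": C double

# itemsize is the width of one stored value in bytes.
print(row.itemsize)
# getsizeof covers the array header plus its data buffer.
print(sys.getsizeof(row) / n)
```

The second number is slightly above the item size because the fixed header of the array object is spread over all values.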

# The results

With a numpy array we need roughly 8 bytes per float. A native list of floats, however, requires roughly 32 bytes per float. So switching from native Python lists to numpy reduces the required memory per floating point value by a factor of 4.

Using an inner array (via the array module) instead of the innermost list provides roughly the same gains.

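I would have expected a factor of 3: the value plus a pointer to the next and to the previous entry, as in a doubly linked list. CPython lists are arrays of pointers, though: on a 64-bit build each entry is an 8-byte pointer to a separate float object of 24 bytes (reference count, type pointer and the 8-byte value), which adds up to the 32 bytes we measured.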
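Where the 32 bytes come from can be checked directly in the interpreter. This sketch assumes CPython, where a list stores one pointer per element and every float is a separate heap object. Other implementations may lay things out differently.

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # size of one pointer on this build
float_size = sys.getsizeof(1.0)      # size of one float object, header included

# Each list slot holds a pointer to its own float object.
per_value = pointer_size + float_size
print(pointer_size, float_size, per_value)
```

On a typical 64-bit CPython build the three numbers are 8, 24 and 32, which matches the measurement.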

The details are in the following table.

| | total memory | per value |
|---|---|---|
| list of floats | 3216.6 MiB | 32.166 Bytes |
| numpy array of floats | 776.7 MiB | 7.767 Bytes |
| np f4 | 395.2 MiB | 3.95 Bytes |
| np f2 | 283.4 MiB | 2.834 Bytes |
| inner array | 779.1 MiB | 7.791 Bytes |
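The per-value column above is the total divided by 100, which treats one MiB as 10^6 bytes. If the system monitor reported binary MiB (an assumption; 1 MiB = 2^20 bytes), the cost per value is about 5% higher. A short sketch of that conversion, with the totals copied from the table and a hypothetical helper name:

```python
N_VALUES = 100 * 100 * 10000  # number of floats in the test array

# Total process memory in MiB, copied from the table.
TOTAL_MIB = {
    "list of floats": 3216.6,
    "numpy array of floats": 776.7,
    "np f4": 395.2,
    "np f2": 283.4,
    "inner array": 779.1,
}


def bytes_per_value(total_mib, n_values=N_VALUES):
    """Convert a total in binary MiB into bytes per stored value."""
    return total_mib * 2**20 / n_values


for name, total in TOTAL_MIB.items():
    print(f"{name:<22} {bytes_per_value(total):6.2f} bytes")
```

The ratio between the rows is unaffected by the choice of unit, so the factor of roughly 4 between nested lists and numpy stays the same.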

This test was conducted on a 64-bit system, so pointers take 8 bytes. Python floats and numpy's "float" dtype are both C doubles (8 bytes).

The scipy documentation provides a list of all possible dtype definitions together with the C types they map to.

# Summary

In Python, large numpy arrays require roughly a quarter of the memory of a nested list structure holding the same data. Using an inner array from the array module instead of the innermost list provides roughly the same gains.
