Tue, 12/16/2014 - 16:42 - Draketo
## Table of Contents

- Intro
- The test
  - Numpy
  - Native lists
  - Array module
- The results
- Summary

Easily answering the question: “How much space does this need?”

# Intro

We recently had to find out whether a given dataset could be shared without complex trickery, so we took the easiest road and checked the memory requirements of the data structure.

If you have such a need, there’s always a first stop: Fire up the interpreter and try it out.

# The test

We created a three-dimensional numpy array of floats and then looked at its memory requirement in the system monitor, conveniently bound to CTRL-ESC in KDE. By making the array big enough we can ignore all constant costs and directly get the cost per stored value by dividing the total memory of the process by the number of values.

All tests were run in Python 3.
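
The division itself can be sketched in a few lines. The helper name `bytes_per_value` and its `unit` parameter are made up for this illustration. The per-value column in the results table corresponds to `unit=10**6`; if the monitor reports true MiB (2**20 bytes per unit), the figures come out about 5 % higher.

```python
# Hypothetical helper: cost per stored value from the total process memory.
N_VALUES = 100 * 100 * 10000  # shape of the test array: 10**8 values


def bytes_per_value(total, unit, n_values=N_VALUES):
    """Divide the total memory (given in `unit` bytes) by the number of values."""
    return total * unit / n_values


# Totals as read from the system monitor (see the results table).
for label, total in (("list of floats", 3216.6), ("numpy array of floats", 776.7)):
    print(label,
          round(bytes_per_value(total, unit=10**6), 3),
          round(bytes_per_value(total, unit=2**20), 3))
```

Either way the constant costs of the interpreter are spread over 10**8 values and barely change the result.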

## Numpy

For numpy we just create an array of random values cast to floats:

```
import numpy as np

# 100 * 100 * 10000 = 10**8 random values, stored as 64-bit floats
a = np.array(np.random.random((100, 100, 10000)), dtype="float")
```

We also tested what happens when we use "f4" (32-bit floats) and "f2" (16-bit floats) instead of "float" as the dtype in numpy.
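
As a cross-check that needs no system monitor, numpy can report the size of the raw data buffer itself. This is a minimal sketch with a much smaller stand-in shape than the one used in the test. It only covers the payload, so the whole-process figure seen in a monitor will differ somewhat.

```python
import numpy as np

# Small stand-in shape so the sketch runs quickly; the test used (100, 100, 10000).
shape = (10, 10, 100)

payload = {}
for dtype in ("float", "f4", "f2"):
    a = np.array(np.random.random(shape), dtype=dtype)
    # nbytes counts only the raw data buffer, not the array object or the interpreter.
    payload[dtype] = a.nbytes / a.size
    print(dtype, a.itemsize, payload[dtype])
```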

## Native lists

For the native lists, we use the same array, but convert it to a list of lists of lists:

```
import numpy as np

# convert every value to a Python float inside nested native lists
a = [[[float(i) for i in j] for j in k]
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]
```
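
The size of such a nested list can also be estimated from inside the interpreter. The following is only a sketch: `deep_sizeof` is a hypothetical helper, it relies on `sys.getsizeof` as implemented by CPython, it uses `random.random` from the standard library instead of numpy to stay self-contained, and it works on a much smaller stand-in shape. It would count objects shared between lists more than once, which does not happen here because every float is a separate object.

```python
import random
import sys


def deep_sizeof(obj):
    """Recursively add up sys.getsizeof for nested lists and their items."""
    size = sys.getsizeof(obj)
    if isinstance(obj, list):
        size += sum(deep_sizeof(item) for item in obj)
    return size


# Small stand-in shape; the test above used (100, 100, 10000).
shape = (5, 5, 1000)
a = [[[random.random() for _ in range(shape[2])]
      for _ in range(shape[1])]
     for _ in range(shape[0])]

n_values = shape[0] * shape[1] * shape[2]
per_value = deep_sizeof(a) / n_values
print(per_value)
```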

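## Array module

Instead of using the full-blown numpy, we can also turn just the innermost list into an array from the array module of the standard library:

```
import array
import numpy as np

# only the innermost list becomes an array of C doubles ("d")
a = [[array.array("d", [float(i) for i in j]) for j in k]
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]
```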
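
An `array.array` with type code "d" stores raw C doubles in one contiguous buffer, so only the 100 × 100 inner arrays remain as separate Python objects. A small sketch of how payload and per-object overhead relate, assuming CPython's `sys.getsizeof` and a short stand-in array:

```python
import array
import sys

values = array.array("d", [0.5] * 10000)

payload = values.itemsize * len(values)     # raw data: itemsize bytes per C double
overhead = sys.getsizeof(values) - payload  # remaining cost of the array object itself
print(values.itemsize, payload, overhead)
```

With 10000 values per inner array, as in the test, that per-object overhead is negligible.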

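# The results

With a numpy array we need roughly 8 bytes per float. A nested native list, however, requires roughly 32 bytes per float. So switching from native Python lists to numpy reduces the required memory per floating point value by a factor of 4.

Using an inner array (via the array module) instead of the innermost list provides roughly the same gains.

I would have expected a factor of 3: the value plus a pointer to the next and to the previous entry. A Python list is not a linked list, though. In CPython it is an array of 8-byte pointers, and every float it points to is a separate 24-byte object (reference count, type pointer and the 8-byte value), which adds up to the measured 32 bytes.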
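
The roughly 32 bytes per float can be checked from inside the interpreter. This sketch assumes a typical 64-bit CPython build, where a list slot is one pointer and each float is a separately allocated object; other interpreters or builds may report different sizes.

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # size of a C pointer on this build
float_size = sys.getsizeof(1.0)      # one complete Python float object

# Growth of a list per additional slot; [x] * n allocates exactly n slots in CPython.
slot_size = (sys.getsizeof([0.0] * 2000) - sys.getsizeof([0.0] * 1000)) / 1000

# Cost per stored float: one list slot plus one float object.
print(pointer_size, float_size, slot_size, slot_size + float_size)
```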

The details are in the following table.

data structure | total memory | per value |
---|---|---|
list of floats | 3216.6 MiB | 32.166 bytes |
numpy array of floats | 776.7 MiB | 7.767 bytes |
np f4 | 395.2 MiB | 3.95 bytes |
np f2 | 283.4 MiB | 2.834 bytes |
inner array | 779.1 MiB | 7.791 bytes |

This test was conducted on a 64-bit system, so pointers take 8 bytes. Python floats (and the "float" dtype in numpy) are C doubles and take 8 bytes per value.

The scipy documentation provides a list of all possible dtype definitions and the C types they map to.

# Summary

In Python, large numpy arrays require 4 times less memory than a nested native list holding the same data. Using an inner array from the array module instead of the innermost list provides roughly the same gains.
