Memory requirement of Python datastructures: numpy array, list of floats and inner array

Easily answering the question: “How much space does this need?”

Intro

We just had the problem to find out whether a given dataset will be shareable without complex trickery. So we took the easiest road and checked the memory requirements of the datastructure.

If you have such a need, there’s always a first stop: Fire up the interpreter and try it out.

The test

We just created a three dimensional numpy array of floats and then looked at the memory requirement in the system monitor - conveniently bound to CTRL-ESC in KDE. By making the array big enough we can ignore all constant costs and directly get the cost per stored value by dividing the total memory of the process by the number of values.

All our tests are done in Python3.

Numpy

For numpy we just create an array of random values cast to floats:

import numpy as np
a = np.array(np.random.random((100, 100, 10000)), dtype="float")

Also we tested what happens when we use "f4" and "f2" instead of "float" as dtype in numpy.

Native lists

For the native lists, we use the same array, but convert it to a list of lists of lists:

import numpy as np
a = [[[float(i) for i in j] for j in k] 
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

Array module

Instead of using the full-blown numpy, we can also turn the inner list into an array.

import numpy as np
a = [[array.array("d", [float(i) for i in j]) for j in k] 
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

The results

With a numpy array we need roughly 8 Byte per float. A linked list however requires roughly 32 Bytes per float. So switching from native Python to numpy reduces the required memory per floating point value by factor 4.

Using an inner array (via array module) instead of the innermost list provides roughly the same gains.

I would have expected factor 3: The value plus a pointer to the next and to the previous entry.

The details are in the following table.

Table 1: Memory requirement of different ways to store values in Python
  total memory per value
list of floats 3216.6 MiB 32.166 Bytes
numpy array of floats 776.7 MiB 7.767 Bytes
np f4 395.2 MiB 3.95 Bytes
np f2 283.4 MiB 2.834 Bytes
inner array 779.1 MiB 7.791 Bytes

This test was conducted on a 64 bit system, so floats are equivalent to doubles.

The scipy documentation provides a list of all the possible dtype definitions cast to C-types.

Summary

In Python large numpy arrays require 4 times less memory than a linked list structure with the same data. Using an inner array from the array module instead of the innermost list provides roughly the same gains.

Inhalt abgleichen
Willkommen im Weltenwald!
((λ()'Dr.ArneBab))



Beliebte Inhalte

sn.1w6.org news