Easily answering the question: “How much space does this need?”
We recently needed to find out whether a given dataset can be shared without complex trickery. So we took the easiest road and checked the memory requirements of the data structure.
If you have such a need, there’s always a first stop: Fire up the interpreter and try it out.
We just created a three-dimensional numpy array of floats and then looked at the memory requirement in the system monitor, conveniently bound to CTRL-ESC in KDE. By making the array big enough we can ignore all constant costs and directly get the cost per stored value by dividing the total memory of the process by the number of values.
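As a cross-check that needs no system monitor, the same division can be done inside the interpreter. This is only a sketch: it uses a smaller array than the one in our test so that it runs quickly, and `nbytes` counts just the data buffer of the array, not the constant overhead of the process.

```python
import numpy as np

# Smaller than the array used in the article, to keep the sketch quick.
a = np.random.random((100, 100, 100))  # float64 values

# Total size of the data buffer divided by the number of stored values.
bytes_per_value = a.nbytes / a.size
print(bytes_per_value, "bytes per value")
```

The system monitor reports slightly different numbers because it also sees the interpreter, the loaded libraries and temporary copies.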
All our tests were done in Python 3.
For numpy we just create an array of random values cast to floats:
import numpy as np
a = np.array(np.random.random((100, 100, 10000)), dtype="float")
We also tested what happens when we use "f4" and "f2" instead of "float" as dtype in numpy.
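The size numpy plans to use per value for each of these dtypes can be looked up directly. A minimal sketch that only queries the dtype objects, without allocating large arrays:

```python
import numpy as np

# Nominal storage size per value for the three dtypes used in the tests.
itemsizes = {name: np.dtype(name).itemsize for name in ("float", "f4", "f2")}

for name, size in itemsizes.items():
    print(f"{name}: {np.dtype(name)} with {size} bytes per value")
```

These are the nominal sizes; the measured process memory also contains overhead on top of them.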
For the native lists, we use the same array, but convert it to a list of lists of lists:
import numpy as np
a = [[[float(i) for i in j] for j in k] for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]
Instead of using the full-blown numpy, we can also turn the innermost list into an array from the array module:
import array
import numpy as np
a = [[array.array("d", [float(i) for i in j]) for j in k] for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]
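To see what a single inner array costs, the array module reports its item size, and `sys.getsizeof` reports the size of the array object including its buffer. A small sketch with one row of the same length as in our test; the values themselves are placeholders:

```python
import array
import sys

# One inner row with 10000 doubles, as in the nested structure above.
row = array.array("d", [0.5] * 10000)

payload = row.itemsize * len(row)  # bytes used by the raw values
total = sys.getsizeof(row)         # array object including its buffer
overhead = total - payload         # constant cost per inner array

print(row.itemsize, "bytes per value,", overhead, "bytes overhead per row")
```

With 10000 values per row, the per-row overhead barely matters for the per-value cost.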
With a numpy array we need roughly 8 bytes per float. A native Python list, however, requires roughly 32 bytes per float. So switching from native Python lists to numpy reduces the required memory per floating point value by a factor of 4.
Using an inner array (via array module) instead of the innermost list provides roughly the same gains.
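I would have expected a factor of 3: the value plus a pointer to the next and to the previous entry. CPython lists are not linked lists, though: each list is an array of pointers to separately allocated float objects, so every value costs one pointer plus one full float object.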
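The measured cost per list entry can be compared with the object sizes the interpreter reports. This sketch assumes CPython; on a typical 64-bit build the float object takes 24 bytes and the pointer 8 bytes, which adds up to the roughly 32 bytes we measured, but other builds may report different sizes.

```python
import struct
import sys

float_size = sys.getsizeof(1.0)      # one boxed Python float object
pointer_size = struct.calcsize("P")  # one pointer slot inside a list

# Each list entry needs a pointer in the list plus the float it points to.
per_value = float_size + pointer_size
print(float_size, "+", pointer_size, "=", per_value, "bytes per list entry")
```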
The details are in the following table.
|data structure|total memory|per value|
|list of floats|3216.6 MiB|32.166 bytes|
|numpy array of floats|776.7 MiB|7.767 bytes|
|np f4|395.2 MiB|3.95 bytes|
|np f2|283.4 MiB|2.834 bytes|
|inner array|779.1 MiB|7.791 bytes|
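The savings factors follow directly from the per-value column. A short sketch that only divides the numbers from the table:

```python
# Per-value costs in bytes, copied from the table above.
per_value = {
    "list of floats": 32.166,
    "numpy array of floats": 7.767,
    "np f4": 3.95,
    "np f2": 2.834,
    "inner array": 7.791,
}

baseline = per_value["list of floats"]
# How many times less memory each structure needs than the list of floats.
factors = {name: baseline / cost for name, cost in per_value.items()}

for name, factor in factors.items():
    print(f"{name}: factor {factor:.1f}")
```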
This test was conducted on a 64-bit system. Python floats and the numpy dtype "float" are C doubles (8 bytes), and pointers take 8 bytes as well.
The scipy documentation provides a list of all possible dtype definitions mapped to C types.
In Python, large numpy arrays require about 4 times less memory than nested native lists holding the same data. Using an inner array from the array module instead of the innermost list provides roughly the same gains.