I have a list of data points. For the full run of my program, I'll use all of the data points, but for testing the code, I want to use only a small percentage of them so that the program runs in a short time. I do not want simply to take the first n elements of the list, though; I want to select an even distribution of the elements from the list. So, if I'm using 50% of the data points, I might want to select every second data point from the list.
Basically, I want to have a function that takes as arguments a list and a percentage and returns a list consisting of an even distribution of elements from the input list, the number of which corresponds as closely as possible to the percentage requested.
What would be a good way to do this?
Solution
This can trivially be achieved by setting a slice with a step:
def select_elements(seq, perc):
    """Select a defined percentage of the elements of seq."""
    return seq[::int(100.0/perc)]
In use:
>>> select_elements(list(range(10)), 50)
[0, 2, 4, 6, 8]
>>> select_elements(list(range(10)), 33)
[0, 3, 6, 9]
>>> select_elements(list(range(10)), 25)
[0, 4, 8]
You could also wrap the division in round, as int on its own will truncate:
>>> int(3.6)
3
>>> int(round(3.6))
4
If you want to use a proportion rather than a percentage (e.g. 0.5 instead of 50), simply replace 100.0 with 1.
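Note that a slice step can only produce fractions of the form 1/k (50%, 33%, 25%, and so on), so a request such as 70% ends up returning the whole list. If the count needs to follow the percentage more closely, one possible alternative is to compute the target count first and then spread that many indices across the list. This is only a sketch; select_evenly is a hypothetical name and the rounding of the count is an assumption.

```python
def select_evenly(seq, perc):
    """Pick about perc percent of seq at evenly spaced positions."""
    n = len(seq)
    # Target number of elements, clamped to the range 0..n (assumed behaviour).
    count = min(n, max(0, int(round(n * perc / 100.0))))
    # Integer arithmetic maps i = 0..count-1 onto indices spread over 0..n-1.
    return [seq[(i * n) // count] for i in range(count)]


print(select_evenly(list(range(10)), 70))  # [0, 1, 2, 4, 5, 7, 8]
print(select_evenly(list(range(10)), 50))  # [0, 2, 4, 6, 8]
```

The spacing is no longer perfectly regular for percentages like 70, but the number of elements returned matches the request, which the step-based slice cannot do.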