As @Roger Fan mentioned, applying a function row-wise should really be done in a vectorized fashion on the entire array. The canonical way to filter is to construct a boolean mask and apply it to the array. That said, if the function is so complex that vectorization is not possible, it is faster to convert the array into a Python list (especially if the function uses Python built-ins such as sum()) and apply the function to that.
    msk = arr.sum(axis=1) > 10                 # best way to create a boolean mask
    msk = [f(row) for row in arr.tolist()]     # second best way
    #                            ^^^^^^^^ <---- convert to list
    filtered_arr = arr[msk]                    # filtered via boolean indexing
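To make the snippet above concrete, here is a minimal self-contained sketch. The predicate f() and the small 6x4 input are assumptions chosen for illustration; f() stands in for row logic that might be hard to vectorize. It shows that the vectorized mask and the list-based mask select the same rows.

```python
import numpy as np

def f(row):
    # Hypothetical row predicate written with plain Python built-ins.
    return sum(row) > 10

# Small illustrative array: rows are [0..3], [4..7], ..., [20..23].
arr = np.arange(24, dtype=float).reshape(6, 4)

# Preferred: build the boolean mask with a vectorized expression.
msk_vectorized = arr.sum(axis=1) > 10

# Fallback: convert to nested Python lists first, then call f() per row.
msk_list = [f(row) for row in arr.tolist()]

# Boolean indexing keeps only the rows where the mask is True.
filtered_vectorized = arr[msk_vectorized]
filtered_list = arr[np.array(msk_list, dtype=bool)]

print(filtered_vectorized.shape)
print(np.array_equal(filtered_vectorized, filtered_list))
```

Only the first row sums to less than 10 here, so both approaches drop it and keep the remaining five rows.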
A working example and a performance test
As the timeit test below shows, looping over a list (arr.tolist()) is roughly twice as fast as looping over a NumPy array (arr), partly because f() calls Python's built-in sum() rather than np.sum(). That said, the vectorized method is about ten times faster again than the list-based loop.
    import numpy as np

    def f(row):
        if sum(row) > 10:
            return True
        else:
            return False

    arr = np.random.rand(10000, 200)

    # loop over the NumPy array directly
    %timeit arr[[f(row) for row in arr]]
    # 260 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    # loop over a Python list of rows
    %timeit arr[[f(row) for row in arr.tolist()]]
    # 114 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    # fully vectorized mask
    %timeit arr[arr.sum(axis=1) > 10]
    # 10.8 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
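The %timeit lines above only work in IPython or Jupyter. As a rough sketch for a plain Python script, the same comparison could be run with the standard timeit module. The wrapper function names, the smaller array size, and the repeat counts are assumptions picked to keep the run short; absolute timings will differ by machine.

```python
import timeit
import numpy as np

def f(row):
    # Same predicate as in the answer, written compactly.
    return sum(row) > 10

# Smaller than the 10000 x 200 array above so the script finishes quickly.
rng = np.random.default_rng(0)
arr = rng.random((1000, 200))

def loop_over_array():
    # Iterates over ndarray rows; sum() then works on NumPy scalars.
    return arr[[f(row) for row in arr]]

def loop_over_list():
    # Converts to nested Python lists first; sum() works on Python floats.
    return arr[[f(row) for row in arr.tolist()]]

def vectorized():
    # Builds the boolean mask in a single NumPy call.
    return arr[arr.sum(axis=1) > 10]

timings = {}
for fn in (loop_over_array, loop_over_list, vectorized):
    # Best of 3 repeats, 3 calls each, reported per call.
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    timings[fn.__name__] = best
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

All three functions return the same filtered array; only the way the mask is built differs.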