Channel: How to filter rows of a numpy array - Stack Overflow

Answer by cottontail for How to filter rows of a numpy array


As @Roger Fan mentioned, a row-wise function should really be applied in a vectorized fashion to the entire array. The canonical way to filter is to construct a boolean mask and index the array with it. That said, if the function is too complex to vectorize, it is faster to convert the array into a Python list first and apply the function to each row of the list, especially if the function uses Python built-ins such as sum().

msk = arr.sum(axis=1)>10                # best way to create a boolean mask
msk = [f(row) for row in arr.tolist()]  # second best way
#                            ^^^^^^^^   <---- convert to list
filtered_arr = arr[msk]                 # filtered via boolean indexing

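To make the two approaches concrete, here is a minimal sketch on a tiny array. The array values, the threshold of 5, and the helper name keep_row are made up for illustration; they are not from the question.

```python
import numpy as np

# Illustrative data (an assumption, not from the original question)
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [0, 1, 2]])

# Vectorized mask: the row sums are 6, 15 and 3, so the last row is dropped
msk = arr.sum(axis=1) > 5
filtered_vec = arr[msk]

# Fallback for a function that cannot be vectorized (hypothetical helper)
def keep_row(row):
    return sum(row) > 5

# Loop over a plain Python list of rows rather than over the array itself
msk_list = [keep_row(row) for row in arr.tolist()]
filtered_list = arr[msk_list]

print(filtered_vec)
print(filtered_list)
```

Both filtered_vec and filtered_list contain the first two rows; only the way the mask is built differs.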
A working example and a performance test

As the timeit test below shows, looping over a list (arr.tolist()) is more than twice as fast as looping over the numpy array (arr) directly (114 ms vs. 260 ms), partly because the function f() calls Python's sum() rather than np.sum(). That said, the vectorized method is about ten times faster still (10.8 ms).

import numpy as np

def f(row):
    if sum(row)>10:
        return True
    else:
        return False

arr = np.random.rand(10000, 200)

# loop over the rows of the numpy array
%timeit arr[[f(row) for row in arr]]
# 260 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# loop over a Python list of rows
%timeit arr[[f(row) for row in arr.tolist()]]
# 114 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# vectorized
%timeit arr[arr.sum(axis=1)>10]
# 10.8 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
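%timeit is an IPython magic, so it does not run in a plain Python script. A rough equivalent using the standard-library timeit module might look like the sketch below. The array size, the seed, the threshold of 90, and the use of integer data are all assumptions made here; integers are chosen so that the three masks agree exactly, whereas the original test uses floats. Absolute timings will vary by machine, so none are shown.

```python
import timeit
import numpy as np

def f(row):
    # Same idea as the answer's f(), with a threshold suited to this data
    return sum(row) > 90

rng = np.random.default_rng(0)             # arbitrary seed
arr = rng.integers(0, 10, size=(1000, 20)) # row sums are around 90

# The three ways of filtering compared in the answer
via_array = arr[[f(row) for row in arr]]
via_list = arr[[f(row) for row in arr.tolist()]]
via_numpy = arr[arr.sum(axis=1) > 90]

# With integer data all three select exactly the same rows
assert np.array_equal(via_array, via_list)
assert np.array_equal(via_list, via_numpy)

n = 5  # number of repetitions per measurement
t_array = timeit.timeit(lambda: arr[[f(row) for row in arr]], number=n) / n
t_list = timeit.timeit(lambda: arr[[f(row) for row in arr.tolist()]], number=n) / n
t_numpy = timeit.timeit(lambda: arr[arr.sum(axis=1) > 90], number=n) / n

print(f"loop over array: {t_array * 1e3:.2f} ms")
print(f"loop over list:  {t_list * 1e3:.2f} ms")
print(f"vectorized:      {t_numpy * 1e3:.2f} ms")
```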
