Find the index of an element in a pandas Series

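I know this is a very basic question, but for some reason I can't find an answer. How can I get the index of a certain element of a Series in Python pandas? (The first occurrence would suffice.)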

For example, I'd like something like this:

import pandas as pd
myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
print(myseries.find(7))  # hypothetical method: should output 3

Of course, it is possible to define such a method with a loop:

def find(s, el):
    # Linear scan over the index labels; returns the first matching label
    for i in s.index:
        if s.loc[i] == el:
            return i
    return None

print(find(myseries, 7))

But I assume there should be a better way. Is there?

Answer

>>> myseries[myseries == 7]
3    7
dtype: int64
>>> myseries[myseries == 7].index[0]
3

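Although I admit there should be a better way to do this, this at least avoids iterating over the object in Python and pushes the work down to the C level.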
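A minimal sketch that wraps this boolean-mask approach into a drop-in replacement for the loop-based `find` from the question; the function name and the `None` return for a missing value are carried over from the question's version, not from pandas itself.

```python
import pandas as pd


def find(s: pd.Series, el):
    """Return the index label of the first element equal to `el`, or None."""
    matches = s[s == el]      # vectorized comparison, no Python-level loop
    if matches.empty:         # value not present at all
        return None
    return matches.index[0]   # label (not position) of the first match


myseries = pd.Series([1, 4, 0, 7, 5], index=[0, 1, 2, 3, 4])
print(find(myseries, 7))
print(find(myseries, 10))
```

Returning `None` instead of raising mirrors the question's loop; if you would rather fail loudly, let the `IndexError` from `matches.index[0]` propagate.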

2013-08-20 05:52:43

Other answers

In [92]: (myseries==7).argmax()
Out[92]: 3

This works if you know in advance that 7 is in the series. You can check that with (myseries == 7).any().
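A sketch combining the `.any()` guard with the mask, assuming you want the index label rather than the position. In current pandas, `Series.argmax` returns an integer position while `Series.idxmax` returns the label; the two only coincide in the example above because the index is 0..4. The helper name `first_label` is hypothetical.

```python
import pandas as pd


def first_label(s: pd.Series, value):
    """Label of the first element equal to `value`, or None if absent."""
    mask = s == value
    if not mask.any():      # on an all-False mask, argmax/idxmax would just return the first entry
        return None
    return mask.idxmax()    # label of the first True


s = pd.Series([1, 7, 0, 7, 5], index=["a", "b", "c", "d", "e"])
print(first_label(s, 7))    # label
print((s == 7).argmax())    # position, not label
print(first_label(s, 42))
```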

Another approach (very similar to the first answer) that also accounts for multiple 7s (or none):

In [122]: myseries = pd.Series([1,7,0,7,5], index=['a','b','c','d','e'])
In [123]: list(myseries[myseries==7].index)
Out[123]: ['b', 'd']
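One case the equality mask does not cover: NaN never compares equal to itself, so `s == float('nan')` matches nothing. A sketch that special-cases NaN with `Series.isna`; treating NaN as a searchable value is an assumption about what you want, and `find_all` is a hypothetical name.

```python
import math

import pandas as pd


def find_all(s: pd.Series, value) -> list:
    """All index labels whose element equals `value` (NaN matches NaN)."""
    if isinstance(value, float) and math.isnan(value):
        mask = s.isna()     # NaN != NaN, so == would never match
    else:
        mask = s == value
    return list(s[mask].index)


s = pd.Series([1.0, 7.0, float("nan"), 7.0], index=["a", "b", "c", "d"])
print(find_all(s, 7.0))
print(find_all(s, float("nan")))
print(find_all(s, 42.0))
```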

2015-04-08 08:12:33

If you convert the values to an Index, you can use get_loc:

In [1]: myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])

In [3]: pd.Index(myseries).get_loc(7)
Out[3]: 3

In [4]: pd.Index(myseries).get_loc(10)
KeyError: 10

Handling duplicates:

In [5]: pd.Index([1,1,2,2,3,4]).get_loc(2)
Out[5]: slice(2, 4, None)

If the duplicates are not contiguous, a boolean array is returned instead:

In [6]: pd.Index([1,1,2,1,3,2,4]).get_loc(2)
Out[6]: array([False, False,  True, False, False,  True, False], dtype=bool)
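Since `get_loc` can return an integer, a slice or a boolean mask depending on the duplicates, a sketch that normalizes all three into a list of index labels (empty when the value is missing). It relies on the fact that indexing an Index with any of those three yields the matching labels; the helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def labels_via_get_loc(s: pd.Series, value) -> list:
    """Look up `value` among the Series' values with Index.get_loc and
    return the matching index labels (empty list if absent)."""
    lookup = pd.Index(s.to_numpy())     # the *values* become the Index
    try:
        loc = lookup.get_loc(value)
    except KeyError:                    # value not present
        return []
    if isinstance(loc, (int, np.integer)):
        return [s.index[loc]]           # unique match: a single position
    return list(s.index[loc])           # slice or boolean mask: several positions


print(labels_via_get_loc(pd.Series([1, 4, 0, 7, 5]), 7))
print(labels_via_get_loc(pd.Series([1, 1, 2, 2, 3, 4]), 2))
print(labels_via_get_loc(pd.Series([1, 1, 2, 1, 3, 2, 4]), 2))
```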

It uses a hash table internally, so it is very fast:

In [7]: s = pd.Series(np.random.randint(0, 10, 10000))

In [9]: %timeit s[s == 5]
1000 loops, best of 3: 203 µs per loop

In [12]: i = pd.Index(s)

In [13]: %timeit i.get_loc(5)
1000 loops, best of 3: 226 µs per loop

As Viktor pointed out, there is a one-time overhead to creating the Index (incurred when you actually do something with the index, e.g. is_unique):

In [2]: s = pd.Series(np.random.randint(0, 10, 10000))

In [3]: %timeit pd.Index(s)
100000 loops, best of 3: 9.6 µs per loop

In [4]: %timeit pd.Index(s).is_unique
10000 loops, best of 3: 140 µs per loop
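Because the cost sits in building the Index rather than in each lookup, a sketch that builds it once and reuses it for many queries. The class `ValueLookup` is hypothetical, and it assumes the Series is not modified between lookups.

```python
import numpy as np
import pandas as pd


class ValueLookup:
    """Hypothetical helper: build an Index over the values once, then reuse it
    for many lookups. Assumes the Series is not modified afterwards."""

    def __init__(self, s: pd.Series):
        self._labels = s.index
        # One-time cost: later get_loc calls reuse this Index.
        self._values = pd.Index(s.to_numpy())

    def first_label(self, value):
        """Label of the first element equal to `value`, or None if absent."""
        try:
            loc = self._values.get_loc(value)
        except KeyError:
            return None
        if isinstance(loc, slice):          # contiguous duplicates
            pos = loc.start
        elif isinstance(loc, np.ndarray):   # boolean mask of all matches
            pos = int(np.argmax(loc))       # position of the first True
        else:                               # unique match: integer position
            pos = loc
        return self._labels[pos]


lookup = ValueLookup(pd.Series([1, 7, 0, 7, 5], index=list("abcde")))
for v in (7, 5, 42):
    print(v, lookup.first_label(v))
```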

2013-08-20 11:37:59

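I'm impressed by all the answers here. This isn't a new answer, just an attempt to summarize the timings of all these methods. I considered a series with 25 elements and assumed the general case where the index can contain any values, and where you want the index value corresponding to a search value near the end of the series.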

Here are speed tests on a 2012 Mac Mini using Python 3.9.10 and pandas 1.4.0.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: data = [406400, 203200, 101600, 76100, 50800, 25400, 19050, 12700,
   ...:         9500, 6700, 4750, 3350, 2360, 1700, 1180, 850, 600, 425, 300,
   ...:         212, 150, 106, 75, 53, 38]

In [4]: myseries = pd.Series(data, index=range(1,26))

In [5]: assert(myseries[21] == 150)

In [6]: %timeit myseries[myseries == 150].index[0]
179 µs ± 891 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit myseries[myseries == 150].first_valid_index()
205 µs ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [8]: %timeit myseries.where(myseries == 150).first_valid_index()
597 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit myseries.index[np.where(myseries == 150)[0][0]]
110 µs ± 872 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [10]: %timeit pd.Series(myseries.index, index=myseries)[150]
125 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [11]: %timeit myseries.index[pd.Index(myseries).get_loc(150)]
49.5 µs ± 814 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %timeit myseries.index[list(myseries).index(150)]
7.75 µs ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %timeit myseries.index[myseries.tolist().index(150)]
2.55 µs ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit dict(zip(myseries.values, myseries.index))[150]
9.89 µs ± 79.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [15]: %timeit {v: k for k, v in myseries.items()}[150]
9.99 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

@Jeff's answer seems to be the fastest, although it doesn't handle duplicates.

Correction: sorry, I missed one. @Alex Spangher's solution using the list index method is by far the fastest.
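The `list.index` approach raises `ValueError` when the value is missing and stops at the first match. A small wrapper sketch that returns `None` instead; the function name is hypothetical.

```python
import pandas as pd


def find_first_via_list(s: pd.Series, value):
    """First matching label using Python's list.index; None if absent."""
    try:
        pos = s.tolist().index(value)   # linear scan over a plain Python list
    except ValueError:                  # list.index raises when the value is missing
        return None
    return s.index[pos]                 # translate the position into a label


data = [406400, 203200, 101600, 76100, 50800, 25400, 19050, 12700, 9500,
        6700, 4750, 3350, 2360, 1700, 1180, 850, 600, 425, 300, 212, 150,
        106, 75, 53, 38]
myseries = pd.Series(data, index=range(1, 26))
print(find_first_via_list(myseries, 150))
print(find_first_via_list(myseries, 7))
```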

Update: added @EliadL's answer.

Hope this helps.

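It's surprising that such a simple operation requires such convoluted solutions, and that so many of them are slow: in some cases it takes over half a millisecond to find a value in a 25-element series.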
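To rerun a subset of this comparison on your own machine, a sketch using the standard-library `timeit` module instead of IPython's `%timeit`. It reuses the 25-element series above; absolute numbers depend on hardware and pandas version, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd

data = [406400, 203200, 101600, 76100, 50800, 25400, 19050, 12700, 9500,
        6700, 4750, 3350, 2360, 1700, 1180, 850, 600, 425, 300, 212, 150,
        106, 75, 53, 38]
myseries = pd.Series(data, index=range(1, 26))

# Each candidate maps a search value to the index label of its first match.
candidates = {
    "boolean mask": lambda v: myseries[myseries == v].index[0],
    "np.where": lambda v: myseries.index[np.where(myseries == v)[0][0]],
    "Index.get_loc": lambda v: myseries.index[pd.Index(myseries).get_loc(v)],
    "tolist().index": lambda v: myseries.index[myseries.tolist().index(v)],
    "reverse dict": lambda v: dict(zip(myseries.values, myseries.index))[v],
}


def run(number: int = 1_000) -> dict:
    """Return mean seconds per call for each candidate."""
    results = {}
    for name, fn in candidates.items():
        assert fn(150) == 21  # sanity check: every method finds the same label
        results[name] = timeit.timeit(lambda: fn(150), number=number) / number
    return results


if __name__ == "__main__":
    for name, secs in sorted(run().items(), key=lambda kv: kv[1]):
        print(f"{name:>16}: {secs * 1e6:8.2f} µs")
```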

Update 2022-02-18

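Updated all timings using the latest pandas version and Python 3.9. Even on an older computer, all timings dropped significantly (by 10% to 70%) compared with the previous test (pandas 0.25.3).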

Added: two more methods that use dictionaries.
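A caveat for the two dictionary methods: when a value occurs more than once, later pairs overwrite earlier ones, so the reverse dict returns the *last* matching label, not the first. A sketch that keeps the first label instead; `first_label_map` is a hypothetical name, and building the dict only pays off if you reuse it for many lookups.

```python
import pandas as pd


def first_label_map(s: pd.Series) -> dict:
    """Map each value to the label of its first occurrence."""
    mapping = {}
    for label, value in s.items():
        mapping.setdefault(value, label)  # keep the first label, ignore later ones
    return mapping


s = pd.Series([1, 7, 0, 7, 5], index=["a", "b", "c", "d", "e"])
print(dict(zip(s.values, s.index))[7])  # last occurrence wins
print(first_label_map(s)[7])            # first occurrence kept
```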

2019-08-31 22:13:30

Often your value will appear at multiple indices:

>>> myseries = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1])
>>> myseries.index[myseries == 1]
Int64Index([3, 4, 5, 6, 10, 11], dtype='int64')
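The result above is an Index of labels. With the default RangeIndex, labels and positions coincide, but they diverge for any other index. A sketch that returns both, using `numpy.flatnonzero` for the positions; the string labels are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1],
              index=list("abcdefghijkl"))

mask = (s == 1).to_numpy()
positions = np.flatnonzero(mask)  # integer positions, usable with s.iloc
labels = s.index[mask]            # index labels, usable with s.loc

print(positions.tolist())  # [3, 4, 5, 6, 10, 11]
print(labels.tolist())     # ['d', 'e', 'f', 'g', 'k', 'l']
```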

2018-08-21 09:49:15
