How are iloc and loc different?

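Can someone explain how these two methods of slicing are different? I've seen the docs and I've seen these answers, but I still find myself unable to understand how they differ. To me, they seem largely interchangeable, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that these two work?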

df.loc[:5]
df.iloc[:5]

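Can someone present three cases where the distinction in usage is clearer?

Once upon a time, I also wanted to know how these two differ from df.ix[:5], but ix has been removed as of pandas 1.0, so I no longer care.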

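Current answer

.loc and .iloc are used for indexing, i.e., to pull out portions of data. In essence, the difference is that .loc allows label-based indexing, while .iloc allows position-based indexing.

If you get confused by .loc and .iloc, keep in mind that .iloc is based on the integer position (it starts with i), while .loc is based on the label (it starts with l).

.loc

.loc is supposed to be based on the index labels and not the positions, so it is analogous to Python dictionary-based indexing. However, it can accept boolean arrays, slices, and a list of labels (none of which work with a Python dictionary).

.iloc

.iloc does the lookup based on index position, i.e., pandas behaves similarly to a Python list. pandas raises an IndexError if a single requested position does not exist; an out-of-range slice is simply truncated, as with a list.

Examples

The following examples illustrate the differences between .iloc and .loc. Let's consider the following series: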

>>> s = pd.Series([11, 9], index=["1990", "1993"], name="Magic Numbers")
>>> s
1990    11
1993     9
Name: Magic Numbers, dtype: int64

.iloc examples

>>> s.iloc[0]
11
>>> s.iloc[-1]
9
>>> s.iloc[4]
Traceback (most recent call last):
    ...
IndexError: single positional indexer is out-of-bounds
>>> s.iloc[0:3] # slice
1990    11
1993     9
Name: Magic Numbers, dtype: int64
>>> s.iloc[[0,1]] # list
1990    11
1993     9
Name: Magic Numbers, dtype: int64

.loc examples

>>> s.loc['1990']
11
>>> s.loc['1970']
Traceback (most recent call last):
    ...
KeyError: '1970'
>>> mask = s > 9
>>> s.loc[mask]
1990    11
Name: Magic Numbers, dtype: int64
>>> s.loc['1990':] # slice
1990    11
1993     9
Name: Magic Numbers, dtype: int64

Since s has string index values, .loc will fail when indexing with an integer:

>>> s.loc[0]
Traceback (most recent call last):
    ...
KeyError: 0
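
The description of .loc above also mentions lists of labels, which the examples do not show. A minimal sketch reusing the same series (the variable name `picked` is only for illustration):

```python
import pandas as pd

s = pd.Series([11, 9], index=["1990", "1993"], name="Magic Numbers")

# A list of labels returns a Series, in the order the labels were requested.
picked = s.loc[["1993", "1990"]]
print(picked.tolist())             # [9, 11]

# A label slice includes both endpoints, unlike a positional slice.
print(len(s.loc["1990":"1993"]))   # 2
print(len(s.iloc[0:1]))            # 1
```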

2020-12-27 00:56:04

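Other answers

.iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing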

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)

On the other hand, .loc uses named indices. Let's set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the last two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

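and so on. Now, it's probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, and in this case iloc and loc work in almost the same way. This is why your examples look interchangeable. They are not quite identical, though: a loc slice includes its end label, so df.loc[:5] returns six rows (labels 0 through 5), while df.iloc[:5] returns five rows (positions 0 through 4). If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.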
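
With a default integer index the two are close but not identical, because loc slices by label and includes the end label. A small sketch to check this (the frames `df_demo` and `df_str` and the column name are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..9
df_demo = pd.DataFrame({"value": range(10)})

by_label = df_demo.loc[:5]      # labels 0 through 5, end label included
by_position = df_demo.iloc[:5]  # positions 0 through 4, end excluded

print(len(by_label))      # 6
print(len(by_position))   # 5

# With a string index, an integer is not a valid label bound for loc.
df_str = pd.DataFrame({"value": range(10)}, index=list("abcdefghij"))
try:
    df_str.loc[:5]
    loc_failed = False
except (TypeError, KeyError) as exc:
    # TypeError on the pandas versions I am aware of
    loc_failed = True
    print("loc failed:", type(exc).__name__)

# iloc does not care about the labels at all.
print(len(df_str.iloc[:5]))   # 5
```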

Also, you can do column retrieval just by using the data frame's __getitem__:

df['time']    # equivalent to df.loc[:, 'time']

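Now suppose you want to mix position and named indexing, that is, indexing using positions on rows and names on columns (to clarify, I mean selecting from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix used to come in. It was deprecated in pandas 0.20 and removed in pandas 1.0, so the following line only works on older versions:

df.ix[:2, 'time']    # the first two rows of the 'time' column (pandas < 1.0 only)

On current pandas, translate the label into a position and use .iloc instead:

df.iloc[:2, df.columns.get_loc('time')]    # the first two rows of the 'time' column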
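
Since .ix is gone from pandas 1.0 onward, mixing positions and labels today means converting one kind of key into the other. A hedged sketch of two common ways to do it, using a frame shaped like the one above (the cell values are invented so that the output is visible):

```python
import pandas as pd

df = pd.DataFrame(
    {"time": [1, 2, 3], "date": [4, 5, 6], "name": ["x", "y", "z"]},
    index=["a", "b", "c"],
)

# Option 1: turn the column label into a position, then use iloc throughout.
col_pos = df.columns.get_loc("time")
first_two_iloc = df.iloc[:2, col_pos]

# Option 2: turn the row positions into labels, then use loc throughout.
first_two_loc = df.loc[df.index[:2], "time"]

print(first_two_iloc.tolist())  # [1, 2]
print(first_two_loc.tolist())   # [1, 2]
```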

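It's also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

b = [True, False, True]
df.loc[b]

will return the 1st and 3rd rows of df. This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors: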

df.loc[b, 'name'] = 'Mary', 'John'
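
A runnable sketch of boolean selection and assignment with loc, assuming the three-row frame defined earlier (filled here with a placeholder string so that the change is visible):

```python
import pandas as pd

# Same labels as the frame above; "-" is just a placeholder value.
df = pd.DataFrame("-", index=["a", "b", "c"], columns=["time", "date", "name"])

b = [True, False, True]

# Selection: keeps rows 'a' and 'c'.
print(df.loc[b].index.tolist())   # ['a', 'c']

# Assignment: one value per selected row, written into the 'name' column.
df.loc[b, "name"] = ["Mary", "John"]
print(df["name"].tolist())        # ['Mary', '-', 'John']
```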

2015-07-23 17:17:27

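DataFrame.loc: select rows by index value (label).
DataFrame.iloc: select rows by row number (position).

Example:

Select the first 5 rows of a table, where df1 is your data frame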

df1.iloc[:5]

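Select rows A and B of a table, where df1 is your data frame. Note the list: df1.loc['A', 'B'] would instead mean row 'A', column 'B'.

df1.loc[['A', 'B']]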
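
The difference between a list of row labels and a (row, column) pair is easy to miss, so here is a small sketch. The frame `df1` and its labels are invented for illustration:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["A", "B", "C"],
)

# A list of labels selects several rows.
rows = df1.loc[["A", "B"]]
print(rows.shape)          # (2, 2)

# Two comma-separated labels mean (row label, column label): a single cell.
cell = df1.loc["A", "B"]
print(cell)                # 4
```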

2020-12-10 05:21:56

The following example illustrates the difference:

df = pd.DataFrame({'col1': [1,2,3,4,5], 'col2': ["foo", "bar", "baz", "foobar", "foobaz"]})
   col1    col2
0     1     foo
1     2     bar
2     3     baz
3     4  foobar
4     5  foobaz

df = df.sort_values('col1', ascending = False)
   col1    col2
4     5  foobaz
3     4  foobar
2     3     baz
1     2     bar
0     1     foo

Position-based access (iloc):

df.iloc[0, 0:2]
col1         5
col2    foobaz
Name: 4, dtype: object

We get the first row of the sorted data frame. (This is not the row with index label 0, but the row with index label 4.)

Label-based access (loc):

df.loc[0, 'col1':'col2']
col1      1
col2    foo
Name: 0, dtype: object

We get the row with index label 0, even though df has been sorted.

2022-05-27 12:01:11

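Label vs. Location

The main distinction between the two methods is:

loc gets rows (and/or columns) with particular labels.
iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index: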

>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2]) 
49    a
48    b
47    c
0     d
1     e
2     f

>>> s.loc[0]    # value at index label 0
'd'

>>> s.iloc[0]   # value at index location 0
'a'

>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e

>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a

Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

| `<object>` | Description | `s.loc[<object>]` | `s.iloc[<object>]` |
|---|---|---|---|
| `0` | single item | Value at index label `0` (the string `'d'`) | Value at index location 0 (the string `'a'`) |
| `0:1` | slice | Two rows (labels `0` and `1`) | One row (first row at location 0) |
| `1:47` | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards) |
| `1:47:-1` | slice with negative step | Three rows (labels `1` back to `47`) | Zero rows (empty Series) |
| `[2, 0]` | integer list | Two rows with given labels | Two rows with given locations |
| `s > 'e'` | Bool series (indicating which values have the property) | One row (containing `'f'`) | `NotImplementedError` |
| `(s>'e').values` | Bool array | One row (containing `'f'`) | Same as `loc` |
| `999` | int object not in index | `KeyError` | `IndexError` (out of bounds) |
| `-1` | int object not in index | `KeyError` | Returns last value in `s` |
| `lambda x: x.index[3]` | callable applied to series (here returning the index entry at position 3, the label `0`) | `s.loc[s.index[3]]` | `s.iloc[s.index[3]]` |
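
Several rows of the table can be checked directly. The sketch below rebuilds s and only exercises rows that return values; the rows that raise exceptions are left out because the exact exception can depend on the pandas version:

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

print(s.loc[0], s.iloc[0])                        # d a
print(len(s.loc[0:1]), len(s.iloc[0:1]))          # 2 1
print(len(s.loc[1:47]), len(s.iloc[1:47]))        # 0 5
print(len(s.loc[1:47:-1]), len(s.iloc[1:47:-1]))  # 3 0
print(s.loc[[2, 0]].tolist())                     # ['f', 'd']
print(s.iloc[[2, 0]].tolist())                    # ['c', 'a']
print(s.iloc[-1])                                 # f

# The callable receives the Series; here it returns the label 0.
print(s.loc[lambda x: x.index[3]])                # d
print(s.iloc[lambda x: x.index[3]])               # a
```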

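loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.

Here's a Series where the index contains string objects: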

>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:

>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1

For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:

>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M')) 
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
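
Because 'now' makes the output different on every run, here is a reproducible sketch of the same partial-string lookup. The fixed month-end dates are my own choice for illustration:

```python
import pandas as pd

dates = pd.to_datetime(
    ["2021-01-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"]
)
s3 = pd.Series(list("abcde"), index=dates)

# Partial strings select every timestamp that falls inside the period.
march_april = s3.loc["2021-03":"2021-04"]
print(march_april.tolist())        # ['c', 'd']

# A single partial string works too and returns a Series for that month.
print(s3.loc["2021-05"].tolist())  # ['e']
```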

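Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below: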

>>> import numpy as np 
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),  
                      index=list('abcde'), 
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Then for example:

>>> df.loc['c': , :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22

>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23

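Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?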

>>> import numpy as np 
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),  
                      index=list('abcde'), 
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

We can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13

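get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.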
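
Another way to get the same result, sketched here as an alternative rather than as the recommended approach, is to chain a label slice with a positional slice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(25).reshape(5, 5),
    index=list("abcde"),
    columns=["x", "y", "z", 8, 9],
)

# Approach from the text: translate the label 'c' into a position for iloc.
via_get_loc = df.iloc[: df.index.get_loc("c") + 1, :4]

# Alternative: slice rows by label first (end label included),
# then slice columns by position.
via_chain = df.loc[:"c"].iloc[:, :4]

print(via_get_loc.equals(via_chain))   # True
print(via_chain.shape)                 # (3, 4)
```

Chaining like this is fine for reading data; for assignment, a single indexer call is the safer choice.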

2015-07-23 16:59:47
