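How to iterate over rows in a DataFrame in Pandas

I have a pandas DataFrame, df:

How do I iterate over the rows of this DataFrame? For every row, I want to access its elements (the values in its cells) by column name. For example: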

for row in df.rows:
    print(row['c1'], row['c2'])

I found a similar question, which suggests using either of these:

for date, row in df.T.iteritems():

for row in df.iterrows():

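But I do not understand what the row object is or how I can work with it.

Current answer

I was looking for how to iterate over both rows and columns, and ended up here: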

for i, row in df.iterrows():
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for j, column in row.items():
        print(column)
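
For reference, here is a minimal self-contained sketch of the same nested loop. The small DataFrame is made up for illustration, and the sketch assumes a recent pandas version in which Series.items() yields (column label, value) pairs.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({'c1': [10, 11], 'c2': [100, 110]})

cells = []
for i, row in df.iterrows():          # i is the index label, row is a Series
    for name, value in row.items():   # name is the column label
        cells.append((i, name, value))
        print(i, name, value)
```

The inner loop variable carries the column label, so each value can be tied back to both its row and its column.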

2018-01-17 09:41:29

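Other answers

How to iterate efficiently

If you really have to iterate over a Pandas DataFrame, you will probably want to avoid iterrows(). There are several alternatives, and the usual iterrows() is far from the best one. itertuples() can be 100 times faster.

In short: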

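- As a general rule, use df.itertuples(name=None), in particular when you have a fixed number of columns and fewer than 255 of them. See point (3).
- Otherwise, use df.itertuples(), unless your column names contain special characters such as spaces or "-". See point (2).
- It is possible to use itertuples() even if your DataFrame has strange column names, by using the last example. See point (4).
- Only use iterrows() if you cannot use any of the previous solutions. See point (1).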

Different ways to iterate over the rows of a Pandas DataFrame:

Generate a random DataFrame with a million rows and 4 columns:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(0, 100, size=(1000000, 4)), columns=list('ABCD'))
    print(df)

1) The usual iterrows() is convenient, but very slow:

import time

# time.clock() was removed in Python 3.8; time.perf_counter() is the replacement
start_time = time.perf_counter()
result = 0
for _, row in df.iterrows():
    result += max(row['B'], row['C'])

total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("1. Iterrows done in {} seconds, result = {}".format(total_elapsed_time, result))

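2) The default itertuples() is already much faster, but it does not work with column names such as My Col Name is very Strange (avoid this method if your columns are repeated or if a column name cannot simply be converted to a Python variable name).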

start_time = time.perf_counter()
result = 0
for row in df.itertuples(index=False):
    result += max(row.B, row.C)

total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("2. Named Itertuples done in {} seconds, result = {}".format(total_elapsed_time, result))
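
To make the limitation in point 2 concrete, the following sketch shows what happens to a column name that is not a valid Python identifier. The two-column frame and its column names are hypothetical. The renaming comes from the namedtuple that itertuples() builds, which replaces invalid field names with positional ones.

```python
import pandas as pd

# Hypothetical column names chosen to show the renaming behaviour
df = pd.DataFrame({'My Col Name': [1, 2], 'B': [3, 4]})

first = next(df.itertuples())
print(first._fields)   # field names actually available on the namedtuple

# The invalid name was replaced by a positional one, so read it by position
# instead; +1 skips the leading Index field.
pos = df.columns.get_loc('My Col Name') + 1
print(first[pos], first.B)
```

Attribute access such as first.B keeps working for valid names, while the awkward column is only reachable by position or through its generated name.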

3) itertuples() with name=None is even faster, but not really convenient, because you have to define a variable for each column.

start_time = time.perf_counter()
result = 0
# With name=None each row is a plain tuple: (index, A, B, C, D)
for _, col1, col2, col3, col4 in df.itertuples(name=None):
    result += max(col2, col3)

total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("3. Itertuples done in {} seconds, result = {}".format(total_elapsed_time, result))

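4) Finally, the named itertuples() with positional lookup through df.columns.get_loc() is slower than the previous point, but you do not have to define a variable for each column, and it works with column names such as My Col Name is very Strange.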

start_time = time.perf_counter()
result = 0
for row in df.itertuples(index=False):
    result += max(row[df.columns.get_loc('B')], row[df.columns.get_loc('C')])

total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("4. Polyvalent Itertuples working even with special characters in the column name done in {} seconds, result = {}".format(total_elapsed_time, result))

Output:

         A   B   C   D
0       41  63  42  23
1       54   9  24  65
2       15  34  10   9
3       39  94  82  97
4        4  88  79  54
...     ..  ..  ..  ..
999995  48  27   4  25
999996  16  51  34  28
999997   1  39  61  14
999998  66  51  27  70
999999  51  53  47  99

[1000000 rows x 4 columns]

1. Iterrows done in 104.96 seconds, result = 66151519
2. Named Itertuples done in 1.26 seconds, result = 66151519
3. Itertuples done in 0.94 seconds, result = 66151519
4. Polyvalent Itertuples working even with special characters in the column name done in 2.94 seconds, result = 66151519

This article is a very interesting comparison between iterrows and itertuples.

2019-12-19 16:02:14

Sometimes a useful pattern is:

# Borrowing @KutalmisB df example
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])
# The to_dict call results in a list of dicts
# where each row_dict is a dictionary with k:v pairs of columns:value for that row
for row_dict in df.to_dict(orient='records'):
    print(row_dict)

Which results in:

{'col1': 1, 'col2': 0.1}
{'col1': 2, 'col2': 0.2}
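
Note that orient='records' drops the index labels ('a' and 'b' here). If they are needed, one option is orient='index', which returns a dict keyed by index label. This is a sketch reusing the same example frame; the variable name rows_by_label is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])

# Outer keys are index labels, inner dicts map column name to value
rows_by_label = df.to_dict(orient='index')
for label, row_dict in rows_by_label.items():
    print(label, row_dict['col1'], row_dict['col2'])
```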

2018-06-27 18:48:28

You should use df.iterrows(), although iterating row by row is not especially efficient, because a Series object has to be created for each row.
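
A side effect of building one Series per row is that each row ends up with a single dtype, so values from mixed columns can be upcast. The sketch below uses a made-up frame with one integer and one float column to show the difference between iterrows() and itertuples().

```python
import pandas as pd

# Hypothetical frame with one integer and one float column
df = pd.DataFrame({'i': [1, 2], 'f': [0.5, 1.5]})

_, row = next(df.iterrows())
print(row.dtype)        # one dtype for the whole row Series
print(type(row['i']))   # the integer was upcast to fit that dtype

tup = next(df.itertuples(index=False))
print(type(tup.i))      # itertuples() keeps the integer as an integer
```

If the original dtypes matter for the computation, this is another reason to prefer itertuples() over iterrows().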

2012-05-24 14:24:52

Probably the most elegant solution (but certainly not the most efficient):

for row in df.values:
    c2 = row[1]
    print(row)
    # ...

# Unpacking like this only works when df has exactly two columns
for c1, c2 in df.values:
    print(c1, c2)
    # ...

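Notes:

- The documentation explicitly recommends using .to_numpy() instead.
- The resulting NumPy array will have a dtype that fits all columns, in the worst case object.
- There are good reasons not to use a loop in the first place.

Still, I think this option should be included here, as a straightforward solution to a (one should think) trivial problem.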
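
To illustrate the last note, here is a sketch of the sum of max(B, C) from the benchmark answer on this page, computed once with a loop and once without. The frame is a smaller, made-up stand-in for the million-row one, and np.maximum is one possible vectorized replacement, not the only one.

```python
import numpy as np
import pandas as pd

# Smaller hypothetical frame than the million-row benchmark, same column layout
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(1000, 4)), columns=list('ABCD'))

# Loop version: sum of the row-wise maximum of B and C
loop_result = 0
for row in df.itertuples(index=False):
    loop_result += max(row.B, row.C)

# Vectorized version: np.maximum compares the two columns element by element
vector_result = int(np.maximum(df['B'], df['C']).sum())

print(loop_result == vector_result)
```

Both versions produce the same total, but the vectorized one does the work inside NumPy rather than in a Python-level loop.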

2021-07-28 14:47:17

DataFrame.iterrows is a generator that yields both the index and the row (as a Series):

import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index()  # make sure indexes pair with number of rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

10 100
11 110
12 120

2013-05-10 07:07:58
