我有以下索引DataFrame命名列和行不连续的数字:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

我想添加一个新列,'e',到现有的数据帧,并不想改变数据帧中的任何东西(即,新列始终具有与DataFrame相同的长度)。

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

如何将列e添加到上面的例子中?


当前回答

简单明了的:

df.loc[:, 'NewCol'] = 'New_Val'

例子:

df = pd.DataFrame(data=np.random.randn(20, 4), columns=['A', 'B', 'C', 'D'])

df

           A         B         C         D
0  -0.761269  0.477348  1.170614  0.752714
1   1.217250 -0.930860 -0.769324 -0.408642
2  -0.619679 -1.227659 -0.259135  1.700294
3  -0.147354  0.778707  0.479145  2.284143
4  -0.529529  0.000571  0.913779  1.395894
5   2.592400  0.637253  1.441096 -0.631468
6   0.757178  0.240012 -0.553820  1.177202
7  -0.986128 -1.313843  0.788589 -0.707836
8   0.606985 -2.232903 -1.358107 -2.855494
9  -0.692013  0.671866  1.179466 -1.180351
10 -1.093707 -0.530600  0.182926 -1.296494
11 -0.143273 -0.503199 -1.328728  0.610552
12 -0.923110 -1.365890 -1.366202 -1.185999
13 -2.026832  0.273593 -0.440426 -0.627423
14 -0.054503 -0.788866 -0.228088 -0.404783
15  0.955298 -1.430019  1.434071 -0.088215
16 -0.227946  0.047462  0.373573 -0.111675
17  1.627912  0.043611  1.743403 -0.012714
18  0.693458  0.144327  0.329500 -0.655045
19  0.104425  0.037412  0.450598 -0.923387


df.drop([3, 5, 8, 10, 18], inplace=True)

df

           A         B         C         D
0  -0.761269  0.477348  1.170614  0.752714
1   1.217250 -0.930860 -0.769324 -0.408642
2  -0.619679 -1.227659 -0.259135  1.700294
4  -0.529529  0.000571  0.913779  1.395894
6   0.757178  0.240012 -0.553820  1.177202
7  -0.986128 -1.313843  0.788589 -0.707836
9  -0.692013  0.671866  1.179466 -1.180351
11 -0.143273 -0.503199 -1.328728  0.610552
12 -0.923110 -1.365890 -1.366202 -1.185999
13 -2.026832  0.273593 -0.440426 -0.627423
14 -0.054503 -0.788866 -0.228088 -0.404783
15  0.955298 -1.430019  1.434071 -0.088215
16 -0.227946  0.047462  0.373573 -0.111675
17  1.627912  0.043611  1.743403 -0.012714
19  0.104425  0.037412  0.450598 -0.923387

df.loc[:, 'NewCol'] = 0

df
           A         B         C         D  NewCol
0  -0.761269  0.477348  1.170614  0.752714       0
1   1.217250 -0.930860 -0.769324 -0.408642       0
2  -0.619679 -1.227659 -0.259135  1.700294       0
4  -0.529529  0.000571  0.913779  1.395894       0
6   0.757178  0.240012 -0.553820  1.177202       0
7  -0.986128 -1.313843  0.788589 -0.707836       0
9  -0.692013  0.671866  1.179466 -1.180351       0
11 -0.143273 -0.503199 -1.328728  0.610552       0
12 -0.923110 -1.365890 -1.366202 -1.185999       0
13 -2.026832  0.273593 -0.440426 -0.627423       0
14 -0.054503 -0.788866 -0.228088 -0.404783       0
15  0.955298 -1.430019  1.434071 -0.088215       0
16 -0.227946  0.047462  0.373573 -0.111675       0
17  1.627912  0.043611  1.743403 -0.012714       0
19  0.104425  0.037412  0.450598 -0.923387       0

其他回答

在最近的Pandas版本中,似乎应该使用df.assign:

df1 = df1.assign(e=np.random)

它不会产生SettingWithCopyWarning。

如果数据帧和Series对象具有相同的索引,则为pandas。Concat也在这里工作:

import pandas as pd
df
#          a            b           c           d
#0  0.671399     0.101208   -0.181532    0.241273
#1  0.446172    -0.243316    0.051767    1.577318
#2  0.614758     0.075793   -0.451460   -0.012493

e = pd.Series([-0.335485, -1.166658, -0.385571])    
e
#0   -0.335485
#1   -1.166658
#2   -0.385571
#dtype: float64

# here we need to give the series object a name which converts to the new  column name 
# in the result
df = pd.concat([df, e.rename("e")], axis=1)
df

#          a            b           c           d           e
#0  0.671399     0.101208   -0.181532    0.241273   -0.335485
#1  0.446172    -0.243316    0.051767    1.577318   -1.166658
#2  0.614758     0.075793   -0.451460   -0.012493   -0.385571

以防它们没有相同的索引:

e.index = df.index
df = pd.concat([df, e.rename("e")], axis=1)

为了完整起见-使用DataFrame.eval()方法的另一个解决方案:

数据:

In [44]: e
Out[44]:
0    1.225506
1   -1.033944
2   -0.498953
3   -0.373332
4    0.615030
5   -0.622436
dtype: float64

In [45]: df1
Out[45]:
          a         b         c         d
0 -0.634222 -0.103264  0.745069  0.801288
4  0.782387 -0.090279  0.757662 -0.602408
5 -0.117456  2.124496  1.057301  0.765466
7  0.767532  0.104304 -0.586850  1.051297
8 -0.103272  0.958334  1.163092  1.182315
9 -0.616254  0.296678 -0.112027  0.679112

解决方案:

In [46]: df1.eval("e = @e.values", inplace=True)

In [47]: df1
Out[47]:
          a         b         c         d         e
0 -0.634222 -0.103264  0.745069  0.801288  1.225506
4  0.782387 -0.090279  0.757662 -0.602408 -1.033944
5 -0.117456  2.124496  1.057301  0.765466 -0.498953
7  0.767532  0.104304 -0.586850  1.051297 -0.373332
8 -0.103272  0.958334  1.163092  1.182315  0.615030
9 -0.616254  0.296678 -0.112027  0.679112 -0.622436

当您将Series对象作为新列添加到现有DF时,您需要确保它们都具有相同的索引。 然后添加到DF中

e_series = pd.Series([-0.335485, -1.166658,-0.385571])
print(e_series)
e_series.index = d_f.index
d_f['e'] = e_series
d_f

在分配新列之前,如果已经索引了数据,则需要对索引进行排序。至少在我的情况下,我必须:

data.set_index(['index_column'], inplace=True)
"if index is unsorted, assignment of a new column will fail"        
data.sort_index(inplace = True)
data.loc['index_value1', 'column_y'] = np.random.randn(data.loc['index_value1', 'column_x'].shape[0])