我有以下索引DataFrame命名列和行不连续的数字:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

我想添加一个新列,'e',到现有的数据帧,并不想改变数据帧中的任何东西(即,新列始终具有与DataFrame相同的长度)。

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

如何将列e添加到上面的例子中?


当前回答

这是添加新列的简单方法:df['e'] = e

其他回答

我得到了可怕的SettingWithCopyWarning,它没有通过使用iloc语法修复。我的DataFrame是由read_sql从ODBC源创建的。根据上面low - tech的建议,以下方法对我来说是有效的:

df.insert(len(df.columns), 'e', pd.Series(np.random.randn(sLength),  index=df.index))

This worked fine to insert the column at the end. I don't know if it is the most efficient, but I don't like warning messages. I think there is a better solution, but I can't find it, and I think it depends on some aspect of the index. Note. That this only works once and will give an error message if trying to overwrite and existing column. Note As above and from 0.16.0 assign is the best solution. See documentation http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html#pandas.DataFrame.assign Works well for data flow type where you don't overwrite your intermediate values.

x=pd.DataFrame([1,2,3,4,5])

y=pd.DataFrame([5,4,3,2,1])

z=pd.concat([x,y],axis=1)

在分配新列之前,如果已经索引了数据,则需要对索引进行排序。至少在我的情况下,我必须:

data.set_index(['index_column'], inplace=True)
"if index is unsorted, assignment of a new column will fail"        
data.sort_index(inplace = True)
data.loc['index_value1', 'column_y'] = np.random.randn(data.loc['index_value1', 'column_x'].shape[0])

我正在寻找一种添加numpy列的通用方法。nans到一个数据帧而不得到愚蠢的SettingWithCopyWarning。

从以下方面:

答案在这里 关于将变量作为关键字参数传递的问题 此方法用于生成一个numpy数组的NaNs

我想到了这个:

col = 'column_name'
df = df.assign(**{col:numpy.full(len(df), numpy.nan)})

向现有数据帧中添加一个新列'e'

 df1.loc[:,'e'] = Series(np.random.randn(sLength))