I have the following indexed DataFrame, with named columns and row labels that are not consecutive numbers:
a b c d
2 0.671399 0.101208 -0.181532 0.241273
3 0.446172 -0.243316 0.051767 1.577318
5 0.614758 0.075793 -0.451460 -0.012493
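I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it (i.e., the new column always has the same length as the DataFrame).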
0 -0.335485
1 -1.166658
2 -0.385571
dtype: float64
How can I add column e to the above example?
If the DataFrame and the Series object have the same index, pandas.concat also works here:
import pandas as pd
df
# a b c d
#0 0.671399 0.101208 -0.181532 0.241273
#1 0.446172 -0.243316 0.051767 1.577318
#2 0.614758 0.075793 -0.451460 -0.012493
e = pd.Series([-0.335485, -1.166658, -0.385571])
e
#0 -0.335485
#1 -1.166658
#2 -0.385571
#dtype: float64
# here we need to give the series object a name which converts to the new column name
# in the result
df = pd.concat([df, e.rename("e")], axis=1)
df
# a b c d e
#0 0.671399 0.101208 -0.181532 0.241273 -0.335485
#1 0.446172 -0.243316 0.051767 1.577318 -1.166658
#2 0.614758 0.075793 -0.451460 -0.012493 -0.385571
In case they don't have the same index:
e.index = df.index
df = pd.concat([df, e.rename("e")], axis=1)
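To see why the index matters here, the following is a minimal sketch using the question's numbers (only column a is kept, to stay short). The names aligned, outer and positional are just illustrative. pandas aligns on index labels, so a Series with a default 0..2 index only partly matches row labels 2, 3, 5:

```python
import pandas as pd

# Frame with non-consecutive row labels, as in the question.
df = pd.DataFrame({"a": [0.671399, 0.446172, 0.614758]}, index=[2, 3, 5])
# The new values carry a default RangeIndex (0, 1, 2).
e = pd.Series([-0.335485, -1.166658, -0.385571])

# Column assignment aligns on labels: only label 2 exists on both sides,
# so rows 3 and 5 end up as NaN.
aligned = df.copy()
aligned["e"] = e

# concat along axis=1 uses the union of both indexes, so rows are added.
outer = pd.concat([df, e.rename("e")], axis=1)

# Dropping the Series' own index makes the assignment purely positional.
positional = df.copy()
positional["e"] = e.to_numpy()

print(aligned)
print(outer)
print(positional)
```

Overwriting e.index with df.index, as above, has the same positional effect as passing e.to_numpy().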
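One thing to note, though: if you do
df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
this is effectively a left join on df1.index. So if you want an outer join effect, my (probably imperfect) solution is to create a DataFrame whose index covers every index value in your data, and then use the code above. For example,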
data = pd.DataFrame(index=all_possible_values)
df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
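Here is a small sketch of the left-join versus outer-join difference. The data and the names left, outer and all_labels are made up for illustration, and reindexing to the union of both indexes is just one way to get the "index covering all values" described above:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[2, 3, 5])
e = pd.Series([10.0, 20.0, 30.0], index=[3, 5, 7], name="e")

# Plain assignment behaves like a left join on df1.index:
# label 7 is silently dropped and label 2 gets NaN.
left = df1.copy()
left["e"] = e

# For an outer-join effect, first widen the frame to every label
# that appears on either side, then assign.
all_labels = df1.index.union(e.index)
outer = df1.reindex(all_labels)
outer["e"] = e

print(left)
print(outer)
```

df1.join(e, how="outer") should give the same rows in one call, if you don't need the intermediate frame.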
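This is a special case of adding a new column to a pandas DataFrame. Here, I am adding a new feature/column based on the data in an existing column of the DataFrame.
So, let our DataFrame have the columns 'feature_1', 'feature_2', 'probability_score', and we have to add a new column 'predicted_class' based on the data in the 'probability_score' column.
I will use the map() method of the pandas Series and define a function of my own that implements the logic for assigning a particular class_label to each row of the DataFrame.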
import pandas as pd

data = pd.read_csv('data.csv')

def myFunction(x):
    # implement your logic here; the 0.5 threshold and the 1/0 labels
    # are placeholders, not part of the original answer
    if x >= 0.5:
        return 1
    return 0

variable_1 = data['probability_score']
predicted_class = variable_1.map(myFunction)
data['predicted_class'] = predicted_class
# check the DataFrame: the new column is derived from an existing column, row by row
data.head()
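If you don't have a data.csv handy, a self-contained version of the same idea might look like this. The sample values, the to_class helper, the 0.5 threshold and the "positive"/"negative" labels are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Small in-memory stand-in for data.csv (values are invented).
data = pd.DataFrame({
    "feature_1": [1, 2, 3],
    "feature_2": [4, 5, 6],
    "probability_score": [0.2, 0.5, 0.9],
})

def to_class(score, threshold=0.5):
    """Hypothetical rule: scores at or above the threshold are 'positive'."""
    return "positive" if score >= threshold else "negative"

# Series.map calls to_class once per element and returns a new Series
# with the same index, so it lines up with the frame's rows.
data["predicted_class"] = data["probability_score"].map(to_class)

# For a simple two-way rule, a vectorized np.where gives the same labels.
data["predicted_class_vec"] = np.where(
    data["probability_score"] >= 0.5, "positive", "negative"
)

print(data)
```

map is convenient when the rule is arbitrary Python; for a plain threshold the np.where form usually scales better on large frames.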
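I got the dreaded SettingWithCopyWarning, and it wasn't fixed by using the iloc syntax. My DataFrame was created by read_sql from an ODBC source. Following lowtech's suggestion above, the following worked for me: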
df.insert(len(df.columns), 'e', pd.Series(np.random.randn(sLength), index=df.index))
This worked fine to insert the column at the end. I don't know if it is the most efficient, but I don't like warning messages. I think there is a better solution, but I can't find it, and I think it depends on some aspect of the index.
Note that this only works once and will raise an error if you try to overwrite an existing column.
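A quick sketch of that "only works once" behaviour, with toy data (the second_insert_failed flag is just there to make the outcome visible):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=[2, 3, 5])

# First insert at the end of the frame succeeds.
df.insert(len(df.columns), "e", [10, 20, 30])

# A second insert under the same name raises ValueError,
# because the column already exists.
try:
    df.insert(len(df.columns), "e", [7, 8, 9])
    second_insert_failed = False
except ValueError:
    second_insert_failed = True

# Plain assignment, by contrast, overwrites the existing column.
df["e"] = [7, 8, 9]

print(second_insert_failed)
print(df)
```

insert also takes allow_duplicates=True, but that leaves you with two columns named 'e', which is rarely what you want.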
Note: as mentioned above, from 0.16.0 onward assign is the best solution. See the documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html#pandas.DataFrame.assign
It works well for data-flow style code where you don't overwrite your intermediate values.
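For completeness, a minimal sketch of assign with the question's numbers (only column a kept; e_abs is a made-up extra column to show chaining). assign returns a new DataFrame and leaves the original alone:

```python
import pandas as pd

df = pd.DataFrame({"a": [0.671399, 0.446172, 0.614758]}, index=[2, 3, 5])
e_values = [-0.335485, -1.166658, -0.385571]

# A plain list is assigned by position, so the non-consecutive
# row labels 2, 3, 5 are not a problem.
result = df.assign(e=e_values)

# Calls can be chained; a callable receives the frame built so far.
chained = df.assign(e=e_values).assign(e_abs=lambda d: d["e"].abs())

print(df.columns.tolist())  # the original frame is unchanged
print(result)
print(chained)
```

If you pass a Series instead of a list, it is aligned on the index like any other column assignment, so the caveat about matching indexes still applies.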