I need to fit a RandomForestRegressor from sklearn.ensemble.

forest = ensemble.RandomForestRegressor(**RF_tuned_parameters)
model = forest.fit(train_fold, train_y)
yhat = model.predict(test_fold)

This code worked fine until I did some preprocessing of the data (train_y). The error message says:

DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  model = forest.fit(train_fold, train_y)

Previously train_y was a Series, now it is a numpy array (a column vector). If I apply train_y.ravel(), it becomes a row vector and no error message appears, though the prediction step then takes a very long time (actually it never finishes...).

In the docs of RandomForestRegressor I found that train_y should be defined as y : array-like, shape = [n_samples] or [n_samples, n_outputs]. Any idea how to solve this issue?
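
For context, here is a minimal sketch of the shape difference behind the warning (the data below is made up purely for illustration):

import numpy as np
import pandas as pd

y_series = pd.Series([1.0, 2.0, 3.0])          # shape (3,): a 1d Series, no warning
y_column = y_series.values.reshape(-1, 1)      # shape (3, 1): a column vector, triggers the warning
print(y_series.shape, y_column.shape, y_column.ravel().shape)   # (3,) (3, 1) (3,)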


Change this line:

model = forest.fit(train_fold, train_y)

to:

model = forest.fit(train_fold, train_y.values.ravel())

Explanation:

.values will give the values in a numpy array (shape: (n,1))

.ravel will convert that array shape to (n, ) (i.e. flatten it)
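
You can verify those shapes yourself with a toy single-column target (the column name here is just an example):

import pandas as pd

train_y = pd.DataFrame({"target": [1.5, 2.0, 3.2]})
print(train_y.values.shape)           # (3, 1)
print(train_y.values.ravel().shape)   # (3,)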


Use the code below:

model = forest.fit(train_fold, train_y.ravel())

If you are still getting slapped by an error like the one below:

Unknown label type: %r" % y

use the code below:

import numpy as np

y = train_y.ravel()
train_y = np.array(y).astype(int)   # cast the flattened labels to int so they are treated as discrete labels
model = forest.fit(train_fold, train_y)
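
For reference, the "Unknown label type" error normally comes from classifiers (not regressors) when y does not look like discrete labels; here is a made-up sketch of the flatten-and-cast pattern (the data and estimator are assumptions, not taken from the question):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(10, 3)
y = np.array([[1.0], [2.0]] * 5)                 # (10, 1) column vector of float-typed labels

clf = RandomForestClassifier(n_estimators=10)
clf.fit(X, np.array(y.ravel()).astype(int))      # flatten to (10,) and cast to int
print(clf.predict(X[:2]))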

I also ran into this when trying to train a KNN classifier, but the warning seems to be gone after I changed

knn.fit(X_train, y_train)

to

knn.fit(X_train, np.ravel(y_train, order='C'))

Ahead of this line I used import numpy as np.
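
A self-contained version of that fix, with made-up data (the dataset size and n_neighbors are arbitrary):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.random.rand(20, 4)
y_train = np.random.randint(0, 2, size=(20, 1))   # (20, 1) column vector of labels

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, np.ravel(y_train, order='C'))    # ravel to shape (20,), which silences the warning
print(knn.predict(X_train[:5]))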


Another way of doing this is to use reshape(-1,):

model = forest.fit(train_fold, train_y.values.reshape(-1,))

I had the same problem. The issue was that the labels were in a column format, while it expected them in a row. Use np.ravel():

knn.score(training_set, np.ravel(training_labels))

Hope this solves it.


With neuraxle, you can easily solve this:

p = Pipeline([
   # expected outputs shape: (n, 1)
   OutputTransformerWrapper(NumpyRavel()), 
   # expected outputs shape: (n, )
   RandomForestRegressor(**RF_tuned_parameters)
])

p, outputs = p.fit_transform(data_inputs, expected_outputs)

Neuraxle is a sklearn-like framework for hyperparameter tuning and AutoML in deep learning projects!


format_train_y = []
for n in train_y:
    format_train_y.append(n[0])

Or, equivalently:

Y = y.values[:, 0]

where Y is the formatted train_y and y is the original train_y.

TL;DR: use

y = np.squeeze(y)

instead of

y = y.ravel()

While ravel() may be a valid way to achieve the desired result in this particular case, I would recommend using numpy.squeeze() instead. The problem is that if the shape of your y (numpy array) is e.g. (100, 2), then y.ravel() will concatenate the two columns along the first axis, resulting in a shape of (200,). This might not be what you want when dealing with independent variables that have to be regarded on their own. numpy.squeeze(), on the other hand, will just trim any redundant dimensions (i.e. those of size 1). So if your numpy array's shape is (100, 1), the result will be an array of shape (100,), whereas a numpy array of shape (100, 2) will be left unchanged, as none of its dimensions has size 1.
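
A quick demonstration of that difference (the array contents are arbitrary):

import numpy as np

a = np.zeros((100, 1))
b = np.zeros((100, 2))

print(np.ravel(a).shape, np.squeeze(a).shape)   # (100,) (100,)   -> same result for a column vector
print(np.ravel(b).shape, np.squeeze(b).shape)   # (200,) (100, 2) -> ravel flattens, squeeze leaves it alone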