What does axis in pandas mean?

Here is my code to generate a dataframe:

import pandas as pd
import numpy as np

dff = pd.DataFrame(np.random.randn(1,2),columns=list('AB'))

And I got the dataframe:

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|
+------------+---------+--------+

When I type the command:

dff.mean(axis=1)

I got:

0    1.074821
dtype: float64

According to the pandas reference, axis=1 stands for columns, so I expected the result of the command to be

A    0.626386
B    1.523255
dtype: float64

So here is my question: what does axis in pandas mean?

Current answer

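For the past hour I have also been trying to figure out axis. The language in all the answers above, and in the documentation, was not helpful at all.

To answer the question as I understand it now: in pandas, axis = 1 or 0 means which axis headers you want to keep constant while applying the function.

Note: when I say headers, I mean index names.

Expanding your example: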

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      X     | 0.626386| 1.52325|
+------------+---------+--------+
|      Y     | 0.626386| 1.52325|
+------------+---------+--------+

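For axis=1=columns: we keep the column headers constant and apply the mean function by changing the data. To demonstrate, we keep the column headers constant as: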

+------------+---------+--------+
|            |  A      |  B     |

Now we populate one set of A and B values and then find the mean

|            | 0.626386| 1.52325|

Then we populate the next set of A and B values and find the mean

|            | 0.626386| 1.52325|

Similarly, for axis=rows, we keep the row headers constant and keep changing the data. To demonstrate, first fix the row headers:

+------------+
|      X     |
+------------+
|      Y     |
+------------+

Now populate the first set of X and Y values and then find the mean

+------------+---------+
|      X     | 0.626386|
+------------+---------+
|      Y     | 0.626386|
+------------+---------+

Then populate the next set of X and Y values and find the mean:

+------------+---------+
|      X     | 1.52325 |
+------------+---------+
|      Y     | 1.52325 |
+------------+---------+

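In summary,

When axis=columns, you fix the column headers and change the data, which will come from the different rows.

When axis=rows, you fix the row headers and change the data, which will come from the different columns.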
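
The summary above can be checked with a short sketch. It assumes the two-row X/Y table from this answer; the variable names `row_means` and `col_means` are only illustrative.

```python
import pandas as pd

# Same values as the X/Y table above.
df = pd.DataFrame(
    {"A": [0.626386, 0.626386], "B": [1.52325, 1.52325]},
    index=["X", "Y"],
)

# axis=1 (or "columns"): the column headers A and B stay fixed,
# and one mean is computed for each row (X, then Y).
row_means = df.mean(axis=1)

# axis=0 (or "index"): the row headers X and Y stay fixed,
# and one mean is computed for each column (A, then B).
col_means = df.mean(axis=0)

print(row_means)  # indexed by X, Y
print(col_means)  # indexed by A, B
```

The result of axis=1 is labelled by the row index and the result of axis=0 by the column names, which is why the question's `dff.mean(axis=1)` returned a single value labelled 0.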

2020-04-07 15:51:32

Other answers

Let's visualize (you're going to remember this always),

In pandas:

axis=0 means along "indexes". It's a row-wise operation.

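Suppose that, to perform a concat() operation on dataframe1 and dataframe2, we take the 1st row out of dataframe1 and place it into the new DF, then take another row out of dataframe1 and put it into the new DF, and repeat this process until we reach the bottom of dataframe1. Then we do the same for dataframe2.

Basically, stacking dataframe2 on top of dataframe1 or vice versa.

E.g. making a pile of books on a table or floor.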

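axis=1 means along "columns". It's a column-wise operation.

Suppose that, to perform a concat() operation on dataframe1 and dataframe2, we take out the 1st complete column (a.k.a. the 1st series) of dataframe1 and place it into the new DF, then take out the second column of dataframe1 and keep it adjacent (sideways), and repeat this operation until all columns are finished. Then we repeat the same process on dataframe2. Basically, stacking dataframe2 sideways.

E.g. arranging books on a bookshelf.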
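
A minimal sketch of the two concat() cases described above, using two small made-up frames (`df1` and `df2` stand in for dataframe1 and dataframe2):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# axis=0: rows of df2 are placed below the rows of df1 (a pile of books).
stacked = pd.concat([df1, df2], axis=0)

# axis=1: columns of df2 are placed to the right of df1's columns (a bookshelf).
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

Note that with axis=0 the original row labels are kept, so the stacked frame has the index 0, 1, 0, 1 unless `ignore_index=True` is passed.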

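More to it: arrays are a better representation of a nested n-dimensional structure than matrices, which helps in seeing how axis plays an important role once you generalize to more than one dimension. Also, you can actually print, write, draw, or visualize any n-dimensional array, whereas writing or visualizing the same thing in a matrix representation is impossible on paper beyond 3 dimensions.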

2018-04-17 18:02:16

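Many answers here helped me a lot!

If you are confused by the different behaviours of axis in Python and MARGIN in R (as in the apply function), you may find a blog post I wrote of interest: https://accio.github.io/programming/2020/05/19/numpy-pandas-axis.html.

In essence: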

Their behaviours are, intriguingly, easier to understand with a three-dimensional array than with two-dimensional arrays. In the Python packages numpy and pandas, the axis parameter in mean actually tells numpy to calculate the mean of all values that can be fetched in the form of array[0, 0, ..., i, ..., 0], where i iterates through all possible values. The process is repeated with the position of i fixed while the indices of the other dimensions vary one after the other (starting from the rightmost one). The result is an (n-1)-dimensional array. In R, the MARGIN parameter lets the apply function calculate the mean of all values that can be fetched in the form of array[, ... , i, ... ,], where i iterates through all possible values. The process is not repeated when all i values have been iterated. Therefore, the result is a simple vector.
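
The numpy half of that description can be illustrated with a small three-dimensional array. This is only a sketch; the shape (2, 3, 4) and the names `m0`, `m1`, `m2` are arbitrary choices.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)  # a 3-dimensional array

# Taking the mean along one axis removes that axis from the shape.
m0 = arr.mean(axis=0)
m1 = arr.mean(axis=1)
m2 = arr.mean(axis=2)

print(m0.shape)  # (3, 4)
print(m1.shape)  # (2, 4)
print(m2.shape)  # (2, 3)

# Each entry of m1 averages arr[j, i, k] over i, with j and k held fixed.
print(m1[0, 0], arr[0, :, 0].mean())
```

Each result has one dimension fewer than the input, matching the "n-1-dimensional array" remark above.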

2020-05-19 14:27:06

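I think the correct answer should be "it is complicated".

The word "axis" itself conjures up different images in different people's minds. Take the y-axis: it should conjure up an image of something vertical. However, now think about the vertical line x=0. It is a vertical line too, yet it is addressed by the value 0 on the x-axis.

Similarly, when we say axis='index' (meaning axis=0), are we talking about the "vertical" direction along which the indexes reside? Or about the one series of data addressed by an index value? Pandas tends to mean the first, the vertical direction.

Pandas itself is not 100% consistent either. Observe the following cases, which share almost the same theme: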

# [1] piling dfs
pd.concat([df0, df1], axis='index')
    # adding dfs on top of each other vertically like piling up a column,
    # but, we will use the word 'index'

# [2] for every column in df: operate on it
df.apply(foo, axis='index')
df.mean(axis='index')
a_boolean_df.all(axis='index')
    # apply an operation to a vertical slice of data, i.e. a column,
    # then apply the same operation to the next column on the right
    # then to the right again... until the last column
    # but, we will use the word 'index'

# [3] delete a column or row of data
df.drop(axis='index', ...)
df.dropna(axis='index', ...)
    # this time, we are dropping an index/row, a horizontal slice of data.
    # so OBVIOUSLY we will use the word 'index'

# [4] drop duplicate
df.drop_duplicates(subset=['mycolumn0', 'mycolumn1']...)
    # thank God we don't need to deal with the "axis" bs in this
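
As a runnable sketch of cases [2] and [3] above, on a small made-up frame (the names `means` and `dropped` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]}, index=["X", "Y"])

# Reduction: axis="index" collapses the index, so one value per COLUMN remains.
means = df.mean(axis="index")

# Drop: axis="index" says the label is looked up in the index, so a ROW is removed.
dropped = df.drop("X", axis="index")

print(means)    # indexed by A, B
print(dropped)  # only row Y is left
```

The same keyword value therefore yields a per-column result in one call and removes a row in the other, which is the inconsistency this answer points at.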

2021-10-22 15:14:41

The easiest way for me to understand is to talk about whether you are calculating a statistic for each column (axis = 0) or each row (axis = 1). If you calculate a statistic, say a mean, with axis = 0 you will get that statistic for each column. So if each observation is a row and each variable is in a column, you would get the mean of each variable. If you set axis = 1 then you will calculate your statistic for each row. In our example, you would get the mean for each observation across all of your variables (perhaps you want the average of related measures).

Axis = 0: by column = column-wise = along the rows

Axis = 1: by row = row-wise = along the columns
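
The same rule holds for statistics other than the mean. Below is a minimal sketch using apply with a hypothetical helper `value_range`; the column names and values are made up for illustration.

```python
import pandas as pd

# Each row is an observation, each column a variable.
df = pd.DataFrame({"height": [1.0, 3.0, 5.0], "weight": [10.0, 20.0, 60.0]})

def value_range(s):
    """Spread between the largest and smallest value of a Series."""
    return s.max() - s.min()

# axis=0: the function receives each column, giving one statistic per variable.
per_column = df.apply(value_range, axis=0)

# axis=1: the function receives each row, giving one statistic per observation.
per_row = df.apply(value_range, axis=1)

print(per_column)  # indexed by height, weight
print(per_row)     # indexed by 0, 1, 2
```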

2017-02-16 19:34:33

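Arrays are designed with so-called axis=0, where rows are positioned vertically, and axis=1, where columns are positioned horizontally. Axis refers to the dimension of the array.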

2018-03-17 20:19:48
