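Difference between the map, applymap and apply methods in Pandas

Can you tell me when to use these vectorization methods, with basic examples?

I see that map is a Series method, whereas the rest are DataFrame methods. I got confused about the apply and applymap methods, though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples that illustrate the usage would be great!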

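Current answer

My understanding:

From the function's point of view:

If the function has variables that need to be compared within a column/row, use apply.

For example: lambda x: x.max()-x.mean().

If the function is to be applied to each element:

1> If a single column/row has been selected, use apply

2> If it applies to the entire DataFrame, use applymap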

majority = lambda x : x > 17
df2['legal_drinker'] = df2['age'].apply(majority)

def times10(x):
  if type(x) is int:
    x *= 10 
  return x
df2.applymap(times10)
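
For anyone who wants to run the two calls above, here is a minimal sketch. The contents of df2 are invented, since the answer does not show them. elementwise is a hypothetical helper name, used because newer pandas versions spell the element-wise DataFrame method map rather than applymap. times10 is written a little more defensively than above so it does not depend on which integer type pandas passes in.

```python
import numbers

import pandas as pd

# Hypothetical sample data; the answer above does not show what df2 holds.
df2 = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [16, 18, 30]})

# One column selected: Series.apply calls the function once per value.
majority = lambda x: x > 17
df2["legal_drinker"] = df2["age"].apply(majority)


def times10(x):
    # Scale integers only; strings and booleans pass through unchanged.
    if isinstance(x, numbers.Integral) and not isinstance(x, bool):
        x *= 10
    return x


def elementwise(frame, func):
    # Hypothetical helper: use DataFrame.map where it exists, else applymap.
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)


# Whole frame: the function is called once per cell.
scaled = elementwise(df2, times10)
print(scaled)
```

Only the age column changes, because it is the only one holding integers.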

2018-06-08 01:29:33

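Other answers

The difference between apply and applymap is probably simplest to explain like this:

apply takes the whole column as a parameter and then assigns the result to that column.

applymap takes each individual cell value as a parameter and assigns the result back to that cell.

Note: if apply returns a single value, you will get that value instead of a column after assigning, so you will end up with just a row instead of a matrix.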
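
A small sketch of that note, with made-up data: the shape of what DataFrame.apply returns depends on what the function returns for each column.

```python
import pandas as pd

# Illustrative data, not taken from the thread.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# One value per column: the result collapses to a Series indexed by column name.
reduced = df.apply(lambda col: col.max() - col.min())

# A column of the same length per column: the result keeps the DataFrame shape.
shifted = df.apply(lambda col: col - col.min())

print(type(reduced).__name__, reduced.to_dict())
print(shifted)
```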

2016-05-20 02:10:04

I just wanted to point out one thing, since I struggled with this for a bit:

def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)
df.describe()

This does not modify the DataFrame itself; the result has to be reassigned:

df = df.applymap(f)
df.describe()
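
To make this concrete, here is a sketch with invented numbers. It checks that the original frame is left alone, and it also shows DataFrame.clip, a built-in that I believe does the same bounding without any element-wise function. elementwise is a hypothetical helper that falls back to applymap on pandas versions without DataFrame.map.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"x": [-5, 50, 250000], "y": [0, 100000, 100001]})


def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x


def elementwise(frame, func):
    # Hypothetical helper: use DataFrame.map where it exists, else applymap.
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)


before = df.copy()
result = elementwise(df, f)                 # a new DataFrame is returned
unchanged = df.equals(before)               # df itself was not modified

# Built-in alternative for this particular function.
clipped = df.clip(lower=0, upper=100000)

print(unchanged)
print(result)
```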

2015-09-26 01:30:24

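apply works on a row/column basis of a DataFrame
applymap works element-wise on a DataFrame
map works element-wise on a Series

Straight from Wes McKinney's Python for Data Analysis book, p. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's apply method does exactly this: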

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]: 
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
Out[119]: 
b    1.133201
d    1.965980
e    2.829781
dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
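
As a runnable sketch of that point (random data with a fixed seed, so the numbers differ from the book's): the range function can go through apply along either axis, and the built-in methods give the same answer directly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.standard_normal((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

f = lambda x: x.max() - x.min()

per_column = frame.apply(f)           # default axis=0: f receives each column
per_row = frame.apply(f, axis=1)      # axis=1: f receives each row

# The same statistic from built-in DataFrame methods, without apply.
builtin_column = frame.max() - frame.min()
builtin_row = frame.max(axis=1) - frame.min(axis=1)

print(per_column)
print(per_row)
```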

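Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating-point value in frame. You can do this with applymap: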

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]: 
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object
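
A note for readers on current pandas: as far as I know, pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map, which does the same element-wise job. The sketch below reuses two rows of the book's numbers and picks whichever method the installed version offers; fmt is just a renamed copy of the format lambda so it does not shadow the built-in.

```python
import pandas as pd

# Two rows copied from the output above.
frame = pd.DataFrame(
    {"b": [-0.029638, 0.647747], "e": [1.280300, -1.549481]},
    index=["Utah", "Ohio"],
)

fmt = lambda x: "%.2f" % x

# Element-wise on the whole DataFrame.
if hasattr(pd.DataFrame, "map"):
    formatted = frame.map(fmt)        # newer pandas
else:
    formatted = frame.applymap(fmt)   # older pandas

# Element-wise on a single column (a Series).
col = frame["e"].map(fmt)

print(formatted)
print(col)
```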

2013-11-05 20:40:33

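To add context and intuition, here is an explicit and concrete example of the difference.

Assume you have the function shown below. (This label function arbitrarily splits values into 'High' and 'Low', based on the threshold you pass as the parameter x.)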

def label(element, x):
    if element > x:
        return 'High'
    else:
        return 'Low'

In this example, let's assume our DataFrame has one column of random numbers.

If you try to map the label function with map:

df['ColumnName'].map(label, x = 0.8)

You will get the following error:

TypeError: map() got an unexpected keyword argument 'x'

Now take the same function and use apply, and you'll see that it works:

df['ColumnName'].apply(label, x=0.8)

Series.apply() can pass additional arguments to an element-wise function, whereas the Series.map() method returns an error.
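
If you do want to stay with map, one option is to bind the threshold before handing the function over. This is a sketch with invented values; it should behave the same whether or not your pandas version lets map forward keyword arguments.

```python
from functools import partial

import pandas as pd


def label(element, x):
    if element > x:
        return "High"
    else:
        return "Low"


# Illustrative values standing in for the random column.
s = pd.Series([0.1, 0.85, 0.5, 0.95], name="ColumnName")

via_apply = s.apply(label, x=0.8)                # keyword is forwarded to label
via_partial = s.map(partial(label, x=0.8))       # x bound ahead of time
via_lambda = s.map(lambda v: label(v, x=0.8))    # same idea with a lambda

print(via_apply.tolist())
```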

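Now, if you're trying to apply the same function to several columns in your DataFrame simultaneously, DataFrame.applymap() is used. Since label also needs the threshold x, it is bound here with a lambda:

df[['ColumnName','ColumnName2','ColumnName3','ColumnName4']].applymap(lambda element: label(element, x=0.8))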

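Lastly, you can also use the apply() method on a DataFrame, but the DataFrame.apply() method has different capabilities. Instead of applying functions element-wise, the df.apply() method applies functions along an axis, either column-wise or row-wise. When we create a function to use with df.apply(), we set it up to accept a Series, most commonly a column.

Here is an example: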

df.apply(pd.value_counts)

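When we applied the pd.value_counts function to the DataFrame, it calculated the value counts for all the columns.

Notice, and this is very important, that we used the df.apply() method to transform multiple columns. This is only possible because the pd.value_counts function operates on a Series. If we try to use the df.apply() method to apply a function that works element-wise to multiple columns, we get an error:

For example: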

def label(element):
    if element > 1:
        return 'High'
    else:
        return 'Low'

df[['ColumnName','ColumnName2','ColumnName3','ColumnName4']].apply(label)

This results in the following error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index Economy')
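
Here is a small reproduction with invented data, plus two ways around the error: an element-wise DataFrame method, or apply per column with map inside. The hasattr check is only there because newer pandas versions name the element-wise method DataFrame.map instead of applymap.

```python
import pandas as pd


def label(element):
    if element > 1:
        return "High"
    else:
        return "Low"


# Illustrative data; column names follow the example above.
df = pd.DataFrame({"ColumnName": [0.5, 1.5], "ColumnName2": [2.0, 0.1]})

try:
    # label receives a whole column, so `element > 1` is a boolean Series
    # and the `if` cannot decide on a single truth value.
    df.apply(label)
    error = None
except ValueError as exc:
    error = exc

# Fix 1: call label once per cell.
if hasattr(pd.DataFrame, "map"):
    labeled = df.map(label)
else:
    labeled = df.applymap(label)

# Fix 2: apply per column, then map per element inside each column.
nested = df.apply(lambda col: col.map(label))

print(error)
print(labeled)
```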

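In general, you should only use the apply() method when a vectorized function does not exist. Recall that pandas uses vectorization, the process of applying an operation to a whole Series at once, to optimize performance. When we use the apply() method, we are actually looping through rows, so a vectorized method can perform an equivalent task faster than the apply() method.

Here are some examples of vectorized functions that already exist and that you do not want to recreate with any kind of apply/map method: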

Series.str.split()    Splits each element in the Series.
Series.str.strip()    Strips whitespace from each string in the Series.
Series.str.lower()    Converts strings in the Series to lowercase.
Series.str.upper()    Converts strings in the Series to uppercase.
Series.str.get()      Retrieves the ith element of each element in the Series.
Series.str.replace()  Replaces a regex or string in the Series with another string.
Series.str.cat()      Concatenates strings in a Series.
Series.str.extract()  Extracts substrings from the Series matching a regex pattern.
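
A quick sketch of a few of these accessors next to the element-wise version they replace. The names in the Series are invented, and no timings are claimed here.

```python
import pandas as pd

# Illustrative strings with stray whitespace and mixed case.
s = pd.Series(["  Alice Smith ", "bob JONES", "  Carol  Lee"])

# Built-in string accessors: one call handles the whole Series.
cleaned = s.str.strip().str.lower()

# The same result through an element-wise Python function, for comparison only.
looped = s.map(lambda v: v.strip().lower())

# Accessors chain: split on whitespace, then take the first piece of each.
first_names = s.str.strip().str.split().str.get(0)

print(cleaned.tolist())
print(first_names.tolist())
```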

2020-12-23 19:26:00

Based on cs95's answer:

map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH

A few examples:

In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [4]: frame
Out[4]:
            b         d         e
Utah    0.129885 -0.475957 -0.207679
Ohio   -2.978331 -1.015918  0.784675
Texas  -0.256689 -0.226366  2.262588
Oregon  2.605526  1.139105 -0.927518

In [5]: myformat=lambda x: f'{x:.2f}'

In [6]: frame.d.map(myformat)
Out[6]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [7]: frame.d.apply(myformat)
Out[7]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [8]: frame.applymap(myformat)
Out[8]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93

In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93


In [10]: myfunc=lambda x: x**2

In [11]: frame.applymap(myfunc)
Out[11]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

In [12]: frame.apply(myfunc)
Out[12]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289
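
One thing worth spelling out about the last two outputs: frame.apply(myfunc) matches the element-wise result only because x**2 also works on a whole column at once. myformat does not (an f-string cannot format a Series as a float), which is why In [9] needs the nested apply. A sketch using two rows of the numbers above; for plain arithmetic like this, neither method is needed.

```python
import numpy as np
import pandas as pd

# Two rows copied from the frame above.
frame = pd.DataFrame(
    {"b": [0.129885, -2.978331], "d": [-0.475957, -1.015918]},
    index=["Utah", "Ohio"],
)

myfunc = lambda x: x ** 2

by_column = frame.apply(myfunc)   # myfunc receives each column as a Series
direct = frame ** 2               # plain arithmetic on the whole DataFrame

print(by_column)
```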

2020-05-05 03:56:59
