我相信尼斯的

np。einsum工作吗?

给定数组A和B，它们的矩阵乘法和转置是使用(A @ B).T计算的，或者等效地，使用:

np.einsum("ij, jk -> ki", A, B)

当前回答

(注:这个答案是基于我不久前写的一篇关于einsum的简短博客文章。)

einsum是做什么的?

假设我们有两个多维数组A和b。现在让我们假设我们想要…

以特定的方式将A与B相乘，以产生新的产品数组;然后也许这个新数组沿着特定的轴求和;然后也许将新数组的坐标轴按特定顺序转置。

比起NumPy函数的组合，比如乘法、求和和转置，einsum很有可能帮助我们更快、更高效地完成这项工作。

einsum是如何工作的?

这里有一个简单(但并非完全无关紧要)的例子。取以下两个数组:

A = np.array([0, 1, 2])

B = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])

我们将A和B元素相乘，然后沿着新数组的行求和。在“normal”NumPy中，我们可以这样写:

>>> (A[:, np.newaxis] * B).sum(axis=1)
array([ 0, 22, 76])

在这里，A上的索引操作将两个数组的第一个轴对齐，这样乘法运算就可以传播。然后将产品数组的行相加以返回答案。

现在如果我们想用einsum代替，我们可以这样写:

>>> np.einsum('i,ij->i', A, B)
array([ 0, 22, 76])

签名字符串'i,ij->i'是这里的关键，需要一点解释。你可以把它想成两半。在左边(->的左边)，我们标记了两个输入数组。在->的右边，我们已经标记了我们最终想要的数组。

接下来的事情是这样的:

A has one axis; we've labelled it i. And B has two axes; we've labelled axis 0 as i and axis 1 as j. By repeating the label i in both input arrays, we are telling einsum that these two axes should be multiplied together. In other words, we're multiplying array A with each column of array B, just like A[:, np.newaxis] * B does. Notice that j does not appear as a label in our desired output; we've just used i (we want to end up with a 1D array). By omitting the label, we're telling einsum to sum along this axis. In other words, we're summing the rows of the products, just like .sum(axis=1) does.

这基本上就是使用einsum所需要知道的全部内容。玩一会儿会有帮助;如果我们在输出中保留两个标签，'i,ij->ij'，我们会得到一个2D的乘积数组(与a [:， np。如果我们说没有输出标签，'i,ij->，我们将返回一个单一的数字(与做(a [:， np。* B).sum())。

然而，einsum的伟大之处在于，它并不首先构建一个临时的产品数组;它只是把反应过程中的产物加起来。这可以大大节省内存使用。

一个更大的例子

为了解释点积，这里有两个新数组:

A = array([[1, 1, 1],
           [2, 2, 2],
           [5, 5, 5]])

B = array([[0, 1, 0],
           [1, 1, 0],
           [1, 1, 1]])

我们将用np来计算点积。einsum('ij,jk->ik'， A, B).下面的图片显示了A和B的标记以及我们从函数中得到的输出数组:

你可以看到标签j是重复的-这意味着我们将A的行与b的列相乘。此外，标签j不包括在输出中-我们将这些乘积相加。标签i和k是为输出保留的，所以我们得到了一个2D数组。

将这个结果与标签j没有求和的数组进行比较可能会更清楚。下面，在左边你可以看到写入np后的3D数组。einsum('ij,jk->ijk'， A, B)(即我们保留了标签j):

求和轴j给出了期望的点积，如右侧所示。

一些练习

为了更好地理解einsum，可以使用下标表示法实现熟悉的NumPy数组操作。任何涉及乘轴和轴的组合都可以使用einsum来编写。

设A和B是两个长度相同的一维数组。例如，A = np.arange(10)和B = np.arange(10)。(5、15)不等。

A的和可以写成: np。einsum(“我- - - >”) 元素的乘法，A * B，可以写成: np。einsum('i,i->i'， A, B) 内积或点积，np。内部的(A, B)或np。点(A, B)，可以写成: np。einsum('i,i->'， A, B) #或直接使用'i,i' 外积，np。外层(A, B)，可以写成: np。einsum('i,j->ij'， A, B)

对于2D数组C和D，假设轴的长度是兼容的(两者长度相同或其中一个长度为1)，这里有一些例子:

C的迹(主对角线的和)np.trace(C)可以写成: np。einsum(“二世”,C) C与D的转置的逐元素乘法，C * D.T，可以写成: np。einsum('ij,ji->ij'， C, D) C中的每个元素乘以数组D(得到一个4D数组)，C[:，:， None, None] * D，可以写成: np。einsum('ij,kl->ijkl'， C, D)

2015-11-10 23:10:37

其他回答

熟悉了爱因斯坦求和(einsum)中的虚索引(公共索引或重复索引)和沿虚索引求和，输出->整形就很容易了。因此，重点关注:

虚索引，np中的公共索引j。Einsum ("ij,jk->ki"， a, b) 沿虚指标j求和

哑指标

对于einsum("…"，a, b)，无论是否有公共索引，在矩阵a和b之间总是发生逐元素乘法。我们可以有einsum('xy,wz'， a, b)它在下标'xy,wz'中没有公共下标。

如果有一个共同的指标，如“ij,jk->ki”中的j，那么它在爱因斯坦求和中被称为哑指标。

爱因斯坦总结

被求和的索引是求和索引，在这里是i。它也被称为虚拟索引，因为任何符号都可以替换“i”而不改变表达式的含义，只要它不与同一项中的索引符号冲突。

沿虚指标求和

np。图中绿色矩形的Einsum ("ij,j"， a, b)，j为虚拟索引。逐元素的乘法a[i][j] * b[j]沿j轴求和为Σ (a[i][j] * b[j])。

它是np的点积。对于每个i, inner(a[i]， b)是特定的np.inner()而避免np。因为它不是严格意义上的点积运算。

爱因斯坦求和惯例:简介

虚拟索引可以出现在任何地方，只要规则满足(详情请参阅youtube)。

对于np中的虚索引i。Einsum (" ik,il"， a, b)，它是矩阵a和b的行索引，因此提取a的列和b的列来生成点积。

输出形式

由于求和是沿着假指标进行的，所以假指标在结果矩阵中消失，因此i从“ik,il”中去掉，形成形状(k,l)。我们可以告诉np.einsum(“…-> <shape>")通过带有->标识符的输出下标标签指定输出表单。

参见numpy中的显式模式。详情请参阅Einsum。

In explicit mode the output can be directly controlled by specifying output subscript labels. This requires the identifier ‘->’ as well as the list of output subscript labels. This feature increases the flexibility of the function since summing can be disabled or forced when required. The call np.einsum('i->', a) is like np.sum(a, axis=-1), and np.einsum('ii->i', a) is like np.diag(a). The difference is that einsum does not allow broadcasting by default. Additionally np.einsum('ij,jh->ih', a, b) directly specifies the order of the output subscript labels and therefore returns matrix multiplication, unlike the example above in implicit mode.

没有虚索引

一个在einsum中没有虚索引的例子。

一个术语(下标索引，例如:"ij")选择每个数组中的一个元素。左边的每个元素都应用到右边的元素上，用于按元素进行乘法(因此总是会发生乘法)。

A具有形状(2,3)，其中每个元素都应用于形状(2,2)的b。因此，它创建了一个形状为(2,3,2,2)而没有求和的矩阵，因为(i,j)， (k.l)都是自由指标。

# --------------------------------------------------------------------------------
# For np.einsum("ij,kl", a, b)
# 1-1: Term "ij" or (i,j), two free indices, selects selects an element a[i][j].
# 1-2: Term "kl" or (k,l), two free indices, selects selects an element b[k][l].
# 2:   Each a[i][j] is applied on b[k][l] for element-wise multiplication a[i][j] * b[k,l]
# --------------------------------------------------------------------------------
# for (i,j) in a:
#    for(k,l) in b:
#        a[i][j] * b[k][l]
np.einsum("ij,kl", a, b)

array([[[[ 0,  0],
         [ 0,  0]],

        [[10, 11],
         [12, 13]],

        [[20, 22],
         [24, 26]]],


       [[[30, 33],
         [36, 39]],

        [[40, 44],
         [48, 52]],

        [[50, 55],
         [60, 65]]]])

例子

矩阵A行和矩阵B列的点积

A = np.matrix('0 1 2; 3 4 5')
B = np.matrix('0 -3; -1 -4; -2 -5');
np.einsum('ij,ji->i', A, B)

# Same with
np.diagonal(np.matmul(A,B))
(A*B).diagonal()
---
[ -5 -50]
[ -5 -50]
[[ -5 -50]]

2021-02-15 10:39:48

我发现NumPy: The tricks of The trade (Part II)很有教育意义

我们使用->表示输出数组的顺序。把'ij, i->j'看成左手边(LHS)和右手边(RHS)。LHS上的任何重复标签都将按乘积元素计算，然后求和。通过改变RHS(输出)端上的标签，我们可以定义我们想要相对于输入数组进行处理的轴，即沿着轴0,1和等等进行求和。

import numpy as np

>>> a
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
>>> b
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> d = np.einsum('ij, jk->ki', a, b)

注意这里有三个轴，i, j, k, j是重复的(在左边)。I j表示a的行和列j k表示b。

为了计算乘积并对齐j轴，我们需要向a添加一个轴(b将沿第一个轴广播)。

a[i, j, k]
   b[j, k]

>>> c = a[:,:,np.newaxis] * b
>>> c
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 0,  2,  4],
        [ 6,  8, 10],
        [12, 14, 16]],

       [[ 0,  3,  6],
        [ 9, 12, 15],
        [18, 21, 24]]])

右边没有J所以我们对J求和J是3x3x3数组的第二个轴

>>> c = c.sum(1)
>>> c
array([[ 9, 12, 15],
       [18, 24, 30],
       [27, 36, 45]])

最后，右边的下标(按字母顺序)是颠倒的，所以我们转置。

>>> c.T
array([[ 9, 18, 27],
       [12, 24, 36],
       [15, 30, 45]])

>>> np.einsum('ij, jk->ki', a, b)
array([[ 9, 18, 27],
       [12, 24, 36],
       [15, 30, 45]])
>>>

2014-09-29 12:17:27

让我们创建两个数组，它们具有不同但兼容的尺寸，以突出它们的相互作用

In [43]: A=np.arange(6).reshape(2,3)
Out[43]: 
array([[0, 1, 2],
       [3, 4, 5]])


In [44]: B=np.arange(12).reshape(3,4)
Out[44]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

你的计算，取(2,3)与(3,4)的乘积的“点”(和)来生成(4,2)数组。i是A的第一个，C的最后一个;k B的最后一个，c的第一个j被求和'消耗'。

In [45]: C=np.einsum('ij,jk->ki',A,B)
Out[45]: 
array([[20, 56],
       [23, 68],
       [26, 80],
       [29, 92]])

这和np。dot(A,B)是一样的。T -它是转置后的最终输出。

要了解更多关于j的内容，请将C下标更改为ijk:

In [46]: np.einsum('ij,jk->ijk',A,B)
Out[46]: 
array([[[ 0,  0,  0,  0],
        [ 4,  5,  6,  7],
        [16, 18, 20, 22]],

       [[ 0,  3,  6,  9],
        [16, 20, 24, 28],
        [40, 45, 50, 55]]])

这也可以用:

A[:,:,None]*B[None,:,:]

也就是说，在a的末尾增加一个k维，在B的前面增加一个i维，得到一个(2,3,4)数组。

0 + 4 + 16 = 20,9 + 28 + 55 = 92，等等;对j和转置求和得到前面的结果:

np.sum(A[:,:,None] * B[None,:,:], axis=1).T

# C[k,i] = sum(j) A[i,j (,k) ] * B[(i,)  j,k]

2014-09-29 22:06:37

我认为最简单的例子是在tensorflow文档中

将方程转换为einsum符号有四个步骤。我们以这个方程为例C[i,k] = sum_j A[i,j] * B[j,k]

首先，我们删除变量名。得到ik = sum_j ij * jk 我们删除sum_j项，因为它是隐式的。得到ik = ij * jk 我们用，替换*。我们得到ik = ij jk 输出在RHS上，用->符号分隔。得到ij jk - >ik

einsum解释器只是反向运行这4个步骤。对结果中缺失的所有指标求和。

下面是文档中的更多示例

# Matrix multiplication
einsum('ij,jk->ik', m0, m1)  # output[i,k] = sum_j m0[i,j] * m1[j, k]

# Dot product
einsum('i,i->', u, v)  # output = sum_i u[i]*v[i]

# Outer product
einsum('i,j->ij', u, v)  # output[i,j] = u[i]*v[j]

# Transpose
einsum('ij->ji', m)  # output[j,i] = m[i,j]

# Trace
einsum('ii', m)  # output[j,i] = trace(m) = sum_i m[i, i]

# Batch matrix multiplication
einsum('aij,ajk->aik', s, t)  # out[a,i,k] = sum_j s[a,i,j] * t[a, j, k]

2020-07-08 04:50:08

在阅读einsum方程时，我发现能够这样做最有帮助在心里把它们归结为它们的命令版本。

让我们从下面的陈述开始:

C = np.einsum('bhwi,bhwj->bij', A, B)

首先看标点符号，我们看到箭头前面有两个4个逗号分隔的斑点——bhwi和bhwj，后面还有一个3个字母的bij。因此，该方程由两个秩4张量输入产生一个秩3张量结果。

现在，让每个blob中的每个字母都是一个范围变量的名称。字母出现在圆点中的位置是它在这个张量中所覆盖的轴的下标。因此，生成C的每个元素的命令式求和必须从三个嵌套的for循环开始，每个for循环对应C的每个索引。

for b in range(...):
    for i in range(...):
        for j in range(...):
            # the variables b, i and j index C in the order of their appearance in the equation
            C[b, i, j] = ...

所以，本质上，对于c的每个输出索引都有一个for循环，我们暂时不确定范围。

接下来我们看左边，有没有范围变量没有出现在右边?在我们的例子中，是的，h和w。为每个这样的变量添加一个内部嵌套的for循环:

for b in range(...):
    for i in range(...):
        for j in range(...):
            C[b, i, j] = 0
            for h in range(...):
                for w in range(...):
                    ...

在最里面的循环中，我们现在已经定义了所有的指标，所以我们可以写出实际的和翻译完成:

# three nested for-loops that index the elements of C
for b in range(...):
    for i in range(...):
        for j in range(...):

            # prepare to sum
            C[b, i, j] = 0

            # two nested for-loops for the two indexes that don't appear on the right-hand side
            for h in range(...):
                for w in range(...):
                    # Sum! Compare the statement below with the original einsum formula
                    # 'bhwi,bhwj->bij'

                    C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]

如果到目前为止您已经能够遵循代码，那么恭喜您!这就是阅读einsum方程所需要的。请特别注意原始einsum公式如何映射到上面代码片段中的最终求和语句。for循环和范围界限只是无用的，最后的语句是你真正需要理解发生了什么。

为了完整起见，让我们看看如何确定每个范围变量的范围。好吧，每个变量的范围就是它索引的维度的长度。显然，如果一个变量在一个或多个张量中表示多个维度，那么每个维度的长度必须相等。下面是上面的完整范围的代码:

# C's shape is determined by the shapes of the inputs
# b indexes both A and B, so its range can come from either A.shape or B.shape
# i indexes only A, so its range can only come from A.shape, the same is true for j and B
assert A.shape[0] == B.shape[0]
assert A.shape[1] == B.shape[1]
assert A.shape[2] == B.shape[2]
C = np.zeros((A.shape[0], A.shape[3], B.shape[3]))
for b in range(A.shape[0]): # b indexes both A and B, or B.shape[0], which must be the same
    for i in range(A.shape[3]):
        for j in range(B.shape[3]):
            # h and w can come from either A or B
            for h in range(A.shape[1]):
                for w in range(A.shape[2]):
                    C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]

2020-01-22 11:35:30

我相信尼斯的

推荐文章

最新文章

标签