np。einsum工作吗?
给定数组A和B,它们的矩阵乘法和转置是使用(A @ B).T计算的,或者等效地,使用:
np.einsum("ij, jk -> ki", A, B)
np。einsum工作吗?
给定数组A和B,它们的矩阵乘法和转置是使用(A @ B).T计算的,或者等效地,使用:
np.einsum("ij, jk -> ki", A, B)
当前回答
熟悉了爱因斯坦求和(einsum)中的虚索引(公共索引或重复索引)和沿虚索引求和,输出->整形就很容易了。因此,重点关注:
虚索引,np中的公共索引j。Einsum ("ij,jk->ki", a, b) 沿虚指标j求和
哑指标
对于einsum("…",a, b),无论是否有公共索引,在矩阵a和b之间总是发生逐元素乘法。我们可以有einsum('xy,wz', a, b)它在下标'xy,wz'中没有公共下标。
如果有一个共同的指标,如“ij,jk->ki”中的j,那么它在爱因斯坦求和中被称为哑指标。
爱因斯坦总结
被求和的索引是求和索引,在这里是i。它也被称为虚拟索引,因为任何符号都可以替换“i”而不改变表达式的含义,只要它不与同一项中的索引符号冲突。
沿虚指标求和
np。图中绿色矩形的Einsum ("ij,j", a, b),j为虚拟索引。逐元素的乘法a[i][j] * b[j]沿j轴求和为Σ (a[i][j] * b[j])。
它是np的点积。对于每个i, inner(a[i], b)是特定的np.inner()而避免np。因为它不是严格意义上的点积运算。
爱因斯坦求和惯例:简介
虚拟索引可以出现在任何地方,只要规则满足(详情请参阅youtube)。
对于np中的虚索引i。Einsum (" ik,il", a, b),它是矩阵a和b的行索引,因此提取a的列和b的列来生成点积。
输出形式
由于求和是沿着假指标进行的,所以假指标在结果矩阵中消失,因此i从“ik,il”中去掉,形成形状(k,l)。我们可以告诉np.einsum(“…-> <shape>")通过带有->标识符的输出下标标签指定输出表单。
参见numpy中的显式模式。详情请参阅Einsum。
In explicit mode the output can be directly controlled by specifying output subscript labels. This requires the identifier ‘->’ as well as the list of output subscript labels. This feature increases the flexibility of the function since summing can be disabled or forced when required. The call np.einsum('i->', a) is like np.sum(a, axis=-1), and np.einsum('ii->i', a) is like np.diag(a). The difference is that einsum does not allow broadcasting by default. Additionally np.einsum('ij,jh->ih', a, b) directly specifies the order of the output subscript labels and therefore returns matrix multiplication, unlike the example above in implicit mode.
没有虚索引
一个在einsum中没有虚索引的例子。
一个术语(下标索引,例如:"ij")选择每个数组中的一个元素。 左边的每个元素都应用到右边的元素上,用于按元素进行乘法(因此总是会发生乘法)。
A具有形状(2,3),其中每个元素都应用于形状(2,2)的b。因此,它创建了一个形状为(2,3,2,2)而没有求和的矩阵,因为(i,j), (k.l)都是自由指标。
# --------------------------------------------------------------------------------
# For np.einsum("ij,kl", a, b)
# 1-1: Term "ij" or (i,j), two free indices, selects selects an element a[i][j].
# 1-2: Term "kl" or (k,l), two free indices, selects selects an element b[k][l].
# 2: Each a[i][j] is applied on b[k][l] for element-wise multiplication a[i][j] * b[k,l]
# --------------------------------------------------------------------------------
# for (i,j) in a:
# for(k,l) in b:
# a[i][j] * b[k][l]
np.einsum("ij,kl", a, b)
array([[[[ 0, 0],
[ 0, 0]],
[[10, 11],
[12, 13]],
[[20, 22],
[24, 26]]],
[[[30, 33],
[36, 39]],
[[40, 44],
[48, 52]],
[[50, 55],
[60, 65]]]])
例子
矩阵A行和矩阵B列的点积
A = np.matrix('0 1 2; 3 4 5')
B = np.matrix('0 -3; -1 -4; -2 -5');
np.einsum('ij,ji->i', A, B)
# Same with
np.diagonal(np.matmul(A,B))
(A*B).diagonal()
---
[ -5 -50]
[ -5 -50]
[[ -5 -50]]
其他回答
如果您直观地理解numpy.einsum(),那么掌握它的思想是非常容易的。作为一个例子,让我们从涉及矩阵乘法的简单描述开始。
要使用numpy.einsum(),只需将所谓的下标字符串作为参数传递,后面跟着输入数组。
假设你有两个二维数组,A和B,你想做矩阵乘法。所以,你会:
np.einsum("ij, jk -> ik", A, B)
Here the subscript string ij corresponds to array A while the subscript string jk corresponds to array B. Also, the most important thing to note here is that the number of characters in each subscript string must match the dimensions of the array (i.e., two chars for 2D arrays, three chars for 3D arrays, and so on). And if you repeat the chars between subscript strings (j in our case), then that means you want the einsum to happen along those dimensions. Thus, they will be sum-reduced (i.e., that dimension will be gone).
The subscript string after this -> symbol represent the dimensions of our resultant array. If you leave it empty, then everything will be summed and a scalar value is returned as the result. Else the resultant array will have dimensions according to the subscript string. In our example, it'll be ik. This is intuitive because we know that for the matrix multiplication to work, the number of columns in array A has to match the number of rows in array B which is what is happening here (i.e., we encode this knowledge by repeating the char j in the subscript string)
下面是一些例子,简洁地说明了np.einsum()在实现一些常见的张量或nd-array操作时的使用/功能。
输入
# a vector
In [197]: vec
Out[197]: array([0, 1, 2, 3])
# an array
In [198]: A
Out[198]:
array([[11, 12, 13, 14],
[21, 22, 23, 24],
[31, 32, 33, 34],
[41, 42, 43, 44]])
# another array
In [199]: B
Out[199]:
array([[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]])
1)矩阵乘法(类似np。matmul (arr1 arr2))
In [200]: np.einsum("ij, jk -> ik", A, B)
Out[200]:
array([[130, 130, 130, 130],
[230, 230, 230, 230],
[330, 330, 330, 330],
[430, 430, 430, 430]])
2)沿主对角线提取元素(类似于np.diag(arr))
In [202]: np.einsum("ii -> i", A)
Out[202]: array([11, 22, 33, 44])
3) Hadamard积(即两个数组的元素积)(类似于arr1 * arr2)
In [203]: np.einsum("ij, ij -> ij", A, B)
Out[203]:
array([[ 11, 12, 13, 14],
[ 42, 44, 46, 48],
[ 93, 96, 99, 102],
[164, 168, 172, 176]])
4)元素平方(类似于np.square(arr)或arr ** 2)
In [210]: np.einsum("ij, ij -> ij", B, B)
Out[210]:
array([[ 1, 1, 1, 1],
[ 4, 4, 4, 4],
[ 9, 9, 9, 9],
[16, 16, 16, 16]])
5) Trace(即主对角元素的和)(类似于np.trace(arr))
In [217]: np.einsum("ii -> ", A)
Out[217]: 110
6)矩阵转置(类似于np.转置(arr))
In [221]: np.einsum("ij -> ji", A)
Out[221]:
array([[11, 21, 31, 41],
[12, 22, 32, 42],
[13, 23, 33, 43],
[14, 24, 34, 44]])
7)向量的外积(类似于np。外(vec1 vec2))
In [255]: np.einsum("i, j -> ij", vec, vec)
Out[255]:
array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])
8)(向量的)内积(类似于np。内部(vec1 vec2))
In [256]: np.einsum("i, i -> ", vec, vec)
Out[256]: 14
9)沿0轴求和(类似于np。sum (arr轴= 0))
In [260]: np.einsum("ij -> j", B)
Out[260]: array([10, 10, 10, 10])
10)沿轴1求和(类似于np。sum (arr轴= 1))
In [261]: np.einsum("ij -> i", B)
Out[261]: array([ 4, 8, 12, 16])
11)批量矩阵乘法
In [287]: BM = np.stack((A, B), axis=0)
In [288]: BM
Out[288]:
array([[[11, 12, 13, 14],
[21, 22, 23, 24],
[31, 32, 33, 34],
[41, 42, 43, 44]],
[[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
[ 3, 3, 3, 3],
[ 4, 4, 4, 4]]])
In [289]: BM.shape
Out[289]: (2, 4, 4)
# batch matrix multiply using einsum
In [292]: BMM = np.einsum("bij, bjk -> bik", BM, BM)
In [293]: BMM
Out[293]:
array([[[1350, 1400, 1450, 1500],
[2390, 2480, 2570, 2660],
[3430, 3560, 3690, 3820],
[4470, 4640, 4810, 4980]],
[[ 10, 10, 10, 10],
[ 20, 20, 20, 20],
[ 30, 30, 30, 30],
[ 40, 40, 40, 40]]])
In [294]: BMM.shape
Out[294]: (2, 4, 4)
12)沿轴2求和(类似于np。sum (arr轴= 2))
In [330]: np.einsum("ijk -> ij", BM)
Out[330]:
array([[ 50, 90, 130, 170],
[ 4, 8, 12, 16]])
13)对数组中的所有元素求和(类似于np.sum(arr))
In [335]: np.einsum("ijk -> ", BM)
Out[335]: 480
14)多轴求和(即边缘化) (类似于np。Sum (arr, axis=(axis0, axis1, axis2, axis3, axis4, axis6, axis7)))
# 8D array
In [354]: R = np.random.standard_normal((3,5,4,6,8,2,7,9))
# marginalize out axis 5 (i.e. "n" here)
In [363]: esum = np.einsum("ijklmnop -> n", R)
# marginalize out axis 5 (i.e. sum over rest of the axes)
In [364]: nsum = np.sum(R, axis=(0,1,2,3,4,6,7))
In [365]: np.allclose(esum, nsum)
Out[365]: True
15)双点积(类似于np.sum(hadamard-product) cf. 3)
In [772]: A
Out[772]:
array([[1, 2, 3],
[4, 2, 2],
[2, 3, 4]])
In [773]: B
Out[773]:
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
In [774]: np.einsum("ij, ij -> ", A, B)
Out[774]: 124
16)二维和三维数组乘法
这样的乘法在求解线性方程组(Ax = b)时非常有用,当你想验证结果时。
# inputs
In [115]: A = np.random.rand(3,3)
In [116]: b = np.random.rand(3, 4, 5)
# solve for x
In [117]: x = np.linalg.solve(A, b.reshape(b.shape[0], -1)).reshape(b.shape)
# 2D and 3D array multiplication :)
In [118]: Ax = np.einsum('ij, jkl', A, x)
# indeed the same!
In [119]: np.allclose(Ax, b)
Out[119]: True
相反,如果必须使用np.matmul()进行验证,我们必须做一些重塑操作来实现相同的结果,例如:
# reshape 3D array `x` to 2D, perform matmul
# then reshape the resultant array to 3D
In [123]: Ax_matmul = np.matmul(A, x.reshape(x.shape[0], -1)).reshape(x.shape)
# indeed correct!
In [124]: np.allclose(Ax, Ax_matmul)
Out[124]: True
奖励:阅读更多的数学在这里:爱因斯坦求和和绝对在这里:张量符号
我发现NumPy: The tricks of The trade (Part II)很有教育意义
我们使用->表示输出数组的顺序。把'ij, i->j'看成左手边(LHS)和右手边(RHS)。LHS上的任何重复标签都将按乘积元素计算,然后求和。通过改变RHS(输出)端上的标签,我们可以定义我们想要相对于输入数组进行处理的轴,即沿着轴0,1和等等进行求和。
import numpy as np
>>> a
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> b
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> d = np.einsum('ij, jk->ki', a, b)
注意这里有三个轴,i, j, k, j是重复的(在左边)。I j表示a的行和列j k表示b。
为了计算乘积并对齐j轴,我们需要向a添加一个轴(b将沿第一个轴广播)。
a[i, j, k]
b[j, k]
>>> c = a[:,:,np.newaxis] * b
>>> c
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 0, 2, 4],
[ 6, 8, 10],
[12, 14, 16]],
[[ 0, 3, 6],
[ 9, 12, 15],
[18, 21, 24]]])
右边没有J所以我们对J求和J是3x3x3数组的第二个轴
>>> c = c.sum(1)
>>> c
array([[ 9, 12, 15],
[18, 24, 30],
[27, 36, 45]])
最后,右边的下标(按字母顺序)是颠倒的,所以我们转置。
>>> c.T
array([[ 9, 18, 27],
[12, 24, 36],
[15, 30, 45]])
>>> np.einsum('ij, jk->ki', a, b)
array([[ 9, 18, 27],
[12, 24, 36],
[15, 30, 45]])
>>>
我认为最简单的例子是在tensorflow文档中
将方程转换为einsum符号有四个步骤。我们以这个方程为例C[i,k] = sum_j A[i,j] * B[j,k]
首先,我们删除变量名。得到ik = sum_j ij * jk 我们删除sum_j项,因为它是隐式的。得到ik = ij * jk 我们用,替换*。我们得到ik = ij jk 输出在RHS上,用->符号分隔。得到ij jk - >ik
einsum解释器只是反向运行这4个步骤。对结果中缺失的所有指标求和。
下面是文档中的更多示例
# Matrix multiplication
einsum('ij,jk->ik', m0, m1) # output[i,k] = sum_j m0[i,j] * m1[j, k]
# Dot product
einsum('i,i->', u, v) # output = sum_i u[i]*v[i]
# Outer product
einsum('i,j->ij', u, v) # output[i,j] = u[i]*v[j]
# Transpose
einsum('ij->ji', m) # output[j,i] = m[i,j]
# Trace
einsum('ii', m) # output[j,i] = trace(m) = sum_i m[i, i]
# Batch matrix multiplication
einsum('aij,ajk->aik', s, t) # out[a,i,k] = sum_j s[a,i,j] * t[a, j, k]
在阅读einsum方程时,我发现能够这样做最有帮助 在心里把它们归结为它们的命令版本。
让我们从下面的陈述开始:
C = np.einsum('bhwi,bhwj->bij', A, B)
首先看标点符号,我们看到箭头前面有两个4个逗号分隔的斑点——bhwi和bhwj, 后面还有一个3个字母的bij。因此,该方程由两个秩4张量输入产生一个秩3张量结果。
现在,让每个blob中的每个字母都是一个范围变量的名称。字母出现在圆点中的位置 是它在这个张量中所覆盖的轴的下标。 因此,生成C的每个元素的命令式求和必须从三个嵌套的for循环开始,每个for循环对应C的每个索引。
for b in range(...):
for i in range(...):
for j in range(...):
# the variables b, i and j index C in the order of their appearance in the equation
C[b, i, j] = ...
所以,本质上,对于c的每个输出索引都有一个for循环,我们暂时不确定范围。
接下来我们看左边,有没有范围变量没有出现在右边?在我们的例子中,是的,h和w。 为每个这样的变量添加一个内部嵌套的for循环:
for b in range(...):
for i in range(...):
for j in range(...):
C[b, i, j] = 0
for h in range(...):
for w in range(...):
...
在最里面的循环中,我们现在已经定义了所有的指标,所以我们可以写出实际的和 翻译完成:
# three nested for-loops that index the elements of C
for b in range(...):
for i in range(...):
for j in range(...):
# prepare to sum
C[b, i, j] = 0
# two nested for-loops for the two indexes that don't appear on the right-hand side
for h in range(...):
for w in range(...):
# Sum! Compare the statement below with the original einsum formula
# 'bhwi,bhwj->bij'
C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]
如果到目前为止您已经能够遵循代码,那么恭喜您!这就是阅读einsum方程所需要的。请特别注意原始einsum公式如何映射到上面代码片段中的最终求和语句。for循环和范围界限只是无用的,最后的语句是你真正需要理解发生了什么。
为了完整起见,让我们看看如何确定每个范围变量的范围。好吧,每个变量的范围就是它索引的维度的长度。 显然,如果一个变量在一个或多个张量中表示多个维度,那么每个维度的长度必须相等。 下面是上面的完整范围的代码:
# C's shape is determined by the shapes of the inputs
# b indexes both A and B, so its range can come from either A.shape or B.shape
# i indexes only A, so its range can only come from A.shape, the same is true for j and B
assert A.shape[0] == B.shape[0]
assert A.shape[1] == B.shape[1]
assert A.shape[2] == B.shape[2]
C = np.zeros((A.shape[0], A.shape[3], B.shape[3]))
for b in range(A.shape[0]): # b indexes both A and B, or B.shape[0], which must be the same
for i in range(A.shape[3]):
for j in range(B.shape[3]):
# h and w can come from either A or B
for h in range(A.shape[1]):
for w in range(A.shape[2]):
C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]
让我们创建两个数组,它们具有不同但兼容的尺寸,以突出它们的相互作用
In [43]: A=np.arange(6).reshape(2,3)
Out[43]:
array([[0, 1, 2],
[3, 4, 5]])
In [44]: B=np.arange(12).reshape(3,4)
Out[44]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
你的计算,取(2,3)与(3,4)的乘积的“点”(和)来生成(4,2)数组。i是A的第一个,C的最后一个;k B的最后一个,c的第一个j被求和'消耗'。
In [45]: C=np.einsum('ij,jk->ki',A,B)
Out[45]:
array([[20, 56],
[23, 68],
[26, 80],
[29, 92]])
这和np。dot(A,B)是一样的。T -它是转置后的最终输出。
要了解更多关于j的内容,请将C下标更改为ijk:
In [46]: np.einsum('ij,jk->ijk',A,B)
Out[46]:
array([[[ 0, 0, 0, 0],
[ 4, 5, 6, 7],
[16, 18, 20, 22]],
[[ 0, 3, 6, 9],
[16, 20, 24, 28],
[40, 45, 50, 55]]])
这也可以用:
A[:,:,None]*B[None,:,:]
也就是说,在a的末尾增加一个k维,在B的前面增加一个i维,得到一个(2,3,4)数组。
0 + 4 + 16 = 20,9 + 28 + 55 = 92,等等;对j和转置求和得到前面的结果:
np.sum(A[:,:,None] * B[None,:,:], axis=1).T
# C[k,i] = sum(j) A[i,j (,k) ] * B[(i,) j,k]