Regular operators

1. Mask

mask = (targets[:, None] == class_set[None, :]).any(dim=-1)

targets: tensor([19, 29, 0, …, 51, 42, 70]), 60,000 values in total
class_set: tensor([19, 29, 0, …, 51, 42, 70]), 80 values in total

targets.shape = torch.Size([60000])
targets[:, None].shape = torch.Size([60000, 1])

class_set.shape = torch.Size([80])
class_set[None, :].shape = torch.Size([1, 80])

The final shape of mask is torch.Size([60000]).
Note: indexing with None (adding a singleton dimension) lets two tensors of different lengths find their common values via broadcasting, without raising an error. For example, if the code above is changed to:

b = (targets[:] == class_set[:]).any(dim=-1)

Error: The size of tensor a (60000) must match the size of tensor b (80) at non-singleton dimension 0

b = (targets[None,:] == class_set[None,:]).any(dim=-1)

Error: The size of tensor a (60000) must match the size of tensor b (80) at non-singleton dimension 1

b = (targets[None,:] == class_set[:,None]).any(dim=-1)

The final shape of b is torch.Size([80]), i.e. one entry per element of class_set.

b = (targets[None,:] == class_set[:,None]).any(dim=0)

The final shape of b is torch.Size([60000]), i.e. one entry per element of targets, the same as the original code.
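A minimal runnable sketch of the same broadcasting trick, using small made-up tensors instead of the original 60,000/80-element ones:

```python
import torch

targets = torch.tensor([19, 29, 0, 51, 42, 70, 7])  # pretend labels, length 7
class_set = torch.tensor([0, 42, 19])                # pretend class subset, length 3

# (7, 1) == (1, 3) broadcasts to (7, 3); any(dim=-1) collapses it back to (7,)
mask = (targets[:, None] == class_set[None, :]).any(dim=-1)
print(mask)  # tensor([ True, False,  True, False,  True, False, False])
```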

2. Image transformation

Method 1: divide by 255

The resulting data is not centered at 0. If you later want to analyze it with an algorithm like PCA, data preprocessed this way is not suitable.

Method 2: normalize with mean & std

Data processed this way is centered at 0 (recommended!).
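A minimal sketch of the two options with torchvision.transforms; the mean/std values are the commonly quoted MNIST statistics, used here purely as an illustration:

```python
from torchvision import transforms

# Method 1: ToTensor alone scales pixels to [0, 1] (i.e. divides by 255); data is not centered at 0
to_unit_range = transforms.ToTensor()

# Method 2: additionally subtract the mean and divide by the std, so the data is centered at 0
normalize = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.1307,), std=(0.3081,)),  # illustrative MNIST statistics
])
```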

3. Quartile Data

Example

10, 30, 5, 12, 20, 40, 25, 15, 18

  1. Sort: 5, 10, 12, 15, 18, 20, 25, 30, 40
    • First Quartile = ((n + 1)/4)th term, i.e. the 1/4 point
    • Second Quartile = ((n + 1)/2)th term, i.e. the 2/4 point
    • Third Quartile = (3(n + 1)/4)th term, i.e. the 3/4 point

First Quartile = ((9 + 1)/4)th term
= (10/4)th term
= 2.5th term

2.5th term = 2nd term + (0.5)(3rd term - 2nd term)
= 10 + (0.5)(12 - 10)
= 10 + 1
= 11
The First Quartile value is 11.

Second Quartile = ((9 + 1)/2)th term
= (10/2)th term
= 5th term
The 5th term is 18.

So the Second Quartile value is 18.

Third Quartile = (3(9 + 1)/4)th term
= (3(10)/4)th term
= 7.5th term

The 7.5th term is the average of the 7th and 8th terms = (25 + 30)/2 = 27.5
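For a quick cross-check in NumPy: note that np.quantile's default 'linear' method uses a different positioning rule, while method='weibull' (available in newer NumPy versions) follows the (n + 1) rule used above, so it reproduces these values:

```python
import numpy as np

data = [10, 30, 5, 12, 20, 40, 25, 15, 18]
for q in (0.25, 0.5, 0.75):
    print(q, np.quantile(data, q, method="weibull"))  # 11.0, 18.0, 27.5
```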

4. __iter__ & __next__

When building a class, implementing the built-in `__iter__` method makes its instances iterable (it should return an iterator), and the next value can then be fetched with the built-in `__next__` method.
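A minimal sketch of a class that implements both methods (the class name and logic are made up for illustration):

```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self              # the instance is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells the for loop to stop
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)                     # 3, 2, 1
```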

5. defaultdict

Key points:

  • Similar to initializing a collection with a default value in Java
  • Never raises a KeyError
  • Because of this design (no KeyError), it is more convenient than a plain dictionary: there is no need to check whether a key already exists
    • Example:

 from collections import defaultdict

 d = defaultdict(list)
 for i in range(5):
     d[i].append(i)

6. @staticmethod

Think of it as a small utility method attached to a class. Because of this decorator, the method does not take `self` in its parameter list, and so it cannot access the instance's attributes.
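A minimal sketch (the class and method names are made up for illustration):

```python
class TemperatureConverter:
    unit = "celsius"

    @staticmethod
    def to_fahrenheit(c):
        # no `self` parameter, so the method cannot read `unit` or any instance attribute
        return c * 9 / 5 + 32

print(TemperatureConverter.to_fahrenheit(20))  # 68.0
```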

7. log_softmax

Often better than a plain softmax: the log scale accentuates differences between the outputs and is numerically more stable.
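A minimal sketch comparing the two on made-up logits:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0]])
probs = F.softmax(logits, dim=-1)          # sums to 1
log_probs = F.log_softmax(logits, dim=-1)  # same as probs.log(), but computed more stably
print(probs, log_probs)
```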

8. save dataloader

Save it as a pkl file:

torch.save(dls, 'fname.pkl')
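To load it back later, a sketch (recent PyTorch versions may need weights_only=False for arbitrary pickled objects):

```python
dls = torch.load('fname.pkl')  # or torch.load('fname.pkl', weights_only=False) on newer PyTorch
```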

9. torch nn.functional vs nn

nn.functional does not hold any state, so with e.g. nn.functional.linear you have to pass in the weight and bias yourself.
nn.Linear is a Module class that stores its own weight and bias.
nn.functional.linear returns a tensor of shape (N, *, out_features).
nn.functional is arguably more convenient when you need to do unusual manipulations of the weight and bias yourself.
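A minimal sketch of the two styles side by side (the sizes are made up):

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.randn(4, 10)

# Module style: the layer owns and stores its weight and bias
layer = nn.Linear(10, 3)
out1 = layer(x)

# Functional style: stateless, the parameters are passed in explicitly
weight = torch.randn(3, 10, requires_grad=True)
bias = torch.zeros(3, requires_grad=True)
out2 = F.linear(x, weight, bias)

print(out1.shape, out2.shape)  # torch.Size([4, 3]) torch.Size([4, 3])
```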

10. preds = F.linear(feats, output_weight, output_bias)

Here F is the nn.functional module discussed in section 9 (the usual import torch.nn.functional as F).

11. output_weight.grad.fill_(0)

autograd by default frees the intermediate gradients that are not needed anymore, so that memory usage stays minimal.
retain_graph (formerly retain_variables) only prevents autograd from freeing some buffers needed for backward (e.g. when you want to backprop multiple times through a graph). Use hooks to access intermediate gradients.
When parameters are managed by hand with nn.functional, the gradients also have to be reset by hand; the usual pattern is:

# Reset gradients
local_optim.zero_grad()
output_weight.grad.fill_(0)
output_bias.grad.fill_(0)

12. init_weight.detach().requires_grad_()

  • detach() detaches the output from the computational graph, so no gradient will be backpropagated along this variable.
  • torch.no_grad says that no operation should build the graph.

The difference is that one refers only to the given variable it is called on, while the other affects all operations taking place within the with statement.
torch.no_grad can generally be used in the eval phase.

  • detach(), on the other hand, should not be needed for classic CNN-like architectures. It is usually used for trickier operations.
  • detach() is useful when you want to compute something that you can't / don't want to differentiate.
    • For example, if you compute some indices from the output of the network and then want to use them to index a tensor: the indexing operation is not differentiable w.r.t. the indices, so you should detach() the indices before passing them in.
  • The version with torch.no_grad will use less memory (see the sketch below).
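A minimal sketch of the difference, on a made-up tensor:

```python
import torch

x = torch.randn(3, requires_grad=True)

# detach(): only this particular output is cut out of the graph
y = (x * 2).detach()      # y.requires_grad is False; no gradient flows back through y

# torch.no_grad(): nothing computed inside the block builds a graph
with torch.no_grad():
    z = x * 2             # z.requires_grad is False and no backward buffers were stored
```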

13. torch vs torchvision

torchvision is essentially a companion toolbox: a collection of packages that support PyTorch projects so you do not have to reinvent the wheel. For example, the DenseNet model (not to be confused with a dense layer) can be used directly from torchvision.

A dense layer, also called a fully-connected layer, is a layer whose neurons connect to every neuron in the preceding layer.

torchvision.models.DenseNet
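A minimal sketch; note the weights= argument assumes a recent torchvision (older versions used pretrained=True instead):

```python
import torchvision

model = torchvision.models.densenet121(weights=None)  # random init; pass weights="DEFAULT" for pretrained weights
```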

14. np.lib.stride_tricks.as_strided

Simulates a sliding window to produce the output of running a filter over an image. The `shape` argument of the function defines the shape of the new view onto the original image.
Example:

def conv2d(image, ftr):
    # shape of the strided view: (filter_h, filter_w, out_h, out_w)
    s = ftr.shape + tuple(np.subtract(image.shape, ftr.shape) + 1)
    # sub_image[:, :, k, l] is the filter-sized window of the image at output position (k, l)
    sub_image = np.lib.stride_tricks.as_strided(image, shape=s, strides=image.strides * 2)
    # multiply the filter into every window and sum over it: output shape (out_h, out_w)
    return np.einsum('ij,ijkl->kl', ftr, sub_image)

Filter = 3x3, Image = 100 x 100

  • ftr.shape is already a tuple
  • tuple(np.subtract(image.shape, ftr.shape) + 1) turns the NumPy array from the subtraction back into a tuple

s = ftr.shape + tuple(np.subtract(image.shape, ftr.shape) + 1)
=> (3, 3) + tuple((100, 100) - (3, 3) + 1)
=> (3, 3) + tuple((97, 97) + 1)
=> (3, 3) + (98, 98)
=> (3, 3, 98, 98)

The last step is just concatenation of two tuples (like adding two Python lists), which is a handy trick for building a new shape.
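A quick usage sketch of the conv2d above; the filter here is an arbitrary 3x3 mean filter, just to show the shapes:

```python
import numpy as np

image = np.random.rand(100, 100)
ftr = np.full((3, 3), 1 / 9)  # arbitrary 3x3 mean filter

out = conv2d(image, ftr)      # conv2d as defined above
print(out.shape)              # (98, 98)
```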

15. np.einsum

  • Both torch and numpy support this operator.
  • Multiplying two matrices by hand needs for loops to spell out the summation rule;
    with einsum the same thing is written concisely, and you do not have to worry about
    whether the second matrix has to be in transposed form.

Basic usage:
np.einsum('ij,ijkl->kl', ftr, sub_image)

In 'ij,ijkl->kl': ij labels the dimensions of ftr; ijkl labels the dimensions of sub_image; 'ij,ijkl' means the ij-shaped matrix is multiplied with the ijkl-shaped matrix; '->kl' sets the rule for that product and names kl as the output.
Here kl is the outer loop and ij is the inner loop.
Other uses:
Sum all the values of x:

x = np.ones(3)
sum_x = np.einsum('i->', x)

Transpose x from axis order 'ijk' to 'kji' (i.e. reverse the axes):

x = np.ones((5, 4, 3))
np.einsum('ijk->kji', x)

Element-wise multiplication:

'ij,ij->ij'

Batch multiplication (see the check below):

A = torch.rand((3, 2, 5))
B = torch.rand((3, 5, 3))
'ijk,ikl->ijl'

i is the batch dimension; in effect each jk matrix is multiplied with the corresponding kl matrix.
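A runnable check of the batch multiplication rule above against torch.bmm:

```python
import torch

A = torch.rand((3, 2, 5))
B = torch.rand((3, 5, 3))

out = torch.einsum('ijk,ikl->ijl', A, B)
print(out.shape)                             # torch.Size([3, 2, 3])
print(torch.allclose(out, torch.bmm(A, B)))  # True: identical to batched matrix multiplication
```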

Matrix diagonal:

x = torch.rand((3, 3))
torch.einsum('ii->i', x)

The output is just the values on the diagonal.

16. x.flatten()

Put the dimensions you want to merge (flatten together) in the parentheses. Original shape: x.shape = [4, 2, 2, 3, 16, 16]

x.flatten(1, 2).shape  # torch.Size([4, 4, 3, 16, 16])  merge dims 1 and 2
x.flatten(0).shape     # torch.Size([12288])            merge dims 0 through the last
x.flatten(3).shape     # torch.Size([4, 2, 2, 768])     merge dims 3 through the last

flatten takes at most two arguments: the start index and the end index of the range to flatten.

x.flatten(0, 1, 2).shape  # error!
x.flatten(0, 2).shape     # gives what was intended above: torch.Size([16, 3, 16, 16]), merging dims 0 through 2

The dimensions cannot skip over others. If the dims you want to merge have gaps between them, e.g. original x.shape = [4, 2, 2, 3, 16, 16] and you want to merge dims (1, 3, 5) => x.shape = [4, (2x3x16), 2, 16] => [4, 96, 2, 16]:

x.flatten(1, 3, 5).shape  # error!
# Fix: first permute so the dims to merge are adjacent
x = x.permute(0, 1, 3, 5, 2, 4)  # torch.Size([4, 2, 3, 16, 2, 16])
x.flatten(1, 3).shape            # torch.Size([4, 96, 2, 16])

17. nn.Embedding

Used to define a lookup-table-like structure: essentially a trainable tensor held in reserve. In NLP or Transformer models, for example, you often need to initialize a tensor from the input that will then be trained.
Take the query input of a Transformer decoder: it starts out as a randomly initialized tensor of the required size, and this tensor keeps being updated as training goes on.

```
import torch
from torch import nn

embedding = nn.Embedding(10, 3)  # define what kind of tensor we want: 10 entries, each of size 3
input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])  # pretend input to a Transformer encoder
embedding(input)  # shape (2, 4, 3); roughly analogous to building a custom-shaped tensor, like np.zeros_like(input) in NumPy
```

18. pytorch_lightning

pl.Trainer
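A minimal, made-up sketch of how pl.Trainer is typically driven; LitRegressor and the toy data below are invented purely for illustration:

```python
import torch
from torch import nn
import pytorch_lightning as pl

class LitRegressor(pl.LightningModule):
    """Tiny made-up LightningModule, just enough for Trainer.fit to run."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

# toy data: 64 samples with 4 features each
x = torch.randn(64, 4)
y = torch.randn(64, 1)
loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(x, y), batch_size=16)

trainer = pl.Trainer(max_epochs=2)   # the Trainer owns the training loop and device handling
trainer.fit(LitRegressor(), loader)
```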
