Table of Contents
- 1. Mask
- 2. Image transformation
- 3. Quartile Data
- 4. `__iter__` & `__next__`
- 5. defaultdict
- 6. `@staticmethod`
- 7. log_softmax
- 8. save dataloader
- 9. torch nn.functional vs nn
- 10. `preds = F.linear(feats, output_weight, output_bias)`
- 11. `output_weight.grad.fill_(0)`
- 12. `init_weight.detach().requires_grad_()`
- 13. torch vs torchvision
- 14. `np.lib.stride_tricks.as_strided`
- 15. `np.einsum`
- 16. `x.flatten()`
- 17. `nn.Embedding`
- 18. pytorch_lightning
1. Mask
`mask = (targets[:, None] == class_set[None, :]).any(dim=-1)`
targets: tensor([19, 29, 0, …, 51, 42, 70]): 60000 values in total
class_set: tensor([19, 29, 0, …, 51, 42, 70]): 80 values in total
targets[:].shape = 60000
targets[:, None].shape = torch.Size([60000, 1])
class_set[:].shape = 80
class_set[None, :].shape = torch.Size([1, 80])
The final shape of mask is torch.Size([60000]).
!! Indexing with None (inserting a singleton dimension) lets two tensors of unequal length be matched against each other via broadcasting, without raising an error. For example, if the code above is changed to:
`b = (targets[:] == class_set[:]).any(dim=-1)`
Error: The size of tensor a (60000) must match the size of tensor b (80) at non-singleton dimension 0
`b = (targets[None, :] == class_set[None, :]).any(dim=-1)`
Error: The size of tensor a (60000) must match the size of tensor b (80) at non-singleton dimension 1
`b = (targets[None, :] == class_set[:, None]).any(dim=-1)`
b's final shape is torch.Size([80]): one entry per element of class_set.
`b = (targets[None, :] == class_set[:, None]).any(dim=0)`
b's final shape is torch.Size([60000]): one entry per element of targets, same as the original code.
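A minimal runnable sketch of the same broadcasting trick; the small tensors here are made-up stand-ins for the 60000/80-element tensors above:

```python
import torch

targets = torch.tensor([19, 29, 0, 51, 42, 70])   # hypothetical labels
class_set = torch.tensor([0, 42, 7])               # hypothetical subset of classes

# [6, 1] == [1, 3] broadcasts to [6, 3]; any(dim=-1) reduces over class_set
mask = (targets[:, None] == class_set[None, :]).any(dim=-1)
print(mask)            # tensor([False, False,  True, False,  True, False])
print(targets[mask])   # tensor([ 0, 42])
```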
2. Image transformation
Method 1: divide by 255
The resulting data is not centered at 0. If the data will later be analyzed with an algorithm such as PCA, data preprocessed this way is not suitable.
Method 2: normalize with mean & std
Data processed this way is centered at 0. (Recommended!!)
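A sketch of both methods with torchvision transforms; the MNIST mean/std values (0.1307, 0.3081) are the commonly quoted ones and are an assumption here, not taken from the text above:

```python
from torchvision import transforms

# Method 1: ToTensor alone already divides pixel values by 255 -> range [0, 1]
to_unit_range = transforms.ToTensor()

# Method 2: additionally standardize, so the data is centered at 0
normalize = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.1307,), std=(0.3081,)),  # assumed MNIST stats
])
```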
3. Quartile Data
Example
10, 30, 5, 12, 20, 40, 25, 15, 18
- Sorted: 5, 10, 12, 15, 18, 20, 25, 30, 40
- $\text{First Quartile} = \left(\frac{n + 1}{4}\right)\text{th term}$, i.e. the 1/4 point
- $\text{Second Quartile} = \left(\frac{n + 1}{2}\right)\text{th term}$, i.e. the 2/4 point
- $\text{Third Quartile} = \left(\frac{3(n + 1)}{4}\right)\text{th term}$, i.e. the 3/4 point
First Quartile = ((9 + 1)/4)th term
= (10/4)th term
= 2.5th term
2.5th term = 2nd term + (0.5) (3rd term - 2nd term)
= (10) + (0.5) (12 - 10)
= 10+1
= 11
The First Quartile value is 11.
Second Quartile = $\frac{(9 + 1)}{2}$th term
= (10/2)th term
= 5th term
5th term is 18
So the second Quartile value is 18.
Third Quartile = $\frac{3(9 + 1)}{4}$th term
= $\frac{3(10)}{4}$th term
= 7.5th term
The 7.5th term is the average of the 7th and 8th terms = (25 + 30)/2 = 27.5
The Third Quartile value is 27.5.
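A quick check of these values with NumPy; `method='weibull'` selects the $(n+1)p$ quantile convention used above and requires NumPy ≥ 1.22 (an assumption worth verifying against your version):

```python
import numpy as np

data = np.array([10, 30, 5, 12, 20, 40, 25, 15, 18])
q1, q2, q3 = np.percentile(data, [25, 50, 75], method='weibull')
print(q1, q2, q3)   # 11.0 18.0 27.5
```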
4. `__iter__` & `__next__`
When building a class, implementing the built-in `__iter__` method makes the object iterable, and the next value can then be fetched with the built-in `__next__` method.
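A minimal sketch of a class implementing both methods (the `Countdown` name is made up for illustration):

```python
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # returning self makes the object its own iterator
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of iteration
        self.current -= 1
        return self.current + 1

for n in Countdown(3):
    print(n)   # 3, 2, 1
```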
5. defaultdict
Features:
- Similar to initializing a list with a default value in Java
- No KeyError
- Because of this design (no KeyError), it is more convenient than a plain dictionary: there is no need to check whether a key exists
- Example:

```python
from collections import defaultdict

d = defaultdict(list)   # missing keys default to an empty list
for i in range(5):
    d[i].append(i)      # no KeyError even though d[i] did not exist yet
```
6. `@staticmethod`
Think of it as a small utility method attached to a class. Because of the decorator, you must not add `self` to the method's parameters, and the method therefore cannot access the class's instance attributes.
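A small sketch (class and method names are made up for illustration):

```python
class Stats:
    def __init__(self, values):
        self.values = values

    @staticmethod
    def mean(values):
        # no `self` parameter: the method cannot see self.values
        return sum(values) / len(values)

print(Stats.mean([1, 2, 3]))   # callable on the class itself: 2.0
```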
7. log_softmax
Better than a plain softmax: working in log space is numerically more stable and makes the differences between class scores more pronounced.
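A sketch of the numerical-stability point: composing `log` with `softmax` underflows for extreme logits, while `F.log_softmax` stays finite:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1000.0, 0.0])           # extreme logits
print(torch.log(F.softmax(x, dim=0)))     # tensor([0., -inf]) -- underflow
print(F.log_softmax(x, dim=0))            # tensor([0., -1000.]) -- stable
```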
8. save dataloader
Use pkl:
```python
torch.save(dls, 'fname.pkl')
```
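Loading is symmetric; a sketch assuming the same session conventions (`dls` and `fname.pkl` come from the line above):

```python
import torch

dls = torch.load('fname.pkl')   # unpickles the saved DataLoaders object
```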
9. torch nn.functional vs nn
nn.functional stores no state. So for, e.g., `nn.functional.linear`, you must pass in the weight and the bias yourself.
`nn.Linear` is a Module class: instantiating it returns an object that owns its weight and bias.
`nn.functional.linear` returns a tensor of shape (N, ∗, out_features).
nn.functional is arguably more user friendly when you need unusual, fine-grained manipulation of the weights and biases.
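A side-by-side sketch of the two styles:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(8, 5)

# nn: the module owns its parameters
layer = nn.Linear(5, 3)
out1 = layer(x)

# nn.functional: stateless, you supply weight and bias explicitly
weight = torch.rand(3, 5, requires_grad=True)
bias = torch.rand(3, requires_grad=True)
out2 = F.linear(x, weight, bias)

print(out1.shape, out2.shape)   # torch.Size([8, 3]) torch.Size([8, 3])
```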
10. `preds = F.linear(feats, output_weight, output_bias)`
Here `F` is the `nn.functional` discussed in section 9.
11. `output_weight.grad.fill_(0)`
autograd by default frees the intermediate gradients that are not needed anymore, so that the memory usage is minimal.
retain_variables will only prevent autograd from freeing some buffers needed for backward (e.g. when you want to backprop multiple times through a graph). Use hooks to access intermediate gradients.
When using nn.functional with manually managed parameters, the usual pattern for resetting gradients is:
```python
# Reset gradients
local_optim.zero_grad()
output_weight.grad.fill_(0)
output_bias.grad.fill_(0)
```
12. `init_weight.detach().requires_grad_()`
- detach() detaches the output from the computational graph, so no gradient will be backpropagated along this variable.
- torch.no_grad says that no operation should build the graph.
The difference is that one refers only to the given variable it is called on, while the other affects all operations taking place within the with statement.
torch.no_grad can generally be used in the eval phase.
- detach(), on the other hand, should not be needed for classic CNN-like architectures; it is usually used for trickier operations.
- detach() is useful when you want to compute something that you can't / don't want to differentiate.
- For example, if you compute some indices from the output of the network and then want to use them to index a tensor: the indexing operation is not differentiable w.r.t. the indices, so you should detach() the indices before providing them.
- The version with torch.no_grad will use less memory.
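A sketch contrasting the two (toy tensors, just to show what records gradients):

```python
import torch

x = torch.rand(3, requires_grad=True)

y = (x * 2).detach()      # y is cut out of the graph
print(y.requires_grad)    # False

with torch.no_grad():
    z = x * 2             # no graph is built inside the block
print(z.requires_grad)    # False

w = x * 2                 # outside the block, operations are recorded
print(w.requires_grad)    # True
```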
13. torch vs torchvision
torchvision is essentially a toolbox: a collection of packages that support PyTorch projects and save you from reinventing the wheel. For example, the DenseNet model (not to be confused with a dense layer) can be called directly from torchvision.
A dense layer, also called a fully-connected layer, is a layer whose neurons connect to every neuron in the preceding layer.
torchvision.models.DenseNet
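A sketch of pulling a ready-made DenseNet; `densenet121` is one of the concrete variants torchvision ships (whether you pass `weights=...` or the older `pretrained=True` depends on your torchvision version):

```python
import torch
from torchvision import models

model = models.densenet121(weights=None)   # randomly initialized DenseNet-121
out = model(torch.rand(1, 3, 224, 224))    # standard ImageNet-sized input
print(out.shape)                           # torch.Size([1, 1000])
```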
14. `np.lib.stride_tricks.as_strided`
Simulates a sliding window to produce the output of filtering an image. The `shape` argument of the function changes the (viewed) size of the original image.
Example:
```python
import numpy as np

def conv2d(image, ftr):
    # output shape: (fh, fw, H - fh + 1, W - fw + 1)
    s = ftr.shape + tuple(np.subtract(image.shape, ftr.shape) + 1)
    # 4-D view where sub_image[:, :, k, l] is the window of image at offset (k, l);
    # the strides tuple is duplicated, no data is copied
    sub_image = np.lib.stride_tricks.as_strided(image, shape=s, strides=image.strides * 2)
    # multiply each window by the filter and sum over the filter axes
    return np.einsum('ij,ijkl->kl', ftr, sub_image)
```
Filter = 3×3, Image = 100×100
- `ftr.shape` is already a tuple; `tuple(np.subtract(image.shape, ftr.shape) + 1)` converts the numpy-array result back into a tuple so the two can be concatenated.
s = ftr.shape + tuple(np.subtract(image.shape, ftr.shape) + 1)
=> [3, 3] + (np.array([(100, 100) − (3, 3)]) + 1)
=> [3, 3] + [(97, 97) + 1]
=> [3, 3] + [98, 98]
=> [3, 3, 98, 98]
The last step here is just like concatenating two lists. This trick can be borrowed for constructing new shapes.
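A quick usage sketch of the `conv2d` above (random image, with a 3×3 mean filter as an assumed example):

```python
image = np.random.rand(100, 100)
ftr = np.ones((3, 3)) / 9.0   # assumed: a 3x3 mean filter
out = conv2d(image, ftr)
print(out.shape)              # (98, 98)
# sanity-check one output position against a direct computation
assert np.isclose(out[0, 0], (image[:3, :3] * ftr).sum())
```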
15. `np.einsum`
- Both torch and numpy support this operator.
- Multiplying two matrices by hand takes for loops to implement the summation rule; with einsum the code is concise, and you don't have to remember that the second matrix must be given in transposed form.
Basic usage:
`np.einsum('ij,ijkl->kl', ftr, sub_image)`
`ij,ijkl->kl`: `ij` is the shape of ftr; `ijkl` is the shape of sub_image; `ij,ijkl` means the `ij`-shaped matrix is multiplied with the `ijkl`-shaped matrix; `->kl` sets the rule for that product and specifies `kl` as the output.
Here `kl` is the outer loop and `ij` is the inner loop (the indices that get summed away).
Other uses:
Sum all values of x:
```python
x = np.ones(3)
sum_x = np.einsum('i->', x)
```
Permute x from axis order 'ijk' to 'kji' (a reversal of the axes):
```python
x = np.ones((5, 4, 3))
np.einsum('ijk->kji', x)
```
Element-wise multiplication:
`'ij,ij->ij'`
Batch multiplication:
```python
A = torch.rand((3, 2, 5))
B = torch.rand((3, 5, 3))
torch.einsum('ijk,ikl->ijl', A, B)
```
`i` is the batch dimension; effectively `jk` is multiplied with `kl`.
Matrix diagonal:
```python
x = torch.rand((3, 3))
torch.einsum('ii->i', x)
```
The output is just the values on the diagonal.
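A quick sketch verifying the batch-multiplication spelling against `torch.bmm`:

```python
import torch

A = torch.rand(3, 2, 5)
B = torch.rand(3, 5, 3)
out = torch.einsum('ijk,ikl->ijl', A, B)
print(out.shape)                              # torch.Size([3, 2, 3])
print(torch.allclose(out, torch.bmm(A, B)))   # True
```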
16. `x.flatten()`
Pass the range of dimensions to merge and flatten as the arguments. Original: x.shape = [4, 2, 2, 3, 16, 16]
```python
import torch

x = torch.rand(4, 2, 2, 3, 16, 16)   # original: x.shape = [4, 2, 2, 3, 16, 16]
x.flatten(1, 2).shape   # [4, 4, 3, 16, 16]   flattens dims 1 through 2
x.flatten(0).shape      # [12288]             flattens dims 0 through the end
x.flatten(3).shape      # [4, 2, 2, 768]      flattens dims 3 through the end
```
Only two arguments are allowed: the start index and the end index of the flattening.
```python
x.flatten(0, 1, 2)      # error!
x.flatten(0, 2).shape   # [16, 3, 16, 16]     flattens dims 0 through 2, as intended above
```
You cannot skip dimensions. If the dims to flatten have a gap (original: x.shape = [4, 2, 2, 3, 16, 16]) and you want to combine indices (1, 3, 5) => x.shape = [4, (2×3×16), 2, 16] => [4, 96, 2, 16]:
```python
x.flatten(1, 3, 5)      # error!
# fix: first permute the gapped dims so they are adjacent
x = x.permute(0, 1, 3, 5, 2, 4)   # torch.Size([4, 2, 3, 16, 2, 16])
x.flatten(1, 3).shape             # torch.Size([4, 96, 2, 16])
```
17. `nn.Embedding`
Used to define a lookup structure, essentially a trainable tensor held in reserve. For instance, in NLP or in a Transformer, you initialize a tensor from the input for subsequent training.
For example, the input queries of a Transformer decoder: initially just a tensor of random numbers defined to the required size; as training proceeds, this tensor keeps being updated.
```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3)   # define what kind of tensor we want
input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])   # say, Transformer encoder input
embedding(input)   # loosely like numpy's np.zeros_like(input) with a custom shape, but random and learnable
```
18. pytorch_lightning
pl.Trainer
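The note above is just a pointer to `pl.Trainer`; a minimal sketch of how it is typically wired up (model, data, and hyperparameters here are all made-up placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

train_dl = DataLoader(
    TensorDataset(torch.rand(64, 10), torch.randint(0, 2, (64,))),
    batch_size=8,
)
trainer = pl.Trainer(max_epochs=1)   # the Trainer owns the training loop
trainer.fit(LitModel(), train_dl)
```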