

Original [Paper Summary] The Future of Computational Linguistics: On Beyond Alchemy

1950s empiricism: Info Theory, AI as applied statistics; 1970s rationalism: formal language theory and logic; 1990s empiricism: stochastic grammars (probability & preference); 2010s empiricism: deep nets. Past: CL is an interdisciplinary topic that has b…

2021-07-28 05:11:14 314

Original [Paper Summary: Modesty is the Formula for Success] Good applications for crummy machine translation

There is a risk that eval can devolve into mindless metrics. Good Applications for Crummy MT — Kenneth W. Church & Eduard H. Hovy, AT&T Bell Laboratories, USC ISI. The success of the eval often depends very strongly on the selection of an appropriat…

2021-07-26 10:27:54 257

Original [Paper Summary] Optimal Brain Damage

Optimal Brain Damage — LeCun, Denker and Solla, 1989, Advances in Neural Information Processing Systems. We introduce OBD for reducing the size of a learning network by selectively deleting weights based on second-derivative information. We show that OBD…

2021-07-26 08:45:29 257

Original Mutual Achievement: Massive quantities of data; The new norm of eval

The data-intensive approach to language, which is becoming known as Text Analysis, takes a pragmatic approach that is well suited to meet the recent emphasis on numerical evaluations and concrete deliverables [A Pendulum Swung Too Far]. Text Analysis focu…

2021-07-26 08:04:00 487

Original Whither Speech Recognition: 25 Years, and Another 25 Years

Pierce's harsh criticism. Whither Speech Recognition — J.R. Pierce, 1969. In deception, studied and artful deceit is apt to succeed better and more quickly than science. Indeed, a wag has proposed that computers are becoming so nearly human that they can act…

2021-07-22 06:23:45 292

Original [Paper Summary: My Machine sucks, or eval sucks? ... or both?] A Survey of 25 Years of Evaluation

A Survey of 25 Years of Evaluation — Kenneth Ward Church & Joel Hestness, 2019. Sometimes the numbers are too good to be true, and sometimes the truth is better than the numbers. Sometimes the problem is not with the numbers but with the interpretation.

2021-07-22 04:51:50 146

Original [Paper Summary] A Pendulum Swung Too Far

There is a trend of oscillation between Rationalism and Empiricism and back, with a switch every couple of decades. 1950s: Empiricism (Shannon, Skinner, Firth, Harris); 1970s: Rationalism (Chomsky, Minsky); 1990s: Empiricism (IBM Speech Group, AT&T Bell Labs…

2021-07-21 08:49:16 352

Original Explainability & Reviewing: The responsibility finally goes back to the audience, i.e. EVERYONE

A call for Explanation — insights should be much more valued than numbers. Emerging trends: I did it, I did it, I did it, but … — Kenneth Ward Church, 2017. Does it make it ok for machines to do bad things if no one knows what's happening and why, including…

2021-07-21 08:48:07 92

Original What we (I) do on the surface, what we (I) do by heart

Emerging trends: A tribute to Charles Wayne — Kenneth Ward Church, 2017. Charles Wayne restarted funding in speech and language in the mid-1980s, after a funding winter brought on by Pierce's glamour-and-deceit criticisms in the ALPAC report and Whither Spe…

2021-07-21 08:46:32 93

Original MT from winter to spring; the spring has lasted till now

When is the next ALPAC report due? — Margaret King, 1984, University of Geneva, Switzerland. Machine translation has a somewhat chequered history — Margaret King. There were already proposals for automatic translation systems in the 30's, but it was not unt…

2021-07-16 06:14:06 192

Original The AI Winter, ALPAC, and an Interesting History an AI'er Has to Know

ALPAC: The report is entitled Languages and machines: computers in translation and linguistics. It was supposedly concerned, therefore, not just with MT but with the broader field of computational linguistics. [Hutchins 1996] It might be simpler and more…

2021-07-15 04:02:26 308

Original [Meta Summary] A Stream of Papers on Probing Transformer Language Models

I first learned of the probing direction from Voita's NLP with Friends talk, where I vaguely understood it as a kind of "neural network interpretation." That understanding is not wrong, but it is only half the story: after reading a number of papers along this stream this week, I found that probing has a second goal, namely to serve as evaluation metrics for representation learning. That, however, is even more of a dark art than NLG evaluation. At least NLG…

2021-07-11 07:58:10 116

Original [Paper Summary] Pareto Probing: Trading Off Acc for Complexity [Pimental 2020]

Pareto Probing: Trading Off Acc for Complexity [Pimental 2020]. Keypoints: Call for harder probing tasks. Toy probing tasks, such as POS labeling and dependency arc labeling, are inadequate to evaluate the linguistic features encoded in contextual word repres…

2021-07-11 07:05:08 105

Original [Paper Summary] Information-theoretic probing for linguistic structure [Pimental 2020]

Information-theoretic probing for linguistic structure [Pimental 2020]. Teaser: … under our operationalization, the endeavour of finding syntax in contextualized embeddings of sentences is nonsensical. This is because, under Assumption 1, we know the answer a…

2021-07-10 22:29:06 131

Original [Paper Summary] Designing and Interpreting Probes with Control Tasks [Hewitt & Liang 2019]

Designing and Interpreting Probes with Control Tasks [[Hewitt & Liang 2019](https://arxiv.org/abs/1909.03368)]. tl;dr: A good probe should be selective, achieving high linguistic task acc and low control task acc. Motivation for control tasks: Favor '…

2021-07-10 22:13:34 136

Original [Paper Summary] When Do You Need Billions of Words of Pretraining Data? [Zhang 2020]

When Do You Need Billions of Words of Pretraining Data? [Zhang 2020]. Excellent paper: only after finishing the summary did I realize that the discussion section is all Easter eggs, every one with real depth! Core research question: What exact knowledge or skills do Transformer LMs learn from large-scale pretraining that they cannot learn from less data?

2021-07-10 07:13:12 263

Original [Paper Summary] Frustratingly Short Attn Spans in Neural LM [Daniluk 2017]

Frustratingly Short Attn Spans in Neural LM. This is an old ICLR 2017 paper whose main contribution is using different representations for the key, value, and next-word distribution. The naming differed back then: its "key" is today's query, its "value" is today's key, and its "next-word distribution" is today's value. While reading I briefly assumed that today's standard practice of projecting to Q, K, V originated with this paper,…

2021-07-10 05:01:59 63

Original [Paper Summary] What Will it Take to Fix Benchmarking in NLU? [Bowman 2021]

What Will it Take to Fix Benchmarking in NLU? [Bowman 2021]. tl;dr: We lay out 4 criteria that we argue NLU benchmarks should meet. Most current benchmarks fail at these criteria, and adversarial data collection does not meaningfully address the causes of…

2021-07-10 04:20:00 137

Original [Paper Summary] A Primer in BERTology: What We Know About How BERT Works [Rogers 2020]

A Primer in BERTology: What We Know About How BERT Works [Rogers 2020]. Probing works strive to learn about the types of linguistic knowledge encoded (e.g. POS, dependency; [Warstadt 2019] - Five Analysis with NPIs; [Warstadt 2020] - RoBERTa acquires a preference for lingu…

2021-07-08 08:05:23 279

Original [Paper Summary] Information-Theoretic Probing with Minimum Description Length [Voita & Titov 2020]

Information-Theoretic Probing with Minimum Description Length

2021-07-08 04:28:30 114

Original [Paper Summary] Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals [Elazar 2020]

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals. tl;dr: Probing results cannot support behavioral conclusions. Probing provides no evidence for or against the actual use of this information by the model. We focus on how the information is…

2021-07-07 22:39:55 123

Original [Paper Summary] oLMpics - On what LM Pre-training Captures [Talmor 2019]

oLMpics - On what LM Pre-training Captures. Keypoints: We propose a diverse set of probing tasks for types of symbolic reasoning that are potentially difficult to capture using a LM objective. We provide an analysis of skills that current LMs possess. Their…

2021-07-07 08:43:48 77

Original [Paper Summary] Evaluating repres. by the complexity of learning low-loss predictors [Whitney 2020]

Evaluating representations by the complexity of learning low-loss predictors. tl;dr: The previous thread was all about the acc-complexity trade-off. This paper, however, argues that as the eval dataset size changes, the dynamics of the acc-complexity trade-off also change. And if you use metrics that handle…

2021-07-04 05:41:27 191

Original Incomplete notes on nan, exploding gradients, and exploding loss (for my own reference)

1. Add automatic anomaly detection to catch bad values: `import torch`; for the forward pass, enable autograd anomaly detection with `torch.autograd.set_detect_anomaly(True)`; for the backward pass, detect during differentiation with `with torch.autograd.detect_anomaly(): loss.backward()`. 2. In custom losses, make sure the denominator of any division is never 0; add an eps to the denominator for numerical stability. 3. log(0) and sqrt(0) produce nan, and 0 * inf also becomes nan; nan cannot be detected with `==` or i…

2021-05-19 04:34:55 2403
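The three checks in the note above can be combined into a short runnable sketch (a minimal illustration under my own naming, not the post's full code; `safe_div` is a hypothetical helper):

```python
import torch

# Point 1 (forward pass): record op stack traces so that a nan/inf
# produced during backward can be traced to the op that created it.
torch.autograd.set_detect_anomaly(True)

EPS = 1e-8  # Point 2: guard denominators in custom losses

def safe_div(num: torch.Tensor, den: torch.Tensor) -> torch.Tensor:
    # Adding EPS keeps the division finite even when den underflows to 0.
    return num / (den + EPS)

# Point 3: nan never compares equal to itself, so `==` cannot detect it.
x = torch.tensor([1.0, float("nan"), 3.0])
assert not bool((x == x).all())    # the nan slot compares unequal
assert bool(torch.isnan(x).any())  # use torch.isnan instead

# Backward-pass variant of point 1: enable detection only around backward():
#   with torch.autograd.detect_anomaly():
#       loss.backward()
```

Anomaly detection slows training noticeably, so it is usually enabled only while hunting a bug.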

Original Linux | create git repo from existing local folder (for my own reference)

`cd` into the directory where you want the repository and run `git init` (prints `Initialized empty Git repository in <path>`), then `git add .`. If this directory was itself cloned, it may still carry some metadata from the original repository, and you may see an error like `adding embedded git repository: <repo_name>`; in that case run `git remote remove origin`. Then `git commit`. Log in to GitHub in your browser and create a new repo named …

2021-05-10 09:00:34 138

Original doskey: Setting Windows Command Aliases

Data download notes. For the preliminary round, our data are as follows. amap_traffic_train.zip: training-set annotation data, keyed by folder name; each folder contains a sequence of 3-5 jpg frames, stored in order as 1-5.jpg. amap_traffic_annotations_train.json: training-set annotation data stored in order, with the folder name as key and the sequence's annotation as value; see the competition page for the annotation format description. amap_traffic_test.zip: preliminary-round test video sequences. amap_traffic_an…

2020-11-04 14:06:13 859

Original A Detailed Look at the Love-Hate Entanglement of Logistic Regression, Naive Bayes, and the Maximum Entropy Principle (long read)

I noticed long ago that logistic regression seems to have countless convoluted (CZFZ), puzzling (PSML) ties to the posterior probability formula in Naive Bayes, as well as to maximum likelihood, entropy, cross-entropy, the Bernoulli distribution, regression analysis, odds, and more. The logistic distribution always seemed to be favored in many places, with some nice properties frequently mentioned, yet I could never quite say why. So I finally went through textbooks, lecture notes, Zhihu, and CSDN and got it all straightened out! This is a very, very long article; you can also tap the links below for the version split into sections: A Detailed Look at the Love-Hate Entanglement of Logistic Regression, Naive Bayes, and the Maximum Entropy Principle (Part 1); A Detailed…

2020-06-18 11:41:26 1542 2

Original A Detailed Look at the Love-Hate Entanglement of Logistic Regression, Naive Bayes, and the Maximum Entropy Principle (Part 6)

Section 6: untangling the intimate ties between logistic regression and the maximum entropy model. Maximum entropy principle: when learning a probabilistic model, among all possible models satisfying the constraints, the model with the largest entropy is best. The entropy of a probability distribution P is $H(P)=-\sum_x P(x)\log P(x)$. Entropy satisfies the inequality $0 \leq H(P) \leq \log|X|$, where $|X|$ is the number of possible values of the random variable X…

2020-06-17 20:32:11 215

Original A Detailed Look at the Love-Hate Entanglement of Logistic Regression, Naive Bayes, and the Maximum Entropy Principle (Part 5)

Section 5: the naive little brother among classifiers, Naive Bayes. The Naive Bayes text classification model: consider a model where $P(y_i, d_i)$ denotes the joint probability of a document and its label, with $d_i$ the text of the i-th training example. Assume every word in $d_i$ is a feature. Conditional independence assumption: given the text's label, the features are distributed independently of one another. (Given the label $y_i$, the probability of $d_i$ equals the product of the occurrence probabilities of each word in the text.) Applying Bayes' rule for conditional probability: $P(y_i, d_i)=P(y=y_i)$…

2020-06-17 20:17:08 294

Original A Detailed Look at the Love-Hate Entanglement of Logistic Regression, Naive Bayes, and the Maximum Entropy Principle (Part 4)

Section 4: a miraculous coincidence, the loss function of logistic regression. 1. Logistic Loss: negative sum of log accuracy. Suppose a correct prediction scores 1 point and an incorrect one 0, with label $\in \{1, -1\}$. Then for the i-th training example, if the true label is 1, the probability of scoring 1 point is $\frac{1}{1+\exp(-\vec{w}^T\vec{x}_i)}$; if the true label is -1, the probability of scoring 1 point is $\frac{\exp(-\vec{w}^T\vec{x}_i)}{1+\exp(-\vec{w}^T\vec{x}_i)}$…

2020-06-17 20:05:45 227
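The two case probabilities quoted above fold into a single expression, which is what makes the "negative sum of log accuracy" view work. A worked step, consistent with the formulas in the summary:

```latex
% For y_i = -1:  exp(-w^T x_i) / (1 + exp(-w^T x_i)) = 1 / (1 + exp(w^T x_i)),
% so for y_i in {1, -1} both cases read
P(\text{scoring 1 on example } i) = \frac{1}{1+\exp\left(-y_i\,\vec{w}^{T}\vec{x}_i\right)},
% and the negative sum of log accuracies is the familiar logistic loss
L(\vec{w}) = \sum_i \log\left(1+\exp\left(-y_i\,\vec{w}^{T}\vec{x}_i\right)\right).
```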

Original A Detailed Look at the Love-Hate Entanglement of Logistic Regression, Naive Bayes, and the Maximum Entropy Principle (Part 3)

Section 3: switching perspective to machine learning. We now carry the symbols and terms from the statistics discussion over into machine learning. Terms to translate (symbol: statistical regression analysis → machine learning): $\vec x_i$: independent variable of the $i^{th}$ sample → feature vector of the i-th data point; $\vec\beta$: regression coefficients → weights (parameters) $\vec w$, the thing to be learned; $P_i$: dependent variable → the output, the thing to be predicted

2020-06-17 20:02:06 333

Original A Detailed Look at the Love-Hate Entanglement of Logistic Regression, Naive Bayes, and the Maximum Entropy Principle (Part 2)

Section 2: the logistic in statistical regression analysis. The logistic distribution: let X be a random variable. The logistic distribution is the one with the following cumulative distribution function and probability density function: $F(x) = P(X \leq x) = \frac{1}{1+e^{-(x-\mu)/s}}$ and $f(x) = F'(x) = \frac{e^{-(x-\mu)/s}}{s\left(1+e^{-(x-\mu)/s}\right)^2}$…

2020-06-17 20:00:22 188
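As a quick consistency check on the two formulas in the snippet above, the density is just the derivative of the CDF by the chain rule:

```latex
f(x) = \frac{d}{dx}\left(1+e^{-(x-\mu)/s}\right)^{-1}
     = \frac{\frac{1}{s}\,e^{-(x-\mu)/s}}{\left(1+e^{-(x-\mu)/s}\right)^{2}}
```

which matches the stated $f(x)$, with $\mu$ the location and $s$ the scale parameter.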

Original A Detailed Look at the Love-Hate Entanglement of Logistic Regression, Naive Bayes, and the Maximum Entropy Principle (Part 1)

I noticed long ago that logistic regression seems to have countless convoluted (CZFZ), puzzling (PSML) ties to the posterior probability formula in Naive Bayes, as well as to maximum likelihood, entropy, cross-entropy, the Bernoulli distribution, regression analysis, odds, and more. The logistic distribution always seemed to be favored in many places, with some nice properties frequently mentioned, yet I could never quite say why. So I finally went through textbooks, lecture notes, Zhihu, and CSDN and got it all straightened out! Prologue: taking an unconventional route, starting from ecology. Population ecology studies the survival patterns and state of a single species; for changes in the number of individuals in a population there are the so-called Malthusian model and the logis…

2020-06-14 09:06:36 383

Original Reading Notes on Skin in the Game (《非对称风险》), Part 3

Notes on Volumes 1-3 are in Part 1; notes on Volumes 4-5 are in Part 2. Note: these reading notes roughly follow the book's table of contents; black text is excerpted from the original, and blue text is my own thoughts after reading. Highlighted sentences are those I consider summarizing or especially weighty. A few excerpts are placed in other chapters' notes because of topical relatedness, with a remark where that happens. Volume 6: The agency problem revisited. On the whole this volume raises no new arguments, and the content is a bit scattered, so logically, compared with the pre…

2020-04-19 09:02:45 732

Original Reading Notes on Skin in the Game (《非对称风险》), Part 2

For notes on the Introduction and Volumes 1-2, please see the previous post. Note: these reading notes roughly follow the book's table of contents; black text is excerpted from the original, and blue text is my own thoughts after reading. Highlighted sentences are those I consider summarizing or especially weighty. A few excerpts are placed in other chapters' notes because of topical relatedness, with a remark where that happens. Volume 3: Wolves among dogs: slavery is more widespread than we imagine. Apart from the mafia and monasteries, other industries use gentler, subtler means to bring employees into "skin in the game." How…

2020-04-18 18:38:41 494

Original Reading Notes on Skin in the Game (《非对称风险》), Part 1

Staying home recently, I finished Taleb's Skin in the Game. It is mainly about the consequences of uncertainty in the real world, using "skin in the game" to interpret phenomena in economics, social science, politics, sales, religion, and many other fields. I rather admire the author's ability to make his case hang together, because not all of the arguments are self-evident, yet he always finds a way to string seemingly unrelated fields and social phenomena onto the common thread of shared risk. Some points are inevitably forced, but there are still plenty of fresh angles and perspectives worth savoring. Note: these reading notes roughly fol…

2020-04-17 21:19:32 1878

Original Bit Manipulation — LeetCode Solutions for the Single Number Series

Bit manipulation requires you to picture every variable as a string of binary digits and operate on them with logical AND, OR, NOT, and so on. In the single-number problems, you need to picture each int as a 32-bit binary string, so the first step is switching into binary thinking. The solutions genuinely require a "trick" and are quite hard to invent if you have never seen this type of problem. Single Number: Given an array of integers, every eleme…

2020-03-16 17:23:51 894
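The classic solution to the basic Single Number problem uses the XOR identities `a ^ a == 0` and `a ^ 0 == a`: paired values cancel, leaving the element that appears once. A minimal sketch (not necessarily the post's own code):

```python
from functools import reduce
from operator import xor

def single_number(nums: list[int]) -> int:
    # a ^ a == 0 and a ^ 0 == a, so every paired value cancels out,
    # leaving the element that appears exactly once.
    return reduce(xor, nums, 0)

print(single_number([4, 1, 2, 1, 2]))  # 4
```

The follow-up variants (every other element appearing three times, or two distinct singletons) need extra bit-counting or partitioning tricks on top of this.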

Original LeetCode Solution + Code — Implement strStr

An earlier article of mine walked through string matching algorithms in detail. Below is code implementing the KMP algorithm. A brief recap of the idea: compute, for each state i, its shadow state shadow[i] (note shadow[i] < i), where shadow[i] = σ(text), with text = P[0,…,i-1] and pattern = P[0,…,i-2]. σ…

2020-03-05 09:31:41 153
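For reference, here is a compact strStr in the spirit of the recap (a sketch using the standard prefix-function formulation of KMP, whose indexing differs slightly from the post's σ-based shadow definition):

```python
def str_str(text: str, pattern: str) -> int:
    """Index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # shadow[i]: length of the longest proper prefix of pattern[:i+1]
    # that is also its suffix -- the state to fall back to on mismatch.
    shadow = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = shadow[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        shadow[i] = k
    # Scan the text once, reusing shadow to avoid re-examining characters.
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = shadow[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

print(str_str("hello", "ll"))  # 2
```

Building `shadow` is O(len(pattern)) and the scan is O(len(text)), giving the linear-time bound that motivates KMP over the naive quadratic scan.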

Original String Matching Algorithms: Everything from Start to Finish

The motivation for this technical post was learning the KMP algorithm through the LeetCode problem "Implement strStr"; it felt twisty enough on a first encounter that I wanted to record the solution idea. But then I kept adding more and more material on string matching algorithms, and the article turned into…

2020-02-25 21:39:01 1215

Translated Language Models are Unsupervised Multitask Learners: Paper Digest

As soon as GPT-2 came out it flooded everyone's feeds, yet OpenAI, claiming fear that it would be used to generate malicious text, did not open-source it and instead released a much smaller model (and people have shown in practice that the small model is far weaker). Still, beyond enjoying the gossip, and quietly envying the deep-pocketed OpenAI for the money it burned on this behemoth of a model that leaves the rest of us small fry in academia miles behind, we should pick up the paper and go through it properly.

2019-05-13 10:46:30 1487
