Vim advanced usage

It has been a long time since my last update.

While working on text steganography today, I ran into a requirement:

Mazhuang ⁤‍⁤‌‌‍⁤⁡⁢‌⁤‍‌⁡‍‍‍⁡‌⁡‍⁢‍‌‍‍⁣‌⁡‍⁣⁢⁣‍⁣⁢⁣⁢⁡⁢⁣⁢⁣⁢‍⁣⁡is here

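Copy this string and you will find that it carries hidden content. What does that content look like? Save the string to a file and open it in vim, and the hidden characters show up.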

Most text editors cannot display these characters. So what are they?

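They are zero-width characters: non-printing Unicode characters that were originally meant for typesetting control, but that also happen to be very convenient for hiding data in text files or text strings.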

U+200B | Zero-width space

The zero-width space can be used to enable line wrapping in long words, when using languages that don’t use spaces to separate words, or after certain characters like a slash /. Most applications treat the zero-width space like a regular space for word wrapping purposes, even though it is not visible.


U+200C | Zero-width non-joiner

The zero-width non-joiner is a non-printing character used in writing systems that make use of ligatures. When placed between two characters that would otherwise be combined into a ligature, a zero-width non-joiner tells the font engine not to combine them.

U+200D | Zero-width joiner

The zero-width joiner is essentially the opposite of the zero-width non-joiner. When placed between two characters that would otherwise not be connected, a zero-width joiner causes them to be printed in their connected forms (if they have one).

U+2060 | Word joiner

The word joiner, used in language scripts that don’t use spaces, prevents a line break between two characters, which can be useful to manually control sentence breaking in applications that don’t know how to properly wrap text in some languages. Its function is identical to the zero-width no-break space.

Many applications don’t have support for the word joiner, including Internet Explorer. Chrome and Firefox support the word joiner however.

U+FEFF | Zero-width no-break space

The zero-width no-break space is no longer to be used for its original purpose. Character U+FEFF should solely be used as a Byte Order Mark at the start of a Unicode text file. To keep lines from breaking between two characters, the word joiner (above) should be used instead.
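
To check which of the characters listed above are hiding in a string, here is a minimal Python sketch. The function name find_zero_width and the sample string are hypothetical, chosen only for illustration, and the set of code points covers just the five characters described above, not every invisible character.

```python
import unicodedata

# The five code points described above. This set is an assumption made for
# illustration; it is not an exhaustive list of invisible characters.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}

def find_zero_width(text):
    """Return (index, 'U+XXXX', name) for each zero-width character in text."""
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) in ZERO_WIDTH:
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# Hypothetical sample with one U+200C and one U+200D embedded.
sample = "Ma\u200czhuang\u200d is here"
for hit in find_zero_width(sample):
    print(hit)
```

Running it on text pasted from elsewhere is a quick way to decide which code points a cleanup pattern needs to cover.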

But I digress.

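My requirement was to strip all zero-width characters in as little time as possible. This can also be done with sed -i, which you can look up yourself; here I only cover vim.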

:%s/\%u200c\|\%u200d//gc

Vim then prompts:

replace with  (y/n/a/q/l/^E/^Y)?

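Type 'a'

and that is it. \%u matches a character by its hexadecimal Unicode code point (up to four hex digits). \| separates alternative branches, so several patterns can be matched at once; add more \%uXXXX branches to cover other code points. The gc flags mean global (every match on each line) with a confirmation prompt; you can of course leave out the c.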
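
For comparison outside vim, the same alternation can be sketched with Python's re module. This is only an illustration of the pattern's logic, not a replacement for the vim command; the name strip_joiners and the sample string are hypothetical.

```python
import re

# Same alternation as the vim pattern: U+200C or U+200D.
PATTERN = re.compile("\u200c|\u200d")

def strip_joiners(text):
    """Remove every U+200C and U+200D, like the g flag without c."""
    return PATTERN.sub("", text)

# Hypothetical sample with one U+200C and one U+200D embedded.
cleaned = strip_joiners("Ma\u200czhuang\u200d is here")
print(cleaned)  # Mazhuang is here
```

As with the vim command, any code point not named in the pattern (U+200B, for example) is left untouched.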

Good material on vim is really scarce, so it is best to just read the official documentation, for example Vim: change.txt.
