eol python: How can I identify a special EOL character using Python?

Latest recommended article published on 2024-05-29 20:12:12

weixin_39525300


Tags: eol python

I'm scraping a set of files that were originally PDFs, using Python. Having converted them to text, I had a lot of trouble stripping out the line endings, because I couldn't figure out what the line separator was. The trouble is, I still don't know.

It's not '\n' and, I don't think, '\r\n' either. However, I've managed to isolate one of these special characters. I literally have it in memory, and by calling my_str.replace(eol, ''), I can remove all of these characters from one of my files.

So my question is open-ended. I'm a bit lost when it comes to Unicode and such. How can I identify this character in my files without resorting to something ridiculous, like serializing it and then reading it back in? Is there a way I can refer to it by a code, perhaps? I can't get Python to show me what it actually IS. All I ever see when I print it, or call unicode(special_eol), is the character doing its job as a newline.

Please help! Thanks, and sorry if I'm missing something obvious.

Solution

To determine which specific character it is, you can use str.encode('unicode_escape') or repr() to get (in Python 2) an ASCII-printable representation of the character:

>>> print u'☃'.encode('unicode_escape')
\u2603
>>> print repr(u'☃')
u'\u2603'
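
The same idea carries over to Python 3, where every str is already Unicode. The following is only a minimal sketch under some assumptions: the helper name describe_char and the sample text are hypothetical, and U+2028 LINE SEPARATOR merely stands in for the unknown EOL character, since the question never says which character it turned out to be. In addition to the escaped form, it reports the code point and the official Unicode name via the standard unicodedata module.

```python
import unicodedata


def describe_char(ch):
    """Return (escaped form, code point, Unicode name) for one character.

    describe_char is a hypothetical helper name, not part of any library.
    """
    # unicode_escape yields an ASCII-only representation such as \u2028.
    escaped = ch.encode("unicode_escape").decode("ascii")
    code_point = "U+{:04X}".format(ord(ch))
    # Control characters have no Unicode name, so supply a default.
    name = unicodedata.name(ch, "<unnamed>")
    return escaped, code_point, name


# Hypothetical sample text: U+2028 stands in for the mystery line ending.
sample = "first line\u2028second line"

# Collect every control or non-ASCII character that appears in the text.
suspects = sorted({ch for ch in sample if ord(ch) < 32 or ord(ch) > 126})

for ch in suspects:
    print(describe_char(ch))
```

Once the escaped form is known, the character can be written directly in source code, for example my_str.replace('\u2028', ''), instead of keeping a copy of it in memory.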
