Converting plain text to HTML with Python regular expressions

1. Detailed steps

Step 1: Replace HTML special characters with named character references

& ---> &amp;

< ---> &lt;

> ---> &gt;
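
The order of these three replacements matters: the ampersand has to be handled first, otherwise the `&` in a freshly produced `&lt;` would be escaped a second time. A minimal sketch of this step follows. The helper name `escape_html_chars` is made up for illustration, and the comparison with the standard library's `html.escape` is an addition, not part of the original recipe.

```python
import html

def escape_html_chars(text):
    """Escape &, < and > (hypothetical helper name, for illustration only)."""
    # The ampersand must come first so the entities added below are not re-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text

sample = "< AT&T >"
escaped = escape_html_chars(sample)
print(escaped)

# html.escape with quote=False escapes the same three characters.
same_as_stdlib = escaped == html.escape(sample, quote=False)
print(same_as_stdlib)
```

If quotes also need escaping (for example inside attribute values), `html.escape` with its default `quote=True` handles that as well.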

Step 2: Replace all line breaks with <br>

result = re.sub("\r\n?|\n", "<br>", subject)
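
The pattern `\r\n?|\n` covers the three common line-ending conventions: `\r\n` (Windows), a lone `\r` (classic Mac OS), and a lone `\n` (Unix). A small sketch, assuming `<br>` is the replacement tag; the name `breaks_to_br` is hypothetical.

```python
import re

def breaks_to_br(text):
    """Replace every line break with <br> (hypothetical helper name)."""
    # \r\n? matches \r\n or a lone \r; the alternative \n matches a lone \n.
    return re.sub(r"\r\n?|\n", "<br>", text)

print(breaks_to_br("a\r\nb\rc\nd"))
```

Because `\r\n?` is tried first and is greedy, a Windows `\r\n` pair is consumed as a single match and yields one tag, not two.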

Step 3: Replace double <br> tags with </p><p>

result = re.sub(r"<br>\s*<br>", "</p><p>", subject)
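
Because of the `\s*` between the two tags, lines that contain only whitespace also count as paragraph breaks. An odd run of `<br>` tags leaves one behind, since each match consumes exactly two. A sketch under the assumption that the previous step produced `<br>` tags; `br_pairs_to_paragraphs` is a hypothetical name.

```python
import re

def br_pairs_to_paragraphs(text):
    """Turn each pair of <br> tags into a paragraph boundary (hypothetical name)."""
    return re.sub(r"<br>\s*<br>", "</p><p>", text)

print(br_pairs_to_paragraphs("one<br><br>two"))      # adjacent pair
print(br_pairs_to_paragraphs("one<br>  <br>two"))    # whitespace between the tags
print(br_pairs_to_paragraphs("one<br><br><br>two"))  # three tags: one <br> remains
```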

Step 4: Wrap the entire string with <p>...</p>

result = "<p>" + subject + "</p>"

2. Python code

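import re

def plainTextToHtml(subject):
    # Step 1 (plain text searches); the ampersand must be replaced first
    subject = subject.replace("&", "&amp;")
    subject = subject.replace("<", "&lt;")
    subject = subject.replace(">", "&gt;")
    # Step 2: replace all line breaks with <br>
    subject = re.sub(r"\r\n?|\n", "<br>", subject)
    # Step 3: replace double <br> tags with </p><p>
    subject = re.sub(r"<br>\s*<br>", "</p><p>", subject)
    # Step 4: wrap the entire string with <p>...</p>
    subject = "<p>" + subject + "</p>"
    return subject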

3. Test

In [2]: plainTextToHtml("Test.")
Out[2]: '<p>Test.</p>'

In [3]: plainTextToHtml("Test.\n")
Out[3]: '<p>Test.<br></p>'

In [4]: plainTextToHtml("Test.\n\n")
Out[4]: '<p>Test.</p><p></p>'

In [5]: plainTextToHtml("Test1.\nTest2.")
Out[5]: '<p>Test1.<br>Test2.</p>'

In [6]: plainTextToHtml("Test1.\n\nTest2.")
Out[6]: '<p>Test1.</p><p>Test2.</p>'

In [7]: plainTextToHtml("< AT&T >")
Out[7]: '<p>&lt; AT&amp;T &gt;</p>'
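
The interactive checks above can also be written as assertions so that they are repeatable. The sketch below restates the function so that it runs without anything else, and assumes `<br>` and `<p>` are the tags used; the expected strings are derived by applying the four steps by hand.

```python
import re

def plainTextToHtml(subject):
    # Step 1: escape HTML special characters, ampersand first
    subject = subject.replace("&", "&amp;")
    subject = subject.replace("<", "&lt;")
    subject = subject.replace(">", "&gt;")
    # Step 2: line breaks become <br>
    subject = re.sub(r"\r\n?|\n", "<br>", subject)
    # Step 3: a double <br> becomes a paragraph boundary
    subject = re.sub(r"<br>\s*<br>", "</p><p>", subject)
    # Step 4: wrap everything in a paragraph
    return "<p>" + subject + "</p>"

# Input text mapped to the HTML expected from the four steps.
cases = {
    "Test.": "<p>Test.</p>",
    "Test.\n": "<p>Test.<br></p>",
    "Test.\n\n": "<p>Test.</p><p></p>",
    "Test1.\nTest2.": "<p>Test1.<br>Test2.</p>",
    "Test1.\n\nTest2.": "<p>Test1.</p><p>Test2.</p>",
    "< AT&T >": "<p>&lt; AT&amp;T &gt;</p>",
}

for text, expected in cases.items():
    actual = plainTextToHtml(text)
    assert actual == expected, (text, actual)
print("all cases pass")
```

Trailing line breaks produce a dangling `<br>` or an empty `<p></p>`; stripping the input before conversion would be one way to avoid that, if it matters for the use case.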