1. Detail step
Step 1: Replace HTML special characters with named character references
& ---> &
< ---> <
> ---> >
Step 2: Replace all line breaks with <br>
result = re.sub("\r\n?|\n", "<br>", subject)
Step 3: Replace double <br> tags with </p><p>
result = re.sub(r"<br>\s*<br>", "</p><p>", subject)
Step 4: Wrap the entire string with <p>...</p)
result = "<p>" + subject + "</p>"
2. Python code
def plainTextToHtml(subject):
import re
# Step 1 (plain text searches)
subject = re.sub("&", "&", subject)
subject = re.sub("<", "<", subject)
subject = re.sub(">", ">", subject)
# Step 2
subject = re.sub("\r\n?|\n", "<br>", subject)
# Step 3
subject = re.sub(r"<br>\s*<br>", "</p><p>", subject)
# Step 4
subject = "<p>" + subject + "</p>"
return subject
3. Test
In [2]: plainTextToHtml("Test.")
Out[2]: '<p>Test.</p>'
In [3]: plainTextToHtml("Test.\n")
Out[3]: '<p>Test.<br></p>'
In [4]: plainTextToHtml("Test.\n\n")
Out[4]: '<p>Test.</p><p></p>'
In [5]: plainTextToHtml("Test1.\nTest2.")
Out[5]: '<p>Test1.<br>Test2.</p>'
In [6]: plainTextToHtml("Test1.\n\nTest2.")
Out[6]: '<p>Test1.</p><p>Test2.</p>'
In [7]: plainTextToHtml("< AT&T >")
Out[7]: '<p>< AT&T ></p>'
Python 正则表达式将纯文本转化为HTML格式
最新推荐文章于 2024-03-28 15:29:44 发布