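Scraping article pages on 中财网 (cfi.cn)
On some article pages, the body text of the news story has been replaced with Unicode-encoded text.
For example: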
http://industry.cfi.cn/p20210415000078.html
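The post does not show what the encoded text looks like, so here is a minimal sketch that assumes it uses one of three common forms: `\uXXXX` (JavaScript/JSON string escapes), `%uXXXX` (output of JavaScript's legacy `escape()`), or HTML numeric references such as `&#20013;`. The function name `decode_unicode_escapes` is hypothetical and is not part of any library.

```python
import html
import re


def decode_unicode_escapes(text: str) -> str:
    """Turn common Unicode escape forms back into characters (assumed formats)."""
    # \uXXXX escapes, as found inside JavaScript or JSON string literals
    text = re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)
    # %uXXXX escapes, as produced by JavaScript's legacy escape()
    text = re.sub(r"%u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)
    # &#NNNN; / &#xHHHH; HTML numeric character references
    return html.unescape(text)


print(decode_unicode_escapes(r"\u4e2d\u8d22\u7f51"))
```

The regex approach leaves characters that are already decoded untouched, whereas `s.encode().decode("unicode_escape")` would mangle any non-ASCII Chinese text mixed into the string. It does not join surrogate pairs, which only matters for characters outside the Basic Multilingual Plane (common Chinese characters are inside it).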
Code:
import requests
from bs4 import BeautifulSoup
import re
import time
import json
import urllib.parse
# url = 'http://industry.cfi.cn/p20210415000078.html'
# url = 'http://industry.cfi.cn/p20210415000413.html'
url = 'http://industry.cfi.cn/p20210411000066.html'
r = requests.get(url)
html = r.text
# print(html)
# pages with encoded content presumably embed the body in a <script> variable
if '</div><script>var' in html:
    ...
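The `'</div><script>var'` check suggests the encoded body sits in a JavaScript variable right after a closing `</div>`. A minimal sketch for pulling out such string literals, assuming a layout like `</div><script>var name = '...';` (the exact markup on cfi.cn is not shown in the post, so the pattern and the name `extract_script_vars` are hypothetical):

```python
import re

# Hypothetical layout: a <script> that directly follows </div> and assigns
# a single- or double-quoted string to a JavaScript variable, e.g.
#   </div><script>var content = '\u4e2d...';</script>
SCRIPT_VAR = re.compile(
    r"</div><script>var\s+(\w+)\s*=\s*(['\"])(.*?)\2\s*;",
    re.S,  # let . match newlines in long multi-line literals
)


def extract_script_vars(page_html: str) -> dict:
    """Return {variable_name: raw_string_literal} for every matching script."""
    return {m.group(1): m.group(3) for m in SCRIPT_VAR.finditer(page_html)}


sample = r"<div>x</div><script>var content = '\u4e2d';</script>"
print(extract_script_vars(sample))
```

The captured literal still contains the escape sequences and has to be decoded afterwards. The non-greedy `(.*?)` stops at the first matching quote, so a literal containing escaped quotes (`\'`) would be cut short; in that case parsing the script with a real JavaScript tokenizer is safer.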