0X01 Problem Description
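Markdown documents exported from Yuque cannot be rendered correctly by Hexo or CSDN: Yuque uses hotlink protection, so the images fail to load.
This script collects the image links in the Markdown files exported from Yuque, downloads the images automatically into an images directory next to the script, and generates a new document, test.md, in which the original image links are replaced.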
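To make the transformation concrete, here is a minimal sketch of the link rewrite applied to a single line. It is not the script's own code: the regular expression, the function name `rewrite_image_link`, the sample image path, and the host `img.example.com` are all assumptions made for illustration.

```python
import re

# Matches a Markdown image whose target is an http(s) URL. The optional
# "#..." fragment that follows the file name is matched separately so it
# can be dropped from the rewritten link.
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\((https?://[^\s)#]+)(#[^\s)]*)?\)")


def rewrite_image_link(line, base_url):
    """Point every remote image in `line` at base_url + original file name."""
    def _swap(match):
        file_name = match.group(2).rsplit("/", 1)[-1]
        return "![{}]({}{})".format(match.group(1), base_url, file_name)
    return IMAGE_LINK.sub(_swap, line)


# Hypothetical sample line; the path and fragment are made up.
sample = "![image.png](https://cdn.nlark.com/yuque/0/2021/png/1/abc.png#align=left&height=10)"
print(rewrite_image_link(sample, "https://img.example.com/"))
# ![image.png](https://img.example.com/abc.png)
```

Lines without a remote image pass through unchanged, so the function can be applied to every line of a document.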
0X02 Code Implementation
Project path: GitHub - misaki7in/yuque
import os
import re
import requests
import glob

def mkdir(file_path):
    # Create the directory (and any missing parents) if it does not exist yet.
    os.makedirs(file_path, exist_ok=True)
    return file_path

def get_image(md_file):
    # Collect the URLs of all remote PNG images referenced in md_file.
    img_list = []
    with open(md_file, "r", encoding="utf-8") as f:
        for line in f:
            if '](https://' in line and 'png' in line:
                # Take the link target, then drop the "#..." fragment
                # and the closing parenthesis.
                url = line.split('](')[1].split('#')[0].split(')')[0].strip()
                img_list.append(url)
    return img_list

def img_download(img_list):
    img_path = "./images/"
    mkdir(img_path)
    for url in img_list:
        img_name = url.split('/')[-1]
        img_file = img_path + img_name
        try:
            # The timeout keeps requests from waiting forever.
            r = requests.get(url, timeout=5)
        except requests.RequestException as e:
            print(img_file + " download failed! " + str(e))
            continue
        if r.status_code == 200:
            with open(img_file, 'wb') as f:
                f.write(r.content)
            print(img_file + " download success!")
        else:
            print(img_file + " download failed!")

def new_md(md_file, new_file, domain):
    # Append a copy of md_file to new_file, pointing image links at domain.
    with open(md_file, "r", encoding="utf-8") as src, \
            open(new_file, "a", encoding="utf-8") as dst:
        for line in src:
            if '](https://' in line and 'png' in line:
                prefix, rest = line.split('](', 1)
                # Drop the "#..." fragment and everything after the link.
                target = rest.split('#')[0].split(')')[0].strip()
                file_name = target.split('/')[-1]
                # Remove Yuque's default alt text.
                prefix = prefix.replace("image.png", "")
                # New link = domain + original file name; keep the line break.
                dst.write(prefix + "](" + domain + file_name + ")\n")
            else:
                dst.write(line)

if __name__ == "__main__":
    new_file = "test.md"
    # Skip the output file so that a rerun does not process its own result.
    mdfile_list = [f for f in glob.glob("*.md") if f != new_file]
    domain = input("please input the domain\r\n")
    for md_file in mdfile_list:
        img_list = get_image(md_file)
        img_download(img_list)
        new_md(md_file, new_file, domain)
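The script only looks for PNG links, one line at a time. As a hedged alternative, the sketch below extracts image URLs from a whole document with a single regular expression that also accepts JPEG and GIF files. The name `extract_image_urls` and the pattern are assumptions, not part of the original project.

```python
import re

# Captures the URL of a Markdown image up to its file extension. The
# lookahead requires a "#" fragment or the closing parenthesis right
# after the extension, so "a.png.txt" is not mistaken for a PNG.
IMAGE_URL = re.compile(
    r"!\[[^\]]*\]\((https?://[^\s)#]+\.(?:png|jpe?g|gif))(?=[#)])",
    re.IGNORECASE,
)


def extract_image_urls(markdown_text):
    """Return the distinct image URLs in markdown_text, in document order."""
    urls = []
    for url in IMAGE_URL.findall(markdown_text):
        if url not in urls:
            urls.append(url)
    return urls
```

Dropping duplicates means an image that appears several times in a document is downloaded only once.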
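Notes:
1. For the rewritten links to work, you must upload the downloaded images yourself to your chosen image host or your own web server, and the file names must not be changed.
2. Before running the script, delete the readme.md file in the same directory; otherwise readme.md will affect the generated document.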
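Because the rewritten links depend on unchanged file names, it can help to confirm before uploading that every image referenced in the generated document actually exists locally. The following is a minimal sketch under that assumption; `missing_images` is a hypothetical helper, not part of the original script.

```python
import os
import re


def missing_images(md_path, images_dir="images"):
    """Return file names referenced by md_path that are absent from images_dir."""
    with open(md_path, "r", encoding="utf-8") as f:
        text = f.read()
    # Every Markdown image target, e.g. "https://host/path/abc.png".
    targets = re.findall(r"!\[[^\]]*\]\(([^)\s]+)\)", text)
    names = [target.rsplit("/", 1)[-1] for target in targets]
    return [name for name in names
            if not os.path.isfile(os.path.join(images_dir, name))]
```

Calling `missing_images("test.md")` from the script's directory should return an empty list if every download succeeded.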
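0X03 Usage Results
1. Run the script (note that it requires Python 3) and enter the new network path for the images; the path must end with /.
2. As shown below, an images directory is created that contains the downloaded images.
3. Comparing test.md with the original document shows that the image links have been replaced automatically.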
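The comparison in step 3 can also be done programmatically. The sketch below uses the standard-library `difflib` module to list only the lines that differ between two texts; the function name `changed_lines` and the file labels are assumptions for illustration.

```python
import difflib


def changed_lines(original_text, rewritten_text):
    """Return the removed ("-") and added ("+") lines between two texts."""
    diff = list(difflib.unified_diff(
        original_text.splitlines(),
        rewritten_text.splitlines(),
        fromfile="original.md",
        tofile="test.md",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    ))
    # The first two entries are the "---" and "+++" file headers; the
    # "@@" hunk markers are filtered out by the prefix check.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

If only image links were rewritten, every "-" line should be an image line from the Yuque export and every "+" line its replacement.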