Yuque Batch Export and Image Download

魅Lemon

Article link: https://blog.csdn.net/lemon_TT/article/details/128380655

Contents

1. Introduction
2. Batch-Replacing Image Links in Exported Documents
3. Saving Markdown Images Locally
4. Batch Document Download

1. Introduction

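For cloud notes I mostly use wolai and Yuque, and Typora for local notes. Each of the two cloud services has its pros and cons:

wolai can export a Markdown file together with a matching image folder, and supports batch export directly (enterprise edition required), but an ordinary account gets only 200 MB of image hosting.
A personal Yuque account gets 10 GB of image hosting, but only supports exporting one document at a time. The exported Markdown still points at images on Yuque's image host, so the images are unreachable offline and the document cannot be backed up locally.

This article therefore records how to save Yuque images locally and how to back up documents in bulk.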

2. Batch-Replacing Image Links in Exported Documents

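In practice, several sites can fetch Yuque images directly (they re-host the images automatically, so nothing has to be re-uploaded):

WeChat Official Accounts
CSDN
Juejin
Zhihu

However, the suffix that Yuque appends to each image URL still has to be removed. The first method needs no script: if Typora's find-and-replace supports regular expressions, match #clientId=[a-z0-9-&=%.]* (the pattern may change, so adjust it to what your export actually contains) and replace every match with an empty string.

The second method is to run a Python script: python test.py [source file] [target file] (for example: python test.py test.md test2.md)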

import re
import sys

# Matches the "#clientId=..." fragment that Yuque appends to image URLs,
# up to (but not including) the closing parenthesis of the Markdown link.
# Group 1 keeps the file extension so that it survives the substitution.
yuque_fragment = re.compile(r'(\.(?:png|jpe?g|gif))#[^)\s]*')


def deal_yuque(origin_md_path, output_md_path):
    output_content = []
    with open(origin_md_path, 'r', encoding='utf-8', errors='ignore') as f:
        for line in f:
            # Drop the fragment; text after the image on the same line is kept
            output_content.append(yuque_fragment.sub(r'\1', line))
    with open(output_md_path, 'w', encoding='utf-8') as f:
        f.writelines(output_content)


def main():
    if len(sys.argv) != 3:
        sys.exit('Usage: python test.py <source.md> <target.md>')
    deal_yuque(sys.argv[1], sys.argv[2])


if __name__ == '__main__':
    main()
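
As a quick check of the substitution described above, here is a minimal sketch that strips the fragment from a single line. The sample URL and the helper name `strip_yuque_fragment` are made up for illustration, and the pattern assumes the fragment starts with `#clientId=` and contains no spaces or closing parentheses; the parameters Yuque actually appends may differ.

```python
import re

# Assumed shape of the fragment: "#clientId=" followed by anything up to ")" or whitespace.
FRAGMENT = re.compile(r'#clientId=[^)\s]*')


def strip_yuque_fragment(line):
    """Remove the "#clientId=..." fragment from image links in one line."""
    return FRAGMENT.sub('', line)


# Hypothetical exported line, used only for this demonstration.
sample = ('![image.png](https://cdn.nlark.com/yuque/0/2022/png/1/demo.png'
          '#clientId=u0a1b-2c3d&from=paste&width=400) caption')
print(strip_yuque_fragment(sample))
```

Because the pattern stops at the closing parenthesis, any text that follows the image on the same line is left untouched.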

3. Saving Markdown Images Locally

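Reference: https://github.com/u21h2/yuque2md

You can customize the batch processing to suit your needs. The script works by detecting the image URLs in the document, downloading each image to a local directory, and then rewriting the paths in the document.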

import os
import re
import sys

import requests


yuque_cdn_domain = 'cdn.nlark.com'
image_file_prefix = 'image-'
image_suffixes = ('.png', '.jpg', '.jpeg', '.gif')
# Matches a Markdown image link ![alt](url) and captures the URL separately
image_link_pattern = re.compile(r'(!\[[^\]]*\]\()(https?://[^)\s]+)(\))')


# origin_md_path: path of the input Markdown file
# output_md_path: path of the output Markdown file
# image_dir: directory where the images are stored
# image_url_prefix: prefix for the rewritten links: empty string, a path such as images/, or a CDN address
# image_rename_mode: raw keeps the original uuid file name, asc renames images incrementally
def deal_yuque(origin_md_path, output_md_path, image_dir, image_url_prefix, image_rename_mode):
    idx = 0
    output_content = []

    def localize(match):
        nonlocal idx
        # Drop the "#clientId=..." fragment that Yuque appends to the URL
        image_url = match.group(2).split('#')[0]
        suffix = os.path.splitext(image_url)[1].lower()
        # To download only Yuque images, also require: yuque_cdn_domain in image_url
        if suffix not in image_suffixes:
            return match.group(0)
        if image_rename_mode == 'asc':
            image_name = image_file_prefix + str(idx) + suffix
        else:
            image_name = image_url.split('/')[-1]
        if not download_image(image_url, image_dir, image_name):
            return match.group(0)  # keep the remote link if the download failed
        idx += 1
        return match.group(1) + image_url_prefix + image_name + match.group(3)

    with open(origin_md_path, 'r', encoding='utf-8', errors='ignore') as f:
        for line in f:
            output_content.append(image_link_pattern.sub(localize, line))
    with open(output_md_path, 'w', encoding='utf-8') as f:
        f.writelines(output_content)
    return idx


def download_image(image_url, image_dir, image_name):
    # Returns True when the image was saved, False otherwise
    r = requests.get(image_url, timeout=30)
    if r.status_code != 200:
        print('Download failed: {}'.format(image_url))
        return False
    with open(os.path.join(image_dir, image_name), 'wb') as f:
        f.write(r.content)
    return True


def mkdir(image_dir):
    # Normalize the path, then create the directory if it does not exist yet
    image_dir = image_dir.strip()
    image_dir = image_dir.rstrip("\\")
    if os.path.exists(image_dir):
        print('Image directory already exists')
    else:
        os.makedirs(image_dir)
        print('Image directory created')
    return image_dir



def main():
    if len(sys.argv) != 6:
        sys.exit('Usage: python <script> <source.md> <target.md> <image_dir> <image_url_prefix> <raw|asc>')
    origin_md_path = sys.argv[1]
    output_md_path = sys.argv[2]
    image_dir = sys.argv[3]
    image_url_prefix = sys.argv[4]
    image_rename_mode = sys.argv[5]  # raw or asc
    mkdir(image_dir)
    cnt = deal_yuque(origin_md_path, output_md_path, image_dir, image_url_prefix, image_rename_mode)
    print('Done, {} images processed'.format(cnt))



if __name__ == '__main__':
    # Interactive alternative to the command-line arguments:
    # origin_md_path = input('Source file path: ')
    # output_md_path = input('Output file path: ')
    # image_dir = input('Image directory: ')
    # image_url_prefix = input('Image link prefix (defaults to the current path): ') or ''
    # image_rename_mode = input('Rename mode (raw or asc, defaults to asc): ') or 'asc'
    # mkdir(image_dir)
    # cnt = deal_yuque(origin_md_path, output_md_path, image_dir, image_url_prefix, image_rename_mode)
    # print('Done, {} images processed'.format(cnt))
    main()
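
Before downloading anything, it can help to preview which files the incremental (asc) naming would produce. The following is a rough dry-run sketch and not part of the script above: `plan_local_names` is a hypothetical helper, the sample URLs are invented, and, as a simplification, it numbers each distinct URL once rather than once per occurrence.

```python
import os
import re

# Markdown image link: ![alt](url); the URL is the only captured group.
IMAGE_LINK = re.compile(r'!\[[^\]]*\]\((https?://[^)\s]+)\)')


def plan_local_names(markdown_text, prefix='image-'):
    """Map each remote image URL to the local file name it would receive."""
    plan = {}
    for url in IMAGE_LINK.findall(markdown_text):
        clean_url = url.split('#')[0]            # drop the Yuque fragment
        suffix = os.path.splitext(clean_url)[1]  # e.g. ".png"
        if clean_url not in plan:
            plan[clean_url] = prefix + str(len(plan)) + suffix
    return plan


# Hypothetical document content, used only for this demonstration.
text = (
    '![a](https://cdn.nlark.com/yuque/0/2022/png/1/first.png#clientId=u1)\n'
    'some text\n'
    '![b](https://cdn.nlark.com/yuque/0/2022/jpeg/1/second.jpeg)\n'
)
for url, name in plan_local_names(text).items():
    print(name, '<-', url)
```

No network access is involved, so the mapping can be reviewed safely before running the real download.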

4. Batch Document Download

Reference: https://github.com/dzh929/ExportMD-rectify-pics
https://www.yuque.com/duzh929/blog/ocffqg

This export method not only exports the Markdown documents in bulk but also saves the images locally in folders.

git clone https://github.com/dzh929/ExportMD-rectify-pics.git
cd ExportMD-rectify-pics
pip install -r requirements.txt
python ExportMD.py
# How to find the namespace:
# for the knowledge base https://www.yuque.com/YourYuqueUserName, the namespace is YourYuqueUserName
# A token has to be created first

# If an error occurs, delete .userinfo and try again
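
The mapping from knowledge-base URL to namespace noted above can be sketched in a few lines. This is only an illustration of that rule: `namespace_from_url` is a hypothetical helper and is not part of ExportMD-rectify-pics, and it assumes the namespace is simply the path portion of the URL.

```python
from urllib.parse import urlparse


def namespace_from_url(url):
    """Return the path part of a Yuque URL without leading or trailing slashes."""
    return urlparse(url).path.strip('/')


print(namespace_from_url('https://www.yuque.com/YourYuqueUserName'))
```

Check the result against what the tool actually prompts for before relying on it.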