A DIY side-hustle tool: a WeChat chat video generator

Latest recommended article published on 2025-04-18 09:36:05

我只有三天不想上班

Tags: backend

Article link: https://blog.csdn.net/sinat_41715275/article/details/128523229

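You have probably scrolled past videos like this on Douyin or WeChat Channels.

Videos assembled from WeChat conversations are entertaining, so lots of people watch them and the view counts can be enormous. Making them can also pay well, as the earnings of one big account show.

The idea is to take advantage of the huge view count: you attach a product promotion link, and when someone buys through that link you earn a commission.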

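The usual way to make such a video is:

Find conversation material online
Find a tool that simulates WeChat conversations
Use the conversation simulator to generate images
Edit the images into a video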

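Going through these steps is actually quite tedious, and every video repeats the same work.

So today we will build a tool that generates WeChat chat videos in one click, saving you time and effort and helping you get ahead!

The finished product first

A few clicks in the interface are all it takes to generate a video. Pretty convenient, right?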

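How it works

The technologies involved are:

Web crawler: scraping images from Baidu
OCR: recognizing the text in the images
A WeChat conversation simulator
A server
Browser automation with Playwright
Image processing
Audio processing
Video processing
Desktop GUI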

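That is a lot of technology! Don't worry, though: I have already written all of it, and this article walks through how it is implemented. If you just want the finished tool, grab it from the link below.

Link: https://pan.baidu.com/s/1u78R9vhtnY5fwehlK17Z6Q?pwd=yb87 Extraction code: yb87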

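Approach

Batch-download images from Baidu Images by keyword
Recognize the text in the images with OCR
Analyze text positions to tell left and right speakers apart
Proofread the text by hand, since OCR accuracy is limited
Start a service that simulates a WeChat conversation
Automate the page and take screenshots of the result
Combine the screenshots and audio into a video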

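Batch-download images from Baidu Images by keyword

We search for images at https://image.baidu.com/search, extract the address of each image, and download it. The core of the code is shown below (only the main parts are included).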

import time
import urllib.request

url = (
    "https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj"
    "&ct=201326592&is=&fp=result&queryWord=%s&cl=2&lm=-1&ie=utf-8&oe=utf-8"
    "&adpicid=&st=-1&z=&ic=&hd=&latest=&copyright=&word=%s&s=&se=&tab=&width=&height=&face=0"
    "&istype=2&qc=&nc=1&fr=&expermode=&force=&pn=%s&rn=%d&gsm=1e&1594447993172="
    % (search, search, str(pn), self.__per_page)
)
# Send the configured headers to avoid HTTP 403
try:
    time.sleep(self.time_sleep)
    req = urllib.request.Request(url=url, headers=self.headers)
    # The context manager closes the response even if reading fails
    with urllib.request.urlopen(req) as page:
        # Carry the cookies Baidu sets over to the next request
        self.headers["Cookie"] = self.handle_baidu_cookie(
            self.headers["Cookie"], page.info().get_all("Set-Cookie")
        )
        rsp = page.read()
except UnicodeDecodeError as e:
    self.logger.error(e)
    # Pass the URL as a logging argument with a matching placeholder
    self.logger.error("UnicodeDecodeError url: %s", url)
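
The URL above interpolates the keyword into the query string as-is. As a minimal sketch, assuming the keyword has not already been percent-encoded, the same kind of URL can be built with `urllib.parse.urlencode`, which encodes non-ASCII keywords as UTF-8. The function name `build_search_url` is hypothetical, and only a subset of the query parameters shown above is included.

```python
from urllib.parse import urlencode

BASE_URL = "https://image.baidu.com/search/acjson"


def build_search_url(keyword: str, pn: int, rn: int) -> str:
    """Return a search URL with the keyword percent-encoded as UTF-8."""
    params = {
        "tn": "resultjson_com",
        "ipn": "rj",
        "queryWord": keyword,
        "word": keyword,
        "ie": "utf-8",
        "oe": "utf-8",
        # Paging parameters, named as in the URL above
        "pn": pn,
        "rn": rn,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("微信对话", 0, 30))
```

Building the query from a dict also makes it easier to see which parameters actually carry values.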

Recognize the text in the images with OCR

With the cnocr library we can run OCR locally, with no network access required. The code is as follows.

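import json

import cv2
from cnocr import CnOcr

self.model = CnOcr(
    det_model_name="ch_PP-OCRv3_det",
)
out = self.model.ocr(
    cv2.imdecode(self.read_file(f), -1),
)
# Convert NumPy values to plain Python types so the result is JSON-serializable
for i in out:
    i["score"] = float(i["score"])
    i["position"] = i["position"].astype(float).tolist()
# Write the file once, after every entry has been converted
target_path.joinpath(file_name + ".json").write_text(
    json.dumps(out, ensure_ascii=False, indent=4)
)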
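
Each saved entry carries `text`, `score` and `position`. As a small sketch of how such entries can be reduced to the edges that the position analysis relies on, the helper below assumes `position` holds four `[x, y]` corner points, which is how the code in this article indexes it. The name `summarise_boxes` is hypothetical.

```python
def summarise_boxes(entries):
    """Reduce each OCR entry to its text plus left, right and top edges."""
    boxes = []
    for entry in entries:
        xs = [point[0] for point in entry["position"]]
        ys = [point[1] for point in entry["position"]]
        boxes.append(
            {
                "text": entry["text"],
                "left": min(xs),   # smallest x of the four corners
                "right": max(xs),  # largest x of the four corners
                "top": min(ys),    # smallest y of the four corners
            }
        )
    return boxes


sample = [
    {
        "text": "hi",
        "score": 0.9,
        "position": [[10.0, 20.0], [110.0, 20.0], [110.0, 50.0], [10.0, 50.0]],
    }
]
print(summarise_boxes(sample))
```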

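Analyze text positions to tell left and right speakers apart

Based on the content and position of each piece of text, we decide whether it is:

The title: less than 30 from the top and fewer than 5 characters long
Part of the same utterance: the vertical distance between two lines is below a threshold (at least 50)
Said by the person on the left: the text sits closer to the left edge
Said by the person on the right: the text sits closer to the right edge

That is the main logic; the code is as follows.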

res = []
# Drop the status bar
if json_content and "中国" in json_content[0]["text"]:
    # Carrier row (for example "中国联通"): drop every box on the same row
    y = json_content[0]["position"][0][1]
    i = 1
    while (
        i < len(json_content)
        and abs(json_content[i]["position"][0][1] - y) <= 20
    ):
        i += 1
    json_content = json_content[i:]

# Drop the "微信" (WeChat) header
if (
    json_content
    and "微信" in json_content[0]["text"]
    and len(json_content[0]["text"]) < 8
):
    json_content = json_content[1:]

# Check whether the first line is a title
if not json_content:
    return res
left_top_position = json_content[0]["position"][0]
left_top_position_x = left_top_position[0]
left_top_position_y = left_top_position[1]
if left_top_position_y < 30 and len(json_content[0]["text"]) < 5:
    # Treat it as the title
    res.append({"position": "title", "text": json_content[0]["text"]})
    json_content = json_content[1:]

# Threshold for treating two lines as the same utterance
same_sentence_threshold = 30
for i in range(1, len(json_content)):
    same_sentence_threshold = min(
        same_sentence_threshold,
        abs(
            json_content[i]["position"][0][1]
            - json_content[i - 1]["position"][0][1]
        ),
    )
same_sentence_threshold = max(50, same_sentence_threshold + 35)  # margin of error

if not json_content:
    return res
# Find the leftmost and rightmost text edges
left_around_position = min(i["position"][0][0] for i in json_content)
right_around_position = max(i["position"][1][0] for i in json_content)

# Decide whether each utterance is on the left or the right
time_pattern = re.compile(r"[0-9]{1,2}:[ ]{0,1}[0-9]{1,2}")


def save_utterance(text, position_left, position_right):
    # After a left bubble, and for the very first bubble, lean towards the
    # right speaker; otherwise lean slightly towards the left speaker.
    if not res or res[-1]["position"] == "left":
        float_value = 25
    else:
        float_value = -5
    if abs(position_left - left_around_position) + float_value < abs(
        position_right - right_around_position
    ):
        # Closer to the left edge
        res.append({"position": "left", "text": text})
    else:
        # Closer to the right edge
        res.append({"position": "right", "text": text})


text = ""
position_left = 0
position_right = 0
for i in range(len(json_content)):
    if time_pattern.findall(json_content[i]["text"]):
        # WeChat timestamp
        continue
    if "微信" in json_content[i]["text"]:
        # "微信" (WeChat) header
        continue

    if (
        i > 0
        and abs(
            json_content[i]["position"][0][1]
            - json_content[i - 1]["position"][0][1]
        )
        < same_sentence_threshold
    ):
        # Treat this line as part of the previous utterance
        text += json_content[i]["text"]
    else:
        # A new utterance starts: save the previous one first
        if text:
            save_utterance(text, position_left, position_right)
        text = json_content[i]["text"]
        position_left = json_content[i]["position"][0][0]
        position_right = json_content[i]["position"][1][0]
# Save the last pending utterance
if text:
    save_utterance(text, position_left, position_right)
if len(res) == 1:
    return []
return res

Of course, this classification cannot be 100% correct, so we added a manual correction step afterwards.
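
To make the left/right rule concrete, here is a minimal standalone sketch of the comparison used in the code above. The function name `classify_side` and the sample coordinates are made up for illustration; the bias values 25 and -5 are the ones from the article's code.

```python
def classify_side(box_left, box_right, page_left, page_right, bias=0):
    """Return "left" or "right" for one text box.

    page_left and page_right are the smallest left edge and the largest
    right edge over all boxes. A positive bias makes "right" more likely.
    """
    distance_to_left = abs(box_left - page_left)
    distance_to_right = abs(box_right - page_right)
    return "left" if distance_to_left + bias < distance_to_right else "right"


# Text hugging the left edge stays "left" even with the right-leaning bias.
print(classify_side(100, 300, page_left=100, page_right=600, bias=25))
# A nearly centred box flips with the bias: 10 - 5 < 20, but 10 + 25 > 20.
print(classify_side(110, 580, page_left=100, page_right=600, bias=-5))
print(classify_side(110, 580, page_left=100, page_right=600, bias=25))
```

The bias only matters for boxes that are roughly equidistant from both edges, which is where an alternating-speaker guess is most useful.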

Proofread the text by hand, since OCR accuracy is limited

The UI for this step is built with the flet library.

Start a service that simulates a WeChat conversation

We start a FastAPI service that serves the WeChat conversation simulator.

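from fastapi import FastAPI
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# Serve the simulator's static assets
app.mount(
    "/static",
    StaticFiles(directory=MAIN_PATH.joinpath("weixin_chat", "static")),
    name="static",
)


@app.get("/", response_class=HTMLResponse)
def index():
    # Return the simulator page itself
    return HTMLResponse(
        MAIN_PATH.joinpath(
            "weixin_chat",
            "index.html",
        ).read_text(encoding="utf-8")
    )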
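
The automation step opens this service at http://127.0.0.1:36999, so the page must already be reachable by then. The article does not show how that is coordinated; one possible approach, sketched here with only the standard library, is to poll the port until it accepts connections. The helper name `wait_for_port` and the timeout values are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is accepting requests
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)
    return False
```

A caller could run `wait_for_port("127.0.0.1", 36999)` after starting the server and only launch the browser once it returns `True`.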

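Automate the page and take screenshots of the result

Using the playwright library, we open a browser and drive it automatically: after each line of the conversation is entered, we take a screenshot and save it locally.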

browser = playwright.chromium.launch(headless=True)
context = browser.new_context()

page = context.new_page()
page.goto("http://127.0.0.1:36999")
page.wait_for_load_state()
# Fill in the chat title
if formatted_jsons[0]["position"] == "title":
    page.fill(title, formatted_jsons[0]["text"])
    formatted_jsons = formatted_jsons[1:]
else:
    titles = ["佳佳", "小小", "♥", "❀", "啊呜", "奔波儿灞与灞波儿奔", "亲亲"]
    page.fill(title, random.choice(titles))
time.sleep(0.2)
# Switch to the conversation tab
page.click(
    "#vueApp > div > div.edit-content > div.tab > ul > li:nth-child(2) > a"
)
time.sleep(0.2)
page.wait_for_selector(
    "#tabContent2 > div > div.dialog-user-items > div:nth-child(1) > div > a.dialog-user-face-a > input[type=file]"
)
# Clear the existing conversation, accepting the confirmation dialog
page.on("dialog", lambda dialog: dialog.accept())
page.click(clear_conv)
time.sleep(0.2)
# Pick two avatars
photos = self.two_random_photo()
page.set_input_files(my_photo, photos[0])
time.sleep(0.2)
page.set_input_files(second_photo, photos[1])
time.sleep(0.2)
_uuid = "".join(
    re.compile(r"[0-9a-zA-Z\u4e00-\u9fa5]*").findall(
        "".join(map(lambda e: e["text"], formatted_jsons))
    )
)
save_path = WORK_PATH.joinpath(self.keyword, "images", _uuid[:15])
save_path.mkdir(parents=True, exist_ok=True)
for index, _json in enumerate(formatted_jsons):
    # Choose which side is speaking
    if _json["position"] == "left":
        page.click(select_left)
    else:
        page.click(select_right)
    time.sleep(0.2)
    # Type the message and add it to the conversation
    page.fill(input_words, _json["text"])
    page.click(add_words)
    time.sleep(0.2)
    # Screenshot the phone area after every message: 0.jpg, 1.jpg, ...
    save_file = save_path.joinpath(f"{index}.jpg")
    page.locator(target_area).screenshot(
        path=save_file, quality=100, type="jpeg"
    )
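
The output folder name is derived from the conversation text by keeping only digits, ASCII letters and characters in the `\u4e00-\u9fa5` range, then truncating to 15 characters. A standalone sketch of that step, with a hypothetical function name `folder_name`:

```python
import re

# Digits, ASCII letters and characters in the \u4e00-\u9fa5 range are kept;
# everything else (punctuation, spaces, emoji) is dropped.
KEEP_PATTERN = re.compile(r"[0-9a-zA-Z\u4e00-\u9fa5]+")


def folder_name(dialogue, max_len=15):
    """Build a filesystem-friendly folder name from the dialogue text."""
    joined = "".join(item["text"] for item in dialogue)
    return "".join(KEEP_PATTERN.findall(joined))[:max_len]


print(folder_name([{"text": "在吗？"}, {"text": "Hi, 在的!"}]))
```

Note that two conversations sharing the same first 15 kept characters map to the same folder, and a conversation made only of punctuation or emoji yields an empty name.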

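Combine the screenshots and audio into a video

To combine the screenshots and audio into a video we use the moviepy and Pillow libraries: the images are ordered by file name and joined into a video at a fixed interval, and the WeChat message notification sound is added at each transition.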

import glob
from pathlib import Path

# moviepy 1.x API (set_start, set_audio, subclip)
from moviepy.editor import AudioFileClip, CompositeAudioClip, ImageSequenceClip

images = glob.glob(str(path.joinpath("*.jpg")))
if not images:
    return
# Sort numerically by file name (0.jpg, 1.jpg, ...), whatever the path separator
images = sorted(images, key=lambda e: int(Path(e).stem))
images = self.resize_images(images)
if not self.output_path:
    video_path = WORK_PATH.joinpath(self.keyword, "output")
    video_path.mkdir(parents=True, exist_ok=True)
else:
    video_path = Path(self.output_path)
video_file = video_path.joinpath(path.name + ".mp4")
# Each screenshot stays on screen for 1.5 seconds
seconds_per_image = 1.5
video_clip = ImageSequenceClip(images, fps=1 / seconds_per_image)
duration = video_clip.duration
wechat_audio = AudioFileClip(
    str(MAIN_PATH.joinpath("wechat_sound", "9411.mp3"))
)
# Play the notification sound each time a new screenshot appears
audio_clips = []
start = 0
while start < duration:
    audio_clips.append(wechat_audio.set_start(start))
    start += seconds_per_image
final_audio_clip = CompositeAudioClip(audio_clips).set_fps(44100)
video_clip = video_clip.set_audio(final_audio_clip.subclip(0, duration))
video_clip.write_videofile(str(video_file))
self.logger.info(f"{video_file} generated")
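
The two pieces of bookkeeping in this step, ordering the frames numerically and scheduling one notification sound per frame, can be checked without moviepy. The sketch below uses hypothetical helper names and assumes screenshots are named `0.jpg`, `1.jpg`, and so on, as in the screenshot loop earlier.

```python
from pathlib import Path


def sort_frames(paths):
    """Sort screenshot paths by the integer in their file name."""
    # A plain string sort would put "10.jpg" before "2.jpg".
    return sorted(paths, key=lambda p: int(Path(p).stem))


def notification_times(frame_count, seconds_per_frame=1.5):
    """Start time, in seconds, of the notification sound for each frame."""
    return [index * seconds_per_frame for index in range(frame_count)]


print(sort_frames(["out/10.jpg", "out/2.jpg", "out/1.jpg"]))
print(notification_times(3))
```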