2024 Research and Study Resource Roundup

福将～白鹿

Last modified 2024-04-29 15:42:10


Tags: learning large models

First published 2024-04-03 11:34:10

Article link: https://blog.csdn.net/qq_41475067/article/details/137338340


March study documents

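1. ChatGLM in practice: https://zhuanlan.zhihu.com/p/622686205?utm_id=0
ChatGLM model architecture and core code walkthrough:
https://blog.csdn.net/weixin_42878111/article/details/134017313
2. Image-text classification: https://huggingface.co/docs/transformers/model_doc/chinese_clip
(For image similarity detection, zero-shot classification, and image-text feature extraction, see the code on Git.)
Chinese-CLIP code covering pretraining, image-text feature extraction, and zero-shot image classification (supports top-k recall for text-to-image and image-to-text retrieval; see the README for details):
https://github.com/OFA-Sys/Chinese-CLIP/tree/master?tab=readme-ov-file#API%E5%BF%AB%E9%80%9F%E4%B8%8A%E6%89%8B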
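The top-k recall mentioned in item 2 comes down to ranking candidates by similarity to a query feature. The sketch below shows that ranking step only, in plain Python, and does not call Chinese-CLIP itself. The feature values, file names, and the `top_k` helper are made up for illustration, and the vectors are assumed to be non-zero; real features would come from the model's text and image encoders.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query, candidates, k=2):
    """Return (name, cosine similarity) for the k candidates closest to the query, best first."""
    q = l2_normalize(query)
    scored = []
    for name, vec in candidates.items():
        c = l2_normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, c)), name))
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]

# Hypothetical 3-dimensional features; real model features are much larger.
text_feature = [0.9, 0.1, 0.0]
image_features = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "dog.jpg": [0.1, 0.9, 0.2],
    "car.jpg": [0.0, 0.1, 0.9],
}

result = top_k(text_feature, image_features, k=2)
for name, score in result:
    print(name, round(score, 3))
```

Swapping the roles (an image feature as the query, text features as candidates) gives the image-to-text direction with the same function.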

3. Multi-GPU training: https://blog.csdn.net/qq_51392112/article/details/129737803
4. Image quality scoring model (the training corpus is written to train.txt in order of quality, from high to low):
https://github.com/zheng-yuwei/RankIQA.PyTorch/?tab=readme-ov-file
5. Image-text similarity matching: https://huggingface.co/OFA-Sys/chinese-clip-rn50
6. Sentence embeddings: https://huggingface.co/shibing624/text2vec-base-chinese
bert4vec, a sentence-embedding generation tool based on pretrained models:
https://github.com/zejunwang1/bert4vec/tree/main?tab=readme-ov-file
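For the note in item 4 about writing the training corpus to train.txt from high to low quality, here is a minimal sketch of the ordering step. The one-path-per-line layout, the sample paths, the scores, and the `write_ranked_list` helper are assumptions for illustration; check the RankIQA.PyTorch README for the exact file format the repository expects.

```python
from pathlib import Path
import tempfile

def write_ranked_list(scored_images, out_path):
    """Write image paths to out_path, one per line, highest quality first."""
    ranked = sorted(scored_images, key=lambda item: item[1], reverse=True)
    lines = [path for path, _score in ranked]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines

# Hypothetical (path, quality score) pairs; a higher score means better quality.
samples = [
    ("images/blurry.jpg", 0.2),
    ("images/sharp.jpg", 0.9),
    ("images/noisy.jpg", 0.5),
]

# Write into a temporary directory so the sketch does not touch real data.
out_file = Path(tempfile.mkdtemp()) / "train.txt"
ordered = write_ranked_list(samples, out_file)
print(ordered)
```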

7. Chinese text summarization:
https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese
Installing rouge: pip install rouge
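To make clear what a ROUGE score measures when evaluating a summary, here is a hand-written character-level ROUGE-1 in plain Python. It is an independent sketch, not the API of the `rouge` package installed above; the `rouge_1` helper and the two sample sentences are made up for illustration. Counting by character is a simplification that avoids needing a Chinese word segmenter.

```python
from collections import Counter

def rouge_1(hypothesis, reference):
    """Character-level ROUGE-1 precision, recall, and F1 for two strings."""
    hyp_counts = Counter(hypothesis.replace(" ", ""))
    ref_counts = Counter(reference.replace(" ", ""))
    # Clipped overlap: each character counts at most as often as it appears in both.
    overlap = sum((hyp_counts & ref_counts).values())
    hyp_total = sum(hyp_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / hyp_total if hyp_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"p": precision, "r": recall, "f": f1}

reference = "今天天气很好"   # hypothetical reference summary
hypothesis = "今天天气不错"  # hypothetical model output
scores = rouge_1(hypothesis, reference)
print(scores)
```

Here four of the six characters overlap, so precision, recall, and F1 all come out to 2/3.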

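8. BERT-family task collection (classification + reading comprehension):
https://github.com/CLUEbenchmark/CLUE/tree/master
9. fastText model training code (the intent-labeling service uses this script):
https://github.com/649453932/Chinese-Text-Classification-Pytorch
fastText model training script (in use by the game-information classification service):
https://github.com/Tencent/NeuralNLP-NeuralClassifier?tab=readme-ov-file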

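10. Differences between the BERT and LERT models: https://zhuanlan.zhihu.com/p/664200148?utm_id=0
11. Large-model image captioning (image to text). The models are very large: each one takes 28 GB, so downloads are slow. Running one needs at least 28 GB of GPU memory and only works on V100-class or better machines:
https://huggingface.co/THUDM/CogVLM/tree/main
12. ChatGLM3-6B Chinese spelling-correction LoRA model:
https://huggingface.co/shibing624/chatglm3-6b-csc-chinese-lora
13. Chinese spelling correction and Chinese grammar correction:
https://github.com/shibing624/pycorrector/tree/master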

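Evaluation: character-level correction is passable, but word-level correction is poor and hopeless on hard samples. Overall, the MacBERT base model corrects text better than the other models.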

Introduction to the MacBert4csc model:
https://github.com/shibing624/pycorrector/blob/master/examples/macbert/README.md

14. The latest Chinese large model, ChatGLM3: https://huggingface.co/THUDM/chatglm3-6b
15. Tianchi dataset (login required to download): https://tianchi.aliyun.com/dataset/138195
16. Toutiao, a Chinese text classification dataset: https://huggingface.co/datasets/fourteenBDr/toutiao
17. Data visualization: https://www.gradio.app/guides/quickstart
Official Gradio documentation: https://www.gradio.app/docs/image
18. Tencent pretraining platform: https://github.com/Tencent/TencentPretrain/tree/main
19. Chinese Llama pretrained models: https://github.com/LlamaFamily/Llama-Chinese?tab=readme-ov-file#-%E6%A8%A1%E5%9E%8B%E9%A2%84%E8%AE%AD%E7%BB%83

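April study notes
1. Llama3 Chinese large model (already applied to the Blue Factory (蓝厂) browser AI summarization project)
Llama3 Chinese large model, runs on a V100 machine with 32 GB of GPU memory: https://github.com/LlamaFamily/Llama-Chinese
2. vivo BlueLM (蓝心) large model
Model download: https://huggingface.co/vivo-ai/BlueLM-7B-Chat-32K/tree/main
Git repository: https://github.com/vivo-ai-lab/BlueLM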
