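1 Introduction to GPT
Introduction to the Transformer
The output is the label
From GPT-1 to GPT-4
GPT-1
Mainly memorize the passage below
The architecture is the Transformer decoder (decoder-only), not the encoder
GPT-2
Zero-shot learning is a machine learning concept: the ability of a model to complete a task without having been trained on any sample data for that specific task. It contrasts with traditional machine learning methods, which usually need large amounts of labeled data to train a model for a specific task.
GPT-3
ChatGPT
GPT-4
How to use GPT itself
1 Using OpenAI means driving GPT from a program; for the setup see the PDF handout week02-2
2 The main use in data analysis; the key point is how to feed in the data:
1 Summarize the overall information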
products: shape (32951, 9); features ['product_id', 'product_category_name', 'product_name_lenght', 'product_description_lenght', 'product_photos_qty', 'product_weight_g', 'product_length_cm', 'product_height_cm', 'product_width_cm']
order: shape (99441, 8); features ['order_id', 'customer_id', 'order_status', 'order_purchase_timestamp', 'order_approved_at', 'order_delivered_carrier_date', 'order_delivered_customer_date', 'order_estimated_delivery_date']
payments: shape (103886, 5); features ['order_id', 'payment_sequential', 'payment_type', 'payment_installments', 'payment_value']
customers: shape (99441, 5); features ['customer_id', 'customer_unique_id', 'customer_zip_code_prefix', 'customer_city', 'customer_state']
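2 Extract a sample
3 Feed the data in segments
4 Ask GPT for code that summarizes the data, then feed in the summary
Once you have the program, run it to get the summary information and compute from that
Which GPT plugins are useful?
PDF: analyze PDFs; Wolfram: math processing; Scholar AI: paper writing; WebPilot: finding material
VSCode + ChatGPT
Reference: "Integrating a ChatGPT plugin into VSCode: ChatGPT Chinese edition" (CSDN blog post)
How to use it with Excel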
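The "summarize first, then ask" step above can be sketched as follows. This is a minimal sketch, not the course's code: `describe_table` and `build_summary_prompt` are hypothetical helper names, and the tables are toy stand-ins (lists of row dicts) rather than the real data.

```python
def describe_table(name, rows):
    """Summarize a table (a list of row dicts) as one line of text for a prompt."""
    columns = list(rows[0].keys()) if rows else []
    return f"{name}: shape ({len(rows)}, {len(columns)}); features {columns}"


def build_summary_prompt(tables, question):
    """Join the per-table summaries and append the user's question."""
    lines = [describe_table(name, rows) for name, rows in tables.items()]
    return "\n".join(lines) + "\n\nQuestion: " + question


# Hypothetical toy data standing in for the real tables.
tables = {
    "payments": [
        {"order_id": "a1", "payment_type": "credit_card", "payment_value": 99.3},
        {"order_id": "b2", "payment_type": "boleto", "payment_value": 24.4},
    ],
    "customers": [
        {"customer_id": "c1", "customer_city": "sao paulo"},
    ],
}

prompt = build_summary_prompt(tables, "Which table holds the payment amounts?")
print(prompt)
```

Sending only the shape and column names keeps the prompt short even when the tables have tens of thousands of rows.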
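I want to fill in L6:L15. Generate an Excel formula for me that follows these rules: If I5, J5, and K5 all equal 1, the customer is classified as "high-value customer".
If I5 and J5 equal 1 but K5 equals 0, the customer is classified as "potential customer".
If I5 and K5 equal 1 but J5 equals 0, the customer is classified as "key customer to develop".
If I5 equals 1 but J5 and K5 equal 0, the customer is classified as "new customer".
If J5 and K5 equal 1 but I5 equals 0, the customer is classified as "key customer to win back".
If J5 equals 1 but I5 and K5 equal 0, the customer is classified as "general customer".
If K5 equals 1 but I5 and J5 equal 0, the customer is classified as "customer to retain".
If I5, J5, and K5 all equal 0, the customer is classified as "lost customer".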
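For checking whatever formula GPT returns, the same eight rules can be written as a lookup table. This is a sketch only: the labels are English renderings of the ones in the prompt, and `classify_customer` is a hypothetical helper name.

```python
# Keys are the (I, J, K) flag values from the prompt; values are the segment labels.
SEGMENTS = {
    (1, 1, 1): "high-value customer",
    (1, 1, 0): "potential customer",
    (1, 0, 1): "key customer to develop",
    (1, 0, 0): "new customer",
    (0, 1, 1): "key customer to win back",
    (0, 1, 0): "general customer",
    (0, 0, 1): "customer to retain",
    (0, 0, 0): "lost customer",
}


def classify_customer(i, j, k):
    """Return the segment label for one row of 0/1 flags."""
    return SEGMENTS[(i, j, k)]


for flags in [(1, 1, 1), (0, 1, 0), (0, 0, 0)]:
    print(flags, classify_customer(*flags))
```

Because three binary flags give exactly eight combinations, the table covers every case and no fallback branch is needed.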
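Types of large models
Language models: OpenAI's GPT-3, GPT-3.5, and GPT-4. The GPT-3 family also comes in several variants: Ada, Babbage, Curie, and DaVinci. Google's model: PaLM 2
Image models: the latest is DALL·E. OpenAI's core method for "copying" the understanding ability of large language models into the vision domain: treat an image as a kind of language, convert it into tokens, and train on them together with text tokens;
Speech recognition: the latest speech recognition model is Whisper v2-large, which can be called through the API
Text embedding model: text-embedding-ada-002, used for analysis
In summary:
text-davinci-003 has been replaced by gpt-3.5-turbo-instruct; the available models have changed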
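How large models are trained
Pre-training + fine-tuning: fine-tuning makes small adjustments to the pre-trained model so that it fits a specific task
Large-model concepts:
Autoregressive vs. generative: the former follows a rule (each token is predicted from the tokens before it); the latter involves randomness (the output is sampled)
Autoregressive vs. bidirectional models:
The main difference is that an autoregressive model looks only at the preceding context, while a bidirectional model considers both the preceding and the following context.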
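One common way to express this difference is through the attention mask. The sketch below is illustrative only (the function names are made up here): a 1 at row i, column j means position i is allowed to look at position j.

```python
def causal_mask(n):
    """Autoregressive: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Bidirectional: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


print("causal:")
for row in causal_mask(4):
    print(row)

print("bidirectional:")
for row in bidirectional_mask(4):
    print(row)
```

GPT-style models use the lower-triangular pattern, while BERT-style encoders use the full one.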
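Fine-tuning methods for large models
With OpenAI you can only fine-tune its hosted (online) models
Fine-tuning with open-source models
RLHF
A method built on reinforcement learning. Reference: "What is RLHF" (CSDN blog post)
LoRA
Comparison of these methods:
Prefix tuning
Prompt tuning
These are lightweight (parameter-efficient) fine-tuning methods: only a part of the parameters is tuned; implementing them is somewhat hard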
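To make "tune only a part of the parameters" concrete, here is a minimal sketch of the LoRA idea in plain Python: the frozen weight W (d x k) is left alone and only two small matrices B (d x r) and A (r x k) are trained, with the update scaled by alpha / r. The sizes, the names, and the initialization scale are assumptions chosen for illustration, not values from the notes.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(B, A, alpha, r):
    """Low-rank update (alpha / r) * B @ A that is added to the frozen weight W."""
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]


d, k, r = 64, 64, 4  # hypothetical layer size and rank
random.seed(0)
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # r x k, trainable
B = [[0.0] * r for _ in range(d)]  # d x r, trainable, starts at zero

delta = lora_delta(B, A, alpha=8, r=r)

full_params = d * k        # parameters touched by full fine-tuning of this layer
lora_params = r * (d + k)  # parameters trained by LoRA
print(full_params, lora_params)
```

Because B starts at zero, the update is zero before training, so the adapted layer initially behaves exactly like the pre-trained one.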
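LangChain
Built by composing six components
openai.Completion: gpt-3.5-turbo-instruct
openai.ChatCompletion
2 Calling the OpenAI API
Completions
Basic introduction to Completions
This is the most basic kind of endpoint; compared with the Chat endpoints: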
import openai

def chat_now(model='gpt-3.5-turbo-instruct', mode='balance'):
    """
    Multi-turn chatbot built on Completion.create (legacy openai<1.0 SDK)
    :param model: the large language model to call, default gpt-3.5-turbo-instruct
    :param mode: preset chatbot mode, default 'balance'; 'precision' and 'creativity' are also available
    """
    # Tell the user to type "quit" to end the chat
    print("if you want to stop the conversation, please input 'quit'")
    # The three modes and their parameters
    if mode == 'balance':
        temperature = 1
        presence_penalty = 0
    elif mode == 'precision':
        temperature = 0.8
        presence_penalty = 2
    elif mode == 'creativity':
        temperature = 1.2
        presence_penalty = -1
    else:
        raise ValueError("mode must be 'balance', 'precision' or 'creativity'")
    # Define the function that runs one exchange, so it can be called repeatedly
    def chat(prompt):
        try:
            # If nothing goes wrong, return the output of Completion.create
            response = openai.Completion.create(
                model=model,
                prompt=prompt,
                max_tokens=1000,
                temperature=temperature,
                presence_penalty=presence_penalty,
                stop=[" Human:", " AI:"]
            )
            answer = response["choices"][0]["text"].strip()
            return answer
        except Exception as exc:
            # Return "broken" on error
            return "broken"
    # Containers for the conversation state
    text = ""
    turns = []
    # Run the multi-turn conversation, i.e. call chat repeatedly
    while True:
        # Open the input box
        question = input()
        # Ask for a question when the input is empty
        if len(question.strip()) == 0:
            print("please input your question")
        # Stop the conversation, i.e. leave the while loop, when the input is 'quit'
        elif question == "quit":
            print("\nAI: See You Next Time!")
            break
        else:
            # In a multi-turn conversation the prompt is the earlier dialogue plus the new question
            prompt = text + "\nHuman: " + question
            result = chat(prompt)
            # When a request fails, send it again
            while result == "broken":
                print("please wait...")
                result = chat(prompt)
            # Keep the result of this turn
            turns += [question] + [result]
            print(result)
            # Keep at most ten entries; beyond that the oldest ones are dropped
            if len(turns) <= 10:
                text = " ".join(turns)
            else:
                text = " ".join(turns[-10:])
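- model: required. The name of the Completions model to call. Available models include text-davinci-003, text-davinci-002, text-curie-001, text-babbage-001, and text-ada-001, which differ in parameter scale. Note that large models differ from classical machine learning, where even a simple model may beat a complex one in some scenarios. For large models (judging by OpenAI's four families A, B, C, and D), the larger and newer the model, the better the results (and the higher the cost), so the course mainly uses text-davinci-003 as its example;
- prompt: required. The prompt text;
- suffix: optional, default null. The text that should follow the generated completion; it is used when the model inserts text between the prompt and the suffix;
- max_tokens: optional, default 16. The maximum number of tokens in the returned result;
- temperature: optional, range 0-2, default 1. The sampling temperature: **the lower the value, the more the model favors high-probability tokens and the more conservative the text; the higher the value, the more often it picks lower-probability tokens and the more varied the text. The different chat modes above differ mainly in temperature.**
- top_p: optional, range 0-1, default 1. Like temperature, it controls the randomness of the output: only the most likely tokens whose cumulative probability reaches top_p are sampled from, so the closer to 1 the more random the text, and the closer to 0 the less random. To tune randomness, **adjust only one of top_p and temperature; temperature is the recommended one;**
- n: optional, default 1. How many completions to return for one prompt;
- stream: optional, default False. How the response is delivered: when False, the model waits until the whole result has been generated and returns it in one piece; when True, the result is returned token by token as it is generated;
- logprobs: optional, default null. Asks the API to return the N most likely tokens at each position together with their log probabilities. For example, with logprobs set to 5, the API returns the 5 most likely tokens and their log probabilities for every generated token (5 is the maximum);
- echo: optional, default False. When True, the response contains the prompt followed by the completion, instead of the completion alone;
- stop: optional, default null. One or more strings that act as stop signals: as soon as the generated text reaches any of them, generation stops. This can be used to control the length or format of the output;
- presence_penalty: optional, default 0, range [-2, 2]. Adjusts the model's tendency to produce new content (for example new concepts or topics). Higher values push the model toward new content; lower values make it stick to what is already there. Raise it when a long result repeats the same topics;
- frequency_penalty: optional, default 0, range [-2, 2]. Adjusts the model's tendency to repeat itself. Higher values make the model avoid repetition; lower values make repetition more likely. Raise it when a long result repeats the same wording;
- **best_of: optional, default 1. The model generates best_of candidate completions on the server side (for example 5 different responses) and returns the one with the highest score;**
- logit_bias: optional. A dictionary that adjusts the probability of specific tokens: the keys are token IDs and the values are biases (from -100 to 100) added to those tokens' logits. The tokenizer tool shows the token IDs of a text. Changing this is generally not recommended;
- user: optional. An identifier for the end user, which you can set yourself to mark who is making the request. Note that the user parameter of Completion.create does not mean the same thing as the "user" role in the chat models introduced later, so keep the two apart;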
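How temperature and top_p change sampling can be seen on a toy distribution. This is a sketch under stated assumptions: three made-up logits, a plain softmax, and a simple nucleus filter; the function names are hypothetical and this is not how the OpenAI service is implemented internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature (must be > 0), then apply softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Indices of the most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(top_p_filter([0.5, 0.3, 0.2], 0.7))
```

A low temperature concentrates probability on the top token, a high temperature flattens the distribution, and top_p cuts off the unlikely tail before sampling.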
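How to improve a large model's performance
1 Use few-shot and one-shot prompting, and the Zero-shot-CoT and Few-shot-CoT methods
"Few" and "zero" refer to how many examples are given; CoT refers to the chain of thought
A chain of thought focuses on the analysis:
In terms of effect: with a chain of thought beats without, and more examples beat fewer
2 Use a trigger phrase: "Let's think step by step" is a "magic" sentence; the translation finally settled on was "请一步步进行推理并得出结论" (please reason step by step and reach a conclusion). Prompting with a chain of thought works much better
3 LtM prompting (least-to-most prompting)
3.1 A prompting rule
Split the task into several questions: the main question is broken into sub-questions that are answered one by one.
3.2 A prompting rule
Use multiple prompts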
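The prompting styles above differ only in how the prompt string is assembled, which a few helper functions can show. This is a sketch: the "Q:/A:" layout and the function names are assumptions, and no model is called.

```python
def few_shot_prompt(examples, question):
    """Few-shot: worked (question, answer) pairs first, then the new question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def zero_shot_cot_prompt(question):
    """Zero-shot-CoT: no examples, only the step-by-step trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."


def least_to_most_prompt(solved_sub_questions, next_question):
    """Least-to-most: earlier sub-questions and their answers become the context."""
    return few_shot_prompt(solved_sub_questions, next_question)


print(zero_shot_cot_prompt("What is 17 * 3?"))
print()
print(few_shot_prompt([("What is 2 * 3?", "6")], "What is 17 * 3?"))
```

In least-to-most prompting the model's answer to each sub-question is appended to `solved_sub_questions` before the next call, so the final prompt for the main question carries all intermediate results.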
The SCAN dataset
1 Introduction:
2 Method:
Full code:
# scan_test, CD_Few_shot, CM_Few_shot, extract_phrases and transform_expression come from the course notebook
def SCAN_predict(dataSet=scan_test, model="text-davinci-003", CD_Few_shot=CD_Few_shot, CM_Few_shot=CM_Few_shot):
    # Convert to a DataFrame
    data_frame = dataSet.to_pandas()
    # Initialize the prediction column as unknown
    data_frame['actions_predict'] = 'unknown'
    # Loop over the samples
    for i, data in enumerate(dataSet):
        # Stage 1: decompose the command
        prompt_CD = CD_Few_shot + 'Q:“%s” A:' % data['commands']
        response_CD = openai.Completion.create(
            model=model,
            prompt=prompt_CD,
            temperature=0.5,
            max_tokens=1000
        )
        # Result of the decomposition
        CD_result = extract_phrases(response_CD["choices"][0]["text"].strip())
        # Stage 2: translate the short commands
        CM_Few_shot_temp = CM_Few_shot
        sub_qs = CD_result
        for qs in sub_qs:
            CM_Few_shot_temp += 'Q:“%s” A:' % qs
            response_CM = openai.Completion.create(
                model=model,
                prompt=CM_Few_shot_temp,
                temperature=0.5,
                max_tokens=1000,
            )
            CM_Few_shot_temp += response_CM["choices"][0]["text"].strip()
        # Ask the original question
        prompt_CM = CM_Few_shot_temp + 'Q:“%s” A:' % data['commands']
        response_CM = openai.Completion.create(
            model=model,
            prompt=prompt_CM,
            temperature=0.5,
            max_tokens=1000,
        )
        # Answer to the original command
        CM_result = response_CM["choices"][0]["text"].strip()
        # Store the result at the matching position in the DataFrame
        data_frame.loc[i, 'actions_predict'] = transform_expression(CM_result)
    return data_frame
Workflow:
Chat completion
Basic introduction
week 06 ch7
Key feature: stronger conversational ability
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ]
)
Example of messages
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "What is machine learning?"},
        {"role": "user", "content": "What is the decision tree algorithm?"}
    ]
)
Another example:
A classic example
A side note: JSON objects are used a lot with large models
For example:
df = pd.DataFrame({'x1':[1, 2], 'x2':[3, 4]})
# df is a DataFrame
response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[
        {"role": "system", "content": "Dataset df_json: '%s'" % df.to_json(orient='records')},
        {"role": "user", "content": "Please explain the df_json dataset"}
    ]
)
response.choices[0].message['content']
The functions parameter
import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
import numpy as np
import pandas as pd
import json
import io
df = pd.DataFrame({'x1':[1, 2], 'x2':[3, 4]})
# String form of the dataset, used in the system message
df_str = df.to_string()
chen_ming_function = {"name": "chen_ming_algorithm",
                      "description": "Function that runs the Chen Ming algorithm, which defines a special computation on a dataset",
                      "parameters": {"type": "object",
                                     "properties": {"data": {"type": "string",
                                                             "description": "The dataset on which to run the Chen Ming algorithm"},
                                                    },
                                     "required": ["data"],
                                     },
                      }
messages=[
    {"role": "system", "content": "Dataset data: %s, given as a string" % df_str},
    {"role": "user", "content": "Please run the Chen Ming algorithm on the dataset data"}
]
functions = [chen_ming_function]
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    functions=functions,
    function_call="auto",
)
Setting up an example external function
week07.ch9
function calling
import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
import numpy as np
import pandas as pd
import json
import io
import inspect
import requests
# Create a DataFrame
df = pd.DataFrame({'x1':[1, 2], 'x2':[3, 4]})
df_str = df.to_string()
data = io.StringIO(df_str)
df_new = pd.read_csv(data, sep=r'\s+', index_col=0)
def chen_ming_algorithm(data):
    """
    Chen Ming algorithm function, which defines a special computation on a dataset
    :param data: required; the data table to compute on, given as a string
    :return: the result of the Chen Ming function, a DataFrame-type object in JSON format
    """
    df_new = pd.read_json(io.StringIO(data))
    res = np.sum(df_new, axis=1) - 1
    return res.to_json(orient='records')
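def auto_functions(functions_list):
    """
    Builds the functions argument for a Chat model
    :param functions_list: a list containing one or more function objects;
    :return: a functions object that meets the requirements of the Chat model's functions parameter
    """
    def functions_generate(functions_list):
        # Empty list that will hold the description dictionary of each function
        functions = []
        # Loop over the external functions
        for function in functions_list:
            # Read the function's docstring
            function_description = inspect.getdoc(function)
            # Read the function's name as a string
            function_name = function.__name__
            system_prompt = 'Here is the documentation of a function: %s' % function_description
            user_prompt = ('Based on this documentation, please create a dictionary in JSON format that meets the following 5 requirements: '
                           '1. The dictionary has exactly three key-value pairs; '
                           '2. The first key is the string name, and its value is the name of the function: %s, also a string; '
                           '3. The second key is the string description, and its value is a description of what the function does, also a string; '
                           '4. The third key is the string parameters, and its value is a JSON Schema object describing the input specification of the function parameters. '
                           '5. The output must be a dictionary in JSON format; output only the dictionary, with no text before or after it' % function_name)
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt}
                ]
            )
            functions.append(json.loads(response.choices[0].message['content']))
        return functions
    max_attempts = 3
    attempts = 0
    while attempts < max_attempts:
        try:
            functions = functions_generate(functions_list)
            break  # Leave the loop if the code ran successfully
        except Exception as e:
            attempts += 1  # Count the attempt
            print("An error occurred:", e)
            if attempts == max_attempts:
                print("Maximum number of attempts reached; stopping.")
                raise  # Re-raise the last exception
            else:
                print("Retrying...")
    return functions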
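def zhao_min_algorithm(data):
    """
    Zhao Min algorithm function, which defines a special computation on a dataset
    :param data: required; the data table to compute on, given as a string
    :return: the result of the Zhao Min function, a DataFrame-type object in JSON format
    """
    df_new = pd.read_json(io.StringIO(data))
    res = np.sum(df_new, axis=1) + 1
    return res.to_json(orient='records')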
def run_conversation(messages, functions_list=None, model="gpt-4-0613"):
    """
    Chat model wrapper that can call external functions automatically
    :param messages: required; the list of message dictionaries passed to the Chat model's messages parameter
    :param functions_list: optional, default None; can be set to a list containing all external functions
    :param model: the Chat model, optional, default gpt-4-0613
    :return: the output of the Chat model
    """
    # Without an external function library, run an ordinary chat request
    if functions_list is None:
        response = openai.ChatCompletion.create(
            model=model,
            messages=messages,
        )
        response_message = response["choices"][0]["message"]
        final_response = response_message["content"]
    # With an external function library, let the model pick a function and answer with it
    else:
        # Build the functions object
        functions = auto_functions(functions_list)
        # Build the dictionary of available external functions
        available_functions = {func.__name__: func for func in functions_list}
        # First response
        response = openai.ChatCompletion.create(
            model=model,
            messages=messages,
            functions=functions,
            function_call="auto")
        response_message = response["choices"][0]["message"]
        # Check whether the response contains function_call, i.e. whether an external function is needed
        if response_message.get("function_call"):
            # An external function has to be called
            # Get the function name
            function_name = response_message["function_call"]["name"]
            # Get the function object
            function_to_call = available_functions[function_name]
            # Get the function arguments
            function_args = json.loads(response_message["function_call"]["arguments"])
            # Pass the arguments to the function and get its result
            function_response = function_to_call(**function_args)
            # Append the first response to messages
            messages.append(response_message)
            # Append the function output to messages
            messages.append(
                {
                    "role": "function",
                    "name": function_name,
                    "content": function_response,
                }
            )
            # Second call to the model
            second_response = openai.ChatCompletion.create(
                model=model,
                messages=messages,
            )
            # Get the final result
            final_response = second_response["choices"][0]["message"]["content"]
        else:
            final_response = response_message["content"]
    return final_response
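# Multi-turn conversation function
def chat_with_model(functions_list=None,
                    prompt="Hello",
                    model="gpt-4-0613",
                    system_message=None):
    # Build a fresh message list on each call so the default is not shared between calls
    if system_message is None:
        system_message = [{"role": "system", "content": "You are a helpful assistant."}]
    messages = list(system_message)
    messages.append({"role": "user", "content": prompt})
    while True:
        answer = run_conversation(messages=messages,
                                  functions_list=functions_list,
                                  model=model)
        print(f"Model answer: {answer}")
        # Keep the model's answer in the history
        messages.append({"role": "assistant", "content": answer})
        # Ask whether the user has further questions
        user_input = input("Any other questions? (type quit to end the conversation): ")
        if user_input == "quit":
            break
        # Record the user's reply
        messages.append({"role": "user", "content": user_input})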
functions_list = [chen_ming_algorithm, zhao_min_algorithm]
functions = auto_functions(functions_list)
#function_dict = {func.__name__: func for func in functions_list}
messages = [
    {"role": "system", "content": "Dataset data: %s, given as a string" % df_str},
    {"role": "user", "content": 'Please run the Chen Ming algorithm on data'}]
run_conversation(messages = messages, functions_list = functions_list)
A note on Llama
google api
week 8 ch.11
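The dispatch step inside run_conversation can be tried without any API call. In the sketch below the model's reply is replaced by a hand-written dictionary; `add_one`, `dispatch`, and `fake_call` are hypothetical names used only for illustration.

```python
import json


def add_one(values):
    """Hypothetical external function: add 1 to every value, return JSON text."""
    return json.dumps([v + 1 for v in values])


available_functions = {"add_one": add_one}


def dispatch(function_call):
    """Run the function named in a function_call and build the reply message."""
    name = function_call["name"]
    # The arguments arrive as a JSON string, not as a Python dict
    args = json.loads(function_call["arguments"])
    result = available_functions[name](**args)
    return {"role": "function", "name": name, "content": result}


# Hand-written stand-in for response["choices"][0]["message"]["function_call"]
fake_call = {"name": "add_one", "arguments": '{"values": [1, 2, 3]}'}
message = dispatch(fake_call)
print(message)
```

The returned dictionary has the same shape as the "function" message that run_conversation appends before the second model call.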
from googleapiclient.discovery import build
from google.oauth2.credentials import Credentials
import base64
import email
from email import policy
from email.parser import BytesParser
# Load the credentials from a local file
creds = Credentials.from_authorized_user_file('token.json')
# Create the Gmail API client
service = build('gmail', 'v1', credentials=creds)
# List the user's most recent message
results = service.users().messages().list(userId='me', maxResults=1).execute()
messages = results.get('messages', [])
# Loop over the messages
for message in messages:
    # Fetch the details of the message
    msg = service.users().messages().get(userId='me', id=message['id']).execute()
    # Get the message headers
    headers = msg['payload']['headers']
    # Extract the sender and the date
    From, Date = "", ""
    for h in headers:
        name = h['name']
        if name.lower() == 'from':
            From = h['value']
        if name.lower() == 'date':
            Date = h['value']
    # Extract the message body
    if 'parts' in msg['payload']:
        part = msg['payload']['parts'][0]
        if part['mimeType'] == 'text/plain':
            data = part['body']["data"]
        else:
            data = msg['payload']['body']["data"]
    else:
        data = msg['payload']['body']["data"]
    # The body is base64url-encoded: map it back to the standard alphabet before decoding
    data = data.replace("-","+").replace("_","/")
    decoded_data = base64.b64decode(data)
    str_text = str(decoded_data, "utf-8")
    msg_str = email.message_from_string(str_text)
    if msg_str.is_multipart():
        text = msg_str.get_payload()[0]
    else:
        text = msg_str.get_payload()
    print('From: {}'.format(From[:8]))
    print('Date: {}'.format(Date))
    print('Content: {}'.format(text))
Now add the GPT API
response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[
        {"role": "system", "content": "This is the content of the most recent email in my Gmail inbox: %s" % msg},
        {"role": "system", "content": "The email content was fetched through the Gmail API"},
        {"role": "user", "content": "Who sent the most recent email in my Gmail, and what exactly does it say?"}
    ]
)
response.choices[0].message['content']
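The body decoding in the Gmail example (swapping "-" and "_" before b64decode) can also be done with the standard library's URL-safe decoder. A minimal sketch, assuming the body is UTF-8 text; `decode_gmail_body` is a hypothetical helper name, and the sample string is produced locally rather than fetched from Gmail.

```python
import base64


def decode_gmail_body(data):
    """Decode a base64url string, restoring any stripped '=' padding first."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Build a sample payload locally and strip its padding to exercise that path
encoded = base64.urlsafe_b64encode("Hello from Gmail".encode("utf-8")).decode("ascii").rstrip("=")
print(encoded)
print(decode_gmail_body(encoded))
```

Re-adding the padding makes the helper tolerant of encoders that drop the trailing "=" characters.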
3 NLP basics and large models
3.1 Preprocessing steps
Start with word segmentation, using the Python Jieba library
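# 1. Chinese word segmentation
# "Jieba" is a Python component for Chinese word segmentation
# * It supports three segmentation modes:
#   - Precise mode: tries to cut the sentence as accurately as possible; suited to text analysis;
#   - Full mode: scans out every word that can be formed in the sentence; very fast, but cannot resolve ambiguity;
#   - Search-engine mode: on top of precise mode, cuts long words again to raise recall; suited to search-engine segmentation.
# * Supports Traditional Chinese segmentation
# * Supports custom dictionaries
import jieba
# Basic usage
# jieba.cut takes three arguments: the string to segment; cut_all, which controls whether full mode is used; HMM, which controls whether the HMM model is used
# jieba.cut_for_search takes two arguments: the string to segment; whether to use the HMM model. It suits building a search engine's inverted index, and its granularity is fairly fine
seg_list = jieba.cut("我来到北京清华大学", cut_all=True)
print("[Full mode]: " + ", ".join(seg_list))  # full mode
seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
print("[Precise mode]: " + ", ".join(seg_list))  # precise mode
seg_list = jieba.cut("他来到了网易杭研大厦")  # "杭研" is not in the dictionary, but the Viterbi algorithm still recognizes it
print("[New word recognition]: " + ", ".join(seg_list))
seg_list = jieba.cut_for_search("小明硕士毕业于中国科学院,后在日本京都大学深造")  # search-engine mode
print("[Search-engine mode]: " + ", ".join(seg_list))
# 2. Custom dictionaries
# Usage: jieba.load_userdict(file_name)  # file_name is a file-like object or the path of the custom dictionary
# The dictionary format matches dict.txt: one word per line; each line has three parts: word, frequency (optional), part of speech (optional), separated by spaces, in that order. If file_name is a path or a file opened in binary mode, the file must be UTF-8 encoded.
# eg:
# ```
# 创新办 3 i
# 云计算 5
# 凱特琳 nz
# 台中
# ```
test_sent = (
    "例如我输入一个带“韩玉赏鉴”的标题,在自定义词库中也增加了此词为N类\n"
    "「台中」正確應該不會被切開。mac上可分出「石墨烯」;此時又可以分出來凱特琳了。"
)
words = jieba.cut(test_sent)
print('/'.join(words))
jieba.load_userdict("data/userdict.txt")  # load the user dictionary
words = jieba.cut(test_sent)
print('/'.join(words))
# add_word(word, freq=None, tag=None) and del_word(word) modify the dictionary dynamically at run time.
jieba.add_word('石墨烯')
jieba.add_word('雷课教育')
words = jieba.cut(test_sent)
print('/'.join(words))
# suggest_freq(segment, tune=True) adjusts the frequency of a single word so that it can (or cannot) be split out.
print('/'.join(jieba.cut('如果放到post中将出错。')))
jieba.suggest_freq(('中', '将'), True)
print('/'.join(jieba.cut('如果放到post中将出错。')))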
Then encode the text
Start with simple text representations (one-hot and bag-of-words)
import jieba
texts = ['Python是目前最流行的数据分析和机器学习编程语言',
         'Python语言编程将很快成为 各个高校的必修课',
         'Python是科研工作者开展科学研究的高效工具']
# Segment with jieba and join the words with spaces, because Tokenizer splits on whitespace
sentences = [' '.join(jieba.cut(text)) for text in texts]
from keras.preprocessing.text import Tokenizer
tk = Tokenizer()
# Build the word index
tk.fit_on_texts(sentences)
print(tk.word_index)
# Convert the words into sequences
seqs = tk.texts_to_sequences(sentences)
for seq in seqs:
    print(seq)
'''
[1, 3, 4, 5, 6, 2, 7, 8, 9, 10, 11]
[1, 12, 13, 14, 15, 16, 17, 18, 2, 19]
[1, 3, 20, 21, 22, 23, 2, 24, 25]
'''
# One-hot encoding
one_hot_results = tk.texts_to_matrix(sentences, mode='binary')
for one_hot_result in one_hot_results:
    print(one_hot_result)
len(one_hot_result)
'''
[0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0.]
[0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0.
 0. 0.]
[0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1.
 1. 1.]
'''
Simple classification methods
The problem with one-hot
Computing with doc2bow
from gensim import corpora
# Sample text data; each document is a list of words
texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]
# Build the dictionary
dictionary = corpora.Dictionary(texts)
print(dictionary)
# Convert the first document with doc2bow into (word ID, count) pairs
print(dictionary.doc2bow(texts[0]))
TF-IDF representation
from gensim import corpora, models
texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]
# Build a dictionary that maps every word in the text data to a unique integer ID
dictionary = corpora.Dictionary(texts)
# Use the dictionary to turn each text into a bag-of-words vector (word ID and the word's count in the document)
corpus = [dictionary.doc2bow(text) for text in texts]
# Train the TF-IDF model on the corpus
tfidf = models.TfidfModel(corpus)
# Transform the whole corpus with the TF-IDF model
corpus_tfidf = tfidf[corpus]
# Print the TF-IDF vector of every document
for doc in corpus_tfidf:
    print(doc)
The output is
[(0, 0.5773502691896257), (1, 0.5773502691896257), (2, 0.5773502691896257)]
[(0, 0.44424552527467476), (3, 0.44424552527467476), (4, 0.44424552527467476), (5, 0.3244870206138555), (6, 0.44424552527467476), (7, 0.3244870206138555)]
[(2, 0.5710059809418182), (5, 0.4170757362022777), (7, 0.4170757362022777), (8, 0.5710059809418182)]
[(1, 0.49182558987264147), (5, 0.7184811607083769), (8, 0.49182558987264147)]
[(3, 0.6282580468670046), (6, 0.6282580468670046), (7, 0.45889394536615247)]
[(9, 1.0)]
[(9, 0.7071067811865475), (10, 0.7071067811865475)]
[(9, 0.5080429008916749), (10, 0.5080429008916749), (11, 0.695546419520037)]
[(4, 0.6282580468670046), (10, 0.45889394536615247), (11, 0.6282580468670046)]
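The numbers above can be reproduced by hand, which shows what the weights mean. The sketch below assumes the weighting is term count times log2(N / document frequency), followed by L2 normalization per document; to my understanding this matches gensim's defaults, but treat that as an assumption. Weights are keyed by word rather than by integer ID.

```python
import math
from collections import Counter


def tfidf(texts):
    """Return one {word: weight} dict per document, L2-normalized."""
    n_docs = len(texts)
    # Document frequency: in how many documents each word occurs
    df = Counter(word for doc in texts for word in set(doc))
    vectors = []
    for doc in texts:
        tf = Counter(doc)
        weights = {w: tf[w] * math.log2(n_docs / df[w]) for w in tf}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        vectors.append({w: v / norm for w, v in weights.items()} if norm else {})
    return vectors


texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey'],
]

vectors = tfidf(texts)
for vec in vectors:
    print({w: round(v, 4) for w, v in vec.items()})
```

In the fourth document "system" occurs twice, which is why its weight is the largest there even though the word is more common across the corpus than "human" or "eps".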
LDA (Latent Dirichlet Allocation)
from gensim import corpora, models
# The given text data
texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]
# Build a dictionary that maps every word in the text data to a unique integer ID
dictionary = corpora.Dictionary(texts)
# Use the dictionary to turn each text into a bag-of-words vector
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)
print()
# Create the LDA model instance
lda = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=100, update_every=1, chunksize=10, passes=10, alpha='auto', per_word_topics=True)
# Print the words of each topic and their weights
for idx, topic in lda.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))
The topics printed are the ones you asked for: their number is the num_topics you set yourself
corpus
[[(0, 1), (1, 1), (2, 1)],
 [(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],
 [(2, 1), (5, 1), (7, 1), (8, 1)],
 [(1, 1), (5, 2), (8, 1)],
 [(3, 1), (6, 1), (7, 1)],
 [(9, 1)],
 [(9, 1), (10, 1)],
 [(9, 1), (10, 1), (11, 1)],
 [(4, 1), (10, 1), (11, 1)]]
Analysis for each document
corpus_lda = lda[corpus]
for doc in corpus_lda:
    print(doc)
len(corpus_lda)
Word2Vec
import jieba
# Initialize the tokenizer
jieba.initialize()
texts = [
    'Python是目前最流行的数据分析和机器学习编程语言',
    'Python语言编程将很快成为各个高校的必修课',
    'Python是科研工作者开展科学研究的高效工具'
]
# Segment the texts
texts_tokens = [list(jieba.cut(text)) for text in texts]
texts_tokens
'''
[['Python', '是', '目前', '最', '流行', '的', '数据分析', '和', '机器', '学习', '编程语言'],
 ['Python', '语言', '编程', '将', '很快', '成为', '各个', '高校', '的', '必修课'],
 ['Python', '是', '科研', '工作者', '开展', '科学研究', '的', '高效', '工具']]
'''
from gensim.models import Word2Vec
# Train the model
model = Word2Vec(texts_tokens, vector_size=2, window=5, min_count=1, workers=4)
texts_tokens
for tokens in texts_tokens:
    for token in tokens:
        vector = model.wv[token]
        print(f'Word: {token} -> vector: {vector[:10]}...')  # show at most the first 10 elements of the vector
Word: Python -> vector: [-0.02688289 0.01180699]...
Word: 是 -> vector: [-0.4651475 -0.35584044]...
Word: 目前 -> vector: [-0.25068876 -0.18806627]...
Word: 最 -> vector: [ 0.3690433 -0.07661679]...
Word: 流行 -> vector: [-0.22682734 0.3279404 ]...
Word: 的 -> vector: [0.25516748 0.45046365]...
Word: 数据分析 -> vector: [-0.24306364 -0.09094367]...
Word: 和 -> vector: [0.14385079 0.0496012 ]...
Word: 机器 -> vector: [-0.41426075 -0.4724409 ]...
Word: 学习 -> vector: [0.36556992 0.2533784 ]...
Word: 编程语言 -> vector: [0.3378655 0.03817212]...
Word: Python -> vector: [-0.02688289 0.01180699]...
Word: 语言 -> vector: [0.32285434 0.4486569 ]...
Word: 编程 -> vector: [0.24893288 0.46179345]...
Word: 将 -> vector: [-0.37610796 -0.19678326]...
Word: 很快 -> vector: [-0.37554762 -0.04633344]...
Word: 成为 -> vector: [ 0.47686377 -0.3659358 ]...
Word: 各个 -> vector: [-0.11690424 -0.09688386]...
Doc2Vec
import jieba
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
# Text data
texts = [
    'Python是目前最流行的数据分析和机器学习编程语言',
    'Python语言编程将很快成为各个高校的必修课',
    'Python是科研工作者开展科学研究的高效工具'
]
# Segment with jieba and build the TaggedDocument objects
documents = [TaggedDocument(words=list(jieba.cut(text)), tags=[i]) for i, text in enumerate(texts)]
# Train the Doc2Vec model
model = Doc2Vec(documents, vector_size=3, window=5, min_count=1, workers=4, epochs=40)
# Fetch and print the document vectors
for i in range(len(texts)):
    vector = model.dv[i]
    print(f'Vector of document {i}: {vector[:10]}...')  # show at most the first 10 elements of the vector
How they differ:
3.2 TextCNN
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
class TextDataset(Dataset):
    def __init__(self, texts, labels):
        self.texts = texts
        self.labels = labels
    def __len__(self):
        return len(self.texts)
    def __getitem__(self, idx):
        return self.texts[idx], self.labels[idx]
class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes, filter_sizes, num_filters):
        super(TextCNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList([
            nn.Conv2d(1, num_filters, (k, embed_dim)) for k in filter_sizes
        ])
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(len(filter_sizes) * num_filters, num_classes)
    def forward(self, x):
        # input x: [batch_size, seq_length], here 2 x 7
        x = self.embedding(x)  # [batch_size, seq_length, embed_dim], here 2 x 7 x 300
        x = x.unsqueeze(1)  # [batch_size, 1, seq_length, embed_dim], here 2 x 1 x 7 x 300
        x = [torch.relu(conv(x)).squeeze(3) for conv in self.convs]  # list of [batch_size, num_filters, seq_length - k + 1]
        # for filter sizes 3, 4, 5: [2, 100, 5], [2, 100, 4], [2, 100, 3]
        x = [torch.max(pool, 2)[0] for pool in x]  # max over time: list of [batch_size, num_filters]
        # three tensors of 2 x 100
        x = torch.cat(x, 1)  # [batch_size, num_filters * len(filter_sizes)], here 2 x 300
        x = self.dropout(x)  # 2 x 300
        x = self.fc(x)  # [batch_size, num_classes], here 2 x 2
        return x
def data():
    texts = ["I love reading books", "Data science is fun", "Python is great for data analysis", "AI is the future",
             "Machine learning is fascinating"]
    labels = ["positive", "positive", "positive", "positive", "positive"]  # toy example: assume every review is positive
    tokenized_texts = [text.lower().split() for text in texts]
    vocab = {}
    index = 1  # start at 1 because 0 is reserved for the unknown word <UNK>
    for sentence in tokenized_texts:
        for word in sentence:
            if word not in vocab:
                vocab[word] = index
                index += 1
    vocab['<UNK>'] = 0
    indexed_texts = []
    for sentence in tokenized_texts:
        indexed_sentence = [vocab[word] if word in vocab else vocab['<UNK>'] for word in sentence]
        indexed_texts.append(indexed_sentence)
    max_length = 7  # choose or compute the length that best fits your dataset
    # Pad short sentences with <UNK> and truncate long ones
    padded_texts = [sentence + [vocab['<UNK>']] * (max_length - len(sentence))
                    if len(sentence) < max_length else sentence[:max_length]
                    for sentence in indexed_texts]
    text_tensor = torch.tensor(padded_texts)
    label_tensor = torch.tensor([1 if label == "positive" else 0 for label in labels])
    return text_tensor, label_tensor
vocab_size = 1000  # vocabulary size
embed_dim = 300  # word-vector dimension
num_classes = 2  # number of output classes
filter_sizes = [3, 4, 5]  # convolution kernel sizes
num_filters = 100  # number of kernels per size
# Instantiate the model
model = TextCNN(vocab_size, embed_dim, num_classes, filter_sizes, num_filters)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Build some toy data
texts, labels = data()
dataset = TextDataset(texts, labels)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)
# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    for texts, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(texts)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch+1}, Loss: {loss.item()}')
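The tensor shapes in the TextCNN forward pass follow from one formula: a convolution of kernel height k over a sequence of length L, with no padding and stride 1, yields L - k + 1 positions. A small sketch using the same settings as the code above (`conv_output_length` is a hypothetical helper name):

```python
def conv_output_length(seq_len, kernel_size):
    """Positions produced by a stride-1, unpadded convolution over the sequence."""
    return seq_len - kernel_size + 1


seq_len = 7
filter_sizes = [3, 4, 5]
num_filters = 100

lengths = [conv_output_length(seq_len, k) for k in filter_sizes]
# After max-over-time pooling each branch keeps num_filters values,
# so the concatenated feature vector has this many entries:
feature_dim = num_filters * len(filter_sizes)

print(lengths)
print(feature_dim)
```

This also shows why the sequence must be at least as long as the largest kernel: with fewer than 5 tokens the last branch would have no valid position.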
3.3 RNN
The prediction task is to predict 5 points from 10
import torch
import datetime
import numpy as np
import torch.nn as nn
import torch.optim as optim
from matplotlib import pyplot as plt
########################### Global settings ###################################
num_time_steps = 16  # length of the time window used in training
input_size = 3  # input dimension
hidden_size = 16  # hidden-layer dimension
output_size = 3  # output dimension
num_layers = 1
lr = 0.01
#################### RNN class ##############################################
class Net(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(Net, self).__init__()
        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
        )
        print(self.rnn)
        for p in self.rnn.parameters():
            nn.init.normal_(p, mean=0.0, std=0.001)
        self.linear = nn.Linear(hidden_size, output_size)
    def forward(self, x, hidden_prev):
        # x: torch.Size([1, 15, 3])
        out, hidden_prev = self.rnn(x, hidden_prev)  # hidden_prev: [1, 1, 16], out: [1, 15, 16]
        # [b, seq, h]
        out = out.view(-1, hidden_size)  # out: [15, 16]
        out = self.linear(out)  # [seq, h] => [seq, 3], here [15, 3]
        out = out.unsqueeze(dim=0)  # => [1, seq, 3]
        return out, hidden_prev  # [1, 15, 3] and [1, 1, 16]
#################### Build the training set #################################
def getdata():
    x1 = np.linspace(1,10,30).reshape(30,1)
    y1 = (np.zeros_like(x1)+2)+np.random.rand(30,1)*0.1
    z1 = (np.zeros_like(x1)+2).reshape(30,1)
    tr1 = np.concatenate((x1,y1,z1),axis=1)
    # mm = MinMaxScaler()
    # data = mm.fit_transform(tr1)  # normalize the data
    return tr1
##################### Train the model #################################
def train_RNN(data):
    model = Net(input_size, hidden_size, num_layers)
    print('model:\n',model)
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr)
    # Initialize h
    # hidden state shape: (number of layers, batch size, hidden size), here [1, 1, 16]
    hidden_prev = torch.zeros(1, 1, hidden_size)
    l = []
    # Train for 300 iterations
    for step in range(300):
        start = np.random.randint(10, size=1)[0]
        end = start + 15
        # x and y: [1, 15, 3]
        x = torch.tensor(data[start:end]).float().view(1, num_time_steps - 1, 3)
        # Pick a random window of 15 points as input; the target is the same window shifted 5 steps ahead
        y = torch.tensor(data[start + 5:end + 5]).float().view(1, num_time_steps - 1, 3)
        # output: [1, 15, 3]
        output, hidden_prev = model(x, hidden_prev)
        hidden_prev = hidden_prev.detach()
        loss = criterion(output, y)
        model.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 100 == 0:
            print("Iteration: {} loss {}".format(step, loss.item()))
            l.append(loss.item())
    ############################## Plot the loss #################################
    plt.plot(l,'r')
    plt.xlabel('Training iterations')
    plt.ylabel('loss')
    plt.title('RNN loss curve')
    return hidden_prev, model
############################# Predict #########################################
def RNN_pre(model, data, hidden_prev):
    data_test = data[19:29]
    data_test = torch.tensor(np.expand_dims(data_test, axis=0), dtype=torch.float32)
    pred1, h1 = model(data_test, hidden_prev)
    # pred1: [1, 10, 3], h1: [1, 1, 16]
    print('pred1.shape:', pred1.shape)
    pred2, h2 = model(pred1, hidden_prev)
    print('pred2.shape:', pred2.shape)
    pred1 = pred1.detach().numpy().reshape(10,3)
    pred2 = pred2.detach().numpy().reshape(10,3)
    predictions = np.concatenate((pred1,pred2),axis=0)
    # predictions = mm.inverse_transform(predictions)
    print('predictions.shape:', predictions.shape)
    ############################# Plot the predictions ########################################
    fig = plt.figure(figsize=(9, 6))
    ax = fig.add_subplot(projection='3d')
    ax.scatter3D(data[:, 0],data[:, 1],data[:,2],c='red')
    ax.scatter3D(predictions[:,0],predictions[:,1],predictions[:,2],c='y')
    ax.set_xlabel('X')
    ax.set_xlim(0, 8.5)
    ax.set_ylabel('Y')
    ax.set_ylim(0, 10)
    ax.set_zlabel('Z')
    ax.set_zlim(0, 4)
    plt.title("RNN trajectory prediction")
    plt.show()
def main():
    data = getdata()
    start = datetime.datetime.now()
    hidden_pre, model = train_RNN(data)
    end = datetime.datetime.now()
    print('The training time: %s' % str(end - start))
    plt.show()
    RNN_pre(model, data, hidden_pre)
if __name__ == '__main__':
    main()
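What nn.RNN computes at each time step is a single recurrence, h_t = tanh(W_ih x_t + W_hh h_(t-1) + b). The sketch below writes it out in plain Python for a tiny hand-picked example; the weights and inputs are made up, and the two bias vectors that PyTorch keeps separately are merged into one `b` here.

```python
import math


def rnn_step(x, h_prev, W_ih, W_hh, b):
    """One Elman RNN step: h_t = tanh(W_ih x_t + W_hh h_prev + b)."""
    h = []
    for i in range(len(b)):
        total = b[i]
        total += sum(w * v for w, v in zip(W_ih[i], x))
        total += sum(w * v for w, v in zip(W_hh[i], h_prev))
        h.append(math.tanh(total))
    return h


def rnn_forward(xs, h0, W_ih, W_hh, b):
    """Run the step over a whole sequence; return all hidden states and the last one."""
    outputs, h = [], h0
    for x in xs:
        h = rnn_step(x, h, W_ih, W_hh, b)
        outputs.append(h)
    return outputs, h


# Hypothetical sizes: input dimension 1, hidden dimension 2
W_ih = [[1.0], [0.0]]
W_hh = [[0.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
outputs, h_last = rnn_forward([[0.5], [0.0]], [0.0, 0.0], W_ih, W_hh, b)
print(outputs)
print(h_last)
```

The hidden state returned at the end is what the training loop above carries from one window to the next after detaching it.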
A variant: the bidirectional RNN
Code:
import torch
import torch.nn as nn
class BiRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # Create a bidirectional RNN layer
        # Setting bidirectional=True is all it takes to make the RNN bidirectional
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        # Because the RNN is bidirectional, the output dimension is twice hidden_size
        self.fc = nn.Linear(hidden_size * 2, num_classes)
    def forward(self, x):
        # Initialize the hidden state
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size).to(x.device)  # 2 for bidirectional
        # Forward pass
        # out: [batch_size, seq_length, hidden_size * 2]
        out, _ = self.rnn(x, h0)
        # Take the features of the last time step
        out = self.fc(out[:, -1, :])
        return out
# Parameters
input_size = 10  # input feature dimension
hidden_size = 20  # hidden-layer feature dimension
num_layers = 2  # number of RNN layers
num_classes = 3  # number of output classes
# Create the model
model = BiRNN(input_size, hidden_size, num_layers, num_classes)
print(model)
3.4 LSTM principles
Prediction task
Reference: "Time-series forecasting: the LSTM model (with code)" (CSDN blog post)
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
# Generate sample data
def generate_data(timesteps):
    x = np.linspace(0, 10, timesteps)
    y = np.sin(x)
    return y
# Prepare the training data
def prepare_data(data, n_steps):
    X, y = [], []
    for i in range(len(data)):
        end_ix = i + n_steps
        if end_ix > len(data)-1:
            break
        seq_x, seq_y = data[i:end_ix], data[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
# Model parameters
timesteps = 1000
n_steps = 10
# Generate and process the data
data = generate_data(timesteps)
X, y = prepare_data(data, n_steps)
X = X.reshape((X.shape[0], X.shape[1], 1))
# Build the LSTM model
model = Sequential([
    LSTM(50, activation='relu', input_shape=(n_steps, 1)),
    Dense(1)
])
model.compile(optimizer='adam', loss='mse')
# Train the model
model.fit(X, y, epochs=2, verbose=1)
# Predict
x_input = np.array(data[-n_steps:])
x_input = x_input.reshape((1, n_steps, 1))
yhat = model.predict(x_input, verbose=0)
print(f'Predicted Value: {yhat[0][0]}')
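The windowing done by prepare_data is easier to see on a short list. The sketch below is a plain-Python restatement of the same idea (`make_windows` is a hypothetical name): each input is n_steps consecutive values and the target is the value that follows them.

```python
def make_windows(series, n_steps):
    """Split a series into (window, next value) training pairs."""
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])
        y.append(series[i + n_steps])
    return X, y


X, y = make_windows([10, 20, 30, 40, 50], 3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A series of length N therefore yields N - n_steps training pairs, which is 990 pairs for the 1000-point sine wave above with n_steps = 10.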
举几个例子nlp的例子
3.5 SAM分割
【图像分割】Meta分割一切(SAM)模型环境配置和使用教程_sam使用教程-CSDN博客
就是attention不停做
import torch
import torch.nn.functional as F
# 假设output是模型的输出,shape为(N, H, W, C)
# 真实的标签y_true,shape为(N, H, W)
# 注意:PyTorch预期的输出形状是(N, C, H, W),需要转置维度
y_pred = torch.randn(10, 256, 256, 5) # 模拟模型输出
y_pred = y_pred.permute(0, 3, 1, 2) # 转置y_pred为(N, C, H, W)
y_true = torch.randint(0, 5, (10, 256, 256), dtype=torch.long) # 模拟真实标签
# 计算交叉熵损失
loss = F.cross_entropy(y_pred, y_true)
print(loss)
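Converting a color image mask into a label map
import numpy as np
from skimage.io import imread
# Read the color mask image; keep only the RGB channels in case the file has an alpha channel
color_mask = imread('path_to_color_mask.png')[..., :3]
# Mapping from color to class index
color_to_class = {
    (255, 0, 0): 0,  # red   -> class 0
    (0, 255, 0): 1,  # green -> class 1
    (0, 0, 255): 2,  # blue  -> class 2
    # ... (more colors)
}
# Array that will hold the class indices.
# Pixels whose color is not in the mapping keep the initial value 0, which coincides with class 0;
# use a different fill value if those pixels need to be told apart.
index_mask = np.zeros((color_mask.shape[0], color_mask.shape[1]), dtype=np.int32)
# Map each color in turn
for color_value, class_index in color_to_class.items():
    # Find every position in the mask that matches this color
    matches = (color_mask == color_value).all(axis=-1)
    # Set the class index at those positions
    index_mask[matches] = class_index
# index_mask is now an integer mask in which each pixel value is a class index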
4 Large language models (LLM)
T5
一文搞懂Transformer架构的三种注意力机制_transformer注意力机制-CSDN博客
Introduction to Distilling Step-by-Step
distilling-step-by-step-main-CSDN博客
How to train (very important)
Huggingface trainer、model.from_pretrained、tokenizer()简单介绍(笔记)_transformes trainer 如何保存优化器的状态-CSDN博客
GPT (LLM)
神经网络算法:一文搞懂GPT(Generative Pre-trained Transformer)-CSDN博客
Qwen (LLM)
The current model is Tongyi Qianwen (Qwen)
LLaMA (LLM)
LLaMA系列 | LLaMA和LLaMA-2精简总结-CSDN博客
What are the differences between GPT, Qwen, LLaMA, BERT, and ELMo?
LLaMa、Qwen、ChatGLM、ChatGLM2的区别_llama qwen-CSDN博客
BERT与其他NLP模型的对比:了解预训练模型的差异-CSDN博客
What is the difference between LayerNorm (LN) and RMSNorm?
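A minimal NumPy sketch of the two normalizations, to make the question concrete. LayerNorm subtracts the per-token mean, divides by the standard deviation, and applies a learned scale and shift; RMSNorm skips the mean subtraction and the shift and only divides by the root mean square. The function names, `eps` value, and tensor sizes here are illustrative assumptions, not taken from any particular model's code.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # LayerNorm: center each row, divide by its standard deviation, then scale and shift
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

def rms_norm(x, gamma, eps=1e-5):
    # RMSNorm: no centering and no bias; divide each row by its root mean square only
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms * gamma

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8))  # 4 tokens, hidden size 8 (hypothetical)
gamma = np.ones(8)
beta = np.zeros(8)

ln_out = layer_norm(x, gamma, beta)
rms_out = rms_norm(x, gamma)
print(ln_out.mean(axis=-1))          # close to 0 for every row
print((rms_out ** 2).mean(axis=-1))  # close to 1 for every row
print(rms_out.mean(axis=-1))         # not centered: the row means stay away from 0
```

RMSNorm has fewer operations and parameters per layer, which is the usual argument for it in LLaMA-style models.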
BERT (LLM)
BertTokenizer 使用方法_berttokenizer.from_pretrained-CSDN博客
This post is very good: 大模型面试准备(十四):BERT 为何青睐 Transformer 双向编码器?_双向transformer 大模型-CSDN博客
Introduction
Model
Code
import torch
import math
import numpy as np
from transformers import BertModel
from transformers import BertTokenizer
'''
Re-implement the BERT forward pass with manual matrix operations.
Model files can be downloaded from https://huggingface.co/models
'''
bert = BertModel.from_pretrained(r"/Users/depeng.yao/Desktop/yaodepeng/nlp/bert/bert-base-chinese", return_dict=False)
state_dict = bert.state_dict()
bert.eval()
x = np.array([2450, 15486, 15167, 2110])  # token ids looked up in the vocab for the input "深度学习" (deep learning)
torch_x = torch.LongTensor([x])  # the same input as a PyTorch tensor with a batch dimension
# sequence_output, pooler_output = bert(torch_x)
# print(sequence_output.shape, pooler_output.shape)
# print(sequence_output, pooler_output)
# print(bert.state_dict().keys())  # list the names of all weight matrices

# Softmax normalization over the last axis
def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=-1, keepdims=True)

# GELU activation (tanh approximation)
def gelu(x):
    return 0.5 * x * (1 + np.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * np.power(x, 3))))
class DiyBert:
    # Takes the full dictionary of pre-trained weights
    def __init__(self, state_dict):
        self.num_attention_heads = 12
        self.hidden_size = 768
        # Must match the number of encoder layers in the loaded checkpoint
        # (the standard bert-base configuration has 12)
        self.num_layers = 1
        self.load_weights(state_dict)

    def load_weights(self, state_dict):
        # Embedding weights
        self.word_embeddings = state_dict["embeddings.word_embeddings.weight"].numpy()
        self.position_embeddings = state_dict["embeddings.position_embeddings.weight"].numpy()
        self.token_type_embeddings = state_dict["embeddings.token_type_embeddings.weight"].numpy()
        self.embeddings_layer_norm_weight = state_dict["embeddings.LayerNorm.weight"].numpy()
        self.embeddings_layer_norm_bias = state_dict["embeddings.LayerNorm.bias"].numpy()
        self.transformer_weights = []
        # Transformer weights, one entry per layer, in the order the forward pass unpacks them:
        # q_w, q_b, k_w, k_b, v_w, v_b, attention output w/b, attention LayerNorm w/b,
        # intermediate w/b, output w/b, feed-forward LayerNorm w/b
        names = ["attention.self.query.weight", "attention.self.query.bias",
                 "attention.self.key.weight", "attention.self.key.bias",
                 "attention.self.value.weight", "attention.self.value.bias",
                 "attention.output.dense.weight", "attention.output.dense.bias",
                 "attention.output.LayerNorm.weight", "attention.output.LayerNorm.bias",
                 "intermediate.dense.weight", "intermediate.dense.bias",
                 "output.dense.weight", "output.dense.bias",
                 "output.LayerNorm.weight", "output.LayerNorm.bias"]
        for i in range(self.num_layers):
            prefix = "encoder.layer.%d." % i
            self.transformer_weights.append([state_dict[prefix + name].numpy() for name in names])
        # Pooler layer
        self.pooler_dense_weight = state_dict["pooler.dense.weight"].numpy()
        self.pooler_dense_bias = state_dict["pooler.dense.bias"].numpy()
    # BERT embedding: the sum of three embeddings, followed by a LayerNorm
    def embedding_forward(self, x):
        # x.shape = [max_len]
        we = self.get_embedding(self.word_embeddings, x)  # shape: [max_len, hidden_size]
        # Position embedding input: [0, 1, 2, 3]
        pe = self.get_embedding(self.position_embeddings, np.array(list(range(len(x)))))  # shape: [max_len, hidden_size]
        # Token type embedding input: [0, 0, 0, 0] for a single-sentence input
        te = self.get_embedding(self.token_type_embeddings, np.array([0] * len(x)))  # shape: [max_len, hidden_size]
        embedding = we + pe + te
        # The sum is followed by a normalization layer
        embedding = self.layer_norm(embedding, self.embeddings_layer_norm_weight, self.embeddings_layer_norm_bias)  # shape: [max_len, hidden_size]
        return embedding

    # An embedding layer is just a lookup by index (equivalently, a one-hot input times the embedding matrix)
    def get_embedding(self, embedding_matrix, x):
        return np.array([embedding_matrix[index] for index in x])

    # Run all transformer layers
    def all_transformer_layer_forward(self, x):
        for i in range(self.num_layers):
            x = self.single_transformer_layer_forward(x, i)
        return x
    # Run a single transformer layer
    def single_transformer_layer_forward(self, x, layer_index):
        weights = self.transformer_weights[layer_index]
        # Unpack this layer's parameters; in practice they are randomly initialized
        # and then learned during pre-training
        q_w, q_b, \
        k_w, k_b, \
        v_w, v_b, \
        attention_output_weight, attention_output_bias, \
        attention_layer_norm_w, attention_layer_norm_b, \
        intermediate_weight, intermediate_bias, \
        output_weight, output_bias, \
        ff_layer_norm_w, ff_layer_norm_b = weights
        # Self-attention layer
        attention_output = self.self_attention(x,
                                               q_w, q_b,
                                               k_w, k_b,
                                               v_w, v_b,
                                               attention_output_weight, attention_output_bias,
                                               self.num_attention_heads,
                                               self.hidden_size)
        # LayerNorm (not BatchNorm) with a residual connection
        x = self.layer_norm(x + attention_output, attention_layer_norm_w, attention_layer_norm_b)
        # Feed-forward layer
        feed_forward_x = self.feed_forward(x,
                                           intermediate_weight, intermediate_bias,
                                           output_weight, output_bias)
        # LayerNorm (not BatchNorm) with a residual connection
        x = self.layer_norm(x + feed_forward_x, ff_layer_norm_w, ff_layer_norm_b)
        return x
    # Self-attention
    def self_attention(self,
                       x,
                       q_w,
                       q_b,
                       k_w,
                       k_b,
                       v_w,
                       v_b,
                       attention_output_weight,
                       attention_output_bias,
                       num_attention_heads,
                       hidden_size):
        # x.shape = [max_len, hidden_size]
        # q_w, k_w, v_w shape = [hidden_size, hidden_size]
        # q_b, k_b, v_b shape = [hidden_size]
        q = np.dot(x, q_w.T) + q_b  # shape: [max_len, hidden_size]; a linear layer, x @ W^T + b
        k = np.dot(x, k_w.T) + k_b  # shape: [max_len, hidden_size]
        v = np.dot(x, v_w.T) + v_b  # shape: [max_len, hidden_size]
        attention_head_size = int(hidden_size / num_attention_heads)
        # q.shape = [num_attention_heads, max_len, attention_head_size]
        q = self.transpose_for_scores(q, attention_head_size, num_attention_heads)
        # k.shape = [num_attention_heads, max_len, attention_head_size]
        k = self.transpose_for_scores(k, attention_head_size, num_attention_heads)
        # v.shape = [num_attention_heads, max_len, attention_head_size]
        v = self.transpose_for_scores(v, attention_head_size, num_attention_heads)
        # qk.shape = [num_attention_heads, max_len, max_len]
        qk = np.matmul(q, k.swapaxes(1, 2))
        qk /= np.sqrt(attention_head_size)
        qk = softmax(qk)
        # qkv.shape = [num_attention_heads, max_len, attention_head_size]
        qkv = np.matmul(qk, v)
        # qkv.shape = [max_len, hidden_size]
        qkv = qkv.swapaxes(0, 1).reshape(-1, hidden_size)
        # attention.shape = [max_len, hidden_size]
        attention = np.dot(qkv, attention_output_weight.T) + attention_output_bias
        return attention

    # Split the hidden dimension into multiple heads
    def transpose_for_scores(self, x, attention_head_size, num_attention_heads):
        # hidden_size = 768, num_attention_heads = 12, attention_head_size = 64
        max_len, hidden_size = x.shape
        x = x.reshape(max_len, num_attention_heads, attention_head_size)
        x = x.swapaxes(1, 0)  # output shape = [num_attention_heads, max_len, attention_head_size]
        return x
    # Feed-forward network
    def feed_forward(self,
                     x,
                     intermediate_weight,  # [intermediate_size, hidden_size]
                     intermediate_bias,    # [intermediate_size]
                     output_weight,        # [hidden_size, intermediate_size]
                     output_bias,          # [hidden_size]
                     ):
        # output shape: [max_len, intermediate_size]
        x = np.dot(x, intermediate_weight.T) + intermediate_bias
        x = gelu(x)
        # output shape: [max_len, hidden_size]
        x = np.dot(x, output_weight.T) + output_bias
        return x

    # Layer normalization over the hidden dimension
    def layer_norm(self, x, w, b):
        x = (x - np.mean(x, axis=1, keepdims=True)) / np.std(x, axis=1, keepdims=True)
        x = x * w + b
        return x

    # Pooler: the output layer attached to the [CLS] token
    def pooler_output_layer(self, x):
        x = np.dot(x, self.pooler_dense_weight.T) + self.pooler_dense_bias
        x = np.tanh(x)
        return x

    # Final output
    def forward(self, x):
        x = self.embedding_forward(x)  # (4, 768)
        sequence_output = self.all_transformer_layer_forward(x)  # (4, 768)
        pooler_output = self.pooler_output_layer(sequence_output[0])  # (768,), computed from the first ([CLS]) position
        return sequence_output, pooler_output
# Hand-written implementation
db = DiyBert(state_dict)
diy_sequence_output, diy_pooler_output = db.forward(x)  # [4, 768] and [768]
# PyTorch reference
torch_sequence_output, torch_pooler_output = bert(torch_x)  # [1, 4, 768] and [1, 768]
# import torch
# import torch.nn as nn
#
# num_labels = 5  # assume 5 different labels
# num_classes = 5  # assume 5 different classes
#
# # Sentence-level classification on the pooled [CLS] output
# classifier = nn.Linear(bert.config.hidden_size, num_classes)
# logits = classifier(torch_pooler_output)  # logits shape: [batch_size, num_classes]
#
# # Token-level classification: apply the classifier to every token of the BERT output, [1, 4, 768]
# classifier = nn.Linear(bert.config.hidden_size, num_labels)  # bert.config.hidden_size is usually 768
# logits = classifier(torch_sequence_output)  # output shape: [batch_size, seq_length, num_labels]
#
# # Assume `targets` holds the true labels, shape [batch_size, seq_length], here [1, 4]
# targets = torch.randint(0, num_labels, (1, 4))  # random example labels
#
# # Cross-entropy needs logits flattened to 2D and targets flattened to 1D
# criterion = nn.CrossEntropyLoss()
# loss = criterion(logits.view(-1, num_labels), targets.view(-1))
#
# print("Loss:", loss.item())
#
# print(diy_sequence_output)
# print(torch_sequence_output)
#
# # print(diy_pooler_output)
# # print(torch_pooler_output)
The difference between 2 and 3 is [CLS] versus 768. For start and end
Differences
BERT与其他NLP模型的对比:了解预训练模型的差异-CSDN博客
Question 1: How are BERT's two special tokens implemented?
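A small sketch of one way to read Question 1, assuming the "two special" items are the [CLS] and [SEP] tokens. They are ordinary vocabulary entries that the tokenizer adds around the text: [CLS] goes first (its final hidden state feeds the pooler), and [SEP] closes each sentence, with segment ids distinguishing sentence A from sentence B. The toy vocabulary and its ids below are hypothetical, not the real BERT vocab.

```python
def build_bert_inputs(tokens_a, tokens_b, vocab):
    # [CLS] + sentence A + [SEP], all in segment 0
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    # Optional sentence B + [SEP], all in segment 1
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    input_ids = [vocab[t] for t in tokens]
    return tokens, input_ids, segment_ids

# Hypothetical toy vocabulary, for illustration only
toy_vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "deep": 3, "learning": 4, "is": 5, "fun": 6}

tokens, input_ids, segment_ids = build_bert_inputs(["deep", "learning"], ["is", "fun"], toy_vocab)
print(tokens)       # ['[CLS]', 'deep', 'learning', '[SEP]', 'is', 'fun', '[SEP]']
print(input_ids)    # [1, 3, 4, 2, 5, 6, 2]
print(segment_ids)  # [0, 0, 0, 0, 1, 1, 1]
```

The segment ids are what the token type embedding in the hand-written BERT code looks up; for a single sentence they are all 0.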
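Question 2: GPT uses generative pre-training, whereas ELMo uses bidirectional LSTMs for deep contextual modeling.
Question 3: BERT and GPT are both based on the Transformer, but BERT uses the encoder and GPT uses the decoder. How do the encoder and decoder structures differ?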
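A minimal NumPy sketch of the part of Question 3 that matters most for BERT versus GPT: the attention mask. Encoder-style self-attention lets every position attend to every other position, while decoder-style self-attention applies a causal mask so that position i only sees positions up to i. The function name and the random scores are illustrative assumptions. The decoder in the original encoder-decoder Transformer also has a cross-attention sub-layer, which a decoder-only GPT does not use; that part is not shown here.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(scores, causal):
    # scores: [seq_len, seq_len] raw attention scores (query rows, key columns)
    if causal:
        seq_len = scores.shape[-1]
        # True above the diagonal marks "future" keys that must be hidden
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores)

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))

encoder_attn = attention_weights(scores, causal=False)  # BERT-style: full, bidirectional
decoder_attn = attention_weights(scores, causal=True)   # GPT-style: lower-triangular
print(np.round(encoder_attn, 2))
print(np.round(decoder_attn, 2))
```

In the decoder-style matrix the first row puts all of its weight on the first token, and every entry above the diagonal is exactly zero.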
Question 4: What is the difference between bidirectional and unidirectional?
import numpy as np
# Assume we have a sequence
sequence = np.array([1, 2, 3, 4, 5])

# Unidirectional processing: accumulate only the values up to the current element
def unidirectional_process(sequence):
    results = []
    accumulator = 0
    for value in sequence:
        accumulator += value
        results.append(accumulator)
    return results

# Bidirectional processing: combine what comes before and after each element
def bidirectional_process(sequence):
    forward_results = unidirectional_process(sequence)
    backward_results = unidirectional_process(sequence[::-1])[::-1]
    # The current element is counted in both passes, so subtract it once
    results = (np.array(forward_results) + np.array(backward_results)) - sequence
    return results

# Compute the unidirectional and bidirectional results
unidirectional_results = unidirectional_process(sequence)
bidirectional_results = bidirectional_process(sequence)
print("Unidirectional result:", unidirectional_results)
print("Bidirectional result:", bidirectional_results)
Alpaca
5 Multimodal LLMs (MLLM)
Qwen-VL (MLLM)
【LLM多模态】Qwen-VL模型结构和训练流程-CSDN博客
import torch
import torch.nn as nn
from transformers import ViTModel
class CrossAttention(nn.Module):
    def __init__(self, num_queries, d_model):
        super(CrossAttention, self).__init__()
        self.num_queries = num_queries
        # Learnable query vectors stored as a model parameter: [1, 256, 768]
        self.query = nn.Parameter(torch.randn(1, num_queries, d_model))
        # batch_first defaults to False, so inputs are (seq_len, batch_size, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8)

    def forward(self, img_features):
        # img_features shape: (seq_len, batch_size, d_model)
        batch_size = img_features.size(1)
        # Repeat the queries across the batch, then move the batch axis to position 1
        query = self.query.repeat(batch_size, 1, 1).transpose(0, 1)
        # Cross-attention: query [256, 10, 768], key/value [197, 10, 768]
        attn_output, _ = self.cross_attn(query, img_features, img_features)
        return attn_output

# Assume the ViT output dimension d_model is 768
d_model = 768
num_queries = 256
# Load the pre-trained ViT model
vit = ViTModel.from_pretrained('/Users/depeng.yao/Desktop/vit')
# Initialize the cross-attention module
cross_attention = CrossAttention(num_queries=num_queries, d_model=d_model)
# A randomly generated batch of images
batch_size = 10
dummy_images = torch.rand(batch_size, 3, 224, 224)  # assume 224x224 input images
# Extract image features with ViT
outputs = vit(pixel_values=dummy_images)
# img_features: [197, 10, 768], transposed to match the cross-attention input layout
img_features = outputs.last_hidden_state.transpose(0, 1)
# Compress the image features with the cross-attention module
attn_output = cross_attention(img_features)
# Output shape is (num_queries, batch_size, d_model): torch.Size([256, 10, 768])
print(attn_output.shape)
qwen-vl的和
UReader (MLLM)
UReader:基于多模态大型语言模型的通用无ocr视觉情境语言理解_ureader: universal ocr-free visually-situated lang-CSDN博客
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import cv2
from torchvision.models import resnet50
def calculate_iou(box1, box2):
    """Compute the intersection over union (IoU) of two boxes given as (x1, y1, x2, y2)."""
    x_left = max(box1[0], box2[0])
    y_top = max(box1[1], box2[1])
    x_right = min(box1[2], box2[2])
    y_bottom = min(box1[3], box2[3])
    if x_right < x_left or y_bottom < y_top:
        return 0.0
    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = intersection_area / float(box1_area + box2_area - intersection_area)
    return iou

class PositionalEncoding(nn.Module):
    def __init__(self, grid_size, embed_size):
        super(PositionalEncoding, self).__init__()
        self.row_embed = nn.Embedding(grid_size[0], embed_size)
        self.col_embed = nn.Embedding(grid_size[1], embed_size)

    def forward(self, dims):
        # Look up the row and column embeddings and add them
        return self.row_embed(dims[:, 0]) + self.col_embed(dims[:, 1])

def shape_adaptive_cropping(image, target_resolution=(224, 224)):
    height, width, _ = image.shape
    target_height, target_width = target_resolution
    # Pick the grid (up to 5x5) whose overall size best matches the image, measured by IoU
    best_grid = (1, 1)
    best_iou = 0
    for nh in range(1, 6):
        for nw in range(1, 6):
            proposed_height = target_height * nh
            proposed_width = target_width * nw
            iou = calculate_iou((0, 0, width, height), (0, 0, proposed_width, proposed_height))
            if iou > best_iou:
                best_iou = iou
                best_grid = (nh, nw)
    nh, nw = best_grid
    resized_image = cv2.resize(image, (nw * target_width, nh * target_height))
    # Cut the resized image into nh * nw crops
    crops = []
    for i in range(nh):
        for j in range(nw):
            crop = resized_image[i * target_height:(i + 1) * target_height, j * target_width:(j + 1) * target_width]
            crops.append(crop)
    return crops, resized_image

# Grid size and embedding dimension
grid_size = (5, 5)  # assumed maximum grid size
embed_size = 1000   # matches the 1000-dimensional output of the ResNet-50 classifier head
# Initialize the positional encoding module
positional_encoding = PositionalEncoding(grid_size, embed_size)
# Image preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
# Load the visual encoder
visual_encoder = resnet50(pretrained=True)
visual_encoder.eval()  # evaluation mode
# Read and convert the image
image_path = '/Users/depeng.yao/Downloads/demo.jpeg'
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Run shape-adaptive cropping
crops, resized_image = shape_adaptive_cropping(image, target_resolution=(224, 224))
# Encode the resized image and every crop
resized_features = visual_encoder(transform(resized_image).unsqueeze(0))  # [1, 1000]
crop_features = [visual_encoder(transform(crop).unsqueeze(0)) for crop in crops]  # len(crops) tensors of shape [1, 1000]
# Position indices: (row, column) of each crop in the grid that was actually chosen
nh, nw = resized_image.shape[0] // 224, resized_image.shape[1] // 224
positions = [(i, j) for i in range(nh) for j in range(nw)]
positions_tensor = torch.tensor(positions, dtype=torch.long)  # [len(crops), 2]
# Position embeddings
position_embeddings = positional_encoding(positions_tensor)  # [len(crops), 1000]
# Stack the features and position embeddings: [1 + 2 * len(crops), 1000]
features = torch.cat([resized_features] + crop_features + [position_embeddings], dim=0)
print("ok")
CLIP (MLLM)
Once CLIP has been trained, it can be used directly for image classification (so-called zero-shot image classification). The principle is simple:
Model
import torch
import clip
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # downloads the CLIP weights on first use
image = preprocess(Image.open("/Users/depeng.yao/Downloads/demo.jpeg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # model(image, text) returns (logits_per_image, logits_per_text)
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()
print("Label probs:", probs)  # one probability per text prompt; the values depend on the image
多模态模型学习1——CLIP对比学习 语言-图像预训练模型_clip模型-CSDN博客
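Loss
Code:
In essence, both embeddings are 64×128; multiplying one by the transpose of the other gives a 64×64 similarity matrix, which is then scored against 64 labels (one per sample).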
import torch
import torch.nn as nn
import torch.nn.functional as F
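# Assume visual_embedding and text_embedding have already been computed;
# random data stands in for the two embedding matrices here
embedding_size = 128
num_samples = 64
# Simulated data
visual_embedding = torch.randn(num_samples, embedding_size)
text_embedding = torch.randn(num_samples, embedding_size)
# Inner-product (similarity) matrix: [64, 64]
similarity_matrix = torch.matmul(visual_embedding, text_embedding.t())
# Row-wise softmax turns each row into a probability distribution (for inspection only)
probabilities = F.softmax(similarity_matrix, dim=1)
# Labels: the correct text for each sample is the one at its own index
labels = torch.arange(num_samples)
# Cross-entropy loss. nn.CrossEntropyLoss applies log-softmax internally,
# so it takes the raw similarity scores, not the softmax probabilities
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(similarity_matrix, labels)
print("Loss:", loss.item())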
InfoNCE loss
import torch
import torch.nn.functional as F
def info_nce_loss(image_features, text_features, temperature=0.07):
    """
    Compute the InfoNCE loss.
    Args:
    - image_features: image features, shape (N, feature_dim)
    - text_features: text features, shape (N, feature_dim)
    - temperature: temperature that controls how sharp the softmax is
    Returns:
    - loss: the computed loss value
    """
    # Similarity matrix between images and texts
    logits = torch.mm(image_features, text_features.t()) / temperature
    # The diagonal entries are the similarities of the positive pairs
    labels = torch.arange(logits.shape[0], device=logits.device)
    # InfoNCE via cross-entropy: the similarity matrix is treated as class scores
    loss_i2t = F.cross_entropy(logits, labels)
    loss_t2i = F.cross_entropy(logits.t(), labels)
    # Average of the image-to-text and text-to-image losses
    return (loss_i2t + loss_t2i) / 2

# Assumed feature vectors and batch size
batch_size = 32
feature_dim = 256
image_features = torch.randn(batch_size, feature_dim)
text_features = torch.randn(batch_size, feature_dim)
# Compute the loss
loss = info_nce_loss(image_features, text_features)
print("InfoNCE Loss:", loss.item())
Cosine similarity
多模态模型学习1——CLIP对比学习 语言-图像预训练模型_clip模型-CSDN博客
CLIP uses cosine similarity: the image and text features are L2-normalized before the dot product.
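A short NumPy sketch of that computation, with random vectors standing in for real image and text features. The function name, the `eps` guard, and the sizes are assumptions for illustration.

```python
import numpy as np

def cosine_similarity_matrix(a, b, eps=1e-8):
    # L2-normalize each row, then a plain dot product gives the cosine of the angle
    a_norm = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_norm = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_norm @ b_norm.T

rng = np.random.default_rng(0)
image_features = rng.normal(size=(3, 16))  # 3 images, 16-dim features (hypothetical sizes)
text_features = rng.normal(size=(5, 16))   # 5 text prompts

sim = cosine_similarity_matrix(image_features, text_features)
print(sim.shape)  # (3, 5)
print(np.round(cosine_similarity_matrix(image_features, image_features).diagonal(), 4))  # each vector vs itself: 1.0
```

Because the rows are normalized, every entry lies in [-1, 1] regardless of the feature magnitudes, which is why a temperature is applied before the softmax in the contrastive loss.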
BLIP
【BLIP/BLIP2/InstructBLIP】一篇文章快速了解BLIP系列(附代码讲解说明)-CSDN博客
BLIP (v1)
Question 1: What is the overall pipeline?
Question 2: What does cross-attention (CA) refer to?
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
class CrossAttentionLayer(nn.Module):
    def __init__(self, d_model, n_heads):
        super(CrossAttentionLayer, self).__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.key_proj = nn.Linear(d_model, d_model)
        self.value_proj = nn.Linear(d_model, d_model)
        self.n_heads = n_heads
        self.d_k = d_model // n_heads

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)
        # Linear projections, then split into heads: [batch_size, n_heads, seq_len, d_k]
        query = self.query_proj(query).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        key = self.key_proj(key).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        value = self.value_proj(value).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        attn = F.softmax(scores, dim=-1)
        # Weighted sum of the values, then merge the heads back together
        output = torch.matmul(attn, value).transpose(1, 2).contiguous().view(batch_size, -1, self.n_heads * self.d_k)
        return output

# Example usage
d_model = 512
n_heads = 8
seq_len_text = 10  # text sequence length
seq_len_img = 20   # image feature length (assuming the image features are flattened)
batch_size = 32
# The query comes from the text encoder; the key and value come from the image encoder
query_text = torch.randn(batch_size, seq_len_text, d_model)  # text queries
key_image = torch.randn(batch_size, seq_len_img, d_model)    # image keys
value_image = torch.randn(batch_size, seq_len_img, d_model)  # image values
cross_attention = CrossAttentionLayer(d_model, n_heads)
output = cross_attention(query_text, key_image, value_image)
print(output.shape)  # same shape as the text query: [32, 10, 512]
query_text represents features encoded from the text modality, while key_image and value_image represent features encoded from the image modality. Through cross-attention, the model can enrich the text representation with image features, which is key for joint image-text understanding tasks.
Question 3: The ITC loss is simply the InfoNCE loss, as in CLIP.
import torch
import torch.nn as nn
import torch.nn.functional as F
def image_text_contrastive_loss(img_features, text_features, temperature=0.07):
    """
    Compute the image-text contrastive (ITC) loss.
    :param img_features: image feature tensor, shape (batch_size, feature_dim)
    :param text_features: text feature tensor, shape (batch_size, feature_dim)
    :param temperature: temperature that controls how sharp the softmax is
    :return: loss value
    """
    # L2-normalize the feature vectors
    img_features = F.normalize(img_features, p=2, dim=1)
    text_features = F.normalize(text_features, p=2, dim=1)
    # Similarity matrix
    similarity_matrix = torch.matmul(img_features, text_features.t()) / temperature
    # Target: each image should match its own text (the diagonal entries are the positives)
    batch_size = img_features.size(0)
    targets = torch.arange(batch_size).long().to(img_features.device)
    # Compute the loss in both directions
    loss_i2t = F.cross_entropy(similarity_matrix, targets)
    loss_t2i = F.cross_entropy(similarity_matrix.t(), targets)
    # Total loss: the average of the two directions
    return (loss_i2t + loss_t2i) / 2

# Example usage
batch_size = 32
feature_dim = 128
img_features = torch.randn(batch_size, feature_dim)
text_features = torch.randn(batch_size, feature_dim)
loss = image_text_contrastive_loss(img_features, text_features)
print("ITC Loss:", loss.item())
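Question 4: The ITM loss: combine the image and text features, apply a fully connected layer with a single output, and decide whether the pair matches or not.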
import torch
import torch.nn as nn
import torch.nn.functional as F
class ImageTextMatchingModel(nn.Module):
    def __init__(self, image_feature_dim, text_feature_dim, hidden_dim):
        super(ImageTextMatchingModel, self).__init__()
        # These dimensions can be adjusted as needed
        self.fc1 = nn.Linear(image_feature_dim + text_feature_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 1)  # binary classification: one logit for "match"

    def forward(self, image_features, text_features):
        # Simple concatenation of the image and text features
        combined_features = torch.cat((image_features, text_features), dim=1)
        x = F.relu(self.fc1(combined_features))
        logits = self.fc2(x)
        return logits

def itm_loss(logits, labels):
    """Binary cross-entropy loss computed on raw logits."""
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    return loss

# Assumed feature dimensions and some data
image_feature_dim = 256
text_feature_dim = 256
hidden_dim = 128
# Create the model
model = ImageTextMatchingModel(image_feature_dim, text_feature_dim, hidden_dim)
# Assumed input data
batch_size = 10
image_features = torch.randn(batch_size, image_feature_dim)  # random image features
text_features = torch.randn(batch_size, text_feature_dim)    # random text features
labels = torch.randint(0, 2, (batch_size, 1)).float()        # random labels, 0 or 1
# Forward pass
logits = model(image_features, text_features)
# Compute the loss
loss = itm_loss(logits, labels)
print("Loss:", loss.item())
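Question 5: For the LM loss, the caption predicted from the image is compared directly with the ground-truth text, token by token.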
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
class ImageCaptioningModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers, image_feature_dim):
        super(ImageCaptioningModel, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.image_features = nn.Linear(image_feature_dim, hidden_dim)
        # The LSTM input is the word embedding concatenated with the projected image features
        self.lstm = nn.LSTM(embed_dim + hidden_dim, hidden_dim, num_layers, batch_first=True)
        # The LSTM output has hidden_dim features
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_features, captions):
        # image_features: [batch_size, image_feature_dim]
        # captions: [batch_size, seq_length]
        embeddings = self.embed(captions[:, :-1])  # exclude the last token
        image_features_transformed = self.image_features(image_features).unsqueeze(1)
        image_features_transformed = image_features_transformed.expand(-1, embeddings.size(1), -1)
        lstm_input = torch.cat((embeddings, image_features_transformed), dim=2)
        hiddens, _ = self.lstm(lstm_input)
        outputs = self.fc(hiddens)  # [batch_size, seq_length - 1, vocab_size]
        return outputs

def language_modeling_loss(outputs, targets):
    # outputs: [batch_size, seq_length - 1, vocab_size]
    # targets: [batch_size, seq_length - 1]
    # reshape (not view) because the shifted targets are not contiguous in memory
    loss = F.cross_entropy(outputs.reshape(-1, outputs.size(-1)), targets.reshape(-1), ignore_index=0)
    return loss

# Parameters
vocab_size = 10000  # example vocabulary size
embed_dim = 256
hidden_dim = 512
num_layers = 2
image_feature_dim = 2048
seq_length = 20
batch_size = 32
# Model and optimizer
model = ImageCaptioningModel(vocab_size, embed_dim, hidden_dim, num_layers, image_feature_dim)
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Dummy data
image_features = torch.randn(batch_size, image_feature_dim)
captions = torch.randint(1, vocab_size, (batch_size, seq_length))
# Forward pass
outputs = model(image_features, captions)
targets = captions[:, 1:]  # shifted by one position to predict the next word
# Compute the loss
loss = language_modeling_loss(outputs, targets)
# Backward pass and optimization step
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("Loss:", loss.item())
BLIP-2
【大模型系列】统一图文理解与生成(BLIP/BLIPv2/InstructBLIP)_大模型如何同时输入图文数据-CSDN博客
Q-Former implementation
from transformers import BertConfig, BertModel, BertTokenizer
import torch
import torch.nn as nn
from transformers import ViTModel
def init_Qformer(num_query_tokens, vision_width, freeze):
    # Configure the Q-Former from a pre-trained BERT config, with decoding and cross-attention enabled
    encoder_config = BertConfig.from_pretrained(
        "/Users/depeng.yao/Desktop/yaodepeng/pythonProject/bert-base-uncased/config.json",
        is_decoder=True,            # use the model as a decoder
        add_cross_attention=True,   # add cross-attention to every layer
        hidden_size=vision_width,
        num_hidden_layers=12,
        cross_attention_layers=[i for i in range(12)])  # intended to mark every layer; kept as an extra config attribute
    encoder_config.query_length = num_query_tokens
    # Initialize the Q-Former from the config (random weights)
    Qformer = BertModel(config=encoder_config)
    # Initialize the learnable query tokens: [1, num_query_tokens, hidden_size], here [1, 10, 768]
    query_tokens = nn.Parameter(torch.zeros(1, num_query_tokens, encoder_config.hidden_size))
    query_tokens.data.normal_(mean=0.0, std=encoder_config.initializer_range)
    if freeze:
        for param in Qformer.parameters():
            param.requires_grad = False
        Qformer.eval()
    return Qformer, query_tokens

# Shapes: query_tokens [1, 10, 768], image_features [1, 10, 768], text_input_ids [1, 6], text_attention_mask [1, 6]
def forward_pass(Qformer, query_tokens, image_features, text_input_ids, text_attention_mask):
    batch_size = text_input_ids.size(0)
    num_query_tokens = query_tokens.size(1)
    # Expand the query tokens to the batch size. Simplification: this forward pass feeds
    # placeholder ids instead, so the learned query embeddings are not consumed below.
    repeated_query_tokens = query_tokens.repeat(batch_size, 1, 1)
    # query_token_ids: [1, 10]; 30521 is assumed to be a placeholder id (the last id in the 30522-entry vocab)
    query_token_ids = torch.full((batch_size, num_query_tokens), fill_value=30521)
    # Concatenate the ids: input_ids [1, 16]
    input_ids = torch.cat([query_token_ids, text_input_ids], dim=1)
    # All-ones mask for the query part, concatenated with the text mask: attention_mask [1, 16]
    query_attention_mask = torch.ones(batch_size, num_query_tokens, dtype=torch.long)
    attention_mask = torch.cat([query_attention_mask, text_attention_mask], dim=1)
    # Forward pass; the image features enter through cross-attention
    outputs = Qformer(input_ids=input_ids, attention_mask=attention_mask,
                      encoder_hidden_states=image_features, encoder_attention_mask=None,
                      return_dict=True)
    return outputs

model = BertModel.from_pretrained("/Users/depeng.yao/Desktop/yaodepeng/pythonProject/bert-base-uncased")
# Example usage
num_query_tokens = 10
vision_width = 768
freeze = False
Qformer, query_tokens = init_Qformer(num_query_tokens, vision_width, freeze)
image_features = torch.randn(1, 10, 768)
tokenizer = BertTokenizer.from_pretrained("/Users/depeng.yao/Desktop/yaodepeng/pythonProject/bert-base-uncased")
text = "a cat wearing sunglasses"
encoded_text = tokenizer(text, return_tensors="pt")
# {'input_ids': tensor([[ 101, 1037, 4937, 4147, 17072, 102]]),
# 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}
text_input_ids = encoded_text['input_ids']
text_attention_mask = encoded_text['attention_mask']
outputs = forward_pass(Qformer, query_tokens, image_features, text_input_ids, text_attention_mask)
# [1, 16, 768]: 10 query positions plus 6 text tokens, each with a 768-dimensional output
print(outputs.last_hidden_state.shape)
LLaVA
My impression is that the main difference is the data: it is more detailed, and more fine-grained data is fed in.
# Hypothetical stand-in for a GPT-4 API call
def call_gpt4_model(prompt):
    # This is only an example function; in practice, replace it with a call to the OpenAI API
    print("Calling the GPT-4 model with input:")
    print(prompt)
    return "This is a simulated answer."

# Build the conversation context and the question
messages = [
    {"role": "user", "content": "What objects are in this image?"},
    {"role": "assistant", "content": "The image contains a cat and a dog sitting on a mat."},
    {"role": "user", "content": "What is the dog doing?"},
    {"role": "assistant", "content": "The dog is playing with a ball."},
    {"role": "user", "content": "How far is the cat from the dog?"},
    {"role": "assistant", "content": "The cat is about two feet away from the dog."}
]
# Build the prompt for GPT-4
prompt = "\n".join([f"{msg['role']}: {msg['content']}" for msg in messages])
prompt += "\nuser: Are there any other animals in the picture?"
# Call the GPT-4 model
response = call_gpt4_model(prompt)
print("GPT-4 answer:", response)
Model: again CLIP's ViT plus a language encoder; the tokens from both are concatenated and fed into the LLM.
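A shape-level sketch of that concatenation in NumPy. Random arrays stand in for the CLIP ViT patch features and for the LLM's text embeddings, and a single linear projection maps the visual features into the LLM embedding space. All sizes and variable names are hypothetical and chosen only to make the shapes easy to follow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only
num_patches, vision_dim = 16, 32   # visual tokens produced by the image encoder
num_text_tokens, llm_dim = 5, 64   # text tokens and the LLM embedding width

# Stand-ins for the vision encoder output and the LLM's text embeddings
visual_features = rng.normal(size=(num_patches, vision_dim))
text_embeddings = rng.normal(size=(num_text_tokens, llm_dim))

# Linear projection from the vision space into the LLM embedding space
W_proj = rng.normal(scale=0.02, size=(vision_dim, llm_dim))
visual_tokens = visual_features @ W_proj  # [num_patches, llm_dim]

# Concatenate along the sequence axis: image tokens first, then text tokens
llm_input = np.concatenate([visual_tokens, text_embeddings], axis=0)
print(llm_input.shape)  # (21, 64)
```

The LLM then processes the 21 positions as one sequence, so the text tokens can attend to the projected image tokens like any other context.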
MiniGPT-4
【LLM多模态】MiniGPT4模型结构和训练流程-CSDN博客
What is the difference between BLIP-2 and MiniGPT-4?
LoRA
import torch
import torch.nn as nn
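class LoRA(nn.Module):
    def __init__(self, d, r):
        super(LoRA, self).__init__()
        # Two low-rank factors whose product A @ B is a d x d matrix of rank at most r
        self.A = nn.Parameter(torch.randn(d, r))
        self.B = nn.Parameter(torch.randn(r, d))

    def forward(self, X):
        # Only the low-rank update is applied here; the frozen base weight is not included
        return torch.matmul(self.A, self.B).matmul(X)

# Example parameters
d = 512  # dimension of the original weight matrix
r = 32   # rank of the low-rank factors
# Initialize the LoRA layer
lora_layer = LoRA(d, r)
# Input tensor
X = torch.randn(d, 128)  # assume 128 samples, one per column
# Forward pass
output = lora_layer(X)
print(output.shape)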
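For comparison, a minimal NumPy sketch of how LoRA is usually wired into a linear layer: the pre-trained weight W stays frozen and the low-rank product is added to its output, with one factor initialized to zero so that training starts from the unchanged base model. The class name, the `alpha / r` scaling, and the initialization scales are assumptions for illustration, not code from a specific library.

```python
import numpy as np

class LoRALinear:
    def __init__(self, d_in, d_out, r, alpha, rng):
        self.W = rng.normal(scale=0.02, size=(d_out, d_in))  # frozen pre-trained weight (stand-in)
        self.A = rng.normal(scale=0.01, size=(r, d_in))      # trainable down-projection
        self.B = np.zeros((d_out, r))                        # trainable up-projection, zero at start
        self.scale = alpha / r

    def forward(self, x):
        # x: [batch, d_in]
        base = x @ self.W.T                  # frozen path
        update = (x @ self.A.T) @ self.B.T   # low-rank path, never materializes a d_out x d_in matrix
        return base + self.scale * update

rng = np.random.default_rng(0)
layer = LoRALinear(d_in=512, d_out=512, r=32, alpha=32, rng=rng)
x = rng.normal(size=(128, 512))
y = layer.forward(x)
print(y.shape)  # (128, 512)

full_params = layer.W.size
lora_params = layer.A.size + layer.B.size
print(full_params, lora_params)  # 262144 32768
```

With d = 512 and r = 32, the trainable factors hold one eighth as many parameters as the full weight matrix, and the saving grows as d increases.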
DDPM principles
Overall structure
AIGC系列之:DDPM原理解读(简单易懂版)_ddpm中的unet-CSDN博客
https://juejin.cn/post/7251391372394053691
AIGC专栏2——Stable Diffusion结构解析-以文本生成图像(文生图,txt2img)为例-CSDN博客
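A minimal NumPy sketch of the DDPM forward (noising) process that the references above describe: with a noise schedule beta_t and alpha_bar_t equal to the cumulative product of (1 - beta_t), a noisy sample at step t can be drawn in closed form as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The linear schedule from 1e-4 to 0.02 over 1000 steps is a commonly used setting and is assumed here; the image is a random stand-in.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)     # fraction of the signal that survives up to step t

def q_sample(x0, t, noise):
    # Closed-form sample of x_t given x_0, without iterating through the earlier steps
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in image scaled to [-1, 1]
noise = rng.normal(size=x0.shape)

x_early = q_sample(x0, 10, noise)     # still close to the image
x_late = q_sample(x0, T - 1, noise)   # almost pure Gaussian noise
print(alphas_bar[10], alphas_bar[-1])
```

During training, the denoising network receives x_t and t and is trained to predict the noise that was added; sampling then runs the process in reverse, one step at a time.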
U-Net model
In essence, the model is a U-Net with many attention layers added inside.
Stable Diffusion
dcl-mllm