Large Language Model Notes

1 Introduction to GPT

Introduction to the Transformer

2598f7f0a702f2d3ef62b194f2413130.png

The output is the label (the target sequence the model is trained to produce).

From GPT-1 to GPT-4

GPT-1

The passage below is the main thing to memorize.

a0337fbb850ee71df22472ef155f89e2.png

The architecture is a stack of Transformer decoder blocks (decoder-only), not the encoder.

19ba67d096208de0b55b7a9d9b8f97b8.png

 4558ecc8f1173506c7bf90ebc870ea41.png

c2e6ac21a260a6501c34cb42bd99f15a.png 7e69010fee2f682f88388347f4832f1f.png

GPT-2

8af2824262e219be9b7606fcd6333748.png

fd08f0fc404eca7aa414dacd1f0a3bec.png

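Zero-shot learning is a machine learning concept: the model completes a task without having been trained on any examples for that specific task. This contrasts with conventional machine learning, which usually needs a large amount of labeled data to train a model for each task.

GPT-3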

8dbcd5668c747e838a0fc34085000c5e.png

d286028d92fb0a88cc94a5354e619a64.png

ChatGPT

59c8f7cb6aca69684885c68032ace413.png

GPT-4

0335327a5e01f954c8e0d8c3c4b482e7.png

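How to use GPT itself

1 Using OpenAI means driving GPT from a program; for the setup, see the PDF handout week02-2.

2 The main use is data analysis, and the key question is how to feed in the data:

1 Summarize the overall information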

products: shape (32951, 9); columns ['product_id', 'product_category_name', 'product_name_lenght', 'product_description_lenght', 'product_photos_qty', 'product_weight_g', 'product_length_cm', 'product_height_cm', 'product_width_cm']
order: shape (99441, 8); columns ['order_id', 'customer_id', 'order_status', 'order_purchase_timestamp', 'order_approved_at', 'order_delivered_carrier_date', 'order_delivered_customer_date', 'order_estimated_delivery_date']
payments: shape (103886, 5); columns ['order_id', 'payment_sequential', 'payment_type', 'payment_installments', 'payment_value']
customers: shape (99441, 5); columns ['customer_id', 'customer_unique_id', 'customer_zip_code_prefix', 'customer_city', 'customer_state']
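A summary like the one above can be produced by a small helper before anything is sent to GPT. The following is only a sketch: `describe_table` and the two-row `payments` sample are hypothetical, and the table is represented as a plain list of dicts rather than a DataFrame.

```python
def describe_table(name, rows):
    """Return a one-line summary (shape and column names) of a table.

    `rows` is a list of dicts, one dict per record (an assumption made
    for this sketch so that it needs no third-party library).
    """
    columns = list(rows[0].keys()) if rows else []
    return f"{name}: shape ({len(rows)}, {len(columns)}); columns {columns}"


# Hypothetical two-row stand-in for the payments table
payments = [
    {"order_id": "a1", "payment_type": "credit_card", "payment_value": 99.33},
    {"order_id": "b2", "payment_type": "boleto", "payment_value": 24.39},
]

summary = describe_table("payments", payments)
print(summary)
```

Only the short summary string goes into the prompt, so the prompt stays the same size no matter how many rows the table has.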

2 Sample part of the data

63b0ec8519f1041d9cb936a9a82bb575.png

3 Feed the data in chunks
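One possible way to chunk text data is to cut on line boundaries so that no record is split in half. This is a minimal sketch; `split_into_chunks` and the `max_chars` budget are hypothetical names, and a real limit would be measured in tokens rather than characters.

```python
def split_into_chunks(text, max_chars):
    """Split text into chunks of at most max_chars, cutting only at line ends.

    A single line longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the budget
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


csv_text = "a,b\n1,2\n3,4\n"
chunks = split_into_chunks(csv_text, max_chars=8)
print(chunks)
```

Each chunk can then be sent as a separate message, with a final message asking the question.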

a3e6bcc2cb9e07dea14d9727a7157307.png

4 Ask GPT to summarize first, then feed in the summary

41947698a01e4f8e24c0bea04f8f60e2.png d323584668e8e68fe06804be65ad4a80.png

080160f9377874e399896af5660e08f7.png

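Once GPT has produced the program, run it to get the summary statistics directly and compute from those.

Which GPT plugins are useful?

PDF plugins for analyzing PDFs, Wolfram for math, Scholar AI for writing papers, WebPilot for finding material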

6dd2181ecca6b4a72525d5a8a42fb40b.png

vscode+chatgpt 

VSCode集成ChatGPT插件:ChatGPT中文版_vscode chatgpt插件-CSDN博客

How to use it with Excel

0fa72215a2b52a3510ed9db48f7df88e.png

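I want to fill in L6:L15. Generate an Excel formula that follows these rules:
If I5, J5, and K5 all equal 1, the customer is classified as "high-value customer".
If I5 and J5 equal 1 but K5 equals 0, the customer is classified as "potential customer".
If I5 and K5 equal 1 but J5 equals 0, the customer is classified as "key customer to cultivate".
If I5 equals 1 but J5 and K5 equal 0, the customer is classified as "new customer".
If J5 and K5 equal 1 but I5 equals 0, the customer is classified as "key customer to win back".
If J5 equals 1 but I5 and K5 equal 0, the customer is classified as "regular customer".
If K5 equals 1 but I5 and J5 equal 0, the customer is classified as "customer to recover".
If I5, J5, and K5 all equal 0, the customer is classified as "churned customer".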
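The eight rules in the prompt form a complete lookup table over the three 0/1 flags, which makes them easy to check in code before trusting a generated Excel formula. The sketch below assumes the flags in columns I, J, K are passed as integers; `SEGMENTS` and `classify` are hypothetical names, and the segment labels are given in English.

```python
# (I, J, K) flag combination -> customer segment, one entry per rule
SEGMENTS = {
    (1, 1, 1): "high-value customer",
    (1, 1, 0): "potential customer",
    (1, 0, 1): "key customer to cultivate",
    (1, 0, 0): "new customer",
    (0, 1, 1): "key customer to win back",
    (0, 1, 0): "regular customer",
    (0, 0, 1): "customer to recover",
    (0, 0, 0): "churned customer",
}


def classify(i, j, k):
    """Return the segment for one row's three 0/1 flags."""
    return SEGMENTS[(i, j, k)]


print(classify(1, 0, 0))
```

Because all 2^3 combinations are covered, a row with valid flags can never fall through without a label.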

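Types of large models

Language models: for OpenAI, the text mentions the GPT-3, GPT-3.5, and GPT-4 versions, along with model variants such as Ada, Babbage, Curie, and DaVinci. Google's model: PaLM 2

e7b46d589831c41c275ee9d4b0500444.png

Image models: the latest version is DALL·E. OpenAI's core method for "copying" the understanding ability of large language models into the vision domain is to treat an image as a kind of language: convert it into tokens and train them together with text tokens.

Speech recognition: the latest model is Whisper large-v2, which can be called through the API.

Text embedding model: text-embedding-ada-002, which turns text into vectors for analysis.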

In summary:

2abc5e9c7f94962592c8d0db292b0832.png

text-davinci-003 has been replaced by gpt-3.5-turbo-instruct; the available models have changed.

6c95cfd1b11c8f17a51ac2b3f5c2f06d.png

eaaf04742f5ad2bbc369f2cc7bb673a9.png

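Training process of large models

Pre-training + fine-tuning: fine-tuning makes small adjustments to a pre-trained model so that it fits a specific task or domain.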

8ab943df2ab6553e2c8d969fd062c364.png

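Key concepts of large models:

Autoregressive vs. generative: an autoregressive model predicts each token from the tokens before it; a generative model samples its output, so the result involves randomness.

6915be28c2e74865a78a4abacabb7c50.png

Autoregressive vs. bidirectional models:

The main difference is that an autoregressive model looks only at the preceding text, while a bidirectional model considers both the preceding and the following text.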
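The difference can be pictured as an attention mask, where row i lists the positions that token i is allowed to look at. This is an illustrative sketch only; `causal_mask` and `bidirectional_mask` are hypothetical helper names.

```python
def causal_mask(n):
    """Autoregressive: token i sees positions 0..i only (lower triangle)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Bidirectional: every token sees every position."""
    return [[1] * n for _ in range(n)]


for row in causal_mask(3):
    print(row)
for row in bidirectional_mask(3):
    print(row)
```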

Fine-tuning methods for large models

With OpenAI you can only fine-tune its hosted (online) models

Fine-tuning open-source models

4900ba96d11560f0a7befac1cb7f5424.png

RLHF

A method built on reinforcement learning (reinforcement learning from human feedback). See: 什么是RLHF (CSDN blog)

d470ab2e5a5530ffad3764ceb0e913e7.png

LoRA

f779adc80f611994f962cfb0dce08a11.png

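How they compare:

76b02fcd45ac805cf93dd52528258782.png

73e74fda1403bf52865e6b6c623ae9d1.png

Prefix tuning

136887a263c873c45640f7156db77030.png

0c04ee39d4c8a087f0374950883b375f.png

Prompt tuning

These are lightweight fine-tuning methods: only a small part of the parameters is tuned. They are somewhat hard to implement.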
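To make "tune only a small part" concrete for LoRA: in the usual formulation the pre-trained weight W stays frozen and only two small matrices A (r x d_in) and B (d_out x r) are trained, with the effective weight W + (alpha / r) * B A. The sketch below uses plain Python lists; all function names are hypothetical and it is not tied to any particular library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A); W itself is never modified."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Parameter counts: (full fine-tuning, LoRA adapter)."""
    return d_out * d_in, r * (d_out + d_in)


w = [[1.0, 2.0], [3.0, 4.0]]   # frozen pre-trained weight
a = [[0.5, -0.5]]              # r x d_in with r = 1
b = [[1.0], [0.0]]             # d_out x r (B starts at zero in practice)
w_eff = lora_effective_weight(w, a, b, alpha=2, r=1)
print(w_eff)
print(trainable_params(1024, 1024, 8))
```

For a 1024 x 1024 layer with r = 8, the adapter trains 16,384 values instead of 1,048,576.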

LangChain

1c38b2bd4d3d351dcb0079b546d0534f.png

It is built by combining six components

6de4fc7c0b5e3e1b6eabe9624812d854.png

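openai.Completion: gpt-3.5-turbo-instruct

openai.ChatCompletion: chat models such as gpt-3.5-turbo

2 Calling the OpenAI API

Completions

Basic introduction to Completions

Completions is the original, most basic endpoint. Compared with the Chat models: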

8f4b5ec580b7179742c2eca1c379c94b.png

import openai  # legacy SDK (openai<1.0), which provides openai.Completion


def chat_now(model='gpt-3.5-turbo-instruct', mode='balance', max_retries=3):
    """
    Multi-turn chatbot built on Completion.create.

    :param model: Completions model to call, default gpt-3.5-turbo-instruct
    :param mode: preset chat mode, default 'balance'; also 'precision' or 'creativity'
    :param max_retries: how many times a failed request is retried before giving up
    """
    # Sampling parameters (temperature, presence_penalty) for the three modes
    presets = {
        'balance': (1, 0),
        'precision': (0.8, 2),
        'creativity': (1.2, -1),
    }
    if mode not in presets:
        raise ValueError("mode must be 'balance', 'precision' or 'creativity'")
    temperature, presence_penalty = presets[mode]

    # Tell the user how to end the chat
    print("if you want to stop the conversation, please input 'quit'")

    def chat(prompt):
        """Send one request; return the answer text, or None if it failed."""
        try:
            response = openai.Completion.create(
                model=model,
                prompt=prompt,
                max_tokens=1000,
                temperature=temperature,
                presence_penalty=presence_penalty,
                stop=["Human:", "AI:"],
            )
            return response["choices"][0]["text"].strip()
        except Exception as exc:
            print("request failed:", exc)
            return None

    turns = []  # one "Human: ... AI: ..." string per completed exchange
    while True:
        question = input()
        if len(question.strip()) == 0:
            # Empty input: ask for a question
            print("please input your question")
        elif question == "quit":
            # Typing 'quit' ends the conversation loop
            print("\nAI: See You Next Time!")
            break
        else:
            # Send the kept history plus the new question as the prompt
            prompt = "".join(turns) + "\nHuman: " + question + "\nAI:"
            result = chat(prompt)
            retries = 0
            # Retry a failed request a limited number of times
            while result is None and retries < max_retries:
                print("please wait...")
                result = chat(prompt)
                retries += 1
            if result is None:
                print("the request keeps failing, please try again later")
                continue
            print(result)
            # Keep at most the last ten exchanges; older ones are dropped
            turns.append("\nHuman: " + question + "\nAI: " + result)
            turns = turns[-10:]

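- model: required. The name of the Completions model to call, for example text-davinci-003, text-davinci-002, text-curie-001, text-babbage-001, or text-ada-001; the models differ in parameter count. Note that large models differ from classical machine learning, where even a simple model can beat a complex one in some scenarios. Among OpenAI's four families (A, B, C, D), larger and newer models perform better (and cost more), so the course mainly uses text-davinci-003 as its example;
- prompt: required. The prompt text;
- suffix: optional, default null. Text that comes after the generated completion, used when the model should insert text between the prompt and the suffix;
- max_tokens: optional, default 16. The maximum number of tokens in the returned result;
- temperature: optional, range 0-2, default 1. The sampling temperature: **with a low value the model favors high-probability tokens and the text is more conservative; with a high value the model picks low-probability tokens more often and the text is more varied. The chat modes above differ mainly in temperature.**
- top_p: optional, range 0-1, default 1. Like temperature, it controls the randomness of the output: the closer to 1, the more random the text; the closer to 0, the less random. **To adjust randomness, change either top_p or temperature, not both; temperature is the recommended one.**
- n: optional, default 1. How many completions to return for one prompt;
- stream: optional, default False. Controls how the response is delivered: with False the model returns everything at once after generation finishes; with True the result is returned token by token;
- logprobs: optional, default null. Returns the log probabilities of the N most likely tokens at each position. For example, with logprobs set to 5 (the maximum), the API returns the top 5 candidate tokens and their log probabilities for every generated token;
- echo: optional, default False. If True, the prompt is returned together with the completion;
- stop: optional, default null. One or more strings (up to 4) that act as stop signals: generation stops as soon as the text would contain any of them. This can be used to control the length or format of the output;
- presence_penalty: optional, default 0, range [-2, 2]. Adjusts the model's tendency to introduce new content (new concepts or topics). Higher values push the model toward new content; lower values keep it on existing content. Raise it when a long result repeats the same topics;
- frequency_penalty: optional, default 0, range [-2, 2]. Adjusts the model's tendency to repeat itself. Higher values make repetition less likely; lower values make it more likely. Raise it when a long result repeats the same wording;
- **best_of: optional, default 1. The model generates several candidate completions (for example 5) and returns the one with the highest score;**
- logit_bias: optional. A dictionary that adjusts the probability of specific tokens: the keys are token IDs and the values are biases (from -100 to 100) added to the token's logit. The tokenizer tool shows the token IDs of a text. Changing this is generally not recommended;
- user: optional. An identifier for the end user, which you can set yourself to mark who is making the request. Note that this user parameter of Completion.create is not the same thing as the user role in the messages of the chat models introduced later;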
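The effect of temperature described in the list can be seen on a toy distribution: the logits are divided by the temperature before the softmax, so a low temperature sharpens the distribution and a high one flattens it. This is an illustrative sketch with made-up logits, not the API's internal code; it assumes temperature > 0.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after scaling by 1 / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
p_low = softmax_with_temperature(logits, 0.5)
p_high = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in p_low])
print([round(p, 3) for p in p_high])
```

With the lower temperature the top token takes a larger share of the probability mass, which is why the output becomes more conservative.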

How to improve a large model's performance

1 Use few-shot and one-shot prompting, or Zero-shot-CoT and Few-shot-CoT prompting

90fc5fd392c3a62b76b4ec66b1ea65b3.png

14b6c557e14e938336658937d62d2bf8.png

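Few and zero refer to how many examples the prompt contains; CoT refers to chain of thought.

A chain of thought emphasizes the analysis steps: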

beb9f9aca5adb04037cf64fbb7125253.png

faa0c6c7e71791b932a6e44da49d8c20.png

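In terms of results, prompting with a chain of thought beats prompting without one, and more examples beat fewer.

2 Use a special phrase: "Let's think step by step" works like a magic sentence. The Chinese rendering finally settled on was "请一步步进行推理并得出结论" ("please reason step by step and reach a conclusion"). Framing the prompt as a chain of thought works much better.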
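The four prompting styles differ only in how the prompt string is assembled. The sketch below is one possible way to build them; `build_prompt` and the Q/A layout are assumptions made for illustration. For Few-shot-CoT, the example answers passed in would contain the worked reasoning.

```python
COT_TRIGGER = "Let's think step by step."


def build_prompt(question, examples=(), cot=False):
    """Assemble a prompt from optional (question, answer) examples.

    examples=()  -> zero-shot; one pair -> one-shot; several -> few-shot.
    cot=True appends the trigger sentence after the final "A:".
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:" + (f" {COT_TRIGGER}" if cot else ""))
    return "\n\n".join(parts)


zero_shot_cot = build_prompt("2+2?", cot=True)
few_shot = build_prompt("3+3?", examples=[("1+1?", "2")])
print(zero_shot_cot)
print(few_shot)
```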

4992f718842dbcb04ddaba98cf5a24fb.png

453fafdc921984e847b186ee0a12a4c5.png

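3 LtM prompting (least-to-most prompting)

3.1 Using a single prompt

Split the main question into several sub-questions, answer them in order, and then answer the main question.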
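The control flow of least-to-most prompting can be sketched without calling any API: each sub-question is asked with all earlier questions and answers in front of it, and the main question comes last. In this sketch the sub-questions are supplied by hand (in the real method the model produces the decomposition), and `ask` is a hypothetical callable standing in for a model call.

```python
def least_to_most(question, sub_questions, ask):
    """Answer sub-questions in order, then the main question.

    `ask` is any callable mapping a prompt string to an answer string.
    Returns (final_answer, all_answers).
    """
    context = ""
    answers = []
    for sub in list(sub_questions) + [question]:
        prompt = f"{context}Q: {sub}\nA:"
        answer = ask(prompt)
        answers.append(answer)
        # The next prompt sees this question together with its answer
        context = f"{prompt} {answer}\n"
    return answers[-1], answers


# Stub model for demonstration: records prompts and returns numbered answers
prompts_seen = []


def fake_ask(prompt):
    prompts_seen.append(prompt)
    return f"answer {len(prompts_seen)}"


final, answers = least_to_most("main?", ["sub1?", "sub2?"], fake_ask)
print(prompts_seen[-1])
```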

c0887c615689b4dcbc7809a6edbc170e.png

7060c58c777df3f7d02841990a7b3fe6.png

3.2 Using multiple prompts

Several prompts are used in sequence.

The SCAN dataset

1 Introduction:

dbc0741e0d40f7f72cbabcd914bd9744.png

707cec92f4271dc25b78754ff80a48e3.png

2 Method:

67c6aa50c589959e2eabaad7dfcf4288.png

Detailed code:

def SCAN_predict(dataSet=scan_test, model="text-davinci-003", CD_Few_shot=CD_Few_shot, CM_Few_shot=CM_Few_shot):
    # scan_test, CD_Few_shot, CM_Few_shot, extract_phrases and transform_expression
    # are assumed to be defined earlier in the course notebook
    # Convert to a DataFrame
    data_frame = dataSet.to_pandas()
    # New column for the predictions, initialised to 'unknown'
    data_frame['actions_predict'] = 'unknown'
    # Loop over the samples
    for i, data in enumerate(dataSet):
        # Stage 1: decompose the command
        prompt_CD = CD_Few_shot + 'Q:“%s” A:' % data['commands']
        response_CD = openai.Completion.create(
              model=model,
              prompt=prompt_CD,
              temperature=0.5,
              max_tokens=1000
              )
        # Result of the decomposition
        CD_result = extract_phrases(response_CD["choices"][0]["text"].strip())
        # Stage 2: translate the short commands one by one
        CM_Few_shot_temp = CM_Few_shot
        for qs in CD_result:
            CM_Few_shot_temp += 'Q:“%s” A:' % qs
            response_CM = openai.Completion.create(
                                model=model,
                                prompt=CM_Few_shot_temp,
                                temperature=0.5,
                                max_tokens=1000,
                                )
            CM_Few_shot_temp += response_CM["choices"][0]["text"].strip()
        # Ask the original question
        prompt_CM = CM_Few_shot_temp + 'Q:“%s” A:' % data['commands']
        response_CM = openai.Completion.create(
              model=model,
              prompt=prompt_CM,
              temperature=0.5,
              max_tokens=1000,
              )
        # Answer to the original command (this variable was never assigned before)
        CM_result = response_CM["choices"][0]["text"].strip()
        # Store the result in the matching row of the DataFrame
        data_frame.loc[i, 'actions_predict'] = transform_expression(CM_result)
        
    return data_frame

Workflow:

49f7dbca48cf84c8ba95f01791f157db.png

Chat completion

Basic introduction

week 06 ch7

Key feature: stronger conversational ability

cd2853b7183454fcfde6f4229b4a7ac4.png

77f5d4b253a44cc26e76a0e0824bd49e.png

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": "请问什么是机器学习?"}
  ]
)

Example of messages

response = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": "请问什么是机器学习?"},
    {"role": "user", "content": "请问什么是决策树算法?"}
  ]
)

One more example:

0c5ed4007975c96dd193b0dbe746f1e1.png

abe9ce693c357ad7619be3e5501bb88b.png 49791814cfed810996ae4ac79a74f4e3.png

826a9c37b84ce6368ce8296bab3ec25c.png 5b749254b7c558c3573fe17708a2f5f0.png

A classic example

2af7c143ac99ddacde3b2b8d31f89c11.png 8dccf1980551e45f9b3b650fe1df9611.png

 6c5824cd271c14240b01cc46304de4d8.png

6c4b9da2d931ed006aed78529d5ad4c6.png

An extra point: JSON objects are a good way to pass data to a large model

be7abe1a3b385108d2cdd3c05e15d9c1.png

For example:

import pandas as pd
import openai

df = pd.DataFrame({'x1':[1, 2], 'x2':[3, 4]})
# df is a pandas DataFrame

response = openai.ChatCompletion.create(
  model="gpt-4-0613",
  messages=[
    {"role": "system", "content": "数据集df_json:'%s'" % df.to_json(orient='records')},
    {"role": "user", "content": "请帮我解释下df_json数据集"}
  ]
)
response.choices[0].message['content']
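The serialized data that ends up inside the system message can be inspected without pandas or an API call. The sketch below rebuilds the same records-style JSON with the standard library only; the `rows` list is a hand-written stand-in for the DataFrame.

```python
import json

# Hand-written stand-in for df: one dict per row (records orientation)
rows = [{"x1": 1, "x2": 3}, {"x1": 2, "x2": 4}]

# Compact separators give a string with no extra spaces
df_json = json.dumps(rows, separators=(",", ":"))

messages = [
    {"role": "system", "content": "Dataset df_json: '%s'" % df_json},
    {"role": "user", "content": "Please explain the df_json dataset"},
]
print(messages[0]["content"])
```

Since the data travels as text, the model can read column names and values directly, and `json.loads` recovers the original structure on the Python side.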

The functions parameter

85ddad551d4cc97590f59695029157a1.png

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")

import numpy as np
import pandas as pd

import json
import io

df = pd.DataFrame({'x1':[1, 2], 'x2':[3, 4]})
# String form of the dataset, referenced in the system message below
df_str = df.to_string()
chen_ming_function = {"name": "chen_ming_algorithm",
                      "description": "用于执行陈明算法的函数,定义了一种特殊的数据集计算过程",
                      "parameters": {"type": "object",
                                     "properties": {"data": {"type": "string",
                                                             "description": "执行陈明算法的数据集"},
                                                   },
                                     "required": ["data"],
                                    },
                     }

messages=[
    {"role": "system", "content": "数据集data:%s,数据集以字符串形式呈现" % df_str},
    {"role": "user", "content": "请在数据集data上执行陈明算法"}
]
functions = [chen_ming_function]

response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        functions=functions,
        function_call="auto",  
    )

a0b097290b00915cffe3a2b6be157c33.png 9d16f3359f0c02a1c6ce7ec83d0e861a.png

Example: setting up external functions

week07.ch9

Function calling

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")

import numpy as np
import pandas as pd

import json
import io
import inspect
import requests


# 创建一个DataFrame
df = pd.DataFrame({'x1':[1, 2], 'x2':[3, 4]})

df_str = df.to_string()

data = io.StringIO(df_str)

df_new = pd.read_csv(data, sep=r'\s+', index_col=0)



def chen_ming_algorithm(data):
    """
    陈明算法函数,该函数定义了一种特殊的数据集计算过程
    :param data: 必要参数,表示带入计算的数据表,用字符串进行表示
    :return:陈明函数计算后的结果,返回结果为表示为JSON格式的Dataframe类型对象
    """
    df_new = pd.read_json(data)
    res = np.sum(df_new, axis=1) - 1
    return res.to_json(orient='records')


def auto_functions(functions_list):
    """
    Chat模型的functions参数编写函数
    :param functions_list: 包含一个或者多个函数对象的列表;
    :return:满足Chat模型functions参数要求的functions对象
    """
    def functions_generate(functions_list):
        # 创建空列表,用于保存每个函数的描述字典
        functions = []
        # 对每个外部函数进行循环
        for function in functions_list:
            # 读取函数对象的函数说明
            function_description = inspect.getdoc(function)
            # 读取函数的函数名字符串
            function_name = function.__name__

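            system_prompt = 'Here is the documentation of a function: %s' % function_description
            user_prompt = 'Based on this documentation, create a JSON dictionary that meets the following 5 requirements: \
                           1. The dictionary has exactly three key-value pairs; \
                           2. The first key is the string name, and its value is the function name %s, also a string; \
                           3. The second key is the string description, and its value describes what the function does, also a string; \
                           4. The third key is the string parameters, and its value is a JSON Schema object describing the expected function arguments; \
                           5. The output must be a JSON dictionary and nothing else, with no text before or after it' % function_name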

            response = openai.ChatCompletion.create(
                              model="gpt-3.5-turbo",
                              messages=[
                                {"role": "system", "content": system_prompt},
                                {"role": "user", "content": user_prompt}
                              ]
                            )
            functions.append(json.loads(response.choices[0].message['content']))
        return functions
    
    max_attempts = 3
    attempts = 0

    while attempts < max_attempts:
        try:
            functions = functions_generate(functions_list)
            break  # 如果代码成功执行,跳出循环
        except Exception as e:
            attempts += 1  # 增加尝试次数
            print("发生错误:", e)
            if attempts == max_attempts:
                print("已达到最大尝试次数,程序终止。")
                raise  # 重新引发最后一个异常
            else:
                print("正在重新运行...")
    return functions


def zhao_min_algorithm(data):
    """
    赵敏算法函数,该函数定义了一种特殊的数据集计算过程
    :param data: 必要参数,表示带入计算的数据表,用字符串进行表示
    :return:赵敏函数计算后的结果,返回结果为表示为JSON格式的Dataframe类型对象
    """
    df_new = pd.read_json(data)
    res = np.sum(df_new, axis=1) + 1
    return res.to_json(orient='records')

def run_conversation(messages, functions_list=None, model="gpt-4-0613"):
    """
    能够自动执行外部函数调用的Chat对话模型
    :param messages: 必要参数,字典类型,输入到Chat模型的messages参数对象
    :param functions_list: 可选参数,默认为None,可以设置为包含全部外部函数的列表对象
    :param model: Chat模型,可选参数,默认模型为gpt-4
    :return:Chat模型输出结果
    """
    # 如果没有外部函数库,则执行普通的对话任务
    if functions_list is None:
        response = openai.ChatCompletion.create(
                        model=model,
                        messages=messages,
                        )
        response_message = response["choices"][0]["message"]
        final_response = response_message["content"]
        
    # 若存在外部函数库,则需要灵活选取外部函数并进行回答
    else:
        # 创建functions对象
        functions = auto_functions(functions_list)
        # 创建外部函数库字典
        available_functions = {func.__name__: func for func in functions_list}

        # first response
        response = openai.ChatCompletion.create(
                        model=model,
                        messages=messages,
                        functions=functions,
                        function_call="auto")
        response_message = response["choices"][0]["message"]

        # 判断返回结果是否存在function_call,即判断是否需要调用外部函数来回答问题
        if response_message.get("function_call"):
            # 需要调用外部函数
            # 获取函数名
            function_name = response_message["function_call"]["name"]
            # 获取函数对象
            fuction_to_call = available_functions[function_name]
            # 获取函数参数
            function_args = json.loads(response_message["function_call"]["arguments"])
            # 将函数参数输入到函数中,获取函数计算结果
            function_response = fuction_to_call(**function_args)

            # messages中拼接first response消息
            messages.append(response_message)  
            # messages中拼接函数输出结果
            messages.append(
                {
                    "role": "function",
                    "name": function_name,
                    "content": function_response,
                }
            )  
            # 第二次调用模型
            second_response = openai.ChatCompletion.create(
                model=model,
                messages=messages,
            )  
            # 获取最终结果
            final_response = second_response["choices"][0]["message"]["content"]
        else:
            final_response = response_message["content"]
    
    return final_response



# run_conversation is called further down, once functions_list and messages are defined

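# Multi-turn conversation function

def chat_with_model(functions_list=None, 
                    prompt="Hello", 
                    model="gpt-4-0613", 
                    system_message=None):
    
    # Build a fresh message list on every call instead of mutating a shared default
    if system_message is None:
        system_message = [{"role": "system", "content": "You are a helpful assistant."}]
    messages = list(system_message)
    messages.append({"role": "user", "content": prompt})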
    
    while True:           
        answer = run_conversation(messages=messages, 
                                    functions_list=functions_list, 
                                    model=model)
        
        
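        print(f"Model answer: {answer}")
        # Keep the model's answer in the history so later turns can refer to it
        messages.append({"role": "assistant", "content": answer})

        # Ask whether the user has another question
        user_input = input("Any other questions? (type 'quit' to end the conversation): ")
        if user_input == "quit":
            break

        # Record the user's next message
        messages.append({"role": "user", "content": user_input})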



functions_list = [chen_ming_algorithm, zhao_min_algorithm]
functions = auto_functions(functions_list)
#function_dict = {func.__name__: func for func in functions_list}
messages = [
        {"role": "system", "content": "数据集data:%s,数据集以字符串形式呈现" % df_str},
        {"role": "user", "content": '请在data上执行陈明算法'}]



run_conversation(messages = messages, functions_list = functions_list)
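The dispatch step in the middle of run_conversation can be exercised offline. The sketch below replaces the model's first response with a hand-written dictionary of the same shape and shows how the function name and JSON arguments are turned into a real call and then into a role="function" message. `add_one` and `dispatch` are hypothetical names used only for this illustration.

```python
import json


def add_one(values):
    """Hypothetical external function: add 1 to every value, return JSON."""
    return json.dumps([v + 1 for v in values])


available_functions = {"add_one": add_one}

# Hand-written stand-in for response["choices"][0]["message"]
response_message = {
    "role": "assistant",
    "content": None,
    "function_call": {"name": "add_one", "arguments": '{"values": [1, 2]}'},
}


def dispatch(response_message, available_functions):
    """Run the requested function and build the role="function" message."""
    call = response_message["function_call"]
    function_args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    function_response = available_functions[call["name"]](**function_args)
    return {"role": "function", "name": call["name"], "content": function_response}


function_message = dispatch(response_message, available_functions)
print(function_message)
```

The resulting message is what gets appended to messages before the second model call.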

A note on Llama

11d1cbb555133a3ba21d2d0537ee747b.png

48bf75f0fb7233bb11dc283f0708de5b.png

Google API

google api的使用_谷歌api-CSDN博客

week 8 ch.11 

from googleapiclient.discovery import build
from google.oauth2.credentials import Credentials
import base64
import email
from email import policy
from email.parser import BytesParser

# 从本地文件中加载凭据
creds = Credentials.from_authorized_user_file('token.json')

# 创建 Gmail API 客户端
service = build('gmail', 'v1', credentials=creds)

# 列出用户的一封最新邮件
results = service.users().messages().list(userId='me', maxResults=1).execute()
messages = results.get('messages', [])

# 遍历邮件
for message in messages:
    # 获取邮件的详细信息
    msg = service.users().messages().get(userId='me', id=message['id']).execute()

    # 获取邮件头部信息
    headers = msg['payload']['headers']

    # 提取发件人、发件时间
    From, Date = "", ""
    for h in headers:
        name = h['name']
        if name.lower() == 'from':
            From = h['value']
        if name.lower() == 'date':
            Date = h['value']

    # 提取邮件正文
    if 'parts' in msg['payload']:
        part = msg['payload']['parts'][0]
        if part['mimeType'] == 'text/plain':
            data = part['body']["data"]
        else:
            data = msg['payload']['body']["data"]
    else:
        data = msg['payload']['body']["data"]
        
    # Gmail returns the body as URL-safe base64, so decode it accordingly
    decoded_data = base64.urlsafe_b64decode(data)
    str_text = decoded_data.decode("utf-8")
    msg_str = email.message_from_string(str_text)

    if msg_str.is_multipart():
        text = msg_str.get_payload()[0]  
    else:
        text = msg_str.get_payload()
    
    print('From: {}'.format(From[:8]))
    print('Date: {}'.format(Date))
    print('Content: {}'.format(text))

c10b3cc104bbf1d2be7912e3ca57757b.png d2fa3c2c46d94c4817c71c999edbf545.png

Now add the GPT API on top:

response = openai.ChatCompletion.create(
  model="gpt-4-0613",
  messages=[
    {"role": "system", "content": "这是我的Gmail邮箱最近一封邮件的内容:%s" % msg},
    {"role": "system", "content": "邮件内容是由Gmail API获取"},
    {"role": "user", "content": "请问我的Gmail最近一封邮件是谁发送的,具体内容是什么?"}
  ]
)
response.choices[0].message['content']

3 NLP basics and large models

【NLP】NLP基础知识_nlp学习-CSDN博客

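3.1 Preprocessing steps

Start with word segmentation, using the Python Jieba library

# 1. Chinese word segmentation
# "Jieba" is a Chinese word segmentation package for Python
# * Supports three segmentation modes:
#  - Accurate mode: tries to cut the sentence as precisely as possible; suited to text analysis;
#  - Full mode: scans out every possible word in the sentence; very fast, but cannot resolve ambiguity;
#  - Search-engine mode: built on accurate mode, it splits long words again to improve recall; suited to search-engine segmentation.
# * Supports Traditional Chinese
# * Supports custom dictionaries
import jieba
# Basic usage
# jieba.cut takes three arguments: the string to segment; cut_all, which turns full mode on or off; and HMM, which controls whether the HMM model is used
# jieba.cut_for_search takes two arguments: the string to segment, and whether to use the HMM model. It suits building inverted indexes for search engines and segments at a finer granularity
seg_list = jieba.cut("我来到北京清华大学", cut_all=True)
print("[Full mode]: " + ", ".join(seg_list))  # full mode

seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
print("[Accurate mode]: " + ", ".join(seg_list))  # accurate mode

seg_list = jieba.cut("他来到了网易杭研大厦")  # "杭研" is not in the dictionary, but the Viterbi algorithm still identifies it
print("[New word detection]: " + ", ".join(seg_list))

seg_list = jieba.cut_for_search("小明硕士毕业于中国科学院,后在日本京都大学深造")  # search-engine mode
print("[Search-engine mode]: " + ", ".join(seg_list))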


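# 2. Custom dictionary
# Usage: jieba.load_userdict(file_name)  # file_name is a file-like object or the path to a custom dictionary
# The format matches dict.txt: one word per line; each line has three parts separated by spaces, in fixed order: the word, its frequency (optional), and its part of speech (optional). If file_name is a path or a file opened in binary mode, the file must be UTF-8 encoded.
# e.g.: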
# ```
# 创新办 3 i
# 云计算 5
# 凱特琳 nz
# 台中
# ```
test_sent = (
"例如我输入一个带“韩玉赏鉴”的标题,在自定义词库中也增加了此词为N类\n"
"「台中」正確應該不會被切開。mac上可分出「石墨烯」;此時又可以分出來凱特琳了。"
)
words = jieba.cut(test_sent)
print('/'.join(words))

jieba.load_userdict("data/userdict.txt")  # load the user dictionary
words = jieba.cut(test_sent)
print('/'.join(words))


# add_word(word, freq=None, tag=None) and del_word(word) modify the dictionary at runtime.
jieba.add_word('石墨烯')
jieba.add_word('雷课教育')
words = jieba.cut(test_sent)
print('/'.join(words))

# suggest_freq(segment, tune=True) adjusts a single word's frequency so that it can (or cannot) be split out.
print('/'.join(jieba.cut('如果放到post中将出错。')))
jieba.suggest_freq(('中', '将'), True)
print('/'.join(jieba.cut('如果放到post中将出错。')))

fbcb07693ce3bdb0570ba4a3b717a38f.png

 

Then encode the text

Start with simple text representations (one-hot and the bag-of-words model)

import jieba
from keras.preprocessing.text import Tokenizer

texts = ['Python是目前最流行的数据分析和机器学习编程语言',
         'Python语言编程将很快成为 各个高校的必修课',
         'Python是科研工作者开展科学研究的高效工具']

# Assumption: `sentences` is the jieba-segmented text joined with spaces
# (Keras splits on whitespace); this matches the sequence lengths shown below
sentences = [' '.join(jieba.cut(text)) for text in texts]

tk = Tokenizer()
# Build the word index
tk.fit_on_texts(sentences)
print(tk.word_index)

# Convert the words to integer sequences
seqs = tk.texts_to_sequences(sentences)
for seq in seqs:
    print(seq)

'''
[1, 3, 4, 5, 6, 2, 7, 8, 9, 10, 11]
[1, 12, 13, 14, 15, 16, 17, 18, 2, 19]
[1, 3, 20, 21, 22, 23, 2, 24, 25]
'''

# One-hot (binary) encoding: one row per sentence, one column per word index
one_hot_results = tk.texts_to_matrix(sentences, mode='binary')
for one_hot_result in one_hot_results:
    print(one_hot_result)
'''
[0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0.]
[0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0.
 0. 0.]
[0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1.
 1. 1.]
'''
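The same two representations can be written out by hand, which shows what each vector position means. This is a from-scratch sketch with hypothetical helper names; unlike the Keras tokenizer it numbers words by first appearance and starts at index 0.

```python
def build_vocab(tokenized_docs):
    """Map each distinct word to an index, in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for word in doc:
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(doc, vocab):
    """Binary vector: 1 if the word occurs in the document, else 0."""
    vec = [0] * len(vocab)
    for word in doc:
        vec[vocab[word]] = 1
    return vec


def bag_of_words(doc, vocab):
    """Count vector: how many times each word occurs in the document."""
    vec = [0] * len(vocab)
    for word in doc:
        vec[vocab[word]] += 1
    return vec


docs = [["system", "human", "system", "eps"], ["user", "response", "time"]]
vocab = build_vocab(docs)
print(one_hot(docs[0], vocab))
print(bag_of_words(docs[0], vocab))
```

The binary vector loses the fact that "system" appears twice, while the count vector keeps it; neither records word order.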

A simple classification approach

927c67b7d5ffdb7ce1ef0f5b17300290.png

6435937597d780a5e7504b0d8c25bd71.png

Problems with one-hot encoding

5599f3ef6ecad0d7a8cf428f378a16da.png

0094021edf6381b803d3c6172de127ac.png

Computing with the doc2bow method

from gensim import corpora

# 示例文本数据,每个文档是单词的列表
texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]

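# Build the dictionary
dictionary = corpora.Dictionary(texts)
print(dictionary)

# Convert the first document with doc2bow: a list of (word ID, count) pairs
print(dictionary.doc2bow(texts[0]))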

TF-IDF representation

from gensim import corpora, models

texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]

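# Build a dictionary that maps each word in the texts to a unique integer ID
dictionary = corpora.Dictionary(texts)

# Convert each text to its bag-of-words vector (word ID and count in the document)
corpus = [dictionary.doc2bow(text) for text in texts]

# Train the TF-IDF model on the corpus
tfidf = models.TfidfModel(corpus)

# Transform the whole corpus with the TF-IDF model
corpus_tfidf = tfidf[corpus]

# Print the TF-IDF vector of each document
for doc in corpus_tfidf:
    print(doc)

The output is: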

[(0, 0.5773502691896257), (1, 0.5773502691896257), (2, 0.5773502691896257)]
[(0, 0.44424552527467476), (3, 0.44424552527467476), (4, 0.44424552527467476), (5, 0.3244870206138555), (6, 0.44424552527467476), (7, 0.3244870206138555)]
[(2, 0.5710059809418182), (5, 0.4170757362022777), (7, 0.4170757362022777), (8, 0.5710059809418182)]
[(1, 0.49182558987264147), (5, 0.7184811607083769), (8, 0.49182558987264147)]
[(3, 0.6282580468670046), (6, 0.6282580468670046), (7, 0.45889394536615247)]
[(9, 1.0)]
[(9, 0.7071067811865475), (10, 0.7071067811865475)]
[(9, 0.5080429008916749), (10, 0.5080429008916749), (11, 0.695546419520037)]
[(4, 0.6282580468670046), (10, 0.45889394536615247), (11, 0.6282580468670046)]
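The numbers above can be reproduced by hand, which makes the formula explicit. The sketch below assumes the weighting is term count times log2(N / document frequency), followed by L2 normalization per document; with that assumption it gives the same values as the printed output. It keys the result by word instead of by integer ID, and `tfidf_by_hand` is a hypothetical name.

```python
import math
from collections import Counter

texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey'],
]


def tfidf_by_hand(texts):
    """TF-IDF per document: count * log2(N / df), then L2-normalized."""
    n_docs = len(texts)
    # Document frequency: in how many documents each word appears
    df = Counter(word for doc in texts for word in set(doc))
    result = []
    for doc in texts:
        tf = Counter(doc)
        weights = {w: c * math.log2(n_docs / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        if norm == 0:
            result.append({})  # every word occurs in every document
        else:
            result.append({w: v / norm for w, v in weights.items() if v > 0})
    return result


vectors = tfidf_by_hand(texts)
print(vectors[0])
print(vectors[3])
```

In the fourth document "system" appears twice, which is why its weight is larger than those of "human" and "eps" even though "system" is the more common word across the corpus.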

7c809f99cda891e42c871b799f102ba8.png

LDA (Latent Dirichlet Allocation)

from gensim import corpora, models

# 给定的文本数据
texts = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer', 'system', 'response', 'time'],
    ['eps', 'user', 'interface', 'system'],
    ['system', 'human', 'system', 'eps'],
    ['user', 'response', 'time'],
    ['trees'],
    ['graph', 'trees'],
    ['graph', 'minors', 'trees'],
    ['graph', 'minors', 'survey']
]

# Build a dictionary that maps each word in the texts to a unique integer ID
dictionary = corpora.Dictionary(texts)

# Convert each text to its bag-of-words vector with the dictionary
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)

print()
# Create the LDA model
lda = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=100, update_every=1, chunksize=10, passes=10, alpha='auto', per_word_topics=True)

# Print the words and weights of each topic
for idx, topic in lda.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))

The topics printed correspond to the num_topics topics that you set yourself.

8f17cb6714da0819d33f68f17edc8ec6.png

The printed corpus:

[[(0, 1), (1, 1), (2, 1)],
 [(0, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)],
 [(2, 1), (5, 1), (7, 1), (8, 1)],
 [(1, 1), (5, 2), (8, 1)],
 [(3, 1), (6, 1), (7, 1)],
 [(9, 1)],
 [(9, 1), (10, 1)],
 [(9, 1), (10, 1), (11, 1)],
 [(4, 1), (10, 1), (11, 1)]]

Topic analysis for each document

corpus_lda = lda[corpus]
for doc in corpus_lda:
    print(doc)
len(corpus_lda)

192ef113db7cfd3937fc30b273a7b308.png d825342a9e6684498c0902572ca3cd7d.png 5898e437e41a6cea764a6faae046ee30.png

Word2Vec

import jieba

# 初始化分词器
jieba.initialize()

texts = [
    'Python是目前最流行的数据分析和机器学习编程语言',
    'Python语言编程将很快成为各个高校的必修课',
    'Python是科研工作者开展科学研究的高效工具'
]

# 分词处理
texts_tokens = [list(jieba.cut(text)) for text in texts]
texts_tokens

# Output:
# [['Python', '是', '目前', '最', '流行', '的', '数据分析', '和', '机器', '学习', '编程语言'],
#  ['Python', '语言', '编程', '将', '很快', '成为', '各个', '高校', '的', '必修课'],
#  ['Python', '是', '科研', '工作者', '开展', '科学研究', '的', '高效', '工具']]
from gensim.models import Word2Vec

# Train the model
model = Word2Vec(texts_tokens, vector_size=2, window=5, min_count=1, workers=4)
for tokens in texts_tokens:
    for token in tokens:
        vector = model.wv[token]
        print(f'词语:{token} -> 向量:{vector[:10]}...')  # vector_size=2, so the slice shows the whole vector


词语:Python -> 向量:[-0.02688289  0.01180699]...
词语:是 -> 向量:[-0.4651475  -0.35584044]...
词语:目前 -> 向量:[-0.25068876 -0.18806627]...
词语:最 -> 向量:[ 0.3690433  -0.07661679]...
词语:流行 -> 向量:[-0.22682734  0.3279404 ]...
词语:的 -> 向量:[0.25516748 0.45046365]...
词语:数据分析 -> 向量:[-0.24306364 -0.09094367]...
词语:和 -> 向量:[0.14385079 0.0496012 ]...
词语:机器 -> 向量:[-0.41426075 -0.4724409 ]...
词语:学习 -> 向量:[0.36556992 0.2533784 ]...
词语:编程语言 -> 向量:[0.3378655  0.03817212]...
词语:Python -> 向量:[-0.02688289  0.01180699]...
词语:语言 -> 向量:[0.32285434 0.4486569 ]...
词语:编程 -> 向量:[0.24893288 0.46179345]...
词语:将 -> 向量:[-0.37610796 -0.19678326]...
词语:很快 -> 向量:[-0.37554762 -0.04633344]...
词语:成为 -> 向量:[ 0.47686377 -0.3659358 ]...
词语:各个 -> 向量:[-0.11690424 -0.09688386]...

Doc2Vec

import jieba
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

# 文本数据
texts = [
    'Python是目前最流行的数据分析和机器学习编程语言',
    'Python语言编程将很快成为各个高校的必修课',
    'Python是科研工作者开展科学研究的高效工具'
]

# Segment with jieba and build the TaggedDocument objects
documents = [TaggedDocument(words=list(jieba.cut(text)), tags=[i]) for i, text in enumerate(texts)]

# Train the Doc2Vec model
model = Doc2Vec(documents, vector_size=3, window=5, min_count=1, workers=4, epochs=40)

# Fetch and print the document vectors
for i in range(len(texts)):
    vector = model.dv[i]
    print(f'Document {i} vector: {vector}')  # vector_size=3, so this is the full vector

5721e42c741b6363e0e449442a1099ef.png

How they differ:

de8a3f4632c9bdc90a06de246482083d.png

243d3c79181f36c7b94b2df0624d8724.png

3.2 TextCNN

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

class TextDataset(Dataset):
    def __init__(self, texts, labels):
        self.texts = texts
        self.labels = labels

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.texts[idx], self.labels[idx]

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes, filter_sizes, num_filters):
        super(TextCNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList([
            nn.Conv2d(1, num_filters, (k, embed_dim)) for k in filter_sizes
        ])
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(len(filter_sizes) * num_filters, num_classes)

    def forward(self, x):
        # x: [batch_size, seq_length], e.g. [2, 7]

        x = self.embedding(x)  # [batch_size, seq_length, embed_dim], e.g. [2, 7, 300]
        x = x.unsqueeze(1)  # [batch_size, 1, seq_length, embed_dim], e.g. [2, 1, 7, 300]

        # Each conv gives [batch_size, num_filters, seq_length - k + 1]:
        # [2, 100, 5], [2, 100, 4], [2, 100, 3] for k = 3, 4, 5
        x = [torch.relu(conv(x)).squeeze(3) for conv in self.convs]
        # Max over the time dimension: list of [batch_size, num_filters], i.e. 3 x [2, 100]
        x = [torch.max(pool, 2)[0] for pool in x]
        x = torch.cat(x, 1)  # [batch_size, num_filters * len(filter_sizes)], e.g. [2, 300]
        x = self.dropout(x)  # [2, 300]
        x = self.fc(x)  # [batch_size, num_classes], e.g. [2, 2]
        return x

def data():
    texts = ["I love reading books", "Data science is fun", "Python is great for data analysis", "AI is the future",
             "Machine learning is fascinating"]
    labels = ["positive", "positive", "positive", "positive", "positive"]  # toy example: every review is assumed positive
    tokenized_texts = [text.lower().split() for text in texts]
    vocab = {}
    index = 1  # start at 1 because index 0 is reserved for the unknown word <UNK>
    for sentence in tokenized_texts:
        for word in sentence:
            if word not in vocab:
                vocab[word] = index
                index += 1
    vocab['<UNK>'] = 0
    indexed_texts = []
    for sentence in tokenized_texts:
        indexed_sentence = [vocab[word] if word in vocab else vocab['<UNK>'] for word in sentence]
        indexed_texts.append(indexed_sentence)
    max_length = 7  # choose or compute the length that best fits your dataset
    padded_texts = [sentence + [vocab['<UNK>']] * (max_length - len(sentence)) if len(sentence) < max_length else sentence[
                                                                                                      :max_length] for
        sentence in indexed_texts]
    text_tensor = torch.tensor(padded_texts)
    label_tensor = torch.tensor([1 if label == "positive" else 0 for label in labels])
    return text_tensor,label_tensor

vocab_size = 1000  # vocabulary size
embed_dim = 300  # word-embedding dimension
num_classes = 2  # number of output classes
filter_sizes = [3, 4, 5]  # convolution kernel sizes
num_filters = 100  # number of filters per kernel size

# instantiate the model
model = TextCNN(vocab_size, embed_dim, num_classes, filter_sizes, num_filters)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# build a small toy dataset

texts, labels = data()
dataset = TextDataset(texts, labels)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# training loop
num_epochs = 5
for epoch in range(num_epochs):
    for texts, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(texts)

        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch+1}, Loss: {loss.item()}')
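
The shape comments in forward follow from one rule: a kernel of height k sliding over a sequence of length L produces L - k + 1 positions. The following is a minimal sketch that checks this, assuming the same sizes as above (batch 2, sequence length 7, embedding size 300, 100 filters). The names conv_shapes and the random input are illustrative only and are not part of the original model.

```python
import torch
import torch.nn as nn

batch_size, seq_length, embed_dim, num_filters = 2, 7, 300, 100
# Stand-in for an embedded, unsqueezed batch: [batch, 1, seq_length, embed_dim]
x = torch.randn(batch_size, 1, seq_length, embed_dim)

conv_shapes = {}
for k in (3, 4, 5):
    conv = nn.Conv2d(1, num_filters, (k, embed_dim))
    out = torch.relu(conv(x)).squeeze(3)  # [batch, num_filters, seq_length - k + 1]
    pooled = torch.max(out, 2)[0]         # max over time -> [batch, num_filters]
    conv_shapes[k] = (tuple(out.shape), tuple(pooled.shape))
    print(k, conv_shapes[k])
```

Because max pooling removes the time dimension, the three branches can be concatenated even though their convolution outputs have different lengths.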

3.3 RNN

The prediction task: given 10 input points, predict the series 5 steps ahead.
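
As a rough illustration of how the input and target windows relate in the training code that follows, here is a minimal sketch. It assumes a window of 15 points and a shift of 5 steps, which are the values the training loop uses; the integer series and the variable names are hypothetical stand-ins.

```python
import numpy as np

series = np.arange(30)   # stand-in for 30 time steps
window, shift = 15, 5    # same values as in the training loop
start = 3                # any start in [0, 10) keeps both slices in range

x = series[start:start + window]                  # input window
y = series[start + shift:start + shift + window]  # same window, 5 steps later

print(x)
print(y)
```

Each target element is the input element five steps later, so the network learns a fixed look-ahead rather than a single next value.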

import  torch
import datetime
import  numpy as np
import  torch.nn as nn
import  torch.optim as optim
from    matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from pylab import mpl
mpl.rcParams['font.sans-serif'] = ['FangSong']
mpl.rcParams['axes.unicode_minus'] = False
########################### global settings ###################################

num_time_steps = 16    # length of the training time window
input_size = 3          # input feature dimension
hidden_size = 16        # hidden-layer dimension
output_size = 3         # output dimension
num_layers = 1
lr=0.01
#################### define the RNN class ##############################################

class Net(nn.Module):

    def __init__(self, input_size, hidden_size, num_layers):
        super(Net, self).__init__()

        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
        )
        print(self.rnn)
        for p in self.rnn.parameters():
          nn.init.normal_(p, mean=0.0, std=0.001)

        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden_prev):
        # x: [1, 15, 3]  (batch, seq, feature)
        out, hidden_prev = self.rnn(x, hidden_prev)  # out: [1, 15, 16]; hidden_prev: [1, 1, 16]
        # flatten [b, seq, h] to [b * seq, h] so the linear layer is applied per time step
        out = out.view(-1, hidden_size)  # [15, 16]
        out = self.linear(out)  # [seq, h] => [seq, 3], i.e. [15, 3]
        out = out.unsqueeze(dim=0)  # => [1, seq, 3]
        return out, hidden_prev  # [1, 15, 3], [1, 1, 16]

#################### build the training set #################################
def getdata():
    x1 = np.linspace(1,10,30).reshape(30,1)
    y1 = (np.zeros_like(x1)+2)+np.random.rand(30,1)*0.1
    z1 = (np.zeros_like(x1)+2).reshape(30,1)
    tr1 =  np.concatenate((x1,y1,z1),axis=1)
    # mm = MinMaxScaler()
    # data = mm.fit_transform(tr1)   # min-max normalization
    return tr1

##################### train the model #################################
def tarin_RNN(data):

    model = Net(input_size, hidden_size, num_layers)

    print('model:\n',model)
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr)
    # initialize the hidden state h

    # hidden state shape: (num_layers, batch_size, hidden_size) = [1, 1, 16]
    hidden_prev = torch.zeros(1, 1, hidden_size)
    l = []
    # train for 300 iterations
    for iter in range(300):
        # loss = 0
        start = np.random.randint(10, size=1)[0]
        end = start + 15
        # x and y both have shape [1, 15, 3]
        x = torch.tensor(data[start:end]).float().view(1, num_time_steps - 1, 3)
        # pick a random window of 15 points as input; the target is the same window shifted 5 steps ahead
        y = torch.tensor(data[start + 5:end + 5]).float().view(1, num_time_steps - 1, 3)

        # output: [1, 15, 3]
        output, hidden_prev = model(x, hidden_prev)
        hidden_prev = hidden_prev.detach()

        loss = criterion(output, y)
        model.zero_grad()
        loss.backward()
        optimizer.step()

        if iter % 100 == 0:
            print("Iteration: {} loss {}".format(iter, loss.item()))
            l.append(loss.item())


    ############################## plot the loss curve #################################
    plt.plot(l,'r')
    plt.xlabel('training iterations (one point per 100)')
    plt.ylabel('loss')
    plt.title('RNN training loss curve')

    return hidden_prev,model
############################# prediction #########################################

def RNN_pre(model,data,hidden_prev):
    data_test = data[19:29]
    data_test = torch.tensor(np.expand_dims(data_test, axis=0),dtype=torch.float32)
    pred1,h1 = model(data_test,hidden_prev )
    # pred1: [1, 10, 3]; h1: [1, 1, 16]
    print('pred1.shape:',pred1.shape)
    pred2,h2 = model(pred1,hidden_prev )
    print('pred2.shape:',pred2.shape)
    pred1 = pred1.detach().numpy().reshape(10,3)
    pred2 = pred2.detach().numpy().reshape(10,3)
    predictions = np.concatenate((pred1,pred2),axis=0)
    # predictions= mm.inverse_transform(predictions)
    print('predictions.shape:',predictions.shape)

    ############################# visualize the predictions ########################################

    fig = plt.figure(figsize=(9, 6))
    ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself to the figure in recent matplotlib
    ax.scatter3D(data[:, 0],data[:, 1],data[:,2],c='red')
    ax.scatter3D(predictions[:,0],predictions[:,1],predictions[:,2],c='y')
    ax.set_xlabel('X')
    ax.set_xlim(0, 8.5)
    ax.set_ylabel('Y')
    ax.set_ylim(0, 10)
    ax.set_zlabel('Z')
    ax.set_zlim(0, 4)
    plt.title("RNN trajectory prediction")
    plt.show()

def main():
    data = getdata()
    start = datetime.datetime.now()
    hidden_pre, model = tarin_RNN(data)
    end = datetime.datetime.now()
    print('The training time: %s' % str(end - start))
    plt.show()
    RNN_pre(model, data, hidden_pre)
if __name__ == '__main__':
    main()

One variant is the bidirectional RNN.

29ae090669e0f3b1c08f63b74066d9ff.png

Code:

import torch
import torch.nn as nn

class BiRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # create a bidirectional RNN layer
        # setting bidirectional=True is all that is needed
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)

        # the two directions are concatenated, so the output feature size is 2 * hidden_size
        self.fc = nn.Linear(hidden_size * 2, num_classes)

    def forward(self, x):
        # initialize the hidden state
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.hidden_size).to(x.device)  # 2 for bidirectional

        # forward pass
        # out: [batch_size, seq_length, hidden_size * 2]
        out, _ = self.rnn(x, h0)

        # take the features at the last time step
        out = self.fc(out[:, -1, :])
        return out

# hyperparameters
input_size = 10  # input feature dimension
hidden_size = 20  # hidden feature dimension
num_layers = 2  # number of RNN layers
num_classes = 3  # number of output classes

# create the model
model = BiRNN(input_size, hidden_size, num_layers, num_classes)
print(model)

7fd5e4bb01b8733505ce45e0eea321f0.png
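
One detail worth checking when using the last time step of a bidirectional RNN: the forward direction finishes at the last position, but the backward direction finishes at the first position. The following is a minimal sketch that verifies this against h_n, assuming the same hyperparameters as above; the batch size of 4, the sequence length of 7, and the variable names fwd_last and bwd_last are illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, num_layers = 10, 20, 2
rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)

x = torch.randn(4, 7, input_size)  # [batch, seq_length, input_size]
out, h_n = rnn(x)                  # h0 defaults to zeros
# out: [batch, seq_length, 2 * hidden_size]; h_n: [num_layers * 2, batch, hidden_size]

fwd_last = out[:, -1, :hidden_size]  # forward direction ends at the last time step
bwd_last = out[:, 0, hidden_size:]   # backward direction ends at the first time step

# h_n[-2] and h_n[-1] are the top layer's forward and backward final states
print(torch.allclose(fwd_last, h_n[-2], atol=1e-6))
print(torch.allclose(bwd_last, h_n[-1], atol=1e-6))
```

So out[:, -1, :] contains the complete forward summary but only a one-step backward state. Concatenating h_n[-2] and h_n[-1] is a common alternative when a summary of the whole sequence is wanted from both directions.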

3.4 LSTM principles

Prediction task

时间序列预测——LSTM模型(附代码实现)_lstm模型代码-CSDN博客

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# generate example data
def generate_data(timesteps):
    x = np.linspace(0, 10, timesteps)
    y = np.sin(x)
    return y

# prepare training data: each sample is n_steps values, the target is the next value
def prepare_data(data, n_steps):
    X, y = [], []
    for i in range(len(data)):
        end_ix = i + n_steps
        if end_ix > len(data)-1:
            break
        seq_x, seq_y = data[i:end_ix], data[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

# model parameters
timesteps = 1000
n_steps = 10

# generate and prepare the data
data = generate_data(timesteps)
X, y = prepare_data(data, n_steps)
X = X.reshape((X.shape[0], X.shape[1], 1))

# build the LSTM model
model = Sequential([
    LSTM(50, activation='relu', input_shape=(n_steps, 1)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mse')

# train the model
model.fit(X, y, epochs=2, verbose=1)

# predict the next value from the last n_steps values
x_input = np.array(data[-n_steps:])
x_input = x_input.reshape((1, n_steps, 1))
yhat = model.predict(x_input, verbose=0)
print(f'Predicted Value: {yhat[0][0]}')

A few NLP examples

dd514bfcab15bfe0fa6e94b3134f7463.png

4d1f38f6ecb22c8783938bf4f5e0c7b6.png

 3.5 SAM segmentation

【图像分割】Meta分割一切(SAM)模型环境配置和使用教程_sam使用教程-CSDN博客

In essence, it applies attention over and over.

16419bdd39791c6481040121fc30abe7.png

import torch
import torch.nn.functional as F

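# Assume the model output has shape (N, H, W, C)
# and the ground-truth labels y_true have shape (N, H, W).
# Note: PyTorch expects (N, C, H, W), so the dimensions must be permuted.
y_pred = torch.randn(10, 256, 256, 5)  # simulated model output
y_pred = y_pred.permute(0, 3, 1, 2)  # permute y_pred to (N, C, H, W)

y_true = torch.randint(0, 5, (10, 256, 256), dtype=torch.long)  # simulated ground-truth labels

# compute the cross-entropy loss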
loss = F.cross_entropy(y_pred, y_true)
print(loss)

Convert an image mask into labels

1a38d0ea7eb1f74b83d0c32d35e19b24.png

import numpy as np
from skimage import color
from skimage.io import imread

# read the color mask image (assumed to be 3-channel RGB)
color_mask = imread('path_to_color_mask.png')

# map each color to a class index
color_to_class = {
    (255, 0, 0): 0,     # red -> class 0
    (0, 255, 0): 1,     # green -> class 1
    (0, 0, 255): 2,     # blue -> class 2
    # ... (more colors)
}

# allocate an array to hold the class indices
index_mask = np.zeros((color_mask.shape[0], color_mask.shape[1]), dtype=np.int32)

# apply the mapping color by color
for color_value, class_index in color_to_class.items():
    # find every pixel that matches this color
    matches = (color_mask == color_value).all(axis=-1)
    # set those pixels to the corresponding class index
    index_mask[matches] = class_index

# index_mask is now an integer mask in which each pixel holds its class index
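
The mapping can be tried without an image file. The sketch below uses a hypothetical 2 x 2 mask. One design choice here is an assumption rather than part of the code above: pixels whose color is not in the mapping are set to an ignore value (255) instead of being left at 0, because a zero-initialized array would silently assign them to class 0.

```python
import numpy as np

# Hypothetical 2 x 2 RGB mask; the last pixel has a color that is not in the mapping.
color_mask = np.array([[[255, 0, 0], [0, 255, 0]],
                       [[0, 0, 255], [10, 10, 10]]], dtype=np.uint8)
color_to_class = {(255, 0, 0): 0, (0, 255, 0): 1, (0, 0, 255): 2}

IGNORE_INDEX = 255  # assumption: flag unmapped colors instead of defaulting to class 0
index_mask = np.full(color_mask.shape[:2], IGNORE_INDEX, dtype=np.int64)
for color_value, class_index in color_to_class.items():
    # boolean H x W map of pixels whose three channels all match this color
    matches = (color_mask == np.array(color_value)).all(axis=-1)
    index_mask[matches] = class_index

print(index_mask)
```

If the result feeds F.cross_entropy, the same value can be passed as ignore_index so that unmapped pixels do not contribute to the loss.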

4 Large language models (LLM)

《大模型面试宝典》(2024版) 正式发布!-CSDN博客

T5

40d2744184fabd9758b3537de37e4b6a.png

T5模型简介-CSDN博客

交叉注意力机制CrossAttention-CSDN博客

一文搞懂Transformer架构的三种注意力机制_transformer注意力机制-CSDN博客

Introduction to Distilling Step-by-Step

distilling-step-by-step-main-CSDN博客

How to train (very important)

Huggingface trainer、model.from_pretrained、tokenizer()简单介绍(笔记)_transformes trainer 如何保存优化器的状态-CSDN博客

gpt——llm

神经网络算法:一文搞懂GPT(Generative Pre-trained Transformer)-CSDN博客

qwen——llm

The current model is Tongyi Qianwen (Qwen).

GitCode - 开发者的代码家园

Llama——llm

LLaMA系列 | LLaMA和LLaMA-2精简总结-CSDN博客

What are the differences among GPT, Qwen, LLaMA, BERT, and ELMo?

LLaMa、Qwen、ChatGLM、ChatGLM2的区别_llama qwen-CSDN博客

BERT与其他NLP模型的对比:了解预训练模型的差异-CSDN博客
 

 

What is the difference between LayerNorm (LN) and RMSNorm?

05395fae95cff7daaf36d937baf46097.png

723d9b991e49f994df77a58bd8a6a172.png
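
As a rough numerical illustration of the two normalizations, here is a minimal NumPy sketch. It assumes the standard definitions: LayerNorm subtracts the mean and divides by the standard deviation, then applies a scale and a bias; RMSNorm skips the mean subtraction and the bias and divides by the root mean square only. The function names and the eps value are illustrative.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    # center, then scale by the standard deviation over the feature axis
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

def rms_norm(x, gamma, eps=1e-6):
    # no centering and no bias: scale by the root mean square only
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * gamma

x = np.array([[1.0, 2.0, 3.0, 4.0]])
gamma, beta = np.ones(4), np.zeros(4)
ln_out = layer_norm(x, gamma, beta)
rms_out = rms_norm(x, gamma)
print("LayerNorm:", ln_out, "mean:", ln_out.mean())
print("RMSNorm:  ", rms_out, "mean:", rms_out.mean())
```

The LayerNorm output has zero mean, while the RMSNorm output keeps a nonzero mean and only has unit mean square. RMSNorm therefore needs one fewer statistic and one fewer parameter vector per layer.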

 bert——llm

BertTokenizer 使用方法_berttokenizer.from_pretrained-CSDN博客

This post is very good: 大模型面试准备(十四):BERT 为何青睐 Transformer 双向编码器?_双向transformer 大模型-CSDN博客

Introduction

9432f7856d49e8a3f4a5abb11ae1524e.png

 model

Code

import torch
import math
import numpy as np
from transformers import BertModel
from transformers import BertTokenizer
'''

Implement the BERT architecture with manual matrix operations
Model files can be downloaded from https://huggingface.co/models

'''



bert = BertModel.from_pretrained(r"/Users/depeng.yao/Desktop/yaodepeng/nlp/bert/bert-base-chinese", return_dict=False)
state_dict = bert.state_dict()
bert.eval()
x = np.array([2450, 15486, 15167, 2110]) # token ids looked up in the vocab for the input "深度学习" (deep learning)
torch_x = torch.LongTensor([x])  # input in PyTorch form
# sequence_output, pooler_output = bert(torch_x)
# print(sequence_output.shape, pooler_output.shape)
# print(sequence_output, pooler_output)

# print(bert.state_dict().keys())  # list the names of all weight matrices


# softmax normalization
def softmax(x):
    return np.exp(x)/np.sum(np.exp(x), axis=-1, keepdims=True)

# GELU activation (tanh approximation)
def gelu(x):
    return 0.5 * x * (1 + np.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * np.power(x, 3))))

class DiyBert:
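    # takes the full pretrained weight dictionary as input
    def __init__(self, state_dict):
        self.num_attention_heads = 12
        self.hidden_size = 768
        self.num_layers = 1  # must equal the number of layers in the loaded checkpoint for the outputs to match the reference model
        self.load_weights(state_dict)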

    def load_weights(self, state_dict):
        # embedding part
        self.word_embeddings = state_dict["embeddings.word_embeddings.weight"].numpy()
        self.position_embeddings = state_dict["embeddings.position_embeddings.weight"].numpy()
        self.token_type_embeddings = state_dict["embeddings.token_type_embeddings.weight"].numpy()
        self.embeddings_layer_norm_weight = state_dict["embeddings.LayerNorm.weight"].numpy()
        self.embeddings_layer_norm_bias = state_dict["embeddings.LayerNorm.bias"].numpy()
        self.transformer_weights = []
        # transformer part, which may have several layers
        for i in range(self.num_layers):
            q_w = state_dict["encoder.layer.%d.attention.self.query.weight" % i].numpy()
            q_b = state_dict["encoder.layer.%d.attention.self.query.bias" % i].numpy()
            k_w = state_dict["encoder.layer.%d.attention.self.key.weight" % i].numpy()
            k_b = state_dict["encoder.layer.%d.attention.self.key.bias" % i].numpy()
            v_w = state_dict["encoder.layer.%d.attention.self.value.weight" % i].numpy()
            v_b = state_dict["encoder.layer.%d.attention.self.value.bias" % i].numpy()
            attention_output_weight = state_dict["encoder.layer.%d.attention.output.dense.weight" % i].numpy()
            attention_output_bias = state_dict["encoder.layer.%d.attention.output.dense.bias" % i].numpy()
            attention_layer_norm_w = state_dict["encoder.layer.%d.attention.output.LayerNorm.weight" % i].numpy()
            attention_layer_norm_b = state_dict["encoder.layer.%d.attention.output.LayerNorm.bias" % i].numpy()
            intermediate_weight = state_dict["encoder.layer.%d.intermediate.dense.weight" % i].numpy()
            intermediate_bias = state_dict["encoder.layer.%d.intermediate.dense.bias" % i].numpy()
            output_weight = state_dict["encoder.layer.%d.output.dense.weight" % i].numpy()
            output_bias = state_dict["encoder.layer.%d.output.dense.bias" % i].numpy()
            ff_layer_norm_w = state_dict["encoder.layer.%d.output.LayerNorm.weight" % i].numpy()
            ff_layer_norm_b = state_dict["encoder.layer.%d.output.LayerNorm.bias" % i].numpy()
            self.transformer_weights.append([q_w, q_b, k_w, k_b, v_w, v_b, attention_output_weight, attention_output_bias,
                                             attention_layer_norm_w, attention_layer_norm_b, intermediate_weight, intermediate_bias,
                                             output_weight, output_bias, ff_layer_norm_w, ff_layer_norm_b])
        # pooler layer
        self.pooler_dense_weight = state_dict["pooler.dense.weight"].numpy()
        self.pooler_dense_bias = state_dict["pooler.dense.bias"].numpy()


    # BERT embedding: sum three embeddings, then apply a LayerNorm
    def embedding_forward(self, x):
        # x.shape = [max_len]
        we = self.get_embedding(self.word_embeddings, x)  # shape: [max_len, hidden_size]
        # position embedding input: [0, 1, 2, 3]
        pe = self.get_embedding(self.position_embeddings, np.array(list(range(len(x)))))  # shape: [max_len, hidden_size]
        # token type embedding: [0, 0, 0, 0] for a single-sentence input
        te = self.get_embedding(self.token_type_embeddings, np.array([0] * len(x)))  # shape: [max_len, hidden_size]
        embedding = we + pe + te
        # a normalization layer follows the sum
        embedding = self.layer_norm(embedding, self.embeddings_layer_norm_weight, self.embeddings_layer_norm_bias)  # shape: [max_len, hidden_size]
        return embedding

    # an embedding layer is just an index lookup, equivalent to a one-hot input times the embedding matrix
    def get_embedding(self, embedding_matrix, x):
        return np.array([embedding_matrix[index] for index in x])

    # run all transformer layers
    def all_transformer_layer_forward(self, x):
        for i in range(self.num_layers):
            x = self.single_transformer_layer_forward(x, i)
        return x

    # run a single transformer layer
    def single_transformer_layer_forward(self, x, layer_index):
        weights = self.transformer_weights[layer_index]
        # unpack this layer's parameters; in practice they are randomly initialized and then pretrained
        q_w, q_b, \
        k_w, k_b, \
        v_w, v_b, \
        attention_output_weight, attention_output_bias, \
        attention_layer_norm_w, attention_layer_norm_b, \
        intermediate_weight, intermediate_bias, \
        output_weight, output_bias, \
        ff_layer_norm_w, ff_layer_norm_b = weights
        # self-attention layer
        attention_output = self.self_attention(x,
                                q_w, q_b,
                                k_w, k_b,
                                v_w, v_b,
                                attention_output_weight, attention_output_bias,
                                self.num_attention_heads,
                                self.hidden_size)
        # LayerNorm (not batch norm) with a residual connection
        x = self.layer_norm(x + attention_output, attention_layer_norm_w, attention_layer_norm_b)
        # feed-forward layer
        feed_forward_x = self.feed_forward(x,
                              intermediate_weight, intermediate_bias,
                              output_weight, output_bias)
        # LayerNorm (not batch norm) with a residual connection
        x = self.layer_norm(x + feed_forward_x, ff_layer_norm_w, ff_layer_norm_b)
        return x

    # self-attention computation
    def self_attention(self,
                       x,
                       q_w,
                       q_b,
                       k_w,
                       k_b,
                       v_w,
                       v_b,
                       attention_output_weight,
                       attention_output_bias,
                       num_attention_heads,
                       hidden_size):
        # x.shape = max_len * hidden_size
        # q_w, k_w, v_w  shape = hidden_size * hidden_size
        # q_b, k_b, v_b  shape = hidden_size
        q = np.dot(x, q_w.T) + q_b  # shape: [max_len, hidden_size]; linear layer x @ W^T + b
        k = np.dot(x, k_w.T) + k_b  # shape: [max_len, hidden_size]
        v = np.dot(x, v_w.T) + v_b  # shape: [max_len, hidden_size]
        attention_head_size = int(hidden_size / num_attention_heads)
        # q.shape = num_attention_heads, max_len, attention_head_size
        q = self.transpose_for_scores(q, attention_head_size, num_attention_heads)
        # k.shape = num_attention_heads, max_len, attention_head_size
        k = self.transpose_for_scores(k, attention_head_size, num_attention_heads)
        # v.shape = num_attention_heads, max_len, attention_head_size
        v = self.transpose_for_scores(v, attention_head_size, num_attention_heads)
        # qk.shape = num_attention_heads, max_len, max_len
        qk = np.matmul(q, k.swapaxes(1, 2))
        qk /= np.sqrt(attention_head_size)
        qk = softmax(qk)
        # qkv.shape = num_attention_heads, max_len, attention_head_size
        qkv = np.matmul(qk, v)
        # qkv.shape = max_len, hidden_size
        qkv = qkv.swapaxes(0, 1).reshape(-1, hidden_size)
        # attention.shape = max_len, hidden_size
        attention = np.dot(qkv, attention_output_weight.T) + attention_output_bias
        return attention

    # multi-head split
    def transpose_for_scores(self, x, attention_head_size, num_attention_heads):
        # hidden_size = 768  num_attention_heads = 12  attention_head_size = 64
        max_len, hidden_size = x.shape
        x = x.reshape(max_len, num_attention_heads, attention_head_size)
        x = x.swapaxes(1, 0)  # output shape = [num_attention_heads, max_len, attention_head_size]
        return x

    # feed-forward network
    def feed_forward(self,
                     x,
                     intermediate_weight,  # intermediate_size, hidden_size
                     intermediate_bias,  # intermediate_size
                     output_weight,  # hidden_size, intermediate_size
                     output_bias,  # hidden_size
                     ):
        # output shape: [max_len, intermediate_size]
        x = np.dot(x, intermediate_weight.T) + intermediate_bias
        x = gelu(x)
        # output shape: [max_len, hidden_size]
        x = np.dot(x, output_weight.T) + output_bias
        return x

    # layer normalization
    def layer_norm(self, x, w, b):
        x = (x - np.mean(x, axis=1, keepdims=True)) / np.std(x, axis=1, keepdims=True)
        x = x * w + b
        return x

    # output layer attached to the [CLS] token
    def pooler_output_layer(self, x):
        x = np.dot(x, self.pooler_dense_weight.T) + self.pooler_dense_bias
        x = np.tanh(x)
        return x

    # final output
    def forward(self, x):
        x = self.embedding_forward(x)  # (4, 768)
        sequence_output = self.all_transformer_layer_forward(x)  # (4, 768)
        pooler_output = self.pooler_output_layer(sequence_output[0])  # (768,), computed from the first token
        return sequence_output, pooler_output


# hand-written implementation
db = DiyBert(state_dict)

diy_sequence_output, diy_pooler_output = db.forward(x)  # (4, 768) and (768,)
# PyTorch reference


torch_sequence_output, torch_pooler_output = bert(torch_x)  # (1, 4, 768) and (1, 768)



# import torch
# import torch.nn as nn
#
# num_labels = 5  # assume 5 different labels
# num_classes = 5  # assume 5 different classes
# classifier = nn.Linear(bert.config.hidden_size, num_classes)
#
# logits = classifier(torch_pooler_output)  # logits shape: [batch_size, num_classes]

# classifier = nn.Linear(bert.config.hidden_size, num_labels)  # bert.config.hidden_size is usually 768
#
# # apply the classification layer to every token of the BERT output, shape [1, 4, 768]
# logits = classifier(torch_sequence_output)  # output shape: [batch_size, seq_length, num_labels]
#
#
# # assume `targets` is a tensor of ground-truth labels with shape [batch_size, seq_length], here 1 x 4
# targets = torch.randint(0, num_labels, (1, 4))  # random example labels
#
# # cross-entropy needs the logits flattened to 2D and the targets flattened to 1D
# criterion = nn.CrossEntropyLoss()
# loss = criterion(logits.view(-1, num_labels), targets.view(-1))
#
# print("Loss:", loss.item())
#
#
# print(diy_sequence_output)
# print(torch_sequence_output)
#
# # print(diy_pooler_output)
# # print(torch_pooler_output)

b37f6f276481b07919798922dc6f5a3e.png

50f314c5f59b6d29974f0891894bb356.png 73e0eb10e320f476f2113dddb3dce22e.png

 Cases 2 and 3 differ in what they use: the [CLS] output versus the 768-dimensional per-token outputs. For start and end

c54a2517283479387ccd9a4c22252116.png

Differences

649a880e8c07e50b197799cfcf238228.png

BERT与其他NLP模型的对比:了解预训练模型的差异-CSDN博客

8033d77c298971110ea45c53cc1c7c52.png

f64ee9e868872f067b1819b61022ecca.png

Question 1: How are BERT's two special features implemented?

8f035aa314f40b84f015c3f7fd09be05.png

Question 2: GPT uses generative pre-training, whereas ELMo uses a bidirectional LSTM for deep contextual modeling.

fc2d3b8db29759c7c2cc17a05812bad9.png

Question 3: BERT and GPT are both based on the Transformer, but BERT uses the encoder and GPT uses the decoder. How do these two structures differ?

0e4368b5419edd7051893c4a2ef3a4d8.png 7a869ff55a15b00364858249d0aaa668.png
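
One concrete difference is the attention mask. Encoder self-attention lets every position attend to every other position, while decoder self-attention is causal, so a position sees only itself and earlier positions. The original Transformer decoder also has a cross-attention block over the encoder output, which decoder-only models such as GPT omit. The following is a minimal NumPy sketch of the mask difference only, using all-zero scores so the effect of the mask is easy to read; the function name is illustrative.

```python
import numpy as np

def attention_weights(scores, causal):
    # scores: [seq_len, seq_len] raw attention scores (query rows, key columns)
    seq_len = scores.shape[-1]
    if causal:
        # mask out keys that lie in the future of each query position
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))
enc = attention_weights(scores, causal=False)  # encoder style (BERT)
dec = attention_weights(scores, causal=True)   # decoder style (GPT)
print(enc)
print(dec)
```

With equal scores, every encoder row spreads its weight evenly over all four positions, while decoder row i spreads it over positions 0 to i only, which is what makes left-to-right generation possible.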

Question 4: What is the difference between bidirectional and unidirectional processing?

6d84db2b4c8bf2c160f25ab552b4b19b.png

import numpy as np

# suppose we have a sequence
sequence = np.array([1, 2, 3, 4, 5])

# unidirectional: accumulate only the values up to the current element
def unidirectional_process(sequence):
    results = []
    accumulator = 0
    for value in sequence:
        accumulator += value
        results.append(accumulator)
    return results

# bidirectional: combine running sums from both directions, so each position sees the values before and after it
def bidirectional_process(sequence):
    forward_results = unidirectional_process(sequence)
    backward_results = unidirectional_process(sequence[::-1])[::-1]
    results = (np.array(forward_results) + np.array(backward_results)) - sequence
    return results

# compute the unidirectional and bidirectional results
unidirectional_results = unidirectional_process(sequence)
bidirectional_results = bidirectional_process(sequence)

print("Unidirectional result:", unidirectional_results)
print("Bidirectional result:", bidirectional_results)

 Alpaca

04d0a481c92c3137d833579497045a36.png

5 Multimodal LLMs (MLLM)

qwen-vl——mllm

【LLM多模态】Qwen-VL模型结构和训练流程-CSDN博客

1fe1353063af2058eb643ce1ba75e4ab.png

import torch
import torch.nn as nn
from transformers import ViTModel

class CrossAttention(nn.Module):
    def __init__(self, num_queries, d_model):
        super(CrossAttention, self).__init__()
        self.num_queries = num_queries
        # initialize the query vectors as learnable parameters

        self.query = nn.Parameter(torch.randn(1, num_queries, d_model))  # leading dim of 1 so it can be repeated per batch
        # [1, 256, 768]
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8)

    def forward(self, img_features):
        # img_features shape: (seq_len, batch_size, d_model)
        # repeat the queries to match the batch size of the image features
        batch_size = img_features.size(1)

        query = self.query.repeat(batch_size, 1, 1).transpose(0, 1)  # to (num_queries, batch_size, d_model), as the attention layer expects
        # Cross-attention operation
        # query: [256, 10, 768]; img_features: [197, 10, 768]
        attn_output, _ = self.cross_attn(query, img_features, img_features)
        return attn_output

# assume the ViT output dimension d_model is 768
d_model = 768
num_queries = 256

# load a pretrained ViT model (local path)
vit = ViTModel.from_pretrained('/Users/depeng.yao/Desktop/vit')

# initialize the cross-attention module
cross_attention = CrossAttention(num_queries=num_queries, d_model=d_model)

# a randomly generated image batch
batch_size = 10
dummy_images = torch.rand(batch_size, 3, 224, 224)  # assumed input size of 224x224

# extract image features with the ViT
outputs = vit(pixel_values=dummy_images)
# img_features: [197, 10, 768]
img_features = outputs.last_hidden_state.transpose(0, 1)  # to (seq_len, batch_size, d_model) for cross-attention

# run the image features through the cross-attention module
attn_output = cross_attention(img_features)

# print the output shape
# torch.Size([256, 10, 768])
print(attn_output.shape)  # (num_queries, batch_size, d_model)

qwen-vl and

ureader——mllm

7256cb683a37d1223f78124933604948.png

5b98e6b153f776840478725864d34248.png

UReader:基于多模态大型语言模型的通用无ocr视觉情境语言理解_ureader: universal ocr-free visually-situated lang-CSDN博客

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import cv2
from torchvision.models import resnet50
def calculate_iou(box1, box2):
    """Compute the intersection over union (IoU) of two boxes given as (x1, y1, x2, y2)."""
    x_left = max(box1[0], box2[0])
    y_top = max(box1[1], box2[1])
    x_right = min(box1[2], box2[2])
    y_bottom = min(box1[3], box2[3])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])

    iou = intersection_area / float(box1_area + box2_area - intersection_area)
    return iou


class PositionalEncoding(nn.Module):
    def __init__(self, grid_size, embed_size):
        super(PositionalEncoding, self).__init__()
        self.row_embed = nn.Embedding(grid_size[0], embed_size)
        self.col_embed = nn.Embedding(grid_size[1], embed_size)

    def forward(self, dims):
        # look up the row and column embeddings and add them
        return self.row_embed(dims[:, 0]) + self.col_embed(dims[:, 1])


def shape_adaptive_cropping(image, target_resolution=(224, 224)):
    height, width, _ = image.shape
    target_height, target_width = target_resolution

    # find the grid whose shape best matches the image (highest IoU)
    best_grid = (1, 1)
    best_iou = 0
    for nh in range(1, 6):
        for nw in range(1, 6):
            proposed_height = target_height * nh
            proposed_width = target_width * nw
            iou = calculate_iou((0, 0, width, height), (0, 0, proposed_width, proposed_height))
            if iou > best_iou:
                best_iou = iou
                best_grid = (nh, nw)

    nh, nw = best_grid
    resized_image = cv2.resize(image, (nw * target_width, nh * target_height))

    # cut the resized image into tiles
    crops = []
    for i in range(nh):
        for j in range(nw):
            crop = resized_image[i * target_height:(i + 1) * target_height, j * target_width:(j + 1) * target_width]
            crops.append(crop)

    return crops, resized_image



# grid size and embedding dimension
grid_size = (5, 5)  # assumed maximum grid size
embed_size = 1000  # assumed equal to the feature dimension

# initialize the positional-encoding module
positional_encoding = PositionalEncoding(grid_size, embed_size)

# image preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# load the visual encoder
visual_encoder = resnet50(pretrained=True)
visual_encoder.eval()  # evaluation mode

# read and convert the image
image_path = '/Users/depeng.yao/Downloads/demo.jpeg'
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# shape-adaptive cropping
crops, resized_image = shape_adaptive_cropping(image, target_resolution=(224, 224))


# convert and encode the images
resized_features = visual_encoder(transform(resized_image).unsqueeze(0)) # 1 x 1000
crop_features = [visual_encoder(transform(crop).unsqueeze(0)) for crop in crops] # N tensors of 1 x 1000, N = nh * nw

# build (row, col) indices from the grid that was actually chosen, in the same row-major order as the crops
nh, nw = resized_image.shape[0] // 224, resized_image.shape[1] // 224
positions = [(i, j) for i in range(nh) for j in range(nw)]
positions_tensor = torch.tensor(positions, dtype=torch.long) # (N, 2)

# position embeddings
position_embeddings = positional_encoding(positions_tensor)

# concatenate features and position embeddings
features = torch.cat([resized_features] + crop_features + [position_embeddings], dim=0) # (1 + N + N) x 1000
print("ok")


Clip——mllm

Once CLIP has been trained, it can be used directly for image classification (so-called zero-shot image classification). The principle is simple:

model

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # the CLIP weights are downloaded on first use

image = preprocess(Image.open("/Users/depeng.yao/Downloads/demo.jpeg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # model(image, text) returns image-to-text and text-to-image logits
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # one probability per text prompt; the values depend on the image

多模态:CLIP详解_clip 多模态-CSDN博客

多模态模型学习1——CLIP对比学习 语言-图像预训练模型_clip模型-CSDN博客

0a9d9555c3f3cc4e7fb27c3eca6267c2.png

cf21bf55a6851d1008b3549c720eab81.png

5b71a63d1e60f9f4279b640f48fa963a.png

loss

e2c1c6ebfecc7236c473bca55168bfb6.png

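Code:

In essence, both embeddings have shape 64 x 128; multiplying one by the transpose of the other gives a 64 x 64 similarity matrix, which is then compared against 64 labels.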

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assume visual_embedding and text_embedding have already been computed;
# random data stands in for the two embedding matrices here.
embedding_size = 128
num_samples = 64

# simulated data
visual_embedding = torch.randn(num_samples, embedding_size)
text_embedding = torch.randn(num_samples, embedding_size)

# inner-product (similarity) matrix, shape [64, 64]
similarity_matrix = torch.matmul(visual_embedding, text_embedding.t())

# labels: the correct text for each sample is the one at its own index
labels = torch.arange(num_samples)

# cross-entropy loss; nn.CrossEntropyLoss applies log-softmax itself,
# so it takes the raw similarity matrix rather than softmax probabilities
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(similarity_matrix, labels)

print("Loss:", loss.item())

a92f27bd08d591fdaf7d774bc233bdd7.png

info-nce-loss

import torch
import torch.nn.functional as F

def info_nce_loss(image_features, text_features, temperature=0.07):
    """
    Compute the InfoNCE loss.
    Args:
    - image_features: image features, shape (N, feature_dim)
    - text_features: text features, shape (N, feature_dim)
    - temperature: temperature that controls how sharp the softmax is

    Returns:
    - loss: the computed loss value
    """
    # similarity matrix between images and texts
    logits = torch.mm(image_features, text_features.t()) / temperature
    # the diagonal entries are the similarities of the positive pairs
    labels = torch.arange(logits.shape[0], device=logits.device)
    # InfoNCE via cross-entropy: each row of the similarity matrix is treated as class logits
    loss_i2t = F.cross_entropy(logits, labels)
    loss_t2i = F.cross_entropy(logits.t(), labels)
    # average of the image-to-text and text-to-image losses
    return (loss_i2t + loss_t2i) / 2

# assumed feature vectors and batch size
batch_size = 32
feature_dim = 256
image_features = torch.randn(batch_size, feature_dim)
text_features = torch.randn(batch_size, feature_dim)

# compute the loss
loss = info_nce_loss(image_features, text_features)
print("InfoNCE Loss:", loss.item())

 Cosine similarity

97e01016d16cfc2df0caa13c0cbfe9f5.png

多模态模型学习1——CLIP对比学习 语言-图像预训练模型_clip模型-CSDN博客

CLIP uses cosine similarity.

bf469e1830ca6f66590de495d73aff77.png
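
Cosine similarity is the dot product of two vectors after each has been scaled to unit length, so it depends only on direction and not on magnitude. The following is a minimal NumPy sketch of the pairwise version used to compare a batch of image embeddings with a batch of text embeddings; the function name and the tiny 2-dimensional vectors are illustrative only.

```python
import numpy as np

def cosine_similarity_matrix(a, b, eps=1e-12):
    # L2-normalize each row, then take all pairwise dot products
    a_norm = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_norm = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return a_norm @ b_norm.T

image_emb = np.array([[1.0, 0.0], [0.0, 2.0]])
text_emb = np.array([[3.0, 0.0], [1.0, 1.0]])
sim = cosine_similarity_matrix(image_emb, text_emb)
print(sim)
```

The first image vector and the first text vector point in the same direction, so their similarity is 1 even though their lengths differ; orthogonal vectors give 0.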

BLIP

【BLIP/BLIP2/InstructBLIP】一篇文章快速了解BLIP系列(附代码讲解说明)-CSDN博客

blip1

2bc2a152ce8955406e7677aeeb02ef17.png

e00f98b66bc50f03f4d974f7074d1b8e.png

Question 1: What is the overall pipeline?

76ce19b4460dc11bdf9900f31eccac20.png

Question 2: What does cross-attention (CA) refer to?

2a1f2ca349c8451fe559b43d688bea22.png

import torch
import torch.nn as nn
import torch.nn.functional as F
import math

class CrossAttentionLayer(nn.Module):
    def __init__(self, d_model, n_heads):
        super(CrossAttentionLayer, self).__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.key_proj = nn.Linear(d_model, d_model)
        self.value_proj = nn.Linear(d_model, d_model)
        self.n_heads = n_heads
        self.d_k = d_model // n_heads

    def forward(self, query, key, value, mask=None):
        # batch size
        batch_size = query.size(0)

        # linear projections, then split into heads
        query = self.query_proj(query).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        key = self.key_proj(key).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)
        value = self.value_proj(value).view(batch_size, -1, self.n_heads, self.d_k).transpose(1, 2)

        # scaled dot-product attention
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        attn = F.softmax(scores, dim=-1)

        # weighted sum of the values, then merge the heads
        output = torch.matmul(attn, value).transpose(1, 2).contiguous().view(batch_size, -1, self.n_heads * self.d_k)
        return output

# Example usage
d_model = 512
n_heads = 8
seq_len_text = 10  # text sequence length
seq_len_img = 20  # number of image features (assuming the image feature map is flattened)
batch_size = 32

# Assume the query comes from the text encoder and the key/value from the image encoder
query_text = torch.randn(batch_size, seq_len_text, d_model)  # text queries
key_image = torch.randn(batch_size, seq_len_img, d_model)    # image keys
value_image = torch.randn(batch_size, seq_len_img, d_model)  # image values

cross_attention = CrossAttentionLayer(d_model, n_heads)
output = cross_attention(query_text, key_image, value_image)
print(output.shape)  # same shape as the text queries: torch.Size([32, 10, 512])

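query_text holds the encoded text features, while key_image and value_image hold the encoded image features. Through cross-attention, each text token attends over the image features, so the text representation is enriched with visual information. This is essential for tasks that require joint understanding of images and text.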
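The mechanics can also be followed by hand on a tiny case. Below is a minimal single-head sketch in plain Python, without the learned projections or multiple heads; the numbers are invented and `cross_attention` is a helper defined only for this illustration. It shows that the output has one row per query (text token), while each attention row spans the keys (image patches).

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # One score per key: dot(q, k) / sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of the value vectors
        out = [sum(w * v[j] for w, v in zip(attn, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs, weights

text_queries = [[1.0, 0.0], [0.0, 1.0]]                  # 2 text tokens
image_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # 3 image patches
image_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

outputs, weights = cross_attention(text_queries, image_keys, image_values)
print(len(outputs), len(outputs[0]))   # rows follow the queries
print(len(weights), len(weights[0]))   # each row spans the keys
```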

5247f5de35b643a18dc000e1e9aecccf.png

Question 3: The ITC loss is simply the InfoNCE loss, as used in CLIP.

5b1936b41511eb9a3aa2471281382fae.png

import torch
import torch.nn as nn
import torch.nn.functional as F

def image_text_contrastive_loss(img_features, text_features, temperature=0.07):
    """
    Compute the image-text contrastive (ITC) loss.
    :param img_features: image feature tensor of shape (batch_size, feature_dim)
    :param text_features: text feature tensor of shape (batch_size, feature_dim)
    :param temperature: temperature parameter; controls how sharp the softmax is
    :return: the loss value
    """
    # L2-normalize the features so the dot product becomes cosine similarity
    img_features = F.normalize(img_features, p=2, dim=1)
    text_features = F.normalize(text_features, p=2, dim=1)

    # Compute the similarity matrix
    similarity_matrix = torch.matmul(img_features, text_features.t()) / temperature

    # Targets: each image should match its own text (diagonal entries are the positives)
    batch_size = img_features.size(0)
    targets = torch.arange(batch_size).long().to(img_features.device)

    # Compute the loss in both directions
    loss_i2t = F.cross_entropy(similarity_matrix, targets)
    loss_t2i = F.cross_entropy(similarity_matrix.t(), targets)

    # Return the total loss (average of the two directions)
    return (loss_i2t + loss_t2i) / 2

# Example usage
batch_size = 32
feature_dim = 128
img_features = torch.randn(batch_size, feature_dim)
text_features = torch.randn(batch_size, feature_dim)

loss = image_text_contrastive_loss(img_features, text_features)
print("ITC Loss:", loss.item())

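Question 4: The ITM loss: combine the image and text features, pass them through fully connected layers that output a single logit, and classify the pair as matched or not matched.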

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageTextMatchingModel(nn.Module):
    def __init__(self, image_feature_dim, text_feature_dim, hidden_dim):
        super(ImageTextMatchingModel, self).__init__()
        # These dimensions can be adjusted as needed
        self.fc1 = nn.Linear(image_feature_dim + text_feature_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, 1)  # binary classification: one logit for match / no match

    def forward(self, image_features, text_features):
        # Simple concatenation of the image and text features
        combined_features = torch.cat((image_features, text_features), dim=1)
        x = F.relu(self.fc1(combined_features))
        logits = self.fc2(x)
        return logits

def itm_loss(logits, labels):
    """Compute the binary cross-entropy loss (the sigmoid is applied inside the loss)."""
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    return loss

# Assumed feature dimensions and some data
image_feature_dim = 256
text_feature_dim = 256
hidden_dim = 128

# Create the model instance
model = ImageTextMatchingModel(image_feature_dim, text_feature_dim, hidden_dim)

# Assumed input data
batch_size = 10
image_features = torch.randn(batch_size, image_feature_dim)  # random image features
text_features = torch.randn(batch_size, text_feature_dim)    # random text features
labels = torch.randint(0, 2, (batch_size, 1)).float()  # random labels, 0 or 1

# Forward pass
logits = model(image_features, text_features)

# Compute the loss
loss = itm_loss(logits, labels)

print("Loss:", loss.item())
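The labels in the example above are random, which is enough to exercise the loss. In training, the label instead records whether an image and a text really belong together. The sketch below shows one simple way to build such pairs from a batch of matched image-text indices. It is a simplification: BLIP selects hard negatives using the ITC similarities, whereas this hypothetical helper `build_itm_pairs` just draws a random mismatched text.

```python
import random

def build_itm_pairs(num_pairs, seed=0):
    """Return (image_index, text_index, label) triples for an ITM batch.

    Index i of the images is assumed to match index i of the texts.
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(num_pairs):
        # Positive pair: the image with its own text
        pairs.append((i, i, 1.0))
        # Negative pair: the image with a randomly chosen different text
        negatives = [j for j in range(num_pairs) if j != i]
        pairs.append((i, rng.choice(negatives), 0.0))
    return pairs

pairs = build_itm_pairs(num_pairs=4)
for image_idx, text_idx, label in pairs:
    print(image_idx, text_idx, label)
```

The resulting labels can be turned into a float tensor of shape (N, 1) and passed to `itm_loss` together with the logits for the corresponding pairs.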

Question 5: The LM loss compares the caption predicted from the image directly with the ground-truth text, token by token.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class ImageCaptioningModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers, image_feature_dim):
        super(ImageCaptioningModel, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Project the image features to hidden_dim before they enter the LSTM
        self.image_features = nn.Linear(image_feature_dim, hidden_dim)
        # Each LSTM step sees the token embedding concatenated with the projected image features
        self.lstm = nn.LSTM(embed_dim + hidden_dim, hidden_dim, num_layers, batch_first=True)
        # The LSTM hidden state is mapped to vocabulary logits
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_features, captions):
        # image_features: [batch_size, image_feature_dim]
        # captions: [batch_size, seq_length]
        embeddings = self.embed(captions[:, :-1])  # Exclude the last token
        image_features_transformed = self.image_features(image_features).unsqueeze(1)
        image_features_transformed = image_features_transformed.expand(-1, embeddings.size(1), -1)
        lstm_input = torch.cat((embeddings, image_features_transformed), dim=2)
        hiddens, _ = self.lstm(lstm_input)
        outputs = self.fc(hiddens)
        return outputs

def language_modeling_loss(outputs, targets):
    # outputs: [batch_size, seq_length - 1, vocab_size]
    # targets: [batch_size, seq_length - 1]
    # reshape (not view) is used because the shifted targets are a non-contiguous slice
    loss = F.cross_entropy(outputs.reshape(-1, outputs.size(-1)), targets.reshape(-1), ignore_index=0)
    return loss

# Parameters
vocab_size = 10000  # Example vocabulary size
embed_dim = 256
hidden_dim = 512
num_layers = 2
image_feature_dim = 2048
seq_length = 20
batch_size = 32

# Model and optimization
model = ImageCaptioningModel(vocab_size, embed_dim, hidden_dim, num_layers, image_feature_dim)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Dummy data
image_features = torch.randn(batch_size, image_feature_dim)
captions = torch.randint(1, vocab_size, (batch_size, seq_length))

# Forward pass
outputs = model(image_features, captions)
targets = captions[:, 1:]  # Shift for predicting the next word

# Compute loss
loss = language_modeling_loss(outputs, targets)

# Backward and optimize
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("Loss:", loss.item())
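The shift between inputs and targets is the core of the LM loss, so here is a minimal plain-Python sketch of it. The token ids are invented (1 as a start token and 2 as an end token are assumptions for this example), and uniform logits stand in for real model outputs. With uniform logits over a vocabulary of size V, the per-token cross-entropy equals ln(V), which makes the result easy to check.

```python
import math

def shift_for_teacher_forcing(caption_ids):
    """Inputs are all tokens but the last; targets are all tokens but the first."""
    return caption_ids[:-1], caption_ids[1:]

def token_cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

caption = [1, 7, 4, 9, 2]  # hypothetical ids: 1 = start, 2 = end
inputs, targets = shift_for_teacher_forcing(caption)

vocab_size = 10
# Uniform logits stand in for the model output at every position
uniform_logits = [0.0] * vocab_size
loss = sum(token_cross_entropy(uniform_logits, t) for t in targets) / len(targets)

print(inputs, targets, loss)
```

At position k the model reads `inputs[k]` and is scored on `targets[k]`, the next token of the caption.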

cc1da6c798d84b5c9a472e9996b9049a.png

a738c35b6222eaa863ad642002150862.png d65cc1e5642a35d7388bdefa04f5dd91.png

 blip2

BLIP2中Q-fodrmer详解-CSDN博客

【大模型系列】统一图文理解与生成(BLIP/BLIPv2/InstructBLIP)_大模型如何同时输入图文数据-CSDN博客

Q-Former implementation

ce43550ff6f01296ffedbb5410c0abd1.png

 8013e541a6482c89f01bf20a564d7018.png

from transformers import BertConfig, BertModel, BertTokenizer
import torch
import torch.nn as nn
from transformers import ViTModel

def init_Qformer(num_query_tokens, vision_width, freeze):

    # Configure the Q-Former from a pretrained BERT config and explicitly enable decoder mode
    encoder_config = BertConfig.from_pretrained("/Users/depeng.yao/Desktop/yaodepeng/pythonProject/bert-base-uncased/config.json",
        is_decoder=True,  # treat the model as a decoder
        add_cross_attention=True,  # add cross-attention to the layers
        hidden_size=vision_width,
        num_hidden_layers=12,
        cross_attention_layers=[i for i in range(12)])  # cross-attention in every layer

    encoder_config.query_length = num_query_tokens

    # Initialize the Q-Former model (randomly initialized from the config)
    Qformer = BertModel(config=encoder_config)

    # Initialize the learnable query tokens
    # shape: [1, num_query_tokens, hidden_size], e.g. 1*10*768
    query_tokens = nn.Parameter(torch.zeros(1, num_query_tokens, encoder_config.hidden_size))
    query_tokens.data.normal_(mean=0.0, std=encoder_config.initializer_range)

    if freeze:
        for param in Qformer.parameters():
            param.requires_grad = False
        Qformer.eval()

    return Qformer, query_tokens

# Shapes: query_tokens [1, 10, 768], image_features [1, 10, 768], text_input_ids [1, 6], text_attention_mask [1, 6]; BERT language model
def forward_pass(Qformer, query_tokens, image_features, text_input_ids, text_attention_mask):
    batch_size = text_input_ids.size(0)
    num_query_tokens = query_tokens.size(1)
    # Expand query_tokens to the batch size. Note: this tensor is not passed to the
    # model below; the query positions are represented by placeholder token ids instead.
    repeated_query_tokens = query_tokens.repeat(batch_size, 1, 1)
    # query_token_ids: [1, 10]
    # Assume 30521 (the last id in the 30522-entry bert-base-uncased vocabulary) is the special token id
    query_token_ids = torch.full((batch_size, num_query_tokens), fill_value=30521)

    # Concatenate the input ids
    # input_ids: [1, 16]
    input_ids = torch.cat([query_token_ids, text_input_ids], dim=1)

    # Build an all-ones attention mask for the query part and concatenate it with the text mask
    # attention_mask: [1, 16]
    query_attention_mask = torch.ones(batch_size, num_query_tokens, dtype=torch.long)
    attention_mask = torch.cat([query_attention_mask, text_attention_mask], dim=1)

    # Run the forward pass; the image features enter through cross-attention
    outputs = Qformer(input_ids=input_ids, attention_mask=attention_mask,
                      encoder_hidden_states=image_features, encoder_attention_mask=None,
                      return_dict=True)

    return outputs


model = BertModel.from_pretrained("/Users/depeng.yao/Desktop/yaodepeng/pythonProject/bert-base-uncased")
# Example usage
num_query_tokens = 10
vision_width = 768
freeze = False
Qformer, query_tokens = init_Qformer(num_query_tokens, vision_width, freeze)

image_features = torch.randn(1, 10, 768)
tokenizer = BertTokenizer.from_pretrained("/Users/depeng.yao/Desktop/yaodepeng/pythonProject/bert-base-uncased")
text = "a cat wearing sunglasses"
encoded_text = tokenizer(text, return_tensors="pt")
# {'input_ids': tensor([[  101,  1037,  4937,  4147, 17072,   102]]),
# 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}
text_input_ids = encoded_text['input_ids']
text_attention_mask = encoded_text['attention_mask']

# Shapes: query_tokens [1, 10, 768], image_features [1, 10, 768], text_input_ids [1, 6], text_attention_mask [1, 6]; BERT language model
outputs = forward_pass(Qformer, query_tokens, image_features, text_input_ids, text_attention_mask)
# Output shape [1, 16, 768]: 10 query positions + 6 text tokens, each with hidden size 768
print(outputs.last_hidden_state.shape)

9df107c2f2e1d33d013975f6de65712f.png

Llava

多模态大语言模型 LlaVA 论文解读:Visual Instruction Tuning_llama模型和llava-CSDN博客 https://blog.csdn.net/qq_40491305/article/details/131400432

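My impression is that the main difference is the data: it is more detailed, so the model is fed finer-grained information.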

# Stand-in for a GPT-4 API call
def call_gpt4_model(prompt):
    # Example stub only; in practice, replace this with a real call to the OpenAI API
    print("Calling the GPT-4 model with input:")
    print(prompt)
    return "This is a simulated answer."

# Build the conversation context and questions
messages = [
    {"role": "user", "content": "What objects are in this image?"},
    {"role": "assistant", "content": "The image contains a cat and a dog sitting on a mat."},
    {"role": "user", "content": "What is the dog doing?"},
    {"role": "assistant", "content": "The dog is playing with a ball."},
    {"role": "user", "content": "How far is the cat from the dog?"},
    {"role": "assistant", "content": "The cat is about two feet away from the dog."}
]

# Build the prompt for GPT-4 (role names kept lowercase for consistency)
prompt = "\n".join([f"{msg['role']}: {msg['content']}" for msg in messages])
prompt += "\nuser: Are there any other animals in the picture?"

# Call the GPT-4 model
response = call_gpt4_model(prompt)
print("GPT-4 answer:", response)

776b9e29e82a80b0e766b9782728a25d.png

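Model: the image is encoded by CLIP's ViT and the text by the language model's tokenizer and embedding layer; the resulting tokens are concatenated and fed into the LLM together. In LLaVA the visual features first pass through a projection layer that maps them into the LLM's embedding space.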

5d565f5e60be777a821d3d2813714b79.png
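The data flow can be sketched with plain Python lists. All numbers, dimensions, and the `matmul` helper below are invented for illustration; in a real model the patch features come from the vision encoder, the projection weights are learned, and the text embeddings come from the LLM's embedding table.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

vision_dim, llm_dim = 3, 2

# Hypothetical vision-encoder output: 4 patches, each a vision_dim vector
patch_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
# Hypothetical projection weights (vision_dim x llm_dim); learned in practice
projection = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
# Hypothetical text token embeddings, already in the LLM embedding space
text_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

image_tokens = matmul(patch_features, projection)  # 4 x llm_dim
llm_input = image_tokens + text_embeddings         # image tokens first, then text tokens

print(len(llm_input), len(llm_input[0]))
```

After projection every image token has the same width as a text embedding, which is what makes the concatenation along the sequence axis possible.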

Minigpt4

【LLM多模态】MiniGPT4模型结构和训练流程-CSDN博客

What is the difference between BLIP-2 and MiniGPT-4?

bda8a662fe43e7707c9d065acff26d31.png

9bb454601a30702e67a5df3ea845f9aa.png

Lora

36fd8c569b5ee750198833e6d6338341.png

import torch
import torch.nn as nn

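class LoRA(nn.Module):
    def __init__(self, d, r):
        super(LoRA, self).__init__()
        # Low-rank factors: A is (d, r) and B is (r, d), so A @ B is a (d, d) update of rank at most r
        self.A = nn.Parameter(torch.randn(d, r))
        self.B = nn.Parameter(torch.randn(r, d))
    
    def forward(self, X):
        # Only the low-rank update (A @ B) @ X is computed here; a full LoRA layer
        # adds this to the output of the frozen pretrained weight, W @ X
        return torch.matmul(self.A, self.B).matmul(X)

# Example parameters
d = 512  # dimension of the original weight matrix
r = 32   # rank of the low-rank matrices

# Initialize the LoRA layer
lora_layer = LoRA(d, r)

# Input tensor
X = torch.randn(d, 128)  # assume 128 samples, one per column

# Forward pass
output = lora_layer(X)

print(output.shape)  # torch.Size([512, 128])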

84ad20d59283daa70bfbf43e21b9932f.png
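The point of the low-rank factorization is the saving in trainable parameters. A quick sketch of the arithmetic, using the same d = 512 and r = 32 as the example above (the helper `lora_param_counts` is written just for this illustration):

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters of a full weight matrix vs. its two low-rank factors."""
    full = d_in * d_out                # updating W directly
    low_rank = rank * (d_in + d_out)   # updating only the two factors
    return full, low_rank

full, low_rank = lora_param_counts(d_in=512, d_out=512, rank=32)
print(full, low_rank, low_rank / full)
```

For this layer the two factors hold one eighth of the parameters of the full matrix, and the ratio shrinks further as d grows or r is reduced.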

DDPM principles

Overall structure

AIGC系列之:DDPM原理解读(简单易懂版)_ddpm中的unet-CSDN博客

https://juejin.cn/post/7251391372394053691

AIGC专栏2——Stable Diffusion结构解析-以文本生成图像(文生图,txt2img)为例-CSDN博客

0c438d7ff7ffe04249212621ba332025.png
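As a rough companion to the diagram, the forward (noising) half of DDPM can be sketched in plain Python. The closed form is x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, where ᾱ_t is the running product of (1 - β_s). The linear schedule from 1e-4 to 0.02 over 1000 steps is a commonly used setting and is assumed here; the data vector and helper names are invented for the example.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1 .. beta_T (assumed defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule(num_steps=1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]  # a made-up "image" with three pixels
x_early = q_sample(x0, t=0, alpha_bars=alpha_bars, rng=rng)    # almost unchanged
x_late = q_sample(x0, t=999, alpha_bars=alpha_bars, rng=rng)   # almost pure noise

print(alpha_bars[0], alpha_bars[-1])
```

The network (the U-Net) is then trained to predict the noise ε that was added at a randomly chosen step t, which is what the reverse process uses to denoise step by step.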

U-Net model

2826b50a42ca155b3bc77243854b37ab.png eebb3d4427973fa8d741b14cf3db210b.png 3cab6a99f7b13d7311bef1a8c708eded.png

97d6c10965d2318f159a106c5bbaf9dd.png In essence, the model is a U-Net with many attention layers added.

Stable Diffusion

Stable Diffusion 超详细讲解-CSDN博客

191f0245bc2e7521c448bbf2c2768c29.png

dcl-mllm

 

 
