Multimodal Generative Models: Latest List (2024)

Introduction to Multimodal Generative Models: Model Architecture, Key Features, and Code

1. Multimodal Generative Models: Latest List (2024)

| Model | Year | Developer | Modality | Architecture | Key Features |
|---|---|---|---|---|---|
| SORA | 2024 | OpenAI | Video, Text | Image Encoder: Diffusion DiT | Generative Modeling, Text-to-Video |
| Gemini 1.5 | 2024 | Google | Video, Text, Audio | Image Encoder: ViT; Text Encoder: Transformer | Generative Modeling, Long Context Window |
| BLIP2 | 2023 | Salesforce Research | Image, Text | Q-Former: Bridging Modality Gap; Image Encoder: ViT-L/ViT-G; Text LLM Encoder: OPT/FlanT5 | Generative Modeling, Image-to-Text, Visual Question Answering, Image-to-Text Retrieval |
| GPT-4V | 2023 | OpenAI | Image, Text | Text Encoder: GPT | Generative Modeling, Multimodal LLM, Visual Question Answering |
| LLaVA | 2023 | Microsoft | Image, Text | Text LLM Encoder: Vicuna; Image Encoder: CLIP visual ViT-L | Generative Modeling, Visual Instruction Generation |
| KOSMOS-2 | 2023 | Microsoft | Image, Text | Vision Encoder; LLM Encoder: 24-layer MAGNETO Transformer | Multimodal Grounding, Language Understanding and Generation |
| PaLM-E | 2023 | Google | Image, Text | Image Encoder: ViT | Multimodal Language Model |
| BLIP | 2022 | Salesforce Research | Image, Text | Image Encoder: ViT-B, ViT-L; Text Encoder: BERT-Base | Generative Modeling, Bootstrapping, VQA, Caption Generation |
| FLAMINGO | 2022 | DeepMind | Image, Text | Gated Cross Attention, Multiway Transformer, ViT-giant | VQA, Interleaved Visual and Textual Data |
| unCLIP | 2022 | OpenAI | Image, Text | CLIP ViT-L; Diffusion Prior/Autoregressive Prior | Generative Modeling, Text-to-Image, Image Generation, Diffusion Models |
| BEiT-3 | 2022 | Microsoft | Image, Text | Text Encoder: OPT/FlanT5; Image Encoder: ViT-L/ViT-g | Object Detection, Visual Question Answering, Image Captioning |
| CLIP | 2021 | OpenAI | Image, Text | Text Encoder: Transformer; Image Encoder: ResNet/ViT | Multimodal Alignment, Zero-Shot Learning |
| ALIGN | 2021 | Google | Image, Text | Image Encoder: EfficientNet; Text Encoder: BERT | Multimodal Alignment, Image-Text Retrieval |

2. Common Tasks for Multimodal Generative Models

  • Image Captioning
  • Image Text Retrieval
  • Text-to-Image
  • Text-to-Video
  • Visual Question Answering
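
To make one of these tasks concrete, here is a minimal sketch of image-text retrieval in the style of alignment models such as CLIP and ALIGN: images and text are mapped into a shared embedding space, and candidates are ranked by cosine similarity to the query. The embeddings below are hypothetical toy vectors made up for illustration, and the names `cosine`, `rank_images`, `IMAGE_EMBEDDINGS`, and `QUERY_EMBEDDING` are assumptions of this sketch rather than any library's API. A real system would obtain the vectors from trained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return (image_id, score) pairs sorted from best to worst match."""
    scores = [
        (image_id, cosine(text_embedding, embedding))
        for image_id, embedding in image_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real ones would come from an image encoder.
IMAGE_EMBEDDINGS = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding standing in for the text query "a photo of a dog".
QUERY_EMBEDDING = [0.8, 0.2, 0.1]

ranking = rank_images(QUERY_EMBEDDING, IMAGE_EMBEDDINGS)
for image_id, score in ranking:
    print(f"{image_id}: {score:.3f}")
```

The same ranking step works in the other direction (image query, text candidates), which is roughly how zero-shot classification with an alignment model is usually framed: the class names become the text candidates.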

3. Related Links

Reference: "Multimodal Generative Models: Latest List (2024)" on Zhihu
