Large Models
oukohou
https://www.oukohou.wang/ (this blog has been fully migrated to the personal site above; please visit it there)
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Paper reading: StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. Original · 2024-06-11 11:25:02 · 633 views · 0 comments
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
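RAR, the overall motivation in brief: you know CLIP, right? You know Multimodal Large Language Models (MLLMs), right? The authors combine the two: CLIP first retrieves the few most similar candidates from an external knowledge base by similarity, and then an MLLM picks the final result from those candidates. That is where retrieving-and-ranking (RAR) comes from. Original · 2024-05-14 10:15:36 · 768 views · 0 comments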
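The two-stage retrieve-then-rank idea summarized above can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not the paper's implementation: the 3-dimensional vectors stand in for CLIP embeddings, `memory` stands in for the external knowledge base, and `toy_rank` is a hypothetical placeholder for the MLLM ranking step (a real system would prompt an MLLM with the image and the candidate labels).

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float],
             memory: Dict[str, Sequence[float]],
             k: int) -> List[str]:
    """Stage 1: return the k labels whose embeddings are most similar to the query."""
    ranked = sorted(memory, key=lambda label: cosine(query, memory[label]), reverse=True)
    return ranked[:k]


def retrieve_and_rank(query: Sequence[float],
                      memory: Dict[str, Sequence[float]],
                      k: int,
                      rank_fn: Callable[[List[str]], List[str]]) -> str:
    """Stage 2: let rank_fn (standing in for an MLLM) reorder the candidates, keep the top one."""
    candidates = retrieve(query, memory, k)
    return rank_fn(candidates)[0]


# Toy "knowledge base": label -> made-up embedding (assumption, not real CLIP output).
memory = {
    "tabby cat": [0.9, 0.1, 0.0],
    "siamese cat": [0.7, 0.3, 0.1],
    "golden retriever": [0.1, 0.9, 0.2],
    "sports car": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # made-up embedding of the query image


def toy_rank(candidates: List[str]) -> List[str]:
    # Hypothetical placeholder for the MLLM: simply orders candidates alphabetically.
    return sorted(candidates)


top = retrieve(query, memory, k=2)
best = retrieve_and_rank(query, memory, k=2, rank_fn=toy_rank)
print(top, best)
```

In this toy run the similarity search alone would pick the first retrieved label, while the ranking step is free to choose a different one among the top-k candidates, which is the point of adding a second stage.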