Video Shooting Knowledge + AIGC Data Preprocessing

Camera angles

Reference: https://www.polarpro.com/blogs/polarpro/filmmaking-101-types-of-camera-shots-and-angles

  1. Low Angle Shot
  2. Horizontal Shot (eye level)
  3. Dutch Angle Shot (tilted frame)
  4. High Angle Shot
  5. Bird’s-eye / Aerial Shot

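Camera movement

Reference: https://www.storyblocks.com/resources/tutorials/7-basic-camera-movements

  1. Zoom (zoom in, zoom out)
  2. Pan (panoramic sweep; rotate the camera left or right)
  3. Tilt (rotate the camera up or down)
  4. Dolly (move the camera toward or away from the subject)
  5. Truck (move the camera left or right)
  6. Pedestal (move the camera up or down)
  7. Rack focus (the framing stays the same; the lens is adjusted to blur one subject while sharpening another)
  8. Fixed (the camera does not move)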

Shot sizes

Reference: https://images.wikia.com/scratchpad/zh/images/f/f5/%E6%94%9D%E5%BD%B1%E9%8F%A1%E9%A0%AD%E4%B8%80%E8%A6%BD%E8%A1%A8.pdf

For objects, shot sizes can be divided into:

  1. Long Shot
  2. Medium Long Shot / Panorama (full shot)
  3. Close-up
  4. Macro shot

For people, they can be subdivided further:

  1. Extra Long Shot
  2. Long Shot
  3. Medium Long Shot / Panorama (full shot)
  4. Medium-shot
  5. Medium Close-Up
  6. Close-up
  7. Macro shot

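(Taking a human subject as an example: a long shot captures the whole person together with the surroundings. It is often used to show the time, environment, scale, and atmosphere of an event, usually to establish a large scene. A full shot is closer than a long shot and fits the subject's whole body in the frame, showing full-body movement or the relationships between people. A medium shot frames the subject from the knees to the top of the head, so it shows both facial expression and body movement. A medium close-up frames the subject from the chest up to the head and is well suited to conveying expression and demeanor on the face or other parts of the body, including subtle movements at the corners of the eyes.)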

prompt test

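The vision-language model DeepSeek-VL-7B captions three frames of each video (first, middle, and last). The large language model ChatGPT then merges the three captions into a single video prompt. Finally, an LLM is asked questions about that prompt to obtain the video's shooting information.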

The DeepSeek-VL prompt is: You are a powerful image captioner. Create detailed captions describing the contents of the given image. Include the object types and colors, counting the objects, object actions, precise object locations, texts, doublechecking relative positions between objects, etc. Instead of describing the imaginary content, only describing the content one can determine confidently from the image. Do not describe the contents by itemizing them in list form. Minimize aesthetic descriptions as much as possible.

The ChatGPT prompt is: You are a powerful video captioner. The next 3 phases describe the first, middle, and end frame of the video. Please summarize the following 3 paragraphs into 1 paragraph, which can provide a detailed description of the video information.
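The captioning flow described above can be sketched roughly as follows. This is a minimal sketch, not the author's code: picking the middle frame as `num_frames // 2`, joining the three captions with blank lines, and the `caption_frame` / `summarize` callables (stand-ins for the DeepSeek-VL-7B and ChatGPT calls) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")

# Summarization prompt quoted from the post.
SUMMARY_PROMPT = (
    "You are a powerful video captioner. The next 3 phases describe the first, "
    "middle, and end frame of the video. Please summarize the following 3 "
    "paragraphs into 1 paragraph, which can provide a detailed description of "
    "the video information."
)


def key_frame_indices(num_frames: int) -> List[int]:
    """Indices of the first, middle, and last frame (middle index is an assumption)."""
    if num_frames < 1:
        raise ValueError("video has no frames")
    return [0, num_frames // 2, num_frames - 1]


def build_summary_request(captions: Sequence[str]) -> str:
    """Concatenate the prompt and the three frame captions (layout is an assumption)."""
    if len(captions) != 3:
        raise ValueError("expected exactly three captions")
    return SUMMARY_PROMPT + "\n\n" + "\n\n".join(captions)


def caption_video(
    frames: Sequence[Frame],
    caption_frame: Callable[[Frame], str],  # hypothetical wrapper around the VLM
    summarize: Callable[[str], str],        # hypothetical wrapper around the LLM
) -> str:
    """Caption three key frames, then merge the captions into one video prompt."""
    picked = [frames[i] for i in key_frame_indices(len(frames))]
    captions = [caption_frame(frame) for frame in picked]
    return summarize(build_summary_request(captions))
```

Passing the two models in as plain callables keeps the sketch independent of any particular inference API.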

1. Camera angle

prompt

there is a caption of a video: The video begins with an aerial view of a vibrant suburban neighborhood, showcasing approximately 50 houses, 10 multi-story apartment buildings, along with distinct red-roofed and green-roofed structures amidst lush greenery and tree-lined streets. Transitioning to a historical fortification in the middle frame, the scene unveils a grand stone wall encircling a spacious green area, punctuated by scattered trees and a central prominent tree, juxtaposed against modern buildings in the backdrop. Finally, the video concludes with another aerial perspective, this time capturing a historic fortification nestled within a hilly landscape, featuring stone buildings with thatched roofs, picnic areas amidst a grassy terrain, and distant houses and trees, all illustrating the captivating blend of history and nature. Each phase offers a unique glimpse into different environments, from suburban living to historical landmarks, all captured through stunning aerial photography.

The shooting angles of a video are divided into:

  1. Low Angle Shot
  2. horizontal Shot
  3. Dutch Angle Shot
  4. High Angle Shot
  5. Bird’s-eye / Aerial Shot

Which perspective does the video most belong to, please return one number only

chatgpt answer (True)

The perspective of the video belongs to option 5: Bird’s-eye / Aerial Shot.

DeepSeek-VL-7B answer (True)

The video belongs to the perspective of 5. Bird’s-eye / Aerial Shot.

True answer is Bird’s-eye / Aerial Shot
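Because both models reply with a full sentence rather than a bare number, some post-processing is needed before the answers can be compared. The sketch below rebuilds the question text from an option list and extracts the chosen option from a reply. The function names and the parsing heuristic (first in-range integer, then a fallback match on the option name) are assumptions for illustration, not the procedure used in the post.

```python
import re
from typing import List, Optional

# Option list copied from the prompt above.
ANGLES = [
    "Low Angle Shot",
    "horizontal Shot",
    "Dutch Angle Shot",
    "High Angle Shot",
    "Bird’s-eye / Aerial Shot",
]


def build_question(caption: str, intro: str, options: List[str], ask: str) -> str:
    """Assemble the caption, the numbered options, and the final question."""
    lines = [f"there is a caption of a video: {caption}", "", intro, ""]
    lines += [f"  {i}. {name}" for i, name in enumerate(options, start=1)]
    lines += ["", ask]
    return "\n".join(lines)


def parse_choice(reply: str, options: List[str]) -> Optional[int]:
    """Return the 1-based option picked in a free-text reply, or None."""
    # Heuristic 1: the first standalone integer that is a valid option number.
    for match in re.finditer(r"\b(\d+)\b", reply):
        number = int(match.group(1))
        if 1 <= number <= len(options):
            return number
    # Heuristic 2: fall back to a case-insensitive match on the option name.
    lowered = reply.lower()
    for i, name in enumerate(options, start=1):
        if name.lower() in lowered:
            return i
    return None
```

For the two replies shown above, `parse_choice` returns 5 in both cases. A reply that mentions an unrelated small number before the answer would be misread, so a stricter output format may be preferable in practice.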

2. Camera movement

prompt

there is a caption of a video: The video begins with an aerial view of a vibrant suburban neighborhood, showcasing approximately 50 houses, 10 multi-story apartment buildings, along with distinct red-roofed and green-roofed structures amidst lush greenery and tree-lined streets. Transitioning to a historical fortification in the middle frame, the scene unveils a grand stone wall encircling a spacious green area, punctuated by scattered trees and a central prominent tree, juxtaposed against modern buildings in the backdrop. Finally, the video concludes with another aerial perspective, this time capturing a historic fortification nestled within a hilly landscape, featuring stone buildings with thatched roofs, picnic areas amidst a grassy terrain, and distant houses and trees, all illustrating the captivating blend of history and nature. Each phase offers a unique glimpse into different environments, from suburban living to historical landmarks, all captured through stunning aerial photography.

The camera movement are divided into:

  1. Zoom
  2. Pan
  3. Tilt
  4. Dolly
  5. Truck
  6. Pedestal
  7. Rack focus
  8. Fixed

Which movement does the video most belong to, please return one number only

chatgpt answer (True)

Based on the description provided, the video most likely primarily utilizes option 2: Pan.

DeepSeek-VL-7B answer (False)

The video most belongs to the movement of 8. Fixed.

True answer is Pan

3. Shot size

prompt

there is a caption of a video: The video begins with an aerial view of a vibrant suburban neighborhood, showcasing approximately 50 houses, 10 multi-story apartment buildings, along with distinct red-roofed and green-roofed structures amidst lush greenery and tree-lined streets. Transitioning to a historical fortification in the middle frame, the scene unveils a grand stone wall encircling a spacious green area, punctuated by scattered trees and a central prominent tree, juxtaposed against modern buildings in the backdrop. Finally, the video concludes with another aerial perspective, this time capturing a historic fortification nestled within a hilly landscape, featuring stone buildings with thatched roofs, picnic areas amidst a grassy terrain, and distant houses and trees, all illustrating the captivating blend of history and nature. Each phase offers a unique glimpse into different environments, from suburban living to historical landmarks, all captured through stunning aerial photography.

The scene type are divided into:

  1. Extra Long Shot
  2. Long Shot
  3. Medium Long Shot / Panorama
  4. Medium-shot
  5. Medium Close-Up
  6. Close-up
  7. Macro shot

Which scene type does the video most belong to, please return one number only

chatgpt answer (True)

Based on the description provided, the video most likely primarily utilizes option 3: Medium Long Shot / Panorama.

DeepSeek-VL-7B answer (True)

The video most belongs to the scene type of 3. Medium Long Shot / Panorama.

True answer is Medium Long Shot / Panorama

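Test results on ten videos

Conclusions

LLMs perform poorly at judging the camera angle: they are only somewhat effective when the caption contains text that explicitly describes the angle.

LLM performance on camera movement varies widely between models: most of the videos are fixed shots, and ChatGPT is far more accurate than DeepSeek-VL.

LLMs perform fairly well at judging shot size. When the subject of the video is an uncommon object, both humans and algorithms find the shot size hard to judge.

In summary, camera angle and camera movement need manual annotation, while shot size can be annotated with a large language model.
However, when camera angle, camera movement, and shot size are all needed at the same time, it is recommended that all three be annotated manually in one pass.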
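The recommendation above amounts to a small routing rule, sketched here in Python. The attribute names and the `"human"` / `"llm"` labels are illustrative choices, not part of the original workflow.

```python
from typing import Dict, Iterable

ALL_ATTRIBUTES = frozenset({"angle", "movement", "shot_size"})


def annotation_plan(needed: Iterable[str]) -> Dict[str, str]:
    """Map each requested attribute to the suggested annotator."""
    wanted = set(needed)
    unknown = wanted - ALL_ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # When all three attributes are needed, annotate everything by hand in one pass.
    if wanted == ALL_ATTRIBUTES:
        return {name: "human" for name in sorted(wanted)}
    # Otherwise only shot size is delegated to the LLM.
    return {name: "llm" if name == "shot_size" else "human" for name in sorted(wanted)}
```

For example, `annotation_plan(["shot_size"])` routes shot size to the LLM, while requesting all three attributes routes every one of them to a human annotator.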

1. Burning flames

caption by deepseek7B

caption

The video features a dynamic display of bright yellow and orange light streaks against a dark background. The streaks appear to be emanating from the center of the image, creating a sense of movement towards the right side. The light is concentrated in the upper half of the image, with the intensity gradually decreasing towards the bottom. There are no discernible texts or countable objects in the image. The relative positions of the light streaks suggest a radial pattern emanating from the center, and the image does not contain any other objects or elements that can be confidently identified. The overall impression is that of a digital abstract, reminiscent of energy or plasma, with vibrant yellow and orange hues dominating the scene and a vertical orientation that fills the entire frame. The image is purely abstract and lacks any discernible texts, countable objects, or specific object actions. The watermark “shutterstock” indicates that it is a digital creation from the Shutterstock platform.

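ChatGPT answer

Camera angle: 5: Bird’s-eye / Aerial Shot.
Camera movement: 8: Fixed
Shot size: 7: Macro shot

DeepSeek-VL-7B answer

Camera angle: 1. Low Angle Shot
Camera movement: 1: Zoom
Shot size: 1. Extra Long Shot

Recommended answer

Camera angle: the subject of this video is not a common object, so the camera angle cannot be determined
Camera movement: 8: Fixed
Shot size: the subject of this video is not a common object and is hard to judge; recommended 6. Close-up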

2. Aerial view of a small town

caption by deepseek7B

caption

The video depicts three distinct aerial views: a neighborhood with a variety of colored houses and buildings, a historic fortification with a large, green open space in the center surrounded by a high stone wall, and a bird’s-eye view of the same fortification featuring multiple buildings and walls with thatched roofs. Each frame showcases a unique architectural and natural landscape, with the neighborhood featuring a mix of residential and commercial structures, the fortification displaying a blend of historical and contemporary elements, and the bird’s-eye view highlighting the fortification’s intricate layout and green spaces. The images are watermarked with “Shutterstock” and unique identifiers.

ChatGPT answer

Camera angle: 5: Bird’s-eye / Aerial Shot.
Camera movement: 4: Dolly
Shot size: 3: Medium Long Shot / Panorama.

DeepSeek-VL-7B answer

Camera angle: 5: Bird’s-eye / Aerial Shot.
Camera movement: 1: Zoom
Shot size: 3. Medium Long Shot / Panorama

Recommended answer

Camera angle: 5: Bird’s-eye / Aerial Shot.
Camera movement: 1. Zoom
Shot size: 2. Long Shot transitioning to 3: Medium Long Shot / Panorama.

3. Group conversation

caption by deepseek7B

caption

In a modern office with large windows showcasing a cityscape, four business professionals in professional attire engage in a discussion. The man on the left is bald, wearing a dark suit, and speaking. Beside him stands a woman with dark hair, wearing a light brown suit, holding a tablet. Next to her is another woman with long dark hair, wearing a dark suit, and smiling. On the far right is a man with dark hair, wearing a beige suit, and smiling. They are all standing in a semi-circle formation.

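ChatGPT answer

Camera angle: 2: Horizontal Shot.
Camera movement: 8: Fixed
Shot size: 4: Medium-shot.

DeepSeek-VL-7B answer

Camera angle: 1. Low Angle Shot
Camera movement: 1: Zoom
Shot size: 4. Medium-shot

Recommended answer

Camera angle: 2: Horizontal Shot.
Camera movement: 8: Fixed
Shot size: 4: Medium-shot.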

4. Couple video

caption by deepseek7B

caption

A bride and groom are standing on stone steps in front of a white chapel with a cross on the top, surrounded by trees and grass. The man is wearing a dark suit with a red tie, a white shirt, and a pink tie, while the woman is wearing a white wedding dress with a long train. The sky is partly cloudy.

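ChatGPT answer

Camera angle: 4: High Angle Shot.
Camera movement: 8: Fixed
Shot size: 2: Long Shot.

DeepSeek-VL-7B answer

Camera angle: 1. Low Angle Shot
Camera movement: 1: Zoom
Shot size: 4. Medium-shot

Recommended answer

Camera angle: 2. horizontal Shot, from slightly below eye level
Camera movement: 8: Fixed
Shot size: 3. Medium Long Shot / Panorama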

5. Christmas girl on a green screen

caption by deepseek7B

caption

In the video, a woman is dressed in a red dress with white fur trim at the waist and cuffs, and a white hat with a single white feather. She has long dark hair and is holding a white object in her hands. She is wearing black high heels and stands in the center of a green screen background.

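ChatGPT answer

Camera angle: 2: Horizontal Shot.
Camera movement: 8: Fixed
Shot size: 5: Medium Close-Up.

DeepSeek-VL-7B answer

Camera angle: 1. Low Angle Shot
Camera movement: 1: Zoom
Shot size: 4. Medium-shot

Recommended answer

Camera angle: 2: Horizontal Shot.
Camera movement: 8: Fixed
Shot size: 3. Medium Long Shot / Panorama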

6. Grilled salmon

caption by deepseek7B

caption

The video depicts a red piece of meat cooking on a grill during all three phases, with the description becoming repetitive as it focuses on the meat’s color and cooking method.

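ChatGPT answer

Camera angle: 2: Horizontal Shot.
Camera movement: 8: Fixed
Shot size: 7: Macro shot.

DeepSeek-VL-7B answer

Camera angle: 1. Low Angle Shot
Camera movement: 1: Zoom
Shot size: 6. Close-up

Recommended answer

Camera angle: 4. High Angle Shot
Camera movement: 8: Fixed
Shot size: 6. Close-up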

7. Electrician drinking water

caption by deepseek7B

caption

The video shows a man in a yellow hard hat standing in front of a tall red and white electricity tower on a clear day, with a clear blue sky above him.

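ChatGPT answer

Camera angle: 4: High Angle Shot.
Camera movement: 8: Fixed
Shot size: 4: Medium-shot.

DeepSeek-VL-7B answer

Camera angle: 4: High Angle Shot.
Camera movement: 1: Zoom
Shot size: 1. Extra Long Shot

Recommended answer

Camera angle: 1. Low Angle Shot
Camera movement: 8: Fixed
Shot size: 4: Medium-shot.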

8. Crowd gathering on a lawn

caption by deepseek7B

caption

In a park on a sunny day, a crowd of people are sitting and standing on a grassy area. Many individuals are wearing jeans, t-shirts, and carrying backpacks, while a few hold drinks. The scene features tall buildings in the background, with trees lining the edge of the area. The crowd gazes towards something out of frame, and there are scattered trees throughout the park. Some people are holding sunglasses, and some wear handbags.

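ChatGPT answer

Camera angle: 5: Bird’s-eye / Aerial Shot.
Camera movement: 2: Pan
Shot size: 2: Long Shot.

DeepSeek-VL-7B answer

Camera angle: 1. Low Angle Shot
Camera movement: 1: Zoom
Shot size: 4. Medium-shot

Recommended answer

Camera angle: 2. horizontal Shot
Camera movement: 8: Fixed
Shot size: 3. Medium Long Shot / Panorama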

9. Boats and pedestrians

caption by deepseek7B

caption

The video shows two boats docked along a canal in a city, with a clear blue sky on one side and buildings on the other. The boat on the left has a white body with a red stripe and the number 6, while the boat on the right has a white body with a blue stripe. There are multiple people standing on the sidewalk along the canal, and the buildings have a European architectural style with multi-storied designs. Behind the fence, cars are parked along the street.

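ChatGPT answer

Camera angle: 5: Bird’s-eye / Aerial Shot.
Camera movement: 8: Fixed
Shot size: 2: Long Shot.

DeepSeek-VL-7B answer

Camera angle: 1. Low Angle Shot
Camera movement: 1: Zoom
Shot size: 3. Medium Long Shot / Panorama

Recommended answer

Camera angle: 2. horizontal Shot
Camera movement: 8: Fixed
Shot size: 2: Long Shot.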

10. Bird perched on a branch

caption by deepseek7B

caption

In the video, a vibrant green and blue bird with a thin, light brown branch is perched in a serene natural outdoor setting, surrounded by a soft blur of warm and natural hues, suggesting a peaceful environment.

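ChatGPT answer

Camera angle: 1: Low Angle Shot.
Camera movement: 8: Fixed
Shot size: 6: Close-up.

DeepSeek-VL-7B answer

Camera angle: 1. Low Angle Shot
Camera movement: 1: Zoom
Shot size: 1. Extra Long Shot

Recommended answer

Camera angle: 1. Low Angle Shot
Camera movement: 8: Fixed
Shot size: 6: Close-up.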
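The ten records above can be tallied with a few lines of Python. The option numbers below are transcribed from the answers listed for each video. The scoring conventions are assumptions made for this sketch: video 1 has no recommended camera angle and is skipped for that attribute, and for video 2 either shot size named in the recommended answer (2 or 3) counts as correct.

```python
from typing import Dict, FrozenSet, List, Tuple

# One row per video: (ChatGPT choice, DeepSeek-VL choice, accepted option numbers).
Row = Tuple[int, int, FrozenSet[int]]


def row(chatgpt: int, deepseek: int, *accepted: int) -> Row:
    return (chatgpt, deepseek, frozenset(accepted))


RESULTS: Dict[str, List[Row]] = {
    "angle": [
        row(5, 1), row(5, 5, 5), row(2, 1, 2), row(4, 1, 2), row(2, 1, 2),
        row(2, 1, 4), row(4, 4, 1), row(5, 1, 2), row(5, 1, 2), row(1, 1, 1),
    ],
    "movement": [
        row(8, 1, 8), row(4, 1, 1), row(8, 1, 8), row(8, 1, 8), row(8, 1, 8),
        row(8, 1, 8), row(8, 1, 8), row(2, 1, 8), row(8, 1, 8), row(8, 1, 8),
    ],
    "shot_size": [
        row(7, 1, 6), row(3, 3, 2, 3), row(4, 4, 4), row(2, 4, 3), row(5, 4, 3),
        row(7, 6, 6), row(4, 1, 4), row(2, 4, 3), row(2, 3, 2), row(6, 1, 6),
    ],
}


def tally(rows: List[Row]) -> Tuple[int, int, int]:
    """Return (ChatGPT correct, DeepSeek-VL correct, number of scorable videos)."""
    scorable = [r for r in rows if r[2]]  # skip rows with no recommended answer
    chatgpt_hits = sum(r[0] in r[2] for r in scorable)
    deepseek_hits = sum(r[1] in r[2] for r in scorable)
    return chatgpt_hits, deepseek_hits, len(scorable)


if __name__ == "__main__":
    for attribute, rows in RESULTS.items():
        chatgpt_hits, deepseek_hits, total = tally(rows)
        print(f"{attribute}: ChatGPT {chatgpt_hits}/{total}, DeepSeek-VL {deepseek_hits}/{total}")
```

With only ten videos the counts are indicative at best; a different scoring convention for the two ambiguous rows would shift them slightly.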
