VR Series: Oculus Audio SDK Documentation, Part 1: Introduction to Virtual Reality Audio (7), Mixing Scenes for Virtual Reality

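Summary: making sound feel real in a simulated VR space takes careful mixing. Dynamic range, attenuation, and time of arrival (the delay between seeing and hearing an event) all need attention. The Oculus Audio SDK does not support directional sources (speakers, voices), area sources (waterfalls, rivers, crowds), or the Doppler effect; those need support from a higher-level SDK. Some sounds, such as background music, button clicks, breathing, or heartbeats, need not be spatialized at all. Audio is also constrained by the performance of the host system, which can introduce latency and break immersion.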

As with sound design, mixing a scene for VR is an art as well as a science, and the following recommendations may include caveats.

Creative Control

Realism is not necessarily the end goal! Keep this in mind at all times. As with lighting in computer environments, what is consistent and/or “correct” may not be aesthetically desirable. Audio teams must be careful not to back themselves into a corner by enforcing rigid notions of correctness on a VR experience.

This is especially true when considering issues such as dynamic range, attenuation curves, and direct time of arrival.

Accurate 3D Positioning of Sources

Sounds must now be placed carefully in the 3D sound field. In the past, a general approximation of location was often sufficient, since positioning was accomplished strictly through panning and attenuation. The default location for an object might be its hips or the point where its feet meet the ground plane. With spatialization, a sound played from those locations will be jarring, producing “crotch steps” or “foot voices”.

Directional Sources

The Oculus Audio SDK does not support directional sound sources (speakers, human voice, car horns, and so on). However, higher-level SDKs often model these using angle-based attenuation that controls the tightness of the direction. This directional attenuation should occur before the spatialization effect.
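
As an illustration of angle-based attenuation, the sketch below computes a gain from the angle between a source's facing direction and the direction to the listener. The function name `cone_gain`, the inner/outer cone parameters, and their default values are assumptions made for this example; they are not part of the Oculus Audio SDK. The resulting gain would be applied to the signal before it is handed to the spatializer.

```python
import math

def cone_gain(source_forward, to_listener,
              inner_deg=60.0, outer_deg=180.0, outer_gain=0.25):
    """Hypothetical cone model: full gain inside the inner cone,
    outer_gain outside the outer cone, linear blend in between.
    Cone sizes are full angles in degrees."""
    dot = sum(a * b for a, b in zip(source_forward, to_listener))
    norm = (math.sqrt(sum(a * a for a in source_forward))
            * math.sqrt(sum(b * b for b in to_listener)))
    if norm == 0.0:
        return 1.0  # degenerate input: leave the signal untouched
    # Clamp to guard acos against rounding just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    angle = math.degrees(math.acos(cos_angle))
    half_inner, half_outer = inner_deg / 2.0, outer_deg / 2.0
    if angle <= half_inner:
        return 1.0
    if angle >= half_outer:
        return outer_gain
    t = (angle - half_inner) / (half_outer - half_inner)
    return 1.0 + t * (outer_gain - 1.0)

# A listener directly in front of the source hears it at full level,
# a listener directly behind it hears it at outer_gain.
front = cone_gain((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
behind = cone_gain((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

Narrowing `inner_deg` and `outer_deg` makes the source more tightly focused, which is the "tightness" the paragraph above refers to.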

Area Sources

The Oculus Audio SDK does not support area sound sources such as waterfalls, rivers, crowds, and so on.

Doppler Effect

The Doppler effect is the apparent change of a sound’s pitch as its source approaches or recedes. VR experiences can emulate this by altering the playback rate based on the relative speed of a sound source and the listener. However, it is very easy to introduce artifacts inadvertently in the process.

The Oculus Audio SDK does not have native support for the Doppler effect, though some higher-level SDKs do.
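
Where Doppler is emulated at a higher level, the playback-rate factor can be derived from the classic Doppler formula. The sketch below is a minimal illustration, not an SDK API: `doppler_ratio`, the 343 m/s speed of sound, and the clamping limits are all assumptions chosen for this example. The clamps are one simple way to keep extreme velocities from producing the kind of artifacts mentioned above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)

def doppler_ratio(src_pos, src_vel, lis_pos, lis_vel,
                  c=SPEED_OF_SOUND, min_ratio=0.5, max_ratio=2.0):
    """Playback-rate multiplier: >1 when source and listener close in,
    <1 when they separate. Positions in metres, velocities in m/s."""
    direction = [l - s for s, l in zip(src_pos, lis_pos)]
    dist = math.sqrt(sum(x * x for x in direction))
    if dist == 0.0:
        return 1.0  # coincident positions: no meaningful direction
    direction = [x / dist for x in direction]
    # Velocity components along the source-to-listener axis.
    v_src = sum(v * d for v, d in zip(src_vel, direction))
    v_lis = sum(v * d for v, d in zip(lis_vel, direction))
    # Keep both well below the speed of sound so the ratio stays finite.
    limit = 0.9 * c
    v_src = max(-limit, min(limit, v_src))
    v_lis = max(-limit, min(limit, v_lis))
    ratio = (c - v_lis) / (c - v_src)
    return max(min_ratio, min(max_ratio, ratio))

# A source 10 m away moving toward a stationary listener at 34.3 m/s.
approaching = doppler_ratio((0, 0, 0), (34.3, 0, 0), (10, 0, 0), (0, 0, 0))
receding = doppler_ratio((0, 0, 0), (-34.3, 0, 0), (10, 0, 0), (0, 0, 0))
```

In practice the ratio would also be smoothed over time, since abrupt changes in playback rate are themselves audible.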

Sound Transport Time

In the real world, sound takes time to travel, so there is often a noticeable delay between seeing and hearing something. For example, you would see the muzzle flash from a rifle fired at you from 100 meters away roughly 300 ms before you would hear it. Modeling propagation time incurs some additional complexity and may paradoxically make things seem less realistic, as we are conditioned by popular media to believe that loud, distant actions are immediately audible.

The Oculus Audio SDK supports time-of-arrival.
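
The arithmetic behind the rifle example is simply distance divided by the speed of sound. The sketch below assumes 343 m/s (air at about 20 °C) and a 48 kHz sample rate; the function names are made up for this illustration and do not correspond to SDK calls.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C

def arrival_delay_ms(distance_m, c=SPEED_OF_SOUND):
    """Time in milliseconds for sound to cover distance_m."""
    return 1000.0 * distance_m / c

def delay_in_samples(distance_m, sample_rate=48000, c=SPEED_OF_SOUND):
    """The same delay expressed as a whole number of audio samples."""
    return round(distance_m / c * sample_rate)

rifle_delay = arrival_delay_ms(100.0)    # a bit under 300 ms
rifle_samples = delay_in_samples(100.0)  # length of the delay line needed
```

Because the delay scales linearly with distance, a mix that wants the "cinematic" instant response can scale or cap this value rather than disabling it outright.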

Non-Spatialized Audio

Not all sounds need to be spatialized. Plenty of sounds are static or head-relative, such as:

User interface elements, such as button clicks, bleeps, transitions, and other cues
Background music
Narration
Body sounds, such as breathing or heartbeats

Such sounds should be kept separate during authoring, as they will probably be stereo, and during mixing, so that they are not inadvertently pushed through the 3D positional audio pipeline.

Performance

Spatialization incurs a performance hit for each additional sound that must be placed in the 3D sound field. This cost varies depending on the platform. For example, on a high-end PC it may be reasonable to spatialize 50+ sounds, while on a mobile device you may only be able to spatialize one or two.

Some sounds may not benefit from spatialization even if placed in 3D in the world. For example, very low rumbles or drones offer poor directionality and could be played as standard stereo sounds with some panning and attenuation.
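
One way to respect a per-platform budget is to rank the active sounds and spatialize only the most prominent ones, falling back to plain panning for the rest. The sketch below is a hypothetical policy, not something the SDK prescribes: the `Voice` fields, the gain-over-distance ranking, and the budget values are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Voice:
    name: str
    distance: float              # metres from the listener
    gain: float                  # authored level, 0..1
    low_frequency: bool = False  # rumbles and drones localize poorly

def assign_spatialization(voices, budget):
    """Return {name: "spatialized" | "panned"} for each voice.
    Low-frequency voices are never spatialized; the rest compete
    for `budget` slots by a rough audibility score."""
    def audibility(v):
        return v.gain / max(v.distance, 1.0)

    candidates = sorted((v for v in voices if not v.low_frequency),
                        key=audibility, reverse=True)
    chosen = {v.name for v in candidates[:budget]}
    return {v.name: "spatialized" if v.name in chosen else "panned"
            for v in voices}

scene = [
    Voice("rumble", 2.0, 1.0, low_frequency=True),
    Voice("footsteps", 1.5, 0.8),
    Voice("bird", 30.0, 0.5),
    Voice("radio", 4.0, 0.9),
]
mobile_plan = assign_spatialization(scene, budget=2)
```

With a budget of two, the nearby footsteps and radio are spatialized while the distant bird and the rumble fall back to panning; raising the budget on a PC build changes only the `budget` argument.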

Ambiance

Aural immersion with traditional non-VR games was often impossible, since many gamers or PC users relied on low-quality desktop speakers, home theaters with poor environmental isolation, or gaming headsets optimized for voice chat.

With headphones, positional tracking, and full visual immersion, it is now more important than ever that sound designers focus on the user’s audio experience.

This means:

Properly spatialized sound sources
Appropriate soundscapes that are neither too dense nor too sparse
Avoidance of user fatigue
Suitable volume levels comfortable for long-term listening
Room and environmental effects

Audible Artifacts

As a 3D sound moves through space, different HRTFs and attenuation functions may become active, potentially introducing discontinuities at audio buffer boundaries. These discontinuities will often manifest as clicks, pops, or ripples. They may be masked to some extent by reducing the speed of traveling sounds and by ensuring that your sounds have broad spectral content.
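
To see where such a discontinuity comes from, consider a gain that changes between two buffers. The sketch below is not a technique taken from the SDK documentation; it shows a common general-purpose mitigation, ramping the gain across the buffer instead of switching it at the boundary. The function name and the tiny four-sample buffer are assumptions for illustration.

```python
def apply_gain_ramp(buffer, prev_gain, new_gain):
    """Scale `buffer` with a gain that moves linearly from prev_gain
    (used by the previous buffer) to new_gain, reaching new_gain
    exactly on the last sample."""
    n = len(buffer)
    if n == 0:
        return []
    step = (new_gain - prev_gain) / n
    return [s * (prev_gain + step * (i + 1)) for i, s in enumerate(buffer)]

# A constant signal whose gain drops from 1.0 to 0.0.
signal = [1.0, 1.0, 1.0, 1.0]
abrupt = [s * 0.0 for s in signal]           # jumps by 1.0 at the boundary
ramped = apply_gain_ramp(signal, 1.0, 0.0)   # falls in four equal steps
```

The abrupt version steps from full level to silence in a single sample, which is heard as a click; the ramped version spreads the same change over the whole buffer. Real buffers are hundreds of samples long, so each step is far smaller than in this toy case.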

Latency

While latency affects all aspects of VR, it is often viewed as a graphical issue. However, audio latency can be disruptive and immersion-breaking as well. Depending on the speed of the host system and the underlying audio layer, the latency from buffer submission to audible output may be as short as 2 ms on high-performance PCs using high-end, low-latency audio interfaces, or, in the worst case, as long as hundreds of milliseconds.

High system latency becomes an issue as the relative speed between an audio source and the listener’s head increases. In a relatively static scene with a slow-moving viewer, audio latency is harder to detect.
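
Two back-of-the-envelope calculations make these points concrete: how much latency the output buffering alone contributes, and how far a sound's apparent direction lags behind when the listener's head is turning. The buffer sizes, sample rates, and head speed below are illustrative assumptions, and the function names are invented for this sketch.

```python
def buffer_latency_ms(frames_per_buffer, sample_rate, num_buffers=1):
    """Latency contributed by queued audio buffers, in milliseconds."""
    return 1000.0 * frames_per_buffer * num_buffers / sample_rate

def angular_error_deg(relative_angular_speed_deg_s, latency_ms):
    """How far the rendered direction lags the true direction."""
    return relative_angular_speed_deg_s * latency_ms / 1000.0

low_latency = buffer_latency_ms(96, 48000)       # one small buffer
slow_stack = buffer_latency_ms(1024, 44100, 4)   # four large buffers
fast_turn = angular_error_deg(200.0, 100.0)      # quick head turn
slow_turn = angular_error_deg(10.0, 100.0)       # nearly static scene
```

At the same 100 ms of latency, a fast head turn leaves the sound tens of degrees out of place while a slow one barely moves it, which is why latency is harder to detect in static scenes.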

Effects

Effects such as filtering, equalization, distortion, flanging, and so on can be an important part of the virtual reality experience. For example, a low-pass filter can emulate the sound of swimming underwater, where high frequencies lose energy much more quickly than in air, or distortion may be used to simulate disorientation.
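
As a rough illustration of the underwater example, the sketch below implements a one-pole low-pass filter in plain Python. It is a minimal teaching example under assumed parameters (a 500 Hz cutoff at 48 kHz), not the filter any particular SDK uses; a production effect would typically use a steeper filter and process audio in native code.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000):
    """Simple one-pole low-pass: each output moves a fraction `alpha`
    of the way toward the current input, smoothing fast changes."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

# A steady (0 Hz) signal passes through; the highest representable
# frequency (alternating +1/-1) is strongly attenuated.
steady = one_pole_lowpass([1.0] * 2000, cutoff_hz=500.0)
buzz = one_pole_lowpass([1.0 if i % 2 == 0 else -1.0 for i in range(2000)],
                        cutoff_hz=500.0)
```

Lowering `cutoff_hz` as the listener goes deeper, or sweeping it when they surface, gives a simple way to tie the effect to gameplay state.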