How to do a listening test

wenli2011

Published 2017-03-29 11:12:00

Article link: https://blog.csdn.net/wenli2011/article/details/68061980

A test session should not last for more than 20-30 min. Longer tests should be divided into two parts with a break.
Use no more than 10 to 15 trials per test session.
Short excerpts of about 10 s are ideal; 20 s is acceptable.
Match the loudness of all items under test, and play them back at the reference level.
It is critical for subjects to be trained to hear the impairments you are trying to test.
Hidden reference: The MUSHRA test (BS.1534) uses the original, unprocessed, full-bandwidth programme material as the reference signal, which is also presented as a hidden reference.
"Anchor" signals: The MUSHRA test (BS.1534) uses two additional "anchor" signals. The standard (low-quality) anchor is a low-pass filtered version of the original signal with a cut-off frequency of 3.5 kHz; the mid-quality anchor has a cut-off frequency of 7 kHz.
- Additional anchors are intended to provide an indication of how the systems under test compare to well-known audio quality levels.
- Anchors should not be used to rescale results between different tests.
- Anchors used to check listener expertise must be known to be detectable to expert listeners but not to inexpert listeners.
- These anchors also serve to check the sensitivity of all other aspects of the experimental situation.
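
As an illustration of how a low-pass anchor could be produced, here is a minimal sketch in pure Python. It assumes the signal is a list of float samples and uses a Hamming-windowed sinc FIR filter; the function names, the tap count, and the filter design are assumptions made for this example, and the filter is not claimed to meet the exact anchor filter characteristics given in BS.1534.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Design a Hamming-windowed sinc low-pass FIR filter (num_taps should be odd)."""
    fc = cutoff_hz / sample_rate_hz          # normalised cut-off in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc), with the x == 0 limit handled explicitly.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]         # normalise to unity gain at DC

def apply_fir(signal, taps):
    """Direct-form convolution; output has the same length as the input.

    The filter delay of (len(taps) - 1) / 2 samples is not compensated.
    """
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * signal[i - k]
        out.append(acc)
    return out

def make_anchor(signal, sample_rate_hz, cutoff_hz):
    """Return a low-pass filtered copy of the signal (hypothetical helper)."""
    return apply_fir(signal, lowpass_fir(cutoff_hz, sample_rate_hz))

# Usage: 3.5 kHz and 7 kHz anchors from a short 48 kHz test tone.
sr = 48000
original = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(480)]
low_anchor = make_anchor(original, sr, 3500)
mid_anchor = make_anchor(original, sr, 7000)
```

For real material a signal-processing library would be much faster; the pure-Python loop is only meant to show the idea.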

Your scores need to be reproducible: if you are asked to retake the test, your scores should be approximately the same.
Aim for about 1 hour per test. Listener fatigue can ruin your results.
Make sure all loudspeakers work properly.
Often, no more than 20 assessors are needed.
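
The earlier advice to match the loudness of all items under test can be approximated with a simple level match. The sketch below assumes each item is a non-silent list of float samples and equalises RMS level only; RMS is a rough stand-in for perceived loudness, and the helper names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_rms(items, target_rms=None):
    """Scale every item to the same RMS level.

    By default the quietest item sets the target, so no item is boosted
    and none can clip as a result of the gain change.
    """
    levels = [rms(item) for item in items]
    if any(level == 0 for level in levels):
        raise ValueError("cannot level-match a silent item")
    if target_rms is None:
        target_rms = min(levels)
    return [[s * target_rms / level for s in item]
            for item, level in zip(items, levels)]

# Usage: two versions of a tone at different levels end up at the same RMS.
loud = [0.8 * math.sin(2 * math.pi * n / 50) for n in range(500)]
quiet = [0.2 * math.sin(2 * math.pi * n / 50) for n in range(500)]
matched = match_rms([loud, quiet])
```

A perceptual loudness measure would be a better basis for matching than RMS when the items differ strongly in spectrum.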

When a listener's results should be discarded
- The results of any listener who fails to identify the hidden reference more than 15% of the time must be discarded (BS.1116)

- The results of any listener who scores the hidden reference below 90 more than 15% of the time must be discarded. The same applies to any listener who scores the 7 kHz anchor above 90 more than 15% of the time (BS.1534, MUSHRA)
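
The MUSHRA screening rule above can be sketched as a small check. This assumes each listener's scores on the 0 to 100 scale are available as two lists, one score per test item; the function and parameter names are hypothetical.

```python
def should_exclude(hidden_ref_scores, mid_anchor_scores,
                   threshold=90, max_fraction=0.15):
    """Return True if a listener's results should be discarded.

    A listener is excluded when the hidden reference is scored below the
    threshold, or the 7 kHz anchor above it, for more than max_fraction
    of the test items.
    """
    if not hidden_ref_scores or not mid_anchor_scores:
        raise ValueError("need at least one score of each kind")
    ref_low = sum(1 for s in hidden_ref_scores if s < threshold)
    anchor_high = sum(1 for s in mid_anchor_scores if s > threshold)
    return (ref_low / len(hidden_ref_scores) > max_fraction
            or anchor_high / len(mid_anchor_scores) > max_fraction)

# Usage: ten test items per listener.
careful = should_exclude([100, 95, 100, 92, 100, 98, 100, 85, 100, 96],
                         [55, 60, 48, 70, 52, 65, 58, 62, 50, 66])
careless = should_exclude([100, 70, 100, 60, 100, 98, 100, 95, 100, 96],
                          [55, 60, 48, 70, 52, 65, 58, 62, 50, 66])
```

Here `careful` rates the hidden reference below 90 on one item out of ten (10%) and is kept, while `careless` does so on two items (20%) and is excluded.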

It must be empirically and statistically shown that any failure to find differences among systems is not due to experimental insensitivity because of poor choices of audio material, or any other weak aspects of the experiment, before a "null" finding can be accepted as valid. It may be necessary to program special trials with low or medium anchors for the explicit purpose of examining subject expertise. (ITU-R BS.1116-3)

Discrimination: A measure of the ability to perceive differences between test items.

Reliability: A measure of the closeness of repeated ratings of the same test item.
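
One simple way to put a number on reliability, assuming each listener rates some items twice, is the mean absolute difference between the first and the repeated rating. This is only an illustrative simplification, not the metric defined in the ITU recommendations, and the function name is hypothetical.

```python
def mean_abs_retest_difference(first_ratings, repeat_ratings):
    """Mean absolute difference between paired first and repeated ratings.

    Smaller values mean the listener's repeated ratings are closer
    together, i.e. the listener is more reliable.
    """
    if not first_ratings or len(first_ratings) != len(repeat_ratings):
        raise ValueError("ratings must be non-empty and paired")
    diffs = [abs(a - b) for a, b in zip(first_ratings, repeat_ratings)]
    return sum(diffs) / len(diffs)

# Usage: three items rated twice by the same listener.
spread = mean_abs_retest_difference([80, 60, 40], [78, 65, 40])
```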

Standard	Name	Link
Recommendation ITU-R BS.1116	Methods for the subjective assessment of small impairments in audio systems	http://www.itu.int/rec/R-REC-BS.1116/en
Recommendation ITU-R BS.1534-3 (10/2015)	Method for the subjective assessment of intermediate quality level of audio systems	http://www.itu.int/rec/R-REC-BS.1534/en