There are numerous critiques of the basic testing methodology we use for TAS, HiFi+, Playback and TPV in the audio realm. While these critiques can be, and have been, critiqued in turn, I think it would be interesting to try some additional testing methods. After all, no test method is perfect. There is a logical problem here: if a test method has low discrimination capability, trying it may seem to reveal something about the equipment or the listener that is in fact a revelation about the test itself. With that caveat in mind, reasonable people should be able to look at any results not as definitive but as interesting input. As one user said, "this is a hobby, it is supposed to be fun." Well, let's have some fun.
So, my question: can knowledgeable, thoughtful people help us craft a practical alternative test?
My understanding is that the core issue of interest is something like "open trial listening observations (what we normally do) may simply reveal reviewer bias, not actual differences between components". A related idea is "many reviews of components tested using observational listening (again, what we normally do) describe differences between components that simply do not exist." So, could we devise a testable hypothesis and method to address this?
For example, we might state the testable hypothesis as "reviewers cannot distinguish between two components A and B at better than chance", to be tested in a blind trial at the 95% confidence level.
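For intuition about what that hypothesis implies, here is a quick Monte Carlo sketch (Python, with hypothetical trial counts) of pure guessing: a reviewer who cannot hear any difference and picks A or B at random will average about 10 correct out of 20, so the test should only reject the hypothesis when a score lands well above that.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def guessing_run(trials=20):
    """One simulated session: a reviewer who guesses at random on
    each trial; returns how many guesses happen to be correct."""
    return sum(random.random() < 0.5 for _ in range(trials))

# Repeat the 20-trial session many times and look at the average score.
runs = [guessing_run() for _ in range(10_000)]
print(sum(runs) / len(runs))  # averages close to 10 correct out of 20
```

The spread around that average is what the confidence level has to account for: occasionally a pure guesser will score 13 or 14 by luck alone.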
We would then run a test of, say, 20 trials with a reviewer. An assistant would have two test components, A and B. While the reviewer is out of the room, the assistant would flip a coin and insert A into the test setup if it comes up heads, or B if it comes up tails. The assistant records the trial number and whether A or B is in circuit, then leaves the room. The reviewer enters the room and listens to the rig, for as long as he or she wishes, to any music. When he or she has identified A or B, he or she records the answer next to the trial number and leaves the room. The assistant re-enters, flips the coin again, and so on. At the end, the number of correct and incorrect identifications is tallied. If 15 or more of the 20 answers are correct, the products under test are considered distinguishable at the 95% confidence level (a pure guesser has only about a 2% chance of scoring 15 or more, while 14 or more just misses the cutoff at about a 5.8% guessing probability). In that case, the hypothesis would be considered incorrect.
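The cutoff can be checked directly from the binomial distribution under a fair-coin null. A minimal sketch (Python standard library only) that computes the chance of k or more correct answers out of 20 by pure guessing, and finds the smallest score whose guessing probability falls below 5%:

```python
from math import comb

def tail_prob(n, k):
    """P(X >= k) for X ~ Binomial(n, 1/2): the probability of k or
    more correct answers out of n trials by pure guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

n = 20
# Smallest score whose guessing probability is below 5% -- the
# one-sided 95% cutoff for a 20-trial test.
cutoff = next(k for k in range(n + 1) if tail_prob(n, k) < 0.05)
print(cutoff, round(tail_prob(n, cutoff), 4))  # → 15 0.0207
```

Note that 14 correct out of 20 corresponds to a guessing probability of about 5.8%, just above the 5% line, which is why the stricter score of 15 is needed before claiming distinguishability at the 95% level.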
I don't know if this is the best hypothesis or the right test method, so I'm asking for suggested improvements. I chose what is basically a single-blind approach because I can't see how to make this double-blind without introducing another piece of equipment that a) we don't have and b) could introduce additional problems into the system. But I'm asking for input because I'm not an experimenter, and science may have easy answers to these questions.
Thanks for any help you can provide.