double-blind listening

robven77@aol.com -- Sat, 07/16/2011 - 16:38

 double-blind testing is a cornerstone of the scientific method, and i have difficulty understanding why it is considered so heinous among audiophiles.  if it is difficult to carry out in practice, and many such efforts are flawed, that does not invalidate the technique or its potential usefulness.  therefore, my present position is that double-blind testing should stand until it is dethroned by a superior method of evaluation, and as i see it, non-double-blind testing is no such method: it seems riddled with intrinsic flaws that are uncorrectable as long as human beings are involved in the evaluation process.
 
even more difficult for me to understand is why the flaws of current non-double-blind testing are not more readily apparent to audiophiles.  i would like this topic to be free from vitriol, and i see a lot of ego and belief-system defenses render this topic so emotional as to be unproductive.  therefore, i will assume that every reviewer on this site is beyond reproach, ethically, and the concerns i have are not directed at them, but rather at the flaws inherent in non-double-blind testing.
 
first off, among non-holy men, money is a powerful corruptor; this need not be elaborated on.  for a review to be meaningful, it must be free from the potential influence of factors such as money, reviewer-manufacturer friendship, advertising considerations, whether the reviewer got a sweet deal on his equipment, etc.  let me give you an example of this that i face.  i am a physician, and it so happens that i carry medications in my office for my patients when 1) i believe them to be superior to products available elsewhere to the patient and 2) when i find that generic products are being sold to my cash-paying patients at a charge many times what i pay for the product myself.  my policy is to simply give the products away when they are low-priced (usu example #2)- say under $10-15, when the patient will be faced with paying $50 or more- or i show the patient an invoice of what i paid for the product and give it to them at my price.  the reason?  i cannot afford to have the patient question my motives for recommending the product i carry-even the appearance of a possible financial influence is unacceptable to me.  i don't want the patient asking himself, "did he advise this for my benefit or because he made a buck?"   the patient paid me to do my absolute best for him, just as i paid the subscription cost to have the reviewer do his best for me, and in neither case should there be any question of additional, non-transparent motives or forces at work.
 
similarly, i do not appreciate that when i read a review, i have to wonder whether the reviewer got a good deal on the sample he tested, whether he was of the belief that tubes are inherently better than transistors or v/v, and this subtly influenced his opinion, whether his buddy is the manufacturer, or whatever.  of course, this must occasionally happen, even if it never has and never will on this site.  the possibility of bias and influence that i am unaware of, and which the reviewer may even be unaware of, casts doubt on the reliability of the process itself.
 
i'm sorry, but i am old enough that i am not inclined to place blind faith in reviewers never falling victim to the many potential pitfalls of a reviewing system that is inherently flawed.  and seriously, can it be argued that with the potential for such bias, the system is not inherently flawed?  to believe in this process requires that i have faith that the reviewers are not just incapable of corruption, but also immune to a myriad of psychological influences that affect pretty much every human being, to some degree.  how many priests, doctors, lawyers, politicians, executives, etc have proven, to our surprise, to be corruptible, that we should expect that ANY group of people is incorruptible?  that is utter naivete.
 
i will wait to see some results from double-blind testing, and i admittedly am no authority on its flaws as applied to audio.  the forums i have read usually just degenerate into name-calling and insult-slinging between the two camps, and there is not much meat, just a lot of hot sauce to that recipe.  do the reviewers even acknowledge the above potential problems?  all i see is offense taken that their integrity is being assailed.  which, in fact, it often is-like i said, the debate is unpleasant and largely unproductive, as far as i have seen.  i have no quarrel with the reviewers, just the system that, as i see it, is intrinsically flawed, and as a reader it leaves me unable to place much faith in the reviews, just as i would expect one of my patients to question whether my recommendation of the $40 medicine he purchased from me was influenced by factors other than what was best for him, as a patient.

robven77@aol.com -- Sat, 07/16/2011 - 16:45

thanks

fkrausz -- Mon, 07/18/2011 - 13:31

If there really were one universally applicable method for doing science, the field would be a lot easier.  Let's consider for a moment double-blind testing to compare two amps.  Just to give ourselves something concrete to think about, suppose that the two amps are identical except that one of them compresses dynamics somewhat -- say, like an average FM radio station does these days. The listeners in the test will easily and repeatably distinguish the two amps when, and only when, they compare the two amps reproducing a musical fragment with a significant dynamic range.  But since the test is (presumably) not prepared by somebody who knows in advance what the differences between the two amps are, such musical fragments will only constitute a small fraction of the test samples. So, quite likely, the test will show no significant differences between the amps -- even though a listener hearing an extended, non-blind comparison would probably perceive the compressed amp as sounding very different from the uncompressed amp.
And you can substitute any difference you care to think of for dynamic compression.  (I recall an old Stereo Review double-blind amp test that failed to distinguish tube from transistor amps, even though the output impedance of tube amps usually causes a significant alteration of the effective frequency response of the amp-speaker system.)  The point is that, unless the difference between components under test is utterly pervasive, the statistics of the double-blind test aren't valid unless the music samples being compared always reveal the difference, which requires knowing what the audible differences are before setting up the test.
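To put rough numbers on that dilution argument, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not something taken from the Stereo Review test or any real trial: a forced-choice test of 16 trials, a listener who is right 95% of the time on fragments that reveal the difference and simply guesses on the rest, and a one-sided binomial test at the 5% level. The function names are made up for this example.

```python
from math import comb


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, alpha=0.05):
    """Smallest number of correct answers that pure guessing
    reaches with probability <= alpha (one-sided test)."""
    for k in range(n + 1):
        if binom_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1  # no score is significant with this few trials


def detection_power(n, revealing_fraction, p_hear=0.95, alpha=0.05):
    """Chance that the test comes out significant when only a fraction
    of the music fragments reveal the difference (assumed model)."""
    # Revealing fragments: listener is right with probability p_hear.
    # All other fragments: listener guesses, right half the time.
    p_correct = revealing_fraction * p_hear + (1 - revealing_fraction) * 0.5
    return binom_tail(n, critical_count(n, alpha), p_correct)


if __name__ == "__main__":
    trials = 16  # assumed test length
    print("correct answers needed:", critical_count(trials))
    for fraction in (1.0, 0.5, 0.2, 0.1):
        power = detection_power(trials, fraction)
        print(f"revealing fraction {fraction:.1f}: power {power:.3f}")
```

Under these assumptions the listener needs 12 of 16 correct to beat chance at the 5% level. The printed power falls steadily as the revealing fraction shrinks, which is the point made above: a null result can reflect the choice of music samples as much as the amps.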
Possibly one could take the snippets of reviews that say things like "when the violins came in during the second part, the Freebisher interconnect exaggerated the rosiny sheen..." -- and then do a double-blind test to see if the reviewer who wrote that could actually consistently distinguish the components under test playing just that passage of music.  But please, who cares?  Do you really expect Car and Driver to do a double-blind test to see if their test driver really gets better road feel on turns from one car or another?
Just enjoy the magazines, try to get some personal experience of the equipment yourself, and read the opinions of the reviewers you find credible.

Mr Plus -- Mon, 07/18/2011 - 14:30

I take (and see) no offense in genuine, frank and open discussions about review methodology. In fact, I think it's the only way we can move on. However, it's worth quoting Sayre's Law at this point: "In any dispute, the intensity of feeling is inversely proportional to the value of issues at stake."  
 
Rather than simply pitch in with knee-jerk gainsaying of what you say, I think it's best to describe potentially 'why' people reject double-blind tests in audio, from the perspective of the kind of person who rejected double-blind testing even before I was involved in the business. 
 
Although it's easy to think there are objectivists and subjectivists at loggerheads with one another, in fact there is a continuum of viewpoints on this topic, from the extreme ("listening tests are redundant because measurement tells us everything anyway") to the objective ("double-blind listening tests alone describe the performance of a device") to the pragmatic ("blind tests are useful, but don't supply the complete picture") to the subjective ("blind tests are flawed") and back to the extreme again ("any form of test is invalid because it undermines the natural listening experience"). Each position is 'well manured', in that the arguments have been nurtured and grown in a rich compost of argument, and that everyone seems to think everyone else's opinion is made of horse manure.
 
Many audiophiles from the pragmatic to the subjective-extreme hold that their ears are a reliable, often final arbiter of sound quality. And the results of ad hoc listening tests at a dealer or at home will demonstrate this to the listener, even though those results might entirely contradict the findings from a double-blind test. Faced with such an evidentiary dichotomy, many will choose the prima facie option. While self-evident positions like this can be pointing in the wrong direction (cholera and 'miasma') or the right one ("...that all men are created equal"), they are immensely durable. This also explains why such things are prone to be strongly defended, because being told your world view is effectively worthless isn't taken lightly.
 

Alan Sircom
Editor, Hi-Fi Plus Magazine
London, England
editor [at] hifiplus [dot] com

robven77@aol.com -- Tue, 07/19/2011 - 19:57

 thank you, fkrausz and mr plus, for your thoughtful answers.  
an observation over the past few days:  i bought the past several years of the bound for sound magazine issues, and felt far more trusting reading reviews and opinions by someone who accepts no advertising.  now, maybe the issue of "perks" is still there, but the mere fact that bound for sound does not take advertising inherently gives that magazine more credibility, in my mind.  

Mr Plus -- Wed, 07/20/2011 - 07:16

There is no intrinsic reason why the integrity of a test should be contingent upon how it is funded. A wealthy, yet corrupt, publisher could potentially afford to print a title with no advertising or integrity. On the other hand, a poor, but principled publishing house could make a magazine of great integrity yet rely on advertising to keep that integrity alive. It seems like circular logic, but editorial integrity ultimately comes down to the integrity of the editor.

The same applies to individual reviewers, and holds irrespective of how they come to acquire the products they use in the execution of their duties as reviewer. Having been active in hi-fi in Britain in the 1980s and seen the destructive actions of those who made a big thing of owning their own equipment and using that to give them liberty to push their own agenda, I would rather a reviewer who borrows everything but retains their personal integrity than one who hides behind a shield of mock-credibility.

Alan Sircom
Editor, Hi-Fi Plus Magazine
London, England
editor [at] hifiplus [dot] com

discman -- Wed, 07/27/2011 - 06:22

I think robven77 has articulated some key points that have been threaded through previous versions of this conversation, but not nearly so clearly. The issue might be phrased something like this:
 
1. There are reasonable people who don't trust the human participant in observational testing (due to failings of human perception, corruption, unconscious bias, etc). While there is data showing that these failings can happen with any methodology, observational testing seems most suspect.
 
2. There are reasonable people who distrust methodologies which decrease the relevance of observations to actual music listening (quantitative measurements are the most extreme version of this: they eliminate the human almost entirely, but at the price of conveying information that is very hard to interpret in terms of musical experience).
 
My understanding is that the beef people have with double-blind testing falls into group 2. It isn't that double-blind testing is "heinous", it is that it is very hard to do and on top of that is much less like actual listening. If you don't have problem #1, double-blind seems inferior given that the reviewer can simply listen to the device under test and then write out his observations. But, note carefully, if you have problem #1, this won't be satisfying for you.
 
There are two other, related, issues. First, double-blind testing is more geared to understanding whether two devices are different, especially gross differences. For example, in pharmaceutical testing, we want to know if a drug does something more than a placebo (i.e. nothing). Note that while we tend to use the word "science" here as if it meant the realm of existence above all others, in reality "science" often doesn't apply well to affairs of aesthetics and humanity which ironically is the realm we care most about. Observational testing is more geared to describing the details of how a device performs musically, with special attention to subtleties. There are differences of purpose in the methodologies that, again, I don't think make one heinous and one angelic any more than a hammer and a screwdriver have those qualities.
 
The related issue might be called "certification". If you want to know whether product A is "better" than product B, I think double-blind testing has the edge because the format is more intrinsically geared to identify winners. Really either approach could work this way, but the inherent A vs. B format is an assist here. OTOH, if you assume that "better" is often an artificial notion (because most products embody a set of engineering tradeoffs that mean A is better at some things and worse at others) then double-blind isn't advantaged and can be misleading. Observational testing keeps the focus in the right place, some might say.
 
I think it helps to judge all this by looking at what you ultimately will use from a review. If it is the description of the sound or the judgement of a human about the musicality of the sound, you're pretty much standing on observational methodology regardless of other approaches which accompany the effort. Now you can do observational stuff well or poorly, and we could ask whether double-blind actually helps with that. But I think, for those willing to look at it, the difficulty of getting away from observation often comes as a surprise. Once you've accepted that issue, you then realize that in a lot of ways the biggest issue is language. It is just plain hard for reviewers to convey sonic experiences to another being using written words.

prepress -- Sun, 08/07/2011 - 12:32

 I suppose that I am somewhere in the middle on this. I, too, have seen discussion on this question deteriorate into name-calling and other such junk. Proponents of DBX often came off as arrogant and condescending; the subjectivists came off as naive and willfully ignorant. As someone who considers audio an interest rather than a hobby, is not an expert of any kind, and has no axe to grind either way, I believe that both sides have something to offer. But in the end the question for me is do I like the equipment under consideration, which is followed closely by, "can I afford it?"
 
With no previous experience or knowledge of these things and no measurements to go by, I bought my first serious CD player back in 1992, a California Audio Labs Tercet Mk. IV. I'd gone to listen to CAL's Icon player, and was pleased with what I heard on the various tracks used from the CDs I'd brought. But the Tercet was there for comparison, and the dealer hooked it up to the same system and repeated the tracks previously heard on the Icon. There was no contest. The difference (improvement) was obvious even to me. I bought the Tercet ($700 more expensive, as I found out AFTER hearing both players). In this case, DBX wouldn't have offered anything new. But perhaps with two components with similar sound but dissimilar prices DBX might be more helpful, especially if, unlike my CD player experience, I know the price or features beforehand; that would be more likely now, as I tend to research potential purchases.

Keladrin -- Tue, 04/24/2012 - 07:42

I think the whole question arises from the issue that audio comparisons are fraught with problems and at the end of the day not straightforward to do properly. Audio is a stream of transient information, and the human memory buffer for it is small (or it would quickly fill up). If we are comparing pictures then we just look at the two side-by-side and write down the differences at our leisure. But even here we don't look at one subjectively then the other. If we are serious about comparing audio quality rather than just writing some subjective journalism then professional blind trials are the only way to go. As said before, the method is certainly open to question, but then the other approach is even more so. The other thing I would like to see is summarisation of proper independent trials/papers in the audio press rather than the typical interview with the company 'techie' who has a vested interest in promoting the technology under development. What would you trust more - a hoover salesperson with some technical knowledge or an independent consumer trial as done by Which (or an independent scientific body)?
Kevin

All content, design, and layout are Copyright © 1999 - 2011 NextScreen. All Rights Reserved.
Reproduction in whole or part in any form or medium without specific written permission is prohibited.