First post! I got here after reading Robert Harley's invitation to participate in a recent issue of TAS.
I hear a lot from objectivists that medical DBTs are the standard of evidence, and that we audiophiles should therefore submit to DBTs as the best way to evaluate equipment. As a physician, it is my job to read and critique journal articles, so I am VERY familiar with the ins and outs of medical DBTs.
While I have no sympathy for the most hardcore objectivists, the very same "partisan hacks" Mr. Harley was ranting about, I believe that DBTs need to be designed and applied intelligently before their results can be interpreted. As most audio DBTs are actually conducted, they are about as unscientific as the subjectivists they set out to debunk.
As Jonathan Valin pointed out in another post, medical DBTs are massive. They involve thousands of patients and strict entry criteria: the disease being studied is tightly defined, you cannot have other medical conditions that might confound interpretation of the data, you must be within a certain age range, and so on. These studies are carefully designed, take months or years to complete, months more to analyze, and then months for peer review and, finally, publication.
Audio DBTs are not. We never know how sophisticated the listening panel is, whether its members know what to listen for, or whether individual variation in hearing, perception, and chronic conditions affecting hearing has been identified and controlled for. We do not know whether the test material (the music) is familiar to the listeners. We do not know whether non-verbal cues (which can be used to lead AND mislead) are present. And finally, the evaluation period is all too brief. We all know that it can take weeks of listening to material we are familiar with, on systems we are familiar with, before we understand the effect of a particular change. How are we expected to identify such changes in a short session, on an unfamiliar system?
This failure to identify potential sources of bias, and the general lack of scientific rigour, exposes many DBT proponents as sham merchants keen to drape scientific window dressing over a testing methodology of limited use.
The next difference is this - in a medical DBT, we know what we are looking for. The trial is designed around pre-specified primary and secondary endpoints. For example, if a drug is purported to reduce the incidence of stroke, the primary endpoint would be the number of new strokes per year in the treatment and control groups. The study design would identify other stroke-reducing drugs being taken in both groups and specify what is permissible and what isn't. We don't just give a new drug to 5,000 patients and a placebo to 5,000 controls, then watch both populations to see what happens.
In an audio DBT, what is the primary endpoint? Does the testing panel even know what it is supposed to be listening for? Is there a score sheet that says "image width was xxx meters" or "frequency response: skewed to bass or treble"? Well, there isn't. You are expected to notice a difference, whatever it may be, and then use that as the basis for comparison.
Another point concerns sample size. In medical DBTs, we do a power calculation before we start recruiting. A power calculation tells us how many subjects we must recruit for the trial to have a realistic chance of detecting the effect we are looking for. For example, a massive radiation dose (say, 10 grays, which is reliably lethal) needs only a small sample to demonstrate harm. But what about much smaller doses, like the radiation you get from a chest X-ray? You need to know how many people to study before you even begin the study.
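To make that concrete, here is a minimal sketch of the kind of calculation I mean, using the standard normal-approximation formula for comparing two proportions. The numbers are purely illustrative (a hypothetical drug cutting annual stroke incidence from 4% on placebo to 3% on treatment), and it assumes Python with scipy installed:

```python
from math import sqrt
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Patients needed in each arm, two-sided test of two proportions."""
    z_a = norm.ppf(1 - alpha / 2)    # 1.96 for a 5% false-positive rate
    z_b = norm.ppf(power)            # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return (numerator / abs(p1 - p2)) ** 2

# Hypothetical stroke trial: 4% annual event rate on placebo vs 3% on drug.
print(round(n_per_arm(0.04, 0.03)))  # roughly 5,300 patients per arm
```

Under those assumptions you need roughly 5,300 patients per arm, and notice how the required sample explodes as the effect being chased gets smaller: n scales roughly with the inverse square of the difference. Keep that in mind for the next point.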
So what sample size is needed to demonstrate a relatively subtle audio tweak, such as the effect of different interconnects? Nearly everyone except the most naive listener can hear differences between loudspeakers, so there you can get away with a small sample. But how many people do you need to test to demonstrate the difference between 0.01% THD and 0.02% THD? Most audio DBTs involve maybe 5-10 listeners. Is this enough to demonstrate the difference?
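Almost certainly not, and we can put a number on it. Here is a sketch of the power of an exact one-sided binomial test for an ABX-style forced-choice session, where a pure guesser is right half the time. The figures are again hypothetical: a listener who genuinely hears the difference 60% of the time (a subtle effect), tested over 16 trials (a typical short session):

```python
from scipy.stats import binom

def abx_power(n_trials, p_true, alpha=0.05):
    # Pass mark: smallest score k where getting >= k correct by pure
    # guessing (p = 0.5) has probability <= alpha.
    k_crit = next(k for k in range(n_trials + 1)
                  if binom.sf(k - 1, n_trials, 0.5) <= alpha)
    # Power: the chance a genuine discriminator actually reaches that mark.
    return k_crit, binom.sf(k_crit - 1, n_trials, p_true)

k, power = abx_power(16, 0.60)
print(f"pass mark: {k}/16, power: {power:.2f}")  # pass mark 12/16, power ~0.17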
Despite the medical DBT being held up as the gold standard, I can tell you that many of them are uninterpretable or poorly generalizable because of failings in study design, sampling, and so on. Many tweak the statistics to make the differences look more impressive than they actually are. Of course, it is possible to tweak the stats in the opposite direction too, for example to minimize the number of adverse outcomes: simply redefine your endpoint, and there you go. I also know enough about academia to know that motives are not always pure and that conflicts of interest of all kinds exist.
I should also say: audio DBTs have a strong bias towards the null hypothesis (that intervention X made no difference). Medical DBTs would show the same bias if they were as poorly designed as audio DBTs. In fact, a number of medical DBTs have failed to reject the null hypothesis even when all of us in clinical practice know, anecdotally, that the intervention makes a difference for our patients. In such cases, I ignore the study and tell people that I know that xxx works. Eventually another DBT may come along that changes the conclusion. It happens all the time!
In the end, you may call me a "super-objectivist". I think audio DBTs have their place, but only well-designed ones. Most of the DBTs I read about are absolutely pathetic.