The following is my editorial from The Absolute Sound Issue 183 (not yet published) on blind listening tests.
The Blind (Mis-) Leading the Blind
Every few years, the results of some blind listening test are announced that purportedly “prove” an absurd conclusion. These tests, ironically, say more about the flaws inherent in blind listening tests than about the phenomena in question.
The latest in this long history is a double-blind test that, the authors conclude, demonstrates that 44.1kHz/16-bit digital audio is indistinguishable from high-resolution digital. Note the word “indistinguishable.” The authors aren’t saying that high-res digital might sound a little different from Red Book CD but is no better. Or that high-res digital is only slightly better and not worth the additional cost. Rather, they reached the startling conclusion that CD-quality audio sounds exactly the same as 96kHz/24-bit PCM and DSD, the encoding scheme used in SACD. That is, under double-blind test conditions, 60 expert listeners over 554 trials couldn’t hear any differences between CD, SACD, and 96/24. The study was published in the September 2007 issue of the Journal of the Audio Engineering Society.
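To see what "indistinguishable" means statistically in a test of this size, here is a minimal sketch. It assumes each trial is a two-alternative choice with a 50% chance of guessing right and that trials are independent; the function names are hypothetical, and the code uses no results from the study itself. It only computes how many correct answers out of 554 would be needed before guessing could be ruled out at the conventional 5% level.

```python
from math import comb


def p_value_by_guessing(correct: int, trials: int) -> float:
    """One-sided probability of getting at least `correct` answers right
    out of `trials` by pure guessing (assumed chance = 1/2 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    # Exact integer arithmetic avoids float underflow for hundreds of trials.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2**trials


def min_correct_for_significance(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose probability under guessing is at or below alpha."""
    total = 2**trials
    tail = 0
    threshold = trials + 1  # sentinel: no achievable score is significant
    # Walk down from a perfect score, accumulating the upper tail.
    for k in range(trials, -1, -1):
        tail += comb(trials, k)
        if tail / total > alpha:
            break
        threshold = k
    return threshold


if __name__ == "__main__":
    n = 554  # trial count cited for the study
    k = min_correct_for_significance(n)
    print(f"{k} of {n} correct needed for p <= 0.05")
```

Under these assumptions, a pooled score below the printed threshold is consistent with guessing.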
I contend that such tests are an indictment of blind listening tests in general because of the patently absurd conclusions to which they lead. A notable example is the blind listening test conducted by Stereo Review that concluded that a pair of Mark Levinson monoblocks, an output-transformerless tubed amplifier, and a $220 Pioneer receiver were all sonically identical. (“Do All Amplifiers Sound the Same?” published in the January, 1987 issue.)
Most such tests, including this new CD vs. high-res comparison, are performed not by disinterested experimenters on a quest for the truth but by partisan hacks on a mission to discredit audiophiles. But blind listening tests lead to the wrong conclusions even when the experimenters’ motives are pure. A good example is the set of listening tests conducted by Swedish Radio (analogous to the BBC) to decide whether one of the low-bit-rate codecs under consideration by the European Broadcasting Union was good enough to replace FM broadcasting in Europe.
Swedish Radio developed an elaborate listening methodology called “double-blind, triple-stimulus, hidden-reference.” A “subject” (listener) would hear three “objects” (musical presentations); presentation A was always the unprocessed signal, and the listener was required to identify whether presentation B or C had been processed through the codec (the other being a hidden copy of the unprocessed reference).
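The trial structure described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not Swedish Radio's software: the listener model, its `detect_prob` parameter, and the function names are hypothetical. On each trial the program, not an operator, hides the processed version in B or C, so nobody in the room knows the answer in advance.

```python
import random

# A is always the unprocessed reference; the codec-processed version is
# hidden in B or C, and the other slot repeats the reference.
SLOTS = ("B", "C")


def listener_answer(processed_slot, detect_prob, rng):
    """Hypothetical listener model: with probability detect_prob the
    artifact is heard and the processed slot is named correctly;
    otherwise the listener guesses between B and C. The true slot is
    passed in only to simulate perception; a real listener never sees it."""
    if rng.random() < detect_prob:
        return processed_slot
    return rng.choice(SLOTS)


def run_session(trials, detect_prob, seed=None):
    """Return the number of trials in which the processed slot was identified.

    The random assignment is made by the program, so neither the listener
    nor the person running the session knows the answer (double-blind).
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        processed_slot = rng.choice(SLOTS)
        if listener_answer(processed_slot, detect_prob, rng) == processed_slot:
            correct += 1
    return correct
```

With `detect_prob=0.0`, a long session should score close to 50%; with `detect_prob=0.5`, close to 75% (half the trials heard, half of the remainder guessed right).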
The test involved 60 “expert” listeners and 20,000 evaluations over a period of two years. Swedish Radio announced in 1991 that it had narrowed the field to two codecs, and that “both codecs have now reached a level of performance where they fulfill the EBU requirements for a distribution codec.” In other words, Swedish Radio said the codecs were good enough to replace analog FM broadcasts in Europe. This decision was based on data gathered during the 20,000 “double-blind, triple-stimulus, hidden-reference” listening trials. (The listening-test methodology and statistical analysis are documented in detail in “Subjective Assessments on Low Bit-Rate Audio Codecs,” by C. Grewin and T. Rydén, published in the proceedings of the 10th International Audio Engineering Society Conference, “Images of Audio.”)
After announcing its decision, Swedish Radio sent a tape of music processed by the selected codec to the late Bart Locanthi, an acknowledged expert in digital audio and chairman of an ad hoc committee formed to independently evaluate low-bit-rate codecs. Using the same non-blind observational-listening techniques that audiophiles routinely use to evaluate sound quality, Locanthi instantly identified an artifact of the codec. After Locanthi informed Swedish Radio of the artifact (an idle tone at 1.5kHz), listeners at Swedish Radio also instantly heard the distortion. (Locanthi’s account of the episode is documented in an audio recording played at a workshop on low-bit-rate codecs at the 91st AES convention.)
How is it possible that a single listener, using non-blind observational listening techniques, was able to discover, in less than ten minutes, a distortion that escaped the scrutiny of 60 expert listeners, 20,000 trials conducted over a two-year period, an elaborate “double-blind, triple-stimulus, hidden-reference” methodology, and sophisticated statistical analysis?
The answer is that blind listening tests fundamentally distort the listening process and are worthless in determining the audibility of a certain phenomenon.
As exemplified by yet another reader letter published in this issue, many people naively assume that blind listening tests are somehow more rigorous and honest than the “single-presentation” observational listening protocols practiced in product reviewing. There’s a common misperception that the undeniable value of blind studies of new drugs, for example, automatically confers utility on blind listening tests.
I’ve thought quite a bit about this subject, and written what I hope is a fairly reasoned and in-depth analysis of why blind listening tests are flawed. This analysis is part of a larger statement on critical listening and the conflict between audio “subjectivists” and “objectivists,” which I presented in a paper to the Audio Engineering Society entitled “The Role of Critical Listening in Evaluating Audio Equipment Quality.” You can read the entire paper here http://www.avguide.com/news/2008/05/28/the-role-of-critical-listening-in-evaluating-audio-equipment-quality/. I invite readers to comment on the paper, and discuss blind listening tests, on a special new Forum on AVguide.com. The Forum, called “Evaluation, Testing, Measurement, and Perception,” will explore how to evaluate products, how to report on that evaluation, and link that evaluation to real experience/value. I look forward to hearing your opinions and ideas.
Robert Harley
1992 - sixteen years ago.
Do you think that cables and testing procedures may have changed a wee bit since then?
I came across an article in a 1954 edition of Audio that says that stereo is just a fad...
Steven Stone
Contributor to The Absolute Sound, EnjoytheMusic.com, Vintage Guitar Magazine, and other fine publications
I attended the cable “test” you cite, held at the 1991 New York Audio Engineering Society convention. I put the word “test” in quotation marks because nothing was being “tested” to make new discoveries. The “test” was purely a sham and a fraud intended to ridicule audiophiles. The “test” was actually part of a “workshop” that featured one presenter after another suggesting that cables made no audible difference. One of the presenters worked for New York’s consumer-fraud agency. He stated that the agency’s official position was that audio cables were a scam, and that any retailer who told a customer that one cable sounded better than another would be subject to prosecution by his office.
To summarize, the listening “test” was conducted in a ballroom that seated several hundred people (it was perhaps 250,000 to 300,000 cubic feet in size). The loudspeakers were woefully inadequate for the space (they were small Thiel models), and were placed atop flimsy folding tables. Most of the listeners sat to the left of the left loudspeaker, or to the right of the right loudspeaker. The panelists’ microphones picked up the signal reproduced by the loudspeakers during the listening “test,” and that signal was then amplified by the ballroom’s PA system. The “test” subjects thus heard the sound from the loudspeakers along with a delayed version of the sound through the PA system. The “test” was a model of incompetence. But listening conditions are of no importance when the outcome is known in advance.
My detailed account of the “test” was published in Stereophile as an article called “Audio McCarthyism” and can be found here http://www.stereophile.com/asweseeit/107/index.html
Here is what made ME sceptical about MY hearing abilities:
In my studio there were two identical preamps/compressors, both connected to microphones in the same room.
So I fiddled with the knobs of one of the preamps while listening through headphones, and I actually heard the difference in sound as I did so... Only after a while did I notice that I had been adjusting the wrong preamp, the one whose output was muted. Believe me, I clearly "heard" changes in compressor settings on a muted device. But not through my ears, of course.
Conclusion: My brain created / invented sound differences that were physically NOT there, just because it "knew" what turning these knobs "should" sound like.
I call that a "producer's knob." Most studios have at least one to keep producers from mucking up perfectly fine mixes.
Here's a site with a long but very interesting article; near the end there's a hypothesis on why A/B/X tests don't work.
It's a large PDF file, but it's safe...
http://855215829737223393-a-1802744773732722657-s-sites.googlegroups.com...
Steven Stone
Contributor to The Absolute Sound, EnjoytheMusic.com, Vintage Guitar Magazine, and other fine publications
Well, you know, if your disinterested wife does not hear any difference immediately, you're not buying it! :]
Curtis
Ironic title for the article.
I have read the article and only the top of the looong discussion.
All I can say is, this article deserves to be published in a magazine for 'audiophile' buyers of high-priced equipment. They will gobble it up. It is custom-tailored for its audience.
It tells a long and convoluted story about the failure of a controlled test to identify an audible variable. The test failed because of an unidentified flaw that the writer makes no attempt to discover. Instead, he prefers to try and convince the (all too gullible and eager to be fooled) readers that the test has no flaw.
On that basis he extrapolates to his 'conclusion' that all tests of that nature are ineffective. Talk about crooked thinking!
No matter what imagined or real flaws there are to controlled testing methodologies, I am here to tell you that the flaws in uncontrolled (sighted) listening tests are, not several times, not dozens of times, but thousands of times larger in magnitude. People who do not grasp this simply do not understand the human mind. The imagination will so overwhelm the content of the nerve impulses travelling from the sense organs to the brain, that all conclusions drawn from such uncontrolled listening have zero value. Zero. I am sorry. Sadly for us all, the conscious mind makes it its business to convince us that the conclusions it draws after the imagination has done its work are not perceptions, but Reality.
By all means, continue to refine the methodology of controlled testing. But wherever there is a disagreement in the results of uncontrolled and controlled testing, the only sensible options are to accept the controlled test, or re-conduct the controlled test. In the case of Swedish Radio, once the result was proven to be flawed, the test needed to be done again, preferably by (at least one) independent organisation, ideally in another country.
"No matter what imagined or real flaws there are to controlled testing methodologies, I am here to tell you that the flaws in uncontrolled (sighted) listening tests are, not several times, not dozens of times, but thousands of times larger in magnitude. People who do not grasp this simply do not understand the human mind. The imagination will so overwhelm the content of the nerve impulses travelling from the sense organs to the brain, that all conclusions drawn from such uncontrolled listening have zero value. Zero. I am sorry. Sadly for us all, the conscious mind makes it its business to convince us that the conclusions it draws after the imagination has done its work are not perceptions, but Reality."
Pedantically speaking, the difference between invalid and even a little bit valid is the same as the comparison between zero and any non-zero number. The difference is an irrational number. Seems fitting, since basing *anything* on inherently grotesquely flawed evidence in the face of well-formed evidence is irrational. In fact, a Single Presentation evaluation fails to conform to the definition of a test, since there is no immediate reliable reference.
I think that it is very telling that Harley's 5/28/2008 editorial is based on a single test that was done almost 20 years ago.
Nobody knows how many DBTs have been done since the early 1990s, but the number is probably in the hundreds of thousands. A great many of them were related to the development of perceptual coders, so they had positive outcomes. Yet so many highly visible DBT critics like Fremer, Atkinson, and Harley keep on about maybe five DBTs that happened way back when.
It's also very telling that Harley descended into name-calling directed toward people he disagrees with.
Ah, the DBT problem. It just won't go away.
As a magazine editor, I'm damned if I do (having worked on a magazine that ran single-blind AB tests, I know that the results of such evaluations are met with incomprehension and disinterest from casual readers and often open hostility from enthusiasts, because single-blind is still 'one blind too many') and I'm damned if I don't (emails from people who delight in telling me just how flawed my methodology is). That being said, I have one thing in my favor - the latter group also always delight in telling me how they would never buy my magazine anyway, so – from a purely commercial perspective – I guess I don't. DBT, that is.
Funnily enough, I don't have a problem with DBTs, but I suspect in most cases they lose their special powers once they leave the confines of the designer's tool box. As soon as a product hits the real world, its evaluation by a reviewer should be an analog of its evaluation by a prospective buyer. In some cases, because the reviewer is a prospective buyer. And that evaluation process - flawed though it may be to some - is a sighted listening test.
Alan Sircom
Editor, Hi-Fi Plus Magazine
London, England
editor [at] hifiplus [dot] com
I'm just editing an interview I conducted with Meridian Audio co-founder Bob Stuart in which he talks about the limitations of blind testing with regard to psychoacoustics. Stuart has some fascinating insights into the questions, and draws the distinction between short-term and long-term listening.
The interview will appear in TAS Issue 194 (August cover date, June 23 mail date).
Mr Stuart is a senior member of a company that makes extremely expensive audio electronics. The vast, vast majority of DBT test results show that his products are overpriced by a factor of ten to a hundred, and so are a direct commercial threat to him.
The industry's best commercial response to the DBT threat is to completely ignore it, and just occasionally toss into the ring some excuse why it doesn't apply to his product.
Anything he has to say on the topic should be examined by journalists with care bordering on cynicism.
I am trying to find your white paper, but your link does not get me there. Is there a more up-to-date link? Thanks.
Rydalc
I've asked the webmaster to fix the link.
When, in my doctoral training, I was learning research methodology, my preceptors gave me a book called "Unobtrusive Measurement". The central tenet of this book was that it was impossible to measure a phenomenon without changing it. Naturally, the more "unobtrusive" the measurement process, the more limited the damage to assessing the actual phenomenon. There is a constant tension in science between two equally valid goals. One is to control the variables in question sufficiently to be able to conclude that it is the variables being assessed that are being measured, rather than extraneous confounds. The other is to assess a real, natural phenomenon, rather than an artificial experience that cannot be generalized outside of a tightly controlled laboratory situation. This affects all science, even the so-called "hard sciences". When it comes to investigating human perception, emotion, and cognition, understanding this tension becomes increasingly important.
It is with this perspective that I evaluate ABX testing as valid proof of the audibility of differences between components. It clearly comes up short. Take the time to consider the difference between ABX testing and how people listen to music at any other time. Clearly, we are talking about very different experiences. Therefore, the generalizability of ABX results is highly limited. The failure of people to detect differences during ABX testing shows that this is not a difference that ABX testing can reveal. No more. No less. Anyone who thinks this type of testing is the last word on the audibility of different audio components is not looking at the entire picture.
BTW JV, if nobody ever asked you how you felt during a drug trial, it obviously wasn't psychiatric medication.
I think the DBT showing that 16/44 is transparent is fascinating. It strikes me as an excellent test since it successfully excludes other factors. For example, if I compare the SACD layer with the CD layer in a player, I do not know whether I am comparing how well the player performs with SACD vs CD, or whether the mastering of the SACD vs CD was better or worse, or if the volume is slightly different. The 16/44 transparency test bypasses such issues.
I understand that many have dismissed it for one reason or another, e.g., that the equipment used was not good enough. But where is the study that uses better equipment and arrives at a different conclusion? If the difference is as obvious as you say, surely that would not be difficult to do?
If there is such a study, please point me to it. If there is not, that lack (some time later) is as significant as the original DBT.
Tim
I'm not aware of any formal DBT comparing standard-resolution digital audio with high-resolution digital audio. The difference between 44.1kHz/16-bit digital audio and 176.4kHz/24-bit is obvious, in my experience. It is, in fact, so obvious that no one (no one that is a disinterested experimenter, that is) has bothered to organize and conduct it.
Your last sentence reminds me of the cartoon by B. Kliban showing a professor at a blackboard full of mathematical equations in front of a classroom in which all the students are fish. The caption is "Proving the Existence of Fish."
For more on the difference between standard- and high-res digital audio, and on blind listening tests, see my interview with J. Robert Stuart of Meridian Audio in the August issue (mails on June 23).
Has anyone posting had their hearing tested?
Before anyone complains about the inability of 16-bit/44.1kHz audio to record and reproduce all the frequencies humans can hear, I suggest you get a real hearing test. If you are over 40 you probably have hearing loss; this is a natural consequence of aging. So forget about hearing frequencies above 20kHz... or even 16kHz if you are lucky.
http://en.wikipedia.org/wiki/Presbycusis
Anyone contesting this needs to get a hearing test, and then maybe they can talk about hearing differences from 18kHz and up!!
The benefits of high-sample-rate digital audio are not conferred by extended bandwidth, but by the fact that the sampling rate determines the time-domain performance. That's because standard-resolution digital audio requires steep anti-aliasing filters in A/D conversion and steep reconstruction filters in D/A conversion. Such steep filters smear transient energy over time. Because the Nyquist frequency at 96kHz sampling is 48kHz rather than 22.05kHz, its filters can have a much gentler slope than those required at 44.1kHz, avoiding this problem.
See "Anti-Alias Filters and System Transient Response at High Sample Rates" and "Controlled Pre-Response Anti-Alias Filters for use at 96kHz and 192kHz" by Dr. Peter Craven, along with "Coding for High-Resolution Audio Systems" by J. Robert Stuart, all available at www.aes.org.
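The time-domain argument can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration under stated assumptions, not a model of any particular converter: it assumes a linear-phase FIR lowpass with a 20kHz passband edge, a stopband beginning exactly at the Nyquist frequency, and 100dB of stopband attenuation, and it uses Kaiser's empirical formula for the required filter order. The function names are hypothetical. Real converters often use other designs (half-band, minimum-phase, or apodizing filters), and the code says nothing about audibility.

```python
import math


def kaiser_order(fs_hz: float, passband_hz: float, atten_db: float) -> int:
    """Kaiser's estimate of the FIR order for a lowpass whose stopband is
    assumed to begin at the Nyquist frequency (fs/2)."""
    transition_hz = fs_hz / 2 - passband_hz
    if transition_hz <= 0:
        raise ValueError("passband edge must lie below Nyquist")
    # Transition width expressed in radians per sample.
    delta_omega = 2 * math.pi * transition_hz / fs_hz
    return math.ceil((atten_db - 8) / (2.285 * delta_omega))


def impulse_length_ms(fs_hz: float, passband_hz: float = 20_000,
                      atten_db: float = 100) -> float:
    """Approximate time span of the filter's impulse response, in ms."""
    return 1000 * kaiser_order(fs_hz, passband_hz, atten_db) / fs_hz


if __name__ == "__main__":
    for fs in (44_100, 96_000):
        print(f"{fs} Hz: order {kaiser_order(fs, 20_000, 100)}, "
              f"~{impulse_length_ms(fs):.2f} ms")
```

Under these assumptions the 44.1kHz filter needs an order of about 138 (roughly 3.1 ms of impulse response), versus about 22 (roughly 0.23 ms) at 96kHz: the narrower the transition band, the longer the filter rings.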
Also see the AES papers on this phenomenon by Mike Story of dCS in the mid-1990s.
It's a long-established phenomenon.
> The difference between 44.1kHz/16-bit digital audio and 176.4kHz/24-bit is obvious,
> in my experience. It is, in fact, so obvious that no one (no one that is a disinterested
> experimenter, that is) has bothered to organize and conduct it.
Well, and with every respect, it strikes me as less obvious than your fish example. There are sane people who are unsurprised by the Meyer/Moran results. I think there are a considerable number of people who would find such a test interesting; if it is as easy as you suggest I am really surprised that nobody has yet conducted it.
Tim
I have read (most of) this discussion with great interest. Let me state at once that I, as a medical doctor, am a firm believer in blinded tests. And in contrast to Jonathan Valin, I believe blinded tests can be carried out in audio. Dear Valin: You really don’t always have to run drug tests on tens of thousands of patients over several years. Have you ever tried to compare the effects of saline and adenosine on conduction time through the AV node? You will have the answer in less than 20 seconds. And you never asked how the patient felt? Have you ever tried to check for side effects of medications? Take beta-blockers. Patients taking part in studies involving beta-blockers are almost always asked at regular intervals whether they experience muscular fatigue, cold feet, headaches, or sleep disorders. Unblinded, the answers are usually very different from the blinded ones. I am very embarrassed by your inappropriate comment.
And dear Dr. Gregor Samsa, you dismiss blinding of tests involving perception and emotion? I am astonished. It is very important for such tests to be blinded. Every doctor knows – except you two guys, apparently – that it is extremely important to keep bias out of those tests. Blinding is the only way. I am embarrassed again.
And you, Robert Harley. You say that the old Stereo Review test on amplifiers gave absurd results. How do you know? Oh yes, because unblinded you hear a big difference. Therefore, blinded audio tests don’t work, or at least the test is inappropriate to the situation. Let us look at that test. It started as an unblinded test, and there was a lot of difference between the amplifiers. Then the test continued blinded, and the differences vanished completely. The testing situation was unchanged except for dropping the curtain (blinding). So (and this is also for my two not-so-dear colleagues), was the test valid before blinding and not after? Oh, maybe the test audience became stressed? Everybody knows that you shouldn’t stress audiophiles, because then they lose their ability to hear otherwise obvious differences.
There are tests addressing that problem. In 1982, Karl Erik Stahl (Radio & Television, no. 11) published a very elegant study. He designed a test object consisting of 6 cheap op-amps (remember the TL074?), 12 m of cheap stereo cable, 10 low-rated, cheap capacitors, and a lot of ordinary resistors. Gain was unity. Measured THD was reasonably good but not outstanding; frequency response was ±0.05 dB. The object was placed between the preamplifier and amplifier of several “golden ears” in Sweden. The tests were set up as A/B tests and took place in the listeners’ homes, with every hi-fi component in its usual place. A switch controlled a random generator that switched between the test object and a straight wire. The test persons decided when to make a switch, but they never knew whether the generator switched the test object in or out. They were allowed to listen for seconds, minutes, hours, or days before making a switch, and to make as many switches as they wanted. Nobody, at any time, was able to identify the test object. (I have a copy of the test if anyone is interested. At least you, Dr. Valin, should read it. It is in Swedish.)
Then we are back to unblinded tests. What do you, Robert Harley, know about dopamine and endorphins? What induces their secretion, and what kind of effects do they produce? Typically, they induce a feeling of well-being and even euphoria. Smoking stimulates dopamine production, making it very difficult to quit. Can the sight of a beautiful woman induce production? Certainly. What about a luxurious car? A shining, big, expensive amplifier? Obviously! Check this link: http://serendip.brynmawr.edu/exchange/node/2048
Dear Robert. A much better explanation of the results of the Stereo Review test is that the test persons had a rush of endorphins during the open phase, making it impossible for them to make sensible judgements. It is very much like having sex. There is no way other than blinding to avoid such confounding factors. Dear (barely) Dr. Valin and Dr. Samsa: please go back to university and repeat the last four years. And Robert Harley – next time you have sex, turn up the little, cheap Pioneer. It will sound like a Mark Levinson.
Am I wrong? How do you know that?
If listeners were unable to hear the effect of inserting the worst op-amps, capacitors, cables, and other components in the signal path, this suggests that there are no quality differences between any components, and that we should all be listening to the cheapest products on the market. It also suggests that everyone in the world who hears a difference between any two components is deluded by your "rush of endorphins."
I propose that it is more reasonable to believe that there's a sonic difference between a bottom-of-the-line Pioneer receiver and a pair of Mark Levinson monoblocks (to cite the specific products in the Stereo Review test) than to believe in the "mass delusion" theory of audio.
Sorry, I forgot to comment on the specific points you (Robert Harley) made:
“If listeners were unable to hear the effect of inserting the worst op-amps, capacitors, cables, and other components in the signal path, this suggests that there are no quality differences between any components, and that we should all be listening to the cheapest products on the market”
Not the cheapest, but the cheapest of the properly built products (my first audio product could only be described as a noise generator - but it was cheap).
“It also suggests that everyone in the world who hears a difference between any two components is deluded by your "rush of endorphins."
No, by no means, no. If you hear a difference in properly conducted tests (which usually means blinded), you can trust there is a difference. This difference was also present before and after the blinded test, but now it is proved to be caused by the test object. Then of course, sometimes the difference is big enough to make blinded tests superfluous (picking shades of red when the reds are surrounded by different colors compared to picking red vs white).
“I propose that it is more reasonable to believe that there's a sonic difference between a bottom-of-the-line Pioneer receiver and a pair of Mark Levinson monoblocks"
Scientifically, this statement is a hypothesis. To know if it is true or not, you have to run valid tests. Any unblinded tests are not valid because, with the statement above, you have already expressed bias toward the Mark Levinson monoblocks. In fact, this hypothesis/test scenario is very typical in medicine. A new medicine is developed. Based on present knowledge of biochemistry, it is assumed to be better than previous medications, and it is subjected to tests. Because people's lives depend on it, the FDA will never approve it if the comparative tests are not completely blinded (double-blinded).
The argument that you're putting forth is that more money necessarily equates to better sound, even if there is no perceptible proof of said better sound. I think this line of thinking is one fundamental reason why audiophilia is lambasted the way it is.
I'll present another expensive hobby of mine for comparison: photography. No one will make fun of a photographer for spending $6000 on a digital SLR and a $2000 lens that does essentially the same thing as a $100 point-and-shoot, because it's easy to show that the $6000 camera does it better, faster, more accurately, etc. You don't see people marketing a camera that costs $100,000 because it uses exotic components that don't make a difference in anything relevant to photography, because no one would buy the damn thing. So why do people buy $100,000 amplifiers? Or $10,000 speaker cables?
Certainly, more research, better design, and higher-quality components all likely contribute to an objectively better and pricier final product, especially if it's less prone to drifts in manufacturing tolerances, is capable of playing louder, causes less interference, and lasts a lifetime. But if the difference is sonically indistinguishable at sane listening levels, it doesn't sound better.
In this extreme case, I have a little trouble believing that there was no difference between the Pioneer and the Levinsons. Even my skeptical ears heard a pronounced difference between my Yamaha RX-V1800 and NAD M3, but at the time I hadn't bothered to level-match them at all, so who knows.
But this would be an interesting and probably pretty easy experiment: package some Belden cable sold by Blue Jeans Cable in an Audioquest Everest wrap and package the Audioquest cable in the nondescript white jacket of the Belden. I am willing to bet that most if not all audiophiles will think the repackaged Blue Jeans Cable sounds better (even though they probably sound the same).
There is a very strong expectation bias, especially if you're shelling out the kind of cash that many audiophiles do. "I spent 3 paychecks on this. It HAS to sound better. Or why would it cost so much?"
The greatest thing that editors at audiophile publications can do for our wonderful hobby is to start debunking some of the BS that exists out there. Until that happens, there will always be more people who would rather discredit audiophiles than those who would want to go out and hear for themselves what good sound really is.
A comparison of photography and audio is apples to oranges.
One is the reception of information; the other is the capture of information.
And BTW, it has been tested: most people cannot tell the difference between a point-and-shoot and a high-end DSLR. A true photographer can take pictures with whatever is handed to them, though they prefer their own equipment; it is an art form, after all.
Ultimately, the issue at hand is a person who cannot listen to a sound unless it comes from a $100,000 sound system. Most people just think that's sad and wasteful!
An audiophile, in car terms, is someone who refuses to drive anything less than a Rolls-Royce.
I would hope that if an audiophile truly loved music, then he would be able to listen to music he loved on whatever equipment was used to reproduce it. Even cheap audio reproduction these days has gotten pretty damn good.
If an audiophile winces at music coming out of a less than a six-figure sound system, then he's missed the whole point of the hobby, which is to enjoy music, not the equipment that is used to play it.
After all, how many musicians do you know who use $1,000 equipment cables or $10,000 guitar amps or $100,000 microphones? Why does an "audiophile" need a $100,000 system to reproduce that music competently?
DBTs don't provide the results we so desperately crave, ergo the test is flawed.
If, on the other hand, a DBT did reveal a difference between components/wires/whatever, audiophiles would trumpet the result as if it were a message directly from the Supreme Being (whoever she is).
Oh well. Life goes on.
Larry
"Digital finishes what the transistor began" James Boyk
I should probably tell you a bit more about Stahl’s test. Karl Erik Stahl is the founder of the Audio Pro Hi-Fi company. He is a very competent audio constructor, but later founded Intertex Data AB. He was recently voted one of the world’s “Top 100 Voices of IP Communications". Especially his subwoofers at the time were very advanced. His test panel included some of the most prominent “golden ears” in Sweden. Some were part of the high end audio industry. And one, a designer of a highly rated preamplifier, in fact was able to pick the test object during the first test rounds. Several modifications were made on the test object to try to indentify why he was able to pick it. It then turned out that there was a 0.08 dB dip in the left channel in the treble range, and 0.06 dB in the right channel. This was corrected, and the test person was not able to identify the test object any more. According to Stahl, it was never reported before that it was possible to detect such tiny linear deviations. This test person probably had exceptional discriminative hearing abilities.
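For scale, a level deviation of Δ dB corresponds to an amplitude ratio of 10^(Δ/20), so the dips Stahl found amount to amplitude changes of under 1% in the affected band:

```latex
10^{-0.08/20} \approx 0.9908 \quad (\text{about } 0.9\%\ \text{lower}),
\qquad
10^{-0.06/20} \approx 0.9931 \quad (\text{about } 0.7\%\ \text{lower})
```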
You prefer not to believe in the “mass delusion” theory of audio. There is no “mass delusion” theory of audio; there is a “mass delusion” theory of just about everything we perceive. You can predict that a car seat will feel more comfortable in a Rolls-Royce than exactly the same seat in a small Honda. You can predict that women will pick a man photographed in front of an expensive Lexus over the same man photographed in front of (yes, you guessed it) a small Honda. I know, because that test has been done. An esoteric, expensive perfume smells better than a cheap one, even when it turns out that both bottles held exactly the same perfume. You perceive a color very differently depending on which colors surround it. And yes indeed, an expensive amplifier in brass and brushed steel weighing more than 50 kg will, if sighted, (almost) always outperform a cheap Pioneer.
Read the Stereo Review test again. Even the sceptics were beginning to hear differences during the open (sighted) part. Nobody is immune to this effect. And, as pointed out earlier in the thread, it is much larger than commonly believed. Or, as Siegfried Linkwitz puts it, “it is very difficult not to hear what you expect to hear.” Dopamine and/or endorphin activity is clearly not the sole reason for this dramatic effect, but they obviously play a part.
So, Robert Harley, you are very naïve to disregard this effect so completely. It doesn’t matter if you are a highly trained listener. As some magicians put it, professors are more easily fooled than ordinary people.
Dr. Skjaerpe:
I think you will enjoy an article I co-authored a few years ago: http://www.dagogo.com/Borden07154.html
Regards,
Larry Borden
Thank you for the link. Very nice article. You dug a bit deeper than I did. Even more could be said about these complex behavioral relationships, and there is probably a lot more to be learned. The real danger comes when one believes one's decisions are based solely on sound reasoning and unbiased perception. One then becomes an easy target for people who know how to manipulate biasing factors.
Here are the views of a reviewer who believes in blind testing:
http://www.goodsound.com/editorial/200905.htm
Blind testing often reduces a group's results to an average. It does not allow for some members of the group being more "skilled" than others, because their results are averaged in with those of any "unskilled" listeners.
This causes a problem, as we can't tell whether some people "passed the test" (for example, picking the "better" amplifier 5 out of 5 times) or whether they were a "fluke": if other people didn't get it right (0 out of 5), the group average drops to 50/50.
If we got a group of runners to run the 100 m dash, and one did it in 9.8 seconds while the others stumbled in at around 15 to 20 seconds, we would not average his result in and assume there can be no faster runners because the group average was much slower; nor would we assume his time was a fluke and that he would normally run 15 seconds or more.
Yet this is what double-blind ABX tests are inescapably designed to do: reduce individual results to a group score.
To take a medical example, if a drug test were carried out on a hundred people to see whether it had bad side effects, and 50 people died but 50 did not, would that prove there were no side effects?
But that's how many audio ABX supporters are interpreting the results of the various tests knocking around.
Instead, it tells us that 50 people reacted differently from the other 50, not WHY.
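The pooling problem described above, and the per-listener alternative, can be sketched in a few lines of Python. The listener scores are hypothetical and were chosen to mirror the 5-out-of-5 and 0-out-of-5 example. `p_at_least` is an illustrative helper that computes a one-sided binomial p-value: the chance of scoring at least that well by pure guessing when each ABX trial is a 50/50 call.

```python
from math import comb


def p_at_least(correct, trials, p_guess=0.5):
    """One-sided binomial p-value: chance of scoring >= correct by guessing alone."""
    return sum(comb(trials, k) * p_guess**k * (1 - p_guess)**(trials - k)
               for k in range(correct, trials + 1))


# Hypothetical results: (listener, correct answers, trials) in a 5-trial ABX run.
results = [("A", 5, 5), ("B", 0, 5)]

# Pooled view: everyone's answers lumped together into one group score.
pooled_correct = sum(c for _, c, _ in results)
pooled_trials = sum(t for _, _, t in results)
print(f"pooled: {pooled_correct}/{pooled_trials} = {pooled_correct / pooled_trials:.0%}")

# Individual view: each listener tested against chance separately.
for name, correct, trials in results:
    print(f"listener {name}: {correct}/{trials}, "
          f"p(guessing) = {p_at_least(correct, trials):.4f}")
```

Pooled, the two listeners score 5/10, which looks exactly like coin-tossing, while listener A on his own had only a 1-in-32 (about 3%) chance of a perfect score by guessing. A two-sided test would also flag B's 0/5, since consistently wrong answers are as unlikely by chance as consistently right ones. The catch runs the other way too: with 20 listeners all guessing, the chance that at least one of them scores 5/5 by luck is 1 - (31/32)^20, roughly 47%. Individual results therefore need enough trials per listener (or a correction for multiple comparisons) before a perfect run can be called more than a fluke.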
That is of course correct if you are doing a group test. But there is no problem in doing individual tests, like the Stahl test I referred to. In that test each listener repeated the test several times, and no single "golden ear" was able to pick out the test object. Each listener sat in his own home, with his familiar high-end audio equipment in its usual place and no distracting group in the same room. Statistics were computed for each individual, and for each person the result was like tossing a coin.
Your medical example needs to be specified a bit more. If it concerns infectious meningitis, where the untreated death rate is close to 100%, then saving 50 would be a good result. Of course there could be side effects, and one person might die of an allergic reaction to the drug; one would still be happy to have the new drug. If the test were about treating migraine, which has no mortality, a 50% death rate would indicate catastrophic side effects.
"for each person the result was like tossing a coin."
Or each person getting it right 50% of the time.
If I took an exam and scored 50%, does that mean I got the questions right 50% of the time, or that it was blind chance and I really didn't have a clue?
The medical test is an example to show that 50% scores can actually mean more than just chance.
The actual percentage is not that important; what matters is the way the score is collected.
For example, if I ticked boxes for multiple-choice answers on the exam and got 50% right, then it is possible it was just chance.
If, on the other hand, I had to answer the questions in full sentences and got 50% right, then it's more likely it was skill.
The same goes for the medical example: if 50% just said they felt ill, that could be chance; if 50% die, then it's more likely that something in the test caused the deaths.
And again, the test conditions are important. You cannot just label a test "double-blind ABX" and say it covers everything; there is often a lot more involved in setting up a test that will give useful results.
In other words, how the percentage is reached, not the figure itself, determines whether it could be down to chance.
People need to understand statistics and probability better.
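The point about how a 50% score was collected comes down to the chance of guessing each answer correctly. Here is a minimal Python sketch, assuming a hypothetical 20-question exam on which 10 answers were correct. The 1-in-100 chance of guessing a free-form answer correctly is an arbitrary assumption for illustration only.

```python
from math import comb


def p_at_least(correct, questions, p_guess=0.5):
    """Chance of getting >= correct answers out of questions by guessing alone."""
    return sum(comb(questions, k) * p_guess**k * (1 - p_guess)**(questions - k)
               for k in range(correct, questions + 1))


# Hypothetical exam: 20 questions, 10 answered correctly (50%).
# p_guess is the chance of hitting the right answer on one question by luck.
formats = (
    ("true/false", 1 / 2),
    ("4-option multiple choice", 1 / 4),
    ("free answer (assumed 1-in-100 lucky guess)", 1 / 100),
)
for label, p_guess in formats:
    print(f"{label}: P(>=10/20 by chance) = {p_at_least(10, 20, p_guess):.2e}")
```

With true/false questions, 10 out of 20 is what guessing produces more often than not. With four-option questions it happens by chance only about 1.4% of the time, and with free-form answers it is essentially impossible by luck. The same 50% figure therefore means very different things depending on how the answers were collected.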
You are mixing up different test situations quite severely. There are several books on basic statistics that will clarify things for you.
No, the point is that other people are mixing up test situations, and I'm demonstrating that results can differ even when, on the surface, the tests sound the same.
There's actually research that has correlated perception of wine quality with the claimed price. Identical wines were perceived as "better" when they were identified as more expensive. The research went further and, using MRI scans, demonstrated changes in brain activity correlated with the claimed wine price. I would suggest that your belief that unbiased listening tests are less valid than biased ones would show the same changes. People are hardwired to believe that more expensive is better.
http://www.pnas.org/content/105/3/1050.full
That whole test fails here:
"In every other trial, subjects were instructed to enter a rating of either flavor pleasantness or taste intensity. Thus, a total of four pleasantness ratings and four taste intensity ratings were sampled for each liquid. We used a six-point rating scale (1 = do not like it at all/not intense at all; 6 = like it very much/very intense) (see Fig. 1 B). The timing of rating trials was identical to nonrating trials, except that, after swallowing the liquid and before rinsing, subjects were given 6 s to enter their ratings."
They were using subjective scores to obtain an objective result.
What the hell is pleasantness or taste intensity? These are open to interpretation by the subject and could easily mean different things to different people.
The point being that although humans can be fooled into thinking something is better, this happens only when very basic subjective criteria are used to measure a quick response, and when the person does not have a great deal of knowledge of the subject.
For a person more experienced with wine, those subjective measures would have been meaningless in determining which wines were "better".
The very fact that people buy expensive wines and other objects and do not rate them above cheaper products shows we are only "fooled" to a limited degree.
I've enjoyed some lovely cheap wines and expensive ones, and also some nasty expensive wines and just as nasty cheap ones. A lot of the time I didn't even see the bottle and was still able to identify the type of wine; "pleasantness and taste intensity" didn't have any bearing on it.
I think what many of these double-blind tests are finding is that the way people describe their experiences is what is flawed, not the test objects (wine, hi-fi, etc.) themselves. People often lump the whole sensory experience together when giving a subjective response to a specific event, and then confuse that whole experience with one outstanding part.
That is why time is often needed in tests: so that we can eliminate the subject's immediate response, which could be confounded by other stimuli.
Reading review after review, I always seem to encounter a passage to the effect of "This sounds great, and in fact comes awfully close in performance to products that are far more expensive," after which the reviewer uses all sorts of pseudo-descriptive, synesthetic words to explain why the more expensive product costs as much more as it does.
Makes me wonder how that review would be written if the reviewer wasn't able to see what he was listening to.
To me, the really important thing is "does this sound like a live performance"? Too much time is wasted on minuscule or nonexistent differences. Even the "best" of whatever a convinced subjective listener prefers does not sound like a live performance. The major reason for that, when good equipment is used, is usually the lack of convincing ambience, or acoustic space.
The Swedish tests you report do appear from your description to be of poor quality. But in general, double-blind tests are never accepted by those who "know" what they are hearing. Indeed, they do know what they are hearing in nonblinded tests, and, as psychologists well know, that is precisely why they are biased by it. But they do not understand that. Liquor lovers will not accept that expert tasters cannot distinguish between vodkas on ice. Religious people will never accept arguments against their beliefs. Lovers of $100-per-foot cable will never accept that it makes no difference. Whole technologies (SACD, DVD-Audio) are developed on the fallacy that 22 kHz is not enough for digital recording. Back in the day, lovers of moving-magnet pickups (or was it moving-coil?) never accepted that they liked their sound because of the high-frequency peak that such pickups routinely used to have. Back in the day, lovers of tube amps never realized that their "warm" sound was due to second-harmonic distortion (which, for a short period, was indeed better than the third-harmonic distortion in the very earliest solid-state amps) and perhaps also to a mild midrange peak. Bob Carver used to be able to deliberately add distortion to his transistor amps to make them sound like specific tube amps. Back in the day, some were convinced that reproduction up into the ultrasonic (above 15 kHz or so, or 10 kHz for audiophiles who mostly are, like me, middle-aged males) was audible, even when the recording equipment of the day was unable to capture any of that on the source material. So long as a religion is harmless, like audio, that's OK. It's best not to try to convince others.
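The claim that tube "warmth" comes from second-harmonic distortion rests on a simple piece of Fourier analysis: a transfer curve that bends the two halves of a waveform differently produces even harmonics, while one that bends both halves symmetrically produces odd harmonics. The Python sketch below illustrates only that general mechanism. The two curves are hypothetical and are not models of any real tube or transistor amplifier, and the sketch says nothing about whether these distortion levels are audible.

```python
import math


def harmonic_amplitudes(transfer, n_samples=1024, max_harmonic=3):
    """Feed exactly one cycle of a unit sine through `transfer` and return the
    amplitudes of harmonics 1..max_harmonic, using one DFT bin per harmonic."""
    samples = [transfer(math.sin(2 * math.pi * i / n_samples)) for i in range(n_samples)]
    amplitudes = []
    for k in range(1, max_harmonic + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n_samples) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n_samples) for i, s in enumerate(samples))
        amplitudes.append(2 * math.hypot(re, im) / n_samples)
    return amplitudes


def asymmetric(x):
    """Hypothetical curve that bends one half-wave more than the other."""
    return x + 0.1 * x**2


def symmetric(x):
    """Hypothetical curve that compresses both half-waves equally."""
    return x - 0.1 * x**3


for name, curve in (("asymmetric", asymmetric), ("symmetric", symmetric)):
    h1, h2, h3 = harmonic_amplitudes(curve)
    print(f"{name}: 2nd = {h2 / h1:.2%} of fundamental, 3rd = {h3 / h1:.2%}")
```

The asymmetric curve yields a 2nd harmonic at 5% of the fundamental and no 3rd; the symmetric curve yields a 3rd harmonic and no 2nd. Sampling exactly one cycle puts each harmonic in its own DFT bin, so no windowing is needed.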
Even if minuscule differences can be detected, it's the significant differences that matter. These include the loss of accuracy in MP3 codecs (which most of the newer generation, sadly, considers the acoustic norm); inner-groove distortion, print-through, and other limitations of LPs; and, as I mentioned at the beginning, the inability to reproduce a convincing acoustic space. Yamaha and others have made good progress on the latter, but the best we have at the moment is usually to ADD an acoustic space on top of the listening room. We are some way from being able to REPLACE the listening room's acoustic (indeed, multichannel "Dolby 7.1" and its ilk are a step back, as they do not have acoustic realism as an objective but are usually intended to thrill teenagers).
Some audiophiles seem to like the artificial sound of "pure" two-channel stereo and eschew multiple speakers. But the old experiments back in the 1930s showed that more than two were better; two became the standard for practical reasons. Perhaps a really pure audiophile should consider going back to just one speaker :) The Beatles in mono are coming out on 9/9/09.
Do learn about Fourier transforms, and about psychology. And about listening rooms.
Do have fun with your hobby. If you enjoy the sound better with a tube amp, an ultrasonic reproducer, a thick cable, a green line around the rim of the CD, or whatever, that's fine with me. I don't want to spike your balloon.
I agree with almost all of what you are saying, except for the comment on the Swedish test. It is probably the most intelligently designed audio test I have seen, taking into account most of the audiophiles' complaints about other setups (not being stressed by a rigid setup, short vs. long listening intervals, using equipment they know except for the test object).
Concerning equipment, I also agree with you that it is the significant differences that matter, which today means loudspeakers. Therefore some balloons should be punctured, like the salesman's balloon when he is trying to convince a customer to spend most of his savings on a shiny, big amplifier instead of investing it where it counts most: the loudspeakers.
Thank you for your comments. You are of course right to add loudspeakers; they are still a major weakness. We've only advanced a little from the 1960s, when Acoustic Research developed speakers with darn-near flat frequency response. Today's separate subwoofer systems often move away from that goal; many have astonishingly large irregularities in frequency response, for which there is no excuse. Speaker development is not encouraged by today's magazine reviewers who "test" loudspeakers by listening to how well they reproduce explosions and crashes in Hollywood movies.
I also agree with your comments re. salesmen, etc. And I was only trying to be kind about the criticism of the Swedish test!
I don't recall any reviewers for The Absolute Sound judging loudspeakers by using "explosions and crashes in Hollywood movies." Our reference standard is the name of the magazine, The Absolute Sound: the sound of acoustic instruments in an acoustic space.
Biases permeate our lives; the only way to disable the (subconscious) application of biases is to perform a double-blind study. There is an incredible body of scientific work, on all aspects of life, demonstrating the power and resilience of biases and participant expectations. To suggest that it's not a valid methodology for reviewing audio kit is utterly absurd.
For more on this subject, see my interview with Meridian's Bob Stuart in the current issue of The Absolute Sound (August cover date).
I'll have a look when it finally reaches Australia, maybe in October. I will be interested to see whether your contribution to the article shows the slightest respect for the input to this blog by people with a good understanding of the value of DBT in the realm of audio.
Mr Stuart is a senior member of a company that makes extremely expensive audio electronics. The vast, vast majority of DBT results show that his products are overpriced by a factor of ten to a hundred, and so DBTs are a direct commercial threat to him. After all, the industry's best commercial response to the DBT threat is to ignore it completely, and just occasionally toss into the ring some excuse why it doesn't apply to their products. Anything Mr Stuart has to say on the topic should be examined by journalists with care bordering on cynicism. Your interview is an ideal opportunity to show your critical view.
However, having said that, I am aware that most audio journalists would see employment by a high-end audio company as a credible career option, so I don't expect even the slightest criticism or challenge from the interviewing journalist.
Dear Robert,
I'm a university professor writing on the MPEG tests. I found your story quite interesting, but I can't seem to track down a source for that Bart Locanthi tape you mention. Where might I find a copy? Or is it written down anywhere else besides your editorial? How did MPEG people respond to Locanthi's findings?
Thanks. I assume you can look up my email if you'd prefer to respond privately.
Yours,
An interested reader.
Jonathan:
The Bart Locanthi tape was played during a workshop on low-bit-rate coding at an Audio Engineering Society convention. It's been a long time, but if forced to guess the time and place, I would say the convention was in Los Angeles between 1993 and 1996. It might be possible to look through the programs of past conventions at www.aes.org and identify the convention from the presence of the workshop. You can then buy recordings of the workshop from the AES. If I recall correctly, Ron Streicher was the workshop chairman. Incidentally, Locanthi formed an ad hoc committee within the AES to evaluate perceptual codecs independently through listening tests. He and others in the Los Angeles audio community were concerned that standards were being set without adequate vetting through critical listening. This was probably prompted in part by the hubris of the developers of MP3, Karlheinz Brandenburg in particular, who used phrases such as "psychoacoustic redundancy" and "informational irrelevance" in his papers and seemed to have complete disregard for the effect these codecs had on the listening experience.