The following is my editorial from The Absolute Sound Issue 183 (not yet published) on blind listening tests.
The Blind (Mis-) Leading the Blind
Every few years, the results of some blind listening test are announced that purportedly “prove” an absurd conclusion. These tests, ironically, say more about the flaws inherent in blind listening tests than about the phenomena in question.
The latest in this long history is a double-blind test that, the authors conclude, demonstrates that 44.1kHz/16-bit digital audio is indistinguishable from high-resolution digital. Note the word “indistinguishable.” The authors aren’t saying that high-res digital might sound a little different from Red Book CD but is no better. Or that high-res digital is only slightly better and not worth the additional cost. Rather, they reached the rather startling conclusion that CD-quality audio sounds exactly the same as 96kHz/24-bit PCM and DSD, the encoding scheme used in SACD. That is, under double-blind test conditions, 60 expert listeners over 554 trials couldn’t hear any differences between CD, SACD, and 96/24. The study was published in the September, 2007 Journal of the Audio Engineering Society.
I contend that such tests are an indictment of blind listening tests in general because of the patently absurd conclusions to which they lead. A notable example is the blind listening test conducted by Stereo Review that concluded that a pair of Mark Levinson monoblocks, an output-transformerless tubed amplifier, and a $220 Pioneer receiver were all sonically identical. (“Do All Amplifiers Sound the Same?” published in the January, 1987 issue.)
Most such tests, including this new CD vs. high-res comparison, are performed not by disinterested experimenters on a quest for the truth but by partisan hacks on a mission to discredit audiophiles. But blind listening tests lead to the wrong conclusions even when the experimenters’ motives are pure. A good example is the listening tests conducted by Swedish Radio (analogous to the BBC) to decide whether one of the low-bit-rate codecs under consideration by the European Broadcast Union was good enough to replace FM broadcasting in Europe.
Swedish Radio developed an elaborate listening methodology called “double-blind, triple-stimulus, hidden-reference.” A “subject” (listener) would hear three “objects” (musical presentations); presentation A was always the unprocessed signal, with the listener required to identify if presentation B or C had been processed through the codec.
The test involved 60 “expert” listeners spanning 20,000 evaluations over a period of two years. Swedish Radio announced in 1991 that it had narrowed the field to two codecs, and that “both codecs have now reached a level of performance where they fulfill the EBU requirements for a distribution codec.” In other words, Swedish Radio said the codec was good enough to replace analog FM broadcasts in Europe. This decision was based on data gathered during the 20,000 “double-blind, triple-stimulus, hidden-reference” listening trials. (The listening-test methodology and statistical analysis are documented in detail in “Subjective Assessments on Low Bit-Rate Audio Codecs,” by C. Grewin and T. Rydén, published in the proceedings of the 10th International Audio Engineering Society Conference, “Images of Audio.”)
After announcing its decision, Swedish Radio sent a tape of music processed by the selected codec to the late Bart Locanthi, an acknowledged expert in digital audio and chairman of an ad hoc committee formed to independently evaluate low-bit-rate codecs. Using the same non-blind observational-listening techniques that audiophiles routinely use to evaluate sound quality, Locanthi instantly identified an artifact of the codec. After Locanthi informed Swedish Radio of the artifact (an idle tone at 1.5kHz), listeners at Swedish Radio also instantly heard the distortion. (Locanthi’s account of the episode is documented in an audio recording played at a workshop on low-bit-rate codecs at the 91st AES convention.)
How is it possible that a single listener, using non-blind observational listening techniques, was able to discover, in less than ten minutes, a distortion that escaped the scrutiny of 60 expert listeners, 20,000 trials conducted over a two-year period, an elaborate “double-blind, triple-stimulus, hidden-reference” methodology, and sophisticated statistical analysis?
The answer is that blind listening tests fundamentally distort the listening process and are worthless in determining the audibility of a certain phenomenon.
As exemplified by yet another reader letter published in this issue, many people naively assume that blind listening tests are somehow more rigorous and honest than the “single-presentation” observational listening protocols practiced in product reviewing. There’s a common misperception that the undeniable value of blind studies of new drugs, for example, automatically confers utility on blind listening tests.
I’ve thought quite a bit about this subject, and written what I hope is a fairly reasoned and in-depth analysis of why blind listening tests are flawed. This analysis is part of a larger statement on critical listening and the conflict between audio “subjectivists” and “objectivists,” which I presented in a paper to the Audio Engineering Society entitled “The Role of Critical Listening in Evaluating Audio Equipment Quality.” You can read the entire paper here http://www.avguide.com/news/2008/05/28/the-role-of-critical-listening-in-evaluating-audio-equipment-quality/. I invite readers to comment on the paper, and discuss blind listening tests, on a special new Forum on AVguide.com. The Forum, called “Evaluation, Testing, Measurement, and Perception,” will explore how to evaluate products, how to report on that evaluation, and link that evaluation to real experience/value. I look forward to hearing your opinions and ideas.
Robert Harley
@mcdo "I still contend that if one component sounds better than another, or even just different, then this difference should be audible even to a listener who isn't told what he is currently listening to. If differences cannot be reliably detected, then for all practical purposes they don't exist."
A little late, but it depends on the listener. As an analogy, a painter looks at a subject or a painting and notices subtle variances of colors, whereas I would not, unless I were to make a conscious effort to do so. In audio, some of us are like the artist: we have spent a lifetime listening to the subtlety of what we hear. Most people do not do this.
As an example of this, I was at a night club listening to a very good band and out of curiosity I began listening to the sound reflecting off the wall behind me. With a little concentration, it was very easy to focus on the reflected sound which was clear and distinct from the primary sound. I asked my date to listen to this, but she couldn't hear it. I rather doubt that anyone else was aware of these reflections. I don't have "golden ears" and there is nothing special about me, but I suppose that my "lifetime" of listening made a difference.
The point being that we hear, but we do not listen.
Also, our hearing is like our sight. While we can see a wide field of view, we can only focus on a single object within our view. As an example, people were shown two videos. Both videos were identical except that one had an extra person in it. Almost no one noticed the difference between the two videos even though there was a distinct difference. The same applies to audio. If we are not focused on what changed, we will not notice it. Still, it is possible that subconsciously we are aware of a difference, but are not sure what it is.
Another reason why DBTs are not valid is that most of us know the characteristic sound of our own system, and DBTs are usually given on a different system. As an example of why this is important, consider your car. Many times, when a car makes a "strange" noise, no one can hear it but the owner. Owners become accustomed to the sounds their car makes and are acutely aware of anything out of the ordinary.
The editorial brings up some important issues with blind listening tests, but they are easy to cure. I use blind listening tests all the time and use a technique that pretty much eliminates the flaw Mr. Harley describes. I simply have the listener first listen to the 2 sound sources being compared knowing which is which. Give them as much information as they want or need to help them hear any differences that should be present, things like manufacturers' claims, expert testimonials, technical details etc. The listener can take as long as they want to identify for themselves what differences they believe they are hearing and could identify again. They are welcome to take 5 minutes or 5 years; it doesn't matter. Once the listener is confident they can identify an audible difference, then have them listen blind and see if they can correctly identify which is which 100% of the time. If they can't, then the difference they were hearing before was the result of an expectation bias and NOT a real perceivable difference.
Throughout the research and testing I did for the AES presentation I gave at the 129th AES convention on comparative listening, one thing came up over and over again. There can be a huge disparity between blind listening and non-blind listening. I was typically working with experienced audio engineers that make their living making critical listening decisions all day. We were comparing different ADDA converters, sample rates, external clocks, analog circuitry etc. It happened again and again. Listeners would claim to hear all kinds of dramatic differences while doing non-blind listening and then NOT be able to identify which was which once they were listening blind. Expectation bias is profoundly influential to sensory perception. If you really want to know what you can REALLY hear or not hear, properly conducted blind listening tests are the only way to even have a chance of finding out.
The problem is that there is not much incentive for people to go through or accept a process that proves they cannot hear all the things they thought they could. I encountered a stubbornness in some of the test takers similar to what I feel is present in Mr. Harley's editorial. After the blind listening results were presented they would start making excuses and try to figure out ways to discredit the test. Something wrong or unfamiliar with the monitoring, something inadequate in the signal path masking the difference, blah blah etc. Of course, none of those things were issues when they were doing the non-blind listening and claiming to hear all kinds of differences. I get it. I've been making records for 25 years and have a high expectation of my own critical listening skills. I was initially very uncomfortable when I first discovered this disparity between non-blind and blind listening. I was being forced to embrace the reality that I couldn't hear all of the things I thought I could, and I didn't like it. Ultimately, I am much happier functioning more in sync with the truth about the threshold of perceivable differences with my hearing.
It feels good to live in a world where you can hear all kinds of extraordinary subtle differences in audio, but it's better to live in a world where you know the truth.
EV
One does not have to identify a difference 100% of the time to prove there's a difference. One only needs to do so more frequently than one would if one was randomly guessing. To achieve statistical significance when the endpoint is clear (i.e., death in a medical study) is easier than when the endpoint is determined by a test panel that is not 100% sensitive. In other words, if a listening panel is asked to identify audible distortion, it makes all the difference how accurately they can identify them when they actually exist (i.e., sensitivity and specificity of the listening panel). Unfortunately, the Swedish Radio study demonstrates that such panels are not very sensitive (0% actually in that study). The differences we audiophiles care about are often small. The small differences in sound, however, can result in significant differences in the musical experience. These blinded studies which ask low sensitivity test panels to identify small differences will always be negative not because the differences don't exist but, rather, because the studies are underpowered to detect them. It's basic statistics.
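To make the arithmetic behind "more frequently than random guessing" concrete, here is a minimal sketch in Python. It assumes a forced-choice test in which a pure guess is right half the time. The function names, the 16-trial session and the 5% significance level are illustrative assumptions, not figures taken from any of the studies discussed in this thread. The sketch computes how likely a given score is under pure guessing, and how often a listener with a given true hit rate would clear the bar.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of scoring `correct` or more out of `trials` when each
    answer is right with probability `chance` (0.5 means pure guessing)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def min_correct_for_significance(
    trials: int, alpha: float = 0.05, chance: float = 0.5
) -> int:
    """Smallest score that guessing reaches with probability <= alpha.
    Returns trials + 1 if no score in the session can be significant."""
    for k in range(trials + 1):
        if p_value_at_least(k, trials, chance) <= alpha:
            return k
    return trials + 1


def power(
    trials: int, true_rate: float, alpha: float = 0.05, chance: float = 0.5
) -> float:
    """Probability that a listener who answers correctly `true_rate` of the
    time reaches the significance threshold in a session of `trials`."""
    threshold = min_correct_for_significance(trials, alpha, chance)
    if threshold > trials:
        return 0.0
    return p_value_at_least(threshold, trials, true_rate)


if __name__ == "__main__":
    n = 16  # hypothetical session length
    print("score needed:", min_correct_for_significance(n))
    for rate in (0.6, 0.75, 0.9):  # hypothetical true hit rates
        print(f"true hit rate {rate:.2f}: power {power(n, rate):.2f}")
```

Under these assumptions, 12 correct out of 16 is the lowest score that pure guessing produces less than 5% of the time. A listener whose true hit rate is only modestly above chance will usually miss that bar in a short session, which is the sense in which a small test can be underpowered.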
It is true that different people will have different thresholds of perceptibility with their hearing that coincide with various factors such as expertise or hearing loss due to age or abuse. I think the Swedish Radio study can only be characterized as negative or positive based on the criteria they were testing for. If Swedish Radio was testing for the sensitivity of audiophile experts to their codec, then the test most certainly would skew negative. If Swedish Radio was ONLY interested in what the average Swedish radio listener could hear when presented with 2 unidentified sound sources with no instruction of what they were listening for, then the results are accurate. We have no data on the demographics of the test group. There may well have been some audiophiles in that group who did not hear the difference simply because they didn't know what they were listening for. I would also be curious to know if any of the 0% test subjects that could hear a difference when it was being pointed out to them could then identify it blindly after that or not.
It is also true that in most blind test contexts any percentage above random chance indicates a statistical trend. In the realm of audio, the descriptions that often accompany subjective listening experiences are quite dramatic. If a listener is describing a subjective listening experience as having "significant differences in the musical experience", I think it is reasonable to expect that listener to be able to readily identify those "significant" differences 100% of the time. The hyperbole that frequently gets thrown around when evaluating audio gear can be confusing and misleading. I think properly conducted blind listening tests can help reduce the confusion.
ev
What's your take on this...?
When I have tested wires or equipment (and heard a difference) I would ask other people (who heard a difference) to describe what they heard. In all cases, their description of what they heard paralleled mine.
As for expectations, I've tried expensive wires and DIY wires on my friend's system with both of us expecting improvements over the lamp cord he was using. In the end, we both agreed that his lamp cord was the best for his system, which was contrary to expectations. This is only one example of this phenomenon. This doesn't mean that expectations don't influence people in some cases.
Many years ago I suggested a different type of test to replace the DBT. Most people claiming to hear differences do it at home, on their own equipment, and in their own time. As an example, I suggested that if they claim that two wires sound different, someone should disguise the wires to look the same and add a third wire as a comparison. The third wire would be identical to one of the two wires being compared. If they could tell which two wires were the same, then it would support their claim. The advantage of this is that only one factor would be changed, compared to the multitude of changes in a DBT. Their system would be the same, their testing procedure would be the same, and the wires being tested would be the same ones for which they had made the initial claim. In comparison, DBTs change too many conditions (variables), which is hardly scientific. For instance, during a DBT the person taking the test is "under the gun," and this psychological pressure can affect their decision. If you have any doubt that this makes a difference, then compare it to taking a test in college. A person's apprehension during a test in school can greatly affect their performance. I think that statement would be accepted without question.
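The three-wire comparison described above could be randomised and scored roughly as in the following Python sketch. Everything in it is an assumption made for illustration: the labels "1", "2" and "3" for the disguised wires, the names wire_A and wire_B, and the helper functions are all hypothetical. One thing the sketch makes visible is that with three wires, a listener who simply guesses which two match will be right about one time in three, so that is the baseline a real result has to beat.

```python
import random
from itertools import combinations

LABELS = ("1", "2", "3")  # hypothetical labels on the disguised wires


def build_trial(rng: random.Random) -> dict:
    """Map each disguised label to a hidden wire identity, with one of the
    two wires under test duplicated at random."""
    duplicated = rng.choice(["wire_A", "wire_B"])
    other = "wire_B" if duplicated == "wire_A" else "wire_A"
    wires = [duplicated, duplicated, other]
    rng.shuffle(wires)
    return dict(zip(LABELS, wires))


def matching_pair(trial: dict) -> frozenset:
    """Return the two labels that hide the same wire."""
    for x, y in combinations(sorted(trial), 2):
        if trial[x] == trial[y]:
            return frozenset((x, y))
    raise ValueError("trial must contain exactly one duplicated wire")


def is_correct(trial: dict, answer) -> bool:
    """Score a listener's answer, given as the two labels believed to match."""
    return frozenset(answer) == matching_pair(trial)


def guessing_hit_rate(trials: int = 30000, seed: int = 0) -> float:
    """Estimate by simulation how often a pure guesser picks the right pair."""
    rng = random.Random(seed)
    pairs = [frozenset(p) for p in combinations(LABELS, 2)]
    hits = sum(
        rng.choice(pairs) == matching_pair(build_trial(rng))
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    print(f"hit rate from guessing alone: {guessing_hit_rate():.3f}")
```

In practice the person who disguises the wires would keep the mapping hidden until the listener has committed to an answer, and the whole procedure would be repeated enough times for the score to be compared against the one-in-three guessing rate.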
It is difficult to say what the predominant influence was in the subjective experiences you and the other people were describing when evaluating the same wires or equipment. It is equally possible that you and the others were all hearing the same actual audible differences or being similarly influenced by manufacturer's claims printed on the packaging. There is no way to even have a chance of knowing without blind listening.
Based on the neurological studies I have read, psychological influences are always affecting our sensory perception. The question isn't whether or not they are there, but what is the predominant influence. In some cases there could be an actual audible difference that is so close to the threshold of human perceptibility that it gets overwhelmed by psychological influences. There are also times when the audible differences are well within the realm of human perception and no amount of psychological influence can overcome them. So basically one will supersede the other. Here is a good example of how that works. Back in the 70s a research scientist named McGurk discovered a somewhat astonishing thing while studying speech recognition. He discovered that human beings do not only use their hearing to recognize speech, they also use their eyes. Here is the astonishing part. When recognizing speech, the information we are receiving from our eyes will supersede the information we are receiving from our ears. If you have any doubt about the human brain's ability to MASSIVELY manipulate and alter our perception of what we are hearing, watch this video: http://www.youtube.com/watch?v=ypd5txtGdGw I just about fell off my chair the first time I saw this. I now always close my eyes when doing any sort of critical listening.
The test you are recommending is typically referred to as the "ABX" style blind test. They are great and more directly address the issue of whether or not someone is actually hearing a difference. I agree that people will have the best possible chances of hearing very subtle differences when listening in their own very familiar listening environment. I have also witnessed things that would tend to support the idea that the anxiety of being tested can be distracting enough to degrade one's level of perception. I have found for myself, now that I have taken countless blind listening tests, that it no longer feels like a test for me any more. It is just a process I use to clearly identify what I am hearing or not hearing. Blind listening tests for some have a somewhat negative stigma. Sometimes people abuse blind listening tests to try and embarrass people or try to prove that their hearing isn't as good as they say it is. I think it is a mistake to use blind listening tests to try and prove what people CAN'T hear. I am trying to use them to help improve the sensitivity of what I or others CAN hear. Being able to accurately identify where one's level of sensitivity is helps in that process.
ev
Kudos to EV. His posts are the most informative I've read regarding the blind testing brouhaha. Many here may be aware of the blind test that Mike Lavigne conducted in his home with Monster speaker cables vs. his Transparent Opus MMs. The way the test procedure was described, it pretty much mirrored what EV describes...listening to both cables unblinded and getting comfortable with the "obvious" differences, then blinded. Mike is what most folks would consider a hardcore audiophile with extensive listening experience. Bottom line, he could not consistently discern which was which. No doubt many have poked and will poke holes in the blind test process that Mr. Lavigne employed and that's fine. If the blind test supporters, the objectivists, actually win the age old argument it will pile more ballast on the USS High End...a ship that's already taking on water.
So much music...so little time.
Robert Harley: After announcing its decision, Swedish Radio sent a tape of music processed by the selected codec to the late Bart Locanthi, an acknowledged expert in digital audio and chairman of an ad hoc committee formed to independently evaluate low-bit rate codecs. Using the same non-blind observational-listening techniques that audiophiles routinely use to evaluate sound quality, Locanthi instantly identified an artifact of the codec. After Locanthi informed Swedish Radio of the artifact (an idle tone at 1.5kHz), listeners at Swedish Radio also instantly heard the distortion. (Locanthi’s account of the episode is documented in an audio recording played at a workshop on low-bit-rate codecs at the 91st AES convention.)
Ok: I don't know much about audio, but I do know that I would fail any programmer or analyst job interview candidate who couldn't label and explain - without hesitation! - Mr Harley's profound logical error.
It's called cherry picking - finding a single example that "proves" your point and using it as if it is conclusive evidence for a general rule. E.g. a Scandinavian exchange student at school was gay and beat you up when you tried to make fun of him, so everyone in Scandinavia is violent and gay. It's a favourite tool in racist arguments of all kinds (not just against Scandinavians...) and of drug and vitamin companies (you run ten pointlessly small test groups and use the couple that are favourable to you, throwing away the rest). It's probably the most famous logical error there is, and anyone who can't recognize it probably shouldn't be commenting on blind test regimes without doing some more reading on basic scientific methodology....
I was not able to find the author's paper, but let me state that in all the books, papers, and posts I have seen,
only two variables are addressed in dbt/abx testing: sight (go blind) and manufacturer.
Unfortunately, medical science is not addressed (my sis is deaf and her ex is an MD), which
includes other variables, such as cochlea fatigue and habituation to stimuli. By not addressing
these and other variables, dbt/abx testing will automatically skew the conclusion towards no sonic difference.
Repeatability will also create the same mistakes and skew the responders' individual responses.
As a design engineer I just cannot accept such inaccurate, non-scientific tests. I wish I could.
Why should cochlea fatigue and habituation to stimuli be different in blinded and nonblinded tests, and how could they skew test results?
XXXX
For two reasons TSK,
1) The conditions are quite different when in store VS performing an audio dbt/abx test. Scientific requirements are
that the conditions must be exactly the same for the results to be binding. This is a basic scientific definition.
2) We are focusing on the accuracy of a dbt/abx test. We all know the weaknesses of sighted listening, but
no one wants to focus on dbt/abx testing problems. Luckily Mr. Harley has. By the way, it is the audio dbt/abxers
who have to prove their test is accurate. Claiming repeated high degree of confidence means nothing.
Check out the "experts'" websites and posts, and one will not find any mention of variables except sight
and not knowing the manufacturer. Other variables, such as cochlea fatigue and/or habituation to stimuli,
skew the responders' responses toward a 50/50 response, in other words just guesses. And one can have a
so-called high level of confidence even though the responders skewed their responses. However, the test is false,
not scientific. Repeating the test with the same or other variables not covered does not make the tests accurate.
It does make for good marketing though.
Well, what about peers you ask?
If the final test does not mention, cover, other variables, then the peers have not addressed the problems.
A key point is that if no one addresses other variables, all the variables, whether not knowingly or with full knowledge,
one will automatically perform the test incorrectly, not scientifically, and arrive at false conclusions, even with a high
level of confidence. Let me state that again a little differently.
If one does not know about other variables, all the variables, one will automatically perform the test incorrectly
and skew the responders' responses toward no sonic difference.
It does not take much to skew or manipulate the outcome, conclusion of a dbt test.
I take the liberty to copy part of one of my earlier posts in this thread:
"In 1982, Karl Erik Stahl (Radio & Television, no. 11) published a very elegant study. He designed a test object consisting of 6 cheap Op amps (remember TL 074?), 12 m cheap stereo cable, 10 low-rated and cheap capacitors, and a lot of ordinary resistors. Amplification was 1. Measured THD was reasonably good, but not outstanding, frequency response +/- 0.05 dB. The object was placed between the preamplifier and amplifier of several “golden ears” in Sweden. The tests were set up as AB-tests and took place in their homes, with every hi-fi component in their usual places. A switch controlled a random generator that switched between the test object and a straight wire. The test persons decided when to make a switch, but they never knew if the generator switched the test object in or out. They were allowed to listen for seconds, minutes, hours or days before they did a switch, and make as many switches as they wanted. Nobody, at any time was able to identify the test object".
The settings in this test were identical to the test objects original setup except for the "black box" which was switched in or out randomly. Can you come closer to a non-biased setup? Of course you can bias a blinded setup, but there are a lot of very cleverly set up tests. All give the same results provided that some basic qualities of the test object are met (see one of my earlier postings). You are talking about variables as if millions existed. That is completely untrue. In fact, it is not too difficult to exclude variables. Check the Stereo Review test again. To avoid cochlea fatigue, use short test periods. On the other hand, do you really think cochlea fatigue is a problem? In that case high end equipment is worth its price only during the first few minutes. In the middle of a symphony, when fatigue hits your ears, you could as well listen to your kitchen radio.
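For readers who want to picture the switching arrangement in Stahl's test as it is described in the quote above, here is a rough Python sketch. It is only a model of the description, not of the actual apparatus: the class name BlindSwitch, the labels "test_object" and "straight_wire", and the assumption that each press picks either path with equal probability are all hypothetical. The point it illustrates is that the listener decides when to press, while the hidden log alone records what was actually in the signal path.

```python
import random

PATHS = ("test_object", "straight_wire")  # hypothetical labels


class BlindSwitch:
    """Listener-operated switch whose outcome is chosen at random and hidden."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._in_circuit = self._rng.choice(PATHS)
        self.log = [self._in_circuit]  # kept by the experimenter, not shown

    def press(self) -> None:
        """Called whenever the listener chooses. Picks a path at random, so
        the listener cannot tell whether anything changed."""
        self._in_circuit = self._rng.choice(PATHS)
        self.log.append(self._in_circuit)


def fraction_correct(log, guesses) -> float:
    """Compare the listener's written guesses with the hidden log."""
    if len(log) != len(guesses):
        raise ValueError("need exactly one guess per logged state")
    return sum(a == g for a, g in zip(log, guesses)) / len(log)


if __name__ == "__main__":
    switch = BlindSwitch(seed=1)
    for _ in range(19):  # the listener presses whenever they like
        switch.press()
    # A listener who hears no difference can only guess.
    guesser = random.Random(2)
    guesses = [guesser.choice(PATHS) for _ in switch.log]
    print(f"guesser scored {fraction_correct(switch.log, guesses):.2f}")
```

Because the listener controls the timing and the number of presses, session length and listening duration are left to the participant, while the randomisation stays out of their hands.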
It is obvious in music, an event over TIME, that listening or experiencing small snippets of music or tones is ineffectual and short-sighted, in every conceivable way. The 1982 test above cites no music whatsoever, and people can only show the mettle of their listening prowess by listening to MUSIC. 'To avoid cochlea fatigue'... that's rich. The very test process cited is to fatigue the ear and brain.
A test object or a tone is NOT music. Music is what the ear and brain of man understand intrinsically. And there are millions of variables in a 20 minute song. To ask us to exclude these variables is to reduce music to simple tones, and therefore no longer music. Asking a human, in any time frame, to test anything else other than music, is pointless and wrong. Without listening to music over an extended time, in a familiar surrounding, with familiar controls against the variable, we have nothing about nothing.
-Glotz
Sorry for the phrase "you don't seem to know what you are talking about". It was not polite. My point was to counter your claim that "no one wants to focus on dbt/abx testing problems". A lot of people have done just that and have tried to perfect the test situation. See my example from Sweden.
Unfortunately, if you had read and understood my previous post, you would not have posted what you did. Let's take a look.
""In 1982, Karl Erik Stahl (Radio & Television, no. 11) published a very elegant study. He designed a test object consisting of 6 cheap Op amps (remember TL 074?), 12 m cheap stereo cable, 10 low-rated and cheap capacitors, and a lot of ordinary resistors. Amplification was 1. Measured THD was reasonably good, but not outstanding, frequency response +/- 0.05 dB. The object was placed between the preamplifier and amplifier of several “golden ears” in Sweden. The tests were set up as AB-tests and took place in their homes, with every hi-fi component in their usual places. A switch controlled a random generator that switched between the test object and a straight wire. The test persons decided when to make a switch, but they never knew if the generator switched the test object in or out. They were allowed to listen for seconds, minutes, hours or days before they did a switch, and make as many switches as they wanted. Nobody, at any time was able to identify the test object".
The settings in this test were identical to the test objects original setup except for the "black box" which was switched in or out randomly. Can you come closer to a non-biased setup? Of course you can bias a blinded setup, but there are a lot of very cleverly set up tests."
---------
Now tell us how many variables, and which ones, the author addressed.
What other conclusion did you expect?
Cheers.
ps. If you felt the phrase was inappropriate, which it is, why did you not just edit it out?
Please, could you explain to me which variables could play a role in, and invalidate Stahl's setup?
Re-read my previous posts and check with your example.
Cheers.
Maybe I am the ignorant one now, but I don't find any variables mentioned in your previous posts that are pertinent to Stahl's experiment. Again, please explain which variables you are talking about.
I see you have not removed your insult, which is easy to perform. :)
Your comment "pertinent" is inappropriate. I again suggest you re-read my previous posts listing a couple
of variables involved in audio dbt/abx testing and see if the author even mentions them,
let alone addresses them.
Cheers.
My offending post is now edited. Concerning variables, do you mean cochlea fatigue? That is not a variable. It is a constant in that it applies to both sighted and blinded tests. No? Why not? Please, educate me.
And why on Earth is using the word "pertinent" inappropriate? But let me rephrase the sentence to: " -- but I don't find any variables mentioned in your previous posts being relevant to Stahl's experiment". Ok?
Yes it does affect both, blinded is also affected. We are interested in blinded studies.
If your quote is accurate to the article:
"The tests were set up as AB-tests and took place in their homes, with every hi-fi component in their usual places. A switch controlled a random generator that switched between the test object and a straight wire. The test persons decided when to make a switch, but they never knew if the generator switched the test object in or out. They were allowed to listen for seconds, minutes, hours or days before they did a switch, and make as many switches as they wanted. Nobody, at any time was able to identify the test object".
If that is an accurate quote from the article, the author mentions nothing of any variables, because he does not know of any. In fact, the participants will also know of none, and automatically fail the test. Where are all the records, both inclusive and individual? What were the SPL levels, time durations, how many ABs per individual, how often? These are just a few. Where is the rigor? I worked in the lab, taught/tutored electrical engineering students, tested military communications etc.
I hate to be harsh, but so far, this test is anything but scientific. It is worth no more, probably less because of automatic skewing, than sighted listening. I sure would like to see the rest of this test. Can you post the entire article on a webpage and link it to us?
Cheers.
Here is a link to the pdf-files of the test: http://dl.dropbox.com/u/10878676/Hur%20l%C3%A5ter%20dom.pdf. I told you it was from Sweden, and they are in Swedish. I should probably tell you a bit more about who designed the test. Karl Erik Stahl is the founder of the Audio Pro Hi-Fi company. He is a very competent audio constructor, but later founded Intertex Data AB. He was recently voted one of the world’s “Top 100 Voices of IP Communications". Especially his subwoofers at the time were very advanced. His test panel included some of the most prominent “golden ears” in Sweden. Some were part of the high end audio industry.
I am astounded by your comments. Since you cite your credentials, I will cite mine. I am a professor in Medicine and have been running scientific studies for many years. I have supervised several doctors to their PhD. I know a lot about sound design of scientific studies. If you don't believe me, I can give you my name and you can look me up in Medline. I have been an audio enthusiast most of my life. When you call Stahl's study anything but scientific, it is obvious that you have a very limited understanding of how such studies are designed. Stahl's study is one of the most intelligent study setups I have come across.
Automatic skewing? What do you mean by that? SPL levels, time durations, how many ABs per individual, how often? The beauty of the study was that these parameters were entirely up to the test participants. You should know that the commonest complaints (on variables) from persons in a blinded test panel are that the listening periods were too short or too long, that switchings were done at the wrong places, that one was not able to adjust SPL to one's individual preferences, that there were too few ABs or too many, that the acoustics were unfamiliar, and that one would prefer the well-known listening room at home. In this case everyone adjusted the test situation to their preferences. The equipment was in its usual place in their living room. Variables and biasing factors were eliminated. Statistics were done for each individual.
What do you know about cochlea fatigue? In reasonably short listening sessions, such fatigue resolves in a matter of minutes during silence. Each time they began another listening session, their ears should have recovered completely. At least at the beginning of each session they should be able to hear differences. Nobody did, and nobody claimed that the tests were flawed. Remember, some of them were "golden ears" used in the industry to check equipment quality. Long exposure to loud sound could leave a longer lasting or permanent damage. I am convinced that the highly skilled test panel in Stahl's test didn't drive their ears into fatigue since their hearing after years of listening seemed to be fully intact. In fact, one person was able to blindly identify a 0.08 dB dip in the left channel in the treble range, and 0.06 dB in the right channel. This guy had exceptional discriminative hearing abilities. After eliminating these dips, he could no longer hear any differences.
It is ok to be harsh, but in your case harshness backfired.
Hi Tsk,
Based on your quotes and comments, I will attempt to answer your comments in order. By the way, I was also affiliated with Zenith, and dealt with the PR guys, so I know the marketing tactics.
1) I could care less who Mr. Stahl is. According to your quotes the author does not address Any variables (except probably sight and not knowing the manufacturer). As such the test is skewed towards no sonic difference because even one variable not addressed will skew the responders and the test results toward no sonic difference and with high confidence. True it was 1982 when he performed his test.
2) And I have been in the field for some 40 years, my sis is deaf and teaches the deaf, and my ex brother-in-law is an MD as well. I was also in discussions at a research hospital. Now that we have that past us, let's take a look at the experiment.
>> "Automatic skewing? What do you mean by that?"
First, let me state that the public is left in the dark, and ignorant of any other variables besides sight and not knowing the manufacturer. Now notice the author let the participant set their preferences.
Not knowing any better, the public will naturally set parameters that skew the audio test towards no sonic difference, as we will see below. And not one passed the test (according to the author?), which is not surprising. It is not surprising that the 0.06 dB guy failed, since test conditions differ from normal listening and not all the variables are addressed; thus it is not scientific. Here are some article quotes, according to you.
"They were allowed to listen for seconds, minutes, hours or days before they did a switch, and make as many switches as they wanted."
And
" Spl levels, time durations, how many ABs per individual, how often? The beauty of the study was that these parameters were entirely up to the test participants."
Good way to skew the test, by letting the unsuspecting and ignorant participants setup the parameters, spl, time durations, how many ABs, how often etc while ignorant of both the variables and how they influence the test.
I am surprised that you kept back that recovery time is related to SPL levels, among other conditions. It can take over a day, but I see you only revealed to the public what supported your position, not all the science. Unfortunately, the public did not know this information when they took his test, did they, but yet they could set the spl level and time intervals, and how often.
>>"You should know that the commonest complaints (on variables) from persons in a blinded test panel are that the listening periods were too short or too long, that switchings were done at the wrong places, that one was not able to adjust SPL to one's individual preferences, that there were too few ABs or too many, that the acoustics were unfamiliar, and that one would prefer the well-known listening room at home. In this case everyone adjusted the test situation to their preferences. The equipment was in its usual place in their living room. Variables and biasing factors were eliminated. Statistics were done for each individual."
Interesting that they can now control what they do not understand. And interesting is the claim that "variables and biasing factors were eliminated", which is totally false. In fact, just the opposite is true, as I have outlined above. If even one variable is not addressed, the results can and will be skewed towards no sonic difference.
What statistics were done for each individual? Being vague is not science Tsk.
As one can see, the test is vague, lacks controls, and does not address even the basic variables. It lacks rigor by letting participants determine the parameters when they are ignorant of virtually all of the variables. One is guaranteed a false result.
In conclusion it is not surprising the resistance to dbt/abx testing as it is used for marketing purposes/manipulating public opinion.
How do we know.
Well, quite a few "scientists"/"experts" have been witnessed (even by the Feds) on different forums, falsifying data and attempting to cover it up multiple times, altering scientific measuring type tests that disagreed with their stance to discredit the measuring tests. In fact some have covered for each other by attacking the other's competitors, thus helping their sales.
These all have the same MO: they push dbt/abx testing as accurate and scientific when the tests are flat out flawed (but good for marketing purposes). I am also disappointed that you only provided the science that benefited your position instead of all the science.
Cheers.
I am sorry, but I don’t have time to translate this rather long document.
From what you write, I dare guess that you didn’t do many scientific studies yourself, that you didn’t write scientific protocols, and above all that you never were a referee on scientific studies. I have done all those things, and I have been opponent in many doctoral thesis discussions where one of the most important tasks is to critically evaluate the methodology. My experience is of course in medicine, not sound reproduction, but the basic rules for a high quality study protocol are very similar.
Stahl’s methodology is of a high standard. He adhered to the guidelines published by Lipshitz and Vanderkooy (High resolution subjective testing using a double blind comparator. Journal of the Audio Engineering Society, no 7/8 1981) except that he used an AB setup, not an ABX setup. He accepted 7 of 7, 16 of 20, or 63 of 100 correct guesses as a positive test (test person able to hear a difference). Confidence level 99%.
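For anyone who wants to check those acceptance thresholds, here is a minimal sketch in Python. It assumes each trial is an independent guess with a 50% chance of being right when no difference is audible (a one-sided binomial test). The function name `p_value_at_least` is my own, for illustration only, and is not taken from Stahl's article.

```python
from math import comb

def p_value_at_least(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """Chance of getting `correct` or more right out of `trials` by guessing alone.

    Assumption: trials are independent and a pure guess is right with
    probability `p_guess` (0.5 for a two-way AB choice).
    """
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# The three acceptance thresholds quoted above.
for correct, trials in [(7, 7), (16, 20), (63, 100)]:
    p = p_value_at_least(correct, trials)
    print(f"{correct}/{trials}: chance probability = {p:.4f}")
```

Under that assumption, each of the three thresholds gives a chance probability below 0.01, which is consistent with the stated 99% confidence level.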
You read my posts like the devil reads the Bible. “Unsuspecting and ignorant participants”? Here is what I wrote “the test panel included some of the most prominent “golden ears” in Sweden. Some were part of the high end audio industry”. Where did you find those unsuspecting and ignorant persons? And of course they did know that they were part of a study. They had to activate the random switch generator.
You accuse me of cheating. You wrote: “I am surprised that you kept back that recovery time is related to SPL levels, among other conditions. It can take over a day, but I see you only revealed to the public what supported your position, not all the science”. Here is what I wrote: “Long exposure to loud sound could leave a longer lasting or permanent damage”. You continue: “Unfortunately, the public did not know this information when they took his test, did they, but yet they could set the spl level and time intervals, and how often”. Do you really believe that professional “golden ears” are that ignorant? Every teenager I know knows that subjecting your ears to loud and prolonged sound (music) will damage your hearing.
You accuse me of being dishonest: “I am also disappointed in that you only provided the science that benefited your position instead of all the science”. I do not know of any scientific studies that prove me wrong (that is, that contradict blinded tests being the de facto scientific standard). Please provide some references.
You say that the way the test was done will clearly skew the test to a no-difference result. You base this mainly on the claim that the test does not consider “the variables”. Could you please, please, list those variables for me. Don’t refer to what you said in earlier posts (for clarity and for me). I suggest that you wait with your next answer until you can do that, and have found the references I asked for. If not, this thread is getting ridiculous and I will use my time on more important tasks.
Ah, interesting:
Stk: "Here is what I wrote: “Long exposure to loud sound could leave a longer lasting or permanent damage”"
Let's see what you actually stated and what I addressed/replied to. Quote.
Stk: "What do you know about cochlea fatigue? In reasonably short listening sessions, such fatigue resolves in a matter of minutes during silence. Each time they began another listening session, their ears should have recovered completely. At least at the beginning of each session they should be able to hear differences."
Anyone see tsk post "Long exposure to loud sound could leave a longer lasting or permanent damage” in the above quote? That is what I addressed with my comments, your lack of full disclosure. Also notice the vague expression "At least at the beginning of each session they should be able to hear differences."
Again, how long is the interval? "should be able"? Certainly not scientific.
Secondly, your claimed experience of testing, refereeing, etc. is obviously not in the audio field, which is not your specialty, as shown by your ignorance of the variables and of the rigor that was not applied. Your refereeing, testing, etc. therefore have no bearing here. You should obviously understand that different fields have different variables to deal with, Tsk. You might also want to read the medical information presented earlier by an MD and/or someone involved in trials, which also does not value the test you presented in this thread.
Were you at this test in 1982? No you say. So you have no idea of the acoustics of the room, and if they played any role in masking musical information, thus desensitizing the test.
You sure have a lot of faith in the test for someone whose specialty is Not in the audio field, who does not know the variables, was not at the testing event for an acoustical check, little controls/rigors exercised per the author's own comments (you quoted) etc etc.
References? Check research hospitals who deal with Otology. (I did over the years.)
As such, I cannot go along with your faith/opinion. The test is certainly not real science by any stretch of the imagination. However, I am sure the marketers and PR people love these tests, as it is money in their pockets. I am also sure you will comment further ................. :)
Cheers.
Oh dear, oh dear, oh dear. You quote part of my text and are very angry with me because not everything I said (about long exposure to loud sounds) was within your quote? I beg you to forgive me. Next time I will try to keep important stuff within your quotes.
I think it is wise to end this discussion now. It has come too close to insanity.
Your response is not surprising, to say the least.
Again generalized, non-scientific statements, sidestepping
the variables, or only partial science.
So you state the SPL used only required a few minutes
recovery, which I clearly and openly addressed in my previous post (along
with not coming forth with all the science) and you claim later in the post,
Stk: "Here is what I wrote: “Long exposure to loud sound could
leave a longer lasting or permanent damage” You continue:
“Unfortunately, the public did not know this information
when they took his test, did they, but yet they could set the
spl level and time intervals, and how often”. Do you really
believe that professional “golden ears” are that ignorant?
Every teenager I know, know that subjecting your ears to
high and prolonged sound (music) will damage you hearing."
Nothing between the extremes, Stk? After all,
the participants were allowed whatever SPL, time exposure etc.
they wished according to your post, therefore across the entire spectrum.
And what and how much medical knowledge do they know concerning
varying, across SPL levels etc????
I also gave you a fair, extra chance to reply to my initial response,
and post all the science, on your own.
Instead, the same extremist comments that favored your position.
(And let's not forget about the other variables you failed to address. See previous posts.)
Imo, the audio dbt test you posted is
not scientific nor acceptable in any sense of the word.
One cannot claim science and then disregard, or pick and choose the science to believe.
See previous posts for medical comments that
would disagree with your assessment of the test as well.
Cheers and all the best Tsk.
Luckily, I was able to find a more readable copy of the pdf-file of the test. However, it is still in Swedish: http://dl.dropbox.com/u/10878676/How%20do%20they%20really%20sound.pdf.
Enjoy
"It's called cherry picking - finding a single example that "proves" your point and using it as if it is conclusive evidence for a general rule. "
First of all, I do measurements, so I am a meter guy. But I want things fair, and I have seen too many on different forums using "science" and audio dbt/abx testing as a cover for marketing (federal investigator has witnessed as well). I am not accusing the author of such, so don't worry.
However, the proper contention is just the opposite of yours. In every single post that I have ever seen, the claim is always that audio dbt/abx "proves" something, or a general claim is stated almost as fact.
As such, it only takes one example to poke a hole in an audio dbt/abx claim. And Mr. Harley's example is good to read. By the way, among the many ethical misconducts I have witnessed, a highly degreed "expert" attempted to alter a test (on another forum) to discredit it, and none of the dozen or so colleagues/objectivists chastised him when he was caught. So I am careful what to believe, obviously from the "pet rockers" but also from the scientists and those claiming to be audio experts.
Cheers.
It is obvious in music, an event over TIME, that listening or experiencing small snippets of music or tones is ineffectual and short-sighted, in every conceivable way. The 1982 test above cites no music whatsoever, and people can only show the mettle of their listening prowess by listening to MUSIC. 'To avoid cochlea fatigue'... that's rich. The very test process cited is to fatigue the ear and brain.
A test object or a tone is NOT music. Music is what the ear and brain of man understand intrinsically. And there are millions of variables in a 20 minute song. To ask us to exclude these variables is to reduce music to simple tones, and therefore no longer music. Asking a human, in any time frame, to test anything else other than music, is pointless and wrong. Without listening to music over an extended time, in a familiar surrounding, with familiar controls against the variable, we have nothing about nothing.
-Glotz
I am glad you read the 1982 test, but I am sorry that you didn't understand much. Of course they listened to music for as long (or short) periods as they wanted, never to test tones. You should be aware that some prefer short intervals and some prefer longer. One important point of the test was to listen in familiar surroundings using familiar controls.
I see that you also are hooked on variables, millions, in fact. If you could list the first 200 000, I would be more than satisfied. You do know what a variable in a comparative study is, don't you?
"I am glad you read the 1982 test, but I am sorry that you didn't understand much. Of course they listened to music for as long (or short) periodes as they wanted, never to test tones. You should be aware that that some prefer short intervals, some prefer longer. One important point of the test was to listen in familiar sorroundings using familiar controls.
I see that you also are hooked on varialbles, millions, infact. If you could list the first 200 000, I would be more than satisfied. Yo do know what a variable in a comparative study is, don't you? "
----
Interesting post TSK; especially since you knew so little about audio testing that you had to request variables/rigorous testing methods yourself in a previous post. Not surprising you could not explain them, which supports why you presented such a flawed audio test to begin with.
So how is it that you claim to be qualified to respond to the above poster in such a condescending manner when you yourself presented such a flawed audio test, lacking in rigor?
Cheers.
This whole debate about blind vs. non-blind listening is absolutely fascinating to me. It has been quite entertaining following the spectacular flame fest between Steve S and TSK :)
The thing that is so interesting about this issue is how emotionally charged it is. One would think that this would be a pretty mundane conversation about the various technical/physical considerations involved, but the debates never seem to maintain that kind of composure. I have witnessed similar emotional intensity in arguments about religion or politics. I believe the reason for this is that, in the same way religion or politics can call into question an individual's foundation of morality, the blind vs. non-blind listening issue can call into question an individual's fundamental understanding of how human beings perceive sound. This is a profoundly significant issue for people who are passionate about audio.
For the folks who are uncomfortable with blind listening tests, it seems there is no amount of information, examples, "evidence", test results etc. that will change the fundamental idea that a person can simply listen to two sound sources and make an accurate personal preferential determination based on what their ears tell them. For the blind listening proponents, the idea of drawing conclusions on the technical performance of audio equipment based on "casual" subjective listening experiences, feels insufficient and unsubstantiated.
Regardless of what side of the argument one is on, the debate is truly fascinating. I enjoyed TSK's summary/translation of Stahl's 1982 test setup (sadly I don't speak Swedish so I can't read the whole thing). There are things about the approach that seem quite clever to me. I also enjoy Steve S's commitment to making sure tests are done in a meticulous disciplined way. I would love to hear from Steve S. how to set up a proper scientific blind listening test that would give accurate results.
Thank you for all of the spirited debate!!
ev
After reading every post in this thread, I can only come away with one overriding conclusion: "tskjaerpe" needs to stop feeding the troll that is "Steve S."
"After reading every post in this thread, I can only come away with one overriding conclusion: "tskjaerpe" needs to stop feeding the troll that is "Steve S.""
If you understood what you read, as you claim, why are you at a loss to present
any scientific evidence to support your/tsk position and undermine the evidence
I have presented in previous posts?
Being unable to scientifically respond and instead calling someone a "troll "
leaves the question as to your motives.
Cheers.
Hello all. I joined just to respond to this thread. People certainly are very "emphatic" about their views! Without knowing that there were so many ongoing "discussions" (arguments) about the results and validity of AB/ABX/scientific tests etc., a friend and I recently performed our own crude "can you tell the difference" tests. As a point of reference, I have a 20-year-old Adcom GFA 500 / Dahlquist DQ 20 / 15" Velodyne sub system. He has a Bryston amp / Mac preamp / Magneplanar system with a sub. We have gone to each other's houses with parts and cables and done "close your eyes" tests on various components. Here are some of the results we came up with.
First off, we started this because my assertion was that you can't stop and take five minutes to swap cables or components and then say which sounds better. It has to be instantly switchable between the two. Here's what started our journey: he said that the difference between the new high-end interconnect cables he bought and his old Monster cables was amazing. So, here is what we did. We each did about 10 A/B trials on each of the following tests.
1. Mono setting. One channel with new high-end RCA interconnect cables, the other with 20-year-old Radio Shack cheapies (regardless of the direction arrows).
2. Cheap Panasonic 5-disc changer vs. Rotel SA.
3. SA CD vs. iPad 128kbps.
4. 20-year-old 16-gauge Radio Shack speaker cables vs. high-end $/foot, thick-as-my-little-finger cables.
The results were always 50% +/- 1%. Our results seem to be the same as just about everyone else's. Yes, my DQ 20's sound different from his MP's, but not necessarily better or worse. Personal preference. BUT, the thing that struck me the most was one test that I did. During one round of testing I didn't make any changes when A/B'ing the components and the results were STILL 50%! He thought he could hear a difference! In other words, the human brain is an amazing organ!
No, we are not claiming to be experts on any level, but we know a good sounding system when we hear it. This page seemed to have the largest compilation: http://www.head-fi.org/t/486598/testing-audiophile-claims-and-myths . And if you really want to drive yourself crazy, try this test: http://mp3ornot.com/ . I guess what I take from this is the following: assuming you are like me and cannot afford a $100K price-is-no-object system, there seems to be a point of diminishing returns. You could go to Best Buy and spend $150 on a system for your teenager to play CDs in her room. But once you start spending 4 figures on a system, it's hard to justify spending 5 or 6 figures unless you have a room designed for it. The differences are very, very subtle. One quote that I saw regarding interconnects summed it up for me and seems applicable to the entire subject, which I guess explains why I use the Rotel CD player vs. the Panasonic. It just looks cooler.
"This is not to say that some people will not derive great enjoyment from the fact that they have spent as much on their cables as mere mortals can afford for their whole system, but this is "enjoyment", and has nothing to do with sound quality. This is about prestige and status, neither of which affect the sound."
Any thoughts?
Sorry about the lines of code. Don't know why it does that. Today (before the NCAA tournament and a lot of beer) we test the Rotel against a 10 year old Sony Discman!
$2000 Rotel CD player vs. 15 year old Sony Discman.... A/B'd through a new Mac preamp and speakers and Sennheiser headphones. No discernible difference. Does that mean that the DAC makes no difference and there has been no change in technology in 15 years? I'm certainly not going to replace the Rotel with the DM in my rack but DAMN!! HELP!!
ramaroodle,
""This is not to say that some people will not derive great enjoyment from the fact that they have spent as much on their cables as mere mortals can afford for their whole system, but this is "enjoyment", and has nothing to do with sound quality. This is about prestige and status, neither of which affect the sound."
Any thoughts?"
Based on the neurological studies I have read, people who have an unwavering belief that a $20,000 Odin Supreme Reference Power Cable will make a dramatic audible difference WILL hear a dramatic difference. In my opinion, it is more likely that those dramatic differences are a result of psychological expectations than of actual perceivable, audible differences emanating from the speakers/headphones. That doesn't make the listening experience any less "real" or dramatic for the listener. Sensory experiences derived from psychological influences are every bit as "real" to the listener as actual perceivable differences, and that may very well be worth the $20,000 to some.
"$2000 Rotel CD player vs. 15 year old Sony Discman.... a/b'd through a new Mac preamp and speakers and Sennhauser headphones. No discernible difference. Does that mean that the dac makes no difference.and there has been no change in technology in 15 years? I'm certainly not going to replace the Rotel with the DM in my rack but DAMN!! HELP!!"
I had a similar experience when comparing DACs. In my case it was a $8500 DAC vs. $300 DAC. When we did the first round of listening tests, nobody could correctly identify the different DACs. We decided to switch to different source material for the listening test. Instead of using a final mastered 44.1K/16bit audio file from a favorite CD, we used a 96K/24bit unmastered audio source and suddenly everybody could correctly identify the two DACs. Lesson learned!! Source material can make a huge difference when doing blind listening tests. In this case, all involved agreed that the limiting that is so common in modern CD mastering was masking the audible differences in the DACs we were testing.
In general, it sounds like I had a very similar experience to yours when I first started experimenting with blind listening tests. I had this "wait a minute… You mean all this expensive audio gear doesn't make any real difference at all??!!" type moment. It was confusing and disconcerting at first. After spending more time refining the blind listening test process, I found that I could dramatically improve my overall sensitivity when testing blind and was able to correctly identify some pretty subtle stuff. In addition to the source material, these are the other things that have made a huge difference in increasing sensitivity for me.
Don't just start listening blind. We found it incredibly difficult to try and correctly identify sound sources if not given an opportunity to listen unblind first. We always start by listening to the 2 sound sources knowing which is which so we can try to clearly establish what differences we are hearing and then try to pick them out blind.
Use short passages of music for doing the comparison. We found that one could greatly increase perceptibility if we looped a short section of music and were always comparing that same passage when switching from one source to another. In addition, different sections of a sound source may sound more dynamic or have a wider stereo image because the arrangement has changed and that can be misconstrued as a difference in the equipment being compared.
Our experience is consistent with your thought that being able to switch instantaneously between sources also helps improve one's ability to correctly identify very subtle differences in sound sources. It seems, human sonic memory is somewhat fleeting.
Level matching is probably the most common mistake I see people make when doing blind listening tests. If the two sound sources are not matched to 0.1 dB or better, the test is pretty much useless. Even the slightest disparity in volume will create the illusion that one sound source is more "present" or "clearer" than the other.
The basic idea is to isolate the one variable you are testing for and try as best as you can to make sure all the other variables are consistent/identical as possible. ergo: same source material, same level, same musical passage, identical chain (except for the part being tested) etc. etc.
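To put the 0.1 dB level-matching figure in perspective, here is a minimal Python sketch that converts between decibels and voltage ratios. It assumes you can measure the RMS voltage of a test tone at the speaker or headphone terminals for each source; the function names and the example voltages are hypothetical, chosen only for illustration.

```python
import math

def db_to_voltage_ratio(db: float) -> float:
    """Voltage (amplitude) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)

def level_difference_db(volts_a: float, volts_b: float) -> float:
    """Level difference in dB between two measured RMS voltages."""
    return 20 * math.log10(volts_a / volts_b)

def levels_matched(volts_a: float, volts_b: float, tolerance_db: float = 0.1) -> bool:
    """True if the two sources are within `tolerance_db` of each other."""
    return abs(level_difference_db(volts_a, volts_b)) <= tolerance_db

# 0.1 dB corresponds to a voltage difference of a bit over 1%.
print(f"0.1 dB = voltage ratio {db_to_voltage_ratio(0.1):.4f}")

# Hypothetical readings from two sources playing the same test tone.
print(levels_matched(2.000, 2.020))  # about 0.09 dB apart
print(levels_matched(2.000, 2.040))  # about 0.17 dB apart
```

In other words, matching to 0.1 dB means getting the two voltages within roughly 1.2% of each other, which is hard to do by ear and is why a meter helps.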
That is where I am at with it thus far. I am always looking for opportunities to refine the process. Let me know if there are any other techniques that have been helpful for you!
EV
Thanks for the reply. I guess my issue is that all I want is a good listening experience, not necessarily to be able to "pick out the fly shit in the pepper" as the saying goes. Yesterday, after doing the Discman test, we tried another "blind" experiment. I hooked up one of my old Dahlquist DQ 6's to one channel and a DQ 20 to the other and literally blindfolded my friend. At first he thought the DQ 6 was the more expensive DQ 20, but you are correct that the volume level makes a huge difference. Once we got them leveled it was obvious that the 20 had a much more listenable sound and was the clear winner. (thank god).
The source material issue is something that we haven't considered. Most of my listening is to CDs and other digital formats such as WAV, MPEG, AAC and MP3, and rarely to my old vinyl collection. So I think I don't need to invest money to be able to hear a very subtle difference when I listen to a 96/24 recording maybe once or twice a year. Plus, mastering is not necessarily a bad word.
What I have taken from all of this is that even after 20 years I love my system and that the end of the pipeline is the loudspeaker and is the biggest difference maker. Yes, it is a little disconcerting to know that if you are playing a good quality cd someone could spend $30k on amps and preamps $5000 worth of cables, $4000 of cd players etc only to need a PhD to tell the difference from $1000 worth of gear played through the same speakers. I say that because for that much, on eBay I could buy my Adcom amp and preamp and a set of dq20's and a Sony discman and 16 gauge radio shack cables. The only thing I'd never give up is my 15" velodyne sub. It is " my precious".
PS. In our discman test we level matched both units and played 2 of the same cd cued to the same place.
My thoughts on the original question:
1. There is no such thing as an unflawed test; some are simply more flawed than others. Any test is only good at doing what it is designed to do. A personal review (of the kind commonly published in magazines) is not really a test at all, as it is steeped in potential flaws.
2. If blind tests are generally flawed, then I can think of many ways an individual subjective test done by even a seasoned professional can be flawed. Let’s assume the test is non-blind, i.e. sound levels are not necessarily matched and labels are not hidden – just what are the advantages of this behaviour? Do we just accept that the reviewer has some magical ‘gift’ that allows them to cut through the usual psychological errors – golden ears maybe? I just don’t believe this in the majority of cases.
3. If only one person can reliably hear a difference in a blind trial then the result is valid. It so happens that this individual is usually the person who has the best ears and/or is most knowledgeable about what to listen for. If that person can tell the difference then it is not the idea of the blind trial that is flawed, only the fact that the initial implementation failed to take this person’s views into account – a blind test is only as good as its intentions, and the conclusions can only be interpreted as far as those intentions go. If you want to see if the general public can reliably tell a difference then it will tell you this. If it is designed to see if a particular seasoned professional can tell a difference it will also tell you this (this is not to say the two results can't conflict, as in interesting examples like the one quoted).
On balance, if the seasoned professional can reliably tell a difference under blind testing I am more likely to believe this than the result from thousands of members of the general public – the whole idea that numbers matter in blind testing is a bit of a myth; it’s the quality of the individuals (appropriate to the test) that counts more. After all, if we asked the general public what TV program they preferred it would probably be Coronation Street or the like! Quantity vs. quality in = quantity vs. quality out! Common sense actually.
4. The fact that sound quality is so subjective is not an argument against blind trials, it is an argument for them – by design a proper blind trial will tend to cancel out the ‘subjective’ and ‘circumstantial’ factors, to give a clearer, unbiased picture, even with just a single listener. Any other kind of ‘test’ is not valid as it is not really a test anyway, only unregulated personal subjective hearsay, by definition. If the hearsay is backed up by a proper trial (in this case featuring the same listener) then it becomes objective information and can’t be disregarded, it’s as simple as that!
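To put a number on "reliably" for a single listener, here is a rough sketch of the usual binomial calculation: the probability of scoring at least a given number of correct answers by pure guessing. The 14-out-of-16 and 8-out-of-16 scores are hypothetical, and the 50% chance level assumes a two-choice test such as ABX.

```python
from math import comb

def p_value_at_least(correct, trials, chance=0.5):
    """Probability of getting `correct` or more right out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical scores for one listener over 16 blind trials.
strong_result = p_value_at_least(14, 16)  # 137/65536, roughly 0.002
coin_flipping = p_value_at_least(8, 16)   # roughly 0.6, consistent with guessing
```

A single listener scoring 14 of 16 would be very hard to explain by luck, which is the sense in which one good pair of ears under blind conditions can carry more weight than a large panel.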
Kevin
I guess my issue is that it is just disconcerting (and fascinating) that the differences are so subtle for such a huge monetary delta. OK, maybe there is a difference, but it is so small that it seems almost trivial. Kind of like, if the goal is to go 200 mph, does it matter if you use a Ferrari or a Mustang to get there? Human nature and behavior say that it does, even though 3 hrs on the highway in a Ferrari would wear you out! I saw a listing for a $140K power amp! $20K for a CD player? $30K for cables? If you can afford that, more power to ya, but if you need a blind test to differentiate it from a $1K system when they are both output to my Dahlquist DQ 20's it is just....? I'm not sure what adjective to use. People seem to argue over the validity of blind tests and whether or not they can tell a difference, but rarely does anybody say the difference is any better or worse. OK, so you can hear a difference. That's not the issue for me, but obviously for some it is. Who cares if you can hear a difference? Which do you like better? I might hear a little more high end through $30K cables but it might tire me out and I'd want to EQ it out. Or, maybe the $100K amp is a little warmer and negates the difference anyway. Drives me CRAAAAZY! I was looking at old issues of Audio Critic from the 80's and they were having the same argument then. Like someone else said, it's like telling a religious person that you're an atheist and watching their head explode.
Again, somebody try this. http://mp3ornot.com/ . Can you differentiate them? I can't, but I still have a hard time telling my iPad to convert all files to 128kbps and using my Panasonic CD player or Discman vs my Rotel.
We humans are strange creatures, no? Subjectivity is a bitch.
Andy
Hi Andy,
Thoroughly agree here - the question of 'which is better' is a difficult one and full of problems.
The question of 'can you hear a difference' is usually the first one and can rule out the necessity to make a decision on which is actually best. The problem is that if a difference is found there may be some tradeoffs as well as improvements. You have to examine the design goals thoroughly and first establish whether the change is moving in the direction expected from the science. If it's not then you have to re-examine the science. If you are getting good agreement then you are beginning to understand the bigger picture and can understand how design relates to performance. It is important to have clear design goals, however, frequency response neutrality perhaps being a good one, for example.
Audio design is really an exercise in applied common sense and intelligence, together with controlled observation. It's very easy to lose your way entirely or end up in a blind alley or trap if loosely applied observation reigns supreme!
The MP3 test - thanks for this. I will have to try it myself but I won't be using PC speakers! Despite the vogue, speakers are by far the most overriding factor determining sound quality and no amount of increased signal resolution will change this fact!! You in fact have to degrade the signal pretty badly to introduce the kind of distortions that the average speaker does all the time.
Kevin
'I might hear a little more highend through $30K cables but it might tire me out and I'd want to eq it out. Or, maybe the $100K amp is a little warmer and negates the difference anyway.'
This is very indicative of what has gone a bit 'astray' with 'high-end' audio. Cables should not be giving you a little more high-end, 30K or otherwise. If they are then they are doing something to the audio signal, something your £2 lamp cord should not be doing, let alone a 30K cable setup. Oh, but you can get a 100K amp that negates this difference! What on earth is an amp doing changing the frequency response from neutral as well, unless it has tone controls? The uninformed would easily believe that the cable and amp have some special properties that, by careful design, allow them to sound good when matched together, but a proper trial would have shown that lamp cord + a basic £100 amp would give the same result, and you also wouldn't be limited to the specific combination of components. This is the validity of the blind trial - it exposes the truth about whether you need to pay 100 times the cost for something that is a botched design from the outset and actually has more limitations. If you ever see the claim that a cable is so designed that it limits your choice of amp, it is a sure sign to get alarm bells ringing!!
Kevin
I just saw this article that supports the view that ABX tests are flawed:
http://www.positive-feedback.com/Issue56/abx.htm
The main argument is:
'The "X" in the ABX is either A or B, randomly selected, the listener needs to identify whether that "X" is "A" or "B". Unfortunately human beings do not have the ability to compare three sonic events sequentially. One must keep a sonic memory of sample "A" they just listened to so they can compare it to the sample "B" and then listen to "X" and try to decide if it sounds more like "A" or "B". It is the introduction of this third sound that makes it impossible for human beings since we can compare two different sounds as long as we don't wait too long however our sonic memory cannot juggle three no matter how many times one is allowed to go back and forth. Thus ABX tests usually get null results, and cause listening fatigue.'
I must say I am shocked by this so-called expert's lack of understanding of the concept of ABX testing. What you are actually doing in ABX is a series of side-by-side trials of X against A or B to see if you can identify which one matches. There is in fact no need to hold three sonic memories at all. At most you are comparing what you are hearing with what you have just heard (one sonic memory), and this is no worse than any other test in this respect. The only additional item you are memorising is whether you thought X matched the other sample or not, and since you can freely switch between the two you can test this over and over again (no memory is actually necessary at all). You simply need to say how confident you are that X matches the previous sample heard, first listening to A and then to B, with X in between in both cases. It's completely scientific, relevant to audio, where memory is very transitory, and just as good as comparing A with B directly. The ABX protocol simply makes the test easier to set up, as you don't have to randomise AB pairs and take these randomisations into account in the result - the test is more straightforward to set up and interpret.
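As a toy illustration of that pairwise procedure, here is a small simulation sketch. Everything in it is an assumption made for the example: each source is reduced to a single number, hearing it adds random "perceptual noise", and the simulated listener only ever compares X with one reference at a time. It is not a model of real hearing, just the bookkeeping of an ABX run.

```python
import random

def make_player(hidden, difference, noise, rng):
    """Return play(label): a noisy 'percept' of A, B or X (X is secretly `hidden`)."""
    true_value = {"A": 0.0, "B": difference}

    def play(label):
        source = hidden if label == "X" else label
        return true_value[source] + rng.gauss(0.0, noise)

    return play

def pairwise_listener(play):
    """Compare X with A, then X with B; never juggle three sounds at once."""
    x_vs_a = abs(play("X") - play("A"))
    x_vs_b = abs(play("X") - play("B"))
    return "A" if x_vs_a < x_vs_b else "B"

def run_abx(trials, difference, noise, seed=0):
    """Run `trials` ABX trials and return how many times X was identified correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        hidden = rng.choice("AB")  # X is re-randomised on every trial
        play = make_player(hidden, difference, noise, rng)
        if pairwise_listener(play) == hidden:
            correct += 1
    return correct

audible = run_abx(200, difference=1.0, noise=0.0)    # clear difference, no noise
inaudible = run_abx(200, difference=0.0, noise=1.0)  # no difference at all
```

With a clear difference the simulated listener gets every trial right, and with no difference the score hovers around half, which is all an ABX score is meant to show.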
I have studied statistical analysis in depth during my extensive experience, and I find that discrediting ABX as a kind of voodoo shows an incomplete conceptual understanding of science and only serves to further discredit the snake-oil peddlers who don't like the idea that ABX doesn't support their misconceptions. The inconvenient truth is that any other kind of testing is worthless, as it's not any kind of test at all, only unregulated subjective hearsay taken on faith. I sometimes despair that some people have not evolved beyond the dark ages of myths and magic!
Kevin