Post by Martin John Butler on Feb 5, 2016 10:29:00 GMT -6
I rarely visit Gearslutz anymore, but I touch base there every once in a while if a manufacturer comments. Steven Slate was talking about blind tests of Slate's preamp emulations against the hardware. I replied with this, risking typical GS vitriol. It's lengthy, so grab some coffee, but I feel that more knowledgeable people than me have finally said what I struggled to express well previously.
Last month's Stereophile editorial page was about this very issue, blind testing. Obviously, Mr. Slate swears by them. "See, they can't tell a difference, in fact, they prefer the plug-in" could be his mantra. But unfortunately, there's more to the story than meets the eye, and ears in this case.
First, there's no accounting for an individual's abilities as a critical listener, even among experienced AEs.
Second, there's the "test" factor: some people respond negatively to testing itself, which elevates all sorts of things, like blood pressure, anxiety, insecurities, etc.
Third, IME, differences between things audio don't always reveal themselves immediately. It might take a few hours, days, or weeks, but once noticed, you can pinpoint the difference again and again. Like a small yellow egg stain on an otherwise clean tie, your eyes (or ears, in this case) go right to it.
This is such a contentious debate that the conclusion I always end up with is to trust my own ears, and if others feel differently, well, good luck to you, honestly.
This article from last month's Stereophile Op-Ed page is worth every minute it takes to read, as I couldn't have said it better myself. John Atkinson was an expert before Steven Slate wore diapers. I like Mr. Slate, I like and use many of his products, and I love that he's trying to bring innovation to the musician without deep pockets, but let's face it, he's 50% P.T. Barnum as well.
So here ya go:
To the Simple, Everything Appears Simple, by John Atkinson (Editor, Stereophile Magazine)
"My spirits sank as I read the comments on Stereophile's Facebook page. In the November issue, we had published reviews of UpTone Audio's USB Regen device by Kalman Rubinson, Michael Lavorgna, and myself. Michael and Kal had enthused about the positive effect the USB Regen had made, but I could detect no measurable difference. On Facebook, Dan Madden had written, "I think a device like this would need a blind listening test to verify that a listener could hear the difference in a statistically measurable way, in a very high percentage of times."
I have no argument with that statement. But then, Madden went on to say, "Have someone hook up this gizmo on YOUR system, and then have you listen to it with the same song 10 times with and without it connected randomly, and if you get the 'better sound with it' right 9 times out of 10 then I would be convinced that it makes a difference to the sound."
Sounds like a simple test, but designing a blind test that can be used to confirm or deny that a real but small audible difference exists is far from simple. In the formal statistical analysis of the test results, you can't prove a negative; you can conclude only that, under the circumstances of the test, no difference could be detected. By contrast, a statistically significant positive identification can be regarded as universal proof that a difference is detectable. But that analysis depends on the test examining just one variable—the difference being examined—and, as I have repeatedly discussed in this magazine, the blind-testing methodology itself can be an interfering variable in the test. The fact that the listener is in a different state of mind in a blind test than he or she would be when listening to music becomes a factor.
Rigorous blind testing, if it is to produce valid results, thus becomes a lengthy and time-consuming affair using listeners who are experienced and comfortable with the test procedure. Otherwise, the results of the test become randomized, hence meaningless.
In the words of famed mastering engineer Bob Katz: "There is no such thing as a 'casual' blind test. Blind tests are a serious business. Experimenters need training how to perform blind tests well. Blind tests can fail (produce statistically invalid results) if the experimenter neglected one critical detail. Weeks of intensive study are required to learn how to perform blind tests. Then weeks of preparation to create the test. Then weeks of testing to follow."
Some probably think it paradoxical for the editor of a magazine based primarily on the concept of judging audio components by listening to them under sighted conditions to be commenting on blind-testing methodology. However, since the very first blind listening test I took part in, in 1977, organized by the late James Moir for Hi-Fi News magazine, I have been involved in well over 100 such tests, as listener, proctor, or organizer. My opinion on their efficacy and how difficult it is to get valid results and not false negatives—ie, reporting that no difference could be heard when a small but real audible difference exists—has been formed as the result of that experience.
There is, in fact, a formal discipline devoted to the design of blind tests, based on recommendations formulated by the International Telecommunications Union in its document ITU-R BS1116-3 (footnote 1). Katz was summarizing the ITU guidelines and their consequences; the context for his comments was a workshop at the 139th Audio Engineering Society Convention (footnote 2), held last October in New York, on the audibility of possible improvements in sound quality made by recording and playing back audio with bit depths greater than the CD's 16 and sample rates higher than the CD's 44.1kHz.
This is a contentious subject. On the Stereophile website forum last summer, reader David Harper wrote, "Humans do not hear any difference between 16-bit/44.1kHz and any higher bit/sampling rate. This is established fact."
Harper was referring to a 2007 paper by E. Brad Meyer and David R. Moran that "proved" that there was no sonic advantage to high-resolution audio formats (footnote 3). Their conclusion ran counter to the experience of many recording engineers, academics, and audiophiles, but other than doubts over their methodology and the fact that their source material was of unknown provenance, Meyer and Moran's paper seemed to be the final formal word on the matter.
Until now. The AES workshop in which Bob Katz was taking part also featured presentations by legendary recording engineer George Massenburg (now a Professor at McGill University, in Montreal) and binaural recording specialist Bob Schulein. But it was the first presentation—by Joshua Reiss, of Queen Mary University, in London, and a member of the AES Board of Governors—that caught my attention.
Some 80 papers have now been published on high-resolution audio, about half of which included blind tests. The results of those tests, however, have been mixed, which would seem to confirm Meyer and Moran's findings. However, around 20 of the published tests included sufficient experimental detail and data to allow Dr. Reiss to perform a meta-analysis—literally, an analysis of the analyses (footnote 4). Reiss showed that, although the individual tests had mixed results, the overall result was that trained listeners could distinguish between hi-rez recordings and their CD equivalents under blind conditions, and to a high degree of statistical significance."
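A quick footnote of my own on the numbers in that article. Madden's "9 out of 10" criterion, and the way a meta-analysis can pull a significant result out of individually inconclusive studies, can both be sketched with a little standard-library Python. This is purely my own illustration, not from the editorial: the p-values in the second example are made up, and Fisher's method here is just an illustrative stand-in, not the (more sophisticated) procedure Reiss actually used.

```python
# Sketch of (1) the binomial significance of "9 correct out of 10" under
# pure guessing, and (2) Fisher's method for combining p-values from
# independent studies (an illustrative stand-in for a real meta-analysis).
import math

def binomial_p_value(successes: int, trials: int, chance: float = 0.5) -> float:
    """P(X >= successes) if every trial is a coin flip with P(correct) = chance."""
    return sum(
        math.comb(trials, k) * chance**k * (1 - chance)**(trials - k)
        for k in range(successes, trials + 1)
    )

def fisher_combined_p(p_values: list) -> float:
    """Fisher's method: -2 * sum(ln p_i) follows a chi-squared distribution
    with 2k degrees of freedom under the null. For even df = 2k the
    chi-squared survival function has the closed form used below."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    k = len(p_values)      # degrees of freedom = 2k
    half = stat / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

# Madden's test: 9 correct out of 10 when guessing would average 5.
print(binomial_p_value(9, 10))   # ~0.0107, i.e. roughly 1 chance in 93

# Three hypothetical studies, each p = 0.08: none significant at 0.05
# on its own, yet the combined result is (~0.019).
print(fisher_combined_p([0.08, 0.08, 0.08]))
```

This is the sense in which Reiss's result goes beyond Meyer and Moran: individual tests can each fall short of significance while the pooled evidence does not, provided the studies report enough detail to be combined.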