ReadScapes: The Blumlein conspiracy -- limitations of two- and multichannel sound

written by Ralph Glasgal
illustrations from the Ambiophonics Institute

edited by ThingMan - January 2018

Ralph Glasgal is the founder of the Ambiophonics Institute in Rockleigh, New Jersey (USA).
Much more information can be found on the Ambiophonics website.

On December 14th, 1931, the EMI sound engineer Alan Dower Blumlein filed British Patent Specification 394325.

It was entitled "Improvements in and relating to Sound-transmission, Sound-recording and Sound-reproducing systems."

In the arcane language common to most patent applications, Blumlein's invention "consists in a system of sound transmission wherein the sound is picked up by a plurality of microphone elements and reproduced by a plurality of loud speakers, comprising two or more directionally sensitive microphones and/or an arrangement of elements in the transmission circuit or circuits whereby the relative loudness of the loud speakers is made dependent upon the direction from which the sounds arrive at the microphones."

Who was Blumlein and how did it all start?

(Bio excerpts taken from Wikipedia)

Alan Dower Blumlein (29 June 1903 – 7 June 1942) was an English electronics engineer, notable for his many inventions in telecommunications, sound recording, stereophonic sound, television and radar. He received 128 patents and is considered one of the most significant engineers and inventors of his time.
He died during World War II on 7 June 1942, aged 38, during a secret trial of the H2S airborne radar system then under development: the Halifax bomber he was flying in crashed at Welsh Bicknor in Herefordshire, killing all on board.

Alan Dower Blumlein was born on 29 June 1903 in Hampstead, London, to Semmy Blumlein, a German-born naturalised British subject. Semmy was born to Joseph Blumlein, a German of Jewish descent, and Phillippine Hellmann, a French woman of German descent. Alan's mother, Jessie Dower, was Scottish, daughter of a missionary. He was christened as a Presbyterian, though he later married in a Church of England parish. His future career seems to have been determined by the age of seven, when he presented his father with an invoice for repairing the doorbell, signed "Alan Blumlein, Electrical Engineer" (with "paid" scrawled in pencil). His sister claimed that he could not read proficiently until he was 12. He replied "no, but I knew a lot of quadratic equations!"

After leaving Highgate School in 1921, he studied at City and Guilds College (part of Imperial College). He won a Governors' scholarship and joined the second year of the course. He graduated with a First-Class Honours BSc two years later.
During the early 1930s Blumlein and Herbert Holman developed a series of moving-coil microphones, which were used in EMI recording studios and by the BBC at Alexandra Palace.
In 1931, Blumlein invented what he called "binaural sound", now known as stereophonic sound or simply "stereo".
In early 1931, Blumlein and his wife were at a local cinema. The sound reproduction systems of the early "talkies" invariably only had a single set of speakers – which could lead to the somewhat disconcerting effect of the actor being on one side of the screen whilst his voice appeared to come from the other. Blumlein declared to his wife that he had found a way to make the sound follow the actor across the screen. The genesis of these ideas is uncertain, but he explained them to Isaac Shoenberg in the late summer of 1931. His earliest notes on the subject are dated 25 September 1931. The application was accepted on 14 June 1933.

Alan Dower Blumlein

Externalisation

Blumlein did not use the word "stereophonic" anywhere in his patent, but he did use the word "binaural." It was well known during the fifty years before Blumlein that two microphones, spaced the width of the human head and feeding a remote pair of headphones, produced very realistic sound images with solid, stable, directional attributes. The problem was that the sound sources all seemed to lie within one's head or, in psychoacoustic parlance, to be internalized. What Blumlein sought to do was to externalize this binaural effect using loudspeakers. Externalizing the binaural effect over a full 360-degree sphere is still the Holy Grail of acoustics, particularly among those designing virtual reality video systems that also require an audio counterpart. The now dormant IMAX large-screen 3D movie system uses earphones placed about an inch in front of the ears, as well as speakers behind the screen, behind the audience, and above and below the screen, to produce a full (periphonic) acoustic sphere. If home video watchers are prepared to wear earphones as well as have loudspeakers in their home movie theaters, this is a very effective technology, but one that is not necessary to realistically reproduce staged musical events as opposed to movies.

Other attempts to externalize the binaural effect over a full sphere, or just a circle, include Ambisonics, surround sound, and the work of the plethora of computer companies generating the virtual reality sound fields for the multimedia applications referred to above. Fortunately, our music problem is, and Blumlein's was, less complex, since we need only consider a relatively small part of this sphere and can assume that all direct music sound sources originate on a single flat stage in front of us (or, in electronic scores, a flat stage behind as well). In fact, Blumlein's first priority was to provide a better front stage sound for movies shown in theaters.

Blumlein was officially awarded his patent, covering what we now call stereophonic sound reproduction, on June 14th, 1933. Thus the basic stereo listening triangle is over 75 years old and, just as Einstein's theory of relativity eventually refined Newtonian physics, it may be time to reexamine and modify the bedrock concepts upon which Blumlein imaging is based. And what better place to start than with Blumlein himself? Suppose one looked through Newton's treatises and found cryptic comments by Newton hinting that he knew his laws of matter, acceleration and gravity were not fully accurate at very high velocities and masses. We would then be justified in concluding that Newton had some insight into relativity but chose not to confound his contemporaries, who had enough to deal with in distinguishing between mass and weight and who in any case found his formulas were always accurate enough to do jobs like getting rockets off the ground. Newton's laws still work very well today, despite relativity, if you are not too fussy. So it is with Blumlein. Blumlein's patent is salted with innuendoes and hints of things to come.

Shortcomings

Blumlein knew that his reproduction method using two widely spaced loudspeakers was flawed, but the improvement in sound reproduction over mono was so apparent that there was no need to point out in detail its theoretical imperfections, and in any case he wanted his patent to be awarded and his invention used. However, he seemingly felt compelled to indicate to his technical posterity that he really did know precisely what was right and what was wrong with the stereophonic reproduction method he was proposing. (On the recording side, he had fewer problems and proposed the coincident stereo microphone and what we now call the Blumlein shuffler, both concepts later elaborated on in Ambisonics.) Thus in a paragraph discussing the difference between low frequency phase differences and high frequency intensity differences in providing directional cues, he writes "It can be shown, however, that phase differences necessary at the ears for low frequency directional sensation are not produced solely by phase differences at two loudspeakers (both of which communicate with both ears) (parentheses Blumlein's) but that intensity differences at the speakers are necessary to give an effect of phase difference".

What Blumlein was doing here was indicating that an unavoidable defect could be a virtue in one case. That is, he could not prevent both loudspeakers from having equal access to both ears at low frequencies (or from having a less predictable access at all higher frequencies), so he came up with a recommended coincident microphone arrangement that counted on this low frequency loudspeaker crosstalk to provide for localization in the relatively narrow low frequency band where the ear can localize only on the basis of interaural phase differences. Thus crosstalk became a necessary evil in the coincident microphoning case. What Blumlein was really saying was that if your microphones produce signals at low frequencies that don't have any phase differences (as is the case with any coincident microphones), then the loudspeaker crosstalk could save the day, but at a cost in higher frequency intensity-based localization that Blumlein himself was aware of but could not fully appreciate because of the limited frequency response of the equipment he had to work with.

Excuse me? Please explain again!

Study the image on the right together with the text above and below.

The way the loudspeaker crosstalk helps in the low frequency case is as follows. At low frequencies it can be assumed that any sound from one speaker will produce the same sound pressure at both ears since the head is not an effective barrier to long wavelength sounds. But the signal will be slightly delayed in getting to the more remote ear. If now there is a second loudspeaker emitting the same low frequency signal, then when this second pair of soundwaves meets the first pair it will combine with the first pair to form a new soundwave. When two waveforms, that have the same shape, but differ in amplitude and also have a fixed time delay between them, are added together, the result is a new wave shifted in phase. At one ear the louder signal combines with the delayed softer signal. At the other ear the softer signal combines with the delayed louder signal. The results are identical amplitudes but different phase shifts at each ear and thus an interaural phase difference between the ears is created that is proportional to the original intensity difference between the microphones.
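The summation described above can be checked numerically. The following is a minimal sketch, not Blumlein's own derivation: it assumes, as the text does, that at low frequencies each speaker produces the same pressure at both ears and that the far ear simply receives a delayed copy. The function names, the 200 Hz test tone and the 0.25 ms crosstalk delay are illustrative assumptions.

```python
import cmath
import math

def ear_signals(left_gain, right_gain, freq_hz, crosstalk_delay_s):
    """Return the complex phasors at (left ear, right ear) for one steady tone.

    Assumption: no head shadow at low frequencies, so each ear receives its
    near speaker directly plus an equally loud, delayed copy of the far speaker.
    """
    lag = cmath.exp(-2j * math.pi * freq_hz * crosstalk_delay_s)
    left_ear = left_gain + right_gain * lag    # louder direct + delayed softer
    right_ear = right_gain + left_gain * lag   # softer direct + delayed louder
    return left_ear, right_ear

def interaural_phase_deg(left_gain, right_gain, freq_hz, crosstalk_delay_s=0.00025):
    """Phase lead of the left ear over the right ear, in degrees.

    The 0.25 ms default delay is an assumed, illustrative value.
    """
    left_ear, right_ear = ear_signals(left_gain, right_gain, freq_hz, crosstalk_delay_s)
    return math.degrees(cmath.phase(left_ear) - cmath.phase(right_ear))

if __name__ == "__main__":
    for right_gain in (1.0, 0.5, 0.25):
        left_ear, right_ear = ear_signals(1.0, right_gain, 200.0, 0.00025)
        print(f"L=1.0 R={right_gain}: |left ear|={abs(left_ear):.3f} "
              f"|right ear|={abs(right_ear):.3f} "
              f"phase lead={interaural_phase_deg(1.0, right_gain, 200.0):.2f} deg")
```

In this simplified model the two ears receive equal amplitudes, the phase difference vanishes when both speakers are equally loud, and it grows as the level difference between the speakers grows, which is the behaviour the paragraph above describes.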

Of course, if you use a more common, non-coincident microphone technique, such as a head-spaced array, this crosstalk can cause localization blurring. That Blumlein understood that this unavoidable crosstalk caused imaging problems at higher frequencies is clear from some of the other quotes below. He clearly seemed preoccupied with this issue as he prepared his text. In point of fact, we know today that this loudspeaker communication with both ears makes it impossible for standard stereo or its surround sound relatives to create a fully realistic and lifelike stage image. But wait. There is much more to be gleaned from Blumlein. Blumlein's hints to his audiophile posterity continue with "the sense of direction of the apparent sound source will only be conveyed to a listener for the full frequency range for positions lying between the loudspeakers". Thus Blumlein certainly understood that the width of the stage he could create with loudspeakers was limited by crosstalk to the space between those loudspeakers, a serious defect, but one that was not crucial to Blumlein, since he was largely concerned with widely spaced loudspeakers in large movie theaters or halls that had fairly narrow screens or stages in comparison to the depth of the theater.

In the context of a patent application, however, this is not the sort of observation one would ordinarily include. It is easy to understand why the maximum width of the stereophonic sound image is limited to the angle the speakers subtend at the listening position. Let us assume that a single sound source, such as a trumpet, is located stage right at 75 degrees. Let us further assume that under these circumstances the sound reaching the left microphone in a stereo recording setup is negligible, and therefore no audible sound comes from the left speaker during playback. The trumpet sound blares forth from the right loudspeaker at normal intensity. If the right speaker is at the usual 30-degree angle from the centerline of the normal stereo playback triangle, then the trumpet will appear to be sounding from that position instead of from 75 degrees. This is, of course, the everyday real-life situation, in which we can easily locate the source of any discrete sound that reaches both ears without impediment.

The left image below shows what was discussed immediately above.

The right image below shows an ambiophonic speaker setup, in which a physical barrier between the loudspeakers eliminates loudspeaker crosstalk, allowing the 75° angle to be maintained throughout reproduction.

Many of us have, however, heard recordings on stereo systems that do sometimes produce images that come from beyond the speakers, and some audiophiles believe that if they could only get perfect recordings, speakers, cables and electronics, the image would open out. Blumlein was also loath to admit defeat on this point. He writes "but if it is desired to convey the impression that the sound source has moved to a position beyond the space between the loudspeakers the modifying networks may be arranged to reverse the phase of that loudspeaker remote from which the source is desired to appear, and this will suffice to convey the desired impression for the low frequency sound." (Hang on to that word "low".) This suggestion makes sense in a particular movie scene where you could briefly reverse the phase of one speaker to move dialog or a sound effect off screen, but we know that leaving one speaker out of phase all the time does not work for music reproduction via the stereo triangle.

What Blumlein was suggesting is a primitive form of logic steering, thus foreshadowing Dolby Pro-Logic. But in doing so he has explained why images sometimes do appear beyond the position of the loudspeakers. Any inadvertent phase reversal of a spot microphone in the recording mix, an out-of-phase driver, a large phase shift in the crossover network of a three- or four-way loudspeaker system, or a reflection from the wall behind a dipole loudspeaker can convince even experienced listeners that wider stages can be achieved, somehow, using normal stereo technology. Unfortunately, logic steering, surround coding and even multi-channel recording methods cannot achieve the binaural ideal that Blumlein was striving for.

Loudspeaker crosstalk in an image:
(observe the changing "shape" of the instrument itself...)

And there's even more going on...

So far, Blumlein himself has told us that the stereophonic reproduction method has two inherent flaws. There is a third problem that Blumlein seems to have been aware of because of his use of the word "low" in the last quote. This is the image position distortion caused by higher frequency sounds that hit the pinnae from angles that do not correspond to the actual angles of the recorded source. Thus, perhaps Blumlein had trouble moving a birdcall off stage using his phase reversal trick. A related issue is the question of recorded ambience and here Blumlein appears to be struggling with the problem of reproducing such recorded hall ambience from the proper direction. "The reflected sound waves which arise during recording will be reproduced with a directional sense and will sound more natural than they would with a non-directional system. If difficulties arise in reproduction, they may be overcome by employing a second pair of loudspeakers differently spaced and having a different modifying network from the first pair." While the vocabulary may be a bit different, this is a pretty good description of surround sound or Ambisonics and is also the basic starting point for the ambience and imaging system I have called "Ambiophonics".

Click the link above for a mighty interesting alternative to our well-known,
yet flawed stereo reproduction...

See also the image right and the two images above.

Human sound localization is possible using three, and only three, sonic clues (not counting bone conduction):

Time differences between the ears (ITD), including phase and transient-edge differences. The ITD includes the precedence effect.

Sound level differences between the ears (ILD).

Single- and twin-eared pinna direction-finding effects.

Each of these mechanisms is only effective in a specific frequency range, but they overlap, and the predominance of one over the other also depends on genetics, the nature of the signal (sine wave, pink noise, music), the venue, and so on.

For a full range complex sound such as music, experienced live, all three mechanisms are always in play and normally agree. By definition such an experience is said to be realistic or, better phrased for the creative and artistic recording fraternity, said to yield guaranteed physiological verisimilitude. If the three mechanisms are not consistent, we often make errors in localization, as in most earphone listening, where the interference with the pinna and head shadow usually results in internalization even if the ITD, including some deliberate ILD crosstalk, is perfect.

Before we get to stereophony, let me discuss the relative strengths of the three mechanisms listed above. Snow and Moir, in their classic papers, showed that localization of complex signals in the pinna range above 1000 Hz was superior, by a few degrees, to localization that relied solely on complex lower frequencies. That is, their subjects could localize bands of high frequencies to within half a degree but only to within one or two degrees at lower frequencies. The accuracy of localization, in general, declines as frequency falls until, at 90 Hz or so, as Bose has demonstrated, it goes to zilch. Remember this when we get to discuss crosstalk.

the basic setup for Ambiophonics

Mechanisms for localisation

It is important for understanding the workings of Stereophony that you are convinced that all three mechanisms are significant and I would suggest, with Keele, Snow, and Moir, that the Pinnae are first among equals. You should satisfy yourself on some of this by running water in a sink to get a nice complex high frequency source. Close your eyes to avoid bias, block one ear to reduce ILD and ITD, and see if you can localize the water sound with just the one open ear. Point to the sound, open your eyes, and like most people you will be pointing correctly within a degree or so. With both ears you should be right on despite having a signal too high in frequency to have much ITD or ILD. But with two pinnae agreeing and the zero ILD clue, the localization is easily accurate.

Again, if a system like stereo or 5.1 cannot deliver the ITD, ILD and pinna cues intact, without large errors, it cannot ever deliver full localization verisimilitude for signals like music. If the cues are inconsistent, localization may occur but it is fragile, it may vary with the note or instrument played, and such localization is usually accompanied by a sense that the music is canned, lacks depth, presence, etc. Mere localization is no guarantee of fidelity.

Let us now look at the stereo triangle in reproduction and the microphones used to make such recordings and see what happens to the three localization cues. Basically Stereophonics is an audible illusion, like an optical illusion. In an optical illusion the artist uses two dimensional artistic tricks to stimulate the brain into seeing a third dimension, something not really there. The Blumlein stereo illusion is similar in that most brains perceive a line of sound between two isolated dots of sound. Like optical illusions, where one is always aware that they are not real, one would never confuse the stereophonic illusion with a live binaural experience. For starters, the placement of images on the line is nonlinear as a function of ITD and ILD, and the length of the line is limited to the angle between the speakers. (I know, everyone, including Blumlein, has heard sounds beyond the speakers on occasion but diatribe space is limited.)

I want to get to the ILD/ITD phantom imaging issue involved in this topic. But let us first get the pinna issue tucked away. No matter where you locate a speaker, high frequencies above 1000 Hz can be detected by the pinna, and the location of the speaker will be pinpointed unless other competing cues override or confuse this mechanism. In the case of the stereo triangle, the pinna and the ILD/ITD agree near the location of the speakers. Thus in 5.1, LCR triple mono sounds fine, especially for movie dialog. In stereo, for central sounds, the pinna angle impingement error is overridden by the brain because the ITD and the ILD are consistent with a centered sound illusion, since they are equal at each ear. The brain also ignores the bogus head shadow, since its coloration and attenuation are symmetrical for central sources and not large enough to destroy the stereo sonic illusion. Likewise, the comb-filtering due to crosstalk in the pinna frequency region interferes with the pinna direction-finding facility, thus forcing the brain to rely on the two remaining lower frequency cues. All these discrepancies are consciously or subconsciously detected by golden ears, who spend time and treasure striving to eliminate them and make stereo perfect. Similarly, the urge to perfect 5.1 is now manifest.

Consider just the three front speakers in 5.1. Unless we are talking about three-channel mono, we really have two stereo systems side by side. Remember, stereo is a rather fragile illusion. If you listen to your standard equilateral stereo system with your head facing one speaker and the other speaker moved in 30 degrees, you won't be thrilled. The ILD is affected, since the head shadows are not the same, with one speaker causing virtually no head shadow and the other a 30-degree one. Similarly, the pinna functions are quite dissimilar. (In the LCR arrangement the comb-filtering artifacts are now at their worst in two locations, at plus and minus 15 degrees, instead of just around 0 degrees as in stereo.) Thus for equal amplitudes (such as L and C), where a signal is centered at 15 degrees as in our little experiment, the already freakish stereo illusion is badly strained. Finally, the ITD is still okay, and this partly accounts for the fact that, despite the center speaker, there is still a sweet spot in almost all home 5.1 systems. Various and quite ingenious 5.1 recording systems try to compensate for some of these errors, but the results are highly subjective and even controversial. It is also probably lucky that in 5.1 recording it is difficult to avoid an ITD, since a coincident main microphone is seldom used in this environment.

the real problem, shown in yet another way...

Technical digression for Recording Engineers

Before getting to side imaging, there are some points on the relationship between microphones and reproductive crosstalk that should be elucidated. Whether crosstalk is beneficial or not depends on what frequency range you are talking about and thus what localization method you are relying on. At high frequencies, in the pinna range, stereo speaker crosstalk is obviously not a benefit. There is no way that this unpredictable pattern of peaks and valleys can enhance localization in a stereo or LCR system. This is true whether spaced or coincident mics are used.
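The pattern of peaks and valleys mentioned above is a comb filter: the crosstalk arrives at each ear as a delayed, somewhat attenuated copy of the direct sound. A rough sketch for a centered image (equal signals in both speakers) follows. The 0.25 ms delay and the crosstalk gain of 0.7 are illustrative assumptions, not measured values, and the real pattern shifts with head position and speaker angle, which is why it is unpredictable in practice.

```python
import cmath
import math

def comb_notches_hz(crosstalk_delay_s, max_freq_hz=20000.0):
    """Frequencies at which the delayed crosstalk arrives in antiphase."""
    notches = []
    k = 0
    while True:
        freq = (2 * k + 1) / (2.0 * crosstalk_delay_s)
        if freq > max_freq_hz:
            break
        notches.append(freq)
        k += 1
    return notches

def comb_gain_db(freq_hz, crosstalk_delay_s, crosstalk_gain):
    """Level at one ear, relative to the direct sound alone, in dB.

    crosstalk_gain is an assumed head-shadow attenuation factor (0..1).
    """
    lag = cmath.exp(-2j * math.pi * freq_hz * crosstalk_delay_s)
    return 20.0 * math.log10(abs(1.0 + crosstalk_gain * lag))

if __name__ == "__main__":
    delay_s = 0.00025  # assumed crosstalk delay
    print("notches (Hz):", [round(f) for f in comb_notches_hz(delay_s)])
    for freq in (1000.0, 2000.0, 4000.0, 6000.0):
        print(f"{freq:6.0f} Hz: {comb_gain_db(freq, delay_s, 0.7):+.1f} dB")
```

A longer delay, as with more widely spaced speakers, moves the first notch lower in frequency.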

Stereo crosstalk can cause a phase shift at frequencies below where comb filtering predominates. That is, two sinewave signals with slightly different delays but with comparable amplitudes will combine to form a new sinewave with some different amplitude and phase angle. I maintain that the phase part of this change is inaudible from 90 Hz on down, nonexistent for the central 10-degrees and virtually non-existent for images from the far right or left, and thus of doubtful audibility in between or in LCR systems. Stereo crosstalk cannot create an ITD for transients captured in coincident mic recordings but it can shift the phase of midbass and low bass. But there is no evidence that the small phase shifts of this type are audible or affect localization. If spaced mics are used, then there is an ITD and crosstalk has little deleterious effect but likewise no benefit.

The ILD is a slightly different story. In the low bass, say below 90 Hz, the phase difference between the direct sound and the crosstalk sound is too small (heads are too small) to change the phase significantly, and thus to change the amplitude at an ear, when the two signals are added together. So regardless of the microphone used, low bass crosstalk is not the issue. Again, I maintain that the very low bass energy at both ears remains almost the same even if the left and right signals are different in amplitude, as in coincident mic'ing. As Blumlein observed, as the frequency goes up the path length difference is equivalent to larger phase angles and so, if there is a difference in amplitude between the speakers, the signals will go up at one ear and down at the other as the signals are combined on each side of the head. Clearly, if the phase shift gets to 90 degrees or more, this same crosstalk mechanism becomes detrimental. This boost in mid bass separation is only applicable to phantom stereo images around 15 degrees. In the center there is no crosstalk amplitude asymmetry to take advantage of, and at 30 degrees, where the speakers are, the stereo separation hopefully ensures that the crosstalk has little to add to or subtract from.
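To put rough numbers on how the path length difference translates into phase angle, here is a small sketch. It uses the Woodworth spherical-head approximation for the extra travel time to the far ear, which is an assumption brought in for illustration and is not taken from the text above. The head radius and speed of sound are likewise assumed typical values.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed, air at room temperature
HEAD_RADIUS_M = 0.0875      # assumed average head radius

def crosstalk_delay_s(speaker_angle_deg):
    """Extra travel time to the far ear (Woodworth spherical-head approximation)."""
    theta = math.radians(speaker_angle_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))

def crosstalk_phase_deg(freq_hz, speaker_angle_deg=30.0):
    """Phase lag of the crosstalk relative to the direct sound, in degrees."""
    return 360.0 * freq_hz * crosstalk_delay_s(speaker_angle_deg)

def freq_for_phase_hz(phase_deg, speaker_angle_deg=30.0):
    """Frequency at which the crosstalk lag reaches the given phase angle."""
    return phase_deg / (360.0 * crosstalk_delay_s(speaker_angle_deg))

if __name__ == "__main__":
    for freq in (45.0, 90.0, 300.0, 1000.0):
        print(f"{freq:6.0f} Hz: {crosstalk_phase_deg(freq):6.1f} deg")
    print(f"90 deg reached near {freq_for_phase_hz(90.0):.0f} Hz")
```

Under these assumptions, with speakers at 30 degrees, the crosstalk lag stays below ten degrees at 90 Hz and reaches 90 degrees a little below 1 kHz.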

If spaced microphones are used, the ILD at low frequencies may be minimal, especially for omnis. But let us assume that above 90 Hz there is a substantial ILD as well as an ITD. In this case the LF effect of the crosstalk phase change is rather unpredictable. Again, in the 15-degree region there could be enhancement of bass separation, but the ITD-induced phase shift could counter this. In summary, crosstalk is really only desirable in the case of coincident mic stereo recordings, as Blumlein wrote, and only if restricted to frequencies below 300 Hz or so, as I claim.

Surround Sound Localization

Let us consider surround sound localization. Obviously, if a mono signal is placed at 110 degrees it can be localized using pinna, ILD, and ITD even when facing forward. Between the two rear surround speakers you have effectively a stereo pair spanning 140 degrees. In such a situation, if there is a lot of high frequency energy, the pinna will localize to the speakers and it will be difficult for some individuals to hear sound directly behind or in the central rear region. (The new rear surround channel can fix this, but the LCR anomalies described above will then apply.) However, if there is a real ITD and a real ILD between the rear speakers, it is theoretically possible to hear a wide stage to the rear, as in the frontal stereo illusion. But the crosstalk, and thus the comb-filtering, is extreme at this angle, and it starts at a lower frequency, thus interfering with the ILD at 800 Hz or lower. If there is an ITD this can help, but then the speakers must be properly placed or delay-adjusted. Obviously, if 140-degree spacing were a good way to make a stereo stage, front or rear, it would have been done this way long before now.

Finally, let us see what happens when we try to image from the front side speaker to a speaker at 110 degrees on the same side while facing forward. In the case of the pinnae, the pinna facing the speakers can localize to each speaker discretely if the signals are different. If they are correlated or identical, the brain will use some other cue to localize. There may be some gifted individuals who can localize high frequency phantoms between the speakers using one pinna but I can't do it. The higher frequencies also go around the head to produce a head shadow and this at least allows the brain to decide the source is at the loud side.

If there is a time difference, then the two signals from each speaker reach the exposed ear canal and add together to produce garbage and a head shadowed version of this time garbage also reaches the far ear. Basically, regardless of the recorded TD, the ITD the brain perceives is always the ITD based on one's ear spacing. However this is sufficient to localize to the louder side but makes localization between the speakers wishful thinking.

If there is a level difference, then the two signals from each speaker reach the exposed ear canal and add together to produce garbage and a head shadowed version of this level difference garbage also reaches the far ear. Basically regardless of the recorded LD, the ILD the brain perceives is always the ILD based on one's head shadow. However, as above, this is sufficient to localize to the louder side but makes localization between the speakers wishful thinking.

That the above scenario is more or less correct is attested to by the fact that the industry keeps adding more speakers to correct these defects. We have the rear center speaker (6.1), height speakers front and rear (extensions of 5.1 / 6.1 / 7.1), the 10.2 proposals, and now we also have Atmos.

official 5.1-5.2 / 6.1-6.2 / 7.1-7.2 speaker layouts

and still, this will not suffice...