I couldn't agree with you more about the effect of digital jitter on a listener's ability to make the closest emotional connection with music. And in particular "...When those stimuli contain distortions that don't exist in the real world, the brain spends lots of processing power trying to decipher...a coherent picture. With the brain thus occupied,...the very reason we pursue that sensory input in the first place." I would hasten to add that inverted polarity, which also doesn't occur in nature, irritates and fatigues the brain. Thus it's quite important to hear reproduced music such that its compressions and rarefactions are in sync with those of the live performance. This also implies that minimum-phase speakers with low-order crossovers or no crossovers are potentially higher fidelity than those that aren't. Likewise, since the speed of sound is frequency independent, all frequencies should arrive at a listener's ear simultaneously, which implies that all of a speaker's drivers should be time-aligned to the listener's ears. The only way that speakers with non-digital crossovers can be time-aligned is by physically positioning their individual drivers so that their sonic centers are the same distance from the listener's ears for the frequencies in the center of their individual passbands.
With speakers that have non-coincident drivers, time alignment, if it occurs at all, is present only at one x-y-z point in space for each speaker. Thus it's not possible for multiple listeners of the same speakers to be in the sweet spot simultaneously. However, if multiple listeners use headphones, then all of them will be able to hear the music time-aligned. Another distortion caused by non-time-aligned drivers occurs in the overlap region of the crossover's high-pass and low-pass sections: frequencies in that region arrive at the listener's ear at different times from each driver, which creates a comb filter and a general roughness to the sound. And if a listener isn't the same distance from both the left and right speakers, then besides a shifting of the images toward the nearer speaker, the sounds that are supposed to be centered between the two channels will also be comb filtered, as will, to some degree or another, all the sound that emanates from both speakers in two-microphone stereo recordings. Multi-microphone recordings with extreme pan potting to either the left or right channel won't suffer from this comb filtering, but they're hardly high-fidelity. If one could have a system with a separate speaker for each performer, positioned in the listening room as the performers were on stage, and each performer were recorded with a single near-field microphone, then comb filtering at different listening positions would be eliminated, but that's not exactly a practical alternative to stereo recordings. Dummy-head recordings reproduced over headphones are potentially the most accurate in these respects. For a single listener positioned between two speakers, with one speaker opposite each ear, again a dummy-head recording is potentially the most accurate in terms of all the above-mentioned criteria.
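To put rough numbers on the comb filtering described above, here is a minimal sketch in Python. It assumes an idealized case that the letter itself does not spell out: two sources radiating the same signal at equal level and in the same polarity, no room reflections, and a speed of sound of 343 m/s. The function names and the example coordinates are purely illustrative. Under those assumptions, a path-length difference d gives a delay t = d / c, and cancellations fall at odd multiples of 1 / (2t).

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value: air at roughly room temperature


def arrival_delay_s(listener, source_a, source_b, c=SPEED_OF_SOUND_M_S):
    """Difference in arrival time (seconds) at the listener between two sources.

    Positions are (x, y, z) tuples in metres.
    """
    path_difference_m = abs(math.dist(listener, source_a) - math.dist(listener, source_b))
    return path_difference_m / c


def comb_notches_hz(delay_s, f_max=20_000.0):
    """Notch frequencies up to f_max for two equal, same-polarity signals offset by delay_s."""
    if delay_s <= 0:
        return []  # equal path lengths: no comb filtering in this idealized model
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2 * delay_s)  # odd multiples of 1 / (2 * delay)
        if f > f_max:
            break
        notches.append(f)
        k += 1
    return notches


if __name__ == "__main__":
    # Hypothetical layout: speakers 2 m apart, listener 2.5 m back and 0.3 m off center.
    left, right = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    listener = (0.3, 2.5, 0.0)
    delay = arrival_delay_s(listener, left, right)
    print(f"delay: {delay * 1000:.3f} ms")
    print("first notches (Hz):", [round(f) for f in comb_notches_hz(delay)[:5]])
```

This simple model fits the off-center listener case best. In the crossover overlap region the two drivers also differ in level and phase because of the crossover itself, so the actual notch positions and depths there would depend on the crossover design.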
So for multiple listeners, two time-aligned minimum-phase speakers are probably the most practical path to high fidelity. Even though only one listener at a time is in the true sweet spot, the sound can be very good and emotionally rewarding for other listeners who are relatively near the sweet spot, and even for those listening from another room or, for that matter, outdoors.
In sum, I believe that there are many necessary but not sufficient conditions for high-fidelity reproduction, e.g., flat and wide frequency response, time alignment, correct absolute polarity with minimum phase, freedom from dynamic compression, and low noise, to mention just a few. But for the present, it seems to me that high fidelity is still part science and part art, relying at least as much upon subjective judgments as upon engineering.
George S. Louis