mp3 decoder tests

Test Methodology

Here are the relevant facts about the procedure and software used throughout these tests, and the construction of this website...




This started life as a simple little project. After all, all mp3 decoders are alike, aren't they? I was expecting a bunch of 1-bit differences. Instead, I was shocked to find that most decoders produced errors, some of them obviously audible. Some of the points in these tests will seem picky, but why do people (often huge companies) produce software that simply doesn't do what it says? Maybe I'm just a perfectionist...

Comparing one decode of an mp3 with another yields some spectacular differences. Sometimes the differences appear larger than they really are because, for instance, one decoder has time-shifted or phase-shifted one tiny frequency component. If you compare the two files by subtraction, that component will stick out, but when you listen to the two it may not be obvious at all. When comparing two decoders, it's not always clear which one is correct - that's why I compared lots of decoders together, and also why I used some tests that didn't require a comparison (e.g. the 100Hz test and the Least Significant Bit test).
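To see why a small time shift looks so dramatic in a subtraction, take a pure tone of frequency f and subtract a copy of it delayed by one sample at sample rate fs. The result is another sine whose amplitude, relative to the original, is 2 sin(pi f / fs). At 44.1kHz that is roughly -37dB for a 100Hz tone, but slightly louder than the tone itself for a 10kHz tone, even though the two files would sound identical. The sketch below measures this numerically in plain Python; the function name and defaults are illustrative only.

```python
import math


def residual_db(freq_hz, shift_samples, fs=44100, n=44100):
    """Level of (x - x delayed by shift_samples) relative to x, in dB, for a unit sine."""
    x = [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n + shift_samples)]
    reference = x[shift_samples:]  # the "correct" decode
    delayed = x[:n]                # the same tone, arriving shift_samples later
    diff = [a - b for a, b in zip(reference, delayed)]

    def rms(signal):
        return math.sqrt(sum(v * v for v in signal) / len(signal))

    return 20 * math.log10(rms(diff) / rms(reference))
```

Calling `residual_db(100, 1)` and `residual_db(10000, 1)` shows how strongly the apparent "error" from a one-sample shift depends on frequency, which is why a subtraction alone can exaggerate a harmless timing difference.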

Subtracting one file from another in Cool Edit Pro is a simple process. Load the first file and hit Copy. Load the second file and hit Mix Paste:100%:invert. If the files aren't in sync, invert one, put them both in the multitrack view, and sync them by hand (it's easy when they start and end with impulses). The mixdown then gives the difference. I used 16-bit mixdowns because subtracting one 16-bit integer from another gives an integer result, so there is no rounding error and you don't need 32-bit accuracy. If you find one file is at a slightly different level, you can also correct for this in multitrack view to get the best possible comparison. However, the match can never be perfect, so it's difficult to state the low-level relative accuracy of files decoded at the wrong level.
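The same null test can be sketched outside Cool Edit Pro. This is a minimal Python sketch, not a description of what CEP does internally: it assumes mono 16-bit .wav files whose first loud sample is the leading sync impulse, and the names `read_wav16`, `first_impulse` and `null_test`, as well as the 16384 detection threshold, are hypothetical. Python integers don't overflow, so the sample-by-sample difference is exact.

```python
import array
import math
import sys
import wave


def read_wav16(path):
    """Read a mono 16-bit PCM .wav file into a list of ints."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch expects mono 16-bit PCM")
        data = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # .wav samples are stored little-endian
    return data.tolist()


def first_impulse(samples, threshold=16384):
    """Index of the first sample whose magnitude reaches the threshold."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    raise ValueError("no sync impulse found")


def null_test(a, b, threshold=16384):
    """Align b to a on their leading impulses and return (offset, a - b)."""
    offset = first_impulse(b, threshold) - first_impulse(a, threshold)
    if offset >= 0:
        b = b[offset:]   # b starts later: drop its extra leading samples
    else:
        a = a[-offset:]  # a starts later: drop a's extra leading samples
    # zip stops at the end of the shorter file
    return offset, [x - y for x, y in zip(a, b)]


def peak_dbfs(diff):
    """Peak level of the difference signal relative to 16-bit full scale."""
    peak = max((abs(d) for d in diff), default=0)
    return float("-inf") if peak == 0 else 20 * math.log10(peak / 32768)
```

A typical call would be `offset, diff = null_test(read_wav16("decoder_a.wav"), read_wav16("decoder_b.wav"))` (file names hypothetical), followed by `peak_dbfs(diff)` to see how far below full scale the residual sits. Level correction is deliberately left out: scaling one file by a non-integer gain would reintroduce rounding error.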

There is a slight bug in CEP. Subtracting one file from another works well until you get a clipped (full-scale) sample. You can't invert this correctly, because 16-bit audio has one more value available below zero than above (-32768 to +32767). If you invert using the Invert function, full-scale negative samples simply end up one step short: a -32768 sample becomes a +32767 sample, because +32768 would require 17 bits and we're using a 16-bit system. Fair enough. Your subtraction will then show some extra 1 LSB differences, because the originals were full scale and the arithmetic range isn't there to subtract them correctly. However, if you use the Mix Paste invert function, it doesn't give the correct result at all. If file A has a negative full-scale sample, and file B has an identical negative full-scale sample, the difference comes out as another negative full-scale sample! The correct answer would be zero. So if you compare two high-amplitude decodes (with lots of full-scale samples), it can seem that there are huge errors when in fact there are none.
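The 16-bit arithmetic behind both effects can be reproduced in a few lines. The first part is certain: the difference of two 16-bit samples can range from -65535 to +65535, which needs 17 bits, and inverting -32768 within 16 bits can only give +32767. The second part is only a guess at CEP's Mix Paste behaviour: if the invert step negates in 16-bit two's complement with wrap-around, -(-32768) wraps back to -32768, and a saturating add then turns -32768 - (-32768) into -32768, which matches the symptom described above. All function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clip16(v):
    """Saturate an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def wrap16(v):
    """Wrap an integer to signed 16 bits, as a two's-complement register does."""
    return (v + 32768) % 65536 - 32768


def invert_clipped(s):
    """Invert with saturation: -32768 can only become +32767."""
    return clip16(-s)


def subtract_via_invert(a, b):
    """Invert b with saturation, then add: off by 1 LSB at negative full scale."""
    return clip16(a + invert_clipped(b))


def subtract_wrapped_invert(a, b):
    """Hypothetical model of the Mix Paste bug: wrap-around negation, then saturating add."""
    return clip16(a + wrap16(-b))


def subtract_exact(a, b):
    """Exact difference; it needs 17 bits, so keep it in a wider type."""
    return a - b
```

With two identical -32768 samples, `subtract_exact` gives 0, `subtract_via_invert` gives -1 (the 1 LSB error), and `subtract_wrapped_invert` gives -32768, a full-scale "difference" between identical samples. For ordinary values well inside the range, all three agree.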

The CD recording programs that carried out mp3 decoding on the fly were tested by dragging all the mp3 files to the track list and burning a CD. This CD was then played on a stand-alone audio CD player, connected to the CardD+ digital sound card via a coaxial digital link. The tracks were recorded in Cool Edit Pro with "correct DC on record" disabled. The results were synced, saved, and compared as normal. I didn't rip the CD audio tracks using my PC's CD-ROM drive because it's a naff drive and occasionally makes mistakes. I can overcome this with sector synchronisation, but this doesn't synchronise silence, and a lot of silence was included in these tests.

The allocation of colours in the objective sound quality table may seem arbitrary, but it's intended to separate things that people might actually hear from things that only blokes (because it's usually blokes :-) with spectrum analysers will pick up. Generally, where a decoder will harm actual music signals, this is noted in the individual decoder results.

It's generally stated that the encoder has the largest effect on sound quality. However, if you pick a good encoder but a bad decoder, the decoder may do more harm.

Of the dithered sine waves, the first, triangular-shaped dithered one was generated in CEP, whilst the second, rectangular-shaped one (which only toggles a single bit on and off - the smallest signal you can have in the 16-bit domain) was generated in MATLAB. On my system they are audible, though swapping the Sony DAC for a better Meridian one makes problems around the LSB in the digital domain more audible - the Sony isn't very linear around the last bit, as it was one of the first external DACs. (Apparently it was imported from the USA or Japan, and it runs via a huge transformer which converts our native 230V 50Hz supply into the 110V 60Hz it requires! No, I don't know where you can get such a transformer today - it looks home made.)
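For readers who want to generate a similar test signal, here is one common way to make a sine wave smaller than 1 LSB survive 16-bit quantisation: add triangular-PDF (TPDF) dither spanning ±1 LSB before rounding. This is a sketch only. It is not the CEP or MATLAB procedure used for the test files, and the default frequency, amplitude and seed are arbitrary choices.

```python
import math
import random


def tpdf_dithered_sine(freq_hz=1000.0, amp_lsb=0.5, fs=44100, n=44100, seed=1):
    """Return n 16-bit samples of a sine amp_lsb LSBs in peak amplitude,
    with TPDF dither added before rounding to the nearest integer."""
    rng = random.Random(seed)
    out = []
    for i in range(n):
        x = amp_lsb * math.sin(2 * math.pi * freq_hz * i / fs)
        # Difference of two uniform variables: triangular PDF on (-1, 1) LSB
        d = rng.random() - rng.random()
        out.append(max(-32768, min(32767, round(x + d))))
    return out
```

Without the dither term, a sine whose peak is below 0.5 LSB rounds to digital silence. With it, the output only ever takes the values -1, 0 and +1, yet the tone is still present on average, which is exactly the kind of signal that exposes a decoder or DAC that misbehaves around the last bit.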

The 24-bit tests were hard work. Cool Edit Pro won't graph waveforms below the 16th bit, even in 24-bit mode, unless you zoom in horizontally (time axis) as well as vertically (amplitude axis). Generating the original test .wav was easy enough, and once I'd saved it in the correct PCM format (24-bit packed Intel PCM), mp3enc3.1 happily encoded it. Both MAD and l3dec give 24-bit ASCII hex output. This is a text file containing millions of groups of 6 hexadecimal digits (e.g. FFFFFF). I wrote a script in MATLAB to convert from this format to a 24-bit .wav that Cool Edit Pro could read. Finally I could analyse the output - I usually moved the last (least significant) 8 bits into the first (most significant) 8 bits, so I could hear the result easily using a 16-bit DAC. Sadly, after all this effort, MAD wasn't as good as expected, though l3dec was superb. I measured a 126dB dynamic range from a 16-bit .wav file, generated by decoding an mp3 to 24 bits using l3dec, then using Cool Edit Pro to dither this to 16 bits with noise shaping.
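A Python equivalent of that MATLAB conversion might look like the sketch below. It assumes the decoder's text output is whitespace-separated, 6-digit, two's-complement hex samples, interleaved for stereo; the exact layout MAD and l3dec produce isn't documented here, so treat the parser as an assumption. `low_byte_to_16bit` is one reading of "moving the least significant 8 bits to the top" of a 16-bit word. All names are hypothetical.

```python
import wave


def parse_hex24(text):
    """Convert whitespace-separated 6-digit hex groups into signed 24-bit ints."""
    samples = []
    for token in text.split():
        v = int(token, 16)
        if v & 0x800000:  # top bit set: negative in two's complement
            v -= 0x1000000
        samples.append(v)
    return samples


def write_wav24(path, samples, fs=44100, channels=2):
    """Write signed 24-bit ints as packed little-endian ("Intel") 24-bit PCM."""
    frames = b"".join((s & 0xFFFFFF).to_bytes(3, "little") for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(3)  # 3 bytes per sample
        w.setframerate(fs)
        w.writeframes(frames)


def low_byte_to_16bit(s):
    """Place the least significant 8 bits of a 24-bit sample in the top
    8 bits of a signed 16-bit sample, so bit-level detail becomes audible."""
    v = (s & 0xFF) << 8
    return v - 0x10000 if v & 0x8000 else v
```

Usage might be `write_wav24("decoded24.wav", parse_hex24(open("decoded.hex").read()))` (file names hypothetical). Python's `wave` module accepts a sample width of 3 bytes, so no manual RIFF header writing is needed.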

I have never experienced as many system crashes as during these tests! All these stupid players trying to run their start-up utils, sys-tray extensions and splash screens. The worst is HyCD, which crashes your system before you even run it, though the little HyCD sampler utility itself doesn't do this if you've killed the rest of it. Sonique draws random lines across the screen when it's finished playing a song. Netscape 4.73 seems hopelessly flaky on my system, and IE 5.5 Beta often seems two steps behind in updating its URL field, so you never know what page you're looking at. Word and Excel never crashed (I don't think I've ever had Excel crash!). Likewise, Cool Edit Pro seems rock solid, and if anything else takes the system down while it's running, it'll try to recover the next time it's run, though this feature rarely works for me.

This is the first web site I've created using style sheets. I found an invaluable guide to Cascading Style Sheets, though finding out what doesn't work in various browsers (Netscape isn't my favourite browser any more!) is frustrating. The trick, as always, is backwards compatibility. I write all my HTML in WordPad, and have been writing web pages since 1994.

Other web sites I've created or maintained include my home page, Immunoporation Ltd, Cell Transfection Technology, Aanvil Audio (UK dealers for Audio Physic loudspeakers, LFD, Meracus, and audiophile CDs), the Christian Union at the University of Essex, and the audio research lab. Somehow a picture of me ended up here.

If you have any questions about these tests, please feel free to email me.