Saturday, October 28, 2006, 01:27 PM - LAME

Out of curiosity, I tested how Lame's encoding speed has evolved across versions. To keep things simple, encoding is done at default settings (128 kbps CBR). The test machine supports MMX/3DNow!/SSE/SSE2, and the audio file is about one minute long.
Here are the results:
Looking at this, it's clear that overall Lame has become slower over time. We tried to keep speed reasonable, but as we increased quality, our speed optimisations were not enough to compensate for the extra computations.
There was a notable speed decrease with the release of 3.94, when we switched our default psychoacoustic model from GPsycho to NSPsytune. NSPsytune was already used (starting with 3.90) by the preset/alt-preset settings. For comparison, here are the "preset cbr 128" speeds of versions prior to 3.94:
3.90 --alt-preset cbr 128: 9.62s
3.93.1 --preset cbr 128: 6.55s
So between 3.90 and 3.97, NSPsytune's encoding time decreased from 9.62s to 5.31s. Not bad at all, but still quite a bit slower than early Lame releases.
Sunday, October 22, 2006, 11:39 AM - Audio coding

ADPCM is a very basic audio coding scheme, and is quite fast to encode and decode. The drawback is that compression is quite limited (fixed at 4 bits per sample), and so is the quality.
Since ADPCM encodes the difference from a predicted sample, and that prediction is based on previous samples, I had already thought about a modified ADPCM encoder that would work on a window of a few samples, selecting the optimal encoding for the whole window instead of considering only the current sample.
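To make the starting point concrete, here is a minimal sketch of the per-sample scheme described above. This is not IMA ADPCM or any standard codec: the predictor (previous decoded sample), the 4-bit sign/magnitude code layout, and the step-adaptation rule are all simplified stand-ins of my own choosing.

```python
def adpcm_encode(samples):
    """Greedy per-sample coder: predict from the previous decoded sample,
    quantize the prediction error to 4 bits with an adaptive step size."""
    step = 16          # adaptive quantizer step size
    pred = 0           # prediction = previous decoded sample
    codes = []
    for s in samples:
        diff = s - pred
        sign = 8 if diff < 0 else 0          # 4-bit code: sign + 3-bit magnitude
        mag = min(abs(diff) // step, 7)
        codes.append(sign | mag)
        # reconstruct exactly as the decoder will, so both stay in sync
        dq = mag * step + step // 2
        pred += -dq if sign else dq
        pred = max(-32768, min(32767, pred))
        # crude adaptation: grow the step on large codes, shrink on small ones
        step = max(1, min(2048, step * 2 if mag >= 6 else
                          (step * 2) // 3 if mag <= 1 else step))
    return codes

def adpcm_decode(codes):
    """Mirror of the encoder's reconstruction loop."""
    step, pred, out = 16, 0, []
    for c in codes:
        sign, mag = c & 8, c & 7
        dq = mag * step + step // 2
        pred += -dq if sign else dq
        pred = max(-32768, min(32767, pred))
        out.append(pred)
        step = max(1, min(2048, step * 2 if mag >= 6 else
                          (step * 2) // 3 if mag <= 1 else step))
    return out
```

The key point is that each sample is quantized in isolation, using only the past: the encoder never looks ahead, which is exactly what the windowed variant would change.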
I've since found a blog post by M. Niedermayer about this, and his explanations are much better than mine.
Now, let's take this further: what about noise-shaped ADPCM encoding? Instead of a simple SNR optimisation, we could shape the ADPCM noise in the frequency domain. We could base it on SMR (signal-to-mask ratio) values, but even just using the ATH (absolute threshold of hearing) would probably be a big improvement.
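For reference, the ATH has a well-known closed-form approximation (Terhardt's formula, used in several perceptual coders) that could supply the per-frequency weights:

```python
import math

def ath_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL
    (Terhardt's approximation)."""
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)
```

The curve is high at low frequencies, dips to its minimum around 3-4 kHz (where the ear is most sensitive), and rises steeply again toward high frequencies, so noise pushed away from the dip is cheaper perceptually.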
To do this, we could consider a bigger window of samples. 128 samples would be comfortable in terms of frequency resolution, but as few as 32 samples would probably be usable. Over this window, we would apply a time-to-frequency transform, and select the best ADPCM-encoded vector based on its distance from the ideal frequency-domain distortion.
The problem now is the computational cost. Even with a coarse 32-sample window, we would need to compute 4^32 quantizations and transforms, which would be prohibitive. (Note how I managed to turn ADPCM from a very fast coding scheme into something that would take years to compute.)
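On a toy scale, the exhaustive search can be sketched as follows. Everything here is a simplified stand-in: a previous-sample predictor with a fixed step, a naive DFT, and caller-supplied frequency weights where ATH or SMR values would go. The 4^n blow-up is visible directly in the `product(range(4), repeat=n)` loop.

```python
import cmath
from itertools import product

def dft(x):
    """Naive O(n^2) DFT, enough for a toy window."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def decode(codes, pred, step):
    """Toy 2-bit ADPCM-like decoder: each code adds a scaled level."""
    levels = (-3, -1, 1, 3)               # mid-rise 2-bit quantizer levels
    out = []
    for c in codes:
        pred = pred + levels[c] * step
        out.append(pred)
    return out

def best_window(samples, weights, pred=0, step=1):
    """Try every code vector for the window; keep the one whose
    reconstruction error has the least frequency-weighted energy."""
    n = len(samples)
    best, best_err = None, float('inf')
    for codes in product(range(4), repeat=n):     # 4**n candidates
        rec = decode(codes, pred, step)
        spec = dft([r - s for r, s in zip(rec, samples)])
        err = sum(w * abs(b) ** 2 for w, b in zip(weights, spec))
        if err < best_err:
            best_err, best = err, codes
    return best, best_err
```

With n = 4 this is 256 candidates and runs instantly; at n = 32 the same loop would visit 4^32 ≈ 1.8 × 10^19 candidates, which is the prohibitive cost described above.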
The open question is now: how to reduce the complexity of this?
Saturday, October 21, 2006, 04:28 PM - Blog

Here it is: my first post on my new (and first) blog.
I finally decided to set up a blog, in order to voice my opinion and express my ideas about things that do not fit into mp3-tech.org.
I wanted a simple flat-file blog, so its data would be easier to retrieve than from a MySQL database. I finally settled on simplePhpBlog. I first tested it on a local AMP stack, and everything was fine. However, when I installed it on my webhost, it was clearly not working properly. I lost a few hours figuring out how to get it working. It turns out that if you are hosted on online.net, you first need to create a "sessions" directory at the root of your website yourself.
Now, I still have to set up a few links so this blog is reachable from my personal page.