
Psychoacoustic Models in TwoLAME
================================


Introduction
------------

In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which
parts of the waveform being encoded are sonically important.  The PAM
looks for loud sounds which may mask soft sounds, noise which may affect the level
of sounds nearby, sounds which are too soft for us to hear and should be ignored
and so on.  The information from the PAM is used to determine which parts of the
spectrum should get more bits and thus be encoded at greater quality - and which
parts are inaudible/unimportant and should thus get fewer bits.

In MPEG Audio Layer II encoding, 1152 sound samples are read in - this constitutes
a 'frame'. For each frame the PAM outputs just *32* values, one per subband
(each value is the Signal to Masking Ratio [SMR] in that subband). This is important!
There are only 32 values to determine how to allocate bits for 1152 samples - this
is a pretty coarse technique.
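
To get a feel for how coarse this is, here is a minimal Python sketch of how 32
SMR values could steer bit allocation. It is not TwoLAME's code: the function
name `allocate_bits`, the flat 6.02 dB-per-bit rule of thumb and the `max_bits`
cap are all assumptions made for illustration. A real Layer II encoder works
from allocation tables and also has to pay for scalefactors and header bits.

```python
SAMPLES_PER_FRAME = 1152
SUBBANDS = 32
# Each SMR value ends up governing this many subband samples per frame.
SAMPLES_PER_SUBBAND = SAMPLES_PER_FRAME // SUBBANDS


def allocate_bits(smr_db, bit_pool, db_per_bit=6.02, max_bits=15):
    """Greedy sketch: hand out bits one at a time to the neediest subband.

    smr_db     -- one Signal to Masking Ratio (dB) per subband, from the PAM
    bit_pool   -- number of bits to hand out (assumed unit: bits per sample)
    db_per_bit -- assumed SNR gain per extra bit (rule of thumb, not a table)
    """
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        candidates = [sb for sb in range(len(smr_db)) if bits[sb] < max_bits]
        if not candidates:
            break
        # Mask-to-noise ratio = quantiser SNR - SMR.  The lowest MNR marks the
        # subband where quantisation noise is most likely to be heard.
        worst = min(candidates,
                    key=lambda sb: bits[sb] * db_per_bit - smr_db[sb])
        bits[worst] += 1
    return bits


if __name__ == "__main__":
    print(SAMPLES_PER_SUBBAND)
    print(allocate_bits([12.0, 0.0, -6.0], bit_pool=3))
```

With SMR values of 12, 0 and -6 dB and three bits to spend, the sketch gives two
bits to the first subband and one to the second. The third gets nothing because
its signal already sits below the masking threshold.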

The different PAMs listed below use different techniques to decide on these 32
values. Some models are better than others - meaning that the 32 values chosen
are pretty good at spreading the bits where they should go.  Even with a really
bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time.
All of these models have strengths and weaknesses.  The model you end up using
will be the one that produces the best sound for your ears, for your audio.

Psychoacoustic Model -1
-----------------------

This PAM doesn't actually look at the samples being encoded to decide upon the
output values.  There is simply a set of 32 default values which are used,
regardless of input.
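
In code, the whole idea fits in a few lines. The sketch below is illustrative
only: `FIXED_SMR_DB` is a made-up table (not the values TwoLAME ships) and
`psycho_fixed` is a hypothetical name.

```python
SBLIMIT = 32  # number of subbands in Layer II

# Hypothetical default table: generous at low frequencies, tapering off
# towards the top.  These numbers are invented for the example.
FIXED_SMR_DB = [max(30.0 - 1.5 * sb, -10.0) for sb in range(SBLIMIT)]


def psycho_fixed(samples):
    """Return the same 32 SMR values no matter what the input looks like."""
    # `samples` is accepted only so the function has the same shape as a
    # real PAM; it is never inspected.
    return list(FIXED_SMR_DB)
```

Silence and full-scale noise get exactly the same 32 values, which is the
whole point (and the whole problem) of this model.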

*Pros*: Faaaast. Low complexity. Surprisingly good.
"Surprising" in that the other PAMs go to the effort of calculating FFTs
and subbands and masking, and this one does absolutely *nothing*.
Zip. Nada. Diddly Squat. This model might be the best example of why
it is hard to make a good model - if having no computations sounds OK,
how do you improve on it?

*Cons*: Absolutely no attempt to consider any of the masking effects that
would help the audio sound better.


Psychoacoustic Model 0
----------------------

This PAM looks at the sizes of the 'scalefactors' for the audio and combines
them with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.
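
The rough shape of such a model can be sketched as follows. This is not the
TwoLAME implementation: the function names, the use of a normalised per-subband
peak in place of a real scalefactor index, and the assumption that full scale
corresponds to 96 dB are all illustrative guesses. The ATH curve is the commonly
quoted Terhardt approximation.

```python
import math


def ath_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt)."""
    f = max(freq_hz, 20.0) / 1000.0  # kHz, clamped to avoid blowing up at 0
    return (3.64 * f ** (-0.8)
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 0.001 * f ** 4)


def psycho_level_vs_ath(subband_peaks, sample_rate=44100, full_scale_db=96.0):
    """One SMR per subband: estimated level minus the lowest ATH in the band.

    subband_peaks -- peak magnitude per subband, normalised to 0..1
                     (a stand-in for what a scalefactor encodes)
    """
    n = len(subband_peaks)
    width = sample_rate / 2.0 / n  # bandwidth of one subband in Hz
    smr = []
    for sb, peak in enumerate(subband_peaks):
        # Sample the ATH at a few points across the band and keep the minimum.
        ath = min(ath_db((sb + k / 8.0) * width) for k in range(9))
        level = full_scale_db + 20.0 * math.log10(max(peak, 1e-10))
        smr.append(level - ath)
    return smr
```

Note that nothing here models masking between sounds. The only perceptual input
is the threshold in quiet, which is why this kind of model is cheap.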

*Pros*: Faaast. Low complexity.

*Cons*: This model has absolutely no mathematical basis and does not use
any perceptual model of hearing.  It simply juggles some of the numbers of
the input sound to determine the values. Feel free to hack the daylights out
of this PAM - add multipliers, constants, log-tables *anything*. Tweak it until
you begin to like the sound.


Psychoacoustic Models 1 and 2
-----------------------------

These PAMs are from the ISO standard. Being the standard doesn't mean that
they are any good. Look at LAME, which basically threw out the MP3 standard
psycho models and made its own (GPSYCHO).

*Pros*: A reference for future PAMs.

*Cons*: Terrible ISO code, buggy tables, poor documentation.


Psychoacoustic Model 3
----------------------

A re-implementation of psychoacoustic model 1.  ISO 11172 was used as the guide
for re-writing this PAM from the ground up.

*Pros*: No more obscure tables of values from the ISO code. Hopefully a good
base to work from for tweaking PAMs.

*Cons*: At the moment, doesn't really sound any better than PAM1.


Psychoacoustic Model 4
----------------------

A cleaned up version of PAM2.

*Pros*: Faster than PAM2. No more obscure tables of values from the ISO
standard. Hopefully a good base to work from for improving the PAMs.

*Cons*: Still has the same "warbling"/"Davros" problems as PAM2.


Future psychoacoustic models
----------------------------

There's a heap that could be done. Unfortunately, I've got a set of tin
ears, crappy speakers and a noisy computer room.  If you've got the
capability to do proper PAM testing then please feel free to do so.
Otherwise, I'll just keep plodding along with new ideas as they
arise, such as:

- Temporal masking (there's no pre-echo control or anything like it in TwoLAME)
- Left/right masking
- A PAM that's fully tuneable from the command line?
- Graphical output of SMR values etc. Would allow better debugging of PAMs
- Re-sampling routines
- Low/High pass filtering