Psychoacoustic Models in TwoLAME
================================


Introduction
------------

In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which
parts of the waveform being encoded are sonically important. The PAM looks for
loud sounds which may mask soft sounds, noise which may affect the level of
sounds nearby, sounds which are too soft for us to hear and should be ignored,
and so on. The information from the PAM is used to determine which parts of the
spectrum should get more bits and thus be encoded at greater quality - and which
parts are inaudible/unimportant and should thus get fewer bits.

In MPEG Audio Layer II encoding, 1152 sound samples are read in - this constitutes
a 'frame'. For each frame the PAM outputs just *32* values: the Signal to Masking
Ratio [SMR] for each of the 32 subbands. This is important! There are only 32
values to determine how to allocate bits for 1152 samples - this is a pretty
coarse technique.

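To put numbers on that coarseness, here is the arithmetic as a small Python
sketch. Python is used purely for illustration (TwoLAME itself is written in C),
and the function names and the 44.1 kHz sample rate are assumptions made for the
example. It relies only on the frame size and subband count given above, plus
the fact that the 32 Layer II subbands are of equal width.

```python
SAMPLES_PER_FRAME = 1152   # samples read in per Layer II frame
NUM_SUBBANDS = 32          # the PAM produces one SMR value per subband


def samples_per_smr_value():
    """How many samples in a frame share a single SMR value."""
    return SAMPLES_PER_FRAME // NUM_SUBBANDS


def subband_width_hz(sample_rate_hz):
    """Width of one subband: 32 equal slices between 0 Hz and Nyquist."""
    return (sample_rate_hz / 2) / NUM_SUBBANDS


print(samples_per_smr_value())    # 36
print(subband_width_hz(44100))    # 689.0625
```

So at 44.1 kHz each SMR value has to stand in for 36 samples and a slice of
spectrum roughly 689 Hz wide.
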
The different PAMs listed below use different techniques to decide on these 32
values. Some models are better than others, meaning that the 32 values they
choose do a better job of putting the bits where they should go. Even with a
really bad PAM (e.g. Model -1) you can still get satisfactory results a lot of
the time. All of these models have strengths and weaknesses. The model *you* end
up using will be the one that produces the best sound for your ears, for your
audio.

Psychoacoustic Model -1
-----------------------

This PAM doesn't actually look at the samples being encoded to decide upon the
output values. There is simply a set of 32 default values which are used,
regardless of input.

*Pros*: Faaaast. Low complexity. Surprisingly good.
"Surprising" in that the other PAMs go to the effort of calculating FFTs
and subbands and masking, and this one does absolutely *nothing*.
Zip. Nada. Diddly Squat. This model might be the best example of why
it is hard to make a good model - if doing no computation at all sounds OK,
how do you improve on it?

*Cons*: Absolutely no attempt to consider any of the masking effects that
would help the audio sound better.


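The idea behind Model -1 fits in a few lines. The following Python sketch is
only an illustration: the function name is hypothetical and the table values
are made up (a simple ramp), NOT the defaults that TwoLAME actually ships.

```python
NUM_SUBBANDS = 32

# Hypothetical default SMR values in dB, NOT TwoLAME's real table:
# a plain ramp from 30 dB in the lowest subband down to -1 dB in the highest.
DEFAULT_SMR_DB = [30.0 - sb for sb in range(NUM_SUBBANDS)]


def psycho_minus1(frame_samples):
    """Return the same 32 SMR values for every frame.

    The samples are accepted only so the function has the same shape as a
    real PAM; they are never inspected.
    """
    return list(DEFAULT_SMR_DB)
```

Whatever frame is passed in, the result is identical, which is exactly why the
model costs next to nothing to run.
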
Psychoacoustic Model 0
----------------------

This PAM looks at the sizes of the 'scalefactors' for the audio and combines
them with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.

*Pros*: Faaast. Low complexity.

*Cons*: This model has absolutely no mathematical basis and does not use
any perceptual model of hearing. It simply juggles some of the numbers of
the input sound to determine the values. Feel free to hack the daylights out
of this PAM - add multipliers, constants, log-tables, *anything*. Tweak it until
you begin to like the sound.


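As a rough picture of what "scalefactors combined with the ATH" could look
like, here is a Python sketch. It is not TwoLAME's actual calculation. The
assumptions are: scalefactors are linear peak magnitudes in (0, 1], full scale
maps to 96 dB, the ATH comes from Terhardt's well-known approximation, and the
SMR is simply the subband level minus the ATH at the subband centre. All
function and parameter names are hypothetical.

```python
import math

NUM_SUBBANDS = 32


def ath_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    f = max(freq_hz, 10.0) / 1000.0   # in kHz; clamped so 0 Hz does not blow up
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def psycho_0(scalefactors, sample_rate_hz, full_scale_db=96.0):
    """Assumed Model-0-style SMR: subband level in dB minus ATH at its centre."""
    width = (sample_rate_hz / 2) / NUM_SUBBANDS
    smr = []
    for sb, sf in enumerate(scalefactors):
        centre_hz = (sb + 0.5) * width
        # Convert the linear scalefactor to dB relative to the assumed full scale.
        level_db = full_scale_db + 20.0 * math.log10(max(sf, 1e-10))
        smr.append(level_db - ath_db(centre_hz))
    return smr
```

In this sketch a subband whose scalefactor is ten times larger gets an SMR
20 dB higher, and subbands near 3 to 4 kHz, where the ATH is lowest, are
favoured over the extremes. The multipliers and constants are exactly the kind
of thing the text above invites you to tweak.
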
Psychoacoustic Models 1 and 2
-----------------------------

These PAMs are from the ISO standard. Being the standard doesn't mean that
they are any good. Look at LAME, which basically threw out the MP3 standard
psycho models and wrote its own (GPSYCHO).

*Pros*: A reference for future PAMs.

*Cons*: Terrible ISO code, buggy tables, poor documentation.


Psychoacoustic Model 3
----------------------

A re-implementation of psychoacoustic model 1. ISO 11172 was used as the guide
for re-writing this PAM from the ground up.

*Pros*: No more obscure tables of values from the ISO code. Hopefully a good
base to work upon for tweaking PAMs.

*Cons*: At the moment, doesn't really sound any better than PAM1.


Psychoacoustic Model 4
----------------------

A cleaned up version of PAM2.

*Pros*: Faster than PAM2. No more obscure tables of values from the ISO
standard. Hopefully a good base to work from for improving the PAMs.

*Cons*: Still has the same "warbling"/"Davros" problems as PAM2.



Future psychoacoustic models
----------------------------

There's a heap that could be done. Unfortunately, I've got a set of tin
ears, crappy speakers and a noisy computer room. If you've got the
capability to do proper PAM testing then please feel free to do so.
Otherwise, I'll just keep plodding along with new ideas as they
arise, such as:

- Temporal masking (there's no pre-echo control or anything in TwoLAME)
- Left/right masking
- A PAM that's fully tuneable from the command line?
- Graphical output of SMR values etc. Would allow better debugging of PAMs
- Re-sampling routines
- Low/High pass filtering

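One of the ideas above, graphical output of SMR values, can be approximated
with plain text. The Python sketch below is a hypothetical helper, not part of
TwoLAME: it takes any list of per-subband SMR values in dB and draws one bar
per subband, scaled between the smallest and largest value in the list.

```python
def smr_bar_chart(smr_db, width=40):
    """Render SMR values as a text bar chart, one line per subband."""
    lo, hi = min(smr_db), max(smr_db)
    span = (hi - lo) or 1.0          # avoid dividing by zero on a flat input
    lines = []
    for sb, value in enumerate(smr_db):
        bar = "#" * round((value - lo) / span * width)
        lines.append(f"{sb:2d} {value:7.2f} dB |{bar}")
    return "\n".join(lines)


# Example: a made-up rising set of 32 values.
print(smr_bar_chart([float(sb) for sb in range(32)]))
```

Printing such a chart for the same frame under two different PAMs would make it
easy to see where they disagree about which subbands deserve the bits.
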