mirror of
https://github.com/cookiengineer/audacity
synced 2025-05-04 17:49:45 +02:00
133 lines
5.4 KiB
HTML
133 lines
5.4 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
|
|
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
|
<meta name="generator" content="AsciiDoc 7.1.2" />
|
|
<link rel="stylesheet" href="./twolame.css" type="text/css" />
|
|
<link rel="stylesheet" href="./twolame-quirks.css" type="text/css" />
|
|
<title>Psychoacoustic Models in TwoLAME</title>
|
|
</head>
|
|
<body>
|
|
<div id="header">
|
|
<h1>Psychoacoustic Models in TwoLAME</h1>
|
|
<span id="revision">version 0.3.11</span>
|
|
</div>
|
|
<h2>Introduction</h2>
|
|
<div class="sectionbody">
|
|
<p>In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which
|
|
are the sonically important parts of the waveform that is being encoded. The PAM
|
|
looks for loud sounds which may mask soft sounds, noise which may affect the level
|
|
of sounds nearby, sounds which are too soft for us to hear and should be ignored
|
|
and so on. The information from the PAM is used to determine which parts of the
|
|
spectrum should get more bits and thus be encoded at greater quality - and which
|
|
parts are inaudible/unimportant and should thus get fewer bits.</p>
|
|
<p>In MPEG Audio LayerII encoding, 1152 sound samples are read in - this constitutes
|
|
a <em>frame</em>. For each frame the PAM outputs just <strong>32</strong> values
|
|
(The values are the Signal to Masking Ratio [SMR] in that subband). This is important!
|
|
There are only 32 values to determine how to alloctate bits for 1152 samples - this
|
|
is a pretty coarse technique.</p>
|
|
<p>The different PAMs listed below use different techniques to decide on these 32
|
|
values. Some models are better than others - meaning that the 32 values chosen
|
|
are pretty good at spreading the bits where they should go. Even with a really
|
|
bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time.
|
|
All of these models have strengths and weaknesses. The model <em>you</em> end up using
|
|
will be the one that produces the best sound for your ears, for your audio.</p>
|
|
</div>
|
|
<h2>Psychoacoustic Model -1</h2>
|
|
<div class="sectionbody">
|
|
<p>This PAM doesn't actually look at the samples being encoded to decide upon the
|
|
output values. There is simply a set of 32 default values which are used,
|
|
regardless of input.</p>
|
|
<p><strong>Pros</strong>: Faaaast. Low complexity. Surprisingly good.
|
|
"Surprising" in that the other PAMs go to the effort of calculating FFTs
|
|
and subbands and masking, and this one does absolutely <strong>nothing</strong>.
|
|
Zip. Nada. Diddly Squat. This model might be the best example of why
|
|
it is hard to make a good model - if having no computations sounds OK,
|
|
how do you improve on it?</p>
|
|
<p><strong>Cons</strong>: Absolutely no attempt to consider any of the masking effects that
|
|
would help the audio sound better.</p>
|
|
</div>
|
|
<h2>Psychoacoustic Model 0</h2>
|
|
<div class="sectionbody">
|
|
<p>This PAM looks at the sizes of the <em>scalefactors</em> for the audio and combines
|
|
it with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.</p>
|
|
<p><strong>Pros</strong>: Faaast. Low complexity.</p>
|
|
<p><strong>Cons</strong>: This model has absolutely no mathematical basis and does not use
|
|
any perceptual model of hearing. It simply juggles some of the numbers of
|
|
the input sound to determine the values. Feel free to hack the daylights out
|
|
of this PAM - add multipliers, constants, log-tables <strong>anything</strong>. Tweak it until
|
|
you begin to like the sound.</p>
|
|
</div>
|
|
<h2>Psychoacoustic Model 1 and 2</h2>
|
|
<div class="sectionbody">
|
|
<p>These PAMs are from the ISO standard. Just because they are the standard,
|
|
doesn't mean that they are any good. Look at LAME which basically threw out
|
|
the MP3 standard psycho models and made their own (GPSYCHO).</p>
|
|
<p><strong>Pros</strong>: A reference for future PAMs</p>
|
|
<p><strong>Cons</strong>: Terrible ISO code, buggy tables, poor documentation.</p>
|
|
</div>
|
|
<h2>Psychoacoustic Model 3</h2>
|
|
<div class="sectionbody">
|
|
<p>A re-implementation of psychoacoustic model 1. ISO11172 was used as the guide
|
|
for re-writing this PAM from the ground up.</p>
|
|
<p><strong>Pros</strong>: No more obscure tables of values from the ISO code. Hopefully a good
|
|
base to work upon for tweaking PAMs</p>
|
|
<p><strong>Cons</strong>: At the moment, doesn't really sound any better than PAM1</p>
|
|
</div>
|
|
<h2>Psychoacoustic Model 4</h2>
|
|
<div class="sectionbody">
|
|
<p>A cleaned up version of PAM2.</p>
|
|
<p><strong>Pros</strong>: Faster than PAM2. No more obscure tables of values from the ISO
|
|
standard. Hopefully a good base to work from for improving the PAMs</p>
|
|
<p><strong>Cons</strong>: Still has the same "warbling"/"Davros" problems as PAM2.</p>
|
|
</div>
|
|
<h2>Future psychoacoustic models</h2>
|
|
<div class="sectionbody">
|
|
<p>There's a heap that could be done. Unfortunately, I've got a set of tin
|
|
ears, crappy speakers and a noisy computer room. If you've got the
|
|
capability to do proper PAM testing then please feel free to do so.
|
|
Otherwise, I'll just keep plodding along with new ideas as they
|
|
arise, such as:</p>
|
|
<ul>
|
|
<li>
|
|
<p>
|
|
Temporal masking (there's no pre-echo or anything in TwoLAME)
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
Left Right Masking
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
A PAM that's fully tuneable from the command line?
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
Graphical output of SMR values etc. Would allow better debugging of PAMs
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
Re-sampling routines
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
Low/High pass filtering
|
|
</p>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
<div id="footer">
|
|
<div id="footer-text">
|
|
Version 0.3.11<br />
|
|
Last updated 09-Jan-2008 11:45:18 BST
|
|
</div>
|
|
</div>
|
|
</body>
|
|
</html>
|