mirror of
https://github.com/cookiengineer/audacity
synced 2025-05-05 14:18:53 +02:00
155 lines
6.6 KiB
HTML
155 lines
6.6 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
|
|
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
|
|
<meta name="generator" content="AsciiDoc 8.6.3" />
|
|
<title>Psychoacoustic Models in TwoLAME</title>
|
|
<link rel="stylesheet" href="./twolame.css" type="text/css" />
|
|
<script type="text/javascript">
|
|
/*<![CDATA[*/
|
|
window.onload = function(){asciidoc.footnotes();}
|
|
/*]]>*/
|
|
</script>
|
|
<script type="text/javascript" src="./asciidoc-xhtml11.js"></script>
|
|
</head>
|
|
<body class="article">
|
|
<div id="header">
|
|
<h1>Psychoacoustic Models in TwoLAME</h1>
|
|
<span id="revnumber">version 0.3.13</span>
|
|
</div>
|
|
<div id="content">
|
|
<div class="sect1">
|
|
<h2 id="_introduction">Introduction</h2>
|
|
<div class="sectionbody">
|
|
<div class="paragraph"><p>In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which
|
|
are the sonically important parts of the waveform that is being encoded. The PAM
|
|
looks for loud sounds which may mask soft sounds, noise which may affect the level
|
|
of sounds nearby, sounds which are too soft for us to hear and should be ignored
|
|
and so on. The information from the PAM is used to determine which parts of the
|
|
spectrum should get more bits and thus be encoded at greater quality - and which
|
|
parts are inaudible/unimportant and should thus get fewer bits.</p></div>
|
|
<div class="paragraph"><p>In MPEG Audio LayerII encoding, 1152 sound samples are read in - this constitutes
|
|
a <em>frame</em>. For each frame the PAM outputs just <strong>32</strong> values
|
|
(The values are the Signal to Masking Ratio [SMR] in that subband). This is important!
|
|
There are only 32 values to determine how to alloctate bits for 1152 samples - this
|
|
is a pretty coarse technique.</p></div>
|
|
<div class="paragraph"><p>The different PAMs listed below use different techniques to decide on these 32
|
|
values. Some models are better than others - meaning that the 32 values chosen
|
|
are pretty good at spreading the bits where they should go. Even with a really
|
|
bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time.
|
|
All of these models have strengths and weaknesses. The model <em>you</em> end up using
|
|
will be the one that produces the best sound for your ears, for your audio.</p></div>
|
|
</div>
|
|
</div>
|
|
<div class="sect1">
|
|
<h2 id="_psychoacoustic_model_1">Psychoacoustic Model -1</h2>
|
|
<div class="sectionbody">
|
|
<div class="paragraph"><p>This PAM doesn’t actually look at the samples being encoded to decide upon the
|
|
output values. There is simply a set of 32 default values which are used,
|
|
regardless of input.</p></div>
|
|
<div class="paragraph"><p><strong>Pros</strong>: Faaaast. Low complexity. Surprisingly good.
|
|
"Surprising" in that the other PAMs go to the effort of calculating FFTs
|
|
and subbands and masking, and this one does absolutely <strong>nothing</strong>.
|
|
Zip. Nada. Diddly Squat. This model might be the best example of why
|
|
it is hard to make a good model - if having no computations sounds OK,
|
|
how do you improve on it?</p></div>
|
|
<div class="paragraph"><p><strong>Cons</strong>: Absolutely no attempt to consider any of the masking effects that
|
|
would help the audio sound better.</p></div>
|
|
</div>
|
|
</div>
|
|
<div class="sect1">
|
|
<h2 id="_psychoacoustic_model_0">Psychoacoustic Model 0</h2>
|
|
<div class="sectionbody">
|
|
<div class="paragraph"><p>This PAM looks at the sizes of the <em>scalefactors</em> for the audio and combines
|
|
it with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.</p></div>
|
|
<div class="paragraph"><p><strong>Pros</strong>: Faaast. Low complexity.</p></div>
|
|
<div class="paragraph"><p><strong>Cons</strong>: This model has absolutely no mathematical basis and does not use
|
|
any perceptual model of hearing. It simply juggles some of the numbers of
|
|
the input sound to determine the values. Feel free to hack the daylights out
|
|
of this PAM - add multipliers, constants, log-tables <strong>anything</strong>. Tweak it until
|
|
you begin to like the sound.</p></div>
|
|
</div>
|
|
</div>
|
|
<div class="sect1">
|
|
<h2 id="_psychoacoustic_model_1_and_2">Psychoacoustic Model 1 and 2</h2>
|
|
<div class="sectionbody">
|
|
<div class="paragraph"><p>These PAMs are from the ISO standard. Just because they are the standard,
|
|
doesn’t mean that they are any good. Look at LAME which basically threw out
|
|
the MP3 standard psycho models and made their own (GPSYCHO).</p></div>
|
|
<div class="paragraph"><p><strong>Pros</strong>: A reference for future PAMs</p></div>
|
|
<div class="paragraph"><p><strong>Cons</strong>: Terrible ISO code, buggy tables, poor documentation.</p></div>
|
|
</div>
|
|
</div>
|
|
<div class="sect1">
|
|
<h2 id="_psychoacoustic_model_3">Psychoacoustic Model 3</h2>
|
|
<div class="sectionbody">
|
|
<div class="paragraph"><p>A re-implementation of psychoacoustic model 1. ISO11172 was used as the guide
|
|
for re-writing this PAM from the ground up.</p></div>
|
|
<div class="paragraph"><p><strong>Pros</strong>: No more obscure tables of values from the ISO code. Hopefully a good
|
|
base to work upon for tweaking PAMs</p></div>
|
|
<div class="paragraph"><p><strong>Cons</strong>: At the moment, doesn’t really sound any better than PAM1</p></div>
|
|
</div>
|
|
</div>
|
|
<div class="sect1">
|
|
<h2 id="_psychoacoustic_model_4">Psychoacoustic Model 4</h2>
|
|
<div class="sectionbody">
|
|
<div class="paragraph"><p>A cleaned up version of PAM2.</p></div>
|
|
<div class="paragraph"><p><strong>Pros</strong>: Faster than PAM2. No more obscure tables of values from the ISO
|
|
standard. Hopefully a good base to work from for improving the PAMs</p></div>
|
|
<div class="paragraph"><p><strong>Cons</strong>: Still has the same "warbling"/"Davros" problems as PAM2.</p></div>
|
|
</div>
|
|
</div>
|
|
<div class="sect1">
|
|
<h2 id="_future_psychoacoustic_models">Future psychoacoustic models</h2>
|
|
<div class="sectionbody">
|
|
<div class="paragraph"><p>There’s a heap that could be done. Unfortunately, I’ve got a set of tin
|
|
ears, crappy speakers and a noisy computer room. If you’ve got the
|
|
capability to do proper PAM testing then please feel free to do so.
|
|
Otherwise, I’ll just keep plodding along with new ideas as they
|
|
arise, such as:</p></div>
|
|
<div class="ulist"><ul>
|
|
<li>
|
|
<p>
|
|
Temporal masking (there’s no pre-echo or anything in TwoLAME)
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
Left Right Masking
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
A PAM that’s fully tuneable from the command line?
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
Graphical output of SMR values etc. Would allow better debugging of PAMs
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
Re-sampling routines
|
|
</p>
|
|
</li>
|
|
<li>
|
|
<p>
|
|
Low/High pass filtering
|
|
</p>
|
|
</li>
|
|
</ul></div>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
<div id="footnotes"><hr /></div>
|
|
<div id="footer">
|
|
<div id="footer-text">
|
|
Version 0.3.13<br />
|
|
Last updated 2011-01-01 19:15:01 GMT
|
|
</div>
|
|
</div>
|
|
</body>
|
|
</html>
|