mirror of
				https://github.com/cookiengineer/audacity
				synced 2025-10-26 15:23:48 +01:00 
			
		
		
		
	
		
			
				
	
	
		
			155 lines
		
	
	
		
			6.6 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
 | |
|     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
 | |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
 | |
| <head>
 | |
| <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
 | |
| <meta name="generator" content="AsciiDoc 8.6.3" />
 | |
| <title>Psychoacoustic Models in TwoLAME</title>
 | |
| <link rel="stylesheet" href="./twolame.css" type="text/css" />
 | |
| <script type="text/javascript">
 | |
| /*<![CDATA[*/
 | |
| window.onload = function(){asciidoc.footnotes();}
 | |
| /*]]>*/
 | |
| </script>
 | |
| <script type="text/javascript" src="./asciidoc-xhtml11.js"></script>
 | |
| </head>
 | |
| <body class="article">
 | |
| <div id="header">
 | |
| <h1>Psychoacoustic Models in TwoLAME</h1>
 | |
| <span id="revnumber">version 0.3.13</span>
 | |
| </div>
 | |
| <div id="content">
 | |
| <div class="sect1">
 | |
| <h2 id="_introduction">Introduction</h2>
 | |
| <div class="sectionbody">
 | |
| <div class="paragraph"><p>In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which
 | |
| are the sonically important parts of the waveform that is being encoded.  The PAM
 | |
| looks for loud sounds which may mask soft sounds, noise which may affect the level
 | |
| of sounds nearby, sounds which are too soft for us to hear and should be ignored
 | |
| and so on.  The information from the PAM is used to determine which parts of the
 | |
| spectrum should get more bits and thus be encoded at greater quality - and which
 | |
| parts are inaudible/unimportant and should thus get fewer bits.</p></div>
 | |
| <div class="paragraph"><p>In MPEG Audio Layer II encoding, 1152 sound samples are read in - this constitutes
 | |
| a <em>frame</em>. For each frame the PAM outputs just <strong>32</strong> values:
 | |
| one Signal to Masking Ratio (SMR) for each of the 32 subbands. This is important!
 | |
| There are only 32 values to determine how to allocate bits for 1152 samples - this
 | |
| is a pretty coarse technique.</p></div>
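<div class="paragraph"><p>To put numbers on how coarse this is, the arithmetic can be sketched in a few lines of Python. The names <tt>SAMPLES_PER_FRAME</tt>, <tt>SUBBANDS</tt> and <tt>frame_duration_ms</tt> are made up for this illustration and are not TwoLAME identifiers; the 44100 Hz sample rate is only an assumed example.</p></div>

```python
SAMPLES_PER_FRAME = 1152  # Layer II frame length, as stated in the text
SUBBANDS = 32             # the PAM produces one SMR value per subband

# Each SMR value has to describe this many subband samples.
samples_per_subband = SAMPLES_PER_FRAME // SUBBANDS


def frame_duration_ms(sample_rate_hz):
    """Duration of one frame in milliseconds at the given sample rate."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


if __name__ == "__main__":
    print(samples_per_subband)       # 36
    print(frame_duration_ms(44100))  # about 26.12
```

<div class="paragraph"><p>At 44100 Hz, each SMR value therefore has to stand in for 36 subband samples covering roughly 26 ms of audio.</p></div>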
 | |
| <div class="paragraph"><p>The different PAMs listed below use different techniques to decide on these 32
 | |
| values. Some models are better than others, meaning that the 32 values they choose
 | |
| do a better job of spreading the bits where they should go.  Even with a really
 | |
| bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time.
 | |
| All of these models have strengths and weaknesses.  The model <em>you</em> end up using
 | |
| will be the one that produces the best sound for your ears, for your audio.</p></div>
 | |
| </div>
 | |
| </div>
 | |
| <div class="sect1">
 | |
| <h2 id="_psychoacoustic_model_1">Psychoacoustic Model -1</h2>
 | |
| <div class="sectionbody">
 | |
| <div class="paragraph"><p>This PAM doesn’t actually look at the samples being encoded to decide upon the
 | |
| output values.  There is simply a set of 32 default values which are used,
 | |
| regardless of input.</p></div>
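<div class="paragraph"><p>As a rough sketch of the idea (not TwoLAME's code), a model of this kind boils down to a function that ignores its input. The name <tt>psycho_minus1</tt> and the numbers in <tt>DEFAULT_SMR_DB</tt> are placeholders invented for this example; TwoLAME's real default values are not reproduced here.</p></div>

```python
SUBBANDS = 32

# Placeholder values only, invented for illustration.
# They are NOT the defaults that TwoLAME actually ships.
DEFAULT_SMR_DB = [30.0 - 0.9 * sb for sb in range(SUBBANDS)]


def psycho_minus1(frame_samples):
    """Return the same 32 SMR values for any frame; the samples are never read."""
    return list(DEFAULT_SMR_DB)


if __name__ == "__main__":
    silence = [0] * 1152
    loud = [30000] * 1152
    # Both frames get an identical bit-allocation hint.
    print(psycho_minus1(silence) == psycho_minus1(loud))
```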
 | |
| <div class="paragraph"><p><strong>Pros</strong>: Faaaast. Low complexity. Surprisingly good.
 | |
| "Surprising" in that the other PAMs go to the effort of calculating FFTs
 | |
| and subbands and masking, and this one does absolutely <strong>nothing</strong>.
 | |
| Zip. Nada. Diddly Squat. This model might be the best example of why
 | |
| it is hard to make a good model - if having no computations sounds OK,
 | |
| how do you improve on it?</p></div>
 | |
| <div class="paragraph"><p><strong>Cons</strong>: Absolutely no attempt to consider any of the masking effects that
 | |
| would help the audio sound better.</p></div>
 | |
| </div>
 | |
| </div>
 | |
| <div class="sect1">
 | |
| <h2 id="_psychoacoustic_model_0">Psychoacoustic Model 0</h2>
 | |
| <div class="sectionbody">
 | |
| <div class="paragraph"><p>This PAM looks at the sizes of the <em>scalefactors</em> for the audio and combines
 | |
| them with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.</p></div>
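<div class="paragraph"><p>One way to picture "scalefactor plus ATH" is the toy calculation below. It is an illustration under stated assumptions, not the formula TwoLAME uses: the function names are hypothetical, the scalefactor level follows the MPEG-1 scalefactor table (2 times 2 to the power of minus index/3), the ATH curve is Terhardt's widely used approximation, and the 96 dB SPL full-scale reference is simply assumed.</p></div>

```python
import math

SUBBANDS = 32
FULL_SCALE_DB_SPL = 96.0  # assumption: digital full scale plays back at 96 dB SPL


def scalefactor_level_db(index):
    """Level in dB (relative to 1.0) of MPEG-1 scalefactor `index` (0..62)."""
    return 20.0 * math.log10(2.0 * 2.0 ** (-index / 3.0))


def ath_db_spl(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 0.001 * khz ** 4)


def toy_smr(scalefactor_indices, sample_rate_hz=44100):
    """One illustrative SMR value per subband: signal level minus ATH."""
    band_width_hz = sample_rate_hz / 2.0 / SUBBANDS
    smr = []
    for sb, index in enumerate(scalefactor_indices):
        centre_hz = (sb + 0.5) * band_width_hz
        level = scalefactor_level_db(index) + FULL_SCALE_DB_SPL
        smr.append(level - ath_db_spl(centre_hz))
    return smr


if __name__ == "__main__":
    # A smaller scalefactor index means a louder subband, hence a larger SMR.
    print(toy_smr([10] * SUBBANDS)[:4])
```

<div class="paragraph"><p>Every constant in this sketch is a candidate for the kind of tweaking described in the Cons below.</p></div>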
 | |
| <div class="paragraph"><p><strong>Pros</strong>: Faaast. Low complexity.</p></div>
 | |
| <div class="paragraph"><p><strong>Cons</strong>: This model has absolutely no mathematical basis and does not use
 | |
| any perceptual model of hearing.  It simply juggles some of the numbers of
 | |
| the input sound to determine the values. Feel free to hack the daylights out
 | |
| of this PAM - add multipliers, constants, log-tables, <strong>anything</strong>. Tweak it until
 | |
| you begin to like the sound.</p></div>
 | |
| </div>
 | |
| </div>
 | |
| <div class="sect1">
 | |
| <h2 id="_psychoacoustic_model_1_and_2">Psychoacoustic Models 1 and 2</h2>
 | |
| <div class="sectionbody">
 | |
| <div class="paragraph"><p>These PAMs are from the ISO standard. Being the standard
 | |
| does not mean that they are any good: LAME basically threw out
 | |
| the MP3 standard psycho models and made its own (GPSYCHO).</p></div>
 | |
| <div class="paragraph"><p><strong>Pros</strong>: A reference for future PAMs.</p></div>
 | |
| <div class="paragraph"><p><strong>Cons</strong>: Terrible ISO code, buggy tables, poor documentation.</p></div>
 | |
| </div>
 | |
| </div>
 | |
| <div class="sect1">
 | |
| <h2 id="_psychoacoustic_model_3">Psychoacoustic Model 3</h2>
 | |
| <div class="sectionbody">
 | |
| <div class="paragraph"><p>A re-implementation of psychoacoustic model 1.  ISO/IEC 11172 was used as the guide
 | |
| for re-writing this PAM from the ground up.</p></div>
 | |
| <div class="paragraph"><p><strong>Pros</strong>: No more obscure tables of values from the ISO code. Hopefully a good
 | |
| base to work upon for tweaking PAMs.</p></div>
 | |
| <div class="paragraph"><p><strong>Cons</strong>: At the moment, doesn’t really sound any better than PAM1.</p></div>
 | |
| </div>
 | |
| </div>
 | |
| <div class="sect1">
 | |
| <h2 id="_psychoacoustic_model_4">Psychoacoustic Model 4</h2>
 | |
| <div class="sectionbody">
 | |
| <div class="paragraph"><p>A cleaned up version of PAM2.</p></div>
 | |
| <div class="paragraph"><p><strong>Pros</strong>: Faster than PAM2. No more obscure tables of values from the ISO
 | |
| standard. Hopefully a good base to work from for improving the PAMs.</p></div>
 | |
| <div class="paragraph"><p><strong>Cons</strong>: Still has the same "warbling"/"Davros" problems as PAM2.</p></div>
 | |
| </div>
 | |
| </div>
 | |
| <div class="sect1">
 | |
| <h2 id="_future_psychoacoustic_models">Future psychoacoustic models</h2>
 | |
| <div class="sectionbody">
 | |
| <div class="paragraph"><p>There’s a heap that could be done. Unfortunately, I’ve got a set of tin
 | |
| ears, crappy speakers and a noisy computer room.  If you’ve got the
 | |
| capability to do proper PAM testing then please feel free to do so.
 | |
| Otherwise, I’ll just keep plodding along with new ideas as they
 | |
| arise, such as:</p></div>
 | |
| <div class="ulist"><ul>
 | |
| <li>
 | |
| <p>
 | |
| Temporal masking (there’s no pre-echo handling or anything like it in TwoLAME)
 | |
| </p>
 | |
| </li>
 | |
| <li>
 | |
| <p>
 | |
| Left Right Masking
 | |
| </p>
 | |
| </li>
 | |
| <li>
 | |
| <p>
 | |
| A PAM that’s fully tuneable from the command line?
 | |
| </p>
 | |
| </li>
 | |
| <li>
 | |
| <p>
 | |
| Graphical output of SMR values etc. Would allow better debugging of PAMs
 | |
| </p>
 | |
| </li>
 | |
| <li>
 | |
| <p>
 | |
| Re-sampling routines
 | |
| </p>
 | |
| </li>
 | |
| <li>
 | |
| <p>
 | |
| Low/High pass filtering
 | |
| </p>
 | |
| </li>
 | |
| </ul></div>
 | |
| </div>
 | |
| </div>
 | |
| </div>
 | |
| <div id="footnotes"><hr /></div>
 | |
| <div id="footer">
 | |
| <div id="footer-text">
 | |
| Version 0.3.13<br />
 | |
| Last updated 2011-01-01 19:15:01 GMT
 | |
| </div>
 | |
| </div>
 | |
| </body>
 | |
| </html>
 |