mirror of https://github.com/cookiengineer/audacity (synced 2025-10-31 14:13:50 +01:00)

Psychoacoustic Models in TwoLAME
================================

Introduction
------------

In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which
parts of the waveform being encoded are sonically important.  The PAM looks for
loud sounds that may mask soft sounds, noise that may affect the level of nearby
sounds, sounds that are too soft for us to hear and should be ignored, and so
on.  The information from the PAM is used to determine which parts of the
spectrum should get more bits and thus be encoded at greater quality, and which
parts are inaudible or unimportant and should thus get fewer bits.

In MPEG Audio Layer II encoding, 1152 sound samples are read in; this
constitutes a 'frame'.  For each frame the PAM outputs just *32* values, one per
subband (each value is the Signal to Masking Ratio [SMR] in that subband).
This is important!  There are only 32 values to determine how to allocate bits
for 1152 samples, so this is a pretty coarse technique.

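To put numbers on how coarse this is, here is a small sketch of the frame
arithmetic.  The 1152 and 32 come from the text above; the function name and the
choice of 44.1 kHz in the usage note are only illustrative assumptions.

```python
SAMPLES_PER_FRAME = 1152   # Layer II frame length, as stated above
NUM_SUBBANDS = 32          # the PAM outputs one SMR value per subband


def frame_stats(sample_rate_hz):
    """Return (samples per subband, frame length in ms, subband width in Hz).

    Hypothetical helper for illustration only; not part of TwoLAME.
    """
    samples_per_subband = SAMPLES_PER_FRAME // NUM_SUBBANDS
    frame_ms = 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz
    # The 32 subbands split the range from 0 Hz to Nyquist into equal slices.
    subband_width_hz = (sample_rate_hz / 2.0) / NUM_SUBBANDS
    return samples_per_subband, frame_ms, subband_width_hz


if __name__ == "__main__":
    print(frame_stats(44100))
```

At 44.1 kHz each SMR value has to stand in for 36 subband samples, roughly
26 ms of audio in a slice of spectrum about 689 Hz wide.
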
The different PAMs listed below use different techniques to decide on these 32
values.  Some models are better than others, meaning that the 32 values they
choose do a better job of spreading the bits where they should go.  Even with a
really bad PAM (e.g. Model -1) you can still get satisfactory results a lot of
the time.  All of these models have strengths and weaknesses.  The model you end
up using will be the one that produces the best sound for your ears, for your
audio.

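As a rough illustration of how 32 SMR values can steer the bits, here is a
minimal greedy allocation sketch.  It is not TwoLAME's allocator: the real
Layer II encoder works from allocation tables and also has to pay for
scalefactors and side information.  The function name, the "about 6 dB of SNR
per bit" rule of thumb and the 15-bit cap are all assumptions made for this
example.

```python
def allocate_bits(smr_db, total_bits, samples_per_subband=36,
                  max_bits=15, db_per_bit=6.02):
    """Greedy sketch: keep giving one more bit per sample to the subband
    whose quantisation noise is furthest above its masking threshold.

    All parameters besides smr_db are illustrative assumptions.
    """
    bits = [0] * len(smr_db)
    remaining = total_bits
    while remaining >= samples_per_subband:
        candidates = [sb for sb in range(len(bits)) if bits[sb] < max_bits]
        if not candidates:
            break
        # Mask-to-noise ratio = (approximate SNR of the quantiser) - SMR.
        # The lowest value marks the subband that sounds worst right now.
        worst = min(candidates,
                    key=lambda sb: bits[sb] * db_per_bit - smr_db[sb])
        bits[worst] += 1
        remaining -= samples_per_subband
    return bits


if __name__ == "__main__":
    print(allocate_bits([20.0, 0.0, -10.0], total_bits=36 * 5))
```

A subband with a high SMR soaks up bits first; a subband with a negative SMR
(already masked) gets nothing until everything else is in good shape.
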
Psychoacoustic Model -1
-----------------------

This PAM doesn't actually look at the samples being encoded to decide upon the
output values.  There is simply a set of 32 default values which are used,
regardless of input.

*Pros*: Faaaast. Low complexity. Surprisingly good.
"Surprising" in that the other PAMs go to the effort of calculating FFTs
and subbands and masking, and this one does absolutely *nothing*.
Zip. Nada. Diddly squat. This model might be the best example of why
it is hard to make a good model: if doing no computation at all sounds OK,
how do you improve on it?

*Cons*: Absolutely no attempt to consider any of the masking effects that
would help the audio sound better.

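The whole idea fits in a few lines.  The sketch below only shows the shape of
such a model; the table values are made up for the example and are not the
defaults that TwoLAME ships with.

```python
NUM_SUBBANDS = 32

# Hypothetical defaults: a gentle downward slope so that low subbands get
# more bits than high ones.  These are NOT TwoLAME's actual values.
DEFAULT_SMR_DB = [30.0 - 1.0 * sb for sb in range(NUM_SUBBANDS)]


def psycho_minus1(samples):
    """Return the same 32 SMR values no matter what the samples contain."""
    return list(DEFAULT_SMR_DB)


if __name__ == "__main__":
    silence = [0] * 1152
    print(psycho_minus1(silence)[:4])
```
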

Psychoacoustic Model 0
----------------------

This PAM looks at the sizes of the 'scalefactors' for the audio and combines
them with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.

*Pros*: Faaast. Low complexity.

*Cons*: This model has absolutely no mathematical basis and does not use
any perceptual model of hearing.  It simply juggles some of the numbers of
the input sound to determine the values.  Feel free to hack the daylights out
of this PAM: add multipliers, constants, log-tables, *anything*.  Tweak it until
you begin to like the sound.

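For anyone wanting a starting point to hack on, here is one way a
"scalefactor level minus ATH" model could look.  Treat it as a sketch under
assumptions rather than the code in TwoLAME: the ATH curve is the commonly used
Terhardt-style approximation, the scalefactor mapping is the Layer II one
(index i stands for 2 * 2^(-i/3)), and `FULL_SCALE_DB_SPL` and all function
names are invented for the example.

```python
import math

NUM_SUBBANDS = 32
FULL_SCALE_DB_SPL = 90.0  # hypothetical playback level for a full-scale signal


def ath_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL."""
    khz = max(freq_hz, 10.0) / 1000.0   # avoid the singularity at 0 Hz
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 0.001 * khz ** 4)


def subband_ath_min(sample_rate_hz, points=16):
    """Lowest ATH value found inside each of the 32 subbands."""
    width = sample_rate_hz / 2.0 / NUM_SUBBANDS
    return [min(ath_db(sb * width + (i + 0.5) * width / points)
                for i in range(points))
            for sb in range(NUM_SUBBANDS)]


def scalefactor_level_db(index):
    """Level in dB of Layer II scalefactor `index` (about 2 dB per step)."""
    return 20.0 * math.log10(2.0 * 2.0 ** (-index / 3.0))


def psycho_0(min_scale_index, sample_rate_hz=44100):
    """One SMR per subband: estimated signal level minus the ATH."""
    ath = subband_ath_min(sample_rate_hz)
    return [scalefactor_level_db(min_scale_index[sb]) + FULL_SCALE_DB_SPL - ath[sb]
            for sb in range(NUM_SUBBANDS)]


if __name__ == "__main__":
    print([round(v, 1) for v in psycho_0([10] * NUM_SUBBANDS)[:4]])
```

Every constant in there is fair game for tweaking, which is rather the point of
this model.
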

Psychoacoustic Model 1 and 2
----------------------------

These PAMs are from the ISO standard.  Just because they are the standard
doesn't mean that they are any good.  Look at LAME, which basically threw out
the MP3 standard psycho models and made its own (GPSYCHO).

*Pros*: A reference for future PAMs.

*Cons*: Terrible ISO code, buggy tables, poor documentation.


Psychoacoustic Model 3
----------------------

A re-implementation of psychoacoustic model 1.  ISO 11172 was used as the guide
for re-writing this PAM from the ground up.

*Pros*: No more obscure tables of values from the ISO code.  Hopefully a good
base to work from for tweaking PAMs.

*Cons*: At the moment, doesn't really sound any better than PAM1.


Psychoacoustic Model 4
----------------------

A cleaned up version of PAM2.

*Pros*: Faster than PAM2.  No more obscure tables of values from the ISO
standard.  Hopefully a good base to work from for improving the PAMs.

*Cons*: Still has the same "warbling"/"Davros" problems as PAM2.


Future psychoacoustic models
----------------------------

There's a heap that could be done.  Unfortunately, I've got a set of tin
ears, crappy speakers and a noisy computer room.  If you've got the
capability to do proper PAM testing then please feel free to do so.
Otherwise, I'll just keep plodding along with new ideas as they
arise, such as:

- Temporal masking (there's no pre-echo handling or anything in TwoLAME)
- Left/right masking
- A PAM that's fully tuneable from the command line?
- Graphical output of SMR values etc.  Would allow better debugging of PAMs
- Re-sampling routines
- Low/high pass filtering

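As a taste of the "graphical output" idea from the list above, even a
plain-text bar chart of the 32 SMR values makes it easier to see what a PAM is
doing from frame to frame.  This is only a sketch; the function name, the bar
width and the -20..40 dB display range are arbitrary choices for the example.

```python
def smr_bars(smr_db, width=40, lo=-20.0, hi=40.0):
    """Render one text bar per subband, clipped to the range [lo, hi] dB."""
    lines = []
    for sb, smr in enumerate(smr_db):
        clipped = min(max(smr, lo), hi)
        n = round((clipped - lo) / (hi - lo) * width)
        lines.append(f"{sb:2d} {smr:7.2f} dB |{'#' * n}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up SMR values, just to show the layout.
    print(smr_bars([25.0, 18.5, 7.0, -3.0, -15.0]))
```

Dumping one of these per frame for two different PAMs on the same audio would
be a cheap way to compare them side by side.
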