mirror of
https://github.com/cookiengineer/audacity
synced 2026-01-11 23:25:53 +01:00
Move library tree where it belongs
236
lib-src/twolame/doc/vbr.txt
Normal file
@@ -0,0 +1,236 @@
TwoLAME: MPEG Audio Layer II VBR
================================


Contents
--------

- Introduction
- Usage
- Bitrate Ranges for various Sampling frequencies
- Why can't the bitrate vary from 32kbps to 384kbps for every file?
  - Short Answer
  - Long Answer
- Tech Stuff

Introduction
------------

VBR mode works by selecting a different bitrate for each frame. Frames
which are harder to encode are allocated more bits, i.e. a higher bitrate.

LayerII VBR is a complete hack - the ISO standard actually says that decoders are not
required to support it. As a hack, its implementation is a pain to try and understand.
If you're mega-keen to get full range VBR working, either (a) send me money or (b) grab the
ISO standard and a C compiler and email me.

Usage
-----

twolame -v [level] inputfile outputfile

A level of 5 works very well for me.

The level value is a measure of quality - the higher
the level, the higher the average bitrate of the resultant file.
[See TECH STUFF for a better explanation of what the value does]

The confusing part of my implementation of LayerII VBR is that it's different from MP3 VBR.

- The range of bitrates used is controlled by the input sampling frequency. (See "Bitrate Ranges" below.)
- The tendency to use higher bitrates is governed by the <level>.

E.g. say you have a 44.1kHz stereo file. In VBR mode, the bitrate can range from 192 to 384 kbps.

Using "-v -5" will force the encoder to favour the lower bitrates.

Using "-v 5" will force the encoder to favour the upper bitrates.

The value can actually be *any* int: -27, 233, 47. The larger the number, the greater
the bias towards higher bitrates.

Bitrate Ranges
--------------

When making a VBR stream, the bitrate is only allowed to vary within
set limits:

48kHz
  Stereo: 112-384kbps   Mono: 56-192kbps

44.1kHz & 32kHz
  Stereo: 192-384kbps   Mono: 96-192kbps

24kHz, 22.05kHz & 16kHz
  Stereo/Mono: 8-160kbps

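The limits above can be read as a small lookup. The following is only a sketch in C: the struct and function names are made up for illustration and are not part of the TwoLAME code or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration; not a TwoLAME structure. */
typedef struct {
    int min_kbps;
    int max_kbps;
} vbr_range;

/* Fill *out with the VBR bitrate limits listed above.
 * Returns 0 on success, -1 for an unsupported rate or channel count. */
int vbr_bitrate_range(int sample_rate_hz, int channels, vbr_range *out)
{
    assert(out != NULL);
    if (channels != 1 && channels != 2)
        return -1;

    switch (sample_rate_hz) {
    case 48000:
        out->min_kbps = (channels == 2) ? 112 : 56;
        out->max_kbps = (channels == 2) ? 384 : 192;
        return 0;
    case 44100:
    case 32000:
        out->min_kbps = (channels == 2) ? 192 : 96;
        out->max_kbps = (channels == 2) ? 384 : 192;
        return 0;
    case 24000:
    case 22050:
    case 16000:
        /* Lower sampling frequencies: same range for mono and stereo. */
        out->min_kbps = 8;
        out->max_kbps = 160;
        return 0;
    default:
        return -1;
    }
}
```

For a 44.1kHz stereo input this yields 192 to 384 kbps, matching the example in the Usage section.
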
Why doesn't the VBR mode work the same as MP3VBR? The Short Answer
------------------------------------------------------------------

*Why can't the bitrate vary from 32kbps to 384kbps for every file?*

According to the standard (ISO/IEC 11172-3:1993), Section 2.4.2.3:

"In order to provide the smallest possible delay and complexity, the
decoder is not required to support a continuously variable bitrate when
in layer I or II. Layer III supports variable bitrate by switching the
bitrate index."

and

"For Layer II, not all combinations of total bitrate and mode are allowed."

Hence, most LayerII coders would not have been written with VBR in mind, and
LayerII VBR is a hack. It works for limited cases. Getting it to work to
the same extent as MP3-style VBR will be a major hack.

(If you *really* want better bitrate ranges, read "The Long Answer" and submit your mega-patch.)

Why doesn't the VBR mode work the same as MP3VBR? The Long Answer
-----------------------------------------------------------------

*Why can't the bitrate vary from 32kbps to 384kbps for every file?*

Reason 1: The standard limits the range
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As quoted above from the standard for 48/44.1/32kHz:

"For Layer II, not all combinations of total bitrate and mode are allowed. See
the following table."

Bitrate (kbps)   Allowed Modes
32               mono only
48               mono only
56               mono only
64               all modes
80               mono only
96               all modes
112              all modes
128              all modes
160              all modes
192              all modes
224              stereo only
256              stereo only
320              stereo only
384              stereo only

So based upon this table alone, you *could* have VBR stereo encoding which varies
smoothly from 96 to 384kbps. Or you could have VBR mono encoding which varies from
32 to 192kbps. But since the top and bottom bitrates don't apply to all modes, it would
be impossible to have a stereo file encoded from 32 to 384 kbps.

But this isn't what is really limiting the allowable bitrate range - the bit allocation
tables are the major hurdle.

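The allowed-modes table for 48/44.1/32kHz can be expressed as a simple validity check. This is a sketch only: the names below are invented for illustration, and "stereo only" is simplified here to mean "two channels".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not taken from the TwoLAME source. */
typedef enum { MODES_MONO_ONLY, MODES_ALL, MODES_STEREO_ONLY } allowed_modes;

typedef struct {
    int kbps;
    allowed_modes modes;
} bitrate_rule;

/* The Layer II table for 48/44.1/32kHz, as listed above. */
static const bitrate_rule RULES[] = {
    {32, MODES_MONO_ONLY},    {48, MODES_MONO_ONLY},
    {56, MODES_MONO_ONLY},    {64, MODES_ALL},
    {80, MODES_MONO_ONLY},    {96, MODES_ALL},
    {112, MODES_ALL},         {128, MODES_ALL},
    {160, MODES_ALL},         {192, MODES_ALL},
    {224, MODES_STEREO_ONLY}, {256, MODES_STEREO_ONLY},
    {320, MODES_STEREO_ONLY}, {384, MODES_STEREO_ONLY},
};

/* True if the total bitrate is permitted for the given channel count. */
bool layer2_combination_allowed(int kbps, int channels)
{
    assert(channels == 1 || channels == 2);
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        if (RULES[i].kbps != kbps)
            continue;
        switch (RULES[i].modes) {
        case MODES_MONO_ONLY:   return channels == 1;
        case MODES_STEREO_ONLY: return channels == 2;
        case MODES_ALL:         return true;
        }
    }
    return false; /* bitrate not in the table at all */
}
```

Stereo at 32kbps and mono at 384kbps both come back false, which is why no single range from 32 to 384 kbps can cover every file.
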
Reason 2: The bit allocation tables don't allow it
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From the standard, Section 2.4.3.3.1 "Bit allocation decoding":

"For different combinations of bitrate and sampling frequency, different bit
allocation tables exist."

These bit allocation tables are pre-determined tables (in Annex B of the standard) which
indicate

- how many bits to read for the initial data (2, 3 or 4)
- these bits are then used as an index back into the table to
  find the number of quantize levels for the samples in this subband

But the table used (and hence the number of bits and the calculated index) is different
for different combinations of bitrate and sampling frequency.

I will use Table B.2a as an example.

Table B.2a applies for the following combinations.

Sampling Freq (kHz)   Bitrates in kbps/channel [emphasis: this is a PER CHANNEL bitrate]
48                    56, 64, 80, 96, 112, 128, 160, 192
44.1                  56, 64, 80
32                    56, 64, 80

If we have a STEREO 48kHz input file, and we use this table, then the bitrates
we could calculate from this would be 112, 128, 160, 192, 224, 256, 320 and 384 kbps.

This table contains no information on how to encode stuff at bitrates less than 112kbps
(for a stereo file). You would have to load allocation table B.2c to encode stereo at
64kbps and 96kbps.

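The step from per-channel bitrates to total stereo bitrates is just a multiplication by the channel count. A small sketch, with array and function names invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Per-channel bitrates (kbps) covered by Table B.2a at 48kHz, as listed above. */
static const int B2A_48KHZ_PER_CHANNEL_KBPS[] = {56, 64, 80, 96, 112, 128, 160, 192};

#define B2A_48KHZ_COUNT \
    (sizeof B2A_48KHZ_PER_CHANNEL_KBPS / sizeof B2A_48KHZ_PER_CHANNEL_KBPS[0])

/* Hypothetical helper: write the total bitrates reachable with Table B.2a
 * at 48kHz into out[] and return how many entries were written. */
size_t b2a_total_bitrates_48khz(int channels, int *out, size_t out_len)
{
    assert(channels == 1 || channels == 2);
    assert(out != NULL);

    size_t n = 0;
    for (size_t i = 0; i < B2A_48KHZ_COUNT && n < out_len; i++)
        out[n++] = B2A_48KHZ_PER_CHANNEL_KBPS[i] * channels;
    return n;
}
```

With channels set to 2 the result starts at 112 and ends at 384, so nothing below 112kbps is reachable for stereo without a different table.
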
Since it would be a MAJOR piece of hacking to get the different tables shifted in and out
during the encoding process, once an allocation table is loaded *IT IS NOT CHANGED*.

Hence, the best table is picked at the start of the encoding process, and the encoder
is stuck with it for the rest of the encode.

For twolame-02j, I have picked the table it loads for different
sampling frequencies in order to optimize the range of bitrates possible.

48 kHz - Table B.2a
  Stereo Bitrate Range: 112 - 384
  Mono Bitrate Range:    56 - 192

44.1/32 kHz - Table B.2b
  Stereo Bitrate Range: 192 - 384
  Mono Bitrate Range:    96 - 192

24/22.05/16 kHz - LSF Table (Standard ISO/IEC 13818-3:1995 Annex B, Table B.1)
  There is only 1 table for the Lower Sampling Frequencies.
  All modes (mono and stereo) are allowable at all bitrates.
  So at the Lower Sampling Frequencies you *can* have a completely variable
  bitrate over the entire range.


Tech Stuff
----------

The VBR mode is mainly centered around the main_bit_allocation() and
a_bit_allocation() routines in encode.c.

The limited range of VBR is due to my particular implementation which restricts
ranges to within one alloc table (see tables B.2a, B.2b, B.2c and B.2d in ISO/IEC 11172-3).
The VBR range for 32/44.1kHz lies within table B.2b, and the 48kHz VBR range lies within table B.2a.

I'm not sure whether it is worth extending these ranges down to lower bitrates.
The work required to switch alloc tables *during* the encoding is major.

In the case of silence, it might be worth doing a quick check for very low signals
and writing a pre-calculated *blank* 32kbps frame. [probably also a lot of work].

How CBR works
-------------

- Use the psycho model to determine the MNRs for each subband
  [MNR = the ratio of "masking" to "noise"]
  (From an encoding perspective, a bigger MNR in a subband means that
  it sounds better since the noise is more masked)
- Calculate the available data bits (adb) for this bitrate.
- Based upon the MNR (Masking:Noise Ratio) values, allocate bits to each
  subband
- Keep increasing the bits to whichever subband currently has the min MNR
  value until we have no bits left.
- This mode does not guarantee that all the subbands are without noise,
  i.e. there may still be subbands with MNR less than 0.0 (noisy!)

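The CBR loop described above can be sketched as a greedy allocator. This is a deliberately simplified model and not the code in encode.c: it assumes every allocation step costs the same number of bits and raises the subband's MNR by a fixed amount, whereas the real encoder looks both values up in the allocation tables. All names and constants are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants for this sketch only. */
#define STEP_COST_BITS 36   /* assumed flat cost of one allocation step */
#define STEP_GAIN_DB   6.0  /* assumed MNR improvement per step, in dB */
#define MAX_STEPS      15   /* assumed cap on steps per subband */

/* Greedy CBR-style allocation: repeatedly give one more step to the
 * subband with the lowest MNR until the available data bits (adb)
 * cannot pay for another step. mnr[] is updated in place, steps[]
 * receives the number of steps given to each subband.
 * Returns the number of bits left over. */
int cbr_allocate(double *mnr, int *steps, size_t nbands, int adb)
{
    assert(mnr != NULL && steps != NULL);

    for (size_t i = 0; i < nbands; i++)
        steps[i] = 0;

    while (adb >= STEP_COST_BITS) {
        size_t worst = nbands; /* sentinel: no candidate found yet */
        for (size_t i = 0; i < nbands; i++) {
            if (steps[i] >= MAX_STEPS)
                continue;
            if (worst == nbands || mnr[i] < mnr[worst])
                worst = i;
        }
        if (worst == nbands)
            break; /* every subband is already at its cap */

        steps[worst]++;
        mnr[worst] += STEP_GAIN_DB;
        adb -= STEP_COST_BITS;
    }
    return adb;
}
```

Because the loop stops when the bits run out, a subband can finish with an MNR below 0.0, which is the "noisy" case mentioned in the last bullet.
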
How VBR works
-------------

- Pretend we have lots of bits to spare, and work out the bits which would
  raise the MNR in each subband to the level given by the argument on the
  command line "-v [int]"
- Pick the bitrate which has more bits than the required_bits we just calculated
- Call a_bit_allocation() to allocate the bits for that bitrate
- VBR "guarantees" that all subbands have MNR > VBRLEVEL or that we have
  reached the maximum bitrate.

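The bitrate selection step can be sketched as follows. A Layer II frame carries 1152 samples per channel, so a frame at a given bitrate holds roughly bitrate * 1152 / sample_rate bits. The sketch ignores header, CRC and padding bits, accepts a bitrate whose frame holds at least the required bits, and uses function names that are made up for illustration rather than taken from encode.c.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLES_PER_FRAME 1152 /* samples per channel in one Layer II frame */

/* Approximate size of one frame in bits (header/CRC/padding ignored). */
int frame_bits(int kbps, int sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (int)((long long)kbps * 1000 * SAMPLES_PER_FRAME / sample_rate_hz);
}

/* Hypothetical helper: bitrates_kbps[] must be sorted in ascending order
 * and contain only the bitrates covered by the loaded allocation table.
 * Returns the lowest bitrate whose frame can hold required_bits, or the
 * highest available bitrate if none is large enough. */
int vbr_pick_bitrate(const int *bitrates_kbps, size_t n,
                     int sample_rate_hz, int required_bits)
{
    assert(bitrates_kbps != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (frame_bits(bitrates_kbps[i], sample_rate_hz) >= required_bits)
            return bitrates_kbps[i];
    }
    return bitrates_kbps[n - 1]; /* maximum bitrate reached */
}
```

For a 48kHz stereo file the candidate list would be the Table B.2a totals (112 to 384 kbps); a frame needing 3000 bits would then be coded at 128kbps, since a 112kbps frame only holds 2688 bits.
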
FUTURE
------

- With this VBR mode, we know the bits aren't going to run out, so we can
  just assign them "greedily".
- VBR_a_bit_allocation() is yet to be written :)