mirror of
				https://github.com/cookiengineer/audacity
				synced 2025-10-26 07:13:49 +01:00 
			
		
		
		
	
		
			
				
	
	
		
			459 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			XML
		
	
	
	
	
	
			
		
		
	
	
			459 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			XML
		
	
	
	
	
	
| <?xml version="1.0" standalone="no"?>
 | |
| <!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
 | |
|                 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
 | |
| 
 | |
| ]>
 | |
| 
 | |
| <section id="vorbis-spec-codebook">
 | |
| <sectioninfo>
 | |
| <releaseinfo>
 | |
|  $Id: 03-codebook.xml,v 1.1.1.1 2004-11-13 16:51:21 mbrubeck Exp $
 | |
| </releaseinfo>
 | |
| </sectioninfo>
 | |
| <title>Probability Model and Codebooks</title>
 | |
| 
 | |
| <section>
 | |
| <title>Overview</title>
 | |
| 
 | |
| <para>
 | |
| Unlike practically every other mainstream audio codec, Vorbis has no
 | |
| statically configured probability model, instead packing all entropy
 | |
| decoding configuration, VQ and Huffman, into the bitstream itself in
 | |
| the third header, the codec setup header.  This packed configuration
 | |
| consists of multiple 'codebooks', each containing a specific
 | |
| Huffman-equivalent representation for decoding compressed codewords as
 | |
| well as an optional lookup table of output vector values to which a
 | |
| decoded Huffman value is applied as an offset, generating the final
 | |
| decoded output corresponding to a given compressed codeword.</para>
 | |
| 
 | |
| <section><title>Bitwise operation</title>
 | |
| <para>
 | |
| The codebook mechanism is built on top of the vorbis bitpacker. Both
 | |
| the codebooks themselves and the codewords they decode are unrolled 
 | |
| from a packet as a series of arbitrary-width values read from the 
 | |
| stream according to <xref linkend="vorbis-spec-bitpacking"/>.</para>
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section>
 | |
| <title>Packed codebook format</title>
 | |
| 
 | |
| <para>
 | |
| For purposes of the examples below, we assume that the storage
 | |
| system's native byte width is eight bits.  This is not universally
 | |
| true; see <xref linkend="vorbis-spec-bitpacking"/> for discussion 
 | |
| relating to non-eight-bit bytes.</para>
 | |
| 
 | |
| <section><title>codebook decode</title>
 | |
| 
 | |
| <para>
 | |
| A codebook begins with a 24 bit sync pattern, 0x564342:
 | |
| 
 | |
| <screen>
 | |
| byte 0: [ 0 1 0 0 0 0 1 0 ] (0x42)
 | |
| byte 1: [ 0 1 0 0 0 0 1 1 ] (0x43)
 | |
| byte 2: [ 0 1 0 1 0 1 1 0 ] (0x56)
 | |
| </screen></para>
 | |
| 
 | |
| <para>
 | |
| 16 bit <varname>[codebook_dimensions]</varname> and 24 bit <varname>[codebook_entries]</varname> fields:
 | |
| 
 | |
| <screen>
 | |
| 
 | |
| byte 3: [ X X X X X X X X ] 
 | |
| byte 4: [ X X X X X X X X ] [codebook_dimensions] (16 bit unsigned)
 | |
| 
 | |
| byte 5: [ X X X X X X X X ] 
 | |
| byte 6: [ X X X X X X X X ] 
 | |
| byte 7: [ X X X X X X X X ] [codebook_entries] (24 bit unsigned)
 | |
| 
 | |
| </screen></para>
 | |
| 
 | |
| <para>
 | |
| Next is the <varname>[ordered]</varname> bit flag:
 | |
| 
 | |
| <screen>
 | |
| 
 | |
| byte 8: [               X ] [ordered] (1 bit)
 | |
| 
 | |
| </screen></para>
 | |
| 
 | |
| <para>
 | |
| Each entry, numbering a
 | |
| total of <varname>[codebook_entries]</varname>, is assigned a codeword length.
 | |
| We now read the list of codeword lengths and store these lengths in
 | |
| the array <varname>[codebook_codeword_lengths]</varname>. Decode of lengths is
 | |
| according to whether the <varname>[ordered]</varname> flag is set or unset.
 | |
| 
 | |
| <itemizedlist>
 | |
| <listitem>
 | |
|   <para>If the <varname>[ordered]</varname> flag is unset, the codeword list is not
 | |
|   length ordered and the decoder needs to read each codeword length
 | |
|   one-by-one.</para> 
 | |
| 
 | |
|   <para>The decoder first reads one additional bit flag, the
 | |
|   <varname>[sparse]</varname> flag.  This flag determines whether or not the
 | |
|   codebook contains unused entries that are not to be included in the
 | |
|   codeword decode tree:
 | |
| 
 | |
| <screen>
 | |
| byte 8: [             X 1 ] [sparse] flag (1 bit)
 | |
| </screen></para>
 | |
| 
 | |
| <para>
 | |
|   The decoder now performs for each of the <varname>[codebook_entries]</varname>
 | |
|   codebook entries:
 | |
| 
 | |
| <screen>
 | |
|   
 | |
|   1) if([sparse] is set){
 | |
| 
 | |
|          2) [flag] = read one bit;
 | |
|          3) if([flag] is set){
 | |
| 
 | |
|               4) [length] = read a five bit unsigned integer;
 | |
|               5) codeword length for this entry is [length]+1;
 | |
| 
 | |
|             } else {
 | |
| 
 | |
|               6) this entry is unused.  mark it as such.
 | |
| 
 | |
|             }
 | |
| 
 | |
|      } else the sparse flag is not set {
 | |
| 
 | |
|         7) [length] = read a five bit unsigned integer;
 | |
|         8) the codeword length for this entry is [length]+1;
 | |
|         
 | |
|      }
 | |
| 
 | |
| </screen></para>
 | |
| </listitem>
 | |
| <listitem>
 | |
|   <para>If the <varname>[ordered]</varname> flag is set, the codeword list for this
 | |
|   codebook is encoded in ascending length order.  Rather than reading
 | |
|   a length for every codeword, the encoder reads the number of
 | |
|   codewords per length.  That is, beginning at entry zero:
 | |
| 
 | |
| <screen>
 | |
|   1) [current_entry] = 0;
 | |
|   2) [current_length] = read a five bit unsigned integer and add 1;
 | |
|   3) [number] = read <link linkend="vorbis-spec-ilog">ilog</link>([codebook_entries] - [current_entry]) bits as an unsigned integer
 | |
|   4) set the entries [current_entry] through [current_entry]+[number]-1, inclusive, 
 | |
|     of the [codebook_codeword_lengths] array to [current_length]
 | |
|   5) set [current_entry] to [number] + [current_entry]
 | |
|   6) increment [current_length] by 1
 | |
|   7) if [current_entry] is greater than [codebook_entries] ERROR CONDITION; 
 | |
|     the decoder will not be able to read this stream.
 | |
|   8) if [current_entry] is less than [codebook_entries], repeat process starting at 3)
 | |
|   9) done.
 | |
| </screen></para>
 | |
| </listitem>
 | |
| </itemizedlist>
 | |
| 
 | |
| After all codeword lengths have been decoded, the decoder reads the
 | |
| vector lookup table.  Vorbis I supports three lookup types:
 | |
| <orderedlist>
 | |
| <listitem>
 | |
| <simpara>No lookup</simpara>
 | |
| </listitem><listitem>
 | |
| <simpara>Implicitly populated value mapping (lattice VQ)</simpara>
 | |
| </listitem><listitem>
 | |
| <simpara>Explicitly populated value mapping (tessellated or 'foam'
 | |
| VQ)</simpara>
 | |
| </listitem>
 | |
| </orderedlist>
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| The lookup table type is read as a four bit unsigned integer:
 | |
| <screen>
 | |
|   1) [codebook_lookup_type] = read four bits as an unsigned integer
 | |
| </screen></para>
 | |
| 
 | |
| <para>
 | |
| Codebook decode precedes according to <varname>[codebook_lookup_type]</varname>:
 | |
| <itemizedlist>
 | |
| <listitem>
 | |
| <para>Lookup type zero indicates no lookup to be read.  Proceed past
 | |
| lookup decode.</para>
 | |
| </listitem><listitem>
 | |
| <para>Lookup types one and two are similar, differing only in the
 | |
| number of lookup values to be read.  Lookup type one reads a list of
 | |
| values that are permuted in a set pattern to build a list of vectors,
 | |
| each vector of order <varname>[codebook_dimensions]</varname> scalars.  Lookup
 | |
| type two builds the same vector list, but reads each scalar for each
 | |
| vector explicitly, rather than building vectors from a smaller list of
 | |
| possible scalar values.  Lookup decode proceeds as follows:
 | |
| 
 | |
| <screen>
 | |
|   1) [codebook_minimum_value] = <link linkend="vorbis-spec-float32_unpack">float32_unpack</link>( read 32 bits as an unsigned integer) 
 | |
|   2) [codebook_delta_value] = <link linkend="vorbis-spec-float32_unpack">float32_unpack</link>( read 32 bits as an unsigned integer) 
 | |
|   3) [codebook_value_bits] = read 4 bits as an unsigned integer and add 1
 | |
|   4) [codebook_sequence_p] = read 1 bit as a boolean flag
 | |
| 
 | |
|   if ( [codebook_lookup_type] is 1 ) {
 | |
|    
 | |
|      5) [codebook_lookup_values] = <link linkend="vorbis-spec-lookup1_values">lookup1_values</link>(<varname>[codebook_entries]</varname>, <varname>[codebook_dimensions]</varname> )
 | |
| 
 | |
|   } else {
 | |
| 
 | |
|      6) [codebook_lookup_values] = <varname>[codebook_entries]</varname> * <varname>[codebook_dimensions]</varname>
 | |
| 
 | |
|   }
 | |
| 
 | |
|   7) read a total of [codebook_lookup_values] unsigned integers of [codebook_value_bits] each; 
 | |
|      store these in order in the array [codebook_multiplicands]
 | |
| </screen></para>
 | |
| </listitem><listitem>
 | |
| <para>A <varname>[codebook_lookup_type]</varname> of greater than two is reserved
 | |
| and indicates a stream that is not decodable by the specification in this
 | |
| document.</para>
 | |
| </listitem>
 | |
| </itemizedlist>
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| An 'end of packet' during any read operation in the above steps is
 | |
| considered an error condition rendering the stream undecodable.</para>
 | |
| 
 | |
| <section><title>Huffman decision tree representation</title>
 | |
| 
 | |
| <para>
 | |
| The <varname>[codebook_codeword_lengths]</varname> array and
 | |
| <varname>[codebook_entries]</varname> value uniquely define the Huffman decision
 | |
| tree used for entropy decoding.</para>
 | |
| 
 | |
| <para>
 | |
| Briefly, each used codebook entry (recall that length-unordered
 | |
| codebooks support unused codeword entries) is assigned, in order, the
 | |
| lowest valued unused binary Huffman codeword possible.  Assume the
 | |
| following codeword length list:
 | |
| 
 | |
| <screen>
 | |
| entry 0: length 2
 | |
| entry 1: length 4
 | |
| entry 2: length 4
 | |
| entry 3: length 4
 | |
| entry 4: length 4
 | |
| entry 5: length 2
 | |
| entry 6: length 3
 | |
| entry 7: length 3
 | |
| </screen></para>
 | |
| 
 | |
| <para>
 | |
| Assigning codewords in order (lowest possible value of the appropriate
 | |
| length to highest) results in the following codeword list:
 | |
| 
 | |
| <screen>
 | |
| entry 0: length 2 codeword 00
 | |
| entry 1: length 4 codeword 0100
 | |
| entry 2: length 4 codeword 0101
 | |
| entry 3: length 4 codeword 0110
 | |
| entry 4: length 4 codeword 0111
 | |
| entry 5: length 2 codeword 10
 | |
| entry 6: length 3 codeword 110
 | |
| entry 7: length 3 codeword 111
 | |
| </screen></para>
 | |
| 
 | |
| 
 | |
| <note>
 | |
| <para>
 | |
| Unlike most binary numerical values in this document, we
 | |
| intend the above codewords to be read and used bit by bit from left to
 | |
| right, thus the codeword '001' is the bit string 'zero, zero, one'.
 | |
| When determining 'lowest possible value' in the assignment definition
 | |
| above, the leftmost bit is the MSb.</para>
 | |
| </note>
 | |
| 
 | |
| <para>
 | |
| It is clear that the codeword length list represents a Huffman
 | |
| decision tree with the entry numbers equivalent to the leaves numbered
 | |
| left-to-right:
 | |
| 
 | |
| <mediaobject>
 | |
| <imageobject>
 | |
|  <imagedata fileref="hufftree.png" format="PNG"/>
 | |
| </imageobject>
 | |
| <textobject>
 | |
|  <phrase>[huffman tree illustration]</phrase>
 | |
| </textobject>
 | |
| </mediaobject>
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| As we assign codewords in order, we see that each choice constructs a
 | |
| new leaf in the leftmost possible position.</para>
 | |
| 
 | |
| <para>
 | |
| Note that it's possible to underspecify or overspecify a Huffman tree
 | |
| via the length list.  In the above example, if codeword seven were
 | |
| eliminated, it's clear that the tree is unfinished:
 | |
| 
 | |
| <mediaobject>
 | |
| <imageobject>
 | |
|  <imagedata fileref="hufftree-under.png" format="PNG"/>
 | |
| </imageobject>
 | |
| <textobject>
 | |
|  <phrase>[underspecified huffman tree illustration]</phrase>
 | |
| </textobject>
 | |
| </mediaobject>
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| Similarly, in the original codebook, it's clear that the tree is fully
 | |
| populated and a ninth codeword is impossible.  Both underspecified and
 | |
| overspecified trees are an error condition rendering the stream
 | |
| undecodable.</para>
 | |
| 
 | |
| <para>
 | |
| Codebook entries marked 'unused' are simply skipped in the assigning
 | |
| process.  They have no codeword and do not appear in the decision
 | |
| tree, thus it's impossible for any bit pattern read from the stream to
 | |
| decode to that entry number.</para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section><title>VQ lookup table vector representation</title>
 | |
| 
 | |
| <para>
 | |
| Unpacking the VQ lookup table vectors relies on the following values:
 | |
| <programlisting>
 | |
| the [codebook_multiplicands] array
 | |
| [codebook_minimum_value]
 | |
| [codebook_delta_value]
 | |
| [codebook_sequence_p]
 | |
| [codebook_lookup_type]
 | |
| [codebook_entries]
 | |
| [codebook_dimensions]
 | |
| [codebook_lookup_values]
 | |
| </programlisting>
 | |
| </para>
 | |
| 
 | |
| <para>
 | |
| Decoding (unpacking) a specific vector in the vector lookup table
 | |
| proceeds according to <varname>[codebook_lookup_type]</varname>.  The unpacked
 | |
| vector values are what a codebook would return during audio packet
 | |
| decode in a VQ context.</para>
 | |
| 
 | |
| <section><title>Vector value decode: Lookup type 1</title>
 | |
| 
 | |
| <para>
 | |
| Lookup type one specifies a lattice VQ lookup table built
 | |
| algorithmically from a list of scalar values.  Calculate (unpack) the
 | |
| final values of a codebook entry vector from the entries in
 | |
| <varname>[codebook_multiplicands]</varname> as follows (<varname>[value_vector]</varname>
 | |
| is the output vector representing the vector of values for entry number
 | |
| <varname>[lookup_offset]</varname> in this codebook):
 | |
| 
 | |
| <screen>
 | |
|   1) [last] = 0;
 | |
|   2) [index_divisor] = 1;
 | |
|   3) iterate [i] over the range 0 ... [codebook_dimensions]-1 (once for each scalar value in the value vector) {
 | |
|        
 | |
|        4) [multiplicand_offset] = ( [lookup_offset] divided by [index_divisor] using integer 
 | |
|           division ) integer modulo [codebook_lookup_values]
 | |
| 
 | |
|        5) vector [value_vector] element [i] = 
 | |
|             ( [codebook_multiplicands] array element number [multiplicand_offset] ) *
 | |
|             [codebook_delta_value] + [codebook_minimum_value] + [last];
 | |
| 
 | |
|        6) if ( [codebook_sequence_p] is set ) then set [last] = vector [value_vector] element [i]
 | |
| 
 | |
|        7) [index_divisor] = [index_divisor] * [codebook_lookup_values]
 | |
| 
 | |
|      }
 | |
|  
 | |
|   8) vector calculation completed.
 | |
| </screen></para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section><title>Vector value decode: Lookup type 2</title>
 | |
| 
 | |
| <para>
 | |
| Lookup type two specifies a VQ lookup table in which each scalar in
 | |
| each vector is explicitly set by the <varname>[codebook_multiplicands]</varname>
 | |
| array in a one-to-one mapping.  Calculate [unpack] the
 | |
| final values of a codebook entry vector from the entries in
 | |
| <varname>[codebook_multiplicands]</varname> as follows (<varname>[value_vector]</varname>
 | |
| is the output vector representing the vector of values for entry number
 | |
| <varname>[lookup_offset]</varname> in this codebook):
 | |
| 
 | |
| <screen>
 | |
|   1) [last] = 0;
 | |
|   2) [multiplicand_offset] = [lookup_offset] * [codebook_dimensions]
 | |
|   3) iterate [i] over the range 0 ... [codebook_dimensions]-1 (once for each scalar value in the value vector) {
 | |
| 
 | |
|        4) vector [value_vector] element [i] = 
 | |
|             ( [codebook_multiplicands] array element number [multiplicand_offset] ) *
 | |
|             [codebook_delta_value] + [codebook_minimum_value] + [last];
 | |
| 
 | |
|        5) if ( [codebook_sequence_p] is set ) then set [last] = vector [value_vector] element [i] 
 | |
| 
 | |
|        6) increment [multiplicand_offset]
 | |
| 
 | |
|      }
 | |
|  
 | |
|   7) vector calculation completed.
 | |
| </screen></para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <section>
 | |
| <title>Use of the codebook abstraction</title>
 | |
| 
 | |
| <para>
 | |
| The decoder uses the codebook abstraction much as it does the
 | |
| bit-unpacking convention; a specific codebook reads a
 | |
| codeword from the bitstream, decoding it into an entry number, and then
 | |
| returns that entry number to the decoder (when used in a scalar
 | |
| entropy coding context), or uses that entry number as an offset into
 | |
| the VQ lookup table, returning a vector of values (when used in a context
 | |
| desiring a VQ value). Scalar or VQ context is always explicit; any call
 | |
| to the codebook mechanism requests either a scalar entry number or a
 | |
| lookup vector.</para>
 | |
| 
 | |
| <para>
 | |
| Note that VQ lookup type zero indicates that there is no lookup table;
 | |
| requesting decode using a codebook of lookup type 0 in any context
 | |
| expecting a vector return value (even in a case where a vector of
 | |
| dimension one) is forbidden.  If decoder setup or decode requests such
 | |
| an action, that is an error condition rendering the packet
 | |
| undecodable.</para>
 | |
| 
 | |
| <para>
 | |
| Using a codebook to read from the packet bitstream consists first of
 | |
| reading and decoding the next codeword in the bitstream. The decoder
 | |
| reads bits until the accumulated bits match a codeword in the
 | |
| codebook.  This process can be though of as logically walking the
 | |
| Huffman decode tree by reading one bit at a time from the bitstream,
 | |
| and using the bit as a decision boolean to take the 0 branch (left in
 | |
| the above examples) or the 1 branch (right in the above examples).
 | |
| Walking the tree finishes when the decode process hits a leaf in the
 | |
| decision tree; the result is the entry number corresponding to that
 | |
| leaf.  Reading past the end of a packet propagates the 'end-of-stream'
 | |
| condition to the decoder.</para>
 | |
| 
 | |
| <para>
 | |
| When used in a scalar context, the resulting codeword entry is the
 | |
| desired return value.</para>
 | |
| 
 | |
| <para>
 | |
| When used in a VQ context, the codeword entry number is used as an
 | |
| offset into the VQ lookup table.  The value returned to the decoder is
 | |
| the vector of scalars corresponding to this offset.</para>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| <!-- end section of probablity model and codebooks -->
 |