mirror of
				https://github.com/cookiengineer/audacity
				synced 2025-11-03 23:53:55 +01:00 
			
		
		
		
	
		
			
				
	
	
		
			453 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
			
		
		
	
	
			453 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
 | 
						|
%!TEX root = Vorbis_I_spec.tex
 | 
						|
% $Id$
 | 
						|
\section{Residue setup and decode} \label{vorbis:spec:residue}
 | 
						|
 | 
						|
\subsection{Overview}
 | 
						|
 | 
						|
A residue vector represents the fine detail of the audio spectrum of
 | 
						|
one channel in an audio frame after the encoder subtracts the floor
 | 
						|
curve and performs any channel coupling.  A residue vector may
 | 
						|
represent spectral lines, spectral magnitude, spectral phase or
 | 
						|
hybrids as mixed by channel coupling.  The exact semantic content of
 | 
						|
the vector does not matter to the residue abstraction.
 | 
						|
 | 
						|
Whatever the exact qualities, the Vorbis residue abstraction codes the
 | 
						|
residue vectors into the bitstream packet, and then reconstructs the
 | 
						|
vectors during decode.  Vorbis makes use of three different encoding
 | 
						|
variants (numbered 0, 1 and 2) of the same basic vector encoding
 | 
						|
abstraction.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsection{Residue format}
 | 
						|
 | 
						|
Residue format partitions each vector in the vector bundle into chunks,
 | 
						|
classifies each chunk, encodes the chunk classifications and finally
 | 
						|
encodes the chunks themselves using the the specific VQ arrangement
 | 
						|
defined for each selected classification.
 | 
						|
The exact interleaving and partitioning vary by residue encoding number,
 | 
						|
however the high-level process used to classify and encode the residue
 | 
						|
vector is the same in all three variants.
 | 
						|
 | 
						|
A set of coded residue vectors are all of the same length.  High level
 | 
						|
coding structure, ignoring for the moment exactly how a partition is
 | 
						|
encoded and simply trusting that it is, is as follows:
 | 
						|
 | 
						|
\begin{itemize}
 | 
						|
\item Each vector is partitioned into multiple equal sized chunks
 | 
						|
according to configuration specified.  If we have a vector size of
 | 
						|
\emph{n}, a partition size \emph{residue\_partition\_size}, and a total
 | 
						|
of \emph{ch} residue vectors, the total number of partitioned chunks
 | 
						|
coded is \emph{n}/\emph{residue\_partition\_size}*\emph{ch}.  It is
 | 
						|
important to note that the integer division truncates.  In the below
 | 
						|
example, we assume an example \emph{residue\_partition\_size} of 8.
 | 
						|
 | 
						|
\item Each partition in each vector has a classification number that
 | 
						|
specifies which of multiple configured VQ codebook setups are used to
 | 
						|
decode that partition.  The classification numbers of each partition
 | 
						|
can be thought of as forming a vector in their own right, as in the
 | 
						|
illustration below.  Just as the residue vectors are coded in grouped
 | 
						|
partitions to increase encoding efficiency, the classification vector
 | 
						|
is also partitioned into chunks.  The integer elements of each scalar
 | 
						|
in a classification chunk are built into a single scalar that
 | 
						|
represents the classification numbers in that chunk.  In the below
 | 
						|
example, the classification codeword encodes two classification
 | 
						|
numbers.
 | 
						|
 | 
						|
\item The values in a residue vector may be encoded monolithically in a
 | 
						|
single pass through the residue vector, but more often efficient
 | 
						|
codebook design dictates that each vector is encoded as the additive
 | 
						|
sum of several passes through the residue vector using more than one
 | 
						|
VQ codebook.  Thus, each residue value potentially accumulates values
 | 
						|
from multiple decode passes.  The classification value associated with
 | 
						|
a partition is the same in each pass, thus the classification codeword
 | 
						|
is coded only in the first pass.
 | 
						|
 | 
						|
\end{itemize}
 | 
						|
 | 
						|
 | 
						|
\begin{center}
 | 
						|
\includegraphics[width=\textwidth]{residue-pack}
 | 
						|
\captionof{figure}{illustration of residue vector format}
 | 
						|
\end{center}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsection{residue 0}
 | 
						|
 | 
						|
Residue 0 and 1 differ only in the way the values within a residue
 | 
						|
partition are interleaved during partition encoding (visually treated
 | 
						|
as a black box--or cyan box or brown box--in the above figure).
 | 
						|
 | 
						|
Residue encoding 0 interleaves VQ encoding according to the
 | 
						|
dimension of the codebook used to encode a partition in a specific
 | 
						|
pass.  The dimension of the codebook need not be the same in multiple
 | 
						|
passes, however the partition size must be an even multiple of the
 | 
						|
codebook dimension.
 | 
						|
 | 
						|
As an example, assume a partition vector of size eight, to be encoded
 | 
						|
by residue 0 using codebook sizes of 8, 4, 2 and 1:
 | 
						|
 | 
						|
\begin{programlisting}
 | 
						|
 | 
						|
            original residue vector: [ 0 1 2 3 4 5 6 7 ]
 | 
						|
 | 
						|
codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]
 | 
						|
 | 
						|
codebook dimensions = 4  encoded as: [ 0 2 4 6 ], [ 1 3 5 7 ]
 | 
						|
 | 
						|
codebook dimensions = 2  encoded as: [ 0 4 ], [ 1 5 ], [ 2 6 ], [ 3 7 ]
 | 
						|
 | 
						|
codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]
 | 
						|
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
It is worth mentioning at this point that no configurable value in the
 | 
						|
residue coding setup is restricted to a power of two.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsection{residue 1}
 | 
						|
 | 
						|
Residue 1 does not interleave VQ encoding.  It represents partition
 | 
						|
vector scalars in order.  As with residue 0, however, partition length
 | 
						|
must be an integer multiple of the codebook dimension, although
 | 
						|
dimension may vary from pass to pass.
 | 
						|
 | 
						|
As an example, assume a partition vector of size eight, to be encoded
 | 
						|
by residue 0 using codebook sizes of 8, 4, 2 and 1:
 | 
						|
 | 
						|
\begin{programlisting}
 | 
						|
 | 
						|
            original residue vector: [ 0 1 2 3 4 5 6 7 ]
 | 
						|
 | 
						|
codebook dimensions = 8  encoded as: [ 0 1 2 3 4 5 6 7 ]
 | 
						|
 | 
						|
codebook dimensions = 4  encoded as: [ 0 1 2 3 ], [ 4 5 6 7 ]
 | 
						|
 | 
						|
codebook dimensions = 2  encoded as: [ 0 1 ], [ 2 3 ], [ 4 5 ], [ 6 7 ]
 | 
						|
 | 
						|
codebook dimensions = 1  encoded as: [ 0 ], [ 1 ], [ 2 ], [ 3 ], [ 4 ], [ 5 ], [ 6 ], [ 7 ]
 | 
						|
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsection{residue 2}
 | 
						|
 | 
						|
Residue type two can be thought of as a variant of residue type 1.
 | 
						|
Rather than encoding multiple passed-in vectors as in residue type 1,
 | 
						|
the \emph{ch} passed in vectors of length \emph{n} are first
 | 
						|
interleaved and flattened into a single vector of length
 | 
						|
\emph{ch}*\emph{n}.  Encoding then proceeds as in type 1. Decoding is
 | 
						|
as in type 1 with decode interleave reversed. If operating on a single
 | 
						|
vector to begin with, residue type 1 and type 2 are equivalent.
 | 
						|
 | 
						|
\begin{center}
 | 
						|
\includegraphics[width=\textwidth]{residue2}
 | 
						|
\captionof{figure}{illustration of residue type 2}
 | 
						|
\end{center}
 | 
						|
 | 
						|
 | 
						|
\subsection{Residue decode}
 | 
						|
 | 
						|
\subsubsection{header decode}
 | 
						|
 | 
						|
Header decode for all three residue types is identical.
 | 
						|
\begin{programlisting}
 | 
						|
  1) [residue\_begin] = read 24 bits as unsigned integer
 | 
						|
  2) [residue\_end] = read 24 bits as unsigned integer
 | 
						|
  3) [residue\_partition\_size] = read 24 bits as unsigned integer and add one
 | 
						|
  4) [residue\_classifications] = read 6 bits as unsigned integer and add one
 | 
						|
  5) [residue\_classbook] = read 8 bits as unsigned integer
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
\varname{[residue\_begin]} and
 | 
						|
\varname{[residue\_end]} select the specific sub-portion of
 | 
						|
each vector that is actually coded; it implements akin to a bandpass
 | 
						|
where, for coding purposes, the vector effectively begins at element
 | 
						|
\varname{[residue\_begin]} and ends at
 | 
						|
\varname{[residue\_end]}.  Preceding and following values in
 | 
						|
the unpacked vectors are zeroed.  Note that for residue type 2, these
 | 
						|
values as well as \varname{[residue\_partition\_size]}apply to
 | 
						|
the interleaved vector, not the individual vectors before interleave.
 | 
						|
\varname{[residue\_partition\_size]} is as explained above,
 | 
						|
\varname{[residue\_classifications]} is the number of possible
 | 
						|
classification to which a partition can belong and
 | 
						|
\varname{[residue\_classbook]} is the codebook number used to
 | 
						|
code classification codewords.  The number of dimensions in book
 | 
						|
\varname{[residue\_classbook]} determines how many
 | 
						|
classification values are grouped into a single classification
 | 
						|
codeword.  Note that the number of entries and dimensions in book
 | 
						|
\varname{[residue\_classbook]}, along with
 | 
						|
\varname{[residue\_classifications]}, overdetermines to
 | 
						|
possible number of classification codewords.  
 | 
						|
If \varname{[residue\_classifications]}\^{}\varname{[residue\_classbook]}.dimensions
 | 
						|
exceeds \varname{[residue\_classbook]}.entries, the
 | 
						|
bitstream should be regarded to be undecodable.
 | 
						|
 | 
						|
Next we read a bitmap pattern that specifies which partition classes
 | 
						|
code values in which passes.
 | 
						|
 | 
						|
\begin{programlisting}
 | 
						|
  1) iterate [i] over the range 0 ... [residue\_classifications]-1 {
 | 
						|
 | 
						|
       2) [high\_bits] = 0
 | 
						|
       3) [low\_bits] = read 3 bits as unsigned integer
 | 
						|
       4) [bitflag] = read one bit as boolean
 | 
						|
       5) if ( [bitflag] is set ) then [high\_bits] = read five bits as unsigned integer
 | 
						|
       6) vector [residue\_cascade] element [i] = [high\_bits] * 8 + [low\_bits]
 | 
						|
     }
 | 
						|
  7) done
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
Finally, we read in a list of book numbers, each corresponding to
 | 
						|
specific bit set in the cascade bitmap.  We loop over the possible
 | 
						|
codebook classifications and the maximum possible number of encoding
 | 
						|
stages (8 in Vorbis I, as constrained by the elements of the cascade
 | 
						|
bitmap being eight bits):
 | 
						|
 | 
						|
\begin{programlisting}
 | 
						|
  1) iterate [i] over the range 0 ... [residue\_classifications]-1 {
 | 
						|
 | 
						|
       2) iterate [j] over the range 0 ... 7 {
 | 
						|
 | 
						|
            3) if ( vector [residue\_cascade] element [i] bit [j] is set ) {
 | 
						|
 | 
						|
                 4) array [residue\_books] element [i][j] = read 8 bits as unsigned integer
 | 
						|
 | 
						|
               } else {
 | 
						|
 | 
						|
                 5) array [residue\_books] element [i][j] = unused
 | 
						|
 | 
						|
               }
 | 
						|
          }
 | 
						|
      }
 | 
						|
 | 
						|
  6) done
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
An end-of-packet condition at any point in header decode renders the
 | 
						|
stream undecodable.  In addition, any codebook number greater than the
 | 
						|
maximum numbered codebook set up in this stream also renders the
 | 
						|
stream undecodable. All codebooks in array [residue\_books] are
 | 
						|
required to have a value mapping.  The presence of codebook in array
 | 
						|
[residue\_books] without a value mapping (maptype equals zero) renders
 | 
						|
the stream undecodable.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsubsection{packet decode}
 | 
						|
 | 
						|
Format 0 and 1 packet decode is identical except for specific
 | 
						|
partition interleave.  Format 2 packet decode can be built out of the
 | 
						|
format 1 decode process.  Thus we describe first the decode
 | 
						|
infrastructure identical to all three formats.
 | 
						|
 | 
						|
In addition to configuration information, the residue decode process
 | 
						|
is passed the number of vectors in the submap bundle and a vector of
 | 
						|
flags indicating if any of the vectors are not to be decoded.  If the
 | 
						|
passed in number of vectors is 3 and vector number 1 is marked 'do not
 | 
						|
decode', decode skips vector 1 during the decode loop.  However, even
 | 
						|
'do not decode' vectors are allocated and zeroed.
 | 
						|
 | 
						|
Depending on the values of \varname{[residue\_begin]} and
 | 
						|
\varname{[residue\_end]}, it is obvious that the encoded
 | 
						|
portion of a residue vector may be the entire possible residue vector
 | 
						|
or some other strict subset of the actual residue vector size with
 | 
						|
zero padding at either uncoded end.  However, it is also possible to
 | 
						|
set \varname{[residue\_begin]} and
 | 
						|
\varname{[residue\_end]} to specify a range partially or
 | 
						|
wholly beyond the maximum vector size.  Before beginning residue
 | 
						|
decode, limit \varname{[residue\_begin]} and
 | 
						|
\varname{[residue\_end]} to the maximum possible vector size
 | 
						|
as follows.  We assume that the number of vectors being encoded,
 | 
						|
\varname{[ch]} is provided by the higher level decoding
 | 
						|
process.
 | 
						|
 | 
						|
\begin{programlisting}
 | 
						|
  1) [actual\_size] = current blocksize/2;
 | 
						|
  2) if residue encoding is format 2
 | 
						|
       3) [actual\_size] = [actual\_size] * [ch];
 | 
						|
  4) [limit\_residue\_begin] = maximum of ([residue\_begin],[actual\_size]);
 | 
						|
  5) [limit\_residue\_end] = maximum of ([residue\_end],[actual\_size]);
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
The following convenience values are conceptually useful to clarifying
 | 
						|
the decode process:
 | 
						|
 | 
						|
\begin{programlisting}
 | 
						|
  1) [classwords\_per\_codeword] = [codebook\_dimensions] value of codebook [residue\_classbook]
 | 
						|
  2) [n\_to\_read] = [limit\_residue\_end] - [limit\_residue\_begin]
 | 
						|
  3) [partitions\_to\_read] = [n\_to\_read] / [residue\_partition\_size]
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
Packet decode proceeds as follows, matching the description offered earlier in the document.
 | 
						|
\begin{programlisting}
 | 
						|
  1) allocate and zero all vectors that will be returned.
 | 
						|
  2) if ([n\_to\_read] is zero), stop; there is no residue to decode.
 | 
						|
  3) iterate [pass] over the range 0 ... 7 {
 | 
						|
 | 
						|
       4) [partition\_count] = 0
 | 
						|
 | 
						|
       5) while [partition\_count] is less than [partitions\_to\_read]
 | 
						|
 | 
						|
            6) if ([pass] is zero) {
 | 
						|
 | 
						|
                 7) iterate [j] over the range 0 .. [ch]-1 {
 | 
						|
 | 
						|
                      8) if vector [j] is not marked 'do not decode' {
 | 
						|
 | 
						|
                           9) [temp] = read from packet using codebook [residue\_classbook] in scalar context
 | 
						|
                          10) iterate [i] descending over the range [classwords\_per\_codeword]-1 ... 0 {
 | 
						|
 | 
						|
                               11) array [classifications] element [j],([i]+[partition\_count]) =
 | 
						|
                                   [temp] integer modulo [residue\_classifications]
 | 
						|
                               12) [temp] = [temp] / [residue\_classifications] using integer division
 | 
						|
 | 
						|
                              }
 | 
						|
 | 
						|
                         }
 | 
						|
 | 
						|
                    }
 | 
						|
 | 
						|
               }
 | 
						|
 | 
						|
           13) iterate [i] over the range 0 .. ([classwords\_per\_codeword] - 1) while [partition\_count]
 | 
						|
               is also less than [partitions\_to\_read] {
 | 
						|
 | 
						|
                 14) iterate [j] over the range 0 .. [ch]-1 {
 | 
						|
 | 
						|
                      15) if vector [j] is not marked 'do not decode' {
 | 
						|
 | 
						|
                           16) [vqclass] = array [classifications] element [j],[partition\_count]
 | 
						|
                           17) [vqbook] = array [residue\_books] element [vqclass],[pass]
 | 
						|
                           18) if ([vqbook] is not 'unused') {
 | 
						|
 | 
						|
                                19) decode partition into output vector number [j], starting at scalar
 | 
						|
                                    offset [limit\_residue\_begin]+[partition\_count]*[residue\_partition\_size] using
 | 
						|
                                    codebook number [vqbook] in VQ context
 | 
						|
                          }
 | 
						|
                     }
 | 
						|
 | 
						|
                 20) increment [partition\_count] by one
 | 
						|
 | 
						|
               }
 | 
						|
          }
 | 
						|
     }
 | 
						|
 | 
						|
 21) done
 | 
						|
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
An end-of-packet condition during packet decode is to be considered a
 | 
						|
nominal occurrence.  Decode returns the result of vector decode up to
 | 
						|
that point.
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsubsection{format 0 specifics}
 | 
						|
 | 
						|
Format zero decodes partitions exactly as described earlier in the
 | 
						|
'Residue Format: residue 0' section.  The following pseudocode
 | 
						|
presents the same algorithm. Assume:
 | 
						|
 | 
						|
\begin{itemize}
 | 
						|
\item  \varname{[n]} is the value in \varname{[residue\_partition\_size]}
 | 
						|
\item \varname{[v]} is the residue vector
 | 
						|
\item \varname{[offset]} is the beginning read offset in [v]
 | 
						|
\end{itemize}
 | 
						|
 | 
						|
 | 
						|
\begin{programlisting}
 | 
						|
 1) [step] = [n] / [codebook\_dimensions]
 | 
						|
 2) iterate [i] over the range 0 ... [step]-1 {
 | 
						|
 | 
						|
      3) vector [entry\_temp] = read vector from packet using current codebook in VQ context
 | 
						|
      4) iterate [j] over the range 0 ... [codebook\_dimensions]-1 {
 | 
						|
 | 
						|
           5) vector [v] element ([offset]+[i]+[j]*[step]) =
 | 
						|
	        vector [v] element ([offset]+[i]+[j]*[step]) +
 | 
						|
                vector [entry\_temp] element [j]
 | 
						|
 | 
						|
         }
 | 
						|
 | 
						|
    }
 | 
						|
 | 
						|
  6) done
 | 
						|
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsubsection{format 1 specifics}
 | 
						|
 | 
						|
Format 1 decodes partitions exactly as described earlier in the
 | 
						|
'Residue Format: residue 1' section.  The following pseudocode
 | 
						|
presents the same algorithm. Assume:
 | 
						|
 | 
						|
\begin{itemize}
 | 
						|
\item  \varname{[n]} is the value in
 | 
						|
\varname{[residue\_partition\_size]}
 | 
						|
\item \varname{[v]} is the residue vector
 | 
						|
\item \varname{[offset]} is the beginning read offset in [v]
 | 
						|
\end{itemize}
 | 
						|
 | 
						|
 | 
						|
\begin{programlisting}
 | 
						|
 1) [i] = 0
 | 
						|
 2) vector [entry\_temp] = read vector from packet using current codebook in VQ context
 | 
						|
 3) iterate [j] over the range 0 ... [codebook\_dimensions]-1 {
 | 
						|
 | 
						|
      4) vector [v] element ([offset]+[i]) =
 | 
						|
	  vector [v] element ([offset]+[i]) +
 | 
						|
          vector [entry\_temp] element [j]
 | 
						|
      5) increment [i]
 | 
						|
 | 
						|
    }
 | 
						|
 | 
						|
  6) if ( [i] is less than [n] ) continue at step 2
 | 
						|
  7) done
 | 
						|
\end{programlisting}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
\subsubsection{format 2 specifics}
 | 
						|
 | 
						|
Format 2 is reducible to format 1.  It may be implemented as an additional step prior to and an additional post-decode step after a normal format 1 decode.
 | 
						|
 | 
						|
 | 
						|
Format 2 handles 'do not decode' vectors differently than residue 0 or
 | 
						|
1; if all vectors are marked 'do not decode', no decode occurrs.
 | 
						|
However, if at least one vector is to be decoded, all the vectors are
 | 
						|
decoded.  We then request normal format 1 to decode a single vector
 | 
						|
representing all output channels, rather than a vector for each
 | 
						|
channel.  After decode, deinterleave the vector into independent vectors, one for each output channel.  That is:
 | 
						|
 | 
						|
\begin{enumerate}
 | 
						|
 \item If all vectors 0 through \emph{ch}-1 are marked 'do not decode', allocate and clear a single vector \varname{[v]}of length \emph{ch*n} and skip step 2 below; proceed directly to the post-decode step.
 | 
						|
 \item Rather than performing format 1 decode to produce \emph{ch} vectors of length \emph{n} each, call format 1 decode to produce a single vector \varname{[v]} of length \emph{ch*n}.
 | 
						|
 \item Post decode: Deinterleave the single vector \varname{[v]} returned by format 1 decode as described above into \emph{ch} independent vectors, one for each outputchannel, according to:
 | 
						|
  \begin{programlisting}
 | 
						|
  1) iterate [i] over the range 0 ... [n]-1 {
 | 
						|
 | 
						|
       2) iterate [j] over the range 0 ... [ch]-1 {
 | 
						|
 | 
						|
            3) output vector number [j] element [i] = vector [v] element ([i] * [ch] + [j])
 | 
						|
 | 
						|
          }
 | 
						|
     }
 | 
						|
 | 
						|
  4) done
 | 
						|
  \end{programlisting}
 | 
						|
 | 
						|
\end{enumerate}
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 | 
						|
 |