mirror of
https://github.com/cookiengineer/audacity
synced 2025-06-22 07:10:06 +02:00
449 lines
16 KiB
XML
449 lines
16 KiB
XML
<?xml version="1.0" standalone="no"?>
|
|
<!DOCTYPE section PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
|
|
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
|
|
|
|
]>
|
|
|
|
<section id="vorbis-spec-floor1">
|
|
<sectioninfo>
|
|
<releaseinfo>
|
|
$Id: 07-floor1.xml,v 1.1.1.1 2004-11-13 16:51:21 mbrubeck Exp $
|
|
</releaseinfo>
|
|
</sectioninfo>
|
|
<title>Floor type 1 setup and decode</title>
|
|
|
|
<section>
|
|
<title>Overview</title>
|
|
|
|
<para>
|
|
Vorbis floor type one uses a piecewise straight-line representation to
|
|
encode a spectral envelope curve. The representation plots this curve
|
|
mechanically on a linear frequency axis and a logarithmic (dB)
|
|
amplitude axis. The integer plotting algorithm used is similar to
|
|
Bresenham's algorithm.</para>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>Floor 1 format</title>
|
|
|
|
<section><title>model</title>
|
|
|
|
<para>
|
|
Floor type one represents a spectral curve as a series of
|
|
line segments. Synthesis constructs a floor curve using iterative
|
|
prediction in a process roughly equivalent to the following simplified
|
|
description:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><simpara> the first line segment (base case) is a logical line spanning
|
|
from x_0,y_0 to x_1,y_1 where in the base case x_0=0 and x_1=[n], the
|
|
full range of the spectral floor to be computed.</simpara></listitem>
|
|
|
|
<listitem><simpara>the induction step chooses a point x_new within an existing
|
|
logical line segment and produces a y_new value at that point computed
|
|
from the existing line's y value at x_new (as plotted by the line) and
|
|
a difference value decoded from the bitstream packet.</simpara></listitem>
|
|
|
|
<listitem><simpara>floor computation produces two new line segments, one running from
|
|
x_0,y_0 to x_new,y_new and from x_new,y_new to x_1,y_1. This step is
|
|
performed logically even if y_new represents no change to the
|
|
amplitude value at x_new so that later refinement is additionally
|
|
bounded at x_new.</simpara></listitem>
|
|
|
|
<listitem><simpara>the induction step repeats, using a list of x values specified in
|
|
the codec setup header at floor 1 initialization time. Computation
|
|
is completed at the end of the x value list.</simpara></listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
Consider the following example, with values chosen for ease of
|
|
understanding rather than representing typical configuration:</para>
|
|
|
|
<para>
|
|
For the below example, we assume a floor setup with an [n] of 128.
|
|
The list of selected X values in increasing order is
|
|
0,16,32,48,64,80,96,112 and 128. In list order, the values interleave
|
|
as 0, 128, 64, 32, 96, 16, 48, 80 and 112. The corresponding
|
|
list-order Y values as decoded from an example packet are 110, 20, -5,
|
|
-45, 0, -25, -10, 30 and -10. We compute the floor in the following
|
|
way, beginning with the first line:</para>
|
|
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="floor1-1.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>[graph of example floor]</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
|
|
<para>
|
|
We now draw new logical lines to reflect the correction to new_Y, and
|
|
iterate for X positions 32 and 96:</para>
|
|
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="floor1-2.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>[graph of example floor]</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
|
|
<para>
|
|
Although the new Y value at X position 96 is unchanged, it is still
|
|
used later as an endpoint for further refinement. From here on, the
|
|
pattern should be clear; we complete the floor computation as follows:</para>
|
|
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="floor1-3.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>[graph of example floor]</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
|
|
<mediaobject>
|
|
<imageobject>
|
|
<imagedata fileref="floor1-4.png" format="PNG"/>
|
|
</imageobject>
|
|
<textobject>
|
|
<phrase>[graph of example floor]</phrase>
|
|
</textobject>
|
|
</mediaobject>
|
|
|
|
|
|
<para>
|
|
A more efficient algorithm with carefully defined integer rounding
|
|
behavior is used for actual decode, as described later. The actual
|
|
algorithm splits Y value computation and line plotting into two steps
|
|
with modifications to the above algorithm to eliminate noise
|
|
accumulation through integer roundoff/truncation. </para>
|
|
|
|
</section>
|
|
|
|
<section><title>header decode</title>
|
|
|
|
<para>
|
|
A list of floor X values is stored in the packet header in interleaved
|
|
format (used in list order during packet decode and synthesis). This
|
|
list is split into partitions, and each partition is assigned to a
|
|
partition class. X positions 0 and [n] are implicit and do not belong
|
|
to an explicit partition or partition class.</para>
|
|
|
|
<para>
|
|
A partition class consists of a representation vector width (the
|
|
number of Y values which the partition class encodes at once), a
|
|
'subclass' value representing the number of alternate entropy books
|
|
the partition class may use in representing Y values, the list of
|
|
[subclass] books and a master book used to encode which alternate
|
|
books were chosen for representation in a given packet. The
|
|
master/subclass mechanism is meant to be used as a flexible
|
|
representation cascade while still using codebooks only in a scalar
|
|
context.</para>
|
|
|
|
<screen>
|
|
|
|
1) [floor1_partitions] = read 5 bits as unsigned integer
|
|
2) [maximum_class] = -1
|
|
3) iterate [i] over the range 0 ... [floor1_partitions]-1 {
|
|
|
|
4) vector [floor1_partition_class_list] element [i] = read 4 bits as unsigned integer
|
|
|
|
}
|
|
|
|
5) [maximum_class] = largest integer scalar value in vector [floor1_partition_class_list]
|
|
6) iterate [i] over the range 0 ... [maximum_class] {
|
|
|
|
7) vector [floor1_class_dimensions] element [i] = read 3 bits as unsigned integer and add 1
|
|
8) vector [floor1_class_subclasses] element [i] = read 2 bits as unsigned integer
|
|
9) if ( vector [floor1_class_subclasses] element [i] is nonzero ) {
|
|
|
|
10) vector [floor1_class_masterbooks] element [i] = read 8 bits as unsigned integer
|
|
|
|
}
|
|
|
|
11) iterate [j] over the range 0 ... (2 exponent [floor1_class_subclasses] element [i]) - 1 {
|
|
|
|
12) array [floor1_subclass_books] element [i],[j] =
|
|
read 8 bits as unsigned integer and subtract one
|
|
}
|
|
}
|
|
|
|
13) [floor1_multiplier] = read 2 bits as unsigned integer and add one
|
|
14) [rangebits] = read 4 bits as unsigned integer
|
|
15) vector [floor1_X_list] element [0] = 0
|
|
16) vector [floor1_X_list] element [1] = 2 exponent [rangebits];
|
|
17) [floor1_values] = 2
|
|
18) iterate [i] over the range 0 ... [floor1_partitions]-1 {
|
|
|
|
19) [current_class_number] = vector [floor1_partition_class_list] element [i]
|
|
20) iterate [j] over the range 0 ... ([floor1_class_dimensions] element [current_class_number])-1 {
|
|
21) vector [floor1_X_list] element ([j] + [floor1_values]) =
|
|
read [rangebits] bits as unsigned integer
|
|
22) increment [floor1_values] by one
|
|
}
|
|
}
|
|
|
|
23) done
|
|
</screen>
|
|
|
|
<para>
|
|
An end-of-packet condition while reading any aspect of a floor 1
|
|
configuration during setup renders a stream undecodable. In
|
|
addition, a <varname>[floor1_class_masterbooks]</varname> or
|
|
<varname>[floor1_subclass_books]</varname> scalar element greater than the
|
|
highest numbered codebook configured in this stream is an error
|
|
condition that renders the stream undecodable.</para>
|
|
|
|
<section id="vorbis-spec-floor1-decode">
|
|
<title>packet decode</title>
|
|
|
|
<para>
|
|
Packet decode begins by checking the <varname>[nonzero]</varname> flag:</para>
|
|
|
|
<screen>
|
|
1) [nonzero] = read 1 bit as boolean
|
|
</screen>
|
|
|
|
<para>
|
|
If <varname>[nonzero]</varname> is unset, that indicates this channel contained
|
|
no audio energy in this frame. Decode immediately returns a status
|
|
indicating this floor curve (and thus this channel) is unused this
|
|
frame. (A return status of 'unused' is different from decoding a
|
|
floor that has all points set to minimum representation amplitude,
|
|
which happens to be approximately -140dB).
|
|
</para>
|
|
|
|
<para>
|
|
Assuming <varname>[nonzero]</varname> is set, decode proceeds as follows:</para>
|
|
|
|
<screen>
|
|
1) [range] = vector { 256, 128, 86, 64 } element ([floor1_multiplier]-1)
|
|
2) vector [floor1_Y] element [0] = read <link linkend="vorbis-spec-ilog">ilog</link>([range]-1) bits as unsigned integer
|
|
3) vector [floor1_Y] element [1] = read <link linkend="vorbis-spec-ilog">ilog</link>([range]-1) bits as unsigned integer
|
|
4) [offset] = 2;
|
|
5) iterate [i] over the range 0 ... [floor1_partitions]-1 {
|
|
|
|
6) [class] = vector [floor1_partition_class] element [i]
|
|
7) [cdim] = vector [floor1_class_dimensions] element [class]
|
|
8) [cbits] = vector [floor1_class_subclasses] element [class]
|
|
9) [csub] = (2 exponent [cbits])-1
|
|
10) [cval] = 0
|
|
11) if ( [cbits] is greater than zero ) {
|
|
|
|
12) [cval] = read from packet using codebook number
|
|
(vector [floor1_class_masterbooks] element [class]) in scalar context
|
|
}
|
|
|
|
13) iterate [j] over the range 0 ... [cdim]-1 {
|
|
|
|
14) [book] = array [floor1_subclass_books] element [class],([cval] bitwise AND [csub])
|
|
15) [cval] = [cval] right shifted [cbits] bits
|
|
16) if ( [book] is not less than zero ) {
|
|
|
|
17) vector [floor1_Y] element ([j]+[offset]) = read from packet using codebook
|
|
[book] in scalar context
|
|
|
|
} else [book] is less than zero {
|
|
|
|
18) vector [floor1_Y] element ([j]+[offset]) = 0
|
|
|
|
}
|
|
}
|
|
|
|
19) [offset] = [offset] + [cdim]
|
|
|
|
}
|
|
|
|
20) done
|
|
</screen>
|
|
|
|
<para>
|
|
An end-of-packet condition during curve decode should be considered a
|
|
nominal occurrence; if end-of-packet is reached during any read
|
|
operation above, floor decode is to return 'unused' status as if the
|
|
<varname>[nonzero]</varname> flag had been unset at the beginning of decode.
|
|
</para>
|
|
|
|
<para>
|
|
Vector <varname>[floor1_Y]</varname> contains the values from packet decode
|
|
needed for floor 1 synthesis.</para>
|
|
|
|
</section>
|
|
|
|
<section id="vorbis-spec-floor1-synth">
|
|
<title>curve computation</title>
|
|
|
|
<para>
|
|
Curve computation is split into two logical steps; the first step
|
|
derives final Y amplitude values from the encoded, wrapped difference
|
|
values taken from the bitstream. The second step plots the curve
|
|
lines. Also, although zero-difference values are used in the
|
|
iterative prediction to find final Y values, these points are
|
|
conditionally skipped during final line computation in step two.
|
|
Skipping zero-difference values allows a smoother line fit. </para>
|
|
|
|
<para>
|
|
Although some aspects of the below algorithm look like inconsequential
|
|
optimizations, implementors are warned to follow the details closely.
|
|
Deviation from implementing a strictly equivalent algorithm can result
|
|
in serious decoding errors.</para>
|
|
|
|
<section>
|
|
<title>step 1: amplitude value synthesis</title>
|
|
|
|
<para>
|
|
Unwrap the always-positive-or-zero values read from the packet into
|
|
+/- difference values, then apply to line prediction.</para>
|
|
|
|
<screen>
|
|
1) [range] = vector { 256, 128, 86, 64 } element ([floor1_multiplier]-1)
|
|
2) vector [floor1_step2_flag] element [0] = set
|
|
3) vector [floor1_step2_flag] element [1] = set
|
|
4) vector [floor1_final_Y] element [0] = vector [floor1_Y] element [0]
|
|
5) vector [floor1_final_Y] element [1] = vector [floor1_Y] element [1]
|
|
6) iterate [i] over the range 2 ... [floor1_values]-1 {
|
|
|
|
7) [low_neighbor_offset] = <link linkend="vorbis-spec-low_neighbor">low_neighbor</link>([floor1_X_list],[i])
|
|
8) [high_neighbor_offset] = <link linkend="vorbis-spec-high_neighbor">high_neighbor</link>([floor1_X_list],[i])
|
|
|
|
9) [predicted] = <link linkend="vorbis-spec-render_point">render_point</link>( vector [floor1_X_list] element [low_neighbor_offset],
|
|
vector [floor1_final_Y] element [low_neighbor_offset],
|
|
vector [floor1_X_list] element [high_neighbor_offset],
|
|
vector [floor1_final_Y] element [high_neighbor_offset],
|
|
vector [floor1_X_list] element [i] )
|
|
|
|
10) [val] = vector [floor1_Y] element [i]
|
|
11) [highroom] = [range] - [predicted]
|
|
12) [lowroom] = [predicted]
|
|
13) if ( [highroom] is less than [lowroom] ) {
|
|
|
|
14) [room] = [highroom] * 2
|
|
|
|
} else [highroom] is not less than [lowroom] {
|
|
|
|
15) [root] = [lowroom] * 2
|
|
|
|
}
|
|
|
|
16) if ( [val] is nonzero ) {
|
|
|
|
17) vector [floor1_step2_flag] element [low_neighbor_offset] = set
|
|
18) vector [floor1_step2_flag] element [high_neighbor_offset] = set
|
|
19) vector [floor1_step2_flag] element [i] = set
|
|
20) if ( [val] is greater than or equal to [room] ) {
|
|
|
|
21) if ( [highroom] is greater than [lowroom] ) {
|
|
|
|
22) vector [floor1_final_Y] element [i] = [val] - [lowroom] + [predicted]
|
|
|
|
} else [highroom] is not greater than [lowroom] {
|
|
|
|
23) vector [floor1_final_Y] element [i] = [predicted] - [val] + [highroom] - 1
|
|
|
|
}
|
|
|
|
} else [val] is less than [room] {
|
|
|
|
24) if ([val] is odd) {
|
|
|
|
25) vector [floor1_final_Y] element [i] =
|
|
[predicted] - (([val] + 1) divided by 2 using integer division)
|
|
|
|
} else [val] is even {
|
|
|
|
26) vector [floor1_final_Y] element [i] =
|
|
[predicted] + ([val] / 2 using integer division)
|
|
|
|
}
|
|
|
|
}
|
|
|
|
} else [val] is zero {
|
|
|
|
27) vector [floor1_step2_flag] element [i] = unset
|
|
28) vector [floor1_final_Y] element [i] = [predicted]
|
|
|
|
}
|
|
|
|
}
|
|
|
|
29) done
|
|
|
|
</screen>
|
|
|
|
</section>
|
|
|
|
<section>
|
|
<title>step 2: curve synthesis</title>
|
|
|
|
<para>
|
|
Curve synthesis generates a return vector <varname>[floor]</varname> of length
|
|
<varname>[n]</varname> (where <varname>[n]</varname> is provided by the decode process
|
|
calling to floor decode). Floor 1 curve synthesis makes use of the
|
|
<varname>[floor1_X_list]</varname>, <varname>[floor1_final_Y]</varname> and
|
|
<varname>[floor1_step2_flag]</varname> vectors, as well as [floor1_multiplier]
|
|
and [floor1_values] values.</para>
|
|
|
|
<para>
|
|
Decode begins by sorting the scalars from vectors
|
|
<varname>[floor1_X_list]</varname>, <varname>[floor1_final_Y]</varname> and
|
|
<varname>[floor1_step2_flag]</varname> together into new vectors
|
|
<varname>[floor1_X_list]'</varname>, <varname>[floor1_final_Y]'</varname> and
|
|
<varname>[floor1_step2_flag]'</varname> according to ascending sort order of the
|
|
values in <varname>[floor1_X_list]</varname>. That is, sort the values of
|
|
<varname>[floor1_X_list]</varname> and then apply the same permutation to
|
|
elements of the other two vectors so that the X, Y and step2_flag
|
|
values still match.</para>
|
|
|
|
<para>
|
|
Then compute the final curve in one pass:</para>
|
|
|
|
<screen>
|
|
1) [hx] = 0
|
|
2) [lx] = 0
|
|
3) [ly] = vector [floor1_final_Y]' element [0] * [floor1_multiplier]
|
|
4) iterate [i] over the range 1 ... [floor1_values]-1 {
|
|
|
|
5) if ( [floor1_step2_flag]' is set ) {
|
|
|
|
6) [hy] = [floor1_final_Y]' element [i] * [floor1_multiplier]
|
|
7) [hx] = [floor1_X_list]' element [i]
|
|
8) <link linkend="vorbis-spec-render_line">render_line</link>( [lx], [ly], [hx], [hy], [floor] )
|
|
9) [lx] = [hx]
|
|
10) [ly] = [hy]
|
|
}
|
|
}
|
|
|
|
11) if ( [hx] is less than [n] ) {
|
|
|
|
12) <link linkend="vorbis-spec-render_line">render_line</link>( [hx], [hy], [n], [hy], [floor] )
|
|
|
|
}
|
|
|
|
13) if ( [hx] is greater than [n] ) {
|
|
|
|
14) truncate vector [floor] to [n] elements
|
|
|
|
}
|
|
|
|
15) for each scalar in vector [floor], perform a lookup substitution using
|
|
the scalar value from [floor] as an offset into the vector <link linkend="vorbis-spec-floor1_inverse_dB_table">[floor1_inverse_dB_static_table]</link>
|
|
|
|
16) done
|
|
|
|
</screen>
|
|
|
|
</section>
|
|
|
|
</section>
|
|
|
|
</section>
|
|
</section>
|
|
</section>
|
|
|