1
0
mirror of https://github.com/cookiengineer/audacity synced 2025-05-04 17:49:45 +02:00
audacity/lib-src/libogg/doc/xml/a1-encapsulation_ogg.xml
2010-01-24 09:19:39 +00:00

202 lines
6.9 KiB
XML

<?xml version="1.0" standalone="no"?>
<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
]>
<appendix id="vorbis-over-ogg">
<appendixinfo>
<releaseinfo>
$Id: a1-encapsulation_ogg.xml,v 1.1.1.1 2004-11-13 16:51:21 mbrubeck Exp $
</releaseinfo>
</appendixinfo>
<title>Embedding Vorbis into an Ogg stream</title>
<section>
<title>Overview</title>
<para>
This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.</para>
<para>
The <xref linkend="vorbis-spec-intro"/> provides an overview of the construction
of Vorbis audio packets.</para>
<para>
The <ulink url="oggstream.html">Ogg
bitstream overview</ulink> and <ulink url="framing.html">Ogg logical
bitstream and framing spec</ulink> provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named backround
documents. Please read them first.</para>
<section><title>Restrictions</title>
<para>
The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:
<itemizedlist>
<listitem><simpara>
A meta-headerless Ogg file encapsulates the Vorbis I packets
</simpara></listitem>
<listitem><simpara>
The Ogg stream may be chained, i.e. contain multiple, contigous logical streams (links).
</simpara></listitem>
<listitem><simpara>
The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link)
</simpara></listitem>
</itemizedlist>
</para>
<para>
This is not to say that it is not currently possible to multiplex
Vorbis with other media types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DiVX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenrate ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).
</para>
</section>
<section><title>MIME type</title>
<para>
The correct MIME type of any Ogg file is <literal>application/ogg</literal>.
However, if a file is a Vorbis I audio file (which implies a
degenerate Ogg stream including only unmultiplexed Vorbis audio), the
mime type <literal>audio/x-vorbis</literal> is also allowed.</para>
</section>
</section>
<section>
<title>Encapsulation</title>
<para>
Ogg encapsulation of a Vorbis packet stream is straightforward.</para>
<itemizedlist>
<listitem><simpara>
The first Vorbis packet (the identification header), which
uniquely identifies a stream as Vorbis audio, is placed alone in the
first page of the logical Ogg stream. This results in a first Ogg
page of exactly 58 bytes at the very beginning of the logical stream.
</simpara></listitem>
<listitem><simpara>
This first page is marked 'beginning of stream' in the page flags.
</simpara></listitem>
<listitem><simpara>
The second and third vorbis packets (comment and setup
headers) may span one or more pages beginning on the second page of
the logical stream. However many pages they span, the third header
packet finishes the page on which it ends. The next (first audio) packet
must begin on a fresh page.
</simpara></listitem>
<listitem><simpara>
The granule position of these first pages containing only headers is zero.
</simpara></listitem>
<listitem><simpara>
The first audio packet of the logical stream begins a fresh Ogg page.
</simpara></listitem>
<listitem><simpara>
Packets are placed into ogg pages in order until the end of stream.
</simpara></listitem>
<listitem><simpara>
The last page is marked 'end of stream' in the page flags.
</simpara></listitem>
<listitem><simpara>
Vorbis packets may span page boundaries.
</simpara></listitem>
<listitem><simpara>
The granule position of pages containing Vorbis audio is in units
of PCM audio samples (per channel; a stereo stream's granule position
does not increment at twice the speed of a mono stream).
</simpara></listitem>
<listitem><simpara>
The granule position of a page represents the end PCM sample
position of the last packet <emphasis>completed</emphasis> on that page.
A page that is entirely spanned by a single packet (that completes on a
subsequent page) has no granule position, and the granule position is
set to '-1'.
</simpara></listitem>
<listitem>
<simpara>
The granule (PCM) position of the first page need not indicate
that the stream started at position zero. Although the granule
position belongs to the last completed packet on the page and a
valid granule position must be positive, by
inference it may indicate that the PCM position of the beginning
of audio is positive or negative.
</simpara>
<itemizedlist>
<listitem><simpara>
A positive starting value simply indicates that this stream begins at
some positive time offset, potentially within a larger
program. This is a common case when connecting to the middle
of broadcast stream.
</simpara></listitem>
<listitem><simpara>
A negative value indicates that
output samples preceeding time zero should be discarded during
decoding; this technique is used to allow sample-granularity
editing of the stream start time of already-encoded Vorbis
streams. The number of samples to be discarded must not exceed
the overlap-add span of the first two audio packets.
</simpara></listitem>
</itemizedlist>
<simpara>
In both of these cases in which the initial audio PCM starting
offset is nonzero, the second finished audio packet must flush the
page on which it appears and the third packet begin a fresh page.
This allows the decoder to always be able to perform PCM position
adjustments before needing to return any PCM data from synthesis,
resulting in correct positioning information without any aditional
seeking logic.
</simpara>
<note><simpara>
Failure to do so should, at worst, cause a
decoder implementation to return incorrect positioning information
for seeking operations at the very beginning of the stream.
</simpara></note>
</listitem>
<listitem><simpara>
A granule position on the final page in a stream that indicates
less audio data than the final packet would normally return is used to
end the stream on other than even frame boundaries. The difference
between the actual available data returned and the declared amount
indicates how many trailing samples to discard from the decoding
process.
</simpara></listitem>
</itemizedlist>
</section>
</appendix>
<!-- end appendix on Vorbis encapsulation in Ogg -->