% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
% $Id$
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents.  Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
 \item
  A meta-headerless Ogg file encapsulates the Vorbis I packets.

 \item
  The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

 \item
  The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}

This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file.  At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream.  A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).

\subsubsection{MIME type}

The MIME type of Ogg files depends on the context.  Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio
\literal{audio/ogg}.  Vorbis data encapsulated in Ogg may appear
in any of those types.  RTP encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.

\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
  The first Vorbis packet (the identification header), which
  uniquely identifies a stream as Vorbis audio, is placed alone in the
  first page of the logical Ogg stream.  This results in a first Ogg
  page of exactly 58 bytes at the very beginning of the logical stream.

\item
  This first page is marked 'beginning of stream' in the page flags.

\item
  The second and third Vorbis packets (comment and setup
  headers) may span one or more pages beginning on the second page of
  the logical stream.  However many pages they span, the third header
  packet finishes the page on which it ends.  The next (first audio) packet
  must begin on a fresh page.

\item
  The granule position of these first pages containing only headers is zero.

\item
  The first audio packet of the logical stream begins a fresh Ogg page.

\item
  Packets are placed into Ogg pages in order until the end of stream.

\item
  The last page is marked 'end of stream' in the page flags.

\item
  Vorbis packets may span page boundaries.

\item
  The granule position of pages containing Vorbis audio is in units
  of PCM audio samples (per channel; a stereo stream's granule position
  does not increment at twice the speed of a mono stream).

\item
  The granule position of a page represents the end PCM sample
  position of the last packet \emph{completed} on that
  page.  The 'last PCM sample' is the last complete sample returned by
  decode, not an internal sample awaiting lapping with a
  subsequent block.  A page that is entirely spanned by a single
  packet (that completes on a subsequent page) has no granule
  position, and the granule position is set to '-1'.

  Note that the last decoded (fully lapped) PCM sample from a packet
  is not necessarily the middle sample from that block. If, e.g., the
  current Vorbis packet encodes a "long block" and the next Vorbis
  packet encodes a "short block", the last decodable sample from the
  current packet will be at position (3*long\_block\_length/4) -
  (short\_block\_length/4).

\item
    The granule (PCM) position of the first page need not indicate
    that the stream started at position zero.  Although the granule
    position belongs to the last completed packet on the page and a
    valid granule position must be positive, by
    inference it may indicate that the PCM position of the beginning
    of audio is positive or negative.

  \begin{itemize}
    \item
        A positive starting value simply indicates that this stream begins at
        some positive time offset, potentially within a larger
        program. This is a common case when connecting to the middle
        of a broadcast stream.

    \item
        A negative value indicates that
        output samples preceding time zero should be discarded during
        decoding; this technique is used to allow sample-granularity
        editing of the stream start time of already-encoded Vorbis
        streams.  The number of samples to be discarded must not exceed
        the overlap-add span of the first two audio packets.

  \end{itemize}

    In both of these cases in which the initial audio PCM starting
    offset is nonzero, the second finished audio packet must flush the
    page on which it appears and the third packet must begin a fresh page.
    This lets the decoder always perform PCM position
    adjustments before needing to return any PCM data from synthesis,
    resulting in correct positioning information without any additional
    seeking logic.

  \begin{note}
    Failure to do so should, at worst, cause a
    decoder implementation to return incorrect positioning information
    for seeking operations at the very beginning of the stream.
  \end{note}

\item
  A granule position on the final page in a stream that indicates
  less audio data than the final packet would normally return is used to
  end the stream on other than even frame boundaries.  The difference
  between the actual available data returned and the declared amount
  indicates how many trailing samples to discard from the decoding
  process.

\end{itemize}
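
The 58-byte size quoted for the first page can be checked by hand.
The following is a sketch only: it assumes the fixed 27-byte Ogg page
header described in the Ogg framing specification and the
identification header fields as laid out in the codec setup section of
this specification; neither byte count is stated in this section.

\[
  27 \mbox{ (page header)} + 1 \mbox{ (one lacing value)}
  + 30 \mbox{ (identification header packet)} = 58 \mbox{ bytes}
\]

The 30-byte packet size follows from the header fields:

\[
  1 + 6 + 4 + 1 + 4 + 4 + 4 + 4 + 1 + 1 = 30
\]

that is, the packet type, the six characters of the codec
identifier, the version, the channel count, the sample rate, the three
bitrate fields, the two packed block sizes, and the framing bit padded
to a whole byte.  Because the packet is shorter than 255 bytes, a
single lacing value is enough to describe it.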
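
The granule position rules in the list above reduce to a few lines of
arithmetic.  The worked example below is a sketch, not a normative
procedure: the symbols $L$, $S$, $G_1$, $n_1$, $p_0$, $P$ and $G$ are
introduced here for illustration, and the sample rate, block sizes and
granule values are assumed rather than taken from any particular stream.

Let $L$ be long\_block\_length and $S$ be short\_block\_length.  With
the assumed values $L = 2048$ and $S = 256$, the last decodable sample
of a long block that is followed by a short block lies at

\[
  \frac{3L}{4} - \frac{S}{4} = 1536 - 64 = 1472
\]

rather than at the middle sample $L/2 = 1024$.

Because granule positions count samples per channel, playback time is
the granule position divided by the sample rate.  At an assumed rate of
44100 Hz, a granule position of 441000 corresponds to

\[
  \frac{441000}{44100} = 10 \mbox{ seconds}
\]

whether the stream is mono or stereo.

For the starting offset, suppose the first audio page reports granule
position $G_1$ and the packets completed on that page return $n_1$
samples from decode.  One way to infer the starting position is
$p_0 = G_1 - n_1$.  Taking $n_1 = 1024$ as an assumed sample count:

\[
  G_1 = 89224: \quad p_0 = 89224 - 1024 = 88200
\]
\[
  G_1 = 872: \quad p_0 = 872 - 1024 = -152
\]

In the first case the stream begins 88200 samples (two seconds at
44100 Hz) into a larger program.  In the second case the first 152
output samples precede time zero and would be discarded.

For the end of the stream, suppose decoding through the final packet
would advance the running PCM position to $P = 441344$ while the final
page declares $G = 441000$.  The number of trailing samples to discard
is then

\[
  P - G = 441344 - 441000 = 344 .
\]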