mirror of https://github.com/cookiengineer/audacity
synced 2025-09-18 17:10:55 +02:00

Roger Dannenberg's explanatory comments for new MidiTime function

This commit is contained in:
parent 51296237da
commit d6657df045

src/AudioIO.cpp (+246)

@@ -81,6 +81,78 @@
speed at mTime. This effectively integrates speed to get position.
Negative speeds are allowed too, for instance in scrubbing.

\par The Big Picture
@verbatim

Sample
Time (in seconds, = total_sample_count / sample_rate)
  ^
  |                                            /         /
  |             y=x-mSystemTimeMinusAudioTime /         /
  |                                          /     #   /
  |                                         /         /
  |                                        /    # <- callbacks (#) showing
  |                                       /#   /       lots of timing jitter.
  |  top line is "full buffer"           /    /     Some are later,
  |  condition                          /    /      indicating buffer is
  |                                    /    /       getting low. Plot
  |                                   /  # /        shows sample time
  |                                  /  # /         (based on how many
  |                                 /  # /          samples previously
  |                                /    /           *written*) vs. real
  |                               / #  /            time.
  |                              /<------->/        audio latency
  |                             /#    v/
  |                            /     /    bottom line is "empty buffer"
  |                           /  #  /     condition = DAC output time =
  |                          /     /
  |                         /  # <-- rapid callbacks as buffer is filled
  |                        /     /
0 +...+---------#---------------------------------------------------->
  0 ^  |        |                                   real time
    |  |        first callback time
    |  mSystemMinusAudioTime
    |

Probably the actual real times shown in this graph are very large
in practice (> 350,000 sec.), so the X "origin" might be when
the computer was booted or 1970 or something.

@endverbatim
To estimate the true DAC time (needed to synchronize MIDI), we need
a mapping from track time to DAC time. The estimate is the theoretical
time of the full buffer (top diagonal line) + audio latency. To
estimate the top diagonal line, we "draw" the line to be at least
as high as any sample time corresponding to a callback (#), and we
slowly lower the line in case the sample clock is slow or the system
clock is fast, preventing the estimated line from drifting too far
from the actual callback observations. The line is occasionally
"bumped" up by new callback observations, but continuously
"lowered" at a very low rate. All adjustment is accomplished
by changing mSystemMinusAudioTime, shown here as the X-intercept.\n
theoreticalFullBufferTime = realTime - mSystemMinusAudioTime\n
To estimate audio latency, notice that the first callback happens on
an empty buffer, but the buffer soon fills up. This will cause a rapid
re-estimation of mSystemMinusAudioTime. (The first estimate of
mSystemMinusAudioTime will simply be the real time of the first
callback.) By watching these changes, which happen within ms of
starting, we can estimate the buffer size and thus audio latency.
So, to map from track time to real time, we compute:\n
DACoutputTime = trackTime + mSystemMinusAudioTime\n
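A minimal sketch of this estimator, under the assumptions above (all names here, and the slow lowering rate, are illustrative placeholders rather than Audacity's actual code):

```cpp
#include <limits>

// Illustrative sketch: track the X-intercept of the "full buffer" line.
// systemMinusAudioTime plays the role of mSystemMinusAudioTime.
struct ClockOffsetEstimator {
    double systemMinusAudioTime = std::numeric_limits<double>::infinity();

    // Called from the audio callback with the current system time and the
    // audio time implied by the total number of samples written so far.
    void OnCallback(double systemTime, double audioTime) {
        double observed = systemTime - audioTime;
        if (observed < systemMinusAudioTime) {
            // A new callback observation above the line "bumps" the line
            // up, which means lowering systemMinusAudioTime.
            systemMinusAudioTime = observed;
        } else {
            // Otherwise lower the line continuously at a very low rate
            // (raise the offset slightly) in case the sample clock is slow
            // or the system clock is fast. The rate here is a placeholder
            // that would need tuning against real callback intervals.
            systemMinusAudioTime += 1e-6;
        }
    }

    // DACoutputTime = trackTime + mSystemMinusAudioTime
    double TrackToDacTime(double trackTime) const {
        return trackTime + systemMinusAudioTime;
    }
};
```

Note that the first callback simply records its own offset (since almost no samples have been written, the observation is essentially the real time of the first callback), matching the description above.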
There are some additional details to avoid counting samples while
paused or while waiting for initialization, MIDI latency, etc.
Also, in the code, track time is measured with respect to the track
origin, so there's an extra term to add (mT0) if you start somewhere
in the middle of the track.
Finally, when a callback occurs, you might expect there is room in
the output buffer for the requested frames, so maybe the "full buffer"
sample time should be based not on the first sample of the callback, but
the last sample time + 1 sample. I suspect, at least on Linux, that the
callback occurs as soon as the last callback completes, so the buffer is
really full, and the callback thread is going to block waiting for space
in the output buffer.

\par Midi Time
MIDI is not warped according to the speed control. This might be
something that should be changed. (Editorial note: Wouldn't it

@@ -95,33 +167,61 @@
\par
Therefore, we define the following interface for MIDI timing:
\li \c AudioTime() is the time based on all samples written so far, including zeros output during pauses. AudioTime() is based on the start location mT0, not zero.
\li \c PauseTime() is the amount of time spent paused, based on a count of zero-padding samples output.
\li \c MidiTime() is an estimate in milliseconds of the current audio output time + 1s. In other words, what audacity track time corresponds to the audio (plus pause insertions) at the DAC output?

\par AudioTime() and PauseTime() computation
AudioTime() is simply mT0 + mNumFrames / mRate.
mNumFrames is incremented in each audio callback. Similarly, PauseTime()
is mNumPauseFrames / mRate. mNumPauseFrames is also incremented in
each audio callback when a pause is in effect or audio output is ready to start.

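A minimal sketch of this bookkeeping (an illustrative struct; the real members are fields of Audacity's AudioIO class, and the "ready to start" case is omitted here for brevity):

```cpp
// Illustrative sketch of the AudioTime()/PauseTime() bookkeeping.
struct AudioTimeState {
    double mT0 = 0.0;          // start position in the track, in seconds
    double mRate = 44100.0;    // sample rate
    long mNumFrames = 0;       // all frames written, including pause zeros
    long mNumPauseFrames = 0;  // zero-padding frames written while paused

    // Called once per audio callback.
    void OnCallback(long framesPerBuffer, bool paused) {
        mNumFrames += framesPerBuffer;
        if (paused)
            mNumPauseFrames += framesPerBuffer;
    }

    // AudioTime() = mT0 + mNumFrames / mRate
    double AudioTime() const { return mT0 + mNumFrames / mRate; }
    // PauseTime() = mNumPauseFrames / mRate
    double PauseTime() const { return mNumPauseFrames / mRate; }
};
```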
\par MidiTime() computation
MidiTime() is computed based on information from PortAudio's callback,
which estimates the system time at which the current audio buffer will
be output. Consider the (unimplemented) function RealToTrack() that
maps real audio write time to track time. If writeTime is the system
time for the first sample of the current output buffer, and
if we are in the callback, so AudioTime() also refers to the first sample
of the buffer, then \n
RealToTrack(writeTime) = AudioTime() - PauseTime()\n
We want to know RealToTrack of the current time (when we are not in the
callback), so we use this approximation for small d: \n
RealToTrack(t + d) = RealToTrack(t) + d \n
Letting t = writeTime and d = (systemTime - writeTime), we can
substitute to get:\n
RealToTrack(systemTime)
= RealToTrack(writeTime) + systemTime - writeTime\n
= AudioTime() - PauseTime() + (systemTime - writeTime) \n
MidiTime() should include pause time, so that it increases smoothly,
and audioLatency so that MidiTime() corresponds to the time of audio
output rather than audio write times. Also MidiTime() is offset by 1
second to avoid negative time at startup, so add 1: \n
MidiTime(systemTime) in seconds\n
= RealToTrack(systemTime) + PauseTime() - audioLatency + 1 \n
= AudioTime() + (systemTime - writeTime) - audioLatency + 1 \n
(Note that audioLatency is called mAudioOutLatency in the code.)
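The final form of this derivation can be stated directly as code; this is a hedged sketch with illustrative parameter names (audioLatency corresponds to mAudioOutLatency in the code):

```cpp
// MidiTime(systemTime) = AudioTime() + (systemTime - writeTime)
//                        - audioLatency + 1
double MidiTimeSeconds(double audioTime, double systemTime,
                       double writeTime, double audioLatency)
{
    return audioTime + (systemTime - writeTime) - audioLatency + 1.0;
}

// PortMidi timestamps are in millisecond units.
long MidiTimeMilliseconds(double audioTime, double systemTime,
                          double writeTime, double audioLatency)
{
    return (long)(MidiTimeSeconds(audioTime, systemTime,
                                  writeTime, audioLatency) * 1000.0);
}
```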
When we schedule a MIDI event with track time TT, we need
to map TT to a PortMidi timestamp. The PortMidi timestamp is exactly
MidiTime(systemTime) in ms units, and \n
MidiTime(x) = RealToTrack(x) + PauseTime() + 1, so \n
timestamp = TT + PauseTime() + 1 - midiLatency \n
Note 1: The timestamp is incremented by the PortMidi stream latency
(midiLatency) so we subtract midiLatency here for the timestamp
passed to PortMidi. \n
Note 2: Here, we're setting x to the time at which RealToTrack(x) = TT,
so then MidiTime(x) is the desired timestamp. To be completely
correct, we should assume that MidiTime(x + d) = MidiTime(x) + d,
and consider that we compute MidiTime(systemTime) based on the
*current* system time, but we really want the MidiTime(x) for some
future time x at which RealToTrack(x) = TT.

\par
Also, we should assume PortMidi was opened with mMidiLatency, and that
MIDI messages become sound with a delay of mSynthLatency. Therefore,
the final timestamp calculation is: \n
timestamp = TT + PauseTime() + 1 - (mMidiLatency + mSynthLatency) \n
(All units here are seconds; some conversion is needed in the code.)
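This final calculation is simple enough to state as code; the sketch below assumes all inputs are in seconds and converts to PortMidi's millisecond units at the end (names are illustrative):

```cpp
// timestamp = TT + PauseTime() + 1 - (mMidiLatency + mSynthLatency)
double MidiTimestampSeconds(double TT, double pauseTime,
                            double midiLatency, double synthLatency)
{
    return TT + pauseTime + 1.0 - (midiLatency + synthLatency);
}

// PortMidi expects the timestamp in milliseconds.
long MidiTimestampMilliseconds(double TT, double pauseTime,
                               double midiLatency, double synthLatency)
{
    return (long)(MidiTimestampSeconds(TT, pauseTime,
                                       midiLatency, synthLatency) * 1000.0);
}
```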

\par
The difference AudioTime() - PauseTime() is the time "cursor" for

@@ -129,34 +229,92 @@
unsynchronized. In particular, MIDI will not be synchronized with
the visual cursor, which moves with scaled time reported in mTime.

\par Timing in Linux
It seems we cannot get much info from Linux. We can read the time
when we get a callback, and we get a variable frame count (it changes
from one callback to the next). Returning to the RealToTrack()
equations above: \n
RealToTrack(outputTime) = AudioTime() - PauseTime() - bufferDuration \n
where outputTime should be PortAudio's estimate for the most recent output
buffer, but at least on my Dell Latitude E7450, PortAudio is getting zero
from ALSA, so we need to find a proxy for this.

\par Estimating outputTime (Plan A, assuming double-buffered, fixed-size buffers, please skip to Plan B)
One can expect the audio callback to happen as soon as there is room in
the output for another block of samples, so we could just measure system
time at the top of the callback. Then we could add the maximum delay
buffered in the system. E.g. if there is simple double buffering and the
callback is computing one of the buffers, the callback happens just as
one of the buffers empties, meaning the other buffer is full, so we have
exactly one buffer delay before the next computed sample is output.

If computation falls behind a bit, the callback will be later, so the
delay to play the next computed sample will be less. I think a reasonable
way to estimate the actual output time is to assume that the computer is
mostly keeping up and that *most* callbacks will occur immediately when
there is space. Note that the most likely reason for the high-priority
audio thread to fall behind is the callback itself, but the start of the
callback should be pretty consistently keeping up.

Also, we do not have to have a perfect estimate of the time. Suppose we
estimate a linear mapping from sample count to system time by saying
that the sample count maps to the system time at the most recent callback,
and set the slope to 1% slower than real time (as if the sample clock is
slow). Now, at each callback, if the callback seems to occur earlier than
expected, we can adjust the mapping to be earlier. The earlier the
callback, the more accurate it must be. On the other hand, if the callback
is later than predicted, it must be a delayed callback (or else the
sample clock is more than 1% slow, which is really a hardware problem).
How bad can this be? Assuming callbacks every 30ms (this seems to be what
I'm observing in a default setup), you'll be a maximum of 1ms off even if
2 out of 3 callbacks are late. This is pretty reasonable given that
PortMIDI clock precision is 1ms. If buffers are larger and callback timing
is more erratic, errors will be larger, but even a few ms error is
probably OK.
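A sketch of this adjustment rule, under the stated 1%-slow assumption (the struct, names, and re-anchoring policy are illustrative, not Audacity's implementation):

```cpp
// Illustrative sketch of the Plan A mapping: predict each callback's system
// time from the sample count with a slope 1% slower than real time, and
// re-anchor the mapping whenever a callback arrives earlier than predicted.
struct PlanAEstimator {
    double rate = 44100.0;        // sample rate (samples per second)
    double anchorSamples = 0.0;   // sample count at the current anchor
    double anchorTime = 0.0;      // system time at the current anchor
    bool started = false;
    static constexpr double kSlope = 1.01;  // assume sample clock 1% slow

    // Predicted system time at which a given total sample count is reached.
    double Predict(double samples) const {
        return anchorTime + kSlope * (samples - anchorSamples) / rate;
    }

    void OnCallback(double systemTime, double samples) {
        if (!started || systemTime < Predict(samples)) {
            // Earlier than expected: the callback is accurate, so move
            // the mapping earlier by re-anchoring at this observation.
            anchorSamples = samples;
            anchorTime = systemTime;
            started = true;
        }
        // A later-than-predicted callback is assumed to be a delayed
        // callback and leaves the mapping unchanged.
    }
};
```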

\par Estimating outputTime (Plan B, variable framesPerBuffer in callback, please skip to Plan C)
ALSA is complicated because we get varying values of
framesPerBuffer from callback to callback. Assume you get more frames
when the callback is later (because there is more accumulated input to
deliver and more accumulated room in the output buffers). So take
the current time and subtract the duration of the frame count in the
current callback. This should be a time position that is relatively
jitter free (because we estimated the lateness by frame count and
subtracted that out). This time position intuitively represents the
current ADC time, or if no input, the time of the tail of the output
buffer. If we wanted DAC time, we'd have to add the total output
buffer duration, which should be reported by PortAudio. (If PortAudio
is wrong, we'll be systematically shifted in time by the error.)

Since there is still bound to be jitter, we can smooth these estimates.
First, we will assume a linear mapping from system time to audio time
with slope = 1, so really it's just the offset we need, which is going
to be a double that we can read/write atomically without locks or
anything fancy. (Maybe it should be "volatile".)

To improve the estimate, we get a new offset every callback, so we can
create a "smooth" offset by using a simple regression model (also
this could be seen as a first order filter). The following formula
updates smooth_offset with a new offset estimate in the callback:
smooth_offset = smooth_offset * 0.9 + new_offset_estimate * 0.1
Since this is smooth, we'll have to be careful to give it a good initial
value to avoid a long convergence.
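The update rule above can be stated directly; this sketch adds only an explicit initialization step, per the note about choosing a good initial value (struct and names are illustrative):

```cpp
// First-order filter for the per-callback offset estimate, seeded with the
// first estimate so it does not converge slowly from an arbitrary default.
struct SmoothOffset {
    double value = 0.0;
    bool initialized = false;

    void Update(double newOffsetEstimate) {
        if (!initialized) {
            value = newOffsetEstimate;  // good initial value
            initialized = true;
        } else {
            // smooth_offset = smooth_offset * 0.9 + new_offset_estimate * 0.1
            value = value * 0.9 + newOffsetEstimate * 0.1;
        }
    }
};
```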

\par Estimating outputTime (Plan C)
ALSA is complicated because we get varying values of
framesPerBuffer from callback to callback. It seems there is a lot
of variation in callback times and buffer space. One solution would
be to go to a fixed-size double buffer, but Audacity seems to work
better as is, so Plan C is to rely on one invariant, which is that
the output buffer cannot overflow, so there's a limit to how far
ahead of the DAC time we can be writing samples into the
buffer. Therefore, we'll assume that the audio clock runs slow by
about 0.2% and we'll assume we're computing at that rate. If the
actual output position is ever ahead of the computed position, we'll
increase the computed position to the actual position. Thus whenever
the buffer is less than near full, we'll stay ahead of DAC time,
falling back at a rate of about 0.2% until eventually there's
another near-full buffer callback that will push the time back ahead.
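A sketch of the Plan C invariant (illustrative names; the 0.2% figure is the assumed clock error stated above):

```cpp
// Advance a computed position at a rate 0.2% slower than system time, and
// clamp it up to the actual output position whenever the actual position
// gets ahead of the computed one.
struct PlanCPosition {
    double computed = 0.0;  // estimated output position, in seconds
    static constexpr double kSlowdown = 0.998;  // assume 0.2% slow audio clock

    // elapsed: system-time seconds since the last update;
    // actualPosition: output position implied by the latest callback.
    void Update(double elapsed, double actualPosition) {
        computed += elapsed * kSlowdown;   // fall back at about 0.2%
        if (actualPosition > computed)
            computed = actualPosition;     // near-full buffer pushes us ahead
    }
};
```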

\par Interaction between MIDI, Audio, and Pause
When Pause is used, PauseTime() will increase at the same rate as