This state will indicate to the media element that it's not guaranteed
to have a frame yet, for the purposes of determining the ready state.
JavaScript should be able to rely on video elements with a ready state
of HAVE_CURRENT_DATA or greater already representing the current video
frame.
To allow the state to be exited if audio is disabled, audio tracks are
now only added to the buffering set on enable if the audio sink exists;
without the sink to start the data provider, the track would never be
removed from the set.
This is a step towards making video ref tests.
This allows us to differentiate between having no data available yet,
having current data, and having future data. The main purpose of this
is to allow a new starting state to explicitly force HAVE_METADATA
instead of >= HAVE_CURRENT_DATA.
Note that the SeekingStateHandler returns Current instead of None. This
is deliberate, since the buffered ranges from the demuxer(s) can be
used to inform whether the possibly-current data is actually available
at the seek target.
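For illustration, the availability hint could map onto the element's ready state roughly like this (a minimal sketch; `DataAvailability` and `max_ready_state_for` are hypothetical names, not the actual API):

```cpp
// Sketch only; the real enum and handler names may differ.
enum class MediaReadyState { HaveNothing, HaveMetadata, HaveCurrentData, HaveFutureData, HaveEnoughData };

enum class DataAvailability {
    None,    // no frame is guaranteed yet; clamp the element to HAVE_METADATA
    Current, // the displayed frame is valid; allows HAVE_CURRENT_DATA
    Future,  // decoded frames are queued beyond the current one
};

static MediaReadyState max_ready_state_for(DataAvailability availability)
{
    switch (availability) {
    case DataAvailability::None:
        return MediaReadyState::HaveMetadata;
    case DataAvailability::Current:
        return MediaReadyState::HaveCurrentData;
    case DataAvailability::Future:
        return MediaReadyState::HaveFutureData;
    }
    return MediaReadyState::HaveNothing; // unreachable
}
```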
Having PlaybackManager start in Buffering was causing us to report
a media element readyState of HAVE_CURRENT_DATA. HAVE_CURRENT_DATA
doesn't make a whole lot of sense for local files, since we should have
all the data immediately when we process the metadata. This is
reflected in the buffered attribute, so let's not limit the ready state
unnecessarily.
This fixes a crash when a track is enabled and then disabled while a
seek is in progress.
The logic in SeekingStateHandler is reworked to keep track of the
tracks that are currently being seeked, and when a track is disabled,
it is no longer counted against the seek completion. Any
seek-completion callback that was set is cleared by calling seek with
a null callback.
It may be worth making a separate function on the data providers to
clear the current seek instead, to avoid the extra work of seeking, but
this scenario is a very rare one unless someone intentionally triggers
it, and the cost is minimal unless the toggles are spammed.
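Roughly, the reworked bookkeeping looks like this (a sketch with hypothetical names; the real handler also manages per-track callbacks):

```cpp
#include <functional>
#include <set>
#include <utility>

// Sketch of the seek bookkeeping; SeekCoordinator is a hypothetical name.
struct SeekCoordinator {
    std::set<int> tracks_still_seeking;
    std::function<void()> on_seek_complete;

    void track_finished_seeking(int track_id)
    {
        tracks_still_seeking.erase(track_id);
        if (tracks_still_seeking.empty() && on_seek_complete)
            std::exchange(on_seek_complete, nullptr)();
    }

    // A disabled track no longer counts against seek completion; its
    // provider's pending callback is cleared by seeking with a null callback.
    void track_disabled(int track_id) { track_finished_seeking(track_id); }
};
```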
A crash test is included. It tests for the crash itself, and with the
previous seeking implementation it would also time out even if the
failing VERIFY in on_track_enabled() were avoided, because the
originally-enabled video track's seek callback would be clobbered by
on_track_enabled()'s seek.
Audio output on macOS was consuming Core Audio resources until
PlaybackStream creation took well over the timeout for some tests.
This was observed in media-source-buffered.html, which would time out
due to the long-running main-thread callback that creates the
PlaybackStream for AudioMixingSink.
However, the AudioUnit init should definitely not be blocking the main
thread, so I've added a FIXME there.
This is delegated to the state handlers, but it essentially amounts to
`state() != Buffering && state() != Seeking`. If the PlaybackManager is
in either state, we know that there is no future data yet, as it should
exit those states as soon as the data is ready.
PlaybackManager then intersects all enabled tracks' buffered time
ranges. This will be used by the media element for the buffered
attribute and to update the ready state.
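The intersection itself is the classic two-pointer sweep over sorted range lists; a minimal sketch (types and names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct TimeRange { double start; double end; };

// Sketch: intersect two sorted, non-overlapping range lists. PlaybackManager
// would fold this over every enabled track's buffered ranges.
static std::vector<TimeRange> intersect_ranges(std::vector<TimeRange> const& a,
                                               std::vector<TimeRange> const& b)
{
    std::vector<TimeRange> result;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        double start = std::max(a[i].start, b[j].start);
        double end = std::min(a[i].end, b[j].end);
        if (start < end)
            result.push_back({ start, end });
        // Advance whichever range ends first.
        if (a[i].end < b[j].end)
            ++i;
        else
            ++j;
    }
    return result;
}
```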
This will allow us to pass in a class implementing Demuxer for each
track owned by a MediaSource.
We'll also use the new ThreadPool class instead of a dedicated media
initialization thread. We shouldn't spin up a new thread for such a
trivial operation.
Previously, we would just listen to the single video track for
buffering, so if for some reason the audio data ran ahead of the
video, we would drop some audio until the video buffered. Instead,
stop playing audio at the last available sample when any provider is
blocked.
Also, PlaybackManager now starts in the Buffering state, so that it can
wait for enough data to be ready to play without interruption. When the
end of the stream is reached, the buffering state is exited to ensure
that we don't get stuck buffering at the end of a media file.
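Conceptually, the buffering check becomes a property of all enabled providers rather than just the video track; a sketch, with a hypothetical provider interface:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical provider interface, for illustration only.
struct DataProviderSketch {
    virtual ~DataProviderSketch() = default;
    virtual bool is_blocked_waiting_for_data() const = 0;
};

// Enter Buffering when any enabled provider is starved, not only video.
static bool should_enter_buffering(std::vector<DataProviderSketch const*> const& enabled_providers)
{
    return std::any_of(enabled_providers.begin(), enabled_providers.end(),
        [](auto const* provider) { return provider->is_blocked_waiting_for_data(); });
}
```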
Hook up a callback in AudioMixingSink to notify PlaybackManager if the
output fails to be initialized. Then, when that happens, swap out the
time provider for GenericTimeProvider and continue without audio.
Fixes #8071
Previously, we would call the frame end callbacks every time a frame
was decoded. However, the only use case for the callback was to update
the media duration. Instead, cache the duration in the data providers,
and only invoke the callback (renamed to duration_change_handler) when
the duration actually increases. Hopefully this will reduce wasted
work, and possibly some allocations as well.
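A minimal sketch of the cached-duration pattern (names are illustrative):

```cpp
#include <functional>

// Sketch of the cached-duration pattern; names are illustrative.
struct DurationCache {
    double known_duration { 0 };
    std::function<void(double)> duration_change_handler;

    // Called for every decoded frame, but only notifies when the duration
    // actually grows, instead of firing a callback per frame.
    void on_frame_decoded(double frame_end_time)
    {
        if (frame_end_time <= known_duration)
            return;
        known_duration = frame_end_time;
        if (duration_change_handler)
            duration_change_handler(known_duration);
    }
};
```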
We don't use EndOfStream errors as a way to determine the end of the
media anyway. It's still appropriate to keep those errors around in
data providers, though, since they are used in simpler use cases like
tests.
PlaybackManager's ref counting was only used to keep it alive in a few
callbacks. Instead, the callbacks can use weak references that can only
be used from the thread that the PlaybackManager was created on, to
ensure that the PlaybackManager can't be destroyed while being
accessed.
This ensures that:
- The PlaybackManager is destroyed immediately when it is reassigned
by HTMLMediaElement
- No callbacks are invoked after that point
This fixes the crash initially being addressed by #8081. The test from
that PR has been included as a regression test.
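The pattern is roughly the following, using std::weak_ptr for brevity; the actual weak references are additionally bound to the thread the PlaybackManager was created on:

```cpp
#include <memory>

// Sketch of the weak-reference callback pattern; thread affinity elided.
struct PlaybackManagerSketch : std::enable_shared_from_this<PlaybackManagerSketch> {
    void on_decode_finished() { /* react to newly decoded data */ }

    // Callbacks capture a weak handle. If the HTMLMediaElement has already
    // reassigned (and thereby destroyed) its manager, lock() fails and the
    // stale callback becomes a no-op.
    auto make_callback()
    {
        return [weak_self = weak_from_this()] {
            if (auto self = weak_self.lock())
                self->on_decode_finished();
        };
    }
};
```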
We only need to take a strong reference to the main event loop when an
error occurs, in order to invoke the callback on the main thread. By
holding that reference for the entire duration of the thread, we were
preventing the main thread from exiting if the init thread hung.
...and abstract away the stream/cursor blocking/aborting functionality
so that demuxers can implement or ignore those methods as they see fit.
This is a step towards implementing a wrapper demuxer for MSE streams.
In order to free up memory when a video is paused for an extended
period, we add a new Suspended state to PlaybackManager which tells the
data providers to suspend. The data providers will handle this signal
by disposing of their entire decoded data queue and flushing their
decoder.
When initially creating a PlaybackManager, and when resuming into a
paused state, the delay before suspension is much lower than when
pausing from any other state. This is intended to keep media elements
that only decode the first frame for display from holding memory for
long, and to allow the data providers to suspend much more quickly
after a seek while paused.
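A sketch of the delay selection; the contexts and concrete durations here are illustrative, not the actual values:

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Sketch only; names and durations are made up for illustration.
enum class SuspensionContext { InitialCreation, SeekWhilePaused, NormalPause };

static std::chrono::milliseconds suspension_delay_for(SuspensionContext context)
{
    switch (context) {
    // Suspend quickly when we likely only decoded a frame for display,
    // or when seeking while already paused.
    case SuspensionContext::InitialCreation:
    case SuspensionContext::SeekWhilePaused:
        return 1s;
    // Give a normal pause a longer grace period so resuming stays snappy.
    case SuspensionContext::NormalPause:
        return 30s;
    }
    return 30s; // unreachable
}
```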
Currently, resuming playback doesn't introduce a noticeable delay on
my MacBook, though that may change once we completely tear down the
decoder in the suspended state. It may also be exacerbated by hardware
decoders, due to their more complex initialization.
By sniffing specifically for MP4 and WebM, we were precluding
PlaybackManager from playing any other formats. Instead, use
MatroskaDemuxer if the media has a `matroska` or `webm` EBML doctype,
and fall back to FFmpeg for all others.
We'll need to limit the containers that FFmpeg is able to open at some
point, but for now, this allows us to play the formats we could before.
`IncrementallyPopulatedStream::Cursor` now tracks whether it's currently
blocked inside a wait for more bytes, allowing higher layers to
distinguish "no frames yet" from "decoder is idle".
Enter buffering when `DisplayingVideoSink` runs out of frames and the
associated `VideoDataProvider` is blocked waiting for data to arrive.
Exit buffering once decoding refills the frame queue.
For now, buffering behaves like paused, but it gives us an explicit
state to hook UI into.
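A minimal sketch of the blocked-state tracking, assuming a simple atomic flag (the real cursor's synchronization may differ):

```cpp
#include <atomic>

// Sketch of the blocked-state flag; names are illustrative.
class CursorSketch {
public:
    bool is_blocked_waiting_for_data() const
    {
        return m_blocked.load(std::memory_order_acquire);
    }

    void wait_for_bytes()
    {
        m_blocked.store(true, std::memory_order_release);
        // ... block until the stream is populated with enough bytes ...
        m_blocked.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> m_blocked { false };
};
```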
When media data is fully buffered, we can just try Matroska first and
fall back to FFmpeg. With incremental fetching, that approach becomes
wasteful: we may repeatedly attempt demuxer construction before enough
bytes are available, and FFmpeg in particular tends to produce noisy
logs while probing partial input.
Add lightweight container sniffing for WebM and MP4 that operates on
`IncrementallyPopulatedStream::Cursor`.
`prepare_playback_from_media_data()` now blocks until there is enough
data to decide the container type, then constructs the appropriate
demuxer directly instead of probing both.
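The sniff only needs the first handful of bytes: Matroska/WebM files open with the EBML header ID 0x1A45DFA3, and ISO BMFF (MP4) files carry a box whose type at offset 4 is `ftyp`. A sketch (function and enum names are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class ContainerType { Unknown, Matroska, ISOBMFF };

// Sketch: only the first dozen bytes are needed, so the cursor can block
// until exactly that much data is available before we decide.
static ContainerType sniff_container(std::uint8_t const* data, std::size_t size)
{
    // Matroska/WebM files begin with the EBML header ID 0x1A45DFA3.
    static constexpr std::uint8_t ebml_magic[] = { 0x1A, 0x45, 0xDF, 0xA3 };
    if (size >= 4 && std::memcmp(data, ebml_magic, 4) == 0)
        return ContainerType::Matroska;

    // ISO BMFF (MP4) files start with a box whose type at offset 4 is "ftyp".
    if (size >= 8 && std::memcmp(data + 4, "ftyp", 4) == 0)
        return ContainerType::ISOBMFF;

    return ContainerType::Unknown;
}
```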
Co-authored-by: Zaggy1024 <Zaggy1024@gmail.com>
Refactor the FFmpeg and Matroska demuxers to consume data through
`IncrementallyPopulatedStream::Cursor` instead of a pointer to fully
buffered data.
This change establishes a new rule: each track must be initialized with
its own cursor. Data providers now explicitly create a per-track context
via `Demuxer::create_context_for_track(track, cursor)` and own a pointer
to that cursor. In upcoming changes, holding the cursor in the provider
will allow us to signal "cancel blocking reads" so that an in-flight
seek can fail immediately when a newer seek request arrives.
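A sketch of the per-track rule (the types and the simplified signature here are illustrative):

```cpp
#include <memory>

// Sketch of the per-track rule; Cursor and TrackContext stand in for the
// real types, and the signature is simplified.
struct Cursor { /* independent read position into the shared stream */ };

struct TrackContext {
    std::shared_ptr<Cursor> cursor; // one cursor per track, never shared
};

struct DemuxerSketch {
    // Each data provider gets its own context with its own cursor, so one
    // track's blocking read can be cancelled without affecting the others.
    std::unique_ptr<TrackContext> create_context_for_track(std::shared_ptr<Cursor> cursor)
    {
        auto context = std::make_unique<TrackContext>();
        context->cursor = std::move(cursor);
        return context;
    }
};
```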
If we don't pause the updates, re-enabling the video track will show an
old frame briefly. Since a normal seek also pauses/resumes updates on
the active sinks, the display is guaranteed to be resumed.
Without removing the current block, we would end up playing no audio
until its timestamp is reached, since we shift the next block forward
when its timestamp is less than the previous block's end time. This
meant that re-enabling an audio track after a seek backwards would
delay its audio from resuming.
Posting callbacks to the main thread is now predicated on whether the
event loop reference is alive, preventing a stack-use-after-return.
The data providers will also check if they've been requested to exit
before calling deferred_invoke, though this is not going to be the case
unless the media element gets GCed while the media is playing.
Demuxer creation and track+duration extraction are moved to a separate
thread so that the media data byte buffer is no longer accessed from the
main thread. This will be important once the buffer is populated
incrementally, as having the main thread both populate and read from the
same buffer could easily lead to deadlocks. Aside from that, moving
demuxer creation off the main thread helps keep the main thread
responsive.
`VideoDataProvider` and `AudioDataProvider` now accept the main thread
event loop pointer as they are constructed from the thread responsible
for demuxer creation.
This implementation allows:
- Accurate seeking to an exact timestamp
- Seeking to the keyframe before a timestamp
- Seeking to the keyframe after a timestamp
These three options will be used to satisfy the playback position
selection in the media element's seeking steps.
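For illustration, the three options amount to an enum like this (the name is hypothetical):

```cpp
// Sketch only; the enum name is hypothetical.
enum class SeekMode {
    Accurate,         // decode forward from the keyframe to the exact timestamp
    PreviousKeyframe, // land on the keyframe at or before the target
    NextKeyframe,     // land on the first keyframe after the target
};
```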
This time provider can later be swapped out for the AudioMixingSink
when it implements the MediaTimeProvider interface, so that frame
timing can be driven by audio when it is present.
This commit implements the functionality to play back audio through
PlaybackManager.
To decode the audio data, AudioDataProviders are created for each track
in the provided media data. These providers will fill their audio block
queue, then sit idle until their corresponding tracks are enabled.
In order to output the audio, one AudioMixingSink is created which
manages a PlaybackStream which requests audio blocks from multiple
AudioDataProviders and mixes them into one buffer with sample-perfect
precision.
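The mix itself amounts to summing each track's requested block into the output, sample by sample; a sketch, mono and float-only for brevity:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the mix; mono, float samples, clipping/limiting elided.
// The real sink mixes per-channel blocks requested from each provider.
static void mix_into(std::vector<float>& output,
                     std::vector<std::vector<float>> const& track_blocks)
{
    for (auto const& block : track_blocks) {
        std::size_t count = std::min(output.size(), block.size());
        for (std::size_t i = 0; i < count; ++i)
            output[i] += block[i]; // accumulate each track's samples
    }
}
```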
With this commit, all PlaybackManager can do is autoplay a file from
start to finish, with no pausing or seeking functionality.
All audio playback functionality has been removed from HTMLMediaElement
and HTMLAudioElement in anticipation of PlaybackManager taking that
over, for both audio-only and audio/video.