Note: The original version of this post highlighted the disadvantages of using a fixed key frame interval. I’ve since learned that this problem can be avoided using intra refresh in the H264 stream.
Streaming media, specifically video, is a fickle beast. This will be a short post, but it will cover everything you need to know to stream media using FFmpeg to WebRTC clients. This technique has many applications, such as streaming synchronized videos to users.
Setting up the input
Before inputting a file into FFmpeg, we need to pass a few flags:
> ffmpeg \
-v info \
-fflags +genpts \
-protocol_whitelist pipe,tls,file,http,https,tcp,rtp \
These flags set the log level to info, generate pts if they’re missing, and set up the protocols we can use. Next, we need to provide the input to FFmpeg.
-i in.mp4 \

In this example, a file named in.mp4 is used, but an http(s) URL could also be used. This command starts playback at the beginning of the input file. If you want to start playback in the middle of the input file, you can add the -ss <time in secs> flag before the -i flag.
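For example, to start encoding five minutes into the file, the input section might look like this (a sketch; the filename and offset are illustrative):

```shell
# Seek 300 seconds into the input before decoding begins.
# Placing -ss before -i makes FFmpeg seek on the input directly,
# which is much faster than decoding and discarding frames.
ffmpeg \
  -v info \
  -fflags +genpts \
  -protocol_whitelist pipe,tls,file,http,https,tcp,rtp \
  -ss 300 \
  -i in.mp4 \
  ... # remaining encoding and output flags as described below
```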
Now the video needs to be converted to an appropriate format for streaming. The format is specific to the application, but common codecs are H264, VP8, and VP9. This example uses H264 due to its ubiquitous support.
-vf "realtime,scale=w=min(iw\,1280):h=-2" \
-map 0:v:0 \
-c:v libx264 \
-x264-params intra-refresh=1,fast-pskip=0 \
-threads 3 \
-profile:v baseline \
-level:v 3.1 \
-pix_fmt yuv420p \
-tune zerolatency \
-minrate 500K \
-maxrate 1.3M \
-bufsize 500K \
-vf specifies the video filters to apply. Here, two filters are applied. The first is realtime, which slows processing down to real-time speed; this is necessary for streaming. This filter is similar to the -re flag, but works much better with the start time flag (-ss). The second filter scales the video width to a maximum of 1280 pixels while maintaining the aspect ratio. This is important to keep the bitrate appropriate for real-time streaming.
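To see what scale=w=min(iw\,1280):h=-2 actually computes, here is a rough sketch for a 1920x1080 input (the numbers and variable names are mine; h=-2 tells FFmpeg to choose a height divisible by 2 that preserves the aspect ratio):

```shell
# Sketch of the scale expression for a 1920x1080 source (numbers illustrative).
iw=1920; ih=1080
if [ "$iw" -lt 1280 ]; then ow=$iw; else ow=1280; fi   # w=min(iw,1280)
oh=$(( ih * ow / iw ))          # height proportional to the new width
oh=$(( oh / 2 * 2 ))            # h=-2: force an even height
echo "${ow}x${oh}"              # → 1280x720
```

A source narrower than 1280 pixels passes through at its original width, so small videos are never upscaled.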
The intra-refresh parameter allows users to consume the video stream mid-way through by spreading enough intra-coded data across multiple inter frames to decode a full frame. The fast-pskip=0 parameter is required for intra-refresh to work if the source video file contains very few frames, which can happen if the video displays static images for long periods of time. There may be another way around this, but I’m not aware of it.
Another important parameter is the -threads 3 flag. FFmpeg will use many threads by default. Normally, this is good because it produces the final result as fast as possible. For real-time encoding, however, using a large number of threads has overhead that can slow down real-time output. Using many threads is also a bad idea if you’re running multiple instances of FFmpeg concurrently.
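If you do run several encoders on one machine, one approach is to split the available cores between them. A minimal sketch, assuming a Linux box with coreutils' nproc and an instance count you choose yourself:

```shell
instances=2                          # assumed number of concurrent ffmpeg processes
cores=$(nproc 2>/dev/null || echo 4) # total CPU cores; fall back to an assumed 4
threads=$(( cores / instances ))
[ "$threads" -lt 1 ] && threads=1    # always give each encoder at least one thread
echo "-threads ${threads}"
```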
The next two parameters, -profile:v and -level:v, specify the profile and level to use for the encoding. These are specific to H264. WebRTC clients can only decode certain profiles and levels, so these need to match the specific configuration of the application. They roughly correspond to the profile-level-id field in the SDP.
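As a sketch of that correspondence: profile-level-id is the hex concatenation of profile_idc, the constraint flags byte, and level_idc. For baseline (profile_idc 66) at level 3.1 (level_idc 31), with the 0xE0 constraint flags byte commonly seen in WebRTC offers (an assumption here; the exact flags depend on the encoder):

```shell
profile_idc=66     # 0x42: H.264 baseline profile
constraints=224    # 0xE0: constraint flags byte (illustrative; varies by encoder)
level_idc=31       # level 3.1
printf '%02x%02x%02x\n' "$profile_idc" "$constraints" "$level_idc"
# → 42e01f
```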
The pixel format is set to
yuv420p using the
-pix_fmt flag. This is required as this is the only pixel format supported by WebRTC.
-tune zerolatency tunes the encoder for low latency streaming.
Next up are the bitrate parameters. When streaming media, the bitrate should be as low as possible while maintaining the desired quality. This ensures all clients can consume the video in real time. Omitting the
-minrate parameter can cause FFmpeg to produce output with an unnecessarily high bitrate. Setting the
-maxrate is equally important. A DSL connection can only pull down around 2 Mbps. In order for users to watch the video, they must be able to download it in real time, so the maximum bitrate has to be lower than the slowest connection among your users. Another consideration is that the streaming video might not be the only bandwidth-consuming task on the user’s network.
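As a rough budget check (the link speed is an assumption): with -maxrate 1.3M for video and 128 kbps for audio, the stream peaks around 1.43 Mbps, which leaves headroom on a 2 Mbps DSL line for other traffic:

```shell
link_kbps=2000   # assumed slowest viewer connection (~2 Mbps DSL)
video_kbps=1300  # -maxrate 1.3M
audio_kbps=128   # -ab 128k
total_kbps=$(( video_kbps + audio_kbps ))
echo "stream: ${total_kbps} kbps, headroom: $(( link_kbps - total_kbps )) kbps"
# → stream: 1428 kbps, headroom: 572 kbps
```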
The audio arguments are much simpler.
-af arealtime \
-map 0:a:0 \
-c:a libopus \
-ab 128k \
-ac 2 \
-ar 48000 \
The main thing to note here is that the
arealtime filter is used, which is similar to the
realtime filter, but for audio.
The output can be piped to an RTP endpoint using the tee pseudo-muxer. Unfortunately, FFmpeg does not support multiplexing audio and video over a single RTP stream, so you’ll need two separate RTP endpoints: one for the video stream and one for the audio stream.
-f tee \
"[select=a:f=rtp:ssrc=1111:payload_type=<audio-payload_type>]rtp://<audio-ip>:<audio-port>?rtcpport=<audio-rtcpport>|[select=v:f=rtp:ssrc=2222:payload_type=<video-payload_type>]rtp://<video-ip>:<video-port>?rtcpport=<video-rtcpport>"
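For instance, with everything on localhost, the output section might look like the sketch below. The ports, SSRCs, and payload types (111 for Opus and 102 for H264 are common defaults, but this is an assumption) must all match whatever SDP your WebRTC stack negotiates:

```shell
# Illustrative values only: ports and payload types must match the
# session's negotiated SDP, and the tee argument must be quoted so the
# shell does not interpret the | as a pipe.
-f tee \
"[select=a:f=rtp:ssrc=1111:payload_type=111]rtp://127.0.0.1:5004?rtcpport=5005|[select=v:f=rtp:ssrc=2222:payload_type=102]rtp://127.0.0.1:5006?rtcpport=5007"
```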
Now you can stream video over RTP! The full command is
> ffmpeg \
-v info \
-fflags +genpts \
-protocol_whitelist pipe,tls,file,http,https,tcp,rtp \
-i in.mp4 \
-vf "realtime,scale=w=min(iw\,1280):h=-2" \
-map 0:v:0 \
-c:v libx264 \
-x264-params intra-refresh=1,fast-pskip=0 \
-threads 3 \
-profile:v baseline \
-level:v 3.1 \
-pix_fmt yuv420p \
-tune zerolatency \
-minrate 500K \
-maxrate 1.3M \
-bufsize 500K \
-af arealtime \
-map 0:a:0 \
-c:a libopus \
-ab 128k \
-ac 2 \
-ar 48000 \
-f tee \
"[select=a:f=rtp:ssrc=1111:payload_type=<payload_type>]rtp://<ip>:<port>?rtcpport=<rtcpport>|[select=v:f=rtp:ssrc=2222:payload_type=<payload_type>]rtp://<ip>:<port>?rtcpport=<rtcpport>"