You may have noticed the video effect where one scene slowly fades out while the next fades in, so that the two seem to blend into each other. The open-source tool ffmpeg can generate such crossfade transitions (I used version 4.2.4 for Linux). Creating such a transition is slow: because it works on pixel level, ffmpeg needs to demux, decode, re-encode and mux everything. Mind that a new filter called xfade is about to be released.
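According to the ffmpeg documentation, xfade will reduce the video side of this job to a single filter, roughly like this (file names invented; for two 5-second clips and a 2-second fade, the transition starts at offset second 3; audio still needs the acrossfade filter described below):

ffmpeg -i video1.mp4 -i video2.mp4 \
    -filter_complex "[0:v] [1:v] xfade=transition=fade:duration=2:offset=3 [video]" \
    -map "[video]" result.mp4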
In this blog post I will present a UNIX shell script that joins two videos with a crossfade transition, and I will try to explain the filter_complex language a little. ffmpeg is not a graphical video editor; you need to cope with complex command lines.
A crossfade transition makes the result video shorter than the sum of the two parts, because the fade-out of the first video overlaps with the fade-in of the second. So if both videos last 5 seconds and the transition is set to 2 seconds, the result video will be 5 + 5 - 2 = 8 seconds long, not 10.
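The same arithmetic determines where the fade-out must start: at the first video's duration minus the transition duration. A minimal shell sketch, assuming ffprobe is installed and the desired transition length is held in a variable $fadeDuration (these names are placeholders, not necessarily the ones the script below uses):

# query the duration of the first video in seconds
duration1=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$inputVideo1")
# second at which the fade-out, and thus the second video, must start
fadeOutStart=$(echo "$duration1 - $fadeDuration" | bc)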
Below is the complete source code of my shell script that joins two videos using a crossfade transition.
 1 | # default definitions
   | ...
I won't explain the whole script, because most of it is just argument checking and the avoidance of usage errors, as in any piece of software. Instead I want to focus on the ffmpeg -filter_complex command that creates the transition, and on how to read the filtergraph it contains. That part starts at line 75.
Here is the part that I want to document. The lines of the command are joined by the trailing backslash ("\"), which is the newline escape of the UNIX shell. The line breaks are necessary to keep the ffmpeg command readable, as are the blanks inside the filter_complex filtergraph. When executed, the whole command is expanded into one single line, and the $variables are substituted into it.
75 | ffmpeg -v error -y \
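The remaining lines look roughly as follows; $fadeDuration and $fadeOutStart are placeholder names for the values the script computes beforehand (compare the duration arithmetic sketched above):

76 |     -i "$inputVideo1" -i "$inputVideo2" \
77 |     -filter_complex " \
78 |         [0:v] fade=t=out:st=$fadeOutStart:d=$fadeDuration:alpha=1 [video1]; \
79 |         [1:v] fade=t=in:st=0:d=$fadeDuration:alpha=1, setpts=PTS-STARTPTS+$fadeOutStart/TB [video2]; \
80 |         [video1] [video2] overlay [resultVideo]; \
81 |         [0:a] [1:a] acrossfade=d=$fadeDuration:o=1 [resultAudio]" \
82 |     -map "[resultVideo]" \
83 |     -map "[resultAudio]" \
84 |     "$outputVideo" || exit $?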
Line 75 starts ffmpeg with some common options: -v error reduces the logging output to errors, and -y makes ffmpeg overwrite an existing output file without interactive confirmation.
Line 76: the -i
options list the two input video files.
Line 77 opens a filtergraph specification with the -filter_complex option. The graph is enclosed in double quotes because it may contain shell meta-characters; nevertheless the shell still substitutes $variables inside it, since double quotes (unlike single quotes) do not prevent variable expansion.
Line 78 references the first video stream inside $inputVideo1 as [0:v]. Read [0:v] as "stream from the first (0) input file, of type video (v)"; this is called a stream specifier. The stream is fed into a filter called "fade", whose parameters follow after the "=":
The "t" parameter is the fade's "type", in this case a fade-out.
The "st" parameter gives the "start time" at which to apply the filter.
The "d" parameter gives the "duration" of the fade-out.
The "alpha" parameter "1" tells the filter to fade only the transparency (the alpha channel); without it, the video would fade to black instead of becoming transparent.
Finally the filtered stream is labeled [video1], an arbitrary internal name.
Line 79 references the first video stream inside $inputVideo2 as [1:v]. It does the same as line 78, but sets the fade type to "in", starting at the beginning ("0"). After the ",", the "setpts" filter takes over: it shifts the timestamps of all frames to the start of the first video's fade-out, so that the second video begins playing exactly when the first one starts to fade.
Inside this calculation, "PTS" is the presentation timestamp of a frame, "STARTPTS" is the presentation timestamp of the video's first frame, and "TB" is the time base in which the timestamps are counted, so dividing a number of seconds by TB converts it into timestamp units. For example, with a 5-second first video and a 2-second transition, every frame of the second video is shifted to start at second 3.
The resulting stream is finally labeled [video2].
Line 80 takes the streams [video1] and [video2] and combines them with the "overlay" filter, which draws the second (fading-in) video on top of the first, honoring its alpha channel. The result is labeled [resultVideo]. The video part is now ready to be mapped into the output file, but the audio is still missing.
Line 81 references the first audio stream inside $inputVideo1 as [0:a] and the one inside $inputVideo2 as [1:a]. It joins them using the "acrossfade" filter ("a" for audio).
The duration of the crossfade is given by the "d" parameter.
The "overlap" (or "o") parameter says that the streams should overlap. This is not strictly necessary, because overlapping is the default, but defaults sometimes change.
The audio result is labeled [resultAudio].
Lines 82 and 83 map the streams labeled [resultVideo] and [resultAudio] into the output file, in that order, meaning the video will be the first stream in the output and the audio the second.
Line 84 names the output file. If the whole ffmpeg command fails, the script exits with ffmpeg's exit code, due to the "||" operator, whose right-hand side runs only when the preceding command failed.
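A quick way to check the result is to compare durations. With the two 5-second inputs and the 2-second transition from the example above, the output file should report roughly 8 seconds (the file name is invented):

# print the container duration of the result in seconds
ffprobe -v error -show_entries format=duration -of csv=p=0 result.mp4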
ffmpeg bears all the hallmarks of hackware, but it is surprisingly comprehensive and flexible. What is missing is use-case-oriented documentation. The filtergraph specifications are really hard to read, so errors inside them are difficult to find. Two months after my last blog posts about ffmpeg, it took me two days to get into it again and to find out how to generate crossfade transitions including audio.
There are lots of forum entries about simple video manipulations, but for transitions I was more or less left alone with the tool documentation. I also noticed a kind of "garbage symptom" that you often find in CSS forum entries, too: developers put lots of code into their examples that is actually not needed, but you have to understand that garbage before you can tell whether it is meaningful or not. In the case of ffmpeg this really takes time.
I won't use crossfade transitions for my private video production, because the conversion takes too much time. I turned to ffmpeg because I wanted to see video-cut results quickly. For everything else, OpenShot is a sufficient graphical video editor.
ɔ⃝ Fritz Ritzberger, 2021-01-05