During the last foss-gbg meeting I tried filming the entire event. The idea is to produce videos of each talk and publish them on YouTube. Since I’m lazy, I simply put up a camera on a tripod and recorded the whole event: some 3 hours, 16 minutes and a few seconds. A few seconds that would cause me quite some pain, it turns out.
It all started with me realizing that I could hear the humming of the AC system in the video. No problem: simply use ffmpeg to separate the audio from the video and apply the noise reduction filter in Audacity. However, when putting it all back together I noticed an audio sync drift (after 6+ hours of rendering videos, that is).
ffprobe told me that the video is 03:16:07.58 long, while the extracted audio is 03:16:04.03. This means that the video of the last speaker drifts by more than 3 s – unwatchable. So, after googling for a solution, I realized that I would have to stretch the audio to the same duration as the video. Audacity has a Change Tempo effect for this, but I could not get the UI to accept my very small adjustment in tempo (or my insane number of seconds in the clip). Instead, I had to turn to ffmpeg and the atempo filter.
ffmpeg -i filtered.ac3 -filter:a "atempo=0.9996983236995207" -vn slower.ac3
This resulted in an audio clip of the correct length. (By the way, the factor is the ratio of the audio duration to the video duration.)
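For the record, the factor can be computed from the two ffprobe durations. A quick sketch – the to_seconds helper and the variable names are mine, not anything ffmpeg provides:

```shell
# Convert an HH:MM:SS.ss duration (as printed by ffprobe) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f", $1 * 3600 + $2 * 60 + $3 }'
}

video=$(to_seconds 03:16:07.58)   # duration of the video (merged.mts)
audio=$(to_seconds 03:16:04.03)   # duration of the extracted, filtered audio

# atempo values below 1 slow the audio down, i.e. stretch it, so
# audio/video is the factor that lands the audio on the video's length.
factor=$(awk -v a="$audio" -v v="$video" 'BEGIN { printf "%.10f", a / v }')
echo "$factor"
```

This prints 0.9996983237, matching the value passed to atempo above.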
Back to kdenlive – I imported the video clip, put it on the timeline, separated the audio and video (just a right click away), ungrouped them, removed the audio, added the filtered, slowed-down audio, grouped it with the video, and everything seems nice. In about 1h43m I will know whether the first clip has been properly rendered :-)
11 thoughts on “kdenlive, audacity and lessons in audio sync”
Dammit. I love so many things about Kdenlive and FFmpeg, but I’ve been hitting this exact problem – audio extracted from a video coming out shorter – for about 10 years now, to the point that I just stopped using Kdenlive. Now, with all of the recent refactoring, I’m hoping that some of the long-time bugs will finally get sorted (yeah, I was submitting bug reports), but it still drives me crazy to hear that the exact same thing is still happening after all this time.
Anyway, I hope you filed a bug report with all relevant projects. Thanks and good luck.
TBH, this is more of an ffmpeg issue. When I look at the files it seems that the audio is too short (merged.ac3) when extracted from the original (merged.mts). The filter does not seem to affect this either (filtered.ac3).
$ ffprobe merged.mts
Input #0, mpegts, from 'merged.mts':
Duration: 03:16:07.58, start: 1.440000, bitrate: 17590 kb/s
$ ffprobe merged.ac3
Input #0, ac3, from 'merged.ac3':
Duration: 03:16:04.09, start: 0.000000, bitrate: 255 kb/s
$ ffprobe filtered.ac3
Input #0, ac3, from 'filtered.ac3':
Duration: 03:16:04.03, start: 0.000000, bitrate: 192 kb/s
I’m not sure if this could be due to me merging a bunch of mts files (as the camera uses FAT internally, i.e. it has a 4GB file size limit).
As I have no good (small) samples demonstrating this, I’ve not reported any bugs. I don’t want to send out a 20+GB attachment with a bug…
Couldn’t you have made ffmpeg stretch the audio track without re-rendering the video, by applying the filter to the complete file and copying the video track?
The issue is that I recorded the whole event, so 3+ hours, then cut it into three separate videos. I could have stretched the audio for the first one, but the other two needed some sort of offset management as well so I decided to take the easy route.
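For completeness, here is the bookkeeping that made me skip that route: with a uniform stretch, a cut at video time T lines up with time T × factor in the unstretched audio, so each later clip’s audio needs its own start offset before filtering. A sketch with a made-up cut point (the one-hour mark is hypothetical, not an actual cut from the event):

```shell
factor=0.9996983236995207   # audio/video duration ratio, from the post above
video_cut=3600.00           # hypothetical: say the second talk starts 1 h in

# Where that video cut point falls in the unstretched audio; at this point
# the audio is already about 1.09 s "early" relative to the video.
audio_cut=$(awk -v t="$video_cut" -v f="$factor" 'BEGIN { printf "%.2f", t * f }')
echo "$audio_cut"
```

This prints 3598.91, i.e. the second clip’s audio would have to be taken from 3598.91 s, not 3600 s, before stretching – hence the offset management.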
It is often worth trying WAV audio rather than compressed codecs, for which libavcodec used to warn “Estimating duration from bitrate, this may be inaccurate”. This has solved sync/seek problems several times (from the moment you split the audio from the video).
This is a problem in FFmpeg & MLT; it can’t be improved from Kdenlive…
Cool, I will try that. Also, it is quite clear that this is outside of kdenlive – it occurs before kdenlive is introduced in the flow.
For making the sound of my podcast better I’m using http://auphonic.com/ – you should test it too. You get 2 hours per month for free and it works with video too. It makes the speaking person louder and removes humming and the like with the help of many different machine learning algorithms specialized in human speech.
I took one of your videos, sent it through Auphonic, and uploaded it to YouTube (unlisted) for reference – check it out: https://youtu.be/iI22ADbfozw – and here is some metadata about it: https://jeena.net/t/zifra.png
That sounds interesting! I will try it.
Also, I will try to do the whole dance with wav files and see if that solves the issue.
I believe the most reliable approach is to…
* immediately convert to a robust container like matroska
* keep audio and video together – apply audio corrections as ffmpeg/MLT filters concurrently with video filters/cuts and rendering (a.k.a. compression)
* do a test-compile with extremely low compression, to check not only for audio desync, but also for areas with extremely low/high/noisy audio or badly angled video (all the interesting stuff happening in the corner of the frame).
I have developed a script to support the above called localvideowebencode, available as part of git://source.jones.dk/bin
A concrete use of the above script, doing some relatively complex audio adjustments, is at http://media.biks.dk/fb/epfsug2/ (see the “Video mastering script” at the bottom).
Interesting! Thanks for the pointers!
Comments are closed.