Subtitling with Linux

First the bad news: it's harder with Linux than with Windows. The good news is that you do not actually need expensive tools to do quite a good job of it.

Much of what I describe here I learnt from David Jao's Linux Digital Fansubbing Guide. You may find his page more to your taste.

The different formats

There are a good number of subtitling formats out there. For example, the KDE program ksubtitle supports the SRT format. The SRT format is completely simple: it lets you specify a from time, a to time, and the text to appear in between. There is no way of specifying alignment or of having multiple texts on screen at one time.
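
To make that concrete, here is a minimal Python sketch that builds one SRT cue: a sequence number, a "from --> to" time line, and the text, followed by a blank line. The function names srt_time and srt_cue are made up for this illustration; they are not part of ksubtitle or any library.

```python
def srt_time(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm form SRT uses."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index, start, end, text):
    """Build one SRT cue: sequence number, time line, text, blank line."""
    return f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n\n"


if __name__ == "__main__":
    # One cue lasting from 1:59.8 to 2:04.4.
    print(srt_cue(1, 119.8, 124.4, "[Døren knirker]"), end="")
```

Each cue carries exactly one time range and one text, which is why the format has no room for alignment or layering.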

Substation Alpha (ASS, http://moodub.free.fr/ass-specs.doc) and JACOsub (http://unicorn.us.com/jacosub/) stand out. These formats allow many advanced things, and both are popular among anime fan-subbers. That subtitling tools are this readily available says something about how popular Japanese anime is in the West, especially among computer geeks who have the initiative to build tools themselves.

The media player on Linux, mplayer, does not support all the fancy features of ASS and JACOsub (JS), but - and this is key for me - it does support multiple texts on screen at one time. That is, I can code a progression like this (using JACOsub):

0:01:59.80 0:02:04.40	D [Døren knirker]
0:02:01.10 0:02:04.40	D [Pilt] Næmmen!
0:02:02.40 0:02:04.40	D [Døren knirker]
As you can see from the time codes, more and more text appears, and then it all disappears at once.

My first subbing project was subbing the "Pompel og Pilt" DVD, which was issued jointly by NRK (the Norwegian public broadcaster) and Buena Vista Home Entertainment. Even though NRK does have a Closed Caption script for this, they did not put any text on the DVD. If the text had been on there I would never have done this, even if NRK's CC text is usually not exactly "full text" or even "for the hard of hearing".
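
Returning to the JACOsub progression shown above: the following Python sketch parses those three lines and lists which texts are on screen at a given moment. It assumes the last time field counts hundredths of a second, as the example suggests, and the names parse and visible_at are hypothetical helpers, not anything mplayer provides.

```python
import re

# Matches lines shaped like the example: start, end, a directive field, text.
CUE = re.compile(
    r"^(\d+):(\d\d):(\d\d)\.(\d\d)\s+(\d+):(\d\d):(\d\d)\.(\d\d)\s+(\S+)\s+(.*)$"
)


def to_seconds(h, m, s, frac):
    # Assumption: the last field counts hundredths of a second.
    return int(h) * 3600 + int(m) * 60 + int(s) + int(frac) / 100


def parse(lines):
    """Return a list of (start, end, text) tuples, times in seconds."""
    cues = []
    for line in lines:
        match = CUE.match(line.strip())
        if match:
            g = match.groups()
            cues.append((to_seconds(*g[0:4]), to_seconds(*g[4:8]), g[9]))
    return cues


def visible_at(cues, t):
    """Texts whose time range covers second t, in script order."""
    return [text for start, end, text in cues if start <= t < end]


SCRIPT = """\
0:01:59.80 0:02:04.40\tD [Døren knirker]
0:02:01.10 0:02:04.40\tD [Pilt] Næmmen!
0:02:02.40 0:02:04.40\tD [Døren knirker]
"""

if __name__ == "__main__":
    cues = parse(SCRIPT.splitlines())
    for t in (120.0, 121.5, 123.0, 125.0):
        print(t, visible_at(cues, t))
```

At 120.0 seconds one text is visible, at 121.5 two, at 123.0 all three, and at 125.0 none.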

My main motive is that I want to provide good subtitles for the hearing-impaired, as my wife is deaf. Merely adequate subtitles for this purpose are easy to make, and while they are helpful, they are not good. Good requires rendering sounds as text and showing how long they last, making it clear who speaks at all times, and transcribing what they say accurately. With the facilities of JACOsub available to me in mplayer I can do all this, even if it lacks ways to precisely control text placement and attributes.

So the choice is between the JACOsub and Substation Alpha formats. I have been using JACOsub because I discovered it first, and because it appears to be simpler.

Pompel og Pilt, by the way, is a surreal children's TV show. People who grew up in Norway in the 1970s should all have some memory of it. As we have all grown up, Pompel and Pilt have become mythical creatures: their creators have been interviewed, the show has been analyzed, and so on. "A cult following" is the market-speak for this.

Tools

ksubtitle supports only SRT, which I find uninteresting. Over and out with that.

The David Jao page describes a hack for glame (glame is a Linux sound tool) that saves timings into a MySQL database. Afterwards the database is exported and text is added to the timing lines. I find this inconvenient because:

  1. It requires maintenance of a patch.
  2. There is no undo function.
  3. I find it better to type in the manuscript first, not last.

All in all I find that typing the times into an editor is easier, and infinitely more undoable. Glame is very good for finding times, though.

Since a JACOsub (.js) script is not all that easy to type in, with its fixed number of fields and so on, I use another, simpler format for this:

     0.7     3.8  [Pompel, vanlig mannsstemme] Eeeeh, Pilt?
     4.55    6.19 [Pilt, lys guttestemme] Pompel?
     7.18    8.60 [Pompel] Pilt?
     9.17   10.38 [Pilt] Pompel?
    12.51   14.15 [Pompel] Pii-iilt?
    14.20   15.79 [Pilt] Poompel?
    16.50   17.36 [Pompel] PILT!
    17.4    19.7  [Pilt] POMPEL!
    19.76   57.0  [Pompel og Pilt-temaet]
    31.0    33.0  [Kræsj!]
    57.1  1:00.1  [Begge] Oops!

As you can see, I only have to type in the needed parts of the times: 0.7 is 0.7 seconds, and 1:00.1 is one minute and 0.1 seconds. This converts quite easily to .js with the txt-to-jaco perl script, which is basically a very simple re-purposing of David Jao's txt-to-script perl script. The script recognizes empty lines, comment lines (starting with '#') and an '__END__' marker, so you can start out with a manuscript file, put an __END__ where the timings end, and try playing the video up to there.
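
For readers who want to see the idea without the perl script at hand, here is a rough Python sketch of such a conversion. It is not the txt-to-jaco script itself. It assumes the output lines should look like the JACOsub example earlier on this page (H:MM:SS.FF times in hundredths of a second, followed by a "D" field), and the real script may well differ, for instance by writing header lines. The names parse_time, format_time and convert are made up for the illustration.

```python
def parse_time(token):
    """Turn '0.7', '1:00.1' or '1:02:03.45' into hundredths of a second.

    Digits beyond hundredths are dropped, not rounded.
    """
    *big, last = token.split(":")
    secs, _, frac = last.partition(".")
    centis = int(secs or 0) * 100 + int((frac + "00")[:2])
    # big holds the optional minutes and hours, most significant first.
    for unit, part in zip((6000, 360000), reversed(big)):
        centis += int(part) * unit
    return centis


def format_time(centis):
    """Format hundredths of a second as H:MM:SS.FF."""
    secs, frac = divmod(centis, 100)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return f"{hours}:{mins:02d}:{secs:02d}.{frac:02d}"


def convert(lines):
    """Convert simple 'start end text' lines to JACOsub-style lines."""
    out = []
    for raw in lines:
        line = raw.strip()
        if line == "__END__":          # everything after this is untimed
            break
        if not line or line.startswith("#"):
            continue                   # skip blank lines and comments
        # A timed line without any text raises ValueError here.
        start, end, text = line.split(None, 2)
        out.append(
            f"{format_time(parse_time(start))} "
            f"{format_time(parse_time(end))}\tD {text}"
        )
    return out


if __name__ == "__main__":
    sample = [
        "     0.7     3.8  [Pompel, vanlig mannsstemme] Eeeeh, Pilt?",
        "# a comment line",
        "    57.1  1:00.1  [Begge] Oops!",
        "__END__",
        "[Pompel] this line is not timed yet",
    ]
    print("\n".join(convert(sample)))
```

Working in whole hundredths rather than floating-point seconds keeps times like 57.1 from picking up rounding noise on the way through.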

Workflow

  1. Rip the DVD. dvd::rip works well for this. acidrip should also work well enough. I encode to .avi at once.
  2. At this point I view the film while typing in the dialog and sounds. I can type fast enough that this is not terribly clumsy.
  3. Extract the audio track. If you have an .avi file, do this:
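    $ mplayer -vo null -vc dummy -ao pcm -aofile Pompel_og_Pilt-006.wav \
       Pompel_og_Pilt-006.avi
    (Newer mplayer releases dropped -aofile; there the file name is given
    as -ao pcm:file=Pompel_og_Pilt-006.wav instead.)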
    
  4. Load the .wav file in glame. Start with a copy of the manuscript in your editor, and set up the timings. Glame is great for playing and replaying segments if I find that someone is not speaking clearly or if I made an error while typing in the manuscript.
  5. Convert the script. View the movie with subtitles and correct errors. I usually do steps 4 and 5 in a loop: time a few minutes of the movie, then review at once to check that all is OK, then continue.
  6. Enjoy your work with your loved ones and friends.
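
A small part of the review in step 5 can be automated before watching anything. The Python sketch below assumes the cues are already available as (start, end, text) tuples with times in seconds; check_cues is a hypothetical helper, not part of any tool mentioned here. It only flags cues that end at or before their start and cues with no text, so it complements watching the result rather than replacing it.

```python
def check_cues(cues):
    """Return a list of human-readable problems found in the cues."""
    problems = []
    for number, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            problems.append(f"cue {number}: ends at {end} but starts at {start}")
        if not text.strip():
            problems.append(f"cue {number}: no text")
    return problems


if __name__ == "__main__":
    cues = [
        (0.7, 3.8, "[Pompel, vanlig mannsstemme] Eeeeh, Pilt?"),
        (6.19, 4.55, "[Pilt, lys guttestemme] Pompel?"),  # times swapped by mistake
        (7.18, 8.60, ""),                                  # text forgotten
    ]
    for problem in check_cues(cues):
        print(problem)
```

Overlapping cues are deliberately not reported, since overlap is exactly what the progression technique relies on.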

Enjoying your work

On a computer running Linux

Use mplayer, and give it the subtitle script with the -sub option.

On a computer running Windows

You tell me.

On a computer running Mac OS X

I may find out. Or you tell me?

Mastering an SVCD

An SVCD can be played on most computers and quite a few DVD players.

I have not tried this yet. The basic steps are these:

  1. Encode to MPEG-2 (tmpgenc under wine is better than the Linux tools). I don't know whether it will read .js subtitling scripts.
  2. Premaster the image (vcdimager).
  3. Record (cdrdao).

I will flesh these out as soon as I have tried them. Thanks to fellow penguin Unslider.

Mastering a DVD

David Jao writes about this. So do some other people. I need to make mplayer render the subtitles and then convert them to vobsub for DVD use. Or something.