Subtitles created via OCR (VobSub->srt) are limited to 3 seconds

Started by walther, April 14, 2012, 01:48:39 PM


walther

Quote: parameter for -t or --convert is 0 or 1 for format.
Are you sure about that?

I had no more success myself. Did you manage to get it running?

Jan Gruuthuse

With -t 0 the "WARNING: Cannot write text (not defined)" message was gone.
Quote: subptools - manipulates xml subtitles files
I think this has to do with the control file and how it handles the text that should be converted from those images, somehow.
After reading more, I'm convinced the image-to-text conversion is missing from the documentation, so I have not got it working. I hope the developers pick up on this and fix the subtitle OCR to .srt conversion. Touch wood and pray this is not at the bottom of their priorities.

walther

Thanks for this information. It's a pity that there's no simple solution for my initial issue. So I hope OCR for idx/sub files will be back as a plugin or so in Avidemux 2.6.

je2213

OCR the sub file with Avidemux and save the result as movie.srt
Avidemux has done the hard part of the OCR, but the subtitle end times are wrong.
No problem: the correct end times can be found, and the end times in movie.srt can be fixed.

Download the Java program BDSup2Sub (a jar file)
https://github.com/downloads/mjuhasz/BDSup2Sub/BDSup2Sub.jar

Open movie.idx with BDSup2Sub
java -jar BDSup2Sub.jar movie.idx

set output format to xml/png
save as movie_exp.xml

extract subtitle start/finish times from xml file
grep " InTC=" movie_exp.xml | sed 's/[^0-9]*\([0-9:]*\)[^0-9]*\([0-9:]*\).*/\1:\2/' > startfinishtimes
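
If you want to check the sed expression before running it on the whole file, you can feed it a single line. The sample line below is made up by me in the shape I would expect an Event line in the xml export to have (InTC before OutTC); extract_times is just a name I picked for the wrapper.

```shell
# Hypothetical sample line; a real export has one such line per subtitle.
line='<Event Forced="False" InTC="00:00:07:14" OutTC="00:00:09:14">'

# Same grep/sed pipeline as above, wrapped in a function reading stdin.
extract_times() {
  grep " InTC=" | sed 's/[^0-9]*\([0-9:]*\)[^0-9]*\([0-9:]*\).*/\1:\2/'
}

printf '%s\n' "$line" | extract_times
# prints 00:00:07:14:00:00:09:14
```

The leading space in " InTC=" matters: it keeps grep from matching attributes whose names merely end in InTC.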

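The xml time format is not the same as the srt time format. Each line now holds the start and end time joined by a colon, and the last field of each time is a frame number, not milliseconds: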
00:00:01:01:00:00:03:00
00:00:07:14:00:00:09:14
00:00:09:14:00:00:12:01
00:00:12:01:00:00:15:01

To convert the start/finish times to the srt time format, create a script called startfinish_ms.awk

#!/usr/bin/awk -f

# this script converts the frame field of the subtitle
# start/finish times (:FF) to milliseconds (,mmm)
# 40 ms per frame, which assumes 25 frames per second
#
# startfinish
# 00:00:09:14:00:00:12:01
# startfinish_ms
# 00:00:09,560 00:00:12,040

BEGIN { FS = ":" }
{
    # fields 4 and 8 are the frame numbers of start and finish;
    # multiply by 40 ms and zero-pad to three digits
    start_ms = sprintf("%03d", $4 * 40)
    end_ms   = sprintf("%03d", $8 * 40)
    print $1 ":" $2 ":" $3 "," start_ms " " $5 ":" $6 ":" $7 "," end_ms
}
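
The factor 40 in the script is the length of one frame in milliseconds at 25 fps (1000/25), so it fits PAL material. For other frame rates, something like the following might do. This is only a sketch: the frames_to_ms name and the fps argument are my own additions, and rounding to the nearest millisecond is an assumption about how you want fractions handled.

```shell
# Same conversion as startfinish_ms.awk, but with the frame rate passed in.
# "fps" is a hypothetical parameter, not part of the original script.
frames_to_ms() {
  awk -F: -v fps="$1" '{
    # fields 4 and 8 are frame numbers; convert each to milliseconds
    printf "%s:%s:%s,%03d %s:%s:%s,%03d\n",
      $1, $2, $3, int($4 * 1000 / fps + 0.5),
      $5, $6, $7, int($8 * 1000 / fps + 0.5)
  }'
}

echo "00:00:09:14:00:00:12:01" | frames_to_ms 25
# prints 00:00:09,560 00:00:12,040
```

To use it on the whole list: frames_to_ms 25 < startfinishtimes > startfinishtimes_ms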


make the awk script executable and run it against startfinishtimes
chmod +x startfinish_ms.awk
./startfinish_ms.awk startfinishtimes > startfinishtimes_ms

create new srt subtitle file using startfinishtimes_ms and movie.srt (a timing line whose start time is not in startfinishtimes_ms is left unchanged)
awk 'FNR==NR{a[$1]=$2;next} /-->/ && ($1 in a){$3=a[$1]} 1' startfinishtimes_ms movie.srt > movie_fixed.srt
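
To see what the merge does before trusting it with a whole film, it can be tried on two tiny files. Everything below is made-up sample data in a temporary directory. The awk line includes a ($1 in a) test, so a timing line whose start time has no entry in the list keeps its end time instead of losing it.

```shell
# Hypothetical sample data, only to show what the merge does.
tmp=$(mktemp -d)
printf '%s\n' '00:00:01,040 00:00:03,000' \
              '00:00:07,560 00:00:09,560' > "$tmp/times_ms"
printf '%s\n' '1' '00:00:01,040 --> 00:00:04,040' 'Hello.' '' \
              '2' '00:00:07,560 --> 00:00:10,560' 'Goodbye.' '' \
              '3' '00:00:20,000 --> 00:00:23,000' 'Unmatched.' > "$tmp/in.srt"

# While reading the first file, store a[start]=end.
# While reading the second file, replace field 3 (the end time) of each
# timing line whose start time is known; print every line.
awk 'FNR==NR{a[$1]=$2;next} /-->/ && ($1 in a){$3=a[$1]} 1' \
    "$tmp/times_ms" "$tmp/in.srt" > "$tmp/out.srt"

cat "$tmp/out.srt"
```

Subtitles 1 and 2 come out ending at 00:00:03,000 and 00:00:09,560, while subtitle 3 keeps 00:00:23,000 because its start time is not in the list.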