Avidemux Forum

Avidemux => Stable branch (2.5) discussion => Topic started by: walther on April 14, 2012, 01:48:39 PM

Title: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: walther on April 14, 2012, 01:48:39 PM
Using "Tools" -> "OCR (VobSub -> srt)" to create srt files from idx/sub works well, but the subtitle length seems to be limited to three seconds. The start timecodes are correct, and the end timecode is also correct when the subtitle is shorter than three seconds. This issue was already reported by someone else in the Mac/OSX section (http://www.avidemux.org/smf/index.php?topic=9674.msg52451#msg52451) but without any reaction (since I'm using Linux, it does not seem to be OS-related).

Is there a chance to get this fixed? Or is there a way to manually remove this limit?
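
One way to confirm the cap on an existing file is to measure the longest subtitle in it. The function below is a minimal sketch, not part of Avidemux: the name srt_max_duration is made up for illustration, and it assumes standard srt timing lines of the form "HH:MM:SS,mmm --> HH:MM:SS,mmm". If it prints 3000 for a film with long dialogue lines, the end times were most likely cut off.

```shell
#!/bin/sh
# srt_max_duration FILE  (hypothetical helper name)
# Print the longest subtitle duration found in FILE, in milliseconds.
srt_max_duration() {
    awk '
        # Convert "HH:MM:SS,mmm" to milliseconds.
        function to_ms(t,    p) {
            split(t, p, "[:,]")
            return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
        }
        / --> / {
            sub(/\r$/, "")              # tolerate CRLF line endings
            d = to_ms($3) - to_ms($1)   # $1 = start time, $3 = end time
            if (d > max) max = d
        }
        END { print max + 0 }
    ' "$1"
}
```

Usage: srt_max_duration movie.srt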
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: Agent_007 on April 15, 2012, 08:50:43 AM
IIRC with 2.6 there will be a separate tool for the OCR.

But I think the code fix for 2.5 should be trivial.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: walther on April 15, 2012, 03:34:50 PM
Thanks for your reply :-)

A trivial fix would be nice... but can you help with that?
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: Agent_007 on April 16, 2012, 04:06:42 PM
I tried to find the part that calculates the time, but I couldn't find it. Maybe it isn't in the ADM_ocr folder.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: nibbles on April 16, 2012, 09:39:39 PM
There are several programs out there whose only purpose is to convert subs.  While you wait for something to happen here, you could also just get it done with ________ or ________.  Problem is I forget the names.  Check Videohelp and Doom9 for the names of the converters I forgot.  You might want to search on opensubtitles.org to see if the subs exist already.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: walther on April 17, 2012, 08:42:03 PM
@nibbles: Thanks for the info, but I've already searched for the subs. And the (probably) most popular software for converting subs is "Subrip", but it is not that good at converting subs from idx/sub files. That's why I use Avidemux for this purpose.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: nibbles on April 18, 2012, 02:32:59 AM
Yeah, that sounds familiar, subrip.  I've done it a few times with idx/sub because I hate the low res characters and was able to get both high res and adjustable color with srt.  So I hear you.  I didn't realize Avidemux could be better, but I admit I never tried it.  If I get motivated, I'll try to figure out what is causing that 3sec limit.  Thanks for your post about this.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: Jan Gruuthuse on April 18, 2012, 09:56:04 AM
Don't know if you can do anything with this: Ubuntu Manpage: subp2pgm - convert VobSub DVD subtitles into pgm files and xml description (http://manpages.ubuntu.com/manpages/oneiric/man1/subp2pgm.1.html)
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: walther on April 18, 2012, 03:33:15 PM
@nibbles: IMHO SubRip ignores many subs in idx/sub files and it doesn't recognize all text correctly.

@Jan Gruuthuse: Thanks for the hint! I managed to create many pgm files with subp2pgm, but I fail to convert these files to srt files using subptools. Do you have any experience with it, and could you give me a usage example? ;-)

But in the end... let's hope this issue will be finally fixed in Avidemux :-)
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: Jan Gruuthuse on April 18, 2012, 03:44:20 PM
Sorry no experience with this. Ubuntu Manpage: subptools (http://manpages.ubuntu.com/manpages/oneiric/man1/subptools.1.html)
looks like there could be a problem?
Quote:
-c, --cut first[,last]
              Write only entries numbered from first to last, where last defaults to the last entry of the file.
Quote:
-c, --convert format
              Convert the xml subtitles file into a srt (default) or spumux file.
Have you tried using --convert instead of -c, which may conflict with --cut?
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: Jan Gruuthuse on April 18, 2012, 03:53:25 PM
Don't you have Avidemux 2.4? http://www.my-guides.net/en/content/view/167/1/
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: walther on April 19, 2012, 12:38:33 PM
@Jan Gruuthuse: Thanks again. Yes, I've noticed the wrong option -c -> it should be -t to convert, but it didn't work (or I'm too stupid...)

The OCR routine is the same in Avidemux 2.5 as in 2.4. In the linked guide (http://www.my-guides.net/en/content/view/167/1/) the "3 seconds issue" is illustrated in the last picture (http://www.my-guides.net/en/images/stories/extract_dvd_subs/extract_dvd_subs_20.jpg): no sub is longer than 3 seconds :/

@Agent_007: I've installed Avidemux 2.6 now and you're right: OCR is removed from the main program.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: Jan Gruuthuse on April 19, 2012, 12:59:13 PM
Quote from: walther on April 19, 2012, 12:38:33 PM
Thanks again. Yes, I've noticed the wrong option -c -> it should be -t to convert, but it didn't work (or I'm too stupid...)

Instead of using -t, try with the full --convert.
If I'm not mistaken, you can use the shorthand (-t) or the full (--convert), see if that makes a difference?
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: walther on April 19, 2012, 02:39:01 PM
Quote from: Jan Gruuthuse on April 19, 2012, 12:59:13 PM
Instead of using -t, try with the full --convert.
It doesn't make any difference :/

But maybe my command is wrong - example:
subptools -i xyz.xml -o zyx.srt -t srt
This gives many lines saying "WARNING: Cannot write text (not defined)".
Did I miss something?
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: Jan Gruuthuse on April 19, 2012, 05:32:03 PM
Quote:
subptools -i xyz.xml -o zyx.srt -t srt
The parameter for -t or --convert is 0 or 1 for the format. An empty file (0 length) is written.
It looks like some information is missing. I'm trying to piece together and test each step of the process. I will have a look at it tomorrow and see if I can make some progress.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: walther on April 23, 2012, 05:15:03 PM
Quote:
parameter for -t or --convert is 0 or 1 for format.
Are you sure about that?

Since I had no more success: Did you manage to get it running?
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: Jan Gruuthuse on April 23, 2012, 06:07:08 PM
With -t 0 this "WARNING: Cannot write text (not defined)" was gone.
Quote:
subptools - manipulates xml subtitles files
I think this has to do with the control file and the handling of the text that should be converted from those images, somehow.
After reading more, I'm convinced that the conversion from image to text is missing from the documentation, so I don't have it working. I hope the developers pick up on this and fix the subtitle OCR to .srt thingy. Touch wood and pray this is not at the bottom of their priorities.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: walther on April 23, 2012, 07:47:59 PM
Thanks for this information. It's a pity that there's no simple solution for my initial issue. So I hope OCR for idx/sub files will be back as a plugin or so in Avidemux 2.6.
Title: Re: Subtitles created via OCR (VobSUb->srt) are limited to 3 seconds
Post by: je2213 on December 25, 2012, 10:51:29 PM
OCR the sub file with Avidemux and save the result as movie.srt.
Avidemux has done the hard part, the OCR, but the subtitle end times are wrong.
No problem: the correct end times can be found, and the end times in movie.srt can be fixed.

Download the java file BDSup2Sub
https://github.com/downloads/mjuhasz/BDSup2Sub/BDSup2Sub.jar

Open movie.idx with BDSup2Sub
java -jar BDSup2Sub.jar movie.idx

set output format to xml/png
save as movie_exp.xml

extract subtitle start/finish times from xml file
grep " InTC=" movie_exp.xml | sed 's/[^0-9]*\([0-9:]*\)[^0-9]*\([0-9:]*\).*/\1:\2/' > startfinishtimes

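The xml time format (HH:MM:SS:FF, where FF is a frame count) is not the same as the srt time format (HH:MM:SS,mmm). Each line of startfinishtimes holds the start and finish time joined by a colon: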
00:00:01:01:00:00:03:00
00:00:07:14:00:00:09:14
00:00:09:14:00:00:12:01
00:00:12:01:00:00:15:01
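
As a worked example of the conversion that is needed: the last field of each timecode is a frame count, so at 25 frames per second one frame lasts 40 ms and frame 14 becomes 14 * 40 = 560 ms. The function below is only a sketch of that arithmetic. The name tc_to_srt and the optional frame rate argument are made up for illustration, and the default of 25 fps is an assumption (typical for PAL DVDs); other frame rates need a different value.

```shell
#!/bin/sh
# tc_to_srt TIMECODE [FPS]  (hypothetical helper name)
# Convert one HH:MM:SS:FF timecode to the srt form HH:MM:SS,mmm.
# FPS defaults to 25, which is an assumption, not something read from the xml.
tc_to_srt() {
    echo "$1" | awk -F: -v fps="${2:-25}" '{
        ms = int($4 * 1000 / fps + 0.5)   # frame count -> milliseconds, rounded
        printf "%s:%s:%s,%03d\n", $1, $2, $3, ms
    }'
}
```

Usage: tc_to_srt 00:00:09:14 prints 00:00:09,560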

To convert the start/finish times to the srt time format, create a script called startfinish_ms.awk:

#!/usr/bin/awk -f

# Convert subtitle start/finish times from the xml frame format
# (HH:MM:SS:FF) to the srt millisecond format (HH:MM:SS,mmm).
# Multiplying by 40 assumes 25 fps, i.e. 40 ms per frame.
#
# startfinishtimes
# 00:00:09:14:00:00:12:01
# startfinishtimes_ms
# 00:00:09,560 00:00:12,040

BEGIN { FS = ":" }
{
    # fields 4 and 8 are the frame counts of the start and finish times;
    # %03d pads the millisecond value with leading zeros
    start_ms  = sprintf("%03d", $4 * 40)
    finish_ms = sprintf("%03d", $8 * 40)
    print $1 ":" $2 ":" $3 "," start_ms " " $5 ":" $6 ":" $7 "," finish_ms
}


make the awk script executable and run it against startfinishtimes
chmod +x startfinish_ms.awk
./startfinish_ms.awk startfinishtimes > startfinishtimes_ms

create new srt subtitle file using startfinishtimes_ms and movie.srt (timing lines whose start time has no match in startfinishtimes_ms are left unchanged)
awk 'FNR==NR{a[$1]=$2;next} /-->/ && ($1 in a){$3=a[$1]} 1' startfinishtimes_ms movie.srt > movie_fixed.srt
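
The merge step looks up each end time by the exact start time string, so it only works where the start times written by Avidemux and by BDSup2Sub agree to the millisecond. The sketch below counts the timing lines in the srt file that have no matching start time. The function name count_unmatched is made up for illustration, and it assumes the times file is not empty. A result of 0 means every subtitle can be matched.

```shell
#!/bin/sh
# count_unmatched TIMES_FILE SRT_FILE  (hypothetical helper name)
# Count srt timing lines whose start time does not appear in TIMES_FILE.
count_unmatched() {
    awk '
        FNR == NR { seen[$1] = 1; next }     # first file: remember every start time
        / --> / && !($1 in seen) { n++ }     # second file: timing line with unknown start
        END { print n + 0 }
    ' "$1" "$2"
}
```

Usage: count_unmatched startfinishtimes_ms movie.srt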