Using "Tools" -> "OCR (VobSub -> srt)" to create srt files from idx/sub works well, but the subtitle length seems to be limited to three seconds. The timecode for each sub's start is correct, and the end timecode is correct too when the sub is shorter than three seconds. This issue was already reported by someone else in the Mac/OSX section (http://www.avidemux.org/smf/index.php?topic=9674.msg52451#msg52451) but got no reaction (since I'm using Linux, it doesn't seem to be OS-related).
Is there a chance to get this fixed? Or is there a way to manually remove this limit?
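A quick way to confirm the symptom on an existing file is to compute the duration of every cue. The helper below is only a sketch: the function name srt_durations is made up for illustration, and it assumes the usual srt timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm.

```shell
# Hypothetical helper: print "start-time duration-in-ms" for each cue of an srt file.
srt_durations() {
    awk '
    function to_ms(t,   p) {
        # t looks like HH:MM:SS,mmm
        split(t, p, /[:,]/)
        return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    / --> / {
        sub(/\r$/, "")    # tolerate CRLF line endings
        printf "%s %d\n", $1, to_ms($3) - to_ms($1)
    }' "$@"
}
```

Running srt_durations movie.srt | sort -k2,2n | tail -n 1 shows the longest cue. If that is 3000 ms for a whole film, the end times are most likely capped.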
IIRC with 2.6 there will be a separate tool for the OCR.
But I think the code fix for 2.5 should be trivial.
Thanks for your reply :-)
A trivial fix would be nice... but can you help with that?
I tried to find the part that calculates the time, but I couldn't. Maybe it isn't in the ADM_ocr folder.
There are several programs out there whose only purpose is to convert subs. While you wait for something to happen here, you could also just get it done with ________ or ________. Problem is I forget the names. Check Videohelp and Doom9 for the names of the converters I forgot. You might want to search on opensubtitles.org to see if the subs exist already.
@nibbles: Thanks for the info, but I've already searched for the subs. And the (probably) most popular software for converting subs is "Subrip", but it is not that good at converting subs from idx/sub files. That's why I use Avidemux for this purpose.
Yeah that sounds familiar, subrip. I've done it a few times with idx/sub because I hate the low res characters and was able to get both high res and adjustable color with srt. So I hear you. I didn't realize Avidemux could be better, but I admit I never tried it. If I get motivated, I'll try to figure what is causing that 3sec limit. Thanks for your post about this.
Don't know if you can do anything with this: Ubuntu Manpage: subp2pgm - convert VobSub DVD subtitles into pgm files and xml description (http://manpages.ubuntu.com/manpages/oneiric/man1/subp2pgm.1.html)
@nibbles: IMHO SubRip ignores many subs in idx/sub files and it doesn't recognize all text correctly.
@Jan Gruuthuse: Thanks for the hint! I managed to create many pgm files with subp2pgm, but I fail to convert these files to srt files using subptools. Do you have any experience with it, and could you give me a usage example? ;-)
But in the end... let's hope this issue will be finally fixed in Avidemux :-)
Sorry no experience with this. Ubuntu Manpage: subptools (http://manpages.ubuntu.com/manpages/oneiric/man1/subptools.1.html)
looks like there could be a problem?
Quote:
-c, --cut first[,last]
Write only entries numbered from first to last, where last defaults to the last entry of the file.
Quote:
-c, --convert format
Convert the xml subtitles file into a srt (default) or spumux file.
Have you tried using --convert instead of -c, which may conflict with --cut?
you don't have avidemux 2.4? http://www.my-guides.net/en/content/view/167/1/
@Jan Gruuthuse: Thanks again. Yes, I've noticed the wrong option -c -> it should be -t to convert, but it didn't work (or I'm too stupid...)
The OCR routine is the same in Avidemux 2.5 as in 2.4. In the linked guide (http://www.my-guides.net/en/content/view/167/1/) the "3 seconds issue" is illustrated in the last picture (http://www.my-guides.net/en/images/stories/extract_dvd_subs/extract_dvd_subs_20.jpg): no sub is longer than 3 seconds :/
@Agent_007: I've installed Avidemux 2.6 now and you're right: OCR is removed from the main program.
Quote from: walther on April 19, 2012, 12:38:33 PM
Thanks again. Yes, I've noticed the wrong option -c -> it should be -t to convert, but it didn't work (or I'm too stupid...)
Instead of using -t, try with the full --convert.
If I'm not mistaken, you can use the shorthand (-t) or the full (--convert), see if that makes a difference?
Quote from: Jan Gruuthuse on April 19, 2012, 12:59:13 PM
Instead of using -t, try with the full --convert.
It doesn't make any difference :/
But maybe my command is wrong - example:
subptools -i xyz.xml -o zyx.srt -t srt
This gives many lines saying "WARNING: Cannot write text (not defined)".
Did I miss something?
Quote: subptools -i xyz.xml -o zyx.srt -t srt
The parameter for -t or --convert is 0 or 1 for format. An empty file with 0 length is written.
Looks like information is missing? I'm trying to piece together and test each step of the process. Will have a look at it tomorrow. And see if I can get some progress.
Quote: parameter for -t or --convert is 0 or 1 for format.
Are you sure about that?
Since I had no more success: Did you manage to get it running?
With -t 0 this "WARNING: Cannot write text (not defined)" was gone.
Quote: subptools - manipulates xml subtitles files
I think this has to do with the control file and the handling of the text that should be converted from those images, somehow.
After reading more, I'm convinced the step that converts the images to text is missing from the documentation, so I don't have it working. I hope the developers pick up on this and fix the subtitle OCR to .srt thingy. Touch wood and pray this is not at the bottom of their priorities.
Thanks for this information. It's a pity that there's no simple solution for my initial issue. So I hope OCR for idx/sub files will be back as a plugin or so in Avidemux 2.6.
You OCR a sub file using avidemux, saving the file as movie.srt
Avidemux has done the hard part of the OCR, but the subtitle endtimes are wrong.
No problem, the correct endtimes can be found and the movie.srt endtimes can be fixed.
Download the java file BDSup2Sub
https://github.com/downloads/mjuhasz/BDSup2Sub/BDSup2Sub.jar
Open movie.idx with BDSup2Sub
java -jar BDSup2Sub.jar movie.idx
set output format to xml/png
save as movie_exp.xml
extract subtitle start/finish times from xml file
grep " InTC=" movie_exp.xml | sed 's/[^0-9]*\([0-9:]*\)[^0-9]*\([0-9:]*\).*/\1:\2/' > startfinishtimes
xml time-format is not exactly the same as srt time-format ...
00:00:01:01:00:00:03:00
00:00:07:14:00:00:09:14
00:00:09:14:00:00:12:01
00:00:12:01:00:00:15:01
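Each line is simply the InTC value and the OutTC value joined by a colon. The fourth and eighth fields are frame counts (HH:MM:SS:FF), not milliseconds, which is why they need converting. The sketch below replays the grep/sed step on a single line. The Event line is an assumed sample of what the export looks like, not copied from a real file, and the function name extract_times is made up for illustration.

```shell
# Assumed sample of one Event line from movie_exp.xml (illustrative only).
line='<Event InTC="00:00:09:14" OutTC="00:00:12:01" Forced="False">'

# Same grep/sed pipeline as above, wrapped in a function that reads stdin.
extract_times() {
    grep ' InTC=' | sed 's/[^0-9]*\([0-9:]*\)[^0-9]*\([0-9:]*\).*/\1:\2/'
}

printf '%s\n' "$line" | extract_times
# prints 00:00:09:14:00:00:12:01
#        start = 00:00:09 plus 14 frames, end = 00:00:12 plus 1 frame
```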
To make start/finish time-format the same as srt time-format create script called startfinish_ms.awk
#!/usr/bin/awk -f
# Convert subtitle start/finish times from the frame-based
# xml format (HH:MM:SS:FF) to the srt format (HH:MM:SS,mmm).
# One frame is counted as 40 ms, i.e. 25 fps is assumed.
#
# startfinishtimes
# 00:00:09:14:00:00:12:01
# startfinishtimes_ms
# 00:00:09,560 00:00:12,040
BEGIN { FS = ":" }
{
# fields 4 and 8 are frame counts; %03d pads the milliseconds to three digits
printf "%s:%s:%s,%03d %s:%s:%s,%03d\n", $1, $2, $3, $4 * 40, $5, $6, $7, $8 * 40
}
run awk script against startfinishtimes
./startfinish_ms.awk startfinishtimes > startfinishtimes_ms
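The factor 40 in the script corresponds to 25 frames per second (1000 ms / 25). If the export was written at another frame rate, the factor would have to change. A minimal sketch with the rate as a parameter follows. The function name frames_to_srt is hypothetical, it only rescales the frame field, and it has only been reasoned through for whole-number rates such as 24 and 25.

```shell
# Hypothetical variant of startfinish_ms.awk with a configurable frame rate.
# $1 = frames per second (25 is assumed if omitted).
# Reads HH:MM:SS:FF:HH:MM:SS:FF lines on stdin, writes "start end" in srt format.
frames_to_srt() {
    awk -F: -v fps="${1:-25}" '
    NF == 8 {
        # round each frame count to the nearest millisecond
        start_ms = int($4 * 1000 / fps + 0.5)
        end_ms   = int($8 * 1000 / fps + 0.5)
        printf "%s:%s:%s,%03d %s:%s:%s,%03d\n", $1, $2, $3, start_ms, $5, $6, $7, end_ms
    }'
}
```

For example, frames_to_srt 25 < startfinishtimes > startfinishtimes_ms should give the same result as the script above. Keep in mind that the next step matches start times as exact strings, so the converted start times have to agree with those in movie.srt.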
create new srt subtitle file using startfinishtimes_ms and movie.srt
awk 'FNR==NR{a[$1]=$2;next} /-->/ && ($1 in a){$3=a[$1]} 1' startfinishtimes_ms movie.srt > movie_fixed.srt
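The merge looks up each cue by its start time, so it only works for cues whose start time appears in startfinishtimes_ms as exactly the same string. The sketch below does the same lookup but leaves unmatched cues untouched and reports them on stderr, which makes it easier to see whether the two files really line up. The function name fix_end_times is made up for illustration.

```shell
# Hypothetical wrapper around the merge step.
# $1 = file with "start end" pairs (srt time format), $2 = srt file to correct.
fix_end_times() {
    awk '
    FNR == NR { end[$1] = $2; next }    # first file: remember end time per start time
    / --> / {
        if ($1 in end) $3 = end[$1]     # known start time: swap in the real end time
        else print "no match for " $1 > "/dev/stderr"
    }
    { print }' "$1" "$2"
}
```

Usage would be fix_end_times startfinishtimes_ms movie.srt > movie_fixed.srt. If many "no match" lines show up, the start times of the two sources probably differ slightly and need to be compared by hand.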