Tuesday, October 30, 2012

How to rescue some kinds of unfinalized DVD recorder disks, using free and open-source software

Recently, a power outage struck while I was recording some family videos with a DVD recorder. It happened before the recorder had built a menu or a file system that separates the 5 MPEG streams into individual VOB files.

When the power returned, even the recorder itself no longer recognized the disk.

A search of the web brought me to this helpful blog post by Flay. Unfortunately, that post didn't exactly cover my particular problem. The output of "dvd+rw-mediainfo /dev/dvd" showed a different situation, similar to one described by a commenter on the blog post:
INQUIRY:                [HL-DT-ST][DVDRAM GMA-4082N][ED01]
GET [CURRENT] CONFIGURATION:
 Mounted Media:         1Bh, DVD+R
 Current Write Speed:   8.0x1385=11080KB/s
 Write Speed #0:        8.0x1385=11080KB/s
 Write Speed #1:        4.0x1385=5540KB/s
GET [CURRENT] PERFORMANCE:
 Write Performance:     3.3x1385=4584KB/s@0 -> 8.0x1385=11092KB/s@2299967
 Speed Descriptor#0:    02/2295103 R@3.3x1385=4584KB/s W@8.0x1385=11080KB/s
 Speed Descriptor#1:    02/2295103 R@3.3x1385=4584KB/s W@4.0x1385=5540KB/s
READ DVD STRUCTURE[#0h]:
 Media Book Type:       00h, DVD-ROM book [revision 0]
 Media ID:              MCC/004
 Legacy lead-out at:    2295104*2KB=4700372992
READ DISC INFORMATION:
 Disc status:           appendable
 Number of Sessions:    1
 State of Last Session: incomplete
 "Next" Track:          1
 Number of Tracks:      2
READ TRACK INFORMATION[#1]:
 Track State:           partial/complete
 Track Start Address:   0*2KB
 Next Writable Address: 0*2KB
 Free Blocks:           15872*2KB
 Track Size:            15872*2KB
READ TRACK INFORMATION[#2]:
 Track State:           invisible
 Track Start Address:   15888*2KB
 Next Writable Address: 1115840*2KB
 Free Blocks:           1179264*2KB
 Track Size:            2279216*2KB
 ROM Compatibility LBA: 266544
READ CAPACITY:          0*2048=0
The disk has only two tracks, and all of the data is in the second one. The first step was to extract the second track with the "dd" command, as explained by Flay:
dd bs=2048 skip=15888 count=1099952 if=/dev/dvd of=track2
Well, actually I have to admit I didn't record the exact "dd" command I used, and I see that I ended up with an output file of size 1099945*2KiB and not 1099952*2KiB. I don't think the difference is important; in any case, I never expected to recover every last second of the video that was being recorded when the power went out.
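The "skip" and "count" values in the "dd" command appear to follow from the track 2 fields in the "dvd+rw-mediainfo" output: skip is the "Track Start Address", and count is the "Next Writable Address" minus the "Track Start Address". That relationship is inferred from the numbers above, not something the tool states. A minimal sketch of the arithmetic (written for Python 3; the function name dd_arguments is only illustrative):

```python
SECTOR_SIZE = 2048  # bytes per sector, the "2KB" unit in the mediainfo output


def dd_arguments(track_start, next_writable):
    """Return (skip, count) in sectors for the written part of a track."""
    if next_writable < track_start:
        raise ValueError("next writable address precedes track start")
    return track_start, next_writable - track_start


# Values copied from READ TRACK INFORMATION[#2] above.
skip, count = dd_arguments(track_start=15888, next_writable=1115840)
print("dd bs=%d skip=%d count=%d if=/dev/dvd of=track2"
      % (SECTOR_SIZE, skip, count))
print("expected output size: %d bytes" % (count * SECTOR_SIZE))
```

For a different disk, substitute the two addresses reported for whichever track holds the data.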

The output file was not recognized as video, but I knew that somewhere in it were 5 different video streams. My assumption was that the DVD recorder merely records the video streams one after the other and later, when the DVD is finalized, generates a file system which divides the blocks of that long track into different files containing the separate streams. After a little research with Wikipedia and a finalized DVD generated by the same recorder, I discovered that the "track2" file contained 5 strings of bytes which looked like VOB file headers. The magic header string has character codes: 0, 0, 1, 186, 68, 0, 4, 0, 4.
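For reference, the nine character codes can be turned into a byte string and viewed as hex. The first four bytes, 00 00 01 BA, are the pack start code of an MPEG program stream, which fits the idea that these locations begin VOB data; the remaining five bytes are taken as-is from the list above. The sketch below is for Python 3, and the names MAGIC and starts_with_magic are illustrative only:

```python
MAGIC_CODES = [0, 0, 1, 186, 68, 0, 4, 0, 4]
MAGIC = bytes(MAGIC_CODES)


def starts_with_magic(data):
    """Return True if the given bytes begin with the header string."""
    return data[:len(MAGIC)] == MAGIC


# Show the header as hex, which is easier to compare with a hex dump.
print(MAGIC.hex())
```

To compare against a known-good file from a finalized disk, read its first few bytes in binary mode and pass them to starts_with_magic.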

Actually, it was a bit more complicated than that: the string-searching script found one extra location. It could be dismissed as a false alarm because its file offset wasn't a multiple of 32768 (16 sectors of 2048 bytes).
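The alignment test itself is a one-liner. Here is a small sketch for Python 3 using made-up candidate offsets (they are not the offsets from my disk). Note the parentheses around the product: in Python, "x % 2048*16" is evaluated as "(x % 2048) * 16", which only tests for a multiple of 2048.

```python
SECTOR_SIZE = 2048
SECTORS_PER_BLOCK = 16
ALIGNMENT = SECTOR_SIZE * SECTORS_PER_BLOCK  # 32768 bytes


def is_aligned(offset, alignment=ALIGNMENT):
    """Return True if offset is an exact multiple of alignment."""
    return offset % alignment == 0


# Hypothetical candidates; the third is sector-aligned but not block-aligned.
candidates = [0, 3 * ALIGNMENT, 5 * ALIGNMENT + SECTOR_SIZE, 7 * ALIGNMENT]
aligned = [x for x in candidates if is_aligned(x)]
print(aligned)
```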

The following Python scripts (supplied under any OSI-approved open-source license of your choice, including but not limited to all versions of the GPL, MIT, or BSD licenses) are useful tools for extracting files from such a predicament. The first finds a list of offsets for headers within a file:
#! /usr/bin/python

#
# find_header_offsets.py
#
# Find locations of a header string within a file.
# Outputs the offsets as a Python list to standard output.
#

import sys
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-m",
                  "--magic",
                  dest = "header_chars",
                  help = "the header string as a comma delimited string of character codes")

(options, args) = parser.parse_args()

target = "".join([chr(int(x)) for x in options.header_chars.split(',')])
input_filename = args[0]

fh = open(input_filename, "rb")

i = 0
blocksize = 10000 * 1024
overlap = 1024

data = fh.read(blocksize)

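offsets = []
while (len(data) > overlap):
    # data.find() returns an index relative to the start of "data",
    # so only the file offset of the current block (i) is added.
    idx = data.find(target)
    while (idx >= 0):
        # A match inside the overlap region is seen twice; record it once.
        if (not offsets) or (idx + i > offsets[-1]):
            offsets.append(idx + i)
        idx = data.find(target, idx + 1)
    data = data[-overlap:] + fh.read(blocksize - overlap)
    i = i + blocksize - overlap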

print offsets
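The script above is written for Python 2 and reads the file in overlapping blocks. As an alternative sketch for Python 3, the search can be done on a memory-mapped file, which avoids the block bookkeeping. This assumes a system where the whole file can be mapped (a 64-bit OS for a 2 GB track) and a non-empty file; the name find_all and the throwaway demo file are illustrative only.

```python
import mmap
import os
import tempfile


def find_all(path, pattern):
    """Return every offset at which pattern occurs in the file at path."""
    offsets = []
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(pattern)
            while pos >= 0:
                offsets.append(pos)
                pos = mm.find(pattern, pos + 1)
    return offsets


# Demo on a small temporary file holding two copies of the header string.
MAGIC = bytes([0, 0, 1, 186, 68, 0, 4, 0, 4])
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(MAGIC + b"\xff" * 100 + MAGIC + b"\xff" * 50)
    demo_path = tmp.name
demo_offsets = find_all(demo_path, MAGIC)
os.remove(demo_path)
print(demo_offsets)
```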
To filter the offset list (for my particular problem):
#! /usr/bin/python

#
# filter_offsets.py
#
# Eliminate bad header string locations found by "find_header_offsets.py".
# Input read from standard input as a Python list.
# Outputs the filtered offsets as a Python list to standard output.
#
# You will have to adapt the filtering criteria for your particular problem.
#

import ast
import sys

# literal_eval parses the list without executing arbitrary code from stdin.
offsets = ast.literal_eval(sys.stdin.read())

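# 2048 * 16 = 32768 bytes; the parentheses matter, because
# "x % 2048*16" is evaluated as "(x % 2048) * 16".
filtered_offsets = [x
                    for x in offsets
                    if (x % (2048 * 16)) == 0]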

print filtered_offsets
And to chop up the file based on the filtered offset list:
#! /usr/bin/python

#
# extract_by_offsets.py
#
# Chop up a given file based on a list of offsets.
# Input (list of offsets) read from standard input as a Python list.
#

import ast
import sys
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-o",
                  "--output",
                  dest = "output_template",
                  default = "%d.dat",
                  help = "a template for the output file names")

(options, args) = parser.parse_args()

input_filename = args[0]

fh = open(input_filename, "rb")

# literal_eval parses the list without executing arbitrary code from stdin.
offsets = ast.literal_eval(sys.stdin.read())

blocksize = 10000 * 1024

for (i, offset) in enumerate(offsets):
    output_file = open(options.output_template % i, "wb")
    fh.seek(offset)
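    # Every piece but the last ends where the next header begins;
    # the last piece (size stays None) runs to the end of the file.
    size = None
    if (i + 1 < len(offsets)):
        size = offsets[i + 1] - offset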

    print "Copying from offset", offset
    if (size is not None):
        print "     size =", size
    print "Output file is", options.output_template % i

    if (size is None):
        data = fh.read(blocksize)
        while (len(data) > 0):
            output_file.write(data)
            data = fh.read(blocksize)
    else:
        data = fh.read(blocksize)
        remaining = size
        while ((len(data) > 0) and (remaining > 0)):
            if (len(data) > remaining):
                output_file.write(data[0 : remaining])
                remaining = 0
            else:
                output_file.write(data)
                remaining = remaining - len(data)
                data = fh.read(blocksize)
    output_file.close()
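The chopping logic boils down to this: each piece starts at one header offset and ends where the next header begins, and the last piece runs to the end of the file. A compact sketch of just that calculation, for Python 3, with hypothetical offsets and file size (the name ranges_from_offsets is illustrative):

```python
def ranges_from_offsets(offsets, total_size):
    """Return (start, length) pairs, one per offset, covering the file
    from each offset up to the next one (or to total_size for the last)."""
    ordered = sorted(offsets)
    ranges = []
    for i, start in enumerate(ordered):
        end = ordered[i + 1] if i + 1 < len(ordered) else total_size
        ranges.append((start, end - start))
    return ranges


# Hypothetical header offsets in a hypothetical file of ten 32768-byte blocks.
BLOCK = 32768
example = ranges_from_offsets([0, 3 * BLOCK, 7 * BLOCK], total_size=10 * BLOCK)
print(example)
```

Anything before the first offset is not covered by any range, which matches the script above: data ahead of the first header is simply skipped.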
And finally, here is an example of using these scripts from the command line to extract VOB files:
./find_header_offsets.py --magic=0,0,1,186,68,0,4,0,4 track2 | ./filter_offsets.py | ./extract_by_offsets.py --output=%d.vob track2
