Mathinker's No-Longer-Empty Blog: How to rescue some kinds of unfinalized DVD recorder disks, using free and open-source software

Recently, while I was recording some family videos with a DVD recorder, the power suddenly went out. This happened before the DVD recorder had built a menu or a file system separating the 5 MPEG streams into separate VOB files.

The disk was not even recognized by the recorder itself when the power returned.

A search of the web brought me to this helpful blog post by Flay. Unfortunately, that post didn't exactly cover my particular problem. The output of "dvd+rw-mediainfo /dev/dvd" showed a different situation, similar to one described by a commenter on the blog post:

INQUIRY:                [HL-DT-ST][DVDRAM GMA-4082N][ED01]
GET [CURRENT] CONFIGURATION:
 Mounted Media:         1Bh, DVD+R
 Current Write Speed:   8.0x1385=11080KB/s
 Write Speed #0:        8.0x1385=11080KB/s
 Write Speed #1:        4.0x1385=5540KB/s
GET [CURRENT] PERFORMANCE:
 Write Performance:     3.3x1385=4584KB/s@0 -> 8.0x1385=11092KB/s@2299967
 Speed Descriptor#0:    02/2295103 R@3.3x1385=4584KB/s W@8.0x1385=11080KB/s
 Speed Descriptor#1:    02/2295103 R@3.3x1385=4584KB/s W@4.0x1385=5540KB/s
READ DVD STRUCTURE[#0h]:
 Media Book Type:       00h, DVD-ROM book [revision 0]
 Media ID:              MCC/004
 Legacy lead-out at:    2295104*2KB=4700372992
READ DISC INFORMATION:
 Disc status:           appendable
 Number of Sessions:    1
 State of Last Session: incomplete
 "Next" Track:          1
 Number of Tracks:      2
READ TRACK INFORMATION[#1]:
 Track State:           partial/complete
 Track Start Address:   0*2KB
 Next Writable Address: 0*2KB
 Free Blocks:           15872*2KB
 Track Size:            15872*2KB
READ TRACK INFORMATION[#2]:
 Track State:           invisible
 Track Start Address:   15888*2KB
 Next Writable Address: 1115840*2KB
 Free Blocks:           1179264*2KB
 Track Size:            2279216*2KB
 ROM Compatibility LBA: 266544
READ CAPACITY:          0*2048=0

The disk has only two tracks, and all of the recorded data is in the second one. The first step was to extract the data from the second track using the "dd" command, as explained by Flay:

dd bs=2048 skip=15888 count=1099952 if=/dev/dvd of=track2

Well, actually, I have to admit I was a bad boy and didn't record the exact "dd" command I used: I ended up with an output file of size 1099945*2KiB, not 1099952*2KiB. I don't think the difference is important; in any case, I didn't expect to get every last second of video off the last stream, which was being recorded when the power went out.
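The skip and count values in the "dd" command can be reproduced from the track 2 fields in the dvd+rw-mediainfo output. Here is a minimal sketch of that arithmetic. The variable names are made up for illustration, and it assumes that everything between the track's start address and its next writable address is recorded data:

```python
# Values copied from READ TRACK INFORMATION[#2] (units: blocks of 2048 bytes).
track_start = 15888        # Track Start Address
next_writable = 1115840    # Next Writable Address

# Assumption: recorded data fills the span between the two addresses.
block_count = next_writable - track_start

# Print a dd command line built from these values.
print("dd bs=2048 skip=%d count=%d if=/dev/dvd of=track2"
      % (track_start, block_count))
print("expected output size: %d bytes" % (block_count * 2048))
```

If your disk reports different addresses, substitute them here and check that the size of the resulting file is close to the expected size.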

The output file was not recognized as video, but I knew that somewhere in it were 5 different video streams. My assumption was that the DVD recorder merely records the video streams one after the other and later, when the DVD is finalized, generates a file system which divides the blocks of that long track into different files containing the separate streams. After a little research with Wikipedia and a finalized DVD generated by the same recorder, I discovered that the "track2" file contained 5 strings of bytes which looked like VOB file headers. The magic header string has character codes: 0, 0, 1, 186, 68, 0, 4, 0, 4 (in hexadecimal: 00 00 01 BA 44 00 04 00 04).

Actually, it was a bit more complicated than that: the string-searching script found one extra location, which I could identify as a false alarm because its file offset wasn't a multiple of 32768 (16 blocks of 2048 bytes).
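To make the idea concrete, here is a small self-contained sketch that searches an in-memory buffer for the header bytes and keeps only the matches at 32768-byte boundaries. The function name and the synthetic test buffer are illustrative assumptions, and the alignment rule is simply what I observed on my disk, so it may not hold for other recorders:

```python
# Header signature quoted above, as raw bytes.
MAGIC = bytes([0, 0, 1, 186, 68, 0, 4, 0, 4])
ALIGNMENT = 2048 * 16  # 32768 bytes


def find_aligned_headers(data, magic=MAGIC, alignment=ALIGNMENT):
    """Return offsets of magic in data that are multiples of alignment."""
    offsets = []
    idx = data.find(magic)
    while idx >= 0:
        if idx % alignment == 0:
            offsets.append(idx)
        idx = data.find(magic, idx + 1)
    return offsets


# Synthetic buffer: headers at 0 and 32768, plus a false alarm at 5000.
buf = bytearray(2 * ALIGNMENT)
for pos in (0, 5000, ALIGNMENT):
    buf[pos:pos + len(MAGIC)] = MAGIC

print(find_aligned_headers(bytes(buf)))  # prints [0, 32768]
```

The match at offset 5000 is dropped because 5000 is not a multiple of 32768, which is the same reasoning used to discard the false alarm in the real file.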

The following Python scripts (supplied under any OSI-approved open-source license of your choice, including but not limited to all versions of the GPL, MIT, or BSD licenses) are useful tools for rescuing files from such a predicament. The first finds a list of offsets for headers within a file:

#!/usr/bin/env python3
#
# find_header_offsets.py
#
# Find locations of a header string within a file.
# Outputs the offsets as a Python list to standard output.
#

from optparse import OptionParser

parser = OptionParser()
parser.add_option("-m", "--magic", dest="header_chars",
                  help="the header string as a comma delimited string of character codes")
(options, args) = parser.parse_args()

# Build the search target as raw bytes from the character codes.
target = bytes(int(x) for x in options.header_chars.split(','))
input_filename = args[0]

blocksize = 10000 * 1024
# Carry over just enough bytes to catch a header straddling two blocks.
# Because the overlap is shorter than the header, no match is reported twice.
overlap = len(target) - 1

offsets = []
with open(input_filename, "rb") as fh:
    base = 0  # file offset of data[0]
    data = fh.read(blocksize)
    while len(data) > overlap:
        idx = data.find(target)
        while idx >= 0:
            # idx is already measured from the start of data, so only add base.
            offsets.append(base + idx)
            idx = data.find(target, idx + 1)
        # Keep the tail of this block and append the next chunk of the file.
        tail = data[len(data) - overlap:]
        base += len(data) - overlap
        data = tail + fh.read(blocksize - overlap)

print(offsets)

To filter the offset list (for my particular problem):

#!/usr/bin/env python3
#
# filter_offsets.py
#
# Eliminate bad header string locations found by "find_header_offsets.py".
# Input read from standard input as a Python list.
# Outputs the filtered offsets as a Python list to standard output.
#
# You will have to adapt the filtering criteria for your particular problem.
#

import ast
import sys

# literal_eval parses the list without executing arbitrary code.
offsets = ast.literal_eval(sys.stdin.read())

# Keep only offsets that are a multiple of 16 blocks of 2048 bytes (32768).
# The parentheses matter: "x % 2048*16" is parsed as "(x % 2048) * 16".
filtered_offsets = [x for x in offsets if x % (2048 * 16) == 0]
print(filtered_offsets)

And to chop up the file based on the filtered offset list:

#!/usr/bin/env python3
#
# extract_by_offsets.py
#
# Chop up a given file based on a list of offsets.
# Input (list of offsets) read from standard input as a Python list.
#

import ast
import sys
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-o", "--output", dest="output_template", default="%d.dat",
                  help="a template for the output file names")
(options, args) = parser.parse_args()

input_filename = args[0]
offsets = ast.literal_eval(sys.stdin.read())
blocksize = 10000 * 1024

with open(input_filename, "rb") as fh:
    for (i, offset) in enumerate(offsets):
        output_name = options.output_template % i
        # Each piece ends where the next one starts; the last runs to end of file.
        if i + 1 < len(offsets):
            remaining = offsets[i + 1] - offset
        else:
            remaining = None

        print("Copying from offset", offset)
        if remaining is not None:
            print(" size =", remaining)
        print("Output file is", output_name)

        fh.seek(offset)
        with open(output_name, "wb") as output_file:
            while remaining is None or remaining > 0:
                chunk = blocksize if remaining is None else min(blocksize, remaining)
                data = fh.read(chunk)
                if not data:
                    break
                output_file.write(data)
                if remaining is not None:
                    remaining -= len(data)

And finally, here is an example of using these scripts from the command line to extract VOB files:

./find_header_offsets.py --magic=0,0,1,186,68,0,4,0,4 track2 | ./filter_offsets.py | ./extract_by_offsets.py --output=%d.vob track2
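As a quick sanity check after extraction, you may want to confirm that every output file really begins with the header bytes. This is only a sketch: the helper name starts_with_magic is made up for illustration, and it assumes the output files were written to the current directory with the "%d.vob" template shown above:

```python
import glob

# Same header signature that was passed to --magic.
MAGIC = bytes([0, 0, 1, 186, 68, 0, 4, 0, 4])


def starts_with_magic(path, magic=MAGIC):
    """Return True if the file at path begins with the header bytes."""
    with open(path, "rb") as fh:
        return fh.read(len(magic)) == magic


# Report on every .vob file in the current directory.
for name in sorted(glob.glob("*.vob")):
    status = "ok" if starts_with_magic(name) else "unexpected header"
    print(name, status)
```

A file flagged as "unexpected header" suggests the offset list was wrong for that piece, so the filtering criteria probably need adjusting.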


Tuesday, October 30, 2012
