A strange TIFF file

Recently I received an email in which someone asked me for help with fixit-tiff. In the course of the conversation it became clear that the real problem was a set of digitized files from the Internet Archive, which were obviously TIFF files but could not be opened by common programs.

Here is a quote from the mail:

One other thing, while I’ve got you: The reason I went looking for a TIFF fixer was because all the TIFFs here https://archive.org/details/SchenleyHS1937 are broken. When I run them through your tool, though, it complains a bit but the output is identical, and is of course still not a working TIFF file. I’m not expecting much, but I thought I’d ask if there was an obvious solution there.

This has been an interesting challenge!

The analysis

I downloaded one of the files. The program file identifies it as:

TIFF image data, little-endian, direntries=20, height=3216, bps=8,
compression=JPEG (old), PhotometricInterpretation=BlackIsZero,
width=2442

ImageMagick has problems opening it because tag 37677 has a count of zero. That, however, is not the root of the problem. The real cause is the old (and long deprecated) JPEG compression that was used. libtiff does not support this JPEG variant anymore either, and reports "Deprecated and troublesome old-style JPEG compression mode, please convert to new-style JPEG compression and notify vendor of writing software".

tiffdump displays the following:

Activities_0097.tif:
Magic: 0x4949 <little-endian> Version: 0x2a <ClassicTIFF>
Directory 0: offset 2871550 (0x2bd0fe) next 0 (0)
SubFileType (254) LONG (4) 1<0>
ImageWidth (256) SHORT (3) 1<2442>
ImageLength (257) SHORT (3) 1<3216>
BitsPerSample (258) SHORT (3) 1<8>
Compression (259) SHORT (3) 1<6>
Photometric (262) SHORT (3) 1<1>
SamplesPerPixel (277) SHORT (3) 1<1>
XResolution (282) RATIONAL (5) 1<300>
YResolution (283) RATIONAL (5) 1<300>
ResolutionUnit (296) SHORT (3) 1<2>
TileWidth (322) LONG (4) 1<2448>
TileLength (323) LONG (4) 1<3216>
TileOffsets (324) LONG (4) 1<8>
TileByteCounts (325) LONG (4) 1<2871416>
JPEGProcessingMode (512) SHORT (3) 1<1>
JPEGInterchangeFormat (513) LONG (4) 1<8>
JPEGInterchangeFormatLength (514) LONG (4) 1<2871416>
37677 (0x932d) UNDEFINED (7) 0<>
37678 (0x932e) UNDEFINED (7) 110<0x2 00 00 0x1 0x40 00 00 00 0x9 0x4 00 00 0x3f 00 00 00 00 00 00 00 00 00 00 00 ...>
37680 (0x9330) UNDEFINED (7) 3072<0xd0 0xcf 0x11 0xe0 0xa1 0xb1 0x1a 0xe1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...>
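
To make the dump a little less magical, here is a minimal sketch of how the directory entries can be read by hand. It assumes a little-endian classic TIFF like the one above and only looks at the first directory. The function name read_ifd_entries is my own invention for illustration and is not part of tiffdump or libtiff.

```python
import struct


def read_ifd_entries(data: bytes):
    """Return (tag, field_type, count, value_or_offset) tuples for directory 0.

    Assumption: little-endian classic TIFF ("II", version 42), as in the dump.
    """
    if data[:2] != b"II":
        raise ValueError("only little-endian TIFF is handled in this sketch")
    version, ifd_offset = struct.unpack_from("<HI", data, 2)
    if version != 42:
        raise ValueError("not a classic TIFF")
    (n_entries,) = struct.unpack_from("<H", data, ifd_offset)
    entries = []
    for i in range(n_entries):
        # Each entry is 12 bytes: tag, type, count, value-or-offset field.
        pos = ifd_offset + 2 + 12 * i
        tag, ftype, count = struct.unpack_from("<HHI", data, pos)
        if ftype == 3 and count == 1:
            # A single SHORT is stored inline in the first two bytes.
            (value,) = struct.unpack_from("<H", data, pos + 8)
        else:
            # Everything else is reported as the raw 32-bit field, which is
            # either an inline LONG or an offset to the real data.
            (value,) = struct.unpack_from("<I", data, pos + 8)
        entries.append((tag, ftype, count, value))
    return entries
```

Fed with the bytes of a file like Activities_0097.tif, this should list the same tag numbers, types and counts as tiffdump, including the entry for tag 37677 with a count of 0.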

With checkit-tiff I figured out that the given TIFF originates from the Microsoft Office Document Image Writer (*.mdi).

Tag 37680 is reserved for the MS OLE service.
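
The OLE origin can also be seen in the dump itself: the first eight bytes of tag 37680 are the signature that opens an OLE compound file. A small sketch of that check, where looks_like_ole is a hypothetical helper name of my own:

```python
# Signature at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_ole(payload: bytes) -> bool:
    """Return True if the payload starts with the OLE compound file signature."""
    return payload.startswith(OLE_MAGIC)


# First bytes of tag 37680 exactly as printed by tiffdump above.
tag_37680_head = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00, 0x00])
print(looks_like_ole(tag_37680_head))  # prints True
```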

Some files also have tag 305 (Software) set, pointing to “Microsoft Office Document Imaging 1.03.2349.01”.

At that point it looked as if these TIFFs could only be read and converted by Microsoft Paint.

The solution

Somehow the story would not let me go. It bothered me that it should only be possible to read and convert these files via Microsoft Paint. So I searched for “TIFF”, “TAG 259” and “JPEG old”, and I got a hit.

On Stack Overflow someone pointed out that the JFIF stream is embedded unchanged in the TIFF.

And indeed, with the help of Exiftool this can be extracted:

$> exiftool Activities_0097.tif -OtherImage -b > Activities_0097.jpg

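Here, exiftool exposes the JPEG stream referenced by tags 513 and 514 (JPEGInterchangeFormat and JPEGInterchangeFormatLength, offset 8 and length 2871416 in the dump above) as “Other Image”. Tag 37680, by contrast, starts with the bytes d0 cf 11 e0, which is OLE data and not a JPEG.

Now the image can be displayed, and file identifies it as: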
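
The tiffdump output shows JPEGInterchangeFormat (513) pointing to offset 8 and JPEGInterchangeFormatLength (514) giving 2871416 bytes, so the stream can in principle also be copied out by hand. The following is only a sketch under the assumption that the file is a little-endian classic TIFF and that the JPEG is stored in one contiguous piece, as in this case. The function name extract_old_jpeg is made up for illustration, and I have not run it against the archive files.

```python
import struct

JPEG_INTERCHANGE_FORMAT = 513         # offset of the embedded JPEG stream
JPEG_INTERCHANGE_FORMAT_LENGTH = 514  # length of that stream in bytes


def extract_old_jpeg(data: bytes) -> bytes:
    """Return the JPEG stream referenced by tags 513/514 in directory 0."""
    if data[:2] != b"II" or struct.unpack_from("<H", data, 2)[0] != 42:
        raise ValueError("only little-endian classic TIFF is handled here")
    (ifd_offset,) = struct.unpack_from("<I", data, 4)
    (n_entries,) = struct.unpack_from("<H", data, ifd_offset)

    fields = {}
    for i in range(n_entries):
        tag, ftype, count, value = struct.unpack_from(
            "<HHII", data, ifd_offset + 2 + 12 * i
        )
        # Both tags of interest are single LONG values stored inline.
        if ftype == 4 and count == 1:
            fields[tag] = value

    try:
        start = fields[JPEG_INTERCHANGE_FORMAT]
        length = fields[JPEG_INTERCHANGE_FORMAT_LENGTH]
    except KeyError:
        raise ValueError("tags 513/514 not present, not an old-style JPEG TIFF")

    stream = data[start:start + length]
    # A JPEG stream always begins with the SOI marker ff d8.
    if not stream.startswith(b"\xff\xd8"):
        raise ValueError("no JPEG start marker at the recorded offset")
    return stream
```

Reading Activities_0097.tif in binary mode, passing the bytes to extract_old_jpeg and writing the result to a .jpg file should then be roughly equivalent to the exiftool call.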

JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 2442x3216, components 1

Problem solved :)

PS: Thanks to Nick for bringing this problem to me!