Extracting segmented zipfiles - sul-dlss/preservation_catalog GitHub Wiki

Extracting Segmented / Multipart Zipfiles

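Druid versions larger than 10g are archived into zip files segmented at the 10g boundary, using this command (-r recurses into the directory, -0 stores files without compression, -X omits extra file attributes, and -s 10g sets the segment size):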

zip -r0X -s 10g (destination file) druid/v0001

This results in several zip files with predictable names: numbered segments druid.v0001.z01, druid.v0001.z02, and so on, ending with the final segment druid.v0001.zip.
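Because the names are predictable, it is easy to confirm that a full set of segments is on disk before trying to extract anything. The following is a minimal sketch, not part of preservation_catalog: `list_segments` is a hypothetical helper name, and it assumes the numbered segments sit next to the final .zip and are numbered from .z01 with at least two digits.

```shell
# Hypothetical helper: print the segments of a split archive in order
# (.z01, .z02, ..., then .zip). Returns non-zero if the final .zip is
# missing or if there is a gap in the numbered segments.
list_segments() {
    base=${1%.zip}    # e.g. zy140tm9333.v0001
    [ -f "$base.zip" ] || { echo "missing $base.zip" >&2; return 1; }

    # Count contiguous numbered segments starting at .z01.
    n=0
    while [ -f "$(printf '%s.z%02d' "$base" $((n + 1)))" ]; do
        n=$((n + 1))
    done

    # Count every numbered segment on disk; a mismatch means a gap.
    total=0
    for f in "$base".z[0-9]*; do
        if [ -f "$f" ]; then
            total=$((total + 1))
        fi
    done
    if [ "$n" -ne "$total" ]; then
        echo "found $total numbered segments but only $n in sequence" >&2
        return 1
    fi

    i=1
    while [ "$i" -le "$n" ]; do
        printf '%s.z%02d\n' "$base" "$i"
        i=$((i + 1))
    done
    printf '%s\n' "$base.zip"
}
```

Calling `list_segments zy140tm9333.v0001.zip` prints one file name per line in the order the segments were written, or fails if a segment appears to be missing.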

The Hard Way

  1. Carefully concatenate the files, starting with .z01 and ending with .zip.
$ cat zy140tm9333.z01 > zy140tm9333.fixed.zip
$ cat zy140tm9333.z02 >> zy140tm9333.fixed.zip
$ cat zy140tm9333.z03 >> zy140tm9333.fixed.zip
$ cat zy140tm9333.z04 >> zy140tm9333.fixed.zip
$ cat zy140tm9333.z05 >> zy140tm9333.fixed.zip
$ cat zy140tm9333.zip >> zy140tm9333.fixed.zip
  2. Unzip it and enjoy the warnings.
$ unzip zy140tm9333.fixed.zip
Archive:  zy140tm9333.fixed.zip
warning [zy140tm9333.fixed.zip]:  zipfile claims to be last disk of a multi-part archive;
  attempting to process anyway, assuming all parts have been concatenated
  together in order.  Expect "errors" and warnings...true multi-part support
  doesn't exist yet (coming soon).
warning [zy140tm9333.fixed.zip]:  104857600 extra bytes at beginning or within zipfile
  (attempting to process anyway)
file #1:  bad zipfile offset (local header sig):  104857604
  (attempting to re-compensate)
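The concatenation step can also be written as a loop, which avoids typing one cat per segment and keeps the order right. This is a rough sketch under the same naming assumptions as the commands above; `concat_segments` is a hypothetical helper name, and the output file must not itself be one of the segments.

```shell
# Hypothetical helper: concatenate .z01, .z02, ... and finally .zip
# into a single output file, in that order.
concat_segments() {
    base=${1%.zip}    # e.g. zy140tm9333
    out=$2            # e.g. zy140tm9333.fixed.zip
    [ -f "$base.zip" ] || { echo "missing $base.zip" >&2; return 1; }

    : > "$out" || return 1    # start from an empty output file
    n=1
    while :; do
        seg=$(printf '%s.z%02d' "$base" "$n")
        [ -f "$seg" ] || break    # stop at the first missing number
        cat "$seg" >> "$out" || return 1
        n=$((n + 1))
    done
    # The .zip segment holds the central directory and must come last.
    cat "$base.zip" >> "$out"
}
```

For example, `concat_segments zy140tm9333.zip zy140tm9333.fixed.zip` should produce the same file as the six cat commands above. unzip will still print the same warnings when it reads the result.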

The Easy Way

Use p7zip.

# yum install p7zip

Then run 7za on the final .zip file in the archive set (no concatenation necessary!):

$ 7za x zy140tm9333.v0001.zip 

7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,8 CPUs Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz (306F0),ASM,AES-NI)

Scanning the drive for archives:
1 file, 4411211 bytes (4308 KiB)

Extracting archive: zy140tm9333.v0001.zip
--
Path = zy140tm9333.v0001.zip
Type = zip
Physical Size = 4411211
Embedded Stub Size = 4
Total Physical Size = 109268811
Multivolume = +
Volume Index = 5
Volumes = 6

Everything is Ok

Folders: 5
Files: 15
Size:       109265469
Compressed: 109268811

Everything really is OK.
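For repeated use, the extraction can be wrapped so that the archive is tested before anything is written and the output lands in a chosen directory. This is only a sketch: `extract_segmented` and the `SEVENZIP` override variable are hypothetical names, not part of p7zip or preservation_catalog. The `t` (test) and `x` (extract with full paths) commands and the `-o` output-directory switch are standard 7za usage.

```shell
# Hypothetical helper: test, then extract, a segmented archive with 7za.
extract_segmented() {
    final_zip=$1    # the final .zip of the set, e.g. zy140tm9333.v0001.zip
    dest_dir=$2     # directory to extract into
    # Assumption: SEVENZIP lets the caller point at a differently named binary.
    sevenzip=${SEVENZIP:-7za}

    command -v "$sevenzip" >/dev/null 2>&1 || {
        echo "$sevenzip not found" >&2
        return 127
    }
    [ -f "$final_zip" ] || { echo "missing $final_zip" >&2; return 1; }

    # 't' checks archive integrity without writing any files.
    "$sevenzip" t "$final_zip" >/dev/null || return 1
    mkdir -p "$dest_dir" || return 1
    # 'x' extracts with full paths; -o takes the directory with no space.
    "$sevenzip" x "$final_zip" "-o$dest_dir"
}
```

Usage would look like `extract_segmented zy140tm9333.v0001.zip /tmp/zy140tm9333`, with the numbered segments in the same directory as the .zip file.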

Things that Don't Work:

No.

zip -s 0 zy140tm9333.zip --out zy140tm9333.fixed.zip

Also no.

zip -F zy140tm9333.v0001.zip --out zy140tm9333.v0001.fixed.zip

Cat the files together, then try zip -F or zip -FF? That's a hard no.

$ zip -F zy140tm9333.v0001.fixed.zip --out zy140tm9333.v0001.out.zip 
Fix archive (-F) - assume mostly intact archive
	zip warning: bad archive - unexpected signature 50 4b 00 00 on disk 5 at 8728689

	zip warning: skipping this signature...
	zip warning: bad archive - unexpected signature 50 4b 00 00 on disk 5 at 10637508

	zip warning: skipping this signature...