Structure of zip files - lmmx/devnotes GitHub Wiki
The source code for Python's zipfile module is here
The best description is the one Python's zipfile module links to, the APPNOTE.TXT which I also backed up to a GitHub Gist for posterity. The section on the general format is as follows:
4.3.1 A ZIP file MUST contain an "end of central directory record". A ZIP file containing only an "end of central directory record" is considered an empty ZIP file. Files MAY be added or replaced within a ZIP file, or deleted. A ZIP file MUST have only one "end of central directory record". Other records defined in this specification MAY be used as needed to support storage requirements for individual ZIP files.
4.3.2 Each file placed into a ZIP file MUST be preceded by a "local file header" record for that file. Each "local file header" MUST be accompanied by a corresponding "central directory header" record within the central directory section of the ZIP file.
4.3.3 Files MAY be stored in arbitrary order within a ZIP file. A ZIP file MAY span multiple volumes or it MAY be split into user-defined segment sizes. All values MUST be stored in little-endian byte order unless otherwise specified in this document for a specific data element.
4.3.4 Compression MUST NOT be applied to a "local file header", an "encryption header", or an "end of central directory record". Individual "central directory records" MUST NOT be compressed, but the aggregate of all central directory records MAY be compressed.
4.3.5 File data MAY be followed by a "data descriptor" for the file. Data descriptors are used to facilitate ZIP file streaming.
4.3.6 Overall .ZIP file format:
[local file header 1]
[encryption header 1]
[file data 1]
[data descriptor 1]
. . .
[local file header n]
[encryption header n]
[file data n]
[data descriptor n]
[archive decryption header]
[archive extra data record]
[central directory header 1]
. . .
[central directory header n]
[zip64 end of central directory record]
[zip64 end of central directory locator]
[end of central directory record]
Each file in the archive contributes a "local file header", then an encryption header (if the file is encrypted), then the file data, then a data descriptor (if one is used), and this sequence repeats for every file stored in the archive.
Here are the details of the local file header, which begins each file's entry (so the first one begins the archive):
4.3.7 Local file header:
local file header signature     4 bytes  (0x04034b50)
version needed to extract       2 bytes
general purpose bit flag        2 bytes
compression method              2 bytes
last mod file time              2 bytes
last mod file date              2 bytes
crc-32                          4 bytes
compressed size                 4 bytes
uncompressed size               4 bytes
file name length                2 bytes
extra field length              2 bytes
file name (variable size)
extra field (variable size)
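As an illustration of the layout above, here is a minimal sketch that unpacks the fixed 30-byte part of a local file header with Python's `struct` module. The archive is built in memory just for the example, and the format string `"<4s5H3L2H"` and the variable names are my own choices for this sketch, not taken from `zipfile`.

```python
import io
import struct
import zipfile

# Build a tiny archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", b"hello world")
raw = buf.getvalue()

# Fixed part of the local file header, little-endian:
# 4-byte signature, five 2-byte fields, three 4-byte fields, two 2-byte lengths.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")

(signature, version_needed, flag_bits, method, mod_time, mod_date,
 crc32, compressed_size, uncompressed_size,
 name_length, extra_length) = LOCAL_HEADER.unpack_from(raw, 0)

# The variable-size file name follows the fixed fields directly.
file_name = raw[LOCAL_HEADER.size:LOCAL_HEADER.size + name_length]

print(signature, file_name, compressed_size, uncompressed_size)
```

The first local file header sits at offset 0 here only because nothing was prepended to the archive.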
After these comes the central directory:
4.3.12 Central directory structure:
[central directory header 1]
. . .
[central directory header n]
[digital signature]
File header:
central file header signature   4 bytes  (0x02014b50)
version made by                 2 bytes
version needed to extract       2 bytes
general purpose bit flag        2 bytes
compression method              2 bytes
last mod file time              2 bytes
last mod file date              2 bytes
crc-32                          4 bytes
compressed size                 4 bytes
uncompressed size               4 bytes
file name length                2 bytes
extra field length              2 bytes
file comment length             2 bytes
disk number start               2 bytes
internal file attributes        2 bytes
external file attributes        4 bytes
relative offset of local header 4 bytes
file name (variable size)
extra field (variable size)
file comment (variable size)
4.3.13 Digital signature:
header signature                4 bytes  (0x05054b50)
size of data                    2 bytes
signature data (variable size)
With the introduction of the Central Directory Encryption feature in version 6.2 of this specification, the Central Directory Structure MAY be stored both compressed and encrypted. Although not required, it is assumed when encrypting the Central Directory Structure, that it will be compressed for greater storage efficiency. Information on the Central Directory Encryption feature can be found in the section describing the Strong Encryption Specification. The Digital Signature record will be neither compressed nor encrypted.
4.3.14 Zip64 end of central directory record
zip64 end of central dir signature                        4 bytes  (0x06064b50)
size of zip64 end of central directory record             8 bytes
version made by                                           2 bytes
version needed to extract                                 2 bytes
number of this disk                                       4 bytes
number of the disk with the start of the central directory  4 bytes
total number of entries in the central directory on this disk  8 bytes
total number of entries in the central directory          8 bytes
size of the central directory                             8 bytes
offset of start of central directory with respect to the starting disk number  8 bytes
zip64 extensible data sector (variable size)
4.3.14.1 The value stored into the "size of zip64 end of central directory record" SHOULD be the size of the remaining record and SHOULD NOT include the leading 12 bytes.
Size = SizeOfFixedFields + SizeOfVariableData - 12.
...
4.3.15 Zip64 end of central directory locator
zip64 end of central dir locator signature                4 bytes  (0x07064b50)
number of the disk with the start of the zip64 end of central directory  4 bytes
relative offset of the zip64 end of central directory record  8 bytes
total number of disks                                     4 bytes
4.3.16 End of central directory record:
end of central dir signature                              4 bytes  (0x06054b50)
number of this disk                                       2 bytes
number of the disk with the start of the central directory  2 bytes
total number of entries in the central directory on this disk  2 bytes
total number of entries in the central directory          2 bytes
size of the central directory                             4 bytes
offset of start of central directory with respect to the starting disk number  4 bytes
.ZIP file comment length                                  2 bytes
.ZIP file comment (variable size)
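The end of central directory record can be unpacked the same way. The sketch below assumes an archive with no trailing comment, so that the record is exactly the last 22 bytes of the file; the in-memory archive and the variable names are only for illustration.

```python
import io
import struct
import zipfile

# Build a small two-file archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"first")
    zf.writestr("b.txt", b"second")
raw = buf.getvalue()

# Signature, four 2-byte fields, two 4-byte fields, one 2-byte comment length.
EOCD = struct.Struct("<4s4H2LH")

# With no archive comment, the record is the final 22 bytes of the file.
(signature, disk_number, disk_with_cd_start, entries_this_disk,
 entries_total, cd_size, cd_offset, comment_length) = EOCD.unpack(raw[-EOCD.size:])

print(signature, entries_total, cd_size, cd_offset, comment_length)
```

When a comment is present the record has to be searched for from the end of the file instead, since its start is no longer a fixed distance from the end.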
The most important part to grasp is this section of the overall structure:
[central directory header 1]
. . .
[central directory header n]
[zip64 end of central directory record]
[zip64 end of central directory locator]
[end of central directory record]
So that's:
- Local File Headers, each marked by 0x04034b50 (zipfile.stringFileHeader = b"PK\003\004")
- Central Directory Headers, each marked by 0x02014b50 (zipfile.stringCentralDir = b"PK\001\002")
- the 'Digital Signature', which is the final part of the 'Central Directory', marked by 0x05054b50 (not declared in zipfile, but = b"PK\x05\x05")
  - to clarify, this signature is at the start of the 'Digital Signature' record [which contains further data], so it appears towards the end of the 'CDR'
- the Zip64 End Of Central Directory Record, marked by 0x06064b50 (zipfile.stringEndArchive64 = b"PK\x06\x06")
  - to clarify, this signature is at the start of the 'Z64EOCDR'
- the Zip64 End Of Central Directory Locator, marked by 0x07064b50 (zipfile.stringEndArchive64Locator = b"PK\x06\x07")
- the End Of Central Directory Record, marked by 0x06054b50 (zipfile.stringEndArchive = b"PK\005\006")
Open a file (e.g. a .conda zip file such as https://repo.anaconda.com/pkgs/main/linux-64/decorator-4.1.2-py36hd076ac8_0.conda) with mode rb in Python, then look through the bytes (printing them as integers, not characters). Searching for 80, 75 (the byte values of "P" and "K") identifies:
- P K 3 4 (three times, the first immediately at the start)
  - three Local File Header signatures
- P K 1 2 (three times in a row)
  - three Central Directory Header signatures
- P K 5 6
  - one End Of Central Directory Record signature
  - followed by 18 bytes: 0 0 0 0 3 0 3 0 220 0 0 0 167 70 0 0 0 0
  - read as little-endian fields, these are disk numbers 0 and 0, 3 entries on this disk and 3 in total, a central directory size of 220 bytes, a central directory offset of 167 + 70 x 256 = 18087, and a comment length of 0
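The same search can be done programmatically. This is a minimal sketch on an in-memory archive of three small stored files (not the .conda file above); `find_all` is a hypothetical helper written for this example.

```python
import io
import zipfile

# Three small, uncompressed text files, so the payloads cannot
# accidentally contain a "PK" signature themselves.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    for name in ("one.txt", "two.txt", "three.txt"):
        zf.writestr(name, b"plain text payload")
raw = buf.getvalue()


def find_all(data, needle):
    """Return every offset at which needle occurs in data."""
    positions = []
    start = data.find(needle)
    while start != -1:
        positions.append(start)
        start = data.find(needle, start + 1)
    return positions


local_headers = find_all(raw, b"PK\x03\x04")
central_headers = find_all(raw, b"PK\x01\x02")
end_records = find_all(raw, b"PK\x05\x06")

# Printing bytes as integers shows the 80, 75 ("P", "K") prefix.
print(list(raw[end_records[0]:end_records[0] + 4]))
print(local_headers, central_headers, end_records)
```

On real archives a plain byte search can produce false positives inside compressed data, which is one reason readers normally start from the end of central directory record instead.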
When zipfile opens a zip file for reading (on initialising the class, which calls self._RealGetContents), it sets the value of start_dir to the offset at which the central directory begins, and moves the file cursor there to read the central directory.
If you create a ZipFile object from the bytes of a zip file:
import io
import zipfile
with open("example.zip", "rb") as f:
    b = f.read()
z = zipfile.ZipFile(io.BytesIO(b))
...you will see that z.fp is preserved (a BytesIO object storing the bytes that were passed in), and the value of z.fp.tell() is a little way past z.start_dir. In fact, this is the same value as the final item you'll find if you call zipfile._EndRecData(z.fp), as the docstring on that function explains:
def _EndRecData(fpin):
    """Return data from the "End of Central Directory" record, or None.
    The data is a list of the nine items in the ZIP "End of central dir"
    record followed by a tenth item, the file seek offset of this record."""
If I run that on my example file:
>>> zipfile._EndRecData(z.fp)
[b'PK\x05\x06', 0, 0, 3, 3, 291, 5640033, 0, b'', 5640324]
>>> z.fp.tell()
5640324
>>> z.start_dir
5640033
So we can see that the central directory stretches from byte position 5640033 (start_dir) up to 5640324 ("the file seek offset of this [End Of Central Directory] record"), which is where the End Of Central Directory record itself begins. That is 5640324 - 5640033 = 291 bytes, matching the "size of the central directory" value (the sixth item in the list above).
If I use a Python .conda archive from conda-forge, decorator-4.3.2-py37_0.conda, I get
>>> z.start_dir
18087
>>> z.fp.tell()
18307
So here the central directory stretches over 18307 - 18087 = 220 bytes.
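The relationship between start_dir, the central directory size and the offset of the End Of Central Directory record can be checked on any small archive. This is a sketch on an in-memory archive (so the numbers differ from the .conda files above); it relies on the private helper zipfile._EndRecData and the private index constants in zipfile, which are implementation details and could change between Python versions.

```python
import io
import zipfile

# Build a small archive in memory, then reopen it for reading.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"first")
    zf.writestr("b.txt", b"second")
    zf.writestr("c.txt", b"third")
raw = buf.getvalue()

z = zipfile.ZipFile(io.BytesIO(raw))
endrec = zipfile._EndRecData(z.fp)

cd_size = endrec[zipfile._ECD_SIZE]            # size of the central directory
cd_offset = endrec[zipfile._ECD_OFFSET]        # where the central directory starts
eocd_location = endrec[zipfile._ECD_LOCATION]  # where the EOCD record starts

print(z.start_dir, cd_size, eocd_location, len(raw))
```

For an archive like this one, with nothing prepended and no Zip64 records, the central directory ends exactly where the End Of Central Directory record begins, and that record's 22 bytes run to the end of the file.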
From what I've seen from inspecting various .conda zip files from the conda-forge repository, start_dir is usually somewhere from 240-260 bytes before the End Of Central Directory record.
It's important to note that though these positions vary, the size of the "End Of Central Directory" structure itself is constant (apart from any trailing archive comment): it is zipfile.sizeEndCentDir = 22 bytes. This allows the following:
This "End Of Central Directory Record" can be read to determine the positions of the individual files within the zip, and an example of code that does that is here (async port here)
- Note that it uses a Struct('<H2sHHHIIIHH'): see Python's struct library
  - The < means the byte-order is little-endian
  - H means unsigned short, size 2
  - s means char[] (no standard size, each is 1 byte)
    - size is given by the number before it (which is 2 here)
  - I means unsigned int, size 4
  - So this means the sequence <H2sHHHIIIHH has length H+2(s)+3(H)+3(I)+2(H) = 6(H)+2(s)+3(I) = 6(2)+2+3(4) = 12+2+12 = 26 bytes
    - You can check this in Python by calling struct.calcsize("<H2sHHHIIIHH") or struct.Struct("<H2sHHHIIIHH").size = 26
- central_directory_signature is given as b'\x50\x4b\x01\x02' which is equivalent to b'PK\x01\x02'
  - line 101 of Python's zipfile.py gives stringCentralDir = b"PK\001\002"
- local_file_header_signature = b'\x50\x4b\x03\x04' (or b'PK\x03\x04')
- Note that the central directory is not read, but passed over (this script just extracts files and doesn't try to be selective using the central directory)
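To make the 26-byte struct concrete: it appears to cover the local file header fields that follow the 4-byte signature (4 + 26 = 30 bytes in total). The sketch below is my own illustration of walking the local file headers that way, not the linked script; it assumes the sizes in each local header are filled in (general purpose bit 3 not set), which holds for the in-memory archive built here.

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"first")
    zf.writestr("b.txt", b"second")
raw = buf.getvalue()

LOCAL_SIGNATURE = b"PK\x03\x04"
# Everything in the fixed local file header after the 4-byte signature.
REST_OF_HEADER = struct.Struct("<H2sHHHIIIHH")

stream = io.BytesIO(raw)
names = []
# Stop at the first signature that is not a local file header
# (here, the first central directory header).
while stream.read(4) == LOCAL_SIGNATURE:
    (version, flags, method, mod_time, mod_date, crc, compressed_size,
     uncompressed_size, name_length, extra_length) = REST_OF_HEADER.unpack(
        stream.read(REST_OF_HEADER.size))
    names.append(stream.read(name_length).decode("utf-8"))
    # Skip the extra field and the file data to reach the next header.
    stream.seek(extra_length + compressed_size, io.SEEK_CUR)

print(names)
```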
The "signature" of a record such as the central directory header is also known as its "magic number": it signals where that record starts (and a different signature marks the end of central directory record).
Signature values begin with the two byte constant marker of 0x4b50, representing the characters "PK".
In fact these are all hard coded into the zipfile module itself:
# The "end of central directory" structure, magic number, size, and indices
# (section V.I in the format document)
structEndArchive = b"<4s4H2LH"
stringEndArchive = b"PK\005\006"
sizeEndCentDir = struct.calcsize(structEndArchive)
# The "central directory" structure, magic number, size, and indices
# of entries in the structure (section V.F in the format document)
structCentralDir = "<4s4B4HL2L5H2L"
stringCentralDir = b"PK\001\002"
sizeCentralDir = struct.calcsize(structCentralDir)
# The "local file header" structure, magic number, size, and indices
# (section V.A in the format document)
structFileHeader = "<4s2B4HL2L2H"
stringFileHeader = b"PK\003\004"
sizeFileHeader = struct.calcsize(structFileHeader)
# The "Zip64 end of central directory locator" structure, magic number, and size
structEndArchive64Locator = "<4sLQL"
stringEndArchive64Locator = b"PK\x06\x07"
sizeEndCentDir64Locator = struct.calcsize(structEndArchive64Locator)
# The "Zip64 end of central directory" record, magic number, size, and indices
# (section V.G in the format document)
structEndArchive64 = "<4sQ2H2L4Q"
stringEndArchive64 = b"PK\x06\x06"
sizeEndCentDir64 = struct.calcsize(structEndArchive64)
More concisely:
- EndArchive = b"PK\005\006" (struct: b"<4s4H2LH")
- CentralDir = b"PK\001\002" (struct: "<4s4B4HL2L5H2L")
- FileHeader = b"PK\003\004" (struct: "<4s2B4HL2L2H")
- EndArchive64Locator = b"PK\x06\x07" (struct: "<4sLQL")
- EndArchive64 = b"PK\x06\x06" (struct: "<4sQ2H2L4Q")
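The fixed size of each record follows directly from its struct format. A quick sketch that recomputes the sizes from the formats declared in zipfile (the dictionary keys are just labels for this example):

```python
import struct
import zipfile

# Map a label to the struct format string declared in zipfile.
layouts = {
    "EndArchive": zipfile.structEndArchive,
    "CentralDir": zipfile.structCentralDir,
    "FileHeader": zipfile.structFileHeader,
    "EndArchive64Locator": zipfile.structEndArchive64Locator,
    "EndArchive64": zipfile.structEndArchive64,
}

# struct.calcsize gives the number of bytes each fixed-size record occupies.
sizes = {label: struct.calcsize(fmt) for label, fmt in layouts.items()}

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

These are the sizes of the fixed fields only; the variable-size parts (file names, extra fields, comments) come on top.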
Next in zipfile.py there is a list of the index of each entry in the central directory struct, which I'll accompany below with the description/title/section from the APPNOTE.TXT:
# indexes of entries in the central directory structure
_CD_SIGNATURE = 0
The signature of the central directory. This is always b"\x50\x4b\x01\x02"
_CD_CREATE_VERSION = 1
_CD_CREATE_SYSTEM = 2
4.4.2 version made by (2 bytes)
4.4.2.1 The upper byte indicates the compatibility of the file attribute information. If the external file attributes are compatible with MS-DOS and can be read by PKZIP for DOS version 2.04g then this value will be zero. If these attributes are not compatible, then this value will identify the host system on which the attributes are compatible. Software can use this information to determine the line record format for text files etc.
4.4.2.2 The current mappings are:
0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
1 - Amiga
2 - OpenVMS
3 - UNIX
4 - VM/CMS
5 - Atari ST
6 - OS/2 H.P.F.S.
7 - Macintosh
8 - Z-System
9 - CP/M
10 - Windows NTFS
11 - MVS (OS/390 - Z/OS)
12 - VSE
13 - Acorn Risc
14 - VFAT
15 - alternate MVS
16 - BeOS
17 - Tandem
18 - OS/400
19 - OS X (Darwin)
20 thru 255 - unused
4.4.2.3 The lower byte indicates the ZIP specification version (the version of this document) supported by the software used to encode the file. The value/10 indicates the major version number, and the value mod 10 is the minor version number.
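As a small worked example of 4.4.2, here is a sketch that splits a 2-byte "version made by" value into its host system and specification version. The function name and the sample values are hypothetical; zipfile itself reads the two bytes separately (as _CD_CREATE_VERSION and _CD_CREATE_SYSTEM) rather than as one 16-bit number.

```python
def split_version_made_by(value):
    """Split a 16-bit 'version made by' value into (host, major, minor)."""
    host_system = value >> 8      # upper byte: host system code
    spec_version = value & 0xFF   # lower byte: ZIP specification version
    return host_system, spec_version // 10, spec_version % 10


# 0x031E: upper byte 3 (UNIX), lower byte 30 (specification version 3.0)
print(split_version_made_by(0x031E))
# 0x0014: upper byte 0 (MS-DOS and OS/2), lower byte 20 (version 2.0)
print(split_version_made_by(0x0014))
```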
_CD_EXTRACT_VERSION = 3
_CD_EXTRACT_SYSTEM = 4
4.4.3 version needed to extract (2 bytes)
4.4.3.1 The minimum supported ZIP specification version needed to extract the file, mapped as above. This value is based on the specific format features a ZIP program MUST support to be able to extract the file. If multiple features are applied to a file, the minimum version MUST be set to the feature having the highest value. New features or feature changes affecting the published format specification will be implemented using higher version numbers than the last published value to avoid conflict.
4.4.3.2 Current minimum feature versions are as defined below:
1.0 - Default value
1.1 - File is a volume label
2.0 - File is a folder (directory)
2.0 - File is compressed using Deflate compression
2.0 - File is encrypted using traditional PKWARE encryption
2.1 - File is compressed using Deflate64(tm)
2.5 - File is compressed using PKWARE DCL Implode
2.7 - File is a patch data set
4.5 - File uses ZIP64 format extensions
4.6 - File is compressed using BZIP2 compression*
5.0 - File is encrypted using DES
5.0 - File is encrypted using 3DES
5.0 - File is encrypted using original RC2 encryption
5.0 - File is encrypted using RC4 encryption
5.1 - File is encrypted using AES encryption
5.1 - File is encrypted using corrected RC2 encryption**
5.2 - File is encrypted using corrected RC2-64 encryption**
6.1 - File is encrypted using non-OAEP key wrapping***
6.2 - Central directory encryption
6.3 - File is compressed using LZMA
6.3 - File is compressed using PPMd+
6.3 - File is encrypted using Blowfish
6.3 - File is encrypted using Twofish
4.4.3.3 Notes on version needed to extract
* Early 7.x (pre-7.2) versions of PKZIP incorrectly set the version needed to extract for BZIP2 compression to be 50 when it SHOULD have been 46.
** Refer to the section on Strong Encryption Specification for additional information regarding RC2 corrections.
*** Certificate encryption using non-OAEP key wrapping is the intended mode of operation for all versions beginning with 6.1. Support for OAEP key wrapping MUST only be used for backward compatibility when sending ZIP files to be opened by versions of PKZIP older than 6.1 (5.0 or 6.0).
+ Files compressed using PPMd MUST set the version needed to extract field to 6.3, however, not all ZIP programs enforce this and MAY be unable to decompress data files compressed using PPMd if this value is set.
When using ZIP64 extensions, the corresponding value in the zip64 end of central directory record MUST also be set. This field SHOULD be set appropriately to indicate whether Version 1 or Version 2 format is in use.
_CD_FLAG_BITS = 5
4.4.4 general purpose bit flag: (2 bytes)
Bit 0: If set, indicates that the file is encrypted.
(For Method 6 - Imploding)
Bit 1: If the compression method used was type 6, Imploding, then this bit, if set, indicates an 8K sliding dictionary was used. If clear, then a 4K sliding dictionary was used.
Bit 2: If the compression method used was type 6, Imploding, then this bit, if set, indicates 3 Shannon-Fano trees were used to encode the sliding dictionary output. If clear, then 2 Shannon-Fano trees were used.
(For Methods 8 and 9 - Deflating)
Bit 2  Bit 1
0      0      Normal (-en) compression option was used.
0      1      Maximum (-exx/-ex) compression option was used.
1      0      Fast (-ef) compression option was used.
1      1      Super Fast (-es) compression option was used.
(For Method 14 - LZMA)
Bit 1: If the compression method used was type 14, LZMA, then this bit, if set, indicates an end-of-stream (EOS) marker is used to mark the end of the compressed data stream. If clear, then an EOS marker is not present and the compressed data size must be known to extract.
Note: Bits 1 and 2 are undefined if the compression method is any other.
Bit 3: If this bit is set, the fields crc-32, compressed size and uncompressed size are set to zero in the local header. The correct values are put in the data descriptor immediately following the compressed data. (Note: PKZIP version 2.04g for DOS only recognizes this bit for method 8 compression, newer versions of PKZIP recognize this bit for any compression method.)
Bit 4: Reserved for use with method 8, for enhanced deflating.
Bit 5: If this bit is set, this indicates that the file is compressed patched data. (Note: Requires PKZIP version 2.70 or greater)
Bit 6: Strong encryption. If this bit is set, you MUST set the version needed to extract value to at least 50 and you MUST also set bit 0. If AES encryption is used, the version needed to extract value MUST be at least 51. See the section describing the Strong Encryption Specification for details. Refer to the section in this document entitled "Incorporating PKWARE Proprietary Technology into Your Product" for more information.
Bit 7: Currently unused.
Bit 8: Currently unused.
Bit 9: Currently unused.
Bit 10: Currently unused.
Bit 11: Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file MUST be encoded using UTF-8. (see APPENDIX D)
Bit 12: Reserved by PKWARE for enhanced compression.
Bit 13: Set when encrypting the Central Directory to indicate selected data values in the Local Header are masked to hide their actual values. See the section describing the Strong Encryption Specification for details. Refer to the section in this document entitled "Incorporating PKWARE Proprietary Technology into Your Product" for more information.
Bit 14: Reserved by PKWARE for alternate streams.
Bit 15: Reserved by PKWARE.
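A few of these bits can be decoded with simple masks. The sketch below covers only a handful of the bits described above; the `FLAG_MEANINGS` table, its short labels and the `describe_flags` helper are my own, written for illustration.

```python
# Short labels for a few of the general purpose bits described in 4.4.4.
FLAG_MEANINGS = {
    0: "encrypted",
    3: "sizes and crc-32 are in a data descriptor",
    6: "strong encryption",
    11: "file name and comment are UTF-8",
    13: "local header values are masked",
}


def describe_flags(flag_bits):
    """Return the labels of the known bits that are set in flag_bits."""
    return [
        label
        for bit, label in sorted(FLAG_MEANINGS.items())
        if flag_bits & (1 << bit)
    ]


# 0x0808 has bit 3 and bit 11 set.
print(describe_flags(0x0808))
```

When reading an archive with zipfile, the same 16-bit value is available as the flag_bits attribute of each ZipInfo object.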
_CD_COMPRESS_TYPE = 6
4.4.5 compression method: (2 bytes)
0 - The file is stored (no compression)
1 - The file is Shrunk
2 - The file is Reduced with compression factor 1
3 - The file is Reduced with compression factor 2
4 - The file is Reduced with compression factor 3
5 - The file is Reduced with compression factor 4
6 - The file is Imploded
7 - Reserved for Tokenizing compression algorithm
8 - The file is Deflated
9 - Enhanced Deflating using Deflate64(tm)
10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
11 - Reserved by PKWARE
12 - File is compressed using BZIP2 algorithm
13 - Reserved by PKWARE
14 - LZMA
15 - Reserved by PKWARE
16 - IBM z/OS CMPSC Compression
17 - Reserved by PKWARE
18 - File is compressed using IBM TERSE (new)
19 - IBM LZ77 z Architecture
20 - deprecated (use method 93 for zstd)
93 - Zstandard (zstd) Compression
94 - MP3 Compression
95 - XZ Compression
96 - JPEG variant
97 - WavPack compressed data
98 - PPMd version I, Rev 1
99 - AE-x encryption marker (see APPENDIX E)
4.4.5.1 Methods 1-6 are legacy algorithms and are no longer recommended for use when compressing files.
_CD_TIME = 7
_CD_DATE = 8
4.4.6 date and time fields: (2 bytes each)
The date and time are encoded in standard MS-DOS format. If input came from standard input, the date and time are those at which compression was started for this data. If encrypting the central directory and general purpose bit flag 13 is set indicating masking, the value stored in the Local Header will be zero. MS-DOS time format is different from more commonly used computer time formats such as UTC. For example, MS-DOS uses year values relative to 1980 and 2 second precision.
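The MS-DOS format packs the date and the time into two 16-bit values. The bit layout used below (year offset from 1980 in the top 7 bits of the date, seconds stored divided by two in the low 5 bits of the time) is not spelled out in the excerpt above, so treat this as a sketch based on the standard MS-DOS layout; the helper names are hypothetical.

```python
def encode_dos_datetime(year, month, day, hour, minute, second):
    """Pack a date and time into MS-DOS (date, time) 16-bit values."""
    dos_date = ((year - 1980) << 9) | (month << 5) | day
    dos_time = (hour << 11) | (minute << 5) | (second // 2)
    return dos_date, dos_time


def decode_dos_datetime(dos_date, dos_time):
    """Unpack MS-DOS (date, time) values into a 6-tuple."""
    year = (dos_date >> 9) + 1980
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = dos_time >> 11
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2  # stored with 2 second precision
    return year, month, day, hour, minute, second


packed = encode_dos_datetime(2020, 5, 17, 13, 45, 31)
print(packed, decode_dos_datetime(*packed))
```

Because of the 2 second precision, an odd number of seconds comes back rounded down to the even value below it.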
_CD_CRC = 9
4.4.7 CRC-32: (4 bytes)
The CRC-32 algorithm was generously contributed by David Schwaderer and can be found in his excellent book "C Programmers Guide to NetBIOS" published by Howard W. Sams & Co. Inc. The 'magic number' for the CRC is 0xdebb20e3. The proper CRC pre and post conditioning is used, meaning that the CRC register is pre-conditioned with all ones (a starting value of 0xffffffff) and the value is post-conditioned by taking the one's complement of the CRC residual. If bit 3 of the general purpose flag is set, this field is set to zero in the local header and the correct value is put in the data descriptor and in the central directory. When encrypting the central directory, if the local header is not in ZIP64 format and general purpose bit flag 13 is set indicating masking, the value stored in the Local Header will be zero.
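The pre and post conditioning described here can be sketched with a bit-by-bit implementation and compared against zlib.crc32 and the value zipfile stores. The polynomial constant 0xEDB88320 (the reflected form of the standard CRC-32 polynomial) is not stated in the excerpt above; it is the usual choice, and the function name is my own.

```python
import io
import zipfile
import zlib


def crc32_bitwise(data):
    """Slow, bit-by-bit CRC-32 with all-ones pre and post conditioning."""
    crc = 0xFFFFFFFF  # pre-condition the register with all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right, applying the reflected polynomial when a 1 falls out.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # post-condition: one's complement


payload = b"hello world"

# Store the payload in an in-memory archive and read back the recorded CRC.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", payload)
stored_crc = zipfile.ZipFile(io.BytesIO(buf.getvalue())).getinfo("hello.txt").CRC

print(hex(crc32_bitwise(payload)), hex(zlib.crc32(payload)), hex(stored_crc))
```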
_CD_COMPRESSED_SIZE = 10
_CD_UNCOMPRESSED_SIZE = 11
4.4.8 compressed size: (4 bytes)
4.4.9 uncompressed size: (4 bytes)
The size of the file compressed (4.4.8) and uncompressed, (4.4.9) respectively. When a decryption header is present it will be placed in front of the file data and the value of the compressed file size will include the bytes of the decryption header. If bit 3 of the general purpose bit flag is set, these fields are set to zero in the local header and the correct values are put in the data descriptor and in the central directory. If an archive is in ZIP64 format and the value in this field is 0xFFFFFFFF, the size will be in the corresponding 8 byte ZIP64 extended information extra field. When encrypting the central directory, if the local header is not in ZIP64 format and general purpose bit flag 13 is set indicating masking, the value stored for the uncompressed size in the Local Header will be zero.
_CD_FILENAME_LENGTH = 12
_CD_EXTRA_FIELD_LENGTH = 13
_CD_COMMENT_LENGTH = 14
4.4.10 file name length: (2 bytes)
4.4.11 extra field length: (2 bytes)
4.4.12 file comment length: (2 bytes)
The length of the file name, extra field, and comment fields respectively. The combined length of any directory record and these three fields SHOULD NOT generally exceed 65,535 bytes. If input came from standard input, the file name length is set to zero.
_CD_DISK_NUMBER_START = 15
4.4.13 disk number start: (2 bytes)
The number of the disk on which this file begins. If an archive is in ZIP64 format and the value in this field is 0xFFFF, the size will be in the corresponding 4 byte zip64 extended information extra field.
_CD_INTERNAL_FILE_ATTRIBUTES = 16
4.4.14 internal file attributes: (2 bytes)
Bits 1 and 2 are reserved for use by PKWARE.
4.4.14.1 The lowest bit of this field indicates, if set, that the file is apparently an ASCII or text file. If not set, that the file apparently contains binary data. The remaining bits are unused in version 1.0.
4.4.14.2 The 0x0002 bit of this field indicates, if set, that a 4 byte variable record length control field precedes each logical record indicating the length of the record. The record length control field is stored in little-endian byte order. This flag is independent of text control characters, and if used in conjunction with text data, includes any control characters in the total length of the record. This value is provided for mainframe data transfer support.
_CD_EXTERNAL_FILE_ATTRIBUTES = 17
4.4.15 external file attributes: (4 bytes)
The mapping of the external attributes is host-system dependent (see 'version made by'). For MS-DOS, the low order byte is the MS-DOS directory attribute byte. If input came from standard input, this field is set to zero.
_CD_LOCAL_HEADER_OFFSET = 18
4.4.16 relative offset of local header: (4 bytes)
This is the offset from the start of the first disk on which this file appears, to where the local header SHOULD be found. If an archive is in ZIP64 format and the value in this field is 0xFFFFFFFF, the size will be in the corresponding 8 byte zip64 extended information extra field.
The Python module's list of entries stops here, but the PKWARE list continues:
4.4.17 file name: (Variable)
4.4.17.1 The name of the file, with optional relative path. The path stored MUST NOT contain a drive or device letter, or a leading slash. All slashes MUST be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and UNIX file systems etc. If input came from standard input, there is no file name field.
4.4.17.2 If using the Central Directory Encryption Feature and general purpose bit flag 13 is set indicating masking, the file name stored in the Local Header will not be the actual file name. A masking value consisting of a unique hexadecimal value will be stored. This value will be sequentially incremented for each file in the archive. See the section on the Strong Encryption Specification for details on retrieving the encrypted file name. Refer to the section in this document entitled "Incorporating PKWARE Proprietary Technology into Your Product" for more information.
4.4.18 file comment: (Variable)
The comment for this file.
4.4.19 number of this disk: (2 bytes)
The number of this disk, which contains central directory end record. If an archive is in ZIP64 format and the value in this field is 0xFFFF, the size will be in the corresponding 4 byte zip64 end of central directory field.
4.4.20 number of the disk with the start of the central directory: (2 bytes)
The number of the disk on which the central directory starts. If an archive is in ZIP64 format and the value in this field is 0xFFFF, the size will be in the corresponding 4 byte zip64 end of central directory field.
4.4.21 total number of entries in the central dir on this disk: (2 bytes)
The number of central directory entries on this disk. If an archive is in ZIP64 format and the value in this field is 0xFFFF, the size will be in the corresponding 8 byte zip64 end of central directory field.
4.4.22 total number of entries in the central dir: (2 bytes)
The total number of files in the .ZIP file. If an archive is in ZIP64 format and the value in this field is 0xFFFF, the size will be in the corresponding 8 byte zip64 end of central directory field.
4.4.23 size of the central directory: (4 bytes)
The size (in bytes) of the entire central directory. If an archive is in ZIP64 format and the value in this field is 0xFFFFFFFF, the size will be in the corresponding 8 byte zip64 end of central directory field.
4.4.24 offset of start of central directory with respect to the starting disk number: (4 bytes)
Offset of the start of the central directory on the disk on which the central directory starts. If an archive is in ZIP64 format and the value in this field is 0xFFFFFFFF, the size will be in the corresponding 8 byte zip64 end of central directory field.
4.4.25 .ZIP file comment length: (2 bytes)
The length of the comment for this .ZIP file.
4.4.26 .ZIP file comment: (Variable)
The comment for this .ZIP file. ZIP file comment data is stored unsecured. No encryption or data authentication is applied to this area at this time. Confidential information SHOULD NOT be stored in this section.
4.4.27 zip64 extensible data sector (variable size)
(currently reserved for use by PKWARE)
4.4.28 extra field: (Variable)
This SHOULD be used for storage expansion. If additional information needs to be stored within a ZIP file for special application or platform needs, it SHOULD be stored here.
Programs supporting earlier versions of this specification can then safely skip the file, and find the next file or header.
This field will be 0 length in version 1.0. Existing extra fields are defined in the section Extensible data fields that follows.
This struct is read in during the _RealGetContents method. To see how it works, grab a copy of the [self-contained] zipfile module and throw a breakpoint in there, then inspect the variables involved. You'll see that fp is an io.BytesIO object, much like you get when calling io.BytesIO on a byte stream from a GET request or read from file, and each time some bytes are read() in, the offset (given by the tell() method) advances accordingly.
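Without stepping through zipfile in a debugger, the same central directory entries can be unpacked by hand using the format string and index constants that zipfile declares. This is a simplified sketch of that idea on an in-memory archive, not a copy of _RealGetContents; the _CD_ names are private to zipfile and may change between Python versions.

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"first")
    zf.writestr("b.txt", b"second")
raw = buf.getvalue()

z = zipfile.ZipFile(io.BytesIO(raw))
offset = z.start_dir  # where the central directory begins
entries = []
for _ in z.infolist():
    # Unpack the 46 fixed bytes of one central directory header.
    centdir = struct.unpack_from(zipfile.structCentralDir, raw, offset)
    if centdir[zipfile._CD_SIGNATURE] != zipfile.stringCentralDir:
        raise ValueError("not a central directory header")
    name_start = offset + zipfile.sizeCentralDir
    name_length = centdir[zipfile._CD_FILENAME_LENGTH]
    name = raw[name_start:name_start + name_length].decode("utf-8")
    entries.append((name, centdir[zipfile._CD_LOCAL_HEADER_OFFSET]))
    # Skip the variable-size name, extra field and comment.
    offset = (name_start + name_length
              + centdir[zipfile._CD_EXTRA_FIELD_LENGTH]
              + centdir[zipfile._CD_COMMENT_LENGTH])

print(entries)
```

Each tuple pairs a file name with the offset of its local file header, which is what lets a reader jump straight to one file without scanning the rest.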
Equally important is the 'end of central directory' structure:
_ECD_SIGNATURE = 0
The signature of the 'end of central directory' record.
This is always b"\x50\x4b\x05\x06"
_ECD_DISK_NUMBER = 1
_ECD_DISK_START = 2
_ECD_ENTRIES_THIS_DISK = 3
_ECD_ENTRIES_TOTAL = 4
_ECD_SIZE = 5
_ECD_OFFSET = 6
_ECD_COMMENT_SIZE = 7
# These last two indices are not part of the structure as defined in the
# spec, but they are used internally by this module as a convenience
_ECD_COMMENT = 8
_ECD_LOCATION = 9
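These indices can be used to pick fields out of the list returned by zipfile._EndRecData. The sketch below uses an in-memory archive with a short comment, so that the two convenience items at the end are visible; both the function and the _ECD_ constants are private to zipfile, so this is an illustration rather than a supported API.

```python
import io
import zipfile

comment = b"demo archive"

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"first")
    zf.comment = comment  # written after the end of central directory record
raw = buf.getvalue()

endrec = zipfile._EndRecData(io.BytesIO(raw))

print(endrec[zipfile._ECD_SIGNATURE])
print(endrec[zipfile._ECD_ENTRIES_TOTAL])
print(endrec[zipfile._ECD_COMMENT_SIZE], endrec[zipfile._ECD_COMMENT])
print(endrec[zipfile._ECD_LOCATION], len(raw))
```

With a comment present, the record no longer sits exactly 22 bytes from the end of the file, which is why the location is returned alongside the parsed fields.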