This document was last updated on Sunday, December 13, 1998
Every
effort has been made to document LZH header formats as well as changes made for
features not yet implemented. Corrections, additions and suggestions are
always welcomed. Header fields in Italics are currently under development
for Huffman Compression Engine only, and should be ignored (skipped) if not
supported, as should any extended header.
If you are a developer of compression utilities in .lzh file formats,
please feel free to jump in and help.
Table
of Contents
Introduction. 1
LZH format. 3
level-0..... 3
level-1, level-2..... 3
Level 0 header structure.. 4
Level 1 header structure.. 5
Level 2 header structure.. 6
Extended headers.... 7
Handling of extended headers.... 9
Method ID............. 10
Variances 11
Huffman Compression Engine II....... 11
Generic time stamp.... 11
OS ID... 12
Links to other LHA utilities and compression references:. 13
0x39 Multi-disk field.... 14
SPAN_COMPLETE. 14
SPAN_MORE......... 14
SPAN_LAST........... 15
size_of_run............. 15
span_mode:............. 15
Termination............. 18
Byte Order:
Little-endian
There are 3
types of LZH headers, level-0, level-1, and level-2.
All .lzh and
.lha files are null terminated. The
last byte of the file should be a 0, but if it's not, it will be interpreted as
null terminated. The reason for this
termination is that it implies that the next header-size is zero. Huffman Compression Engine adds the null but
doesn't need it.
LZH header
Compressed
file
LZH header
Compressed
file
LZH header of
size 0 (1 byte null)
LZH header
Extension
header(s)
Compressed
file
In all cases,
read the first 21 bytes of the header.
After determining the header type, you will then have to handle the
header as needed or suggested.
Offset |
Size in bytes |
Description |
0 |
1 |
Size of
archived file header (h) |
1 |
1 |
1 byte Header checksum |
2 |
5 |
Method
ID |
7 |
4 |
Compressed
file size Refered to as C in subsequent fields. As there are no extended headers in level 0 format archive
headers, this value represents the size of the file data only. |
11 |
4 |
Uncompressed
file size |
15 |
4 |
Original
file date/time (Generic time stamp) |
19 |
1 |
File or
directory attribute |
20 |
1 |
Level
identifier (0x00) Programmers should
only read the first 21 bytes of the header before taking further action |
21 |
1 |
Length of
filename in bytes (Refered to as N in subsequent fields |
22 |
N |
Path and
Filename |
22+N |
2 |
16 bit CRC
of the uncompressed file |
24+N |
C |
Compressed
file data |
Offset |
Size in bytes |
Description |
0 |
1 |
Size of
archived file header (h) |
1 |
1 |
1 byte Header checksum |
2 |
5 |
Method
ID |
7 |
4 |
Compressed
file size Refered to as C in subsequent fields |
11 |
4 |
Uncompressed
file size |
15 |
4 |
Original
file date/time (Generic time stamp) |
19 |
1 |
File or
directory attribute |
20 |
1 |
Level
identifier (0x00) Programmers should
only read the first 21 bytes of the header before taking further action |
21 |
1 |
Length of
filename in bytes (Refered to as N in subsequent fields |
22 |
N |
Path and
Filename |
22+N |
2 |
16 bit CRC
of the uncompressed file |
24+N |
1 |
Operating
System identifier. See OS ID chart |
25+N |
2 |
Next Header
size |
27+N |
3 or more
bytes |
Extended
headers ( Note:
Extended headers are optional, and have no preset maximum. The first byte of an extended header
identifies the type of header, and the last 2 bytes of the header identify
whether or not more headers are defined.
Huffman Compression Engine will use the extended header filename if
both exist in level 1 archives. |
|
C |
Compressed
file data |
Offset |
Size in bytes |
Description |
1 |
2 |
Total size
of headers, including Extended headers for this entry. |
2 |
5 |
Method
ID |
7 |
4 |
Compressed
file size Referred to as C in subsequent fields This value
excludes the size of all Extended headers and only refers to the actual compressed data. This is an improvement, and can be
problematic if not handled properly. |
11 |
4 |
Uncompressed
file size |
15 |
4 |
Original
file date/time (Generic time stamp) |
19 |
1 |
File or
directory attribute. Not supported in all compression utilities. |
20 |
1 |
Level
identifier (0x00) Programmers should
only read the first 21 bytes of the header before taking further action |
21 |
2 |
16 bit CRC
of the uncompressed file |
23 |
1 |
Operating
System identifier. See OS ID chart |
24 |
2 |
Next Header
size |
26 |
3 or more
bytes |
Extended
headers ( Note:
Extended headers are not optional, and have no preset maximum. Minimum compatibility should include a
type1 extended header. The first
byte of an extended header identifies the type of header, and the last 2
bytes of the header identify whether or not more headers are defined. Huffman Compression Engine will use the
extended header filename if both exist in level 1 archives. Your loop for reading of these headers
should include offset 24 for the first assignment and loop until extended
header size = 0. |
|
C |
Compressed
file data |
Unspecified
size fields are intentionally left blank.
ID |
Size |
Description |
0x00 |
2 |
CRC-16 of
header and an optional information byte. |
0x01 |
|
Filename |
0x02 |
|
Directory
name |
0x39 |
|
0x39
Multi-disk field |
0x3f |
|
Uncompressed
file comment. |
0x48 0x4f |
|
Reserved for Authenticity verification |
|
|
|
|
|
|
|
|
|
|
|
|
0x5? |
2 |
UNIX related
information. Optional
information byte |
0x50 |
2 |
UNIX file
permission |
0x51 |
2 |
Group ID |
0x52 |
|
Group Name |
0x53 |
|
User Name |
0x54 |
|
Last
modified time in UNIX time |
|
|
|
0xcn |
|
Under development: Compressed file comment.
Method -lhn- is assumed Compressed comment cannot exceed 64K in size. Applicable range for Huffman Compression Engine (4..8) |
0xdx |
|
Under
development: Operating system
specific header info. These fields
may have different meanings for different platforms. If the file was not created on the same
platform as your own these signatures should be ignored. |
0xd1 |
|
Under
development Autodelete
after autorun |
Proper
procedure for handling of extended headers can be summed up in virtual code:
1
Read the first 21 bytes of the real header to determine the
size of the first extended header. If
the
2
Use the first extended header in the loop that reads
subsequent headers
3
Assign this value to a variable which is used inside the
extended header loop
4
Repeat
5
Allocate enough memory for the header
6
Read the header into the array
7
Assign headersize to a word variable at the last 2 bytes of
the array
8
Goto 3
9
Handle compressed data if it exists.
Code
has not been supplied intentionally. It
is expected that the programmer reading this document has enough knowledge of
programming to perform the task of writing real code.
Signature |
Description |
-lh0- |
No
compression |
-lzs- |
2k
sliding dictionary(max 17 bytes) |
-lz4- |
No
compression |
-lh1- |
4k
sliding dictionary(max 60 bytes) + dynamic Huffman + fixed encoding of
position |
-lh2- |
8k
sliding dictionary(max 256 bytes) + dynamic Huffman (Obsoleted) |
-lh3- |
8k
sliding dictionary(max 256 bytes) + static Huffman (Obsoleted) This method is
not supported by Huffman Compression Engine |
-lh4- |
4k
sliding dictionary(max 256 bytes) + static Huffman + improved encoding of
position and trees |
-lh5- |
8k
sliding dictionary + static Huffman |
-lh6- |
32k
sliding dictionary + static Huffman |
-lh7- -lh8- |
64k
sliding dictionary + static Huffman Lh8
has yet to be discovered except in my own utility. It existed from 0.21d to 0.21M and was actually -lh7- per
Harihiko's review of Yoshi's notes. |
"-lhd-" |
Directory
(no compressed data). This signature
may not contain any information at all.
In some cases it only is used to signify that there are extended
headers with important information.
In level 0 archives it most likely contains the directory for the next
header's file, but in level 1 and 2 headers, it most likely contains nothing
except for possibly the size of the first extended header. |
Huffman
Compression Engine II dynamically sets the method for compression based on the
file size. Reasoning: It makes little sense
to use a 64KB buffer if the file is 1K.
The next section covers variances, which basically define what to do in
known cases where the file signature does not match the dictionary size.
Versions after
0.21M now dynamically size the dictionary buffer according to the size of the
file to be compressed. It is not
uncommon to have signatures of -lh4- thru -lh7-. A future implementation
will include -lh9- through -lhc- and -lhe-, which represent dictionary sizes of
128KB through 2MB. -lhd- is skipped due to the fact that it is a reserved
signature. During 0.21, a
misunderstanding about the method identifier lead me to think that a 64KB
buffer was -lh8-. Although logic
dictated that it should be, 16KB dictionaries were bypassed entirely, which
lead to this confusion. In order to
simplify the problem, decode should use 64KB for -lh6- through -lh8-.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17
16
|<--------
year ------->|<- month ->|<-- day -->|
15 14 13 12 11 10 9 8 7 6 5
4 3 2 1 0
|<--- hour
--->|<---- minute --->|<- second/2 ->|
Offset Length
Contents
0
8 bits year years since 1980
8
4 bits month [1..12]
12 4 bits day [1..31]
16 5 bits hour [0..23]
21 6 bits minite [0..59]
27 5 bits second/2 [0..29]
'ID |
Platform |
M |
MS-Dos |
'2' |
OS/2 |
'9' |
OS9 |
'K' |
OS/68K |
'3' |
OS/386 |
'H' |
HUMAN |
'U' |
UNIX |
'C' |
CP/M |
'F' |
FLEX |
'm' |
Mac |
'w' |
Windows 95,
98 |
'W' |
Windows NT |
'R' |
Runser |
The following
hyperlinks are visible in word97. If
you cannot see these links, you may also jump on the Internet to http://fidonews.webworldinc.com/lzhformat.htm to
see them in your favorite browser.
LHA
World by Dr. Haruyasu Yoshizaki
Dolphin's Home Page The author, Tsugio Okamoto,
maintains "Lha for Unix."
If you are
interested in how LHA works, its source code is a very good place to start.
This site
includes useful info like LHA header specs, in Japanese.
Lha for UNIX patch by Yoshioka
Tsuneo.
MacLHA
for Macintosh systems.
Network
Mahjong International (LHA
in Java)
Micco's HomePage (UNLHA32.DLL UNARJ32.DLL LHMelt)
Haruhiko Okumura's Compression Pointers.
Haruhiko Okumura is the original designer of the lha compression algorithm.
Compatibility
with the above links can only be guessed at this point, as few lha style
compressors support anything above -lh5-.
However, if your interest is in maintaining compatibility with these
other platforms, -K5 added to the command line during compression will force
compatibility with these compression utilities.
The following is a
planned implementation. The original
can be found at:
http://www.creative.net/~aeco/jp/lzhspc01.html
struct MDF {
byte span_mode;
long beginning_offset;
long size_of_run;
}
span_mode: This identifies the mode of
this segment of file. The values are:
#define SPAN_COMPLETE 0
#define SPAN_MORE 1
#define SPAN_LAST 2
This specifies that
the information following this header contains a complete (optionally
compressed) file. This is often unused because MDF is not needed in these
cases. In an unsplit file, the header information and format should follow the
standard LZH format.
This specifies that the
information following this header is incomplete. The uncompressor needs to
concatenate this segment with
information from the following volume.
It should continue to do that until it sees a volume with a header information
that contains span_mode
This specifies that
the information following this header is the last segment of the (optionally
compressed) file.
beginning_offset:
This value specifies
the location in bytes of where this segment (run) of information will fit into.
This is the size of
this segment of information.
This identifies the
mode of this segment of file. The values are:
#define SPAN_COMPLETE 0
#define SPAN_MORE 1
#define SPAN_LAST 2
SPAN_COMPLETE. This
specifies that the information following this header contains a complete
(optionally compressed) file. This is often unused
because MDF is not needed in these
cases. In an unsplit file, the header information and format should follow the
standard LZH format.
SPAN_MORE. This specifies that the
information following this header is incomplete. The uncompressor needs to
concatenate this segment with
information from the following volume.
It should continue to do that until it sees a volume with a header information
that contains span_mode
SPAN_LAST.
SPAN_LAST. This specifies that the
information following this header is the last segment of the (optionally
compressed) file.
beginning_offset: This value specifies
the location in bytes of where this segment (run) of information will fit into.
size_of_run: This is the size of this
segment of information.
The illustration below contain two
volumes with two compressed files, one of them split between the two volumes.
"File 1" is compressed and fits
within the first volume. "File
2" is a file 100 bytes long compressed to 90 bytes. The first 50 bytes of
which resides on the first volume and the
last 40 bytes on the next.
Volume 1
+--------------+
| +----------+ |
| |LZH header| | <- MDF not needed
| +----------+ | <- header unchanged from non-spanned versions of LZH
| | File 1 | |
| | | |
| | | |
| +----------+ |
| |
| +----------+ | <- span_mode = SPAN_MORE
| |LZH header| | <- beginning_offset = 0
| +----------+ | <- size_of_run = 50
| | File 2 | |
| | split | |
| | | |
+--------------+
Volume 2
+--------------+
| +----------+ | <- span_mode = SPAN_LAST
| |LZH header| | <- beginning_offset = 50
| +----------+ | <- size_of_run = 40
| | File 2 | |
| | | |
| | | |
| | | |
| +----------+ |
| +----------+ |
| | [0] | | <- end of volumes, a byte with value zero (0)
| +----------+ |
+--------------+
-----------
In addition to the
above changes, the compressor must before closing the file after writing the
last volume, write a null byte at the end of the file. This byte serves to
inform the decompressor that this is the last volume and no other comes after
it.
This end of volume
byte is needed to tell the decompressor when to stop prompting for the next volume. This termination byte is optional, as the
decompressor may also stop when it has completed a file, i.e., see SPAN_LAST or
just a regular file with
MDF. However if the end of this
completed file coincides with an end of volume, there would be not way for the
compressor detect that the following
volumes and prompt for them.
This termination byte is a way around
this potential bug.
Note that this null byte coincides with
header_size in the LZH header.
Pseudo code for a sample implementation
---------------------------------------
boolean spanned = false;
while(file_available()) {
compress file();
if(size_of_compressed_file >
available_space()) {
while(size_of_compressed_file) {
size_to_write = available_space();
construct_header_with_MDF(size_to_write);
write_header();
write_data(size_to_write);
size_of_compressed_file -=
size_to_write;
prompt_for_next_volume();
spanned = true;
}
}
else {
construct_header_without_MDF(size_of_compressed_file);
write_header();
write_data(size_of_compressed_file);
}
}
if(spanned)
write_null_byte();
Author’s address:
Aeco Systems
826 28th Avenue
San Francisco, CA 94121
Phone: (415) 221-7806
EMail: aeco@creative.net