This Document is a subset of the complete document taken from the Squish Developers Kit Version 2.0 COPYRIGHT AND DISTRIBUTION RESTRICTIONS All of the source code, header files and documentation within the Squish Developers Kit is copyright 1991-1994 by SCI Communications. All rights reserved. SQUISH FILE FORMAT SPECIFICATION This section describes the physical file layout of a Squish message base. This is intended as a reference for developers who are writing their own Squish-compatible programs. For an overview of the Squish message base, see the section of SQUISH.PRN entitled "Using Squish-Format Message Areas". While the Squish MsgAPI library provides a standardized interface to Squish and *.MSG bases for C programmers, authors who use other languages may need to access Squish bases directly. This section describes the implementation details of the Squish file format. Squish Philosophy A standard Squish base consists of two files: a message data file and a message index file. Both files have the same prefix, but the data file has an extension of ".sqd", while the index file has an extension of ".sqi". From an overall point of view, the Squish data file consists of messages stored in a doubly-linked list. The Squish data file includes a header that contains pointers to the first and last frames in the area, in addition to other area-specific information. In the data file, a "frame" is used to hold an individual message. A frame consists of a frame header (which contains links to the prior and next messages), followed by the optional message header, control information and message body fields. This "linked list of frames" approach is ideal for a BBS message base. Almost all message base access is sequential, starting from a particular offset, and reading or writing until the end of the message base is reached. Since each frame header contains the offset of the prior and next messages, no disk accesses are required to find the preceding or following messages. The index file is a flat array of Squish Index (SQIDX) records. The index file is used primarily for performing random access look-ups by message number. Unlike other message base formats, the Squish base is only loosely based on the concept of "message numbers". While all messages have a message number, these numbers can change at any time. By definition, the message numbers in a Squish base always range from 1 to the highest message in the area. Consequently, there are no "gaps" in message numbers, so a Squish message area never needs to be renumbered. While this makes it easy to scan through all of the messages in an area, this also makes it difficult to find one specific message. Consequently, the concept of a "unique message identifier" or (UMSGID) is introduced. When a message is created, it is assigned a 32-bit UMSGID. These identifiers are unique and NEVER CHANGE. Unique message numbers are never "renumbered", so once a UMSGID of a message is obtained, it can always be used to find the current message number of the given message, no matter how many messages have been added or deleted in the interim. Squish Data Types The following integral types are used in the Squish file format definitions: Type Size Description char 1 byte A one-byte unsigned character. word 2 bytes A two-byte unsigned integer. sword 2 bytes A two-byte signed integer. dword 4 bytes A four-byte unsigned integer. FOFS 4 bytes A four-byte unsigned integer. This type is used to store offsets of frames within the Squish data file. UMSGID 4 bytes A four-byte unsigned integer. This type is used to store unique message identifiers for Squish messages. The types above are stored in the standard Intel "backwords" format, with the least significant byte being stored first, and the most significant byte being stored last. A two-byte integer containing 0x1234 would be stored as follows: Offset Value 0 0x34 1 0x12 A four-byte integer containing 0x12345678 would be stored as follows: Offset Value 0 0x78 1 0x56 2 0x34 3 0x12 The Squish file format also uses a number of abstract data types: The SCOMBO type is used for describing a message date/time stamp. This structure has the following format: Name Type Ofs Description date word 0 DOS bitmapped date value. This field is used to store a message date. The first five bits represent the day of the month. (A value of 1 represents the first of the month.) The next four bits indicate the month of the year. (1=January; 12=December.) The remaining seven bits indicate the year (relative to 1980). time word 2 DOS bitmapped time value. This field used to store a message time. The first five bits indicate the seconds value, divided by two. This implies that all message dates and times get rounded to a multiple of two seconds. (0 seconds = 0; 16 seconds = 8; 58 seconds = 29.) The next six bits represent the minutes value. The remaining five bits represent the hour value, using a 24-hour clock. Total: 4 bytes The NETADDR type is used for describing a FidoNet network address. This structure has the following format: Name Type Offset Description zone word 0 FidoNet zone number. net word 2 FidoNet net number. node word 4 FidoNet node number. point word 6 FidoNet point number. If the system is not a point, this field should be assigned a value of zero. Total: 8 bytes In addition, to describe an array of a given type, the "type[n]" notation is used. For example, "char[6]" represents an array of six contiguous characters. Likewise, "UMSGID[12]" represents an array of twelve UMSGID types. Data File Format The Squish data file consists of two major sections: 1) A fixed-length area header, stored at the beginning of the file. 2) A variable-length heap that comprises the rest of the file. This part of the file is used for storing message text. The area header stores pointers to the head and tail of two major "chains" of messages; the message chain and the free chain. The message chain is used to find all active messages in an area. The free chain is used for storing the locations of deleted messages, such that space can be reused at a later point in time. The Squish data file always contains a copy of the following _sqbase structure at file offset 0: Name Type Offset Description len word 0 Length of the _sqbase structure. reserved word 2 Reserved for future use. num_msg dword 4 Number of messages in this Squish base. This should always be equal to the value of the high_msg field. high_msg dword 8 Highest message number in this Squish base. This should always be equal to the value of the num_msg field. skip_msg dword 12 When automatically deleting messages, this field indicates that the first skip_msg messages in the area should not be deleted. (If max_msg=50 and skip_msg=2, this means that the writing program should start deleting from the third message whenever the total message count exceeds 50 messages.) high_water dword 16 The high water marker for this area, stored as a UMSGID. This field is used in EchoMail areas only. This contains the UMSGID of the highest message that was scanned by EchoMail processing software. uid dword 20 This field contains the UMSGID to be assigned to the next message created in this area. base char[80] 24 Name and path of the Squish base, as an ASCIIZ string, not including the extension. This field is optional and will not necessarily be filled out by all applications. (If this field is not supported, it should be initialized to ASCII 0.) begin_frame FOFS 104 Offset of the first frame in the message chain. last_frame FOFS 108 Offset of the last frame in the message chain. free_frame FOFS 112 Offset of the first frame in the free chain. last_free_frame FOFS 116 Offset of the last message in the free chain. end_frame FOFS 120 Offset of end-of-file. Applications will append messages to the Squish file from this point. max_msg dword 124 Maximum number of messages to store in this area. When writing messages, applications should dynamically delete messages to make sure that no more than max_msgs exist in this area. keep_days word 128 Maximum age (in days) of messages in this area. This field is not normally used by applications. However, it is used by SQPACK when performing a message area pack. sz_sqhdr word 130 Size of the SQHDR structure. For compatibility with future versions of the Squish file format, applications should use this value as the size of the SQHDR structure, instead of using a hardcoded "sizeof(SQHDR)" value. reserved char[124] 132 Reserved for future use. Total: 256 bytes To examine the messages in a Squish base, the application needs to follow the message chain. To do this, start with the begin_frame and/or end_frame fields. These fields contain the offsets of the first and last frames (respectively) in the message base. A frame in the message chain consists of a Squish Frame Header structure (SQHDR), followed by the XMSG message header, message control information, and message body. A frame in the free chain consists of a SQHDR structure only. A free frame does not necessarily contain a message. A SQHDR structure always has the following format: Name Type Ofs Description id dword 0 The frame identifier signature. This field must always be set to a value of 0xAFAE4453. next_frame FOFS 4 Frame offset of the next frame, or 0 if this is the last frame. prev_frame FOFS 8 Frame offset of the prior frame, or 0 if this is the first frame. frame_length dword 12 Amount of space ALLOCATED for the frame, not including the space used by the SQHDR itself. msg_length dword 16 Amount of space USED in this frame, including the size of the XMSG header, the control information, and the message text. This field does NOT include the size of the SQHDR itself. clen dword 20 Length of the control information field in this frame. frame_type word 24 This field can contain one of the following frame type values: 0 Normal frame. This frame contains an XMSG header, followed by the message control information and the message body. Normal frames should only be encountered when processing the normal message chain. 1 Free frame. This frame has been deleted, but it can be reused. The amount of available space in the frame is given by the frame_length field. Free frames should only be encountered when processing the free chain. 2 LZSS frame. This frame type is reserved for future use. 3 Frame update. The frame is being updated by another task. This is only a transient frame type; it indicates that the frame should not be manipulated by another task. All other frame types are reserved for future use. reserved word 26 Reserved for future use. Total: 28 bytes For a normal frame type, the XMSG header immediately follows the Squish frame header. The XMSG structure has the following format: Name Type Ofs Description attr dword 0 Message attributes. This is a combination of any of the MSG* attributes. (See below.) from char[36] 4 Name of the user who originated this message. to char[36] 40 Name of the user to whom this message is addressed. subject char[72] 76 Message subject. orig NETADDR 148 Originating network address of this message. dest NETADDR 156 Destination network address of this message. (Used for netmail areas only.) date_written SCOMBO 164 Date that the message was written. date_arrived SCOMBO 168 Date that the message was placed in this Squish area. utc_ofs sword 172 The message writer's offset from UTC, in minutes. Currently, this field is not used. replyto UMSGID 174 If this message is a reply, this field gives the UMSGID of the original message. Otherwise, this field is given a value of 0. replies UMSGID[9] 178 If any replies for this message are present, this array lists the UMSGIDs of up to nine reply messages. umsgid UMSGID 214 The UMSGID of this message. THIS FIELD IS ONLY VALID IF THE MSGUID BIT IS SET IN THE "ATTR" FIELD. Older Squish programs do not always set this field, so its contents can only be trusted if the MSGUID bit is set. __ftsc_date char[20] 218 FTS-0001 compatible date. Squish applications should not access this field directly. This field is used exclusively by tossers and scanners for preserving the original ASCII message date. Squish applications should use the binary dates in date_written and date_arrived to retrieve the message date. Total: 238 bytes Any of the following bitmasks can be used in the XMSG "attr" field: Attribute Value Description MSGPRIVATE 0x00000001 The message is private. MSGCRASH 0x00000002 The message is given a "crash" flavour when packed. When both MSGCRASH and MSGHOLD are both enabled, the message is given a "direct" flavour. MSGREAD 0x00000004 The message has been read by the addressee. MSGSENT 0x00000008 The message has been packed and prepared for transmission to a remote system. MSGFILE 0x00000010 The message has a file attached. The filename is given in the "subj" field. MSGFWD 0x00000020 The message is in-transit; it is not addressed to one of our primary addresses. MSGORPHAN 0x00000040 The message is orphaned. The message destination address could not be found in the nodelist. MSGKILL 0x00000080 The message should be deleted from the local message base when it is packed. MSGLOCAL 0x00000100 The message originated on this system. This flag must be present on all locally-generated netmail for Squish to function properly. MSGHOLD 0x00000200 The message should be given a "hold" flavour when packed. When combined with the MSGCRASH flag, the message is given a "direct" flavour. MSGXX2 0x00000400 Reserved for future use. MSGFRQ 0x00000800 The message is a file request. The filename is given in the "subj" field. MSGRRQ 0x00001000 A receipt is requested. (Not supported by Squish.) MSGCPT 0x00002000 This message is a receipt for an earlier MSGRRQ request. MSGARQ 0x00004000 An audit trail is requested. (Not supported by Squish.) MSGURQ 0x00008000 This message is an update request. The filename is given in the "subj" field. MSGSCANNED 0x00010000 This echomail message has been scanned out to other systems. MSGUID 0x00020000 The "uid" field contains a valid UMSGID for this message. Index File Format The index file provides random access capability for a Squish base. Given a message number, the index file can be used to quickly find the frame offset for that message. Similarly, given a UMSGID, the index file can also be used to find the message number and/or the frame offset for the message. The Squish index file is an array of Squish Index (SQIDX) structures. Each SQIDX structure corresponds to an active message. For a base containing 'n' messages, there are at least 'n' SQIDX structures. (There may also be extra SQIDX frames at the end of the index file, but these will be initialized with invalid values, as described below.) The SQIDX for the first message is stored at offset 0. The SQIDX for the second message is stored at offset 12. The SQIDX for the third message is stored at offset 24. (and so on) The Squish Index structure (SQIDX) has the following format: Name Type Ofs Description ofs FOFS 0 Offset of the frame for this message. A value of 0 is used to indicate an invalid message. umsgid UMSGID 4 Unique message ID for this message. A value of 0xffffffff is used to indicate an invalid message. The umsgid field must always be greater than the umsgid field of the preceding SQIDX structure. UMSGIDs are assigned serially, so this will normally be the case. (A binary search is performed on the index file to translate UMSGIDs, so the umsgid field of the SQIDX headers must always be stored in ascending order.) hash dword 8 The low 31 bits of this field contain a hash the "To:" field for this message. (See below for the hash function.) The high bit is set to 1 if the MSGREAD flag is enabled in the corresponding XMSG header. Total: 12 bytes The following hash function is used to calculate the "hash" field of the SQIDX structure. All variables are 32-bit unless otherwise noted: Set "hash" to a value of 0 For each 8-bit character "ch" in the To: field, repeat: - Shift "hash" left by four bytes. - Convert "ch" to lowercase - Increment the hash by the ASCII value of "ch" - Set "g" to the value of "hash" - Perform a bitwise AND on "g", using a mask of 0xf0000000. - If "g" is non-zero: - Perform a bitwise OR on "hash" with the value of "g". - Shift "g" right by 24 bits. - Perform a bitwise OR on "hash" with the value of "g". Perform a bitwise AND on "hash" with a value of 0x7fffffff. The following C function can be used to calculate such a hash: #include unsigned long SquishHash(unsigned char *f) { unsigned long hash=0; unsigned long g; char *p; for (p=f; *p; p++) { hash=(hash << 4) + (unsigned long)tolower(*p); if ((g=(hash & 0xf0000000L)) != 0L) { hash |= g >> 24; hash |= g; } } /* Strip off high bit */ return (hash & 0x7fffffffLu); }