This document is the official description of the Plucker format.
The Plucker document format supports a multi-page (in the Web sense of 'page') hyperlinked information structure containing both 'rich' text and images. Only links internal to the document can be followed; external links, in standard URL form, may be included and displayed, but not followed. Images may either be embedded in a text page, as with HTML, or be included as separate stand-alone pages.
Plucker documents are structured so that they can be used both with a file-system-oriented operating system such as Unix or Windows, and with PalmOS, which is not file-system-oriented. To this end, they always begin with a standard PalmOS record database prefix, which consists of four parts: the database header, a record-id list, an AppInfo block, and a SortInfo block. The Plucker format does not use the SortInfo block, which is therefore null and occupies no space in the document prefix.
The record database prefix is then followed by a sequence of application-specific records. In a Plucker document, this sequence consists of one index record, followed by a series of data records. The index record contains information about the data records, along with some global information, such as the type of compression used. Each data record contains either a page, an image, or data about the document, such as bookmarks or URL data.
The format is big-endian; any multi-byte numeric values specified in this document are big-endian. Images are stored in the Palm image format; for more information on this format please consult http://www.palmos.com/dev/tech/docs/.
The database header is a fixed-size structure of 72 bytes. It contains the name of the database, the Plucker version number, various timestamps (creation, modification, last backup), and several flags. All timestamps are given using the PalmOS standard, seconds since 12:00 AM on January 1, 1904.
Field | Bytes | Type | Notes |
docName | 32 | String | Must contain a NUL-terminated 7-bit ASCII string (only character codes 0x20-0x7E are valid) giving the name of the document. Because of the terminating NUL character at end, only 31 bytes can actually be used for the name of the document. The first 26 bytes of this string are used by Plucker as a unique ID for the document; names should be unique in the first 26 characters. |
flags | 2 | Bitfield | Most bits in this field are unused. Unused bits should be set to zero on document creation, but reader software should not expect them to stay at this value. Valid bits are as follows. All numeric values given are big-endian. |
version | 2 | Numeric | Version of the Plucker format used in this document. Must have the value 1. |
creationDate | 4 | Timestamp | Time of document creation |
modificationDate | 4 | Timestamp | Time document last modified |
unused1 | 8 | Numeric | Must be zero at document creation, but any specific value should not be relied upon. |
appInfoOffset | 4 | Numeric | Either zero, if no appInfo is present, or the offset from the beginning of the document to the start of the appInfo block. |
sortInfoId | 4 | Numeric | Must be zero. |
magic | 8 | String | Must be the 8 ISO Latin-1 characters "DataPlkr". No terminating NUL character. |
unused2 | 4 | Numeric | Must be zero at document creation, but any specific value should not be relied upon. |
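As an illustration of the table above, the following Python sketch unpacks the 72-byte database header. The names `parse_db_header` and `palm_time` are hypothetical, and the timestamps are returned as naive datetimes because this document does not specify a time zone.

```python
import struct
from datetime import datetime, timedelta

# Field order and sizes follow the table: 32+2+2+4+4+8+4+4+8+4 = 72 bytes.
DB_HEADER = struct.Struct(">32sHHII8sII8sI")
PALM_EPOCH = datetime(1904, 1, 1)


def palm_time(seconds):
    """Convert a PalmOS timestamp (seconds since 1904-01-01) to a datetime."""
    return PALM_EPOCH + timedelta(seconds=seconds)


def parse_db_header(data):
    """Unpack the database header found at the start of a document."""
    (name, flags, version, created, modified, _unused1,
     app_info_offset, sort_info_id, magic, _unused2) = DB_HEADER.unpack_from(data, 0)
    if magic != b"DataPlkr":
        raise ValueError("not a Plucker document")
    return {
        # The name is NUL-terminated inside its 32-byte field.
        "docName": name.split(b"\0", 1)[0].decode("ascii"),
        "flags": flags,
        "version": version,
        "creationDate": palm_time(created),
        "modificationDate": palm_time(modified),
        "appInfoOffset": app_info_offset,
        "sortInfoId": sort_info_id,
    }
```

A reader would typically call `parse_db_header` on the first 72 bytes of the file and reject the document if the magic string or version does not match.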
The record-id list consists of a six-byte list header, followed by one ID entry for each record in the document (including the index record). The list header has the structure:
Field | Bytes | Type | Notes |
nextRecordListID | 4 | Numeric | Must be zero. |
numRecords | 2 | Numeric | Number of records in the document, including the index record. |
This is then followed by numRecords entries of the following structure:
Field | Bytes | Type | Notes |
recordOffset | 4 | Numeric | Number of bytes from the start of the document to the beginning of the record |
attributes | 1 | Bitfield | Record attributes -- should be zero. |
uniqueID | 3 | Numeric | A local (document-specific) unique ID for the record. This is not used by Plucker (because it is not preserved by PalmOS through beaming of a document), but must still be different for each record. |
Finally, there are two bytes of zero-padding to bring the structure alignment back to 4 bytes.
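The list layout described above can be walked with a short Python sketch. The function names are hypothetical. The rule that a record extends to the next record's offset (or to the end of the file) is the usual PalmOS database convention and is an assumption here, since this document does not state record lengths.

```python
import struct

DB_HEADER_SIZE = 72  # the record-id list starts right after the database header


def parse_record_list(data, offset=DB_HEADER_SIZE):
    """Return a list of (recordOffset, attributes, uniqueID) tuples."""
    _next_list_id, num_records = struct.unpack_from(">IH", data, offset)
    offset += 6
    entries = []
    for _ in range(num_records):
        record_offset, attributes, uid = struct.unpack_from(">IB3s", data, offset)
        # uniqueID is a 3-byte big-endian number.
        entries.append((record_offset, attributes, int.from_bytes(uid, "big")))
        offset += 8
    return entries


def record_slices(data, entries):
    """Cut out each record's bytes (assumes records are stored in offset order)."""
    offsets = [entry[0] for entry in entries] + [len(data)]
    return [data[offsets[i]:offsets[i + 1]] for i in range(len(entries))]
```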
The AppInfo block is typically present only when the launchable flag is set in the flags field of the database header. No Plucker data aside from icon display information and a versioning string is stored in this block. This block has the following structure:
Field | Bytes | Type | Notes |
signature | 4 | Numeric | Must contain the value 0x6C6E6368. |
hdrVersion | 2 | Numeric | Must have the value 3. |
hdrEncoding | 2 | Numeric | Must have the value 0. |
verStrWords | 2 | Numeric | The number of two-byte words following, containing the version string. |
verStr | 2 * verStrWords | String | NUL-terminated ISO Latin-1 string, padded at end if necessary with a zero byte to an even-byte boundary, containing a version string to display to the user containing version information for the document. |
pqaTitleWords | 2 | Numeric | The number of two-byte words in the following pqaTitleStr. |
pqaTitleStr | 2 * pqaTitleWords | String | NUL-terminated ISO Latin-1 string, padded at end if necessary with a zero byte to an even-byte boundary, containing a title string for iconic display of the document. |
iconWords | 2 | Numeric | Number of two-byte words in the following icon image. |
icon | 2 * iconWords | Image | Image (32x32) in Palm image format to be used as an icon to represent the document on a desktop-style display. The image may not use a custom color map. |
smIconWords | 2 | Numeric | Number of two-byte words in the following icon image. |
smIcon | 2 * smIconWords | Image | Small image (15x9) in Palm image format to be used as an icon to represent the document on a desktop-style display. The image may not use a custom color map. |
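Because four of the AppInfo fields are length-prefixed, a reader has to walk the block sequentially. The sketch below shows one way to do that in Python; `parse_app_info` and `read_counted` are hypothetical names, and the icon images are returned as raw bytes without decoding the Palm image format.

```python
import struct


def parse_app_info(data, offset):
    """Parse the AppInfo block that starts at the given document offset."""
    signature, hdr_version, hdr_encoding = struct.unpack_from(">IHH", data, offset)
    if signature != 0x6C6E6368:  # the ASCII bytes "lnch"
        raise ValueError("bad AppInfo signature")
    pos = offset + 8

    def read_counted(pos):
        # Each variable-length field is a 2-byte word count followed by 2*count bytes.
        (words,) = struct.unpack_from(">H", data, pos)
        start = pos + 2
        end = start + 2 * words
        return data[start:end], end

    ver_str, pos = read_counted(pos)
    title, pos = read_counted(pos)
    icon, pos = read_counted(pos)
    small_icon, pos = read_counted(pos)
    return {
        "hdrVersion": hdr_version,
        "hdrEncoding": hdr_encoding,
        # Strings are NUL-terminated and may carry one extra padding byte.
        "verStr": ver_str.split(b"\0", 1)[0].decode("latin-1"),
        "pqaTitleStr": title.split(b"\0", 1)[0].decode("latin-1"),
        "icon": icon,
        "smIcon": small_icon,
    }
```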
The index record specifies the compression type used for the Plucker document and the IDs used by the reserved records. The viewer uses this record to find the reserved records and to determine whether it needs support for ZLib compression. The index record should always be the first record in the Plucker document (i.e. at index 0).
Field | Bytes | Type | Notes |
uid | 2 | Numeric | unique ID for record, always 0x0001 |
version | 2 | Numeric | 0x0002 if data is ZLib compressed, 0x0001 if DOC compressed |
records | 2 | Numeric | number of reserved records |
reserved | 4*records | Numeric | reserved ID array |
The reserved ID array consists of a series of name/ID pairs, where the ID is the unique ID (2 bytes) for the record and the name is a value (2 bytes) from the following list.
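A minimal Python sketch of reading the index record follows. The function name is hypothetical, and the order of the two values within each pair (name first, then record ID) is an assumption based on the phrase "name/ID pairs". The meaning of each name value is not interpreted here.

```python
import struct


def parse_index_record(record):
    """Return (uid, compression name, {reserved name: record ID})."""
    uid, version, count = struct.unpack_from(">HHH", record, 0)
    compression = {1: "DOC", 2: "ZLIB"}.get(version, "unknown")
    reserved = {}
    for i in range(count):
        # Assumed order within each 4-byte pair: name value, then record ID.
        name, record_id = struct.unpack_from(">HH", record, 6 + 4 * i)
        reserved[name] = record_id
    return uid, compression, reserved
```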
There are several different types of data records.
Each data record starts with a header, having the following structure:
Field | Bytes | Type | Notes |
uid | 2 | Numeric | unique ID for record |
paragraphs | 2 | Numeric | number of paragraphs |
size | 2 | Numeric | total length of data before compression |
type | 1 | Numeric | Data type. Must be one of the following: |
reserved | 1 | Numeric | Must be zero. |
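The eight-byte data record header can be separated from the rest of the record as in this Python sketch. `RecordHeader` and `split_record` are hypothetical names, and the `type` value is returned as a plain number without interpretation.

```python
import struct
from collections import namedtuple

RecordHeader = namedtuple("RecordHeader", "uid paragraphs size type reserved")
RECORD_HEADER = struct.Struct(">HHHBB")  # 2+2+2+1+1 = 8 bytes


def split_record(record):
    """Return (header, remaining bytes) for one data record."""
    header = RecordHeader(*RECORD_HEADER.unpack_from(record, 0))
    return header, record[RECORD_HEADER.size:]
```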
The Plucker format supports two forms of compression, DOC and ZLIB. The part of a data record that follows the header is compressed as a single chunk. All compressed records in a single document must use the same compression format. Compressed records may be mixed with uncompressed records.
DOC compression is the format invented for early Palm usage.
ZLIB compression uses the ZLib format documented in Internet RFCs 1950 and 1951. See also http://www.gzip.org/zlib/manual.html for a description of the library used to perform the compression and decompression.
Plucker documents may be keyed to a specific string of 40 or fewer ASCII characters, called the owner-id. When such a key is specified, zlib compression must be used in the document. When an owner-id is specified, the beginning of each zlib-compressed data segment is XOR'ed with a value derived from the key, after compression, and must be XOR'ed again with the derived value before being decompressed. If an owner-id is specified for a document, the metadata record must exist, and must contain an OwnerID subrecord giving the CRC-32 of the owner-id string.
The derived value mentioned above is a 40-byte value constructed by forming 10 strings by concatenating the owner-id string with itself 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11 times, then taking the CRC-32 values of each of these concatenations, then packing those 32-bit values in big-endian order into a 40-byte buffer.
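The key derivation can be sketched in Python using the standard `zlib` module. Two points are assumptions rather than statements of this document: "concatenating the owner-id string with itself n times" is read as n back-to-back copies of the string, and "the beginning" of a segment is read as its first 40 bytes (or the whole segment if it is shorter). The function names are hypothetical.

```python
import struct
import zlib


def owner_id_key(owner_id: bytes) -> bytes:
    """Build the 40-byte derived value from an owner-id string."""
    # Assumption: n-fold concatenation means n copies, for n = 2..11.
    crcs = [zlib.crc32(owner_id * n) & 0xFFFFFFFF for n in range(2, 12)]
    return struct.pack(">10I", *crcs)


def xor_prefix(segment: bytes, key: bytes) -> bytes:
    """XOR the start of a segment with the key; applying it twice undoes it."""
    # Assumption: only the first len(key) bytes of the segment are affected.
    n = min(len(segment), len(key))
    head = bytes(a ^ b for a, b in zip(segment[:n], key[:n]))
    return head + segment[n:]


def unlock_and_inflate(segment: bytes, owner_id: bytes) -> bytes:
    """Undo the XOR step, then decompress the zlib data."""
    return zlib.decompress(xor_prefix(segment, owner_id_key(owner_id)))
```

Under the same reading, the value stored in the OwnerID subrecord would be `zlib.crc32(owner_id)`.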
For text data the data record header is followed by a series of paragraph headers, each representing a paragraph block in the text data. This series of paragraph headers is then followed by the compressed or uncompressed text data. Each paragraph header has the form:
Field | Bytes | Type | Notes |
size | 2 | Numeric | Total length of paragraph before compression. NOTE: No text data should be larger than 32k. If the original document is larger than 32k, then the parser must split it into several records. |
attributes | 2 | Bitfield | Paragraph info. The high-order 13 bits are reserved for future use and should be set to zero; the 3 low-order bits contain a numeric value in the range [0..7] giving the amount of extra paragraph spacing (2*value pixels). |
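A sketch of reading the paragraph headers follows. It assumes `body` is the part of a text record after the eight-byte data record header, and that `paragraphs` comes from that header. The function names are hypothetical. `split_paragraphs` must be given uncompressed text.

```python
import struct


def parse_paragraph_headers(body, paragraphs):
    """Return ([(size, extra spacing in pixels), ...], bytes after the headers)."""
    headers = []
    for i in range(paragraphs):
        size, attributes = struct.unpack_from(">HH", body, 4 * i)
        # The 3 low-order bits hold a value 0..7; spacing is twice that, in pixels.
        extra_spacing_px = 2 * (attributes & 0x0007)
        headers.append((size, extra_spacing_px))
    return headers, body[4 * paragraphs:]


def split_paragraphs(text, headers):
    """Cut uncompressed text data into paragraphs using the header sizes."""
    out, pos = [], 0
    for size, _spacing in headers:
        out.append(text[pos:pos + size])
        pos += size
    return out
```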
The (uncompressed) text data contains a character stream of ISO Latin-1 characters, interspersed with 'functions'.
A function is introduced in the text stream by a NUL character (0x00), followed by a one-byte function code and up to 7 bytes of data. The 3 least significant bits of the function code give the length of the function data that follows; the 5 most significant bits identify the function itself. The following functions are valid:
Function Code | Description | Bytes | Arguments |
0x0A | Page link begins | 2 | record ID |
0x0C | Paragraph link begins | 4 | record ID, paragraph offset |
0x08 | Link ends | 0 | no data |
0x11 | Set font | 1 | font specifier |
0x1A | Embedded image | 2 | image record ID |
0x22 | Set margin | 2 | left margin, right margin |
0x29 | Alignment of text | 1 | alignment |
0x33 | Horizontal rule | 3 | 8-bit height, 8-bit width (pixels), 8-bit width (%, 1-100) |
0x38 | New line | 0 | no data |
0x40 | Italic text begins | 0 | no data |
0x48 | Italic text ends | 0 | no data |
0x53 | Set text color | 3 | 8-bit red, 8-bit green, 8-bit blue |
0x5C | Multiple embedded image | 4 | alternate image record ID, image record ID |
0x60 | Underline text begins | 0 | no data |
0x68 | Underline text ends | 0 | no data |
0x70 | Strike-through text begins | 0 | no data |
0x78 | Strike-through text ends | 0 | no data |
0x83 | 16-bit Unicode character | 3 | alternate text length, 16-bit Unicode character |
0x85 | 32-bit Unicode character | 5 | alternate text length, 32-bit Unicode character |
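Since the argument length is encoded in the function code itself, a reader can split the stream without knowing every function. The Python sketch below does this for uncompressed text data; `tokenize` is a hypothetical name, and the sketch assumes a well-formed stream (a NUL is always followed by a function code).

```python
def tokenize(text: bytes):
    """Split text data into ("text", bytes) and ("func", code, args) items."""
    items, pos, run_start = [], 0, 0
    while pos < len(text):
        if text[pos] != 0x00:
            pos += 1
            continue
        if pos > run_start:
            items.append(("text", text[run_start:pos]))
        code = text[pos + 1]
        arg_len = code & 0x07  # the 3 least significant bits give the data length
        args = text[pos + 2:pos + 2 + arg_len]
        items.append(("func", code, args))
        # Skip the arguments so that zero bytes inside them are not misread.
        pos += 2 + arg_len
        run_start = pos
    if pos > run_start:
        items.append(("text", text[run_start:]))
    return items
```

For example, the code 0x0A has the low bits 010, so two argument bytes follow it, which matches the "Page link begins" row in the table.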
The function arguments have the following definitions:
Argument | Bytes | Notes |
record ID | 2 | This is either a reference to a record in the Plucker document (that is, a real record ID), or an index into the list of URLs, for URLs which have not been included in the document. |
image record ID | 2 | reference to image in Plucker document |
paragraph offset | 2 | paragraph number (starting from 0) to jump to |
font specifier | 1 | The font concept used in Plucker is that of a 'standard' font, along with bold and italic versions of that font. There is no font notion corresponding to HTML's <BIG> or <SMALL>. In this markup, boldness and size are specified with a font specifier; italic is specified with a separate function code. There are currently 9 font specification values, with the following meanings (the actual PalmOS fonts used by the Palm viewer are also given): |
left margin | 1 | left margin in pixels |
right margin | 1 | right margin in pixels |
alignment | 1 | alignment code (left = 0, right = 1, center = 2, justify = 3) |
height | 1 | height of horizontal rule in pixels; if not given, a default value of 2 pixels will be used |
width (pixels) | 1 | width in pixels, should be 0 if percentage value should be used |
width (%) | 1 | width as the percentage between the current left and right margins. The default is 100% |
alternate text length | 1 | When a Unicode character not representable in ISO-Latin-1 is encountered in an HTML document, a Unicode-character function code is inserted, with the 16-bit or 32-bit value of the character. This is followed by an "alternate representation" of the character in ISO-Latin-1 text. This parameter gives the length, in bytes, of the alternate text span. If the viewer can present the Unicode character directly, display of the alternate text should be suppressed. |
16 or 32 bit Unicode character | 2, 4 | When a Unicode character not representable in ISO-Latin-1 is encountered in an HTML document, a Unicode-character function code is inserted, with the 16-bit or 32-bit Unicode character code for the character, which this parameter supplies. This is followed by an "alternate representation" of the character in ISO-Latin-1 text. If the viewer can present the Unicode character directly, display of the alternate text should be suppressed. |
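The interaction between the Unicode functions and their alternate text can be illustrated with a small Python sketch. It assumes the alternate text immediately follows the function and contains no functions of its own. All other function codes are skipped, and the name `decode_with_unicode` is hypothetical.

```python
def decode_with_unicode(text: bytes, can_show_unicode: bool = True) -> str:
    """Render text data to a string, honouring functions 0x83 and 0x85 only."""
    out, pos = [], 0
    while pos < len(text):
        if text[pos] != 0x00:
            out.append(bytes([text[pos]]).decode("latin-1"))
            pos += 1
            continue
        code = text[pos + 1]
        arg_len = code & 0x07
        args = text[pos + 2:pos + 2 + arg_len]
        pos += 2 + arg_len
        if code in (0x83, 0x85):
            alt_len = args[0]
            codepoint = int.from_bytes(args[1:], "big")
            if can_show_unicode:
                out.append(chr(codepoint))
                pos += alt_len  # suppress the ISO Latin-1 alternate text
            # Otherwise the alternate text is emitted by the normal text path.
        # All other functions are ignored in this sketch.
    return "".join(out)
```

A viewer that cannot display the character would call the function with `can_show_unicode=False` and show the alternate text instead.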
The image data consists of an image in Palm image format, compressed or uncompressed as specified in the document's index record. The image may in addition be internally compressed, via any of the compression techniques allowed in the Palm image format. Images must be less than 60k, uncompressed.
Field | Bytes | Type | Notes |
to_offset | 2 | Numeric | offset to TO string |
cc_offset | 2 | Numeric | offset to CC string |
subject_offset | 2 | Numeric | offset to SUBJECT string |
body_offset | 2 | Numeric | offset to BODY string |
strings | 0+ | String sequence | A concatenated sequence of one or more NUL-terminated US-ASCII strings. Each contains a header-value, which follows the constraints on header values laid down in IETF RFC 2822. Header folding is not allowed. Any of the four headers shown above may be absent; header values should be accessed via the above offsets. |
In practice, two kinds of records are used to store the URL strings: the URL handling data record, which serves as an index into the sequence of strings, and the URL data record, one or more of which contain the actual strings.
The URL handling data is used to find the record ID of the record which contains the correct URL string. It contains a series of 2-byte number pairs.
Field | Bytes | Type | Notes |
last_url | 2 | Numeric | the ordinal number of the last URL in record |
id | 2 | Numeric | record ID for record |
Field | Bytes | Type | Notes |
URLs | 1+ | String sequence | a concatenated sequence of NUL-terminated URL strings following the constraints of IETF RFC 1738. The list may contain up to 200 URLs (only text and image records are included, other records are represented only by the presence of a NUL; that is, by an empty string) |
These records may or may not be compressed, as indicated by the type in the header. They are used by the Details form to display the URL of the current record and by the External Reference form to display the URL of pages that were not collected. From either form you can copy the URL to a Memo to remind you to pluck it at a later date.
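The two-step lookup through the URL handling data and a URL data record might look like the Python sketch below. It assumes that URL ordinals start at 1 and that each URL data record holds a contiguous run of URLs ending at its `last_url` value; neither detail is stated explicitly in this document. The function names are hypothetical, and a compressed URL data record has to be decompressed before `url_at` is used.

```python
import struct


def parse_url_index(handling_data: bytes):
    """Return [(last_url, record_id), ...] from the URL handling data."""
    return [struct.unpack_from(">HH", handling_data, i)
            for i in range(0, len(handling_data) - 3, 4)]


def locate_url(index, n):
    """Return (record_id, position within that record) for URL ordinal n."""
    first = 1  # assumption: ordinals are 1-based
    for last_url, record_id in index:
        if n <= last_url:
            return record_id, n - first
        first = last_url + 1
    raise KeyError(n)


def url_at(url_data: bytes, position):
    """Pick one NUL-terminated URL out of an uncompressed URL data record."""
    return url_data.split(b"\0")[position].decode("ascii")
```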
Field | Bytes | Type | Notes |
bookmarks | 2 | Numeric | number of bookmarks |
offset | 2 | Numeric | offset to the start of the bookmark data (counting from the beginning of the record) |
names | < 21*bookmarks | String sequence | A concatenated sequence of NUL-terminated strings, each a bookmark name (each name is max 20 chars) |
bookmark_data | 4*bookmarks | Bookmark Data | block of data for the location of the external bookmarks (see below) |
The bookmark data is a series of uid/offset pairs.
Field | Bytes | Type | Notes |
uid | 2 | Numeric | unique ID for record |
offset | 2 | Numeric | paragraph offset |
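A bookmark record could be read as in the following Python sketch. It assumes `data` begins at the `bookmarks` field and that `offset` is measured from that same point; if the offset instead counts from the data record header, the buffer passed in would need to include that header. The name encoding is not specified here, so ISO Latin-1 is assumed, and `parse_bookmarks` is a hypothetical name.

```python
import struct


def parse_bookmarks(data: bytes):
    """Return [(name, record uid, paragraph offset), ...]."""
    count, offset = struct.unpack_from(">HH", data, 0)
    # Names are NUL-terminated and sit between the 4-byte header and the pairs.
    names = data[4:offset].split(b"\0")[:count]
    result = []
    for i, name in enumerate(names):
        uid, paragraph = struct.unpack_from(">HH", data, offset + 4 * i)
        result.append((name.decode("latin-1"), uid, paragraph))
    return result
```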
Each Plucker document can be assigned to a number of named categories. This record stores the names of default categories for the document. The data consists of a concatenated series of NUL-terminated strings that should be used as the default category/categories for this document.
There should be only one metadata record per document. This record begins with a two-byte numeric value giving the number of subrecords, followed by that number of subrecords. The subrecords are a sequence of tagged variable-length items. Each subrecord consists of three fields:
Field | Type | Bytes | Description |
type code | Numeric | 2 | Specifies what piece of extra information is in this subrecord |
length | Numeric | 2 | Number of 2-byte words in the argument |
argument | (type code specific) | 2 * length | Data |
The following table describes the valid subrecord type codes, and describes the structure of the associated data for each subrecord type. Subrecords with unknown type codes should be ignored.
Type code | Name | Description | Argument |
1 | CharSet | This is the character set and encoding used by text records in this document, unless otherwise specified for particular records. | a two-byte numeric value, specifying the IETF IANA MIBenum value for the character set. See the IANA registry of character sets for valid values. |
2 | ExceptionalCharSets | This is a list of text records which use a charset other than that specified by the default CharSet. Note that if no default CharSet is specified, the default charset should be thought of as "unknown". | a sequence of (length / 2) record-ID, IANA-MIBenum pairs, where MIBenum values are as specified for CharSet. The invalid MIBenum value of 0 (zero) is used for records which have an unknown charset, if necessary. |
3 | OwnerID | This is the CRC-32 of the specified owner-id for the document, if any. Note that associating an owner-id with a document also affects the calculation of zlib compression. | a four-byte numeric value giving the CRC-32 of the owner-id string. |
4 | Author | The name of the author of the document. | A string value in the document's default character set, padded at the end with NUL characters to an even number of bytes. |
5 | Title | The full title of the document. | A string value in the document's default character set, padded at the end with NUL characters to an even number of bytes. |
6 | PublicationDate | The date and time this document was created. | A 4-byte unsigned integer giving the number of seconds from 12:00 AM on January 1, 1904, to the time when this document was created. |
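The subrecord structure lends itself to a simple tag-length-value loop. The Python sketch below is one possible reading of the tables above, with hypothetical function names. It decodes only the fixed-size numeric subrecords and skips everything else, including unknown type codes.

```python
import struct


def parse_metadata(data: bytes):
    """Return [(type code, raw argument bytes), ...] from a metadata record."""
    (count,) = struct.unpack_from(">H", data, 0)
    pos, subrecords = 2, []
    for _ in range(count):
        type_code, words = struct.unpack_from(">HH", data, pos)
        # The length field counts 2-byte words, so the argument is 2*words bytes.
        argument = data[pos + 4:pos + 4 + 2 * words]
        subrecords.append((type_code, argument))
        pos += 4 + 2 * words
    return subrecords


def interpret(subrecords):
    """Decode the numeric subrecords; other type codes are skipped."""
    info = {}
    for type_code, arg in subrecords:
        if type_code == 1:
            info["CharSet"] = struct.unpack(">H", arg)[0]
        elif type_code == 3:
            info["OwnerID"] = struct.unpack(">I", arg)[0]
        elif type_code == 6:
            info["PublicationDate"] = struct.unpack(">I", arg)[0]
    return info
```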
© Copyright 2000 Michael Nordström <micke@sslug.dk> · Copyright 2001 Bill Janssen <bill@janssen.org> | $Id: DBFormat.html,v 1.13 2002/06/11 23:59:11 janssen Exp $