Introduction

This document and the referenced ASM files contain details of the database structure used by:
    Forté FreeAgent (Freeware) NewsReaders and
    Agent (Commercial) NewsReaders.

Since Forté considers details of their database structure proprietary, and has not provided a User Technical Reference, this document represents my attempt to compile one for my own use.

I am not a trained programmer, so some may take offense at the liberties I've taken, and at the errors I've made in syntax and (lack of) structure. This is especially true where I needed to invent labels. I have not attempted to use the structure definitions in the ASM files yet, so some corrections are inevitable.

I welcome your suggestions, corrections, and constructive criticism. At the same time, I am quite willing to respond to questions.

Ray Delio
mailto: rdelio@worldnet.att.net

I - General DAT File Structure

All versions of [Free]Agent use the same (basic) structure.

Page0    Master_bitmap. It has a bit set for each full 2MB block allocated (8*512*512 = 2MB, or 2,097,152 bytes).
Page1    The first page in each block is a block_bitmap(n). It is created "when needed", and deleted when empty (nulls). It has a bit set for each page (512 bytes) allocated. The last bit is never set (max = x7F); x3F de-allocates the last page in the block.
Page2-N  Where records are stored. Page# == UID

II - General Message Record Structure

Some definitions.
    OLD = FreeAgent (all) and Agent (thru ver 99e)
    NEW = Agent ver 99f and later (MIME)
    [Free]Agent = Either Agent or FreeAgent.

Abbreviations used to define different record structures:
    S    = Subject
    A    = Author
    ID   = MessageID
    R    = RefList
    P    = Parts
    M    = MessageNUM
    P#   = PartNum (Index0) (i.e. 1st is 0)
    L    = Lines
    @t   = Tab
    @c@l = CRLF (line end)

PageList and prefix.
A degenerate pagelist is a page list without the list of pages. In all [Free]Agents, only the pre-list part of the pagelist exists in the message record before the article is retrieved.
Exceptions exist for messages with large headers and Combined headers, in NEW [Free]Agents only.

The full PageList and prefix are not retained in FreeAgent or OLD Agents when a message is saved. The list part of the pagelist and the prefix are removed (and there is no Index). I don't know if some of the earlier versions limit the size of headers to what will fit in a single record.

In NEW Agents, the pagelist, prefix, and Index are an integral part of the message structure, but the code handles either kind of record in most cases. The exceptions, however, cause corruption. An OLD saved message, with a degenerate Pagelist and no prefix, is corrupted when copying to a Folder, or between folders. An XRef is normally added when copying from a NG to a Folder, but nothing is normally added when copying between folders.

When only one header is obtained for an incomplete multipart, no pagelist is created, but the [n/N] tag is modified, and lines: is appended. Stat = 3 generates the broken page icon. (An IDX file item.) However, if headers are combined, the full pagelist and prefix are created.

A message record can have any of the following components or records.
    PageList - The record at page0 of the message, consisting of an 8 byte prefix and a list of the UIDs assigned to this message. If only page0 is used, the pagelist record degenerates to an 8 byte header prefix. This is the master record for the message, having the UID displayed for the message and referenced in the IDX file.
    Prefix   - The 32 byte prefix to a message that has an Index.
    Header   - The Agent header, or a Combined Agent header.
    XRefs    - The X-tagged text items which can precede the message text.
    MsgText  - The contents of the message, including embedded attachments and the Disposition Text that Agent uses to replace deleted attachments and message sections. ("Removed" is the correct term.)
    Index    - Where Agent stores the message pointers and such.

II - Message Structures

1. OLD (Three forms)
    Header Record (512 bytes)
        pagelist (8 bytes)
        header
    Message Record (n pages)
        pagelist (512 bytes)
        prefix (32 bytes)
        header
        msgtxt
    Saved Message Record (512 bytes)
        pagelist (8 bytes)
        header
        disposition text

2. NEW (Three forms)
    Header Record (512 bytes)
        pagelist (8 bytes)
        header
    Header Record (n pages)
        pagelist (512 bytes)
        prefix (32 bytes)
        header
    Message Record (n pages)
        pagelist (512 bytes)
        prefix (32 bytes)
        header
        msgtxt
        index

III - Structure Details

See:
    DAT.ASM for - (an attempt at) a generalized DAT file structure
    IDX.ASM for - the IDX file structure

IV - Upgrade Problems

1. Copying or moving a saved message from a NG to a folder, or between folders, will result in a corrupt record with the following structure.
        pagelist (8 byte)
        prefix (32 byte) - added
        disptxt_Old
        Index - added
   It displays OK (in most cases?), but will not launch.

2. Multipart saved messages will fail to launch if any of the saved headers was deleted. (Not required by OLD versions.) If all are present, the parts may/will be combined and corrupted, similar to the above case. Other messages/headers may be corrupted as well. In one case, I observed that the 2 headers following the message were corrupted.

V - ORPHAN vs DELETED (PURGED) vs FRAGMENT(s) messages.

An orphan message is a message without an IDX entry whose records are not marked available in the block_bitmaps. Otherwise it's a deleted message.

Orphan messages cannot be deleted from within Agent, and are the primary cause of extraordinarily large DAT files. (Compaction doesn't work either.) Orphans are created when the IDX file is "corrupted" (which usually means it's been truncated).

NOTE: The bottom of the IDX file (after the index records) is where the NG (default) properties and bitmaps are stored. That area is only updated when the file is closed. It is re-created using Group Defaults if it is missing. Lost NG properties are a good indicator that orphans were created.

A message fragment is an incomplete deleted message.
Retrieving headers and messages results in over-writing these records. Compacting removes these fragments from the DAT file.

VI - What's next?

A - Understand what drives compaction, and how Agent uses un-allocated space.

Deleting a message is straightforward. The IDX entry is deleted, and all pages are de-allocated.

Deleting a Section, or an attachment, causes:
    1. A new pagelist to be written at the EOF.
    2. The old pagelist is de-allocated.
    3. If the msgbase is compacted, the message records are re-located and become contiguous.
    4. Otherwise, new Headers and other message parts occupy the space freed.
    5. Simply de-allocating a page doesn't enable compaction. Two messages could claim the same page if the IDX record wasn't deleted.

B - Sorting, Finding Subject text takes too long.

Would (smaller) fixed length fields improve sort/find performance? (The speed of Author sorts would seem to say yes!) Can the code be hacked to do this?
    Example: Replace subject with extracted KeyWords?
    Use Tagged: Subject (or optionally new subject) for replies?

C - Write some programs to demo what's been learned.

1. I have the structure of an OLD (saved) message that will survive the copy/move problem that comes with upgrading.
    a) An upgrade conversion program would be easy to write.
    b) A "down-grade" program would be trickier. It offers the advantage of reducing the DAT size by 66% or more. (For saved/purged messages.)
    c) Another nice feature would be separate headers for multipart mixed messages, preferably (in my opinion) using the OLD (compact) header.

NOTE: The old header takes a few ticks longer to display the message pane. The filesize info isn't in the DAT file, so the directory structure must be searched for file info. The up-side is that it's obvious if the saved file has been deleted, moved, or re-named. (Size not displayed.)

ACKNOWLEDGEMENTS

The labels I used are derivatives of those I've seen others use.
But, I must admit, the structures and syntax of "High Level" languages, including BASICA and QBASIC, confound me. So, the information contained in this document is the result of my own efforts. It was obtained by studying [Free]Agent's DAT and IDX file structures.

Thanks to:
    Jim Bradley
        For his help in understanding how Agent works, and for his contributions to alt.usenet.offline-reader.forte-agent
    Ron Menck (somewhere in internet land)
        For informing me that CRC was a hash function.
    Sherlog
        For the code to generate hashed fields. (Tried but not applied.)

And last but not least to Forté Inc, for recruiting me into their development program, and inspiring me to this effort.
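
For readers who would rather see the page and bitmap arithmetic of section I spelled out, here is a minimal Python sketch. It is not taken from DAT.ASM. The names PAGE_SIZE, PAGES_PER_BLOCK, BLOCK_SIZE, page_offset and allocated_slots are invented for illustration, and the bit order (low-order bit first within each byte) is an assumption, chosen only because it agrees with the observation that the last bitmap byte never exceeds x7F.

```python
PAGE_SIZE = 512                           # bytes per page (section I)
PAGES_PER_BLOCK = 8 * PAGE_SIZE           # one bitmap page holds 4096 bits
BLOCK_SIZE = PAGES_PER_BLOCK * PAGE_SIZE  # 8*512*512 = 2,097,152 bytes (2MB)


def page_offset(uid):
    """Byte offset of a page in the DAT file, taking Page# == UID."""
    return uid * PAGE_SIZE


def allocated_slots(bitmap):
    """Return the bit positions that are set in one 512-byte bitmap page.

    ASSUMPTION: bits are numbered low-order first within each byte, so the
    never-set last bit is x80 of the final byte (hence max = x7F).
    """
    if len(bitmap) != PAGE_SIZE:
        raise ValueError("a bitmap page must be exactly 512 bytes")
    slots = []
    for byte_index, value in enumerate(bitmap):
        for bit in range(8):
            if value & (1 << bit):
                slots.append(byte_index * 8 + bit)
    return slots
```

Under that assumption, a final byte of x7F marks slots 4088 through 4094 as allocated, and x3F drops slot 4094, which matches the note that 3F de-allocates the last page in the block. How a slot number maps back to an absolute page number depends on where the block starts relative to Page0, which this sketch does not try to settle.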