NTX file format --------------- Version of this document: 2.0 by Cesar A. Gil, January 06, 1997 Introduction The NTX file format is the format of index files used by Clipper. It is much more efficient than NDX files, with respect to performance and file size (usually more than twice the speed, and only 80% of the size of an equivalent NDX file). The extension is also different to indicate that its format is not the same as NDX. The NTX file stores a B-tree. A B-tree consists of nodes pointing to sub-nodes. The B-tree allows you to traverse the nodes in a depth-first order, thus allowing you to retrieve the keys in crescent order. It also allows you to retrieve a given key or its closest-matching key with an efficient algorithm, since you must only traverse a few nodes to locate a specific record. However, as far as theory is concerned, nothing beyond a diagram of a B-tree will be given here. Updating an NTX file may be very difficult, since after a change you must keep the B-tree balanced. However updating the B-tree is a laborious task, the information given here may easily allow you to read the NTX file. Diagram of a B-tree root node ----------------------- | . | 50 | . | 60 | . | ----------------------- | | | +----------------+ | +-----------+ | | | ----------------------- ----------------------- -------------- | . | 10 | . | 20 | . | | . | 52 | . | 56 | . | | . | 62 | . | ----------------------- ----------------------- -------------- | | | | | | | /// | /// /// /// /// /// | ----------------------- | . | 15 | . | 16 | . | ----------------------- | | | /// /// /// In this diagram we have assumed that each node can hold up to 2 keys, however a B-tree can be specified to have any maximum number of keys per node (n >= 2). A node doesn't need to be used up to the maximum. For example, in this diagram we have a node which is not fully used (the node which only contains the value 62). We also didn't care here to create a balanced tree, as would happen in a real situation, since we are concerned about giving the reader a feeling of how it is to traverse a B-tree. The diagram is also intended to help visualize the structure of the NTX file. Each node contains pointers between its key values. For example, look at the root node. The pointer to the left of the value 50 points to a sub-tree that contains every key lesser than 50. The pointer bewteen the values 50 and 60 points to a sub-tree containing every value between 50 and 60. The pointer after the value 60 points to a sub-tree containing all values greater than 60. At some given time we will arrive at a point where there will be no more branching. We indicate that by an empty pointer (represented here by ///). In NTX files, an empty pointer (that does not point to a sub-tree) contains a 0. Now it is very easy to see that, if you traverse the tree in a depth- first search, you'll retrive all the key values in order. It's also very easy to define an algorithm to locate a given key efficiently in the tree. The difficult part is updating the B-tree, by inserting, changing or removing records, because the B-tree must be kept balanced. The algorithm for doing so is very complex and literature is available on the subject. Description of an NTX file The NTX file consists of pages of 1K. The first 1K page contains a header. The other pages contain nodes of the B-tree. (Each 1K page will hold a node). Header structure: Offset 00h unsigned signature - contains 09 if good NTX file 02h unsigned version - version of indexing system that the file was created under (0x1 is typical but may change) 04h long root - file offset (within the NTX file) where the root page is. 08h long free_page - file offset (withing the NTX file) where the first of a list of free pages is. A 0 value ends the list. The Clipper index system is intended to reuse free pages. 0Ch unsigned item_size - the size of the item. 0Eh unsigned key_size - the size of the index key 10h unsigned key_decim - number of decimal positions for the key, if it is a numeric key. 12h unsigned max_item - maximum number of items per page. 14h unsigned min_item - minimun number of items per page (max_item/2 is typical) 16h char key_expr[256] - an array containing the key expression for the index. The expression must be terminated by null (ASCII 0) and its length cannot be greater than 256 bytes. 272h unsigned unique - contains the value of the UNIQUE option for the index. (1=index is UNIQUE; 0=isn't UNIQUE) Comments: (1) The size of the index key (key_size) is determined when you issue the command INDEX ON key_expr TO file.ntx The key_expr is evaluated for each record of the data file. The value returned by evaluating key_expr has its size calculated. It is assumed that all the evaluations of key_expr will return values of the same type and length. If you define a key_expr that returns values of different lengths, the Clipper indexing system will give unpredictable results and, eventually, an error will happen. An example (of what you shouldn't do) is: INDEX ON TRIM(CLIENT_NAME) TO CLIENTS Based on the determined key_size, Clipper will calculate how many items a page can hold and will thus determine max_items and min_items. Remember, each item holds a key value, whose size depends on the key_size. That's why the item size is dynamically determined. Node structure: Offset 00h unsigned item_count - how many items this particular page holds. 02h unsigned item_offset[item_count] - holds the offsets, within the page, where the items are located. ?? item area The size of each item is: 8 + key_size. The last item doesn't contain a key value, only a pointer to a sub-tree. Each item has the following structure: long page_offset - file offset (within NTX file) where sub-tree is. If there's no subtree, a 0 is stored. long recno - record number for this key value (in the DBF file) char key_value[key_size] For example, the node: ----------------------- | . | 50 | . | 60 | . | ----------------------- | | | to page to page to page at at at offset x offset y offset z Will be represented in a page containing: Offset 00h unsigned item_count = 3 02h unsigned item_offset[0] = 08h 04h unsigned item_offset[1] = 12h 06h unsigned item_offset[2] = 1Ch 08h item[0] : long page_offset : x long recno : DBF record number pointed by key value 50 char key_value[2] : '50' 12h item[1] : long page_offset : y long recno : DBF record number pointed by key value 60 char key_value[2] : '60' 1Ch item[2] : long page_offset : z Conclusion: The NTX file structure is easy to access. The main concern is about the fact that each NTX file will have items of different sizes and pages with different number of items. Source code example - NTXVIEW.C : Author : Cesar A. Gil Date : January 06, 1997 Language : C Use : NTXVIEW file.ntx Description: lists the contents of the ntx file's header, then traverses all the tree in depth-first order, listing all the keys in order. --------------------------------------------------------------- /* * ntxdump.h */ #define MAX_KEY 256 #define BUF_SIZE 1024 typedef struct { unsigned type; unsigned version; long root; long next_page; unsigned item_size; unsigned key_size; unsigned key_dec; unsigned max_item; unsigned half_page; char key_expr[MAX_KEY]; char unique; } NTX_HEADER; typedef struct { long page; long rec_no; char key[1]; } NTX_ITEM; typedef struct { unsigned item_count; unsigned item_offset[1]; } NTX_BUFFER; --------------------------------------------------------------- /* * ntxview.c */ #include "stdio.h" #include "stdlib.h" #include "fcntl.h" #include "io.h" #include "ntxview.h" NTX_HEADER ntx_header; int ntx_handle; void view_page(long page_offset); void main(int argc, char ** argv) { if( argc != 2 ) { printf( "Use: ntxhead arquivo.ntx\n" ); exit(1); } ntx_handle = open(argv[1], 0x8000); if( ntx_handle == -1 ) { printf( "Open error: %s\n", argv[1] ); exit(1); } /* Reads header */ if( read(ntx_handle, &ntx_header, sizeof(NTX_HEADER)) != sizeof(NTX_HEADER) ) { printf( "Error reading header of %s\n", argv[1] ); exit(1); } /* Prints header information */ printf( "***** NTX header *****'n" ); printf( "Signature %xh\n", ntx_header.type ); printf( "Version %xh\n", ntx_header.version ); printf( "Root %lxh\n", ntx_header.root ); printf( "Next page %xh\n", ntx_header.next_page ); printf( "Item size %xh\n", ntx_header.item_size ); printf( "Key size %xh\n", ntx_header.key_size ); printf( "Key decimals %xh\n", ntx_header.key_size ); printf( "Max item %xh\n", ntx_header.half_page ); printf( "Key expr: '%s'\n", ntx_header.key_expr ); printf( "Unique: %d\n", ntx_header.unique ); view_page(ntx_header.root); close(ntx_handle); } void view_page(long page_offset) { char * page; NTX_BUFFER * buffer; NTX_ITEM * item; unsigned i; /* Allocates page */ page = malloc(BUF_SIZE); buffer = (NTX_BUFFER *) page; if( page == NULL ) { printf( "Error allocating memory.\n" ); exit(1); } /* Moves to the given page offset within the file */ lseek(ntx_handle, page_offset, 0); /* Reads page */ if( read(ntx_handle, buffer, BUF_SIZE) != BUF_SIZE ) { printf( "Error reading page %lxh\n", page_offset ); exit(1); } /* Traverses the vector, item by item */ for( i = 0; i < buffer->item_count; i ++ ) { item = (NTX_ITEM *) (page + buffer->item_offset[i]); if( item->page != 0 ) view_page(item->page); printf("dbf recnum: %5lx Key: %.*s\n", item->rec_no, ntx_header.key_size, &item->key); } /* Handles last pointer */ item = (NTX_ITEM *) (page + buffer->item_offset[buffer->item_count]); if( item->page ) view_page(item->page); /* Deallocates page */ free(page); } --------------------------------------------------------------- - end of document -