File Formats of Micro Focus COBOL

This COBOL system provides three types of data file organization: relative, indexed and sequential. Additionally, sequential files fall into one of three categories: record sequential, printer sequential and line sequential.

Record sequential, relative and indexed files can contain records that are either all of fixed length or of variable length. These files have fixed or variable format respectively. Printer sequential and line sequential files contain records that are implicitly variable length and have separate file formats.

Record sequential, relative and indexed files allow two different formats: fixed and variable. The file format is specified explicitly or implicitly as described in the section Fixed and Variable Format later in this document. Fixed and variable format map indirectly onto two types of file structure: fixed and variable.

The physical structure of each of these file types as they exist on disk is explained in this document. This information is provided for anyone who wants to understand the nature of data files produced by programs created using this COBOL system, or to process them outside the COBOL system where appropriate. It can also be useful for debugging programs.

However, you do not need to understand these file structures to use data files from COBOL programs. You are advised not to process the files yourself using byte-stream I/O, but to use COBOL syntax or the file handler call interface (documented in an add-on product). This ensures that applications will function properly if file formats are enhanced or developed in the future.

Fixed and Variable Format

The format of a record sequential, relative or indexed file can be explicitly or implicitly fixed or variable. The file format is always fixed unless one of the following conditions is specified for the file or file record:

- The RECORDING MODE IS V clause, which always creates variable format.
- The RECORD IS VARYING clause, which creates variable format provided no RECORDING MODE IS F clause is present.
- The OCCURS...DEPENDING ON clause, which creates variable format when you set the NOODOSLIDE Compiler directive.
- The RECMODE"V" Compiler directive, which creates variable format for each file where no RECORDING MODE IS F or RECORD CONTAINS n CHARACTERS clauses are present.
- The RECMODE"OSVS" Compiler directive, which creates variable format for files that contain fixed record definitions of different lengths.
- The data compression feature, which creates variable format.

Basic File Structures

There are four basic structures used for all files: fixed, variable, line sequential and printer sequential.

Fixed is the structure used by fixed format record sequential and relative files, and contains only fixed length records. The size of each record is equal to the length of the largest record definition for the file.

Variable is the structure of variable format record sequential and relative files, and fixed and variable format indexed files. Variable structure files can contain fixed or variable length records.

Line sequential is the structure of files with the line sequential organization. Line sequential files are designed to enable you to read source or text files created with the system editor. As such, the format is operating system dependent but typically contains variable length records with trailing spaces removed. See the section Line Sequential Organization later in this document for further details.

Printer sequential is the structure of files that are destined for a printer, either directly or by later spooling of a disk file. They contain vertical and horizontal tab controls. The structure of these files reflects what is required to drive a printer and so is independent of the operating system. See the section Printer Sequential Files later in this document for further details.

The following sections describe these file structures.

Fixed Structure

Fixed structure files contain no record or file header information. The records are all the same length, that length being determined by the longest record defined in the File Description (FD) in the program's File Section.

Variable Structure

Any files containing variable length records, with the exception of line sequential files and files destined for the printer, contain a block of 128 bytes of header information at the start of the file.

Each record in the file is preceded by a 2- or 4-byte control field. The top 4 bits of this field indicate the status of the record. A value of 0100 in these bits means that this record is a user data record. Any other value means that this record has either been deleted or is used internally. The remainder of the control field contains the length of the record.

For all files where the maximum record size is less than 4096 bytes (excluding the prefix), the prefix is 2 bytes long. For all other files, the prefix is 4 bytes long. Each record always starts on the next 4-byte boundary in the file.

You must not alter the header information or the control fields in any way since these are maintained by this COBOL system.

Record Header Types

First 4 bits   Record type
__________________________________________________________________________
1 (0001)       A system record (IDXFORMAT"4" files only). This contains
               duplicate occurrence details in the data file.
2 (0010)       Deleted record (available for reuse via the Free Space list).
3 (0011)       System record.
4 (0100)       User data record.
5 (0101)       Reduced user data record (indexed files only). The 16-bit
               word immediately following the data record, as indicated by
               the length in the header, contains the space between the end
               of the data record and the start of the next record header.
6 (0110)       Pointer record (indexed files only). The first 4 bytes
               following the record header contain the offset in the file
               to the location of the user data record.
7 (0111)       User data record referenced by a Pointer record.
8 (1000)       Reduced user data record referenced by a Pointer record.

The first record in every variable structure file is a system record called the File Header record. This is normally 128 bytes long.

The record header for each record starts on a 4-byte boundary. Consequently, a record may be followed by up to three padding characters, usually spaces. These padding characters are not included in the record length.

Variable structure File Header record description:

Offset  Size  Description of the field
_________________________________________________________________________________
0       4     Length of the file header. The first 4 bits are always set to
              3 (0011 in binary) indicating that this file header record is
              a system record. The remaining bits contain the length of the
              file header record. If the maximum record length is less than
              4095 bytes, the length is 126 and is held in the next 12 bits;
              otherwise it is 124 and is held in the next 28 bits. Hence, in
              a file where the maximum record length is less than 4095
              bytes, this field contains x"30 7E 00 00". Otherwise, this
              field contains x"30 00 00 7C".
4       2     Database sequence number, used by add-on products supplied
              with this COBOL system.
6       2     Integrity Flag. Indexed files only. If this is non-zero when
              the header is read, it indicates that the file is corrupt.
8       14    Creation date and time in YYMMDDHHMMSSCC format. Indexed
              files only.
22      14    Reserved.
36      2     Reserved. Value 62 decimal; x"00 3E".
38      1     Not used. Set to zeros.
39      1     Organization.
                1 = Sequential
                2 = Indexed
                3 = Relative
40      1     Not used. Set to zeros.
41      1     Data compression routine number.
                0       = No compression
                1       = CBLDC001
                2-127   = Reserved for internal use
                128-255 = User-defined compression routine number
42      1     Not used. Set to zeros.
43      1     File format.
                0 = Default
                1 = C-ISAM
                2 = LEVEL II COBOL
                3 = Indexed file format used by this COBOL system
                4 = IDXFORMAT"4"
44      4     Reserved.
48      1     Recording mode.
                0 = Fixed format
                1 = Variable format
              For indexed files, the Recording Mode field of the index file
              takes precedence.
49      5     Not used. Set to zeros.
54      2     Not used. Set to zeros.
56      2     Maximum record length. Example: with a maximum record of
              length 80 characters, this field will contain x"00 50".
58      2     Not used. Set to zeros.
60      2     Minimum record length. Example: with a minimum record of
              length 2 characters, this field will contain x"00 02".
62      46    Not used. Set to zeros.
108     4     Version and build data for the indexed file handler creating
              the file. Indexed files only.
112     16    Not used. Set to zeros.

Structure of Each File Organization

The following sections describe the physical structure of the four data file organizations.

Sequential Organization

Two types of sequentially organized file are available under this COBOL system: record sequential and printer sequential. These file types are described in the following sections.

Record Sequential Files

Record sequential files are intended to cater for binary data. These files consist of a series of either fixed or variable length records.

The order of records in these files is set by the order of WRITE statements when the file is created. The record order does not change once it has been set. New records are added to the end of the file. Each record in a record sequential file (except the first record) has a unique record which precedes it, while each record (except the last record) also has a unique record that follows it.

Record sequential files that are fixed length and are not destined for the printer have no record delimiter; the end of one record is immediately followed by the beginning of the next.

Fixed Format Record Sequential Structure

In a fixed format record sequential file, each record immediately follows the previous record in the file. Each record is the same length as the maximum length record.

+--------------------------------------------+
| Fixed length record                        |
+--------------------------------------------+
| Fixed length record                        |
+--------------------------------------------+
    .                                    .
    .                                    .
+--------------------------------------------+
| Fixed length record                        |
+--------------------------------------------+

Variable Format Record Sequential Structure

In a variable format record sequential file:

- Each record written is preceded by a record header containing the length of the record.
- The record is written at the length defined in the program.
- The file contains a standard variable structure file header record.

Up to three padding characters can follow a record to ensure that the next record starts on a four-byte boundary.

+----------------------------------------------+
| File Header record - 128 bytes               |
|                                              |
+--------+----------------------------+---+----+
| Header | Variable length record     |   |
+--------+----------------------------+---+--------+--+
| Header | Variable length record                  |  |
+--------+-----------------------------------------+--+
    .                                            .
    .                                            .
+--------+-------------------------------------+---+
| Header | Variable length record              |   |
+--------+-------------------------------------+---+

Printer Sequential Files

You can define a sequential file as a printer sequential or printer destined file by specifying one of the following clauses:

- LINE ADVANCING in the SELECT statement
- ASSIGN TO PRINTER

For printer sequential files, specifying the WRITE statement without the BEFORE or AFTER clause has the same effect as if you had specified AFTER 1.

Specifying the WRITE statement with the BEFORE or AFTER clauses gives explicit vertical positioning which you must only use for files destined for the printer. Using these clauses for any other type of file will generally corrupt the file.

Printer sequential files should not be opened for INPUT or I/O.

Printer sequential file format consists of a sequence of print records which are terminated by a carriage return (x"0D") with zero or more vertical positioning characters between the print records. A print record consists of zero or more printable characters.

The OPEN statement causes a x"0D" to be written to the file to ensure that the printer is located at the first character position before printing the first data record.

The WRITE statement causes trailing spaces to be removed from the record before it is written to the printer with a terminating x"0D". The BEFORE or AFTER clause specified in the WRITE statement causes one or more line-feed characters (x"0A"), a form-feed character (x"0C"), or a vertical tab character (x"0B") to be sent to the printer after or before writing the data record.

Printer Sequential Structure

+----+
| 0D |
+----+
| 0A |
+----+------------------------------------------+----+
|Print record                                   | 0D |
+----+----+----+--------------------------------+----+
| 0A | 0A | 0A |
+----+----+----+--------------------------+----+
|Print record                             | 0D |
+--------------------------------+----+---+----+
|Print record                    | 0D |
+----+---------------------------+----+
| 0C |
+----+
| 0D |
+----+----+
| 0A | 0A |
+----+----+-----------------------------+----+
|Print record                           | 0D |
+---------------------------------------+----+
    .                                      .
    .                                      .

Line Sequential Organization

Line sequential files are implemented to be consistent with your system editor and any other similar utilities that use text files. They are strictly operating system dependent; however, the scheme used by the PC-DOS, OS/2 and UNIX operating systems is widely used and is described here.

A record delimiter is written after every record. The delimiter character(s) vary depending on your operating environment. See the environment specific sections for line sequential files below for further information.

Line sequential files hold variable length text records, each containing zero or more displayable or non-displayable characters. A WRITE statement removes trailing spaces from the data record, then adds the system record delimiter. A READ statement removes the record delimiter and, if necessary, pads the record area with trailing spaces or returns surplus text as following records.

Each text record is followed by a record delimiter chosen by the operating system to be consistent with your system editor. The record delimiter varies depending on your operating system. See the environment specific sections for line sequential files below for further information.

A line sequential file must not be described as a printer destined file and must not use the BEFORE or AFTER clause in the WRITE statement.

System editors expect text to contain only displayable characters. However, line sequential files allow non-displayable characters with a value of less than x"20" (space) to be written to and read from them. During a WRITE operation, non-displayable characters in the record area are written to the file, each with a preceding LOW-VALUES or null character (x"00") to show that they are not text characters. A READ operation on the file removes the preceding LOW-VALUES characters added during the WRITE operation.

You can prevent null insertion when writing to the file either by specifying the -N run-time switch, or by a call to functions 46 or 47 of routine x"91" to turn the N switch on or off, respectively, for a particular file.

During a WRITE operation, any tab characters in a line sequential file (x"09") are expanded to every eighth character position; that is, the character following a tab will be in one of the columns 9, 17, 25, 33, and so on. You can compress space characters to tabs during output using either the +T run-time switch, or a call to function 48 or 49 of routine x"91" to turn the T switch on or off, respectively, for a particular file.
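
The WRITE and READ handling of a single line sequential record, as described above, can be illustrated with a short Python sketch. It is not the run-time system's implementation: the function names encode_line_record and decode_line_record are hypothetical, null insertion is assumed to be active, and tab expansion and the splitting of surplus text into following records are deliberately left out. The delimiter is passed in as a parameter because it depends on the operating environment.

```python
def encode_line_record(record: bytes, delimiter: bytes = b"\r\n") -> bytes:
    """Sketch of a line sequential WRITE with null insertion active.

    Assumption: tab handling is ignored, so avoid x"09" in the input.
    """
    stripped = record.rstrip(b" ")      # trailing spaces are removed
    out = bytearray()
    for byte in stripped:
        if byte < 0x20:                 # non-displayable character
            out.append(0x00)            # preceding null marks it as data
        out.append(byte)
    return bytes(out) + delimiter       # system record delimiter is added


def decode_line_record(stored: bytes, record_size: int,
                       delimiter: bytes = b"\r\n") -> bytes:
    """Sketch of a READ: drop the delimiter and inserted nulls, pad with spaces.

    Assumption: the stored text fits in record_size, so no surplus text
    has to be returned as a following record.
    """
    if not stored.endswith(delimiter):
        raise ValueError("record is not terminated by the delimiter")
    body = stored[: len(stored) - len(delimiter)]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == 0x00 and i + 1 < len(body):
            i += 1                      # skip the inserted null, keep next byte
        out.append(body[i])
        i += 1
    return bytes(out).ljust(record_size, b" ")
```

For example, an 8-byte record area holding "AB", a x"01" byte, "C" and trailing spaces would be stored as the bytes 41 42 00 01 43 followed by the delimiter, and reading it back pads the record area with spaces again.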

Line Sequential Files on DOS, Windows and OS/2

The record delimiter x"0D0A" is used on DOS, Windows and OS/2. Any single byte x"1A" (user terminate run code) is used as an unconditional file terminator (except when preceded by a null character, as described below). If no x"1A" character is encountered, the physical end of the file serves as the file terminator.

When the file is closed, a terminating x"1A" character is NOT written. Instead, the length of the file is used to determine where it ends.

On input, this COBOL system uses just the x"0A" as the record delimiter. Additional device control characters (such as x"0D", x"0B", x"0C") are discarded. x"1A" acts as a record delimiter and also denotes the end of the file.

If you turn the N run-time switch off, you must make sure that any COMP data does not contain bytes with a value of x"1A" (end-of-file character) or x"0D" (record delimiter).

Line Sequential Files on UNIX

The record delimiter on UNIX is a single byte x"0A" (the default). However, for line sequential and relative files only, this default record delimiter can be changed to that used by DOS, Windows and OS/2.

If you turn off the N run-time switch (-N), you must make sure that any COMP data does not contain bytes with a value of x"0A" (record delimiter).

Line Sequential Structure

+-----------------------------------------------+---------+
| Variable length record                        |delimiter|
+----------------------------------+---------+--+---------+
| Variable length record           |delimiter|
+----------------------------------+---------+--+---------+
| Variable length record                        |delimiter|
+-----------------------------------------------+---------+
    .                                      .         .
    .                                      .         .
    .                                      .         .
+----------------------------------------------+---------+
| Variable length record                       |delimiter|
+----------------------------------------------+---------+

Relative Organization

Relative file organization enables you to access any record randomly by specifying its ordinal position within the file.

Data held in relative files can consist of fixed or variable format records, but every record occupies a fixed length slot, the length being the length of the longest record defined for the file. This is necessary so that the COBOL file handling routines can quickly calculate the physical location of any record given its record number within the file.

Each record is uniquely identified by a record number. The first record in the file is record number one, the second record is number two, and so on.

Each record is followed by a record marker which indicates the current state of the record. In a variable format file, the marker follows the fixed length slot. The marker varies depending on your environment. See the environment specific information sections for relative files below for further information.

When you delete a record from a relative file, the only action is to change that record's marker. The contents of a deleted record physically remain in the file until a new record is written. If, for security reasons, you want to make sure that the data does not exist in the file, then you must overwrite the record using the REWRITE statement before you delete it.

A fixed format relative file can be processed as a fixed format sequential organization file by defining the maximum record length to be larger than that for the relative file (see the sections on operating environment specific information for details). A variable format relative file cannot be processed as a sequential organization file.

The length of a relative file is determined by the largest record number used when actually writing a record to the file.

Relative File Organization on DOS, Windows and OS/2

On DOS, Windows and OS/2, the current state of the record is indicated by a two-byte marker as follows:

Marker (hex)   Description
__________________________________
0D0A           Record present
0D00           Record deleted or never written

A fixed format relative file can be processed as a fixed format sequential file by defining the maximum record length to be two characters larger than that for the relative file.

The size of a relative file on DOS, Windows and OS/2 is calculated as follows.

Fixed format:

    (max-rec-len + 2) * largest-record-number

Variable format:

    128 + (max-rec-len + 2 + header) * largest-record-number

where header is 2 if max-rec-len is less than 4096, otherwise header is 4.

Relative File Organization on UNIX

On UNIX, the current state of a record for fixed length relative records is indicated by a one-byte marker as follows:

Marker (hex)   Description
__________________________________
0A             Record present
00             Record deleted or never written

The current state of a record for variable length relative records is indicated by a two-byte marker as follows:

Marker (hex)   Description
__________________________________
0D0A           Record present
0D00           Record deleted or never written

A fixed format relative file can be processed as a fixed format sequential file by defining the maximum record length to be one character larger than that for the relative file.

The size of a relative file on UNIX systems is calculated as follows.

Fixed format:

    (max-rec-len + 1) * largest-record-number

Variable format:

    128 + (max-rec-len + 2 + header) * largest-record-number

where header is 2 if max-rec-len is less than 4096, otherwise header is 4.

Fixed Format Relative Structure

A fixed format relative file is the same as a fixed format sequential file, except each record is followed by a record marker.

+-------------------------------------------+------+
| Fixed length record - Record 1            |marker|
+-------------------------------------------+------+
| Fixed length record - Record 2            |marker|
+-------------------------------------------+------+
    .                                   .      .
    .                                   .      .
+-------------------------------------------+------+
| Fixed length record - Record i deleted    |marker|
+-------------------------------------------+------+
    .                                   .      .
    .                                   .      .
+-------------------------------------------+------+
| Fixed length record - Record j - unused   |marker|
+-------------------------------------------+------+
    .                                   .      .
    .                                   .      .
+-------------------------------------------+------+
| Fixed length record - Record n            |marker|
+-------------------------------------------+------+

On UNIX, for relative files in random access, writing records 1, 2 and 9 will occupy the same disk space as creating a file containing records 1, 2 and 3.
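
The size formulas and marker values given for relative files can be checked with a small Python sketch. The function names are hypothetical and this is only an illustration, not part of the COBOL system. The record offset calculation is an assumption derived from the fixed structure description (no file header, each slot being the maximum record length plus its marker); it is not stated as a formula in this document. The computed size is the logical file length and says nothing about how much disk space the operating system actually allocates.

```python
def header_len(max_rec_len: int) -> int:
    """Record header size used in the variable format size formula."""
    return 2 if max_rec_len < 4096 else 4


def relative_file_size(max_rec_len: int, largest_record_number: int,
                       variable: bool = False, unix: bool = False) -> int:
    """Logical size in bytes of a relative file, per the formulas above."""
    if variable:
        # Same formula on DOS, Windows, OS/2 and UNIX: 128-byte file header,
        # then header + slot + two-byte marker per record.
        slot = max_rec_len + 2 + header_len(max_rec_len)
        return 128 + slot * largest_record_number
    marker = 1 if unix else 2           # one-byte marker on UNIX, two elsewhere
    return (max_rec_len + marker) * largest_record_number


def fixed_record_offset(max_rec_len: int, record_number: int,
                        unix: bool = False) -> int:
    """Assumed byte offset of a record in a fixed format relative file."""
    marker = 1 if unix else 2
    return (record_number - 1) * (max_rec_len + marker)


def record_present(image: bytes, max_rec_len: int, record_number: int,
                   unix: bool = False) -> bool:
    """Inspect the marker that follows a fixed format record slot."""
    marker_len = 1 if unix else 2
    start = fixed_record_offset(max_rec_len, record_number, unix) + max_rec_len
    marker = image[start:start + marker_len]
    return marker == (b"\x0a" if unix else b"\x0d\x0a")
```

With a maximum record length of 80 and a largest record number of 9, the fixed format size works out to 738 bytes on DOS, Windows and OS/2 and 729 bytes on UNIX, and the variable format size to 884 bytes on either. Reading files this way is for inspection and debugging only; as noted at the start of this document, normal processing should go through COBOL syntax or the file handler call interface.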