The design goals of the EBS file format have been:
It is our hope that the EBS file format will motivate scientists working on the analysis of bio-signals to exchange their tools and data sets as public domain software, because similar positive influences of standard file formats have been observed in other scientific communities (e.g. computer graphics, astronomy and operating systems) where well-known scientists have developed a lot of freely available high quality software.
[Note: Having two possible positions for the variable header information makes it possible to change, insert or delete information in the variable header without having to move the encoded signal data. It also makes it possible to read files while other programs are still adding data to the end of part (3) (on-line processing).]
-----------------------------------
| Fixed Header (32 bytes)         | (1)
+---------------------------------+
| Variable Header                 | (2)
+---------------------------------+
| Encoded Signal Data (4*d bytes) | (3)
+---------------------------------+
| Optional Second Variable Header | (4)
-----------------------------------

Most integer values in the fixed and variable headers are coded as 32-bit words stored in 4 bytes beginning with the most significant byte (Bigendian format). If the value is a signed integer type, then the usual two's complement representation of negative values will be used. E.g., the value -3 is stored as 0xff,0xff,0xff,0xfd and 1024 is stored as 0x00,0x00,0x04,0x00 (in this text, the prefix '0x' indicates a hexadecimal number as in the C programming language and two hex digits form an 8-bit byte value). All 32-bit integer values in the fixed and variable headers are aligned to 32-bit boundaries, i.e. their start byte position relative to the first byte of the file is always a multiple of 4.
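As an illustration of the integer coding described above, the following minimal Python sketch reproduces the two example values. The function names encode_int32 and decode_int32 are hypothetical and not part of this specification.

```python
import struct


def encode_int32(value: int) -> bytes:
    """Encode a signed 32-bit integer as 4 bytes, most significant byte first."""
    return struct.pack(">i", value)


def decode_int32(buf: bytes, offset: int = 0) -> int:
    """Decode a signed 32-bit Bigendian integer found at a 4-byte aligned offset."""
    if offset % 4 != 0:
        raise ValueError("32-bit header values start at multiples of 4")
    return struct.unpack_from(">i", buf, offset)[0]


print(encode_int32(-3).hex())    # fffffffd
print(encode_int32(1024).hex())  # 00000400
```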
-----------------------------------
| Identification Code (8 bytes)   |
+---------------------------------+
| Data Encoding ID (4 bytes)      |
+---------------------------------+
| Number n of channels (4 bytes)  |
+---------------------------------+
| Number m of samples (8 bytes)   |
+---------------------------------+
| Length d of Data Part (8 bytes) |
-----------------------------------

 Byte  | Value         | Meaning
-------+---------------+---------------------------------
 0     | 0x45          | ASCII character 'E'
 1     | 0x42          | ASCII character 'B'
 2     | 0x53          | ASCII character 'S'
 3     | 0x94          | another ID character
 4     | 0x0a          |   "
 5     | 0x13          |   "
 6     | 0x1a          |   "
 7     | 0x0d          |   "
 8-11  | see 2.3       | Encoding ID
 12-15 | any           | number n of channels (unsigned)
 16-23 | any           | number m of samples per channel (unsigned)
       |               | stored as a 64-bit value or all bytes are
       |               | 0xff if unspecified.
 24-31 | any           | length d of the data part (3) in 32-bit words
       |               | (i.e. part (3) is 4*d bytes long) or all bytes
       |               | are 0xff if part (4) is not present.
       |               |
 32-   | here begins the first variable header part (2) of an EBS file

[Note: Don't worry about the 64-bit values! Today, most implementations just check whether the bytes 16-19 have the value 0x00 and read the bytes 20-23 as the 32-bit number of samples, because their operating system can't deal with 64-bit values and with files longer than a few gigabytes. It is all right if your implementation just gives a nice error message for EBS files with more than e.g. 4294967295 samples, but some applications might need files in which the number of samples can't be described with 32 bits (e.g. long-time recordings) and new operating systems support files of this length.]
If all bytes from position 16 to 23 have the value 0xff, then this indicates that the length of the whole file is NOT determined by the fixed header. Instead, the end of the data part (3) is determined by the operating system. This is called an EBS file with 'unspecified length' and may be used when recorded data has to be accessed while the recording is still in progress and part (3) is still growing. In this case, the program can read sequences of n sample values until the first end-of-file condition is signaled by the operating system. The undefined length value is only allowed in combination with TIME-BASED ORDER data encodings (see section 2.3) and no second variable header can be present in files with unspecified length.
[Note: In some (often called 'compressed') variable length encoding formats for the data part (3), the values n and m (number of channels and number of samples) from the fixed header cannot be used to predict the exact size of the data part, because in compressed formats the number of bits per sample is not always fixed. This makes it impossible to find the start of the second variable header part (4) quickly (i.e., without going through the whole data part). In order to avoid this problem, the length d of the data part is stored separately if a second variable header is present.]
If the number of samples is not specified in the fixed header (m = 0xffffffffffffffff), then no second part of the variable header is allowed and d also has the value 0xffffffffffffffff.
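The fixed header layout described above could be read along the following lines. This is only a sketch: the function name parse_fixed_header, the dictionary keys and the constant names are illustrative assumptions, and no check of the Encoding ID is attempted.

```python
import struct

# The 8 identification bytes listed in the fixed header table.
EBS_MAGIC = bytes([0x45, 0x42, 0x53, 0x94, 0x0A, 0x13, 0x1A, 0x0D])
UNSPECIFIED = 0xFFFFFFFFFFFFFFFF  # all eight bytes are 0xff


def parse_fixed_header(header: bytes) -> dict:
    """Split the 32-byte fixed header into its fields (illustrative names)."""
    if len(header) < 32:
        raise ValueError("the fixed header is 32 bytes long")
    # 8 ID bytes, 32-bit encoding ID, 32-bit n, 64-bit m, 64-bit d, Bigendian
    magic, encoding_id, n, m, d = struct.unpack_from(">8sIIQQ", header, 0)
    if magic != EBS_MAGIC:
        raise ValueError("identification code does not match: not an EBS file")
    if m == UNSPECIFIED and d != UNSPECIFIED:
        raise ValueError("unspecified length does not allow a second variable header")
    return {
        "encoding_id": encoding_id,
        "channels": n,
        "samples": None if m == UNSPECIFIED else m,
        "data_words": None if d == UNSPECIFIED else d,
        "has_second_header": d != UNSPECIFIED,
    }
```

Python integers are not limited to 32 bits, so the 64-bit fields m and d need no special treatment here. In languages without 64-bit integers, the shortcut from the note above (check that bytes 16-19 are zero, then read bytes 20-23) remains an option.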
Each attribute in the variable header is stored as a TLV (tag, length, value) sequence. A tag is a 32-bit unsigned Bigendian integer number that identifies the type of information stored in the attribute (e.g. patient name). Some tag numbers and the meaning and syntax of the following attribute value are already defined in appendix A, but other new ones may be easily defined for special applications according to the rules in appendix B. The tag number is followed by an unsigned 32-bit length indicator l that specifies the number of 32-bit words (i.e. l*4 bytes) of the directly following value of the attribute. The number of bytes in an attribute value is always a multiple of four.
Both variable header parts end with the special tag 0x00000000. If part (4) is present, these are normally the last bytes of the file. The final special tag 0x00000000 in part (2) is directly followed by the first byte of the data part (3). The tag 0xffffffff is reserved and must not be used in any EBS file. The format of both variable header parts is:
--------------------------
| tag (4 bytes)          |
+------------------------+
| length l (4 bytes)     |
+------------------------+
| value (l*4 bytes)      |
+------------------------+
... tag, length, value ...
+------------------------+
| 0x00000000             |
--------------------------

The interpretation of the value bytes depends completely on the value of the tag number. Most values are simple data types like integer numbers or text-strings or are sequences of these simple types. If not otherwise specified, the values of attributes defined in appendix A of this text use the following encoding for the various simple types, and it is recommended that newly defined attributes use the same encoding where this is appropriate. All simple types are encoded so that their length in bytes is always a multiple of four. Simple data types without fixed length (e.g. strings and floating point numbers) are self delimiting (e.g. with final zero bytes).

'3.14'        0x33,0x2e,0x31,0x34,0x00,0x00,0x00,0x00
'-.1'         0x2d,0x2e,0x31,0x00
'+0.910e+45'  0x2b,0x30,0x2e,0x39,0x31,0x30,0x65,0x2b,0x34,0x35,0x00,0x00

The Extended Backus-Naur Form (EBNF) grammar of all possible real numbers (without the final 0x00 bytes) is

['-'|'+'] {digit} ['.' {digit}] [('e'|'E') ['-'|'+'] digit {digit}]

where digit is a character from '0' to '9', [] means optional, | describes a choice and {} means zero, one or several times. At least one digit must be present before the optional exponential part. The special value "not-a-number" (NaN) is represented by the empty string
0x00,0x00,0x00,0x00.
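The real number encoding above can be sketched as follows. The padding rule used here (between one and four 0x00 bytes, so that the total length is a multiple of four) is inferred from the three examples and the NaN value, and the names is_valid_real, encode_real and decode_real are hypothetical.

```python
import math
import re

# Sign, mantissa digits, optional fraction, optional exponent (EBNF above).
_REAL = re.compile(r"[-+]?(\d*)(?:\.(\d*))?(?:[eE][-+]?\d+)?")


def is_valid_real(text: str) -> bool:
    """True if text matches the grammar and has a digit before the exponent."""
    match = _REAL.fullmatch(text)
    return bool(match) and bool(match.group(1) or match.group(2))


def encode_real(text: str) -> bytes:
    """Encode an ASCII real number; the empty string stands for NaN."""
    if text != "" and not is_valid_real(text):
        raise ValueError("not a valid real number: %r" % text)
    raw = text.encode("ascii")
    return raw + b"\x00" * (4 - len(raw) % 4)  # 1 to 4 zero bytes


def decode_real(buf: bytes, offset: int = 0):
    """Return (value, number of bytes consumed) for the real at offset."""
    end = buf.index(0, offset)  # first terminating zero byte
    text = buf[offset:end].decode("ascii")
    size = len(text) + (4 - len(text) % 4)
    if text == "":
        return math.nan, size
    if not is_valid_real(text):
        raise ValueError("not a valid real number: %r" % text)
    return float(text), size
```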
[Note: If you are unfamiliar with ISO 10646, it is sufficient to know that ASCII and ISO 8859-1 (ISO Latin 1) characters have the same code in this 16-bit character set, i.e. you get the correct 16-bit value by prefixing each ASCII or Latin-1 byte with 0x00. Check a copy of the ISO 10646 standard or of the compatible Unicode Standard (Version 1.1 or higher) if you want to support other characters (e.g., Cyrillic, Greek, Chinese, Japanese, IBM PC, etc.) and need to know their 16-bit codes.]
If text-strings are allowed to span several lines, the code 0x000a (LF, line feed) should be used as the only line separator between these lines. The last line is not followed by another 0x000a code. Strings always end with one or two 0x0000 codes so that the number of bytes in the string including the two or four 0x00-bytes at the end is always a multiple of four. If not otherwise specified, single-line text-strings should not have more than 64 characters (not including the one or two 0x0000 codes at the end), but application programs must be able to cope with longer lines, e.g. by truncating them. Multi-line strings may have any number of lines but should also have no more than 64 characters per line (not including the 0x000a line separation code and the 0x0000 end markers) if not otherwise specified. An example text-string is:
'hello' 0x00,0x68,0x00,0x65,0x00,0x6c,0x00,0x6c,0x00,0x6f,0x00,0x00
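A minimal sketch of the text-string encoding, assuming that Python's "utf-16-be" codec is an acceptable way to produce the 16-bit codes for characters up to 0xffff. The names encode_text and decode_text are hypothetical.

```python
def encode_text(text: str) -> bytes:
    """Encode a text-string as 16-bit codes plus one or two 0x0000 end codes."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("only 16-bit character codes can be stored")
    raw = text.encode("utf-16-be")  # two bytes per character, high byte first
    end_codes = 1 if len(text) % 2 else 2  # keeps the length a multiple of 4
    return raw + b"\x00\x00" * end_codes


def decode_text(buf: bytes, offset: int = 0):
    """Return (text, number of bytes consumed) for the string at offset."""
    codes = []
    pos = offset
    while True:
        if pos + 2 > len(buf):
            raise ValueError("text-string is not terminated")
        code = int.from_bytes(buf[pos:pos + 2], "big")
        pos += 2
        if code == 0:
            break
        codes.append(code)
    if (pos - offset) % 4:
        pos += 2  # skip the second 0x0000 code used as padding
    return "".join(chr(code) for code in codes), pos - offset


print(encode_text("hello").hex())  # 00680065006c006c006f0000
```

Multi-line strings need no special handling in this sketch, because the line separator 0x000a is the 16-bit code of the Python character "\n".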
Appendix A defines many commonly used attribute tags and the semantics of their values, and appendix B defines which tag values you may use to define your own attribute types.
The least significant bit of each attribute tag specifies whether the attribute value contains information about specific channels (bit is 1) or not (bit is 0). In this way, programs that add, remove or rearrange channel data in EBS files can leave unknown attributes with even tag numbers in the file. They should remove unknown attributes with odd tag numbers and modify odd numbered attributes that are known to the programmer, because their content might assume a special channel layout in the file that does not exist any more after the file modification.
Each attribute tag shall appear not more than once in the variable headers.
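The tag, length, value layout of the variable header described above can be walked with a loop like the one below. This is a sketch under the assumption that the whole header is already in memory; the names parse_variable_header and is_channel_related are hypothetical. A list is returned instead of a dictionary because the IGNORE attribute of appendix A may appear several times.

```python
import struct


def parse_variable_header(buf: bytes, offset: int = 0):
    """Return ([(tag, value_bytes), ...], offset of the byte after the final tag)."""
    attributes = []
    pos = offset
    while True:
        if pos + 4 > len(buf):
            raise ValueError("final tag 0x00000000 is missing")
        (tag,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if tag == 0x00000000:  # special tag that ends a variable header part
            return attributes, pos
        if tag == 0xFFFFFFFF:
            raise ValueError("tag 0xffffffff is reserved")
        if pos + 4 > len(buf):
            raise ValueError("length indicator is missing")
        (length_words,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        end = pos + 4 * length_words  # the value is l*4 bytes long
        if end > len(buf):
            raise ValueError("attribute value is truncated")
        attributes.append((tag, buf[pos:end]))
        pos = end


def is_channel_related(tag: int) -> bool:
    """Odd tag numbers mark attributes that depend on the channel layout."""
    return bool(tag & 1)
```

For part (2), the returned offset is the position of the first byte of the data part (3).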
The Encoding ID number stored in byte 8 to byte 11 of the fixed header may indicate one of the following data types and data encodings (others might be added in future versions of this specification):
 time   channel 1   channel 2   channel 3      n = 3
   0        20          13        1493
   1         5           7         307
   2       -11           9         421
   3       ...
  ...
  m-1

will be stored as

0x00,0x14,0x00,0x0d,0x05,0xd5,0x00,0x05,0x00,0x07, 0x01,0x33,0xff,0xf5,0x00,0x09,0x01,0xa5,...

(length: 2*n*m bytes, i.e. d >= (n*m*2)/4).
0x00,0x14,0x00,0x05,0xff,0xf5,...,0x00,0x0d,0x00,0x07,0x00,0x09,..., 0x05,0xd5,0x01,0x33,0x01,0xa5,...
0x14,0x00,0x0d,0x00,0xd5,0x05,0x05,0x00,0x07,0x00,0x33,0x01,0xf5, 0xff,0x09,0x00,0xa5,0x01,...
0x14,0x00,0x05,0x00,0xf5,0xff,...,0x0d,0x00,0x07,0x00,0x09,0x00,..., 0xd5,0x05,0x33,0x01,0xa5,0x01,...
0x80,0x00,0x14,0x80,0x00,0x0d,0x80,0x05,0xd5,0xf1,0xfa,0x80,0x01, 0x33,0xf0,0x02,0x72,...

The length of the data part in bytes can't be predicted with the parameters in the fixed header if this compressed encoding is used (d >= n*(m+2)/4).
0x80,0x00,0x14,0xf1,0xf0,...,0x80,0x00,0x0d,0xfa,0x02,...,0x80, 0x05,0xd5,0x80,0x01,0x33,0x72,...
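The four uncompressed byte sequences shown earlier for the example recording (time-based or channel-based order, most or least significant byte first) can be reproduced with a short sketch like this one. The function encode_samples and its parameters are hypothetical, and the compressed encodings are not covered.

```python
import struct


def encode_samples(rows, channel_based: bool, big_endian: bool) -> bytes:
    """Encode 16-bit signed samples; rows[t][c] is channel c at sample time t."""
    if not rows:
        return b""
    if channel_based:
        # all samples of channel 1, then all samples of channel 2, ...
        ordered = [row[c] for c in range(len(rows[0])) for row in rows]
    else:
        # n values for time 0, then n values for time 1, ...
        ordered = [value for row in rows for value in row]
    fmt = (">" if big_endian else "<") + "%dh" % len(ordered)
    return struct.pack(fmt, *ordered)


rows = [[20, 13, 1493], [5, 7, 307], [-11, 9, 421]]  # first three sample times
print(encode_samples(rows, channel_based=False, big_endian=True).hex())
# 0014000d05d5000500070133fff5000901a5
print(encode_samples(rows, channel_based=True, big_endian=True).hex())
# 00140005fff5000d0007000905d5013301a5
```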
[Note: It is expected that CIB_16 will be the most popular format. If you are confused by the many different encodings, just support CIB_16 and reject EBS files with other encoding IDs with a nice error message. There are tools available that allow easy conversion between the different encodings. On some popular processors, you might perhaps prefer CIL_16 if you operate on very large data sets with efficient methods (e.g. memory mapped files). Time will tell whether the uncompressed TIME-BASED ORDER formats will be of use, and among the compressed formats, TI_16D will perhaps be the most popular version for archive and transfer purposes until more efficient compression techniques are available. If you have only one single channel, then there will be no difference between the TIME-BASED ORDER format and the corresponding CHANNEL-BASED ORDER format. Before you use a coin to decide whether you should indicate a TIME-BASED ORDER or a CHANNEL-BASED ORDER format, it is recommended to use the ID of the CHANNEL-BASED ORDER encoding.]
If a second variable header is present, between 0 and 3 zero padding bytes have to be appended after the above described encodings of the recording in order to give the whole data part a length in bytes that is a multiple of four. This will guarantee a 32-bit alignment for the second variable header part.
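The padding rule above amounts to a small piece of arithmetic, sketched here with a hypothetical helper named padded_data_length. For the three-channel example with three 16-bit samples per channel (18 bytes), it yields 2 padding bytes and d = 5.

```python
def padded_data_length(raw_bytes: int):
    """Return (zero padding bytes, length d in 32-bit words) for the data part."""
    padding = (-raw_bytes) % 4  # 0 to 3 bytes up to the next multiple of four
    d = (raw_bytes + padding) // 4
    return padding, d


print(padded_data_length(18))  # (2, 5)
```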
As a convention, program user interfaces should give the channels numbers beginning with 1 and samples should be numbered beginning with 0.
[Note: It seems to be most natural for most people to start with 0 for points of time (e.g. digital clocks count from 0 to 59), but only computer scientists find it equally obvious that the first channel might also have the number 0. This convention makes user interfaces of programs operating on EBS files more consistent. The numbering convention is only defined for numbers visible to the user of a program and is not intended for variables used internally within a program or for attributes in the variable header.]
The Encoding IDs in the range from 0x80000000 to 0xfffffffe are reserved for private additional encodings and the encoding ID 0xffffffff is reserved and must not be used in EBS files.
[Note: Please use random numbers for your private encoding IDs in the range 0x80000000 to 0xfffffffe and don't simply start at 0x80000000, in order to keep the odds of collisions with other people's private IDs small.]
If the need for a new standardized encoding arises, please contact the EBS coordinator (see appendix C); it is likely that other standard encodings will be added.
Attributes that do not refer to individual channels and thus have an even tag number:
0x00000002  IGNORE  (length: any)

    This attribute should just be ignored by any application. It makes it possible to remove an attribute without having to copy the whole file by just overwriting the tag field of this attribute with the tag number of IGNORE. This attribute may have any arbitrary value, but applications which delete attributes should fill the value with 0x00 bytes so that critical information (e.g. patient names in published files) will surely be destroyed and not only be made invisible. This is the only attribute that may appear several times in a variable header.

0x00000004  PATIENT_NAME  (length: > 0 words, <= 33 words)

    This single-line text-string may contain the full name of the person of whom the signals have been recorded.

0x00000006  PATIENT_ID  (length: > 0 words, <= 33 words)

    This single-line text-string may contain additional information that is used to identify the patient, e.g. a patient number in a hospital, etc.

0x00000008  PATIENT_BIRTHDAY  (length: 2 words)

    This numeric string contains the birthday of the patient in the 'yyyymmdd' format stored as ASCII digits (not as 16-bit UCS-2 characters!). E.g., '19930210' (0x31,0x39,0x39,0x33, 0x30,0x32,0x31,0x30) means February 10, 1993. (This format is one of the date/time formats defined in ISO 8601.)

0x0000000a  PATIENT_SEX  (length: 1 word)

    This 32-bit integer value is 1 for male and 2 for female patients. (The numbers are those specified by ISO 5218.)

0x0000000c  SHORT_DESCRIPTION  (length: > 0 words, <= 33 words)

    A single-line text-string that summarizes with a few words the contents of the file. This attribute is intended for listings of many EBS files where each EBS file is listed in a single line.

0x0000000e  DESCRIPTION  (length: > 0 words)

    A multi-line text-string that may tell the user of a file everything he/she might need to know in addition to the standardized attributes, e.g. the conditions under which the recording has been made, etc.

0x00000010  SAMPLE_RATE  (length: > 0 words)

    The value is the sample rate in Hz stored as a floating point number. E.g., a sample rate of 1024 per second (1024 Hz) might be stored as 0x31,0x30,0x32, 0x34,0x00,0x00,0x00,0x00 ('1024').

0x00000012  INSTITUTION  (length: > 0 words, <= 33 words)

    This single-line string may contain the name of the institution where the file has been recorded, processed, etc.

0x00000014  PROCESSING_HISTORY  (length: > 0 words)

    This attribute is a sequence of multi-line strings. Each string may describe a processing step that has been performed in order to produce this file. This might e.g. be the command line that has been used to start a program or a list of parameters that have been applied. A program may add its own processing description as another string to the end of the already existing sequence. Also text information about the equipment used to record the data and who did the recording or processing can be stored here. The number of multi-line text-strings in this attribute is determined by the length of the attribute.

0x00000016  LOCATION_DIAGRAM  (length: > 0 words)

    This attribute contains a graphical diagram of the object (e.g. brain, head, whole body, ...) from which the recorded data has originated or any other diagram that may be used to describe the positions of sensors/electrodes. The attribute CHANNEL_LOCATIONS may assign to channels coordinates in this diagram. In this way, software can generate pictures that indicate the position of electrodes/sensors on or in the body. This attribute contains the background graphic for these pictures and attribute CHANNEL_LOCATIONS contains the coordinates for channel markers.

    The value of LOCATION_DIAGRAM is a complete Computer Graphics Metafile (CGM) as defined in ISO 8632. Only the binary encoding of a CGM file as defined in ISO 8632-3 is used. The end of the CGM file is filled with 0x00 to a length in bytes divisible by 4. All coordinates are specified as 16-bit integer values (i.e. VDC TYPE is integer and INTEGER PRECISION is 16, which is the default for the binary CGM encoding). The VDC EXTENT should be specified for each picture. The attribute may contain several pictures in the metafile.

    As most applications won't need the full power of the CGM format, the following subset of CGM elements is suggested as a minimum requirement for software that uses this attribute:

        BEGIN METAFILE, END METAFILE, BEGIN PICTURE, BEGIN PICTURE BODY, END PICTURE, METAFILE VERSION, METAFILE ELEMENT LIST, VDC EXTENT, POLYLINE

    Programmers may of course support more CGM functionality (e.g. colors, text, arcs, fill patterns, etc.) as defined in ISO 8632 and it is possible that later versions of this standard will add additional elements to this minimal subset if necessary. Programs may ignore additional elements and warn the user that the displayed diagram might be incomplete or may ignore the whole attribute if additional elements are present. Appendix F gives a short introduction to the minimal CGM subset specified here.

Attributes that refer to a special channel layout and that have to be changed by programs which change, add, move or delete channels:
0x00000001  PREFERRED_INTEGER_RANGE  (length: (1+1)*n words)

    For integer data, this attribute gives display software a hint about which value range might be most interesting in the data. The value consists of a recommended display minimum (32-bit signed integer) followed by a recommended display maximum (32-bit signed integer) for each channel beginning with channel 1. E.g., if in 16-bit signed integer data most good values are in the range -2048 to +2047 in all channels, then, if the value of this attribute is 0xff, 0xff,0xf8,0x00,0x00,0x00,0x07,0xff (repeated for each channel), it will be easy for a visualization program to find a nice default scaling factor. If both the minimum and the maximum value for a channel are equal (e.g. both are zero), then no preferred integer range is specified for this channel, as would be the case for all channels if this attribute were not present.

0x00000003  UNITS  (length: >= (1+1)*n words)

    This attribute contains a sequence of physical unit specifications, one for each channel. It assigns each channel an SI unit (e.g. mA, mV, nT) and a quotient of a physical quantity and the encoded sample value that represents it. Each unit specification is a sequence of a floating point value and a single-line text-string. The floating point number is the number with which the sample value must be multiplied in order to get the physical value (e.g. '0.0025' if a sample value of 400 represents 1.0 mV and the specified unit in the text-string is 'mV'). The quotient is followed by a single-line text-string with the usual abbreviation for the SI unit (not more than 8 characters (= 20 bytes) long). E.g., the text-string for microvolts is 0x00,0xb5,0x00,0x56,0x00,0x00,0x00,0x00. Only linear relations between the physical quantity and the sample value in the encoded data can be described with this attribute. If the float number is 'not a number' (0x00,0x00,0x00,0x00), the physical unit and quantity is unspecified for this single channel, as it would be for all channels if the whole attribute were absent. In this case, the unit text-string should also be empty.

0x00000005  CHANNEL_DESCRIPTION  (length: >= (1+1)*n words, <= (5+33)*n words)

    The attribute consists of a sequence of 2*n single-line text-strings, one pair for each channel. The first string in a pair must not contain more than 8 characters (not including the 1 or 2 0x0000-words at the end of each string). This string contains a very short name for the channel that might e.g. be used to label it in diagrams, etc. E.g., in EEG recordings, this will often be the name of the electrode position in the usual 10-20-system, like "F4-A1", "C4-Cz", etc. The second single-line text-string in the pair, which directly follows each short label string, may contain additional descriptive text for each channel that does not fit in the short 8 character label (e.g., in EEG recordings information about electrodes with bad contact, etc.).

0x00000007  CHANNEL_GROUPS  (length: >= 3 words)

    Each channel may belong to zero, one or several groups. A channel group might e.g. be used to group channels from the same biological source (e.g., one group for EEG and one group for ECG channels) so that they can be more conveniently selected together or shown in different colors in interactive programs. The CHANNEL_GROUPS attribute contains a sequence of group descriptions. A single group description consists of

    - a single-line text-string with a short name for the group (e.g. "EEG") with not more than 8 characters, followed by
    - a single-line text-string with a description of the group (this may of course be the empty string 0x00000000 if no description is available), followed by
    - an unsigned 32-bit integer number g with the number of channels in this group which is followed by
    - g unsigned 32-bit integer numbers with the numbers of the channels (with 0 being the first channel) that belong to this group.

    If groups are associated with numbers in a user interface, then the first group in this attribute should be assigned number 1.

0x00000009  EVENTS  (length: any)

    This attribute makes it possible to mark events or time intervals in the recording for all channels together or for individual channels. Each event or interval belongs to one event list and each event list has a short name and a description text. In addition, each single event or interval may have a description string. The attribute contains a sequence of event lists. The number of event lists is determined by the length of the attribute. Each event list consists of

    - a single-line text-string with the short name (not more than 8 characters), followed by
    - a multi-line description string, followed by
    - the number e (unsigned 32-bit integer) of events/intervals in this event group, followed by
    - a sequence of e events or intervals.

    Each single event or interval in an event list is described by the following sequence

    - An unsigned 32-bit integer channel number. The first channel is represented by number 0 and 0xffffffff indicates that this event or interval is not associated with a single channel.
    - An unsigned 64-bit integer number that represents the position (the first sample has position 0) of the event or the start position of an interval.
    - An unsigned 64-bit integer number that has the value 0x0000000000000000 for events or represents the length of an interval if it has any other value.
    - A single-line text-string (as usual not more than 64 characters long) that may contain a textual description of the type of event or interval that has been marked or just an empty string.

    The whole event/interval sequence in each event list consists of these event/interval descriptions sorted ascending by their start sample number (second integer value).

0x0000000b  RECORDING_TIME  (length: 2 or 4 words)

    This is the time when the recording of the physical signals started. Two different formats are allowed, either only the date (as in PATIENT_BIRTHDAY) or date and time. The date and time format is 'yyyymmddThhmmss' stored as ASCII digits (not 16-bit UCS-2 characters!), the ASCII character 'T' and one final 0-byte. E.g. '19930211T153159' stored as 0x31,0x39,0x39,0x33,0x30,0x32,0x31,0x31,0x54, 0x31,0x35,0x33,0x31,0x35,0x39,0x00 means that the recording started on February 11, 1993, 3:31:59 pm local time. If no time is available, the date alone may be stored as '19930211' or in bytes 0x31,0x39,0x39,0x33,0x30, 0x32, 0x31,0x31.

    [Note: These attribute formats are two of the date/time formats specified in ISO 8601. The ASCII 'T' has been inserted for compatibility with the ISO standard. This attribute has an odd tag number, because it has to be modified or removed if a beginning part of a recording is removed from an EBS file as then the recording time of the first sample number changes.]

    If this attribute is neither exactly 4 words long with a 'T', a 0x00 and ASCII digits at the specified positions, nor exactly 2 words long containing only ASCII digits, then it should be ignored, because it could be another ISO 8601 time format that might be specified as an alternative in a future version of this standard if necessary (e.g. with time zone, milliseconds, several concatenated intervals of time).

0x0000000d  CHANNEL_LOCATIONS  (length: any)

    This attribute may only be present together with a LOCATION_DIAGRAM attribute. It defines the locations of sensors/electrodes in the coordinate space (VDC) of the graphical diagrams in LOCATION_DIAGRAM. Each channel may have zero, one or several positions, i.e. a channel may appear in several places in a diagram and in different diagrams. A channel may be associated with several single points or with pairs of points, which might be represented graphically as arrows from the first point to the second one. The value of this attribute is a sequence of positions (each is a point or an arrow representing a channel) and each position is a sequence of the following six 32-bit integer values:

    - channel number (the first channel has number 0, unsigned value).
    - picture number (the first picture in the CGM file of LOCATION_DIAGRAM has number 0, unsigned value).
    - X1 coordinate (signed value)
    - Y1 coordinate (signed value)
    - X2 coordinate (signed value)
    - Y2 coordinate (signed value)

    Several positions can have the same channel number. For point positions, X1 and Y1 are the coordinates of the points and X2 and Y2 have the special value 0x80000000. For arrow positions, X1 and Y1 are the coordinates of the tail and X2 and Y2 are those of the head. Arrows may e.g. be used to indicate that a channel represents the difference potential between two electrode positions. The coordinates are all inside the CGM VDC extent.

0x0000000f  FILTERS  (length: >= n words)

    Information about the filters that have been applied to each channel may be stored here. The attribute contains a sequence of filter lists, one for each channel. It may only be present if also a SAMPLE_RATE attribute is present. For each channel, the filter list consists of a sequence of filter specifications followed by 0xffffffff (i.e. the attribute value contains at least one final 0xffffffff for each channel). The following filter specifications may appear in a filter list:

    - lowpass filter: it is specified by a sequence of the following three values.

      o The first 32-bit integer number 0x00000001 identifies the filter as a lowpass filter.
      o The second parameter is the cutoff frequency of the filter [the usual -3 dB limit, i.e. the frequency where the output voltage has been decreased to 1/sqrt(2) (71%) of the input voltage] which is stored as a positive floating point value in Hz.
      o The third value describes the falloff after the cutoff frequency. It stores the attenuation in dB per decade as a negative floating point value. If this value is not known, a not-a-number value (0x00000000) may be used here. [Note: A -20 falloff value represents a filter where the output voltage has decreased to -20 dB (that is 10% of its input voltage) at a frequency which is 10 times the cutoff frequency (decade). This is identical to the alternative description that the filter has a -6 dB/octave falloff, i.e. the output voltage has dropped to 50% (-6 dB) at double cutoff frequency. In general, a p-pole filter (also known as a filter of order p) is stored as the value -20*p.]

    - highpass filter: it is specified by a sequence of the following three values.

      o The first 32-bit integer number 0x00000002 identifies the filter as a highpass filter.
      o The second parameter is the cutoff frequency of the filter [the usual -3 dB limit, i.e. the frequency where the output voltage has been decreased to 1/sqrt(2) (71%) of the input voltage] which is stored as a positive floating point value in Hz. [Note: If you are interested in the time constant t in seconds of a highpass or lowpass filter and you know only the cutoff frequency f in Hz: t = 1 / (2*pi*f).]
      o The third value describes the falloff before the cutoff frequency. It stores the attenuation in dB per decade as a negative floating point value. If this value is not known, a not-a-number value (0x00000000) may be used here.

    - notch filter: it is specified by a sequence of the following three values.

      o The first 32-bit integer number 0x00000003 identifies the filter as a notch filter which attenuates only the frequencies around a single peak frequency.
      o The second parameter is the peak frequency of the filter (the most attenuated frequency) which is stored as a positive floating point value in Hz.
      o The third value describes the falloff around the peak frequency. It stores the attenuation in dB per decade as a negative floating point value. If this value is not known, a not-a-number value (0x00000000) may be used here.

Feel free to use those of the attributes you need, to use none at all or to define your own attribute tags as described in the next appendix.
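The numeric relations quoted in the notes of the FILTERS attribute (time constant, falloff of a p-pole filter, dB per decade versus dB per octave, and dB versus voltage ratio) can be checked with a few lines of arithmetic. All function names below are hypothetical helpers and not part of the attribute definition.

```python
import math


def time_constant(cutoff_hz: float) -> float:
    """Time constant t in seconds for a cutoff frequency f: t = 1 / (2*pi*f)."""
    return 1.0 / (2.0 * math.pi * cutoff_hz)


def falloff_for_poles(poles: int) -> float:
    """Falloff in dB per decade stored for a p-pole filter: -20*p."""
    return -20.0 * poles


def per_decade_to_per_octave(db_per_decade: float) -> float:
    """Convert dB/decade to dB/octave (an octave is log10(2) of a decade)."""
    return db_per_decade * math.log10(2.0)


def voltage_ratio(db: float) -> float:
    """Output/input voltage ratio that corresponds to an attenuation in dB."""
    return 10.0 ** (db / 20.0)


print(round(per_decade_to_per_octave(-20.0), 2))  # -6.02
print(round(voltage_ratio(-3.0), 2))              # 0.71
```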
In order to avoid collisions, the range of tag numbers is separated into 4 parts. In this way, the following methods for assigning new tags are possible:
0x00000000                FINAL TAG

    must not be used as an attribute tag number.

0x00000001 - 0x0000ffff   STANDARD AREA

    attribute tags defined in appendix A of this text

0x00010000 - 0x7fffffff   RESERVATION AREA

    attribute tags defined in intervals that have been individually reserved by the EBS coordinator for people or institutions uniquely. These people may again reserve subintervals of their tag area for other people, etc. So no one has to fear that his attribute tag will be used by someone else with a different meaning by accident, which might cause confusion later. Contact the EBS coordinator if you need your own interval.

0x80000000 - 0x87ffffff   FREE AREA

    attribute tags that may be freely used by everyone with the risk that the same attribute is also used by someone else for a different purpose. Please use a random number within this interval and do not simply start at 0x80000000.

0x88000000 - 0xfffffffe   FREE STRING AREA

    These tag numbers may be used as freely as those in the FREE AREA, but universal programs that can display even unknown attributes may assume that the values of attributes with tags in the FREE STRING AREA may be interpreted as single displayable multi-line text-strings.

0xffffffff                ILLEGAL TAG

    may not be used as an attribute tag number.

Please remember that the least significant bit of the tag number indicates whether it might be necessary to change the attribute contents if the data part has been modified. This bit therefore cannot be selected at random.
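The partition of the tag range listed above can be expressed as a small lookup, together with one possible way of drawing a random FREE AREA tag whose least significant bit is still chosen deliberately. Both function names are hypothetical, and the use of Python's random module is only an assumption about what counts as a sufficiently random choice.

```python
import random


def tag_area(tag: int) -> str:
    """Name the area of appendix B that an attribute tag number falls into."""
    if not 0 <= tag <= 0xFFFFFFFF:
        raise ValueError("tags are unsigned 32-bit numbers")
    if tag == 0x00000000:
        return "FINAL TAG"
    if tag <= 0x0000FFFF:
        return "STANDARD AREA"
    if tag <= 0x7FFFFFFF:
        return "RESERVATION AREA"
    if tag <= 0x87FFFFFF:
        return "FREE AREA"
    if tag <= 0xFFFFFFFE:
        return "FREE STRING AREA"
    return "ILLEGAL TAG"


def random_free_tag(channel_related: bool) -> int:
    """Pick a random FREE AREA tag; only the upper 31 bits are random."""
    upper = random.randrange(0x80000000 >> 1, 0x88000000 >> 1)
    return (upper << 1) | (1 if channel_related else 0)
```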
The current EBS coordinator is the author of this text,
Markus Kuhn
Internet Mail:
ftpebs@rrze.uni-erlangen.de or
mskuhn@cip.informatik.uni-erlangen.de
Anonymous FTP:
ftp.uni-erlangen.de pub/ebs/
The overall structure of the file format is dominated by the separation into three parts: the fixed header, one or two variable headers and the data part.
We decided not to use a pure ASCII format, because encoding and decoding the data part as ASCII numbers separated by space, tab or new line codes is extremely inefficient in both required storage space and coding time. E.g. 16-bit signed integers need 48 bits in a fixed length ASCII decimal encoding (like in '-03445') and e.g. about 28-35 bits for typical 12-bit EEG data if a format with separating spaces and without leading zeros is used (which is a variable length format unsuitable for direct addressing of sample values). Even a hexadecimal format would have doubled the memory requirements and would have made some very efficient implementation techniques impossible. The fact that computer systems with word sizes that are not powers of two (e.g. the old 12-bit PDPs) have nearly completely disappeared in the scientific environment allows an efficient binary format to be used in a portable way.
We could have decided to encode at least the headers as ASCII text. This would have been seen by many people as very easy to understand, but would have had the following disadvantages:
The fixed header contains only the information needed by all programs in order to read in the data set and to determine whether the data can be read in at all or whether the file is encoded in an unsupported way. The purpose of the first 8 bytes is to allow programs that can read other formats in addition to EBS to detect whether the current input file has been stored in EBS format. We chose the name of the format in ASCII characters as the first 3 bytes. The remaining 5 bytes have been selected so that they will most likely be altered if something goes wrong during a file transmission. These bytes are:

  0x94: An arbitrary byte with the most significant bit set to 1. Channels that are not 8-bit clean or character set translation functions will likely change this byte. It should also be changed as a version indicator if incompatible changes are made to this specification.

  0x0a: ASCII control character line feed (LF). File transfer programs sometimes add a 0x0d (CR) after this byte.

  0x13: ASCII flow control character Ctrl-S stops transmission on some channels and is removed on others.

  0x1a: Ctrl-Z is the MS-DOS end-of-file marker and will cause problems if the file has not been opened in binary mode.

  0x0d: ASCII control character carriage return (CR) will be removed by some file transfer programs.

These additional test bytes have only been added because they are very easy to implement and might help to detect common file handling errors more quickly. They do NOT guarantee data integrity. We felt that mechanisms for data integrity like checksums, digital signatures and forward error correction codes should be applied to complete EBS files with more general packing/encryption tools where this is necessary and should not be included in the EBS specification.
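A reading program might test the identification code as in the following C sketch. The function names are hypothetical, and the "damaged" heuristic is an assumption of this example rather than something the specification requires.

```c
#include <stddef.h>
#include <string.h>

/* The 8 identification bytes listed in the fixed header description. */
static const unsigned char ebs_id[8] = {
    0x45, 0x42, 0x53, 0x94, 0x0a, 0x13, 0x1a, 0x0d
};

/* Return 1 if buf holds at least 8 bytes and starts with the
   EBS identification code, 0 otherwise. */
int ebs_has_id(const unsigned char *buf, size_t len)
{
    return len >= 8 && memcmp(buf, ebs_id, 8) == 0;
}

/* Heuristic (an assumption of this sketch): the bytes 'E', 'B', 'S'
   followed by a wrong tail suggest an EBS file that was altered
   during transfer rather than a file in some other format. */
int ebs_looks_damaged(const unsigned char *buf, size_t len)
{
    return len >= 3 && memcmp(buf, ebs_id, 3) == 0 && !ebs_has_id(buf, len);
}
```

A program could use the second function to print a hint such as "file may have been transferred in text mode" instead of a plain "unknown format" message.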
Some system tools like graphical file managers detect file types by characteristic first bytes. In this way, EBS files can easily be represented with a suitable icon.
In order to make it easier to read in the file headers as memory mapped files with processors that can only read 32-bit integer values starting on 32-bit boundaries in the memory, all 32-bit values in the EBS file start on 32-bit boundaries. In addition, the two 64-bit values in the fixed header start on 64-bit boundaries. The consequence of this layout is that all strings, etc. in the headers have to be padded with 0x00 bytes to the next 32-bit boundary, but this can easily be done (together with the UCS-2 translation) once and for all in the string read/write routine.
The number of samples must be specified in the fixed header for three reasons: it cannot be determined from the file length for all encodings, some applications need to know it in advance for memory allocation, and it is needed to find the first sample value of each channel in CHANNEL-BASED ORDER encodings. All integer values in the fixed and variable headers are stored with at least 32 bits, because today's computers can operate easily with these values and because more integer formats (e.g. also 8-bit and 16-bit) would need more read/write functions and would make 32-bit alignment more difficult.
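The padding rule can be handled in one small routine. The C sketch below uses hypothetical function names and assumes, for simplicity, that the input is a plain ASCII string, so that each character becomes the UCS-2 code 0x00 followed by the ASCII byte. It only pads to the next 32-bit boundary and adds nothing beyond that.

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 4 (32-bit boundary). */
size_t ebs_pad4(size_t nbytes)
{
    return (nbytes + 3u) & ~(size_t)3u;
}

/* Write an ASCII string as Bigendian UCS-2 into out and pad the result
   with 0x00 bytes to the next 32-bit boundary. Returns the number of
   bytes written; out must have room for ebs_pad4(2 * strlen(ascii)). */
size_t ebs_put_ucs2(const char *ascii, unsigned char *out)
{
    size_t n = 0;

    while (*ascii) {
        out[n++] = 0x00;                     /* high byte of the UCS-2 code */
        out[n++] = (unsigned char)*ascii++;  /* low byte: the ASCII code    */
    }
    while (n % 4 != 0)
        out[n++] = 0x00;                     /* pad to 32-bit boundary      */
    return n;
}
```

For the three-character string "EEG", the routine writes 6 bytes of character data and 2 padding bytes, so the next value again starts on a 32-bit boundary.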
The variable header is one of the reasons for the flexibility of the format. Arbitrary information can be stored in it, but programs only have to pick out the attributes which they are interested in. It would have been possible to specify the length of the first variable header part or the start of the data part in the fixed header. But this would have made it necessary to calculate the length of the variable header in advance which is quite clumsy to implement or it would have been necessary to jump back to the fixed header after the variable header had been written which makes pipeline processing and sequential file access impossible. Jumping over the variable header by looking at the attribute length indicators is quite easy to implement on the other hand.
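Jumping over a variable header part needs only the length indicators. The following C sketch works on a header that is already in memory. It assumes the attribute layout used by the demo program in this text (a 32-bit tag, a 32-bit length counted in 32-bit words, then the value); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a Bigendian 32-bit value from memory. */
static uint32_t ebs_get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walk over one variable header part that starts at buf[0].
   Returns the offset of the first byte after the final tag,
   or 0 if the buffer ends before a final tag was found. */
size_t ebs_skip_var_header(const unsigned char *buf, size_t len)
{
    size_t pos = 0;

    for (;;) {
        uint64_t value_bytes;

        if (len - pos < 4) return 0;                   /* no room for a tag   */
        if (ebs_get32(buf + pos) == 0) return pos + 4; /* final tag reached   */
        if (len - pos < 8) return 0;                   /* no room for length  */
        value_bytes = (uint64_t)ebs_get32(buf + pos + 4) * 4;
        if (value_bytes > len - pos - 8) return 0;     /* value is truncated  */
        pos += 8 + (size_t)value_bytes;                /* jump to next tag    */
    }
}
```

Applied to the first variable header part, the returned offset is where the data part begins, which is why no explicit start position is needed in the fixed header.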
It is better to have the variable header stored in front of the data part if it should be readable while the data is still being written or if pipeline processing is used. A variable header at the end of the file has the advantage that modifications to it are possible without having to make a copy of the whole file in order to move the encoded data (which might often comprise many hundred megabytes and would need a lot of time and temporary storage to copy). Consequently both places are available for variable header information.
In the variable header, one of the simple types must be able to represent real numbers. Among the alternatives
The currently defined signed 16-bit integer data type for the data part seems to be suitable for nearly all applications, because it allows efficient processing of data from 12-bit A/D converters and because converters with more than 16 bits are used only by very few people. A 12-bit data type would have made processing a little more difficult, and the 8-bit difference encoding of 16-bit values saves even more storage. However, adding further data types to EBS like 8-bit signed integers and 4-byte floating point numbers is easily possible.
The TIME-BASED ORDER format is the natural choice for recording equipment and other applications where the number of samples is not known in advance. The CHANNEL-BASED ORDER is much more efficient for processing applications that use only the data of one channel at a time, because then, only the bytes for this channel have to be fetched from mass storage devices. As there are good reasons for both alternatives and as they can easily be converted, both are supported in the EBS format. In a typical EBS usage scenario a conversion program from a vendor specific recording equipment to EBS is necessary and it is a good idea to do the TIME-BASED ORDER to the more efficient CHANNEL-BASED ORDER conversion in this program.
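The conversion between the two layouts is a simple index transformation. The C sketch below uses hypothetical function names and zero-based channel and sample indices, and it assumes that the samples have already been decoded into an array of 16-bit integers in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Position of sample t of channel c in TIME-BASED ORDER:
   all channels of one point in time are stored together. */
size_t tbo_index(size_t c, size_t t, size_t channels)
{
    return t * channels + c;
}

/* Position of sample t of channel c in CHANNEL-BASED ORDER:
   all samples of one channel are stored together. */
size_t cbo_index(size_t c, size_t t, size_t samples)
{
    return c * samples + t;
}

/* Copy a recording from TIME-BASED ORDER (in) to CHANNEL-BASED ORDER (out).
   Both arrays hold channels * samples values and must not overlap. */
void tbo_to_cbo(const int16_t *in, int16_t *out,
                size_t channels, size_t samples)
{
    size_t t, c;

    for (t = 0; t < samples; t++)
        for (c = 0; c < channels; c++)
            out[cbo_index(c, t, samples)] = in[tbo_index(c, t, channels)];
}
```

After the conversion, the samples of channel c occupy one contiguous block of the output array, which is what makes single-channel processing cheap.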
The only compatibility problem for binary formats is that two different integer encodings exist on the hardware market: Bigendian and Littleendian. Both alternatives are supported in EBS, because they can easily be converted and because this allows an institution to keep all its data in the format optimized for the local hardware. However, the performance gain from a matching byte order is much smaller than the gain from choosing a binary encoding or the CHANNEL-BASED ORDER, so using the Bigendian format as the preferred format (i.e. CIB_16) is encouraged.
The number of predefined attributes has been limited as much as possible, because this makes the implementation of most of them more likely. It would have been possible to add many more text attributes (e.g. who did the recording, type of equipment, diagnosis, ...), but all of this information can easily be included in the DESCRIPTION or in the PROCESSING_HISTORY attribute. The INSTITUTION attribute has been added as an exception to this rule, because some people prefer to have this string printed or displayed separately at a prominent place by their software. The attributes PROCESSING_HISTORY, CHANNEL_GROUPS and EVENTS have no special integer value with the number of processing steps, channel groups or events, because this allows attribute management functions that simply add a few bytes at the end of an attribute value to be used universally to add another item to these lists.
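Converting between the two byte orders takes only a few operations per sample. The C sketch below shows two hypothetical helper functions: one that swaps the bytes of a 16-bit word and one that decodes a Bigendian signed 16-bit sample from two bytes independently of the byte order of the machine it runs on.

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit word. */
uint16_t ebs_swap16(uint16_t v)
{
    return (uint16_t)(((unsigned)v << 8) | ((unsigned)v >> 8));
}

/* Decode a Bigendian 2-complement 16-bit sample from two bytes.
   The result does not depend on the byte order of the host. */
int16_t ebs_get_be16(const unsigned char *p)
{
    uint16_t u = (uint16_t)(((unsigned)p[0] << 8) | p[1]);

    /* Map 0x8000..0xffff to -32768..-1 without relying on the
       implementation-defined unsigned to signed conversion. */
    if (u & 0x8000u)
        return (int16_t)((long)u - 65536L);
    return (int16_t)u;
}
```

A program that always decodes samples byte by byte in this way never needs to know its own byte order; the swap function is only useful when whole buffers are read directly into 16-bit arrays.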
Library functions for reading/writing/modifying EBS files allow much easier EBS file management.

/* Demo program for reading EBS files */

#include <stdio.h>
#include <stdlib.h>

/* for old (non ANSI C) versions of stdio.h */
#ifndef SEEK_SET
#define SEEK_SET 0
#endif

/* Read in a Bigendian 32-bit integer from a file */
unsigned long fgeti32(FILE *f)
{
  unsigned long i;

  i  = (unsigned long) (getc(f) & 0xff) << 24;
  i |= (unsigned long) (getc(f) & 0xff) << 16;
  i |= (unsigned long) (getc(f) & 0xff) << 8;
  i |= (unsigned long) (getc(f) & 0xff);

  return i;
}

int main(int argc, char **argv)
{
  FILE *fin;
  unsigned long samples_hi, samples;
  unsigned long length_hi, length;
  int channels;
  int channel = 1;     /* channel to read (1..channels), example value */
  unsigned long tag;
  unsigned long attribute_length;
  long pos, data_start = 0;
  int second_part, ready;
  unsigned short c;

  /* open the input file in binary mode */
  if (argc < 2 || (fin = fopen(argv[1], "rb")) == NULL) {
    fprintf(stderr, "Can't open input file!\n");
    exit(1);
  }

  /* read fixed header */
  if ((fgeti32(fin) != 0x45425394UL) || (fgeti32(fin) != 0x0a131a0dUL) ||
      (fgeti32(fin) != 0x00000001UL) || feof(fin)) {
    fprintf(stderr, "Input file is not in EBS CIB-16 format!\n");
    exit(1);
  }
  channels = (int) fgeti32(fin);
  samples_hi = fgeti32(fin);    /* number of samples: 2x32-bit */
  samples = fgeti32(fin);
  length_hi = fgeti32(fin);     /* length of data part: 2x32-bit */
  length = fgeti32(fin);
  if (samples_hi != 0 ||
      (length_hi != 0 &&
       !(length_hi == 0xffffffffUL && length == 0xffffffffUL))) {
    fprintf(stderr, "Input file is too long for this program!\n");
    exit(1);
  }
  if (channel > channels) {
    fprintf(stderr, "Input file has no channel %d!\n", channel);
    exit(1);
  }

  /* read variable header */
  second_part = 0;
  ready = 0;
  do {
    /* read attributes until final tag (or end of file) appears */
    while ((tag = fgeti32(fin)) != 0 && !feof(fin)) {
      attribute_length = fgeti32(fin);
      pos = ftell(fin);
      switch (tag) {
      case 4:   /* PATIENT_NAME */
        printf("patient name is ");
        do {
          /* read in 16-bit Unicode character */
          c  = (unsigned short) ((fgetc(fin) & 0xff) << 8);
          c |= (unsigned short) (fgetc(fin) & 0xff);
          if (c) {
            if (c < 127) putchar(c);  /* print only ASCII characters and */
            else putchar('?');        /* '?' for other Unicode characters */
          }
        } while (c && !feof(fin));
        printf(".\n");
        break;
      default:
        /* just ignore other attributes */
        break;
      }
      /* jump to the next attribute */
      fseek(fin, pos + (long) (attribute_length * 4), SEEK_SET);
    }
    if (!second_part) {
      /* if there is a second variable header part then remember
         the start of the data part and jump over it */
      data_start = ftell(fin);
      if (length_hi != 0xffffffffUL || length != 0xffffffffUL) {
        second_part = 1;
        fseek(fin, data_start + (long) (length * 4), SEEK_SET);
      } else
        ready = 1;
    } else
      ready = 1;
  } while (!ready);

  /* read data: go to the first sample of the selected channel
     (CHANNEL-BASED ORDER, 2 bytes per sample) */
  fseek(fin, data_start + (long) ((channel - 1) * samples * 2), SEEK_SET);
  /* ... */

  fclose(fin);
  return 0;
}
A binary encoded CGM file consists of a sequence of CGM elements very similar to the attributes in the EBS variable headers. Most integer values are 16 bits long, are stored with the most significant byte first (Bigendian) and have a 16-bit alignment. The elements have a class number and an identifier number (both together used like the EBS tag number) and a length indicator. Two forms are possible: a short-form element for element parameter data lengths between 0 and 30 bytes and a long-form element for arbitrary parameter lengths.
A short-form element starts with a 16-bit header of the form
   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   bit
  -------------------------------------------------
  |   class   |     identifier     |    length    |
  -------------------------------------------------

and is followed by the number of data bytes indicated in the lower 5 bits which are the parameters of this element. If the number of data bytes is odd, a single zero padding byte follows which gives the whole element including the two header bytes an even number of bytes and preserves the 16-bit alignment. The data length in a short form element may be between 0 and 30 bytes.
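The short-form header word can be taken apart with a few shifts and masks. The following C sketch uses hypothetical names and assumes the field widths of the CGM binary encoding (4 bits class, 7 bits identifier, 5 bits length); it expects the 16-bit header to have been read already as a Bigendian value.

```c
/* Fields of a 16-bit CGM element header word. */
struct cgm_header {
    unsigned cls;   /* bits 15..12: element class       */
    unsigned id;    /* bits 11..5:  element identifier  */
    unsigned len;   /* bits 4..0:   parameter length in bytes,
                       the value 31 marks a long-form element */
};

/* Split a header word into its three fields. */
struct cgm_header cgm_decode_header(unsigned word)
{
    struct cgm_header h;

    h.cls = (word >> 12) & 0x0fu;
    h.id  = (word >> 5) & 0x7fu;
    h.len = word & 0x1fu;
    return h;
}

/* Build a header word from class, identifier and length. */
unsigned cgm_encode_header(unsigned cls, unsigned id, unsigned len)
{
    return ((cls & 0x0fu) << 12) | ((id & 0x7fu) << 5) | (len & 0x1fu);
}

/* Total size in bytes of a short-form element with len data bytes:
   2 header bytes, the data, and one padding byte if len is odd. */
unsigned cgm_short_size(unsigned len)
{
    return 2u + len + (len & 1u);
}
```

For example, the header word 0x0021 decodes to class 0, identifier 1 and a data length of 1 byte, so the complete element occupies 4 bytes including the padding byte.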
Long-form elements start with a 32-bit header of the form
   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   bit
  -------------------------------------------------
  |   class   |     identifier     | 1  1  1  1  1|  word 1
  +-----------------------------------------------+
  | P|               partial length               |  word 2
  -------------------------------------------------

followed by between 0 and 32767 bytes. If the bit P (partition flag) is 1, then after the indicated number of data bytes another word with a partition flag and a 15-bit partial length field follows which is again followed by the indicated number of data bytes and if its P bit is still 1, another length word will follow after the data bytes, etc. A very long long-form element might look like this:

   15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0   bit
  -------------------------------------------------
  |   class   |     identifier     | 1  1  1  1  1|  word 1
  +-----------------------------------------------+
  | 1|               partial length               |  word 2
  +-----------------------------------------------+
              ... 'partial length' bytes ...
  +-----------------------------------------------+
  | 1|               partial length               |
  +-----------------------------------------------+
              ... 'partial length' bytes ...
  +-----------------------------------------------+
  | 0|               partial length               |
  +-----------------------------------------------+
              ... 'partial length' bytes ...

A zero padding byte is added again after the element if the number of bytes of the element is odd in order to preserve the 16-bit alignment.
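The chain of partitions in a long-form element can be followed with a short loop. The C sketch below works on an element that is already in memory and uses a hypothetical function name. It follows the wording above and assumes a single padding byte after the whole element; a reader for arbitrary CGM files should check the padding rules of ISO 8632 itself.

```c
#include <stddef.h>

/* Scan the partitions of a long-form element. p points at word 2 (the
   first partial length word), avail is the number of bytes available.
   Stores the total number of parameter bytes in *total and returns the
   number of bytes consumed (length words, data and the final padding
   byte), or 0 if the input ends too early. */
size_t cgm_long_form_scan(const unsigned char *p, size_t avail, size_t *total)
{
    size_t pos = 0;
    int more;

    *total = 0;
    do {
        unsigned word, part;

        if (avail - pos < 2) return 0;        /* no room for a length word */
        word = ((unsigned)p[pos] << 8) | p[pos + 1];
        more = (word & 0x8000u) != 0;         /* partition flag P          */
        part = word & 0x7fffu;                /* 15-bit partial length     */
        pos += 2;
        if (avail - pos < part) return 0;     /* partition is truncated    */
        *total += part;
        pos += part;                          /* skip this partition       */
    } while (more);

    if (pos & 1u) {                           /* odd size: padding byte    */
        if (avail - pos < 1) return 0;
        pos++;
    }
    return pos;
}
```

A reader that wants the parameter data itself would copy the bytes of each partition into one buffer inside the loop instead of only adding up the lengths.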
The following elements are used in the minimal subset for LOCATION_DIAGRAM:
  element name               class   identifier

  no-op                        0         0
  BEGIN METAFILE               0         1       *
  END METAFILE                 0         2       *
  BEGIN PICTURE                0         3
  BEGIN PICTURE BODY           0         4
  END PICTURE                  0         5
  METAFILE VERSION             1         1       *
  METAFILE ELEMENT LIST        1        11       *
  VDC EXTENT                   2         6
  POLYLINE                     4         1

  * these elements must be present in every CGM file

A CGM file (and consequently also a LOCATION_DIAGRAM value) starts with a BEGIN METAFILE element which is followed by a part called 'metafile descriptor'. After the metafile descriptor elements follow zero, one or several pictures and finally an END METAFILE element. No-op elements can have any parameter length and have to be ignored.

  ----------------------------------------------------------------------
  | BEGIN METAFILE | metafile descriptor | pictures ... | END METAFILE |
  ----------------------------------------------------------------------

Reading applications may ignore the data part of BEGIN METAFILE and simple writing applications should put a single zero byte in the data part of this first element (followed by a padding byte). The END METAFILE element has no parameters, its length field is always zero. The metafile descriptor must contain at least the two elements METAFILE VERSION and METAFILE ELEMENT LIST. Simple reading applications may just ignore them and simple writing applications should give METAFILE VERSION a single 16-bit integer value 1 as its parameter. The parameter of METAFILE ELEMENT LIST is a list of the class and identifier codes of the non-mandatory elements that might appear in the file (which makes it possible to determine quickly which subset of CGM is supported by the application that wrote the file). Programs that write only CGM files using this minimal subset should use the 11 16-bit integer numbers 5 (the number of elements specified), 0, 3, 0, 4, 0, 5, 2, 6, 4 and 1 as parameters to METAFILE ELEMENT LIST.
The BEGIN METAFILE element and the suggested metafile descriptor look like this
  0x00,0x21,0x00,0x00,
  0x10,0x22,0x00,0x01,
  0x11,0x76,0x00,0x05,0x00,0x00,0x00,0x03,0x00,0x00,0x00,0x04,
  0x00,0x00,0x00,0x05,0x00,0x02,0x00,0x06,0x00,0x04,0x00,0x01

The END METAFILE element is

  0x00,0x40.

After the metafile descriptor elements, a sequence of pictures follows. Each picture has the following structure:

  -----------------------------------------------------------------------
  | BEGIN PIC. | pic. descr. | BEGIN PIC. BODY | pict. elem. | END PIC. |
  -----------------------------------------------------------------------

Each picture starts with a BEGIN PICTURE element and ends with an END PICTURE element. Reading applications may ignore the parameter of BEGIN PICTURE and simple writing applications can just use a single zero byte (as with BEGIN METAFILE). The elements BEGIN PICTURE BODY and END PICTURE have no parameters (i.e., their length field is always zero). The BEGIN PICTURE BODY element separates the picture descriptor elements from the elements that represent the graphical objects (here only lines) of the picture.
The only required picture descriptor element in this minimal subset of CGM is VDC EXTENT. It has 4 16-bit signed integer values as parameters (length 8 bytes): The X coordinate of the lower left corner, the Y coordinate of the lower left corner, the X coordinate of the upper right corner and the Y coordinate of the upper right corner. These two points define the VDC extent, a rectangular area that covers the part of the coordinate space which contains the diagram. Display software must be capable of scaling the VDCs (virtual device coordinates) used in the picture elements so that the VDC extent is always mapped to a suitable size on the output device. This scaling should use the same scaling factor for each axis in order to preserve the aspect ratio. The positive direction of the X and Y axis is also determined by the VDCs of the lower left and the upper right points given in the VDC EXTENT element.
The only required graphical picture element in this subset that may appear between BEGIN PICTURE BODY and END PICTURE is POLYLINE. This element represents a sequence of connected lines. Its parameters are 2*p 16-bit signed integer values (length field: 4*p) which are VDCs of p points stored as pairs of X and Y coordinates. The line is drawn from the first point to the second, from the second point to the third, ..., and from point p-1 to point p.
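The scaling can be done with one common factor for both axes. The C sketch below uses hypothetical names and assumes a device area of w by h units whose origin is its lower left corner with both axes growing to the right and upwards; a real display program would still have to convert the result to its own device coordinates.

```c
/* The four parameters of a VDC EXTENT element. */
struct vdc_extent {
    int x_ll, y_ll;   /* lower left corner  */
    int x_ur, y_ur;   /* upper right corner */
};

/* Largest common scaling factor at which the VDC extent still fits
   into a device area of w by h units. Returns 0.0 for an extent
   with zero width or height. */
double vdc_scale(const struct vdc_extent *e, double w, double h)
{
    double ex = (double)e->x_ur - (double)e->x_ll;
    double ey = (double)e->y_ur - (double)e->y_ll;
    double sx, sy;

    if (ex < 0) ex = -ex;
    if (ey < 0) ey = -ey;
    if (ex == 0 || ey == 0) return 0.0;
    sx = w / ex;
    sy = h / ey;
    return sx < sy ? sx : sy;   /* same factor for both axes */
}

/* Map the VDC point (x, y) to device coordinates measured from the
   lower left corner. The axis directions follow from the order of
   the two corner points of the extent. */
void vdc_to_device(const struct vdc_extent *e, double s,
                   int x, int y, double *dx, double *dy)
{
    *dx = (e->x_ur >= e->x_ll ? x - e->x_ll : e->x_ll - x) * s;
    *dy = (e->y_ur >= e->y_ll ? y - e->y_ll : e->y_ll - y) * s;
}
```

With an extent of (0,0) to (100,50) and a device area of 200 by 200 units, the common factor is 2, so the diagram fills the full width and half of the height.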
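Writing a POLYLINE element needs the short form for up to 7 points (28 parameter bytes) and the long form for more. The C sketch below uses hypothetical function names and, as a simplification of this example, refuses lines that would need more than one partition.

```c
#include <stddef.h>

/* Store a 16-bit value with the most significant byte first.
   Returns the new write position. */
static size_t cgm_put16(unsigned char *out, size_t n, unsigned v)
{
    out[n]     = (unsigned char)((v >> 8) & 0xffu);
    out[n + 1] = (unsigned char)(v & 0xffu);
    return n + 2;
}

/* Write a POLYLINE element (class 4, identifier 1) for p points.
   xy holds 2*p coordinates as X,Y pairs. out must have room for
   4 + 4*p bytes. Returns the number of bytes written, or 0 if the
   line does not fit into a single partition. */
size_t cgm_put_polyline(unsigned char *out, const short *xy, size_t p)
{
    const unsigned head = (4u << 12) | (1u << 5);
    size_t len, n = 0, i;

    if (p > 8191) return 0;                  /* 4*p would exceed 32767   */
    len = 4 * p;
    if (len <= 30) {
        n = cgm_put16(out, n, head | (unsigned)len);   /* short form     */
    } else {
        n = cgm_put16(out, n, head | 31u);             /* long form      */
        n = cgm_put16(out, n, (unsigned)len);          /* P = 0, length  */
    }
    for (i = 0; i < 2 * p; i++)              /* 2-complement 16-bit VDCs */
        n = cgm_put16(out, n, (unsigned)xy[i] & 0xffffu);
    return n;                                /* always even: no padding  */
}
```

Because the parameter length 4*p is always even, a POLYLINE element never needs a padding byte.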
If unknown elements appear in a CGM file, the application should either warn the user that it might not be able to display the full diagram correctly and ignore the unknown elements or it may ignore the whole CGM file.
[Note: Using the CGM standard as the format for the LOCATION_DIAGRAM attribute allows easy extension of the graphical capabilities of this attribute, because only the used subset of CGM has to be enlarged and no new graphic format extensions have to be invented. In addition, it allows existing CGM tools to be used for designing the diagrams.]
attribute value -- This is the sequence of bytes contained in an attribute. Its length is always a multiple of four bytes and may be up to 16 gigabytes.
Bigendian -- In 'Gulliver's Travels' by Jonathan Swift, a politician who insists on opening an egg at the big end first. In computer architecture, the property of a microprocessor to store the more significant bytes of a word at the lower addresses in memory. Littleendians do it the other way.
CGM (computer graphics metafile) -- A file format for storage of pictures as collections of graphical elements (e.g. lines, text, circles, etc.) defined in ISO 8632.
channel-based order -- A data part layout in which the sample values of a single channel for the complete recording time are stored together sorted by the recording time. All these channel recordings are stored together sorted by their channel number.
compressed encoding -- A storage representation of sample values that is more efficient in storage capacity than the natural encoding of using equally sized machine words for each sample value independently of all other sample values.
data part -- This is the part of an EBS file that contains nothing but encoded bio-signal data values (and up to 3 zero padding bytes at the end if a second variable header is present).
EBS (extensible bio-signal file format) -- The type of computer file specified in this text suitable for the exchange, processing and storage of bio-signal recordings and additional information.
first variable header part -- The attributes and the first final tag that are located directly after the fixed header and before the data part.
fixed header -- The first 32 bytes of every EBS file form the fixed header, which contains information needed by all programs that process EBS files.
ISO -- Short name for the 'International Organization for Standardization' in Geneva. You can order ISO standards from your local national standards body (e.g. ANSI, DIN, BSI, AFNOR, etc.).
Littleendian -- see Bigendian.
multi-line text-string -- A simple data type that is used as a part of many attribute value syntaxes. If not otherwise specified, it should not contain more than 64 characters per line encoded in the UCS-2 character set. Lines are separated by the line feed control character 0x000a.
recording -- A complete collection of all sample values within a certain interval of time measured at a certain sample frequency.
sample value -- A numeric representation of a physical or other quantity at a point of time associated with a channel.
second variable header part -- The attributes and the final tag that are located directly after the data part. This part of the variable header may be absent.
single-line text-string -- A simple data type that is used as a part of many attribute value syntaxes. If not otherwise specified, it contains up to 64 characters encoded in the UCS-2 character set and no line feed control characters.
tag -- An attribute tag is a 32-bit number that identifies the type of an attribute, i.e. it indicates the syntax and semantic of an attribute value.
time-based order -- A data part layout in which the sample values of a single point in time are stored together sorted by the number of their channel. All collections of these samples for a single point in time are stored together sorted by their recording time.
UCS-2 -- The 2-byte encoding of the 'Universal Multiple-Octet Coded Character Set' (UCS) defined in ISO 10646. This character set is also known under the more popular name 'Unicode'.
variable header -- The part of an EBS file that contains the information which is only needed by some applications. This information is stored in attributes.