8SVX IFF 8-Bit Sampled Voice
============================

1. Introduction
---------------

This is the IFF supplement for FORM "8SVX". An 8SVX is an IFF "data
section" or "FORM" (which can be an IFF file or a part of one) containing
a digitally sampled audio voice consisting of 8-bit samples. A voice can
be a one-shot sound or - with repetition and pitch scaling - a musical
instrument.

The 8SVX format is designed for playback hardware that uses 8-bit samples
attenuated by a volume control for good overall signal-to-noise ratio. So
a FORM 8SVX stores 8-bit samples and a volume level.

A similar data format (or two) will be needed for higher resolution
samples (typically 12 or 16 bits). Properly converting a high resolution
sample down to 8 bits requires one pass over the data to find the minimum
and maximum values and a second pass to scale each sample into the range
-128 to 127 (signed byte). So it's reasonable to store higher resolution
data in a different FORM type and convert between them.

For instruments, FORM 8SVX can record a repeating waveform optionally
preceded by a startup transient waveform. These two recorded signals can
be pre-synthesized or sampled from an acoustic instrument. For many
instruments, this representation is compact. FORM 8SVX is less practical
for an instrument whose waveform changes from cycle to cycle, like a
plucked string, where a long sample is needed for accurate results.

FORM 8SVX can store an "envelope" or "amplitude contour" to enrich musical
notes. A future voice FORM could also store amplitude, frequency, and
filter modulations.

FORM 8SVX is geared for relatively simple musical voices, where one
waveform per octave is sufficient, the waveforms for the different octaves
follow a factor-of-two size rule, and one envelope is adequate for all
octaves. You could store a more general voice as a LIST containing one or
more FORMs 8SVX per octave.
A future voice FORM could go beyond one "one-shot" waveform and one
"repeat" waveform per octave.

2. Standard Data and Property Chunks
------------------------------------

FORM 8SVX stores all the waveform data in one body chunk "BODY". It stores
playback parameters in the required header chunk "VHDR". "VHDR" and any
optional property chunks "NAME", "(c) ", and "AUTH" must all appear before
the BODY chunk. Any of these properties may be shared over a LIST of FORMs
8SVX by putting them in a PROP 8SVX.

- Background

There are two ways to use FORM 8SVX: as a one-shot sampled sound or as a
sampled musical instrument that plays "notes" (as for a MOD or SMUS
format). Storing both kinds of sounds in the same kind of FORM makes it
easy to play a one-shot as an instrument or vice versa.

A one-shot sound is a series of audio data samples with a nominal playback
rate and amplitude. The recipient program can optionally adjust or
modulate the amplitude and playback data rate.

For musical instruments, the idea is to store a sampled (or
pre-synthesized) waveform that will be parameterized by pitch, duration,
and amplitude to play each "note". The creator of the FORM 8SVX can supply
a waveform per octave over a range of octaves for this purpose. The intent
is to perform a pitch by selecting the closest octave's waveform and
scaling the playback data rate. An optional "one-shot" waveform supplies
an arbitrary startup transient, then a "repeat" waveform is iterated as
long as necessary to sustain the note.

A FORM 8SVX can also store an envelope to modulate the waveform. Envelopes
are mostly useful for variable-duration notes but could be used for
one-shot sounds too.

The FORM 8SVX standard has some restrictions. For example, each octave of
data must be twice as long as the next higher octave. Most sound driver
software and hardware imposes additional restrictions. For example, the
Amiga sound hardware requires an even number of samples in each one-shot
and repeat waveform.
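
The pitch procedure described above (pick the closest octave, then scale
the playback data rate) can be sketched in C. This is only an
illustration under stated assumptions, not part of the standard: the
names PitchChoice and choose_octave are hypothetical, the parameters
mirror the Voice8Header fields defined in the VHDR section, and "closest"
is interpreted here as the octave whose required data rate differs from
the recorded rate by the smallest ratio.

```c
/* Hypothetical result type: which octave to play and at what data rate. */
typedef struct {
    int    octave;  /* 0 = highest-frequency octave, -1 = no choice possible */
    double rate;    /* playback data rate in samples per second */
} PitchChoice;

/* Octave k holds samplesPerHiCycle * 2^k samples per cycle, so playing it
 * at freqHz cycles/second needs freqHz * samplesPerHiCycle * 2^k
 * samples/second. Pick the octave whose required rate is closest, as a
 * ratio, to the recorded rate samplesPerSec (an assumed selection rule). */
PitchChoice choose_octave(double freqHz, unsigned long samplesPerHiCycle,
                          unsigned samplesPerSec, unsigned ctOctave)
{
    PitchChoice best = { -1, 0.0 };
    double bestRatio = 0.0;
    double samplesPerCycle = (double)samplesPerHiCycle;
    unsigned k;

    /* samplesPerHiCycle == 0 means "unknown": no pitch can be computed. */
    if (freqHz <= 0.0 || samplesPerHiCycle == 0 || samplesPerSec == 0)
        return best;

    for (k = 0; k < ctOctave; k++) {
        double rate = freqHz * samplesPerCycle;
        double ratio = rate > samplesPerSec ? rate / samplesPerSec
                                            : samplesPerSec / rate;
        if (best.octave < 0 || ratio < bestRatio) {
            best.octave = (int)k;
            best.rate = rate;
            bestRatio = ratio;
        }
        samplesPerCycle *= 2.0;  /* each lower octave doubles samples/cycle */
    }
    return best;
}
```

For example, with 32 samples/cycle in the highest octave, a recorded rate
of 16000 samples/second and three octaves, a 130 Hz note selects octave 2
at 16640 samples/second. A real player would also have to respect the
rate limits of its sound hardware, which this sketch ignores.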

- Required Property VHDR

The required property "VHDR" holds a Voice8Header structure as defined in
these C declarations and following documentation. This structure holds the
playback parameters for the sampled waveforms in the BODY chunk (see
below):

    #define ID_8SVX MakeID('8','S','V','X')
    #define ID_VHDR MakeID('V','H','D','R')

    /* A fixed-point value, 16 bits to the left of the point and 16 to
     * the right. A Fixed is a number of 2^16-ths, i.e. 65536ths. */
    typedef LONG Fixed;

    #define Unity 0x10000L  /* Unity = Fixed 1.0 = maximum volume */

    /* sCompression: Choice of compression algorithm applied to the
     * samples */
    #define sCmpNone     0  /* not compressed */
    #define sCmpFibDelta 1  /* Fibonacci-Delta encoding */

    typedef struct {
        ULONG oneShotHiSamples,  /* # samples in the high octave 1-shot part */
              repeatHiSamples,   /* # samples in the high octave repeat part */
              samplesPerHiCycle; /* # samples/cycle in high octave, else 0 */
        UWORD samplesPerSec;     /* data sampling rate */
        UBYTE ctOctave,          /* # octaves of waveforms */
              sCompression;      /* data compression technique used */
        Fixed volume;            /* playback volume from 0 to Unity */
    } Voice8Header;

A FORM 8SVX holds waveform data for one or more octaves, each containing a
one-shot part and a repeat part. The fields 'oneShotHiSamples' and
'repeatHiSamples' tell the number of audio samples in the two parts of the
highest frequency octave. Each successive (lower frequency) octave
contains twice as many data samples in both its one-shot and repeat parts.
One of these two parts can be empty across all octaves.

The field 'samplesPerHiCycle' tells the number of samples/cycle in the
highest frequency octave of data, or else 0 for "unknown". Each successive
octave contains twice as many samples/cycle. This field is needed to
compute the data rate for a desired playback pitch.

Actually, 'samplesPerHiCycle' is an average number of samples/cycle. If
the one-shot part contains pitch bends, store the samples/cycle of the
repeat part in 'samplesPerHiCycle'.
The division 'repeatHiSamples'/'samplesPerHiCycle' should yield an integer
number of cycles.

The field 'samplesPerSec' gives the sound sampling rate. A program may
adjust this to achieve frequency shifts or vary it dynamically to achieve
pitch bends and vibrato.

The field 'ctOctave' tells how many octaves of data are stored in the BODY
chunk.

The field 'sCompression' indicates the compression scheme, if any, that
was applied to the entire set of data samples stored in the BODY chunk.
Note that the whole series of data samples is compressed as a unit.

The field 'volume' gives an overall playback volume for the waveforms (all
octaves). It lets the 8-bit data samples use the full range -128 through
127 for good signal-to-noise ratio. The playback program should multiply
this value by a "volume control" and perhaps by a playback envelope.

- Optional Text Chunks NAME, (c), AUTH, ANNO

Several text chunks may be included in a FORM 8SVX to keep ancillary
information.

The optional property "NAME" names the voice (or instrument), for instance
"tubular bells".

The optional property "(c) " holds a copyright notice for the voice. The
chunk ID "(c) " stands in for the copyright symbol.

The chunk types "NAME", "(c) ", and "AUTH" are property chunks. Putting
more than one NAME (or other) property in a FORM is redundant. A property
should be shorter than 256 characters.

The optional data chunk "ANNO" holds any text annotations typed in by the
author. An ANNO chunk is not a property chunk, so you can put more than
one in a FORM 8SVX. You can make it any length up to 2^31 - 1 characters.

Syntactically, each of these chunks contains an array of 8-bit ASCII
characters in the range " " (SP, hex 20) through "~" (tilde, hex 7E), just
like a standard "TEXT" chunk. The chunk's 'ckSize' field holds the count
of characters.
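
As a rough illustration of the text chunk syntax, the following C sketch
checks a chunk's characters and computes how many bytes its data occupies
in the file. The helper names text_chunk_is_valid and
text_chunk_stored_size are hypothetical. The sketch assumes the printable
ASCII range ends at the tilde (hex 7E) and that the zero pad byte after an
odd-length chunk is not counted in 'ckSize'.

```c
/* Hypothetical helper: returns 1 if every one of the ckSize bytes lies in
 * the printable ASCII range " " (0x20) through "~" (0x7E), else 0. */
int text_chunk_is_valid(const unsigned char *data, unsigned long ckSize)
{
    unsigned long i;

    for (i = 0; i < ckSize; i++)
        if (data[i] < 0x20 || data[i] > 0x7E)
            return 0;
    return 1;
}

/* Hypothetical helper: bytes the chunk data occupies in the file, which is
 * ckSize plus one zero pad byte when ckSize is odd. */
unsigned long text_chunk_stored_size(unsigned long ckSize)
{
    return ckSize + (ckSize & 1UL);
}
```

A NAME chunk holding "tubular bells" has a 'ckSize' of 13 and therefore
occupies 14 bytes of chunk data on disk.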

    #define ID_NAME      MakeID('N','A','M','E')
    #define ID_Copyright MakeID('(','c',')',' ')
    #define ID_AUTH      MakeID('A','U','T','H')
    #define ID_ANNO      MakeID('A','N','N','O')

Remember to store a zero-value pad byte after odd-length chunks.

- Optional Data Chunks ATAK and RLSE

The optional data chunks ATAK and RLSE together give a piecewise-linear
"envelope" or "amplitude contour". This contour may be used to modulate
the sound during playback. It's especially useful for playing musical
notes of variable durations. Playback programs may ignore the supplied
envelope or substitute another.

    #define ID_ATAK MakeID('A','T','A','K')
    #define ID_RLSE MakeID('R','L','S','E')

    typedef struct {
        UWORD duration;  /* segment duration in milliseconds, > 0 */
        Fixed dest;      /* destination volume factor */
    } EGPoint;

ATAK and RLSE chunks each contain an array of EGPoints describing a
piecewise-linear envelope. The envelope defines a function of time
returning a Fixed volume. It's used to scale the nominal volume specified
in the Voice8Header.

To explain the meaning of these chunks, here is an overview of the
envelope generation algorithm. Start at 0 volume, step through the ATAK
contour, then hold at the sustain level (the last ATAK EGPoint's dest),
and then step through the RLSE contour. Begin the release at the desired
note stop time minus the total duration of the release contour. Remember
to multiply the envelope function by the nominal voice header volume and
by any desired note volume.

Note: The number of EGPoints in either an ATAK or RLSE chunk is
ckSize/sizeof(EGPoint).

- Data Chunk BODY

The BODY chunk contains the audio data samples.

    #define ID_BODY MakeID('B','O','D','Y')

    typedef signed char BYTE;  /* 8-bit signed number */

The BODY contains data samples grouped by octave. Within each octave are
one-shot and/or repeat portions.

In general, the BODY has 'ctOctave' octaves of data. The highest frequency
octave comes first, comprising the fewest samples as given by
'oneShotHiSamples' + 'repeatHiSamples'.
Each successive octave contains twice as many samples as the previous
octave. The number of samples in the BODY chunk is

    ((2^0) + (2^1) + ... + (2^(ctOctave-1)))
        * (oneShotHiSamples + repeatHiSamples)

which equals (2^ctOctave - 1) * (oneShotHiSamples + repeatHiSamples).

To avoid playback "clicks", the beginning and end of the one-shot portion
should be at about the same level.

3. FORM 8SVX File Format Layout
-------------------------------

    FORM    4-byte ID
    size    4-byte size (length of the rest of the file after this field)
    8SVX    4-byte ID

    VHDR    4-byte ID = Voice8Header
    size    4-byte sizeof(Voice8Header)
    data    sizeof(Voice8Header) bytes of data
     .
     .
     .      Any other optional chunks (ATAK, RLSE, NAME, ANNO, etc.)

    BODY    4-byte ID = Sampled Data
    size    4-byte data size
    data    size(BODY) bytes of data

All odd-length chunks are zero-padded.
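
To tie the layout to the Voice8Header fields, here is a minimal C sketch
that decodes the 20 data bytes of a VHDR chunk and locates each octave
inside an uncompressed BODY. It is an illustration only. The names Vhdr,
vhdr_decode, octave_offset, octave_length, and body_samples are
hypothetical. The sketch assumes the big-endian byte order used by IFF
files, sCompression == sCmpNone (one byte per sample), and sample counts
small enough not to overflow 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width mirror of Voice8Header (hypothetical name). */
typedef struct {
    uint32_t oneShotHiSamples, repeatHiSamples, samplesPerHiCycle;
    uint16_t samplesPerSec;
    uint8_t  ctOctave, sCompression;
    int32_t  volume;                 /* Fixed: 0x10000 = Unity */
} Vhdr;

/* Read big-endian integers from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Decode the 20 data bytes that follow the VHDR chunk's ID and size. */
void vhdr_decode(const uint8_t raw[20], Vhdr *h)
{
    h->oneShotHiSamples  = be32(raw + 0);
    h->repeatHiSamples   = be32(raw + 4);
    h->samplesPerHiCycle = be32(raw + 8);
    h->samplesPerSec     = be16(raw + 12);
    h->ctOctave          = raw[14];
    h->sCompression      = raw[15];
    h->volume            = (int32_t)be32(raw + 16);
}

/* Number of samples in octave k, where k = 0 is the highest octave. */
uint32_t octave_length(const Vhdr *h, unsigned k)
{
    assert(k < h->ctOctave && k < 31);
    return (h->oneShotHiSamples + h->repeatHiSamples) << k;
}

/* Sample offset of octave k from the start of the BODY data:
 * the sum of the lengths of octaves 0 .. k-1, i.e. (2^k - 1) * hi length. */
uint32_t octave_offset(const Vhdr *h, unsigned k)
{
    assert(k < h->ctOctave && k < 31);
    return (h->oneShotHiSamples + h->repeatHiSamples) * ((1u << k) - 1u);
}

/* Total samples expected in the BODY: (2^ctOctave - 1) * hi length. */
uint32_t body_samples(const Vhdr *h)
{
    assert(h->ctOctave < 32);
    return (h->oneShotHiSamples + h->repeatHiSamples) *
           ((1u << h->ctOctave) - 1u);
}
```

Within octave k, the one-shot part comes first and occupies
oneShotHiSamples * 2^k samples, followed by the repeat part. A reader
could compare body_samples() against the BODY chunk's size field as a
consistency check before playing an uncompressed voice.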