=============================================================================
          Maxis XA Audio File Format Description              5-01-2002
=============================================================================

By Valery V. Anisimovsky (samael@avn.mccme.ru)

In this document I'll try to describe the audio file format used in some
Maxis games, in particular SimCity3000 (and perhaps other Maxis games), for
music, speech and sfx files. The files this document deals with have the
extension .XA, although audio files in this format may also appear under
other extensions.

Throughout this document I use C-like notation. All numbers in all
structures described in this document are stored in files using
little-endian (Intel) byte order.

==================
1. XA File Header
==================

The XA file has the following header:

struct XAHeader
{
  char  szID[4];
  DWORD dwOutSize;
  WORD  wTag;
  WORD  wChannels;
  DWORD dwSampleRate;
  DWORD dwAvgByteRate;
  WORD  wAlign;
  WORD  wBits;
};

szID -- string ID, which is equal to "XAI\0" (sound/speech) or "XAJ\0"
   (music).
dwOutSize -- the size of the decompressed output audio stream stored in the
   file (in bytes).
wTag -- seems to be the PCM waveformat tag (0x0001). This corresponds to the
   (decompressed) output audio stream, of course.
wChannels -- number of channels for the file.
dwSampleRate -- sample rate for the file.
dwAvgByteRate -- average byte rate for the file (equal to
   (dwSampleRate)*(wAlign)). Note that this also corresponds to the
   decompressed output audio stream.
wAlign -- the sample align value for the file (equal to
   (wBits/8)*(wChannels)). Again, this corresponds to the decompressed
   output audio stream.
wBits -- resolution of the file in bits per sample (8, 16, etc.).

Note that the part of the header from (wTag) up to (wBits) is really a
WAVEFORMATEX structure (the contents of a PCM .WAV fmt chunk).

================
2. XA File Data
================

Right after the XA header comes the compressed audio stream. The compression
algorithm used is EA ADPCM (see below). Music files in SimCity3000 are
stereo 22050 Hz 16-bit, and speech/sfx files are mono 22050 Hz 16-bit.
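Just for illustration, here's one possible way of reading and checking this
header. This is only a sketch of mine, not code taken from the game or from
GAP: the typedefs come from <stdint.h>, ReadXAHeader is a made-up function
name, and a little-endian host is assumed.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uint16_t WORD;
typedef uint32_t DWORD;

struct XAHeader
{
  char  szID[4];
  DWORD dwOutSize;
  WORD  wTag;
  WORD  wChannels;
  DWORD dwSampleRate;
  DWORD dwAvgByteRate;
  WORD  wAlign;
  WORD  wBits;
};

/* Reads and sanity-checks the XA header. The fields are read one by one to
   avoid structure padding issues; a little-endian host is assumed.
   Returns 1 on success, 0 on failure. */
int ReadXAHeader(FILE *f, struct XAHeader *h)
{
  if (fread(h->szID, 1, 4, f)           != 4) return 0;
  if (fread(&h->dwOutSize, 4, 1, f)     != 1) return 0;
  if (fread(&h->wTag, 2, 1, f)          != 1) return 0;
  if (fread(&h->wChannels, 2, 1, f)     != 1) return 0;
  if (fread(&h->dwSampleRate, 4, 1, f)  != 1) return 0;
  if (fread(&h->dwAvgByteRate, 4, 1, f) != 1) return 0;
  if (fread(&h->wAlign, 2, 1, f)        != 1) return 0;
  if (fread(&h->wBits, 2, 1, f)         != 1) return 0;
  /* "XAI\0" marks sound/speech files, "XAJ\0" marks music files */
  if (memcmp(h->szID, "XAI", 4) != 0 && memcmp(h->szID, "XAJ", 4) != 0)
    return 0;
  return 1;
}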
====================================
3. EA ADPCM Decompression Algorithm
====================================

During the decompression four LONG variables must be maintained for a stereo
stream: lCurSampleLeft, lCurSampleRight, lPrevSampleLeft, lPrevSampleRight,
and two for a mono stream: lCurSample, lPrevSample. At the beginning of the
audio stream you must initialize these variables to zero. Note that LONG
here is signed.

The stream is divided into small blocks of 0x1E (stereo) or 0xF (mono)
bytes. You should process all blocks in their turn. Here's the code which
decompresses one stereo stream block:

BYTE InputBuffer[0x1E]; // buffer containing the data for one stereo block
BYTE bInput;
DWORD i;
LONG c1left,c2left,c1right,c2right,left,right;
BYTE dleft,dright;

bInput=InputBuffer[0];
c1left=EATable[HINIBBLE(bInput)];   // predictor coeffs for left channel
c2left=EATable[HINIBBLE(bInput)+4];
dleft=LONIBBLE(bInput)+8;           // shift value for left channel

bInput=InputBuffer[1];
c1right=EATable[HINIBBLE(bInput)];  // predictor coeffs for right channel
c2right=EATable[HINIBBLE(bInput)+4];
dright=LONIBBLE(bInput)+8;          // shift value for right channel

for (i=2;i<0x1E;i+=2)
{
   left=HINIBBLE(InputBuffer[i]);   // HIGHER nibble for left channel
   left=(left<<0x1c)>>dleft;
   left=(left+lCurSampleLeft*c1left+lPrevSampleLeft*c2left+0x80)>>8;
   left=Clip16BitSample(left);
   lPrevSampleLeft=lCurSampleLeft;
   lCurSampleLeft=left;

   right=HINIBBLE(InputBuffer[i+1]); // HIGHER nibble for right channel
   right=(right<<0x1c)>>dright;
   right=(right+lCurSampleRight*c1right+lPrevSampleRight*c2right+0x80)>>8;
   right=Clip16BitSample(right);
   lPrevSampleRight=lCurSampleRight;
   lCurSampleRight=right;

   // Now we've got lCurSampleLeft and lCurSampleRight which form one stereo
   // sample and all is set for the next step...

   Output((SHORT)lCurSampleLeft,(SHORT)lCurSampleRight); // send the sample to output

   // now do just the same for LOWER nibbles...
   // note that nibbles for each channel are packed pairwise into one byte

   left=LONIBBLE(InputBuffer[i]);   // LOWER nibble for left channel
   left=(left<<0x1c)>>dleft;
   left=(left+lCurSampleLeft*c1left+lPrevSampleLeft*c2left+0x80)>>8;
   left=Clip16BitSample(left);
   lPrevSampleLeft=lCurSampleLeft;
   lCurSampleLeft=left;

   right=LONIBBLE(InputBuffer[i+1]); // LOWER nibble for right channel
   right=(right<<0x1c)>>dright;
   right=(right+lCurSampleRight*c1right+lPrevSampleRight*c2right+0x80)>>8;
   right=Clip16BitSample(right);
   lPrevSampleRight=lCurSampleRight;
   lCurSampleRight=right;

   // Now we've got lCurSampleLeft and lCurSampleRight which form one stereo
   // sample and all is set for the next step...

   Output((SHORT)lCurSampleLeft,(SHORT)lCurSampleRight); // send the sample to output
}

HINIBBLE and LONIBBLE are the higher and lower 4-bit nibbles:

#define HINIBBLE(byte) ((byte) >> 4)
#define LONIBBLE(byte) ((byte) & 0x0F)

Note that depending on your compiler you may need to use additional nibble
separation in these defines, e.g. (((byte) >> 4) & 0x0F). Also note that
(left<<0x1c)>>dleft relies on arithmetic (sign-extending) right shift of
signed LONGs: the nibble is placed in the top four bits of the LONG and then
shifted back down by dleft.

EATable is the table given in the next section of this document. Output() is
just a placeholder for whatever action you would like to perform with the
decompressed sample value.

Clip16BitSample is quite evident:

LONG Clip16BitSample(LONG sample)
{
  if (sample>32767)
    return 32767;
  else if (sample<-32768)
    return (-32768);
  else
    return sample;
}
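The per-block code above can be driven by a simple loop over the whole
compressed stream, since the blocks just follow one another right after the
XA header. The sketch below is mine, not decompiled game code:
DecodeStereoBlock() is a hypothetical wrapper around the block loop shown
above, and pData/dwDataSize stand for the compressed bytes following the
header.

#define XA_STEREO_BLOCK_SIZE 0x1E

void DecodeStereoStream(const BYTE *pData, DWORD dwDataSize)
{
  DWORD dwPos;

  // predictors must be reset to zero at the beginning of the stream
  lCurSampleLeft=lPrevSampleLeft=0;
  lCurSampleRight=lPrevSampleRight=0;

  // each 0x1E-byte block yields 28 stereo output samples
  for (dwPos=0; dwPos+XA_STEREO_BLOCK_SIZE<=dwDataSize; dwPos+=XA_STEREO_BLOCK_SIZE)
    DecodeStereoBlock(pData+dwPos);
}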
As to mono sound, it's just analogous -- you should process the blocks, each
being 0xF bytes long, using the single set of predictor variables
(lCurSample, lPrevSample) and per-block values c1, c2, d:

bInput=InputBuffer[0];
c1=EATable[HINIBBLE(bInput)];   // predictor coeffs
c2=EATable[HINIBBLE(bInput)+4];
d=LONIBBLE(bInput)+8;           // shift value

for (i=1;i<0xF;i++)
{
   sample=HINIBBLE(InputBuffer[i]);  // HIGHER nibble
   sample=(sample<<0x1c)>>d;
   sample=(sample+lCurSample*c1+lPrevSample*c2+0x80)>>8;
   sample=Clip16BitSample(sample);
   lPrevSample=lCurSample;
   lCurSample=sample;

   // Now we've got lCurSample which is one mono sample and all is set
   // for the next input nibble...

   Output((SHORT)lCurSample); // send the sample to output

   sample=LONIBBLE(InputBuffer[i]);  // LOWER nibble
   sample=(sample<<0x1c)>>d;
   sample=(sample+lCurSample*c1+lPrevSample*c2+0x80)>>8;
   sample=Clip16BitSample(sample);
   lPrevSample=lCurSample;
   lCurSample=sample;

   // Now we've got lCurSample which is one mono sample and all is set
   // for the next input byte...

   Output((SHORT)lCurSample); // send the sample to output
}

So, for mono sound you should process the HIGHER nibble of each input byte
first and then the LOWER nibble. Of course, this decompression routine may
be greatly optimized.

==================
4. EA ADPCM Table
==================

LONG EATable[]=
{
  0x00000000, 0x000000F0, 0x000001CC, 0x00000188,
  0x00000000, 0x00000000, 0xFFFFFF30, 0xFFFFFF24,
  0x00000000, 0x00000001, 0x00000003, 0x00000004,
  0x00000007, 0x00000008, 0x0000000A, 0x0000000B,
  0x00000000, 0xFFFFFFFF, 0xFFFFFFFD, 0xFFFFFFFC
};

===========
5. Credits
===========

Dmitry Kirnocenskij (ejt@mail.ru)
   Worked out the EA ADPCM decompression algorithm.

Nicholas Sales (nicsales@mweb.co.za)
   Provided me with SimCity3000 decoding stuff and thereby inspired me to
   decode the format and write the plug-in for GAP.

-------------------------------------------
Valery V. Anisimovsky (samael@avn.mccme.ru)

http://bim.km.ru/gap/
http://www.anxsoft.newmail.ru
http://anx.da.ru

On these sites you can find my GAP program, which can search for XA audio
files in game resources, extract them, convert them to WAV and play them
back. There's also the complete source code of GAP and all its plug-ins,
including the XA plug-in, which can be consulted for further details on how
to deal with this format.