ICQ Chat File Specification v1.3
--------------------------------
Last updated: March 14, 2006
Written by: Remy Lebeau
            Lebeau Software
Web: http://www.lebeausoftware.org
Email: info@lebeausoftware.org
--------------------------------

********* DISCLAIMER *********

Permission to copy and distribute this document is granted so long as the
title, author's name, URL, and this disclaimer are retained. Any additions or
modifications made to the original content should be clearly marked as such,
and submitted to the author for inclusion in future official releases of this
specification by the author. Contributors will be credited.

In no event shall Lebeau Software or any individuals listed in this document
be held liable for any use of the information contained herein. The
information contained in this specification was gathered through observation
of ICQ .cht files ("chat files"), and may not be entirely accurate or
complete. No files were decompiled or disassembled in order to gather this
information. Use it at your own risk.

This specification is in no way sponsored, supported, or condoned by ICQ Inc.,
Mirabilis, or AOL, and is the sole work of Lebeau Software and credited
individuals.

ICQ is Copyright ©1998-2002 ICQ Inc. All Rights Reserved.

******************************


Foreword
==================================================================================

The following specification outlines the binary format of ICQ chat (.cht)
files. I wrote this spec while I was in the process of creating my own player
application for playing ICQ chat files. ICQ's own built-in player is quite
limited in the chats it's able to play back, so I wanted to make a new player
that could play back just about any chat, particularly those chats that ICQ
refuses to play itself. The main requirement ICQ has for being able to play a
chat is that the chat file must be listed in the user's database files.
Without those entries, ICQ's player won't play a chat, even a valid one.
ICQ also has no options for importing chats from a file, or for playing back
chats from other accounts. This means that user 1234567 can only play back a
chat that was recorded by user 1234567, and usually only on the same machine
it was originally recorded on. I wanted to design my player to work past all
of these limitations.


Data Types
==================================================================================

All data types mentioned in this specification are based on C/C++ data types:

BYTE  = 8-bit unsigned char
        range: (hex: 0x00 to 0xFF)
               (decimal: 0 to 255)

WORD  = 16-bit unsigned short
        range: (hex: 0x0000 to 0xFFFF)
               (decimal: 0 to 65,535)

DWORD = 32-bit unsigned int
        range: (hex: 0x00000000 to 0xFFFFFFFF)
               (decimal: 0 to 4,294,967,295)


Basic Data Structures
==================================================================================

==========================
== LV (Length-Value)
==========================

data type    description
---------    -----------
WORD         # of bytes of following data
n BYTEs      raw data

==========================
== Color
==========================

data type    description
---------    -----------
BYTE         red value
BYTE         green value
BYTE         blue value
BYTE         reserved

==========================
== String
==========================

All strings are terminated with BYTE 0x00. Whenever an LV is used to describe
a String, the size value of the LV includes the terminating BYTE 0x00.
Therefore, a value of 0 for the size of the LV means that no data is present
for the string at all.


Extended Data Structures
==================================================================================

Each of the following structures begins with this header:

data type    description
---------    -----------
BYTE         structure type
DWORD        size of data, in bytes

The structure type is indicated in parentheses in the following definitions.

NOTE - If the major version (?)
in the Chat Header is 6 (or higher?), I have noticed that the reported size is
sometimes 1 byte more than, or 1 byte less than, the actual data size. I do
not know why.

==========================
== Foreground Color (0x00)
== Background Color (0x01)
==========================

data type    description
---------    -----------
Color        foreground/background color

==========================
== Font Name (0x10)
==========================

data type    description
---------    -----------
LV           name of font face
WORD         ?

==========================
== Font Style (0x11)
==========================

data type    description
---------    -----------
DWORD        font style bits

The style is one or more of the following values OR'd together:

0x0001 - bold
0x0002 - italics
0x0004 - underline
0x0008 - strikeout

==========================
== Font Size (0x12)
==========================

data type    description
---------    -----------
WORD         font size
WORD         ?

The font size is in logical points. To convert the value to logical units,
use this formula:

height = -( (size * logical pixels per inch) / 72 )

==========================
== Unknown (0x13)
==========================

This structure is currently not defined. I do not know what it is for, but it
does not appear to be vital to playback of the chat. The data size is always 0.


Chat Header
==================================================================================

Most chat files begin with a header containing information about the users
involved in the chat, and the duration of the overall chat.

data type    description
---------    -----------
DWORD        major version
DWORD        minor version
DWORD        number of users in the chat
LV           users
LV           duration
LV           date

The users value is a comma-delimited string, with each user formatted as
"UIN:nickname".

The duration is a string in "hh:nn:ss" notation, as follows:

hh = hours, with leading 0 as needed (00 - 23)
nn = minutes, with leading 0 as needed (00 - 59)
ss = seconds, with leading 0 as needed (00 - 59)

The date is a string in "dddd, mmmm dd, yyyy" notation, as follows:

dddd = unabbreviated day of the week (Monday - Sunday)
mmmm = unabbreviated name of the month (January - December)
dd   = day of the month, with leading 0 as needed (01 - 31)
yyyy = 4-digit year

NOTE - I have noticed that extra hours are sometimes added to the duration.
In other words, a duration of "16:30:00" might actually be only 30 minutes in
length rather than 16 hours and 30 minutes. Sometimes the hours are accurate,
though. I do not know why this happens.


Multi-File Chats
==================================================================================

Some chat files do not represent complete chat sessions. ICQ breaks up large
chat sessions into multiple files. As such, some files are actually a
continuation of a previous chat file. These files do not include a Chat Header
at all. They begin with the Chat Stream immediately (see further below).

Unfortunately, without access to ICQ's database entries regarding available
chats, there is no information present in the chat file to identify which chat
session a continuation file belongs to. The Chat Stream will include UIN
events and User Entered events (see further below) near the beginning of the
stream to specify which users are still active in the chat. Since there is no
Chat Header present, neither the nicknames of the users nor the version of the
chat file is available.
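
For files that do carry a Chat Header, the layout described above could be
read roughly as follows. This is only a sketch in Python, not a reference
implementation. It assumes little-endian byte order and Latin-1 text, neither
of which is stated in this specification, and the names ChatHeader,
read_lv_string, and parse_chat_header are hypothetical. It also assumes that
nicknames contain no commas, since the specification does not say how such
nicknames would be stored.

```python
import struct
from dataclasses import dataclass


@dataclass
class ChatHeader:
    major: int
    minor: int
    user_count: int
    users: dict      # UIN (int) -> nickname (str)
    duration: str    # "hh:nn:ss"
    date: str        # "dddd, mmmm dd, yyyy"


def read_lv_string(buf, pos):
    """Read an LV that holds a String; return (text, next position)."""
    (size,) = struct.unpack_from("<H", buf, pos)  # little-endian WORD (assumed)
    pos += 2
    raw = buf[pos:pos + size]
    # The LV size includes the terminating 0x00, so cut the text there.
    text = raw.split(b"\x00", 1)[0].decode("latin-1")  # encoding is assumed
    return text, pos + size


def parse_chat_header(buf):
    """Parse a Chat Header at the start of buf; return (header, stream offset)."""
    major, minor, user_count = struct.unpack_from("<III", buf, 0)
    pos = 12
    users_text, pos = read_lv_string(buf, pos)
    duration, pos = read_lv_string(buf, pos)
    date, pos = read_lv_string(buf, pos)

    users = {}
    for entry in users_text.split(","):
        if entry:
            # Each entry is "UIN:nickname"; split on the first colon only.
            uin, _, nickname = entry.partition(":")
            users[int(uin)] = nickname
    return ChatHeader(major, minor, user_count, users, duration, date), pos
```

The returned offset is where the Chat Stream begins. Given the NOTE above, the
hours part of the duration string should not be trusted blindly.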

Fortunately, the approximate major version of the chat file can be determined
when the first UIN event (see further below) is encountered, because that
event has a different layout depending on the version of the chat file. If
the data size DWORD is missing from the event's structure header (see further
above), the major version can be assumed to be 5. If the data size DWORD is
present, the major version can be assumed to be 6.


Chat Stream
==================================================================================

Everything after the Chat Header is the data stream. All characters should be
handled/displayed as-is, except for those characters that have special
meaning, as follows:

0x00 - beginning of a Stream Event (see further below for details)
0x0D - start a new line
0x08 - backspace - erase the last character/operation


Extended ASCII Characters
==================================================================================

Extended ASCII characters between 0x80 (128) and 0xFF (255) are usually (but
not always) treated as Stream Events (see further below).

When used as events, extended ASCII characters are preceded by BYTE 0x00, and
have the same structure header as defined for the Extended Data Structures
(see further above). The structure type is the actual character value, and
the data size is always 0.

When not used as events, extended ASCII characters appear by themselves, with
no preceding BYTE 0x00, and no data size DWORD following the character BYTE.


Stream Events
==================================================================================

Within the Chat Stream are embedded structures that describe the various
events that occur during the chat.

Normally, all Stream Events are preceded by BYTE 0x00, to separate them from
the rest of the stream. However, I have noticed that if the major version (?)
in the Chat Header is 5 (or less?), the preceding BYTE 0x00 is sometimes
omitted!
This seems to be the case whenever Stream Events are chained together one
after the other. The first Stream Event in the chain will have the preceding
BYTE 0x00 to begin the sequence, and then each of the trailing events may or
may not have the preceding BYTE 0x00. On the other hand, if the major version
(?) is 6 (or higher?), then every Stream Event is always preceded by BYTE 0x00
as expected.

UIN Stream Events specify which user is the active user. All file data
following a UIN event refers to the specified user until another UIN event is
encountered.

All of the Extended Data Structures that are outlined further above can appear
in the Chat Stream as Stream Events. When they do, they indicate that a change
has occurred in that particular value for the active user, and as such contain
the new value that should be applied from that point on.

All Stream Events begin with the same structure header as defined for the
Extended Data Structures (see further above).

==========================
== New Line (0x0A)
==========================

The active user has begun a new line of text.

==========================
== User Left Chat (0x0B)
==========================

The active user has left the chat.

==========================
== Beep (0x07)
==========================

Play a beep sound.

==========================
== Backspace (0x08)
==========================

Erase the last character of the active user's display.

==========================
== Icon Name (0x14)
==========================

data type    description
---------    -----------
LV           name of icon to display

Display the appropriate Emote, Smiley, or Action icon for the active user. In
ICQ, this name corresponds to an entry found in the Windows Registry, which
contains the full file path to the bitmap file that should be displayed.
This entry is stored in the following key (or a subkey) of the Registry:

HKEY_CURRENT_USER\Software\Mirabilis\ICQ\DefaultPrefs\ChatBmps

This event is usually (but not always) preceded by an Emote (0x1B), Action
(0x1C), or Smiley (0x1D) Stream Event (see further below).

==========================
== Timing (0x19)
==========================

data type    description
---------    -----------
DWORD        offset, in seconds, from the beginning of the chat

This event appears approximately every 10 seconds in the chat. It seems to be
used as a means of playing back the typing of characters at their original
speeds from when the chat was being recorded.

==========================
== LOL (0x1A)
==========================

Play a laughing sound.

==========================
== Emote (0x1B)
==========================

Play an emote sound.

==========================
== Action (0x1C)
==========================

Play an action sound.

==========================
== Smiley (0x1D)
==========================

data type    description
---------    -----------
DWORD        ?

Play a smiley sound.

==========================
== User Entered Chat (0xEE)
==========================

data type    description
---------    -----------
DWORD        size of data, in bytes
DWORD        ?
WORD         ?
1 BYTE       0x00
DWORD        ?
DWORD        ?
BYTE         0x00
Color        font color
WORD         ?
34 BYTEs     name of font face

The user's UIN is specified by a preceding UIN event.

This event is always 64 BYTEs total.

The font face is a String. If the String does not take up the full 34 BYTEs,
the rest of the field will contain random data to fill out the 34 BYTEs. Just
ignore the unused bytes if any exist.

The WORD preceding the font face is not a valid length for the string, so the
two do not form an LV. This is the only place in the chat where a string is
not described with an LV. I do not know why it was set up this way.

==========================
== UIN (0xFF)
==========================

data type    description
---------    -----------
DWORD        UIN of the now-active user

If the major version (?)
of the Chat Header is 5 (or less?), the data size DWORD is not present in the
structure header. The UIN value immediately follows the structure type, and is
always 4 bytes.

If the major version (?) of the Chat Header is 6 (or higher?), the data size
DWORD is always present in the structure header.


Change History
==================================================================================

==========================
== v1.3: March 14, 2006
==========================

The bytes that were previously thought to be alignment padding in the various
structures have been identified as being the actual data size of each
structure.

The first DWORD of the "User Entered Chat" event has been identified as an
extra data size for the event.

A new section was added describing multi-file chats in more detail.

==========================
== v1.1: November 13, 2003
==========================

The starting LV of the Chat Header has been shown to not be an LV after all.
For the moment, it has been separated into two DWORDs, possibly used for
major/minor version values. This is still unconfirmed, but seems to be more
logical.

Additional observations regarding how Stream Events are identified in older
versions of chat files.

There is a change in how UIN Stream Events are handled in older versions.

==========================
== v1.0: January 4, 2002
==========================

Initial Release


Acknowledgements
==================================================================================

I'd like to thank the following people for their contributions to this
project:

Hernan Gonzalez (hgonzal@sinectis.com.ar) for informing me about the fixed
64 BYTE size for the "User Entered Chat" event.

Tarmo Pikaro (tapika@yahoo.com) for providing ICQ99a chat files for study.

~¤ÐåBðñd0ð7~ (UIN 8259083), Black Moons (UIN 16440811) and Victor Anderson
(UIN 14847018) for participating with me in a 4-user chat session to study how
multiple users are handled in a single chat session.

Paolo Claudio (leopietro8@yahoo.com) for providing a chat file that disproved
what I originally thought the first 8 bytes of the chat header stood for.

==================================================================================
Copyright ©2001-2006 Lebeau Software. All Rights Reserved.
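
Illustrative sketch (not part of the original specification): the version
guess described under Multi-File Chats and under the UIN (0xFF) event could be
approximated in Python as shown here. Several things are assumptions of this
sketch rather than statements of the specification: little-endian byte order,
the hypothetical function name guess_major_version, and the heuristic itself,
which treats a leading DWORD equal to 4 as the data size because a UIN event
carries exactly one DWORD of data. The off-by-one sizes noted for version 6
files are not handled.

```python
import struct

UIN_EVENT = 0xFF  # structure type of the UIN Stream Event


def guess_major_version(buf, pos):
    """Guess the major version from a UIN event.

    buf[pos] must be the 0xFF structure type byte (any preceding 0x00 has
    already been skipped). Returns (major version, UIN, next position).
    """
    if buf[pos] != UIN_EVENT:
        raise ValueError("not a UIN event")
    (first,) = struct.unpack_from("<I", buf, pos + 1)
    # Heuristic (assumption): a value of 4 here is taken to be the data size
    # DWORD, which means the version 6 layout is in use.
    if first == 4 and len(buf) >= pos + 9:
        (uin,) = struct.unpack_from("<I", buf, pos + 5)
        return 6, uin, pos + 9
    # Otherwise the UIN follows the structure type directly (version 5 layout).
    return 5, first, pos + 5
```

A reader for headerless continuation files might call this on the first UIN
event it meets and then use the guessed version to decide whether later events
carry a data size DWORD.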