WORKSHEET FILE FORMAT FROM LOTUS APPENDIX B - THE FORMULA COMPILER Copyright(c) 1984, Lotus Development Corporation 161 First Street Cambridge, Massachusetts 02142 (617) 492-7171 Electronic Edition, December, 1984 All Rights Reserved APPENDIX B: The Formula Compiler This appendix describes the internal workings of the formula compiler. The compiler transforms an ASCII string of characters representing a formula to its Reverse Polish code. The basic algorithm utilizes and SR parser (SR = shift and reduce). The aim of the parser is to apply a set of reduction rules which embody the syntax of the compiler to an input string. Formula code is compiled to a temporary buffer. Lexicon Analysis A lexical analyzer breaks up the input string into lexical units called tokens. A token is a substring of the original input string operand, operator, or special symbol (such as comma, parentheses, etc.) In addition, the lexical analyser supplies two special tokens, "beginning of formula" (boform) and "end of formula" (eoform), to facilitate the compilation process. The lexical analyzer identifies and processes literals (both number and string), cell and range references, operators, and function calls. It assigns a unique code to each distinct operator, function, or type of operand. A function with no arguments is treated like a number. Syntax Analysis The syntactical analysis of a formula is accomplished by processing a list of tokens in left-to-right order. A stack called the syntax is also used during the syntactical scan. The basic algorithm is as follows: Repeat the following steps: 1) Get the next token 2) If the token is a literal or cell reference: a) Push the number code on the syntax stack b) Push the number code on the syntax stack 3) If the token is a range reference: a) Compile code to push the range reference b) Push the range code on the syntax stack 4) Otherwise push the token code for the token on the syntax stack. For each syntax rule, if the pattern on the top of the syntax matches the rule pattern take the action associated with the rule and start scanning from the beginning for any additional rules which may apply. When a token code is pushed on the syntax stack, an additional word of zeros is also pushed on the stack. This is used when compiling function calls to hold the function's argument count. Rule Matching A relatively small number of rules are used to process formulas of arbitrary complexity. If a rule matches the top of the syntax stack, then the compiler takes a specific action and rule scanning starts again with the first rule. Each rule matches certain patterns on the syntax stack. A typical rule might be: if the top of the stack is the token for right parenthesis, and the next-to-top is a number, and the second form the top is a left parenthesis, then pop the top three items from the syntax stack and push the number on the syntax stack. This rule can be more succinctly represented as: Stack Before After Action ) number ( number none The Rules The following are the syntax rules used to process formulas. Note that the order of the rules is important. The rules for compilation of operators used additional tables which assign a precedence number and opcode to each legal unary and binary operator. Thus, for example, there is a single token code for minus sign (-), but there are two opcodes one for unary minus and one for binary minus. In addition, these two operators, while lexically identical, also have different precedence. In general, operators of higher precedence will be performed before operators of lower precedence are performed left-to-right. All special operators (boform, eoform, parentheses, comma, etc.) are implicitly assigned a precedence of zero. Rule 1 Termination test Stack Before After Action eoform Output a return code to compile buffer number Return, indicating successful compile boform Rule 2 Function argument processing Stack Before After Action ' Error if range argument illegal for number or range function. ( ( Increment argument count on stack function function Rule 3 Process final function argument Stack Before After Action ) Error if range argument illegal for number or range function. ( Increment argument count on stack function number Compile function opcode If list function, compile argument count; otherwise error is wrong argument count. Rule 4 Parenthesis removal Stack Before After Action ) Compile parenthesis opcode number ( number operator operator Rule 5 Binary operators Stack Before After Action op2 If binary op na 3 < = na 3 > = na 3 < na 3 > na 3 #and# na 1 #or# na 1 #not# 2 na Example: Using the above rules, we can now see how a particular formula is compiled. Let us consider the following formula: 3+5*6 This is broken up by the lexical analyzer into seven tokens. boform 3 + 5 * 6 eoform The syntax scans proceed as follows until a matching rule is found: Stack boform number + number boform number + boform number boform Compile buffer push 3 push 3 push 3 push 5 At this point, rule 5 is invoked, but since the precedence of boform is zero, no action is taken. Stack * number number * + number number + boform number boform Compile buffer push 3 push 3 push 5 push 5 push 6 At this point, since the binary precedence of + is lower than the binary precedence of *, rule 5 does apply, and the opcode for * is compiled. The stack is reduced by replacing number * number by number and scan is made, but no further rule applies. Stack number eoform + number number + boform number boform Compile buffer push 3 push 3 push 5 push 5 push 6 push 6 Rule 5 applies again, and the opcode for + is compiled, reducing the stack to boform, number, eoform. Rescanning finds a match on rule 1 which compiles a return opcode and terminates. The final compiled code is thus: push 3 push 5 push 6 * + return A Note on the Decompiler The algorithm for the formula decompiler was taken verbatim from: Writing Interactive Compilers and Interpreters, P.J. Brown, John Wiley and Sons, 1979. See chapter 6.2. The algorithm itself is described on pages 216 and 217. This algorithm is also described in the following article. More on the Re-creation of Source Code from Reverse Polish, P.J. Brown, Software Practice and Experience, Vol 7, 545-551 (1977). WORKSHEET COLUMN DESIGNATORS Most records within the 1-2-3 Condensed Worksheet format are specified with column/row designators (for example, column 0, row 0 equals A1). When determining the column designator, the table below will help make conversion easier. Column Hex Dec Column Hex Dec Column Hex Dec A 0 1 BA 34 52 DA 68 104 B 1 1 BB 35 53 DB 69 105 C 2 2 BC 36 54 DC 6A 106 D 3 3 BD 37 55 DD 6B 107 E 4 4 BE 38 56 DE 6C 108 F 5 5 BF 39 57 DF 6D 109 G 6 6 BG 3A 58 DG 6E 110 H 7 7 BH 3B 59 DH 6F 111 I 8 8 BI 3C 60 DI 70 112 J 9 9 BJ 3D 61 DJ 71 113 K A 10 BK 3E 62 DK 72 114 L B 11 BL 3F 63 DL 73 115 M C 12 BM 40 64 DM 74 116 N D 13 BN 41 65 DN 75 117 O E 14 BO 42 66 DO 76 118 P F 15 BP 43 67 DP 77 119 Q 10 16 BQ 44 68 DQ 78 120 R 11 17 BR 45 69 DR 79 121 S 12 18 BS 46 70 DS 7A 122 T 13 19 BT 47 71 DT 7B 123 U 14 20 BU 48 72 DU 7C 124 V 15 21 BV 49 73 DV 7D 125 W 16 22 BW 4A 74 DW 7E 126 X 17 23 BX 4B 75 DX 7F 127 Y 18 24 BY 4C 76 DY 80 128 Z 19 25 BZ 4D 77 DZ 81 129 AA 1A 26 CA 4E 78 EA 82 130 AB 1B 27 CB 4F 79 EB 83 131 AC 1C 28 CC 50 80 EC 84 132 AD 1D 29 CD 51 81 ED 85 133 AE 1E 30 CE 52 82 EE 86 134 AF 1F 31 CF 53 83 EF 87 135 AG 20 32 CG 54 84 EG 88 136 AH 21 33 CH 55 85 EH 89 137 AI 22 34 CI 56 86 EI 8A 138 AJ 23 35 CJ 57 87 EJ 8B 139 AK 24 36 CK 58 88 EK 8C 140 AL 25 37 CL 59 89 EL 8D 141 AM 26 38 CM 5A 90 EM 8E 142 AN 27 39 CN 5B 91 EN 8F 143 AO 28 40 CO 5C 92 EO 90 144 AP 29 41 CP 5D 93 EP 91 145 AQ 2A 42 CQ 5E 94 EQ 92 146 AR 2B 43 CR 5F 95 ER 93 147 AS 2C 44 CS 60 96 ES 94 148 AT 2D 45 CT 61 97 ET 95 149 AU 2E 46 CU 62 98 EU 96 150 AV 2F 47 CV 63 99 EV 97 151 AW 30 48 CW 64 100 EW 98 152 AX 31 49 CX 65 101 EX 99 153 AY 32 50 CY 66 102 EY 9A 154 AZ 33 51 CZ 67 103 EZ 9B 155 (CONTINUED) Column Hex Dec Column Hex Dec FA 9C 156 HA DO 208 FB 9D 157 HB D1 209 FC 9E 158 HC D2 210 FD 9F 159 HD D3 211 FE AO 160 HE D4 212 FF A1 161 HF D5 213 FG A2 162 HG D6 214 FH A3 163 HH D7 215 FI A4 164 HI D8 216 FJ A5 165 HJ D9 217 FK A6 166 HK DA 218 FL A7 167 HL DB 219 FM A8 168 HM DC 220 FN A9 169 HN DD 221 FO AA 170 HO DE 222 FP AB 171 HP DF 223 FQ AC 172 HQ EO 224 FR AD 173 HR E1 225 FS AE 174 HS E2 226 FT AF 175 HT E3 227 FU BO 176 HU E4 228 FV B1 177 HV E5 229 FW B2 178 HW E6 230 FX B3 179 HX E7 231 FY B4 180 HY E8 232 FZ B5 181 HZ E9 233 GA B6 182 IA EA 234 GB B7 183 IB EB 235 GC B8 184 IC EC 236 GD B9 185 ID ED 237 GE BA 186 IE EE 238 GF BB 187 IF EF 239 GG BC 188 IG FO 240 GH BD 189 IH F1 241 GI BE 190 II F2 242 GJ BF 191 IJ F3 243 GK CO 192 IK F4 244 GL C1 193 IL F5 245 GM C2 195 IM F6 246 GN C3 195 IN F7 247 GO C4 196 IO F8 248 GP C5 197 IP F9 249 GQ C6 198 IQ FA 250 GR C7 199 IR FB 251 GS C8 200 IS FC 252 GT C9 201 IT FD 253 GU CA 202 IU FE 254 GV CB 203 IV FF 255 GW CC 204 GX CD 205 GY CE 206 GZ CF 207 ANALYSIS OF 1-2-3 WORKSHEET FILE The worksheet shown below was created in 1-2-3 and saved to disk. Key: A2..A5 Named Range (code 11) EXAMPLE A2: Label (code 15) 100 A3: Integer (code 13) 12.5 A4: Number (code 14) 87.5 A5: Formula (+A3-A4) (code 16) The example shown below is a partial hex dump of this worksheet file. By reading each record header, you can determine the type of record you are encountering. The record header will also tell you the length of that follows the header. By analyzing the record header, you can read the records you want and skip unrelated records. 362B:0100 06 00 08 00 00 00 00 00 00 00 362B:0110 04 00 2F 00 01 00 01 02 00 01 00 FF 03 00 01 00 362B:0120 00 04 00 01 00 00 05 00 01 00 FF 07 00 1F 00 00 362B:0130 00 01 00 71 00 09 00 08 00 14 00 00 00 00 00 00 362B:0140 00 00 00 00 00 00 00 04 00 04 00 48 00 00 0B 00 362B:0150 18 00 54 45 53 54 00 00 00 00 00 00 00 00 00 00 362B:0160 00 00 00 00 01 00 00 00 04 00 18 00 19 00 00 FF 362B:0170 FF 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF 362B:0180 362B:05C0 362B:05D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 362B:05E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 362B:05F0 00 00 00 00 71 71 01 00 0F 00 0E 00 FF 00 00 01 362B:0600 00 27 45 58 41 4D 50 4C 45 00 0D 00 07 00 FF 00 362B:0610 00 02 00 64 00 362B:0620 10 00 1B 00 FF 00 00 04 00 00 362B:0630 00 00 00 00 E0 55 40 0C 00 01 00 80 FE BF 01 00 362B:0640 80 FF BF 0A 03