OEM Manual Adaptation

3. Extended Character Support

Extended character support (ECS) allows GW-BASIC to be used with the Kanji, Hangul, and Chinese character sets. Double-byte characters occupy two character positions, so the cursor and line-wrap behavior of the full screen editor differs slightly from that of some other editors. Normally, the cursor lies between the position marked by the cursor and the position immediately to its left. On a double-byte character, however, the cursor is placed to the left of the entire character, whether the cursor marker is on the first or the second position that the character occupies. This can slightly affect insert and delete operations.

In addition, if you try to type a double-byte character at the last position on a physical line, the entire character is wrapped and a blank is placed at the last position on the line. The blank is not included in the logical line, however.
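The cursor and wrap rules above can be illustrated with a small model. The following Python sketch is hypothetical: the names wrap_physical and cursor_column are inventions, a line is assumed to be given as a list of characters (one byte for a single-byte character, two bytes for a double-byte character), and columns are counted from 0. It is not how the GW-BASIC editor is implemented.

```python
def wrap_physical(chars: list[bytes], width: int) -> list[bytes]:
    """Split one logical line into physical screen lines of `width` columns.

    Each element of `chars` is one character: 1 byte (single-byte) or
    2 bytes (double-byte). Hypothetical model, not GW-BASIC's own code.
    """
    lines: list[bytes] = []
    current = bytearray()
    for ch in chars:
        if len(current) + len(ch) > width:
            # A double-byte character would straddle the line edge: fill the
            # last column with a display-only blank and wrap the whole character.
            if len(current) < width:
                current += b" "
            lines.append(bytes(current))
            current = bytearray()
        current += ch
    lines.append(bytes(current))
    return lines


def cursor_column(chars: list[bytes], marker: int) -> int:
    """Return the 0-based column the cursor sits before when the marker is at `marker`.

    On a double-byte character the cursor snaps to the character's first column.
    """
    start = 0
    for ch in chars:
        if start <= marker < start + len(ch):
            return start
        start += len(ch)
    return marker  # past the end of the line: no adjustment


K = b"\x82\x60"  # one double-byte character (assumed Shift JIS full-width "A")
print(wrap_physical([b"a", b"b", b"c", K], 4))  # the K wraps; "abc " gets a blank
print(cursor_column([b"a", K, b"b"], 2))        # 1
```

The padding blank exists only in the returned physical lines; the list passed in, which stands for the logical line, is not changed.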

In addition to these editor differences, extended character support makes the following changes to the GW-BASIC language.


3.1 Statement and Function Additions

This section documents statements and functions, beyond those of standard GW-BASIC, that support national-language character sets and other extended character capabilities.


CDBL$ Function

Purpose:

Converts single-byte ASCII characters in the string x$ to their double-byte equivalents.

Syntax:

CDBL$(x$)

Comments:

This function converts every single-byte ASCII character in x$ that is in the set 0-9, a-z, A-Z, plus (+), and minus (-) to its double-byte equivalent character.
The resulting string is one byte longer for every character that is converted.

If the result is longer than 255 bytes, a "String too long" error results.

Example:

a represents a single-byte ASCII character that has no equivalent double-byte character
k represents a single-byte ASCII character that has an equivalent double-byte character
K represents a double-byte character

PRINT CDBL$(akakKaa)

    Output:

    aKaKKaa

Extended Character Support:

This function is supported only in double-byte versions of GW-BASIC.
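The conversion can be sketched in Python as below. This is a model under stated assumptions, not the GW-BASIC implementation: it assumes the Japanese MSKDC code set corresponds to Microsoft's Shift JIS variant (Python's cp932 codec), treats bytes 0x81-0x9F and 0xE0-0xFC as lead bytes of double-byte characters, and uses the hypothetical names is_lead_byte and cdbl. The full-width forms of ASCII 0x21-0x7E occupy Unicode U+FF01-U+FF5E, so adding 0xFEE0 to an ASCII code gives its full-width counterpart.

```python
CONVERTIBLE = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+-"


def is_lead_byte(b: int) -> bool:
    """True if b starts a double-byte character (assumed Shift JIS lead byte)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def cdbl(x: bytes) -> bytes:
    """Model of CDBL$: convert 0-9, a-z, A-Z, +, - to full-width equivalents."""
    out = bytearray()
    i = 0
    while i < len(x):
        b = x[i]
        if is_lead_byte(b) and i + 1 < len(x):
            out += x[i:i + 2]                # already double-byte: copy both bytes
            i += 2
            continue
        if b in CONVERTIBLE:
            # ASCII 0x21-0x7E maps to full-width U+FF01-U+FF5E; cp932 gives 2 bytes.
            out += chr(b + 0xFEE0).encode("cp932")
        else:
            out.append(b)                    # no double-byte equivalent
        i += 1
    if len(out) > 255:
        raise ValueError("String too long")  # GW-BASIC strings hold 255 bytes
    return bytes(out)


print(len(cdbl(b"A1*")))  # 5: two converted characters, one unchanged
```

Skipping over existing double-byte characters matters because a Shift JIS trail byte can fall in the printable ASCII range and must not be converted on its own.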



CSNG$ Function

Purpose:

Converts double-byte characters in stringexpression to their single-byte ASCII equivalents.

Syntax:

CSNG$(stringexpression)

Comments:

This function converts every double-byte character in stringexpression that is in the set 0-9, a-z, A-Z, plus (+), and minus (-) to its single-byte ASCII equivalent character.
The resulting string is one byte shorter for every character that is converted.

Example:

a represents a single-byte ASCII character
k represents a double-byte character that has an equivalent single-byte ASCII character
K represents a double-byte character that has no equivalent single-byte ASCII character

PRINT CSNG$(akakKaa)

    prints the following:

    aaaaKaa

Extended Character Support:

Only double-byte versions of GW-BASIC support this function.
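The reverse conversion can be sketched the same way. This Python model is hypothetical (csng and is_lead_byte are invented names) and assumes MSKDC corresponds to Shift JIS as implemented by Python's cp932 codec; a double-byte character is converted only if it decodes to the full-width form of a character in the set 0-9, a-z, A-Z, plus, or minus.

```python
CONVERTIBLE = b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+-"


def is_lead_byte(b: int) -> bool:
    """True if b starts a double-byte character (assumed Shift JIS lead byte)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def csng(x: bytes) -> bytes:
    """Model of CSNG$: convert full-width 0-9, a-z, A-Z, +, - back to ASCII."""
    out = bytearray()
    i = 0
    while i < len(x):
        if is_lead_byte(x[i]) and i + 1 < len(x):
            pair = x[i:i + 2]
            try:
                ch = pair.decode("cp932")
            except UnicodeDecodeError:
                ch = ""                      # unassigned code: leave as is
            code = ord(ch) - 0xFEE0 if len(ch) == 1 else -1
            if 0 <= code < 0x80 and code in CONVERTIBLE:
                out.append(code)             # one byte shorter per conversion
            else:
                out += pair                  # no single-byte equivalent
            i += 2
        else:
            out.append(x[i])                 # single-byte: copied unchanged
            i += 1
    return bytes(out)


print(csng(b"\x82\x60\x82\x50"))  # b'A1'
```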



JIS$ Function

Purpose:

Converts the first character of the string expression to its JIS representation.

Syntax:

JIS$(x$)

Comments:

The JIS$ function is used only with versions of GW-BASIC that provide double-byte character support.

If the first character of the string expression is a double-byte character, JIS$ returns a four-byte string containing the ASCII hexadecimal digits of the JIS code for the character.

If the first character of the string expression is not a double-byte character, JIS$ returns a three-byte string containing the ASCII hexadecimal digits of the ASCII code for the character.

Example:

The following line sets J$ to a string containing the four ASCII digits 215F:

10 J$ = JIS$(CHR$(&H81)+CHR$(&H7E))

The following line sets J$ to a string containing the three ASCII digits 041:

10 J$ = JIS$("A")

Extended Character Support:

This function is supported only in double-byte versions.
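The conversion behind JIS$ can be sketched with the standard arithmetic mapping from Shift JIS byte pairs to JIS X 0208 codes. This Python model is hypothetical: it assumes MSKDC is Shift JIS, uses invented names (is_lead_byte, sjis_to_jis, jis_dollar), and raises ValueError for an empty string, a case the text does not describe. It reproduces both examples above.

```python
def is_lead_byte(b: int) -> bool:
    """True if b is a Shift JIS lead byte (assumed MSKDC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def sjis_to_jis(s1: int, s2: int) -> tuple[int, int]:
    """Convert a Shift JIS byte pair to a JIS X 0208 byte pair."""
    offset = 0x70 if s1 <= 0x9F else 0xB0
    if s2 < 0x9F:
        # Odd JIS row; trail bytes skip 0x7F, so adjust above it.
        j1 = (s1 - offset) * 2 - 1
        j2 = s2 - 0x1F if s2 < 0x7F else s2 - 0x20
    else:
        # Even JIS row.
        j1 = (s1 - offset) * 2
        j2 = s2 - 0x7E
    return j1, j2


def jis_dollar(x: bytes) -> str:
    """Model of JIS$: hexadecimal digits of the first character's code."""
    if not x:
        raise ValueError("empty string")  # assumed; not described in the text
    if len(x) >= 2 and is_lead_byte(x[0]):
        j1, j2 = sjis_to_jis(x[0], x[1])
        return f"{j1:02X}{j2:02X}"       # four hexadecimal digits
    return f"{x[0]:03X}"                 # three hexadecimal digits


print(jis_dollar(b"\x81\x7e"))  # 215F
print(jis_dollar(b"A"))         # 041
```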



KLEN Function

Purpose:

Returns the number of characters in the specified string expression.

Syntax:

KLEN(x$)

Comments:

KLEN works exactly like LEN, except that it counts characters instead of bytes. Note that LEN(x$) minus KLEN(x$) is equal to the number of double-byte characters in x$.

Extended Character Support:

This function is supported only in double-byte versions of GW-BASIC.
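The difference between LEN and KLEN can be sketched by splitting a byte string into characters. This Python model is hypothetical (split_chars and klen are invented names), assumes MSKDC lead bytes are the Shift JIS ranges 0x81-0x9F and 0xE0-0xFC, and assumes that a lead byte in the last position counts as one character, which the text does not specify.

```python
def is_lead_byte(b: int) -> bool:
    """True if b is a Shift JIS lead byte (assumed MSKDC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_chars(x: bytes) -> list[bytes]:
    """Split x into characters: 2 bytes for a double-byte character, else 1."""
    chars = []
    i = 0
    while i < len(x):
        # Assumption: a lead byte in the last position counts as one character.
        width = 2 if is_lead_byte(x[i]) and i + 1 < len(x) else 1
        chars.append(x[i:i + width])
        i += width
    return chars


def klen(x: bytes) -> int:
    """Model of KLEN: number of characters rather than bytes."""
    return len(split_chars(x))


x = b"a\x82\x60a\x82\x61\x82\x62aa"      # 'aKaKKaa' with three double-byte K's
print(len(x), klen(x), len(x) - klen(x))  # 10 7 3
```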



KPOS Function

Purpose:

Returns the byte position in the string expression x$ at which the character specified by character number begins.

Syntax:

KPOS(x$,character number)

Comments:

If x$ contains fewer characters than character number, KPOS returns 0.

If the last byte of the string expression is the first byte of a double-byte character (in this case, the second byte is missing), that byte is ignored.

Example:

The variable x$ contains the string 'aKaKKaa', where 'a' represents an ASCII character and 'K' represents a two-byte MSKDC double-byte character. Then:

   KPOS (x$, 1) = 1

(true for all x$ except the null string)

   KPOS (x$, 2) = 2
   KPOS (x$, 3) = 4
   KPOS (x$, 4) = 5
   KPOS (x$, 5) = 7
   KPOS (x$, 6) = 9
   KPOS (x$, 7) = 10
   KPOS (x$, 8) = 0

KPOS (x$, 8) = 0 since the string contains only 7 characters.

Extended Character Support:

Only double-byte versions of GW-BASIC support this function.
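The KPOS table above can be reproduced with a short byte-scanning model. This Python sketch is hypothetical (kpos is an invented name) and assumes MSKDC lead bytes are the Shift JIS ranges 0x81-0x9F and 0xE0-0xFC; it returns 1-based byte positions and ignores a trailing lead byte whose second byte is missing, as the Comments describe.

```python
def is_lead_byte(b: int) -> bool:
    """True if b is a Shift JIS lead byte (assumed MSKDC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def kpos(x: bytes, n: int) -> int:
    """Model of KPOS: 1-based byte position where character n begins, or 0."""
    count = 0
    i = 0
    while i < len(x):
        lead = is_lead_byte(x[i])
        if lead and i + 1 >= len(x):
            break  # a trailing lead byte with no second byte is ignored
        count += 1
        if count == n:
            return i + 1
        i += 2 if lead else 1
    return 0


x = b"a\x82\x60a\x82\x61\x82\x62aa"       # 'aKaKKaa'
print([kpos(x, n) for n in range(1, 9)])  # [1, 2, 4, 5, 7, 9, 10, 0]
```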



KTN$ Function

Purpose:

Converts the first character of the specified string expression from MSKDC to KUTEN representation.

Syntax:

KTN$(string)

Comments:

If the first character of the string expression is a Kanji character, KTN$ returns a four-byte string containing the ASCII digits of the KUTEN (row and cell) code for the character.
For example, if the first character of x$ is the Kanji character whose KUTEN code is 0132,

   KTN$(x$)

returns a string containing the four ASCII digits 0132.

If the first character of the string expression is not a Kanji character, KTN$ returns a three-byte string containing the decimal ASCII digits of the ASCII code of the character.
For example:

   KTN$("A")

returns a string containing the three ASCII digits 065.


Extended Character Support:

Only double-byte versions of GW-BASIC support this function.
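KTN$ can be modelled by converting the Shift JIS pair to JIS X 0208 and subtracting 0x20 from each byte, which gives the decimal row (ku) and cell (ten) numbers. This Python sketch is hypothetical: it assumes MSKDC is Shift JIS, treats every double-byte character (not only Kanji) the same way, uses invented names, and raises ValueError for an empty string, which the text does not cover. Under these assumptions, KUTEN 0132 corresponds to the Shift JIS pair 81 5F.

```python
def is_lead_byte(b: int) -> bool:
    """True if b is a Shift JIS lead byte (assumed MSKDC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def sjis_to_jis(s1: int, s2: int) -> tuple[int, int]:
    """Convert a Shift JIS byte pair to a JIS X 0208 byte pair."""
    offset = 0x70 if s1 <= 0x9F else 0xB0
    if s2 < 0x9F:
        j1 = (s1 - offset) * 2 - 1
        j2 = s2 - 0x1F if s2 < 0x7F else s2 - 0x20
    else:
        j1 = (s1 - offset) * 2
        j2 = s2 - 0x7E
    return j1, j2


def ktn_dollar(x: bytes) -> str:
    """Model of KTN$: decimal KUTEN digits, or decimal ASCII code digits."""
    if not x:
        raise ValueError("empty string")  # assumed; not described in the text
    if len(x) >= 2 and is_lead_byte(x[0]):
        j1, j2 = sjis_to_jis(x[0], x[1])
        ku, ten = j1 - 0x20, j2 - 0x20   # JIS bytes 0x21-0x7E -> 1-94
        return f"{ku:02d}{ten:02d}"      # four decimal digits
    return f"{x[0]:03d}"                 # three decimal digits


print(ktn_dollar(b"A"))         # 065
print(ktn_dollar(b"\x81\x5f"))  # 0132
```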


3.2 Statement and Function Differences

The following entries describe differences in the GW-BASIC language that arise from the implementation of extended character support.


ASC Function


For implementations that support double-byte characters, the ASC function behaves as follows. If the first character of the string is an MSKDC double-byte character, ASC returns the MSCDC representation of the character. If the first character of the string is not a double-byte character, ASC returns the code of the first byte of the string, just as it does in non-double-byte implementations of GW-BASIC. Because a double-byte character always occupies two bytes, a string of length one is always treated as a non-double-byte character, regardless of the code it contains.

For example,

ASC("ABC")

returns 65 (the ASCII code for A), since the first character of the string is not a double-byte character. Similarly,

ASC(CHR$(&H81) + CHR$(&H7E))

returns 318, the MSCDC representation of the double-byte character whose MSKDC representation is hexadecimal 817E (JIS code 215F).



CHR$ Function


For implementations that support double-byte characters, the syntax for this function is:

CHR$(n [,n...])

You may specify multiple arguments. Any argument greater than 255 is treated as an MSCKC code and mapped to the corresponding double-byte MSKDC character. For example, CHR$(318) returns a two-byte string containing the hexadecimal codes 81 and 7E.



DATA Statement


DATA statements support double-byte characters.



INKEY$ Function


If the implementation supports double-byte characters, INKEY$ returns a two-byte string containing the MSKDC representation of a double-byte character. For other characters, INKEY$ returns a one-byte string containing the ASCII code.



INPUT Statement


In implementations that support double-byte characters, INPUT variables can contain double-byte characters.



INPUT# Statement

Purpose:

Reads data items from a sequential device or file and assigns them to program variables.

Syntax:

INPUT# filenumber, variable 1 [, variable 2]...

Comments:

The filenumber is the number used when the file was opened for input. The variable list contains the names of the variables that are assigned the items in the file; the type of each item must match the type specified by its variable name. Unlike INPUT, INPUT# does not print a question mark.

The data items in the file should appear just as they would if you were entering data in response to an INPUT statement. With numeric values, leading spaces, carriage returns, and linefeeds are ignored. The first character encountered that is not a space, carriage return, or linefeed is assumed to be the start of a number. The number terminates on a space, carriage return, linefeed, or comma.

If GW-BASIC is scanning the sequential data file for a string item, it will also ignore leading spaces, carriage returns, and linefeeds. The first character encountered that is not a space, carriage return, or linefeed is assumed to be the start of a string item. If this first character is a quotation mark ("), the string item will consist of all characters read between the first quotation mark and the second. This means a quoted string may not contain a quotation mark as a character. If the first character of the string is not a quotation mark, the string is an unquoted string, and will terminate on a comma, carriage return, or linefeed (or after 255 characters have been read). If end-of-file is reached when a numeric or string item is being INPUT, the item is terminated.

Example:

INPUT#2, A, B, C

Extended Character Support:

In implementations that support double-byte characters, the list of variables can contain double-byte characters.
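The scanning rules described above can be modelled as a small byte-level tokenizer. The Python sketch below is hypothetical: read_item and its interface are inventions, it works on an in-memory byte string rather than an open file, it counts the 255-character limit in bytes, and it assumes that after each item the reader skips trailing spaces and one separating comma, a detail the text does not spell out. Because Shift JIS (assumed MSKDC) lead and trail bytes all fall at 0x40 or above, none of them can be mistaken for a space, quotation mark, comma, carriage return, or linefeed, so double-byte characters pass through such a scanner unchanged.

```python
def read_item(data: bytes, pos: int, numeric: bool) -> tuple[bytes, int]:
    """Read one INPUT# item from data[pos:]; return (item, next position)."""
    # Skip leading spaces, carriage returns, and linefeeds.
    while pos < len(data) and data[pos] in b" \r\n":
        pos += 1
    start = pos
    if numeric:
        # A number ends on a space, carriage return, linefeed, comma, or EOF.
        while pos < len(data) and data[pos] not in b" \r\n,":
            pos += 1
        item = data[start:pos]
    elif pos < len(data) and data[pos] == ord('"'):
        # Quoted string: everything up to the next quotation mark (or EOF).
        end = data.find(b'"', pos + 1)
        if end == -1:
            end = len(data)
        item = data[pos + 1:end]
        pos = end + 1
    else:
        # Unquoted string: ends on a comma, CR, LF, 255 bytes, or EOF.
        while (pos < len(data) and data[pos] not in b",\r\n"
               and pos - start < 255):
            pos += 1
        item = data[start:pos]
    # Assumption: skip trailing spaces and one separating comma.
    while pos < len(data) and data[pos] == ord(" "):
        pos += 1
    if pos < len(data) and data[pos] == ord(","):
        pos += 1
    return item, pos


data = b' 12, "a,b", xyz\r\n'
num, pos = read_item(data, 0, numeric=True)    # b'12'
s1, pos = read_item(data, pos, numeric=False)  # b'a,b'
s2, pos = read_item(data, pos, numeric=False)  # b'xyz'
```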



INPUT$ Function

For implementations that support double-byte characters, you can use the following syntax:

INPUT¥(number of characters, [#]filenumber)

This function performs the same action as INPUT$, except that INPUT¥ counts each MSKDC double-byte character as a single character. The returned string therefore contains the specified number of characters, but it contains more bytes than characters if double-byte characters are encountered in the data. If the result is longer than 255 bytes, a "String too long" error results. To avoid this error, specify a number of characters less than or equal to 127; even if every character is double-byte, 127 characters occupy at most 254 bytes.
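The character counting can be sketched as follows. This Python model is hypothetical: input_yen is an invented name, it takes characters from an in-memory byte string instead of reading from an open file or device, and it assumes MSKDC lead bytes are the Shift JIS ranges 0x81-0x9F and 0xE0-0xFC.

```python
def is_lead_byte(b: int) -> bool:
    """True if b is a Shift JIS lead byte (assumed MSKDC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def input_yen(data: bytes, count: int) -> bytes:
    """Model of INPUT¥: take `count` characters (not bytes) from data."""
    out = bytearray()
    i = 0
    taken = 0
    while taken < count and i < len(data):
        # A double-byte character counts once but contributes two bytes.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        out += data[i:i + width]
        i += width
        taken += 1
    if len(out) > 255:
        raise ValueError("String too long")
    return bytes(out)


print(len(input_yen(b"\x82\x60" * 200, 127)))  # 254
```

With 128 double-byte characters the result would need 256 bytes, so the model raises the error, which is why 127 is a safe upper bound.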



PRINT USING Statement

For implementations that support double-byte characters, some of the PRINT USING formatting characters differ from those used in other versions of GW-BASIC. The differences are:


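Double-Byte Version   Non-Double-Byte Version   Use
&                     \                         Delimits a fixed-length string field
@                     &                         Variable-length string field
¥¥                    $$                        Floating currency sign
**¥                   **$                       Asterisk fill with floating currency sign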



SCREEN Function

For implementations that support double-byte characters, if the character at the specified position is a double-byte character, the SCREEN function returns its MSCKC code.



STRING$ Function

For implementations that support double-byte characters, the character code (the second argument) or the first character of x$ may specify a double-byte character. In this case, the returned string is twice as long as it would be for a single-byte character. For example,

STRING$(5, 318)
returns a ten-byte string containing five copies of the double-byte character represented by MSCDC 318.
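The string form of STRING$ can be sketched as below. This Python model is hypothetical: string_dollar is an invented name, it assumes MSKDC lead bytes are the Shift JIS ranges 0x81-0x9F and 0xE0-0xFC, and it raises ValueError for an empty x$, a case not described here. The numeric form is not modelled because this section gives only one MSCDC-to-MSKDC pair (318 and hexadecimal 81 7E), so the model passes the two bytes directly.

```python
def is_lead_byte(b: int) -> bool:
    """True if b is a Shift JIS lead byte (assumed MSKDC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def string_dollar(n: int, x: bytes) -> bytes:
    """Model of STRING$(n, x$): n copies of the first character of x$."""
    if not x:
        raise ValueError("empty string")  # assumed; not described in the text
    # A double-byte first character contributes both of its bytes.
    width = 2 if len(x) >= 2 and is_lead_byte(x[0]) else 1
    return x[:width] * n


print(len(string_dollar(5, b"\x81\x7e")))  # 10
```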



VAL Function

For implementations that support double-byte characters, the VAL string argument may contain double-byte character codes.