BASIC Variable Format
From Bitchin100 DocGarden
Per Ron Wiesen, from a discussion on the Bitchin100 Mailing List:
Variable Format - (RRW)

For M100:

The format of BASIC variables as they appear in the variable table is shown here. Where BASIC encounters a variable name, it searches the variable table for the variable name of the appropriate variable type: numeric double-precision, numeric single-precision, numeric integer, or string. Where the name is not found, BASIC expands the table and appends the variable at the end of the variable table (i.e., defines the previously undefined variable). Where the name is found, the variable content follows in the case of numeric variable types and a descriptor of content follows in the case of the string variable type.

Each variable defined within the variable table consumes a 3-byte header and from 2 to 8 more bytes depending on the variable type. The header contains a "Type" byte and a "Name" in 2-character form. The value of "Type" also serves as a skip-count (relative to first byte of content) to the next (if any) header.

                     +-----------+----------+----------+-----------+
                     |    DP     |    SP    | Integer  |  String   |
                     |  Format   |  Format  |  Format  |  Format   |
---------------------+-----------+----------+----------+-----------+
Byte 0  Type         |     8     |    4     |    2     |     3     |
Byte 1  Name, 1st character                                        |
Byte 2  Name, 2nd character - NUL for 1-character names            |
---------------------+-----------+----------+----------+-----------+
Byte 3 <-<(VARPTR)   |   S & E   |  S & E   |   LSB    |    Len    |
Byte 4               |   BCD M   |  BCD M   |   MSB    | Addr Low  |
Byte 5               |   BCD #   |  BCD #   |    -     | Addr High |
Byte 6               |   BCD #   |  BCD L   |    -     |     -     |
Byte 7               |   BCD #   |    -     |    -     |     -     |
Byte 8               |   BCD #   |    -     |    -     |     -     |
Byte 9               |   BCD #   |    -     |    -     |     -     |
Byte 10              |   BCD L   |    -     |    -     |     -     |
---------------------+-----------+----------+----------+-----------+

LSB   = Least significant byte of integer.
MSB   = Most significant byte of integer. Bit 7 contains the sign of the integer.
BCD L = Least significant BCD byte, contains least significant pair of 4-bit BCD digits.
BCD M = Most significant BCD byte, contains most significant pair of 4-bit BCD digits.
BCD # = Middle BCD bytes. Each digit of the number is represented by one of the 4-bit values in the two nibbles in each byte.
S & E = Sign and exponent of each number. Bit 7 contains the sign of the floating point number. Bits 0-5 determine where the decimal point is to be inserted. For example, if this byte contained 65, the sign would be positive (value 64) and the decimal point would be placed after the 1st digit (value 1), and before the second digit (#.############# in DP Format or #.##### in SP Format). The purpose of bit 6 is unknown, but it may be a marker for the "currently selected" variable.
Addr  = Address of string content. This can point to a string constant within a BASIC program statement (e.g., V$ = "constant") or it can point within the BASIC string area (e.g., V$ = SPACE$(2)).
Len   = Length of string. The LEN(var$) function returns this value.

- Array Format - (RRW)

For M100:

SV$(1,2,3) is an example of a subscripted variable name of the string type [the name is SV, type identifier $ means string]. Element (1,2,3) belongs to a set of variables which are organized into a 3-dimensional array [three indices separated by commas]. Where BASIC encounters a subscripted variable name, it searches the array table for the array name and type: numeric double-precision, numeric single-precision, numeric integer, or string.

Where the name and type combination is not found, BASIC checks if all the indices are less than 11. BASIC issues a Bad Subscript (BS) error where any index exceeds 10. Where all the indices are less than 11, BASIC expands the array table by appending an array variable structure that allocates a maximum index of 10 for every dimension and then BASIC assigns the variable content of the particular element of the array (i.e., defines the previously undefined variable). The statement SV$(1,2,3)="hello" for example appends an 11 by 11 by 11 array (1331 elements) to the array table and then defines the content of element (1,2,3) as the string "hello".
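Looking back at the variable table format given earlier, the following Python sketch may help make the layout concrete. It is only an offline model of the bytes as this page describes them, not code that runs on the M100. The function names and the sample bytes are hypothetical, the name bytes are assumed to be plain ASCII, and the decimal point rule uses only bits 0-5 and bit 7 of the S & E byte exactly as stated above (bit 6 is ignored because its purpose is given as unknown).

```python
from decimal import Decimal

def decode_bcd_float(content):
    """Decode an S & E byte followed by packed BCD bytes (SP: 4 bytes, DP: 8)."""
    sign_exp = content[0]
    negative = bool(sign_exp & 0x80)   # bit 7: sign, per the legend above
    point = sign_exp & 0x3F            # bits 0-5: digits left of the decimal point
    # Each BCD byte holds two decimal digits, high nibble first (assumed valid BCD).
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in content[1:])
    value = Decimal(int(digits)).scaleb(point - len(digits))
    return -value if negative else value

def walk_variable_table(table):
    """Return (name, type, value) for each entry, using Type as the skip-count."""
    entries = []
    pos = 0
    while pos < len(table):
        vtype = table[pos]
        name = table[pos + 1:pos + 3].rstrip(b"\x00").decode("ascii")
        content = table[pos + 3:pos + 3 + vtype]
        if vtype == 2:                 # integer: LSB, MSB, sign in bit 7 of MSB
            value = int.from_bytes(content, "little", signed=True)
        elif vtype == 3:               # string descriptor: Len, Addr Low, Addr High
            value = (content[0], content[1] | (content[2] << 8))
        elif vtype in (4, 8):          # single or double precision
            value = decode_bcd_float(content)
        else:
            raise ValueError(f"unexpected Type byte {vtype} at offset {pos}")
        entries.append((name, vtype, value))
        pos += 3 + vtype               # 3-byte header plus Type bytes of content
    return entries

# Hypothetical sample table, hand-built for illustration only.
SAMPLE = bytes([
    2, 0x41, 0x00, 0xFE, 0xFF,                          # A%  = -2
    3, 0x53, 0x56, 5, 0x00, 0x90,                       # SV$ : Len 5, Addr 9000H
    4, 0x58, 0x00, 0x41, 0x15, 0x00, 0x00,              # X!  : S & E 65, digits 150000
    8, 0x44, 0x50, 0xC1, 0x25, 0, 0, 0, 0, 0, 0,        # DP# : negative, digits 25000...
])

for entry in walk_variable_table(SAMPLE):
    print(entry)
```

On a real machine the same bytes could be inspected with PEEK starting from the address that VARPTR returns, which points at Byte 3 of the entry.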
Where the name and type combination is found, BASIC compares the index to the maximum index for each dimension. BASIC issues a Bad Subscript (BS) error where any index exceeds its maximum. Where all subscripts are within range, BASIC uses the subscripts to locate the variable content of a particular element. The list of all elements is located at the end of an array structure. The elements are interleaved in ascending index order, with the first index varying fastest. The format for the content (in the case of numeric types) or for the descriptor of content (in the case of the string type) is the same as for the simple unsubscripted variables (see the variable table shown in prior email message "Re: [M100] Strange - behavior of VARPTR(FN$)").

The format of BASIC arrays as they appear in the array table is shown here. The array table contains an array header and appended list of array variable content for each array that is dimensioned. The array header contains 6 bytes plus 2 bytes for each dimension in descending dimensional order. The format for an array header is shown below.

--------------------------------------------------------------------
Byte 0   Type: 8=double-precision, 4=single-precision, 2=integer, 3=string
Byte 1   Name, 1st character
Byte 2   Name, 2nd character - NUL for 1-character names
Byte 3   Length (LSB) of array relative to Byte 5
Byte 4   Length (MSB) of array relative to Byte 5
Byte 5   Number of dimensions
--------------------------------------------------------------------
Byte 6   Maximum index (LSB) for last dimension
Byte 7   Maximum index (MSB) for last dimension
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  .      (LSB) next dimension
  .      (MSB) next dimension
  .      (LSB) 1st dimension
  .      (MSB) 1st dimension
--------------------------------------------------------------------

The list of array elements follows the array header.
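The size and ordering rules above can be sketched in Python as follows. This is a hedged model rather than the ROM's own routine: the function names are hypothetical, subscripts are assumed to run from 0 to the maximum index inclusive, and the first subscript is assumed to vary fastest, which is what the SV$ element listing in this document shows.

```python
from math import prod

def array_header_size(ndims):
    """6 fixed header bytes plus 2 bytes per dimension."""
    return 6 + 2 * ndims

def element_list_size(vtype, max_indices):
    """Bytes in the element list; Type doubles as the size of one element."""
    return vtype * prod(m + 1 for m in max_indices)

def element_offset(vtype, max_indices, subscripts):
    """Byte offset of one element from the start of the element list."""
    if len(subscripts) != len(max_indices):
        raise ValueError("wrong number of subscripts")
    offset = 0
    stride = 1
    for sub, top in zip(subscripts, max_indices):
        if not 0 <= sub <= top:
            # Models the Bad Subscript (BS) error described above.
            raise IndexError("BS error: subscript out of range")
        offset += sub * stride
        stride *= top + 1          # each dimension holds top + 1 elements
    return offset * vtype

# String array (Type 3) dimensioned with DIM SV$(1,2,3).
print(array_header_size(3))
print(element_list_size(3, (1, 2, 3)))
print(element_offset(3, (1, 2, 3), (1, 2, 3)))
```

The same functions cover the implicit case: an undimensioned 3-dimensional string array gets maximum indices (10, 10, 10), so element_list_size(3, (10, 10, 10)) gives the space taken by its 1331 descriptors.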
The number of bytes in the list depends upon the maximum index for each dimension and the value of "Type" which indicates the number of bytes for each element. For example, the DIM SV$(1,2,3) statement defines a 3-dimension array of the string type with 2 elements in the first dimension (indices 0 to 1), 3 elements in the second dimension (indices 0 to 2), and 4 elements in the last dimension (indices 0 to 3). The 72-byte list of the SV$ array has 24 elements (2 x 3 x 4) where each string element is a 3-byte descriptor (Len, Addr Low, and Addr High) of content. The ascending index order of the elements within this list is indicated below.

Byte00 SV$(0,0,0)   Byte18 SV$(0,0,1)   Byte36 SV$(0,0,2)   Byte54 SV$(0,0,3)
Byte03 SV$(1,0,0)   Byte21 SV$(1,0,1)   Byte39 SV$(1,0,2)   Byte57 SV$(1,0,3)
Byte06 SV$(0,1,0)   Byte24 SV$(0,1,1)   Byte42 SV$(0,1,2)   Byte60 SV$(0,1,3)
Byte09 SV$(1,1,0)   Byte27 SV$(1,1,1)   Byte45 SV$(1,1,2)   Byte63 SV$(1,1,3)
Byte12 SV$(0,2,0)   Byte30 SV$(0,2,1)   Byte48 SV$(0,2,2)   Byte66 SV$(0,2,3)
Byte15 SV$(1,2,0)   Byte33 SV$(1,2,1)   Byte51 SV$(1,2,2)   Byte69 SV$(1,2,3)

As regards File Control Blocks (FCBs), I refer to them as File Buffers (FBs). And I make a distinction between the always allocated "Inner File Buffer" and the "Outer File Buffer(s)" which are allocated via the MAXFILES command. Please refer to the URL listed below and see its first illustration. Note the upper division of RAM (seen at bottom of illustration) where the Inner File Buffer is located in RAM just above the $tring Space while the Outer File Buffers, if any are allocated, follow the Inner File Buffer and adjoin the Point of Himem.

[ http://www.club100.org/library/doc/ramabout.html ]

See below for related information, and especially for granular detail about the 265-cell internal structure of a File Buffer. Bear in mind that for every allocated File Buffer (which includes the Inner File Buffer), 2 additional cells of RAM are consumed in the upper division of RAM. Consequently, each File Buffer exhibits a total RAM consumption of 267 cells.
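The File Buffer arithmetic above can be written out as a short Python sketch. The function name is hypothetical, and the sketch assumes, as the RAM map in this document states, that MAXFILES ranges from 0 to 15 and that File Buffer #0 is always allocated, so the number of buffers is MAXFILES + 1.

```python
FILE_BUFFER_SIZE = 265     # 9-byte header plus 256-byte data buffer
POINTER_ENTRY_SIZE = 2     # one entry in the File Buffer descriptor pointer list

def file_buffer_ram(maxfiles):
    """Total RAM cells consumed by File Buffers for a given MAXFILES setting."""
    if not 0 <= maxfiles <= 15:
        raise ValueError("MAXFILES is assumed to range from 0 to 15")
    buffers = maxfiles + 1     # File Buffer #0 is always assigned
    return buffers * (FILE_BUFFER_SIZE + POINTER_ENTRY_SIZE)

for setting in (0, 1, 15):
    print(setting, file_buffer_ram(setting))
```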
- RAM Dynamic Areas & Related Pointers - (RRW)

For M100:

The operating system within the standard ROM manages the dynamics of RAM. The entire RAM is divided into several areas. The MAXRAM protected area is bound to the highest RAM address, which is address FFFFH. Peppered throughout the MAXRAM protected area are address pointers that relate to the other areas of RAM. The memory boundaries and sizes of the areas of RAM are dynamic, that is, the boundaries move and the sizes change as a consequence of operation. Thus the related pointers change, but always reflect congruence of the areas within RAM.

RAM is partitioned into three regions. In order by ascending address, these regions are:

    an "upward" growth region, bound to the lowest equipped RAM address.
    an "unused" region that shrinks due to growth in the other regions.
    a "downward" growth region, bound to the highest RAM address of FFFFH.

Below is a RAM map which identifies each area, its related pointer or algorithm for memory address, and its description. The map lists areas in ascending address order: the various areas within the upward growth region, the start and end boundaries of the unused region, and the various areas within the downward growth region.

The left side of the map lists pointers and algorithms where the following symbology appears.

[ ]        Means the 16-bit word content of whatever appears between the
           brackets.
( )        Means the 8-bit byte content of whatever appears between the
           parentheses.
H          Means preceding value is expressed in hexadecimal notation.
           Values that aren't followed by this symbol are expressed in
           decimal notation.
+ - * =    Are conventional math operators which appear in algorithms.

Symbols __________ and -- -- -- denote, respectively, the congruence of adjoined areas, and subordinate areas or conditional parts of an area.

__________
[FAC0H]->   Start of .BA area. Also is lowest equipped RAM address which is
            determined by a cold start: 8000H for a 32K RAM, A000H for a
            24K RAM. Noname.BA always exists at end of this area.
-- -- --
[F99AH]->   Noname.BA which is the so-called "unsaved" BASIC program. Also
            designated as Suzuki. Minimum size 2 bytes - the zero doublet
            which is an end of program marker.
__________
[FBAEH]->   Start of .DO area which follows the end of Noname.BA. Noname.DO
            always exists at end of this area.
-- -- --
[F9A5H]->   Noname.DO which is the so-called "PASTE" buffer. Also designated
            as Hayashi. Minimum size 1 byte - the EOF which is end of file
            marker.
__________
[FBB0H]->   Start of .CO area which follows the end of Noname.DO. Minimum
            size 0 bytes. Each .CO file in this area has a 6-byte header
            where the 2nd and 3rd bytes are the "Len" of the code image
            that follows the header. So the end of a .CO has no specific
            mark - the Len in the header indicates file length.
__________
[FBB2H]->   Variable Table, see Variable Format. Minimum size is 0 bytes.
__________
[FBB4H]->   Array Table, see Array Format. Minimum size is 0 bytes.
__________
[FBB6H]->   Start of system unused RAM. The start boundary of this area
            increases (size reduces) due to growth in the areas below it.
[F678H]->   End of system unused RAM. The end boundary of this area
            decreases (size reduces) due to growth in the areas above it.
__________
[F678H] + 1 = ---->
            Start of BASIC string area. Size is 256 bytes at invocation of
            BASIC, thereafter size is per CLEAR ss statement (ss is the
            String area Size argument).
__________
[FC83H]->   File Buffer descriptor pointer list. Size is per MAXFILES=mn
            statement, 2 bytes per File Buffer (mn is the Maximum Number
            argument). Minimum size is 2 bytes, minimum set for MAXFILES=0
            which assigns File Buffer #0. Maximum size is 32 bytes, maximum
            set for MAXFILES=15 which assigns File Buffer #0 through File
            Buffer #15 inclusive. The algorithm [FC83H] + 2*(FC82H) is the
            address of File Buffer #0. Note that byte (FC82H) is the
            quantity MAXFILES. Each 2-byte address within the File Buffer
            descriptor pointer table is the address of a respective File
            Buffer. The address of a particular File Buffer is also
            expressed by the algorithm
                addr of fbn = [FC83H] + 2*(FC82H) + 265*fbn
            where fbn is the particular file buffer's number.
__________
[FC83H] + 2*(FC82H) = ---->
            File Buffer #0. Size is fixed at 265 bytes: a 9-byte header for
            device and file status, followed by a 256-byte buffer for
            transfer of device/file data. File Buffer #0 is always
            assigned. Where other File Buffers are assigned, they have the
            same 265-byte structure and directly follow File Buffer #0
            (ascending address in ascending order by file buffer number).
-- -- --
+265 ->     File Buffer #1. If assigned, size is 265 bytes.
+265 ->      "     "    #..
+265 ->     File Buffer #15.
__________
[F5F4H]->   HIMEM protected area. Size is per CLEAR statement. Minimum size
            is 0 bytes, minimum set for CLEAR ss,MAXRAM (ss is the String
            area Size argument that isn't relevant but must precede the
            relevant argument).
__________
Note 1      MAXRAM protected area. Size is 2576 bytes after a cold start
            and ranges from a lower start boundary of F5F0H to highest RAM
            address of FFFFH which is the fixed end boundary.
end=FFFFH__________

Notes:

1. There is no pointer or simple algorithm for the address of this start boundary. The hook for the MAXRAM function is relevant. Certain software (e.g., FLOPPY by Tandy) resides in an expanded MAXRAM area, which consequently lowers the address locale of the HIMEM area, for protection from overwrite by operations that are HIMEM dependent and not MAXRAM dependent. For example the RUNM"MYPROG.CO" statement copies image code from file MYPROG.CO (which lies in the .CO area) into memory according to items within its file header: begin at "Top" address, copy to the extent of "Len" bytes. Where "Top" equals or exceeds HIMEM, copy occurs and may extend into the MAXRAM area.
The typical installation process sets the MAXRAM function hook as well as several hooks that vector into the software so that some or all of the HIMEM dependent operations are also made MAXRAM dependent. Operations proceed to overwrite within the MAXRAM area on condition that the portion(s) of it where the software resides are avoided. Otherwise the software aborts the operation so no overwrite occurs. The typical removal process restores the hooks to their cold start state, which vectors into the standard ROM where the operations are not MAXRAM dependent (may extend overwrite into the MAXRAM area).

- File Descriptor Block (Address Given by VARPTR(#file)) Format -

Byte:
0       - File status (0-not open, 1-open for input, 2-open for output or
          append)
2 & 3   - Address of file directory entry +1 (RRW)
4       - File device (248-RAM, 249-MoDeM, 250-LinePrinTer, 251-WAND,
          252-COM, 253-CASsette, 254-CRT, 255-LCD)
          0 to 8 undefined for an open buffer (Err=50 IE) (RRW)
          9 to 255 legitimate devices for an open buffer (RRW)
6       - Offset from buffer start (see bytes 9 to 264) for start of next
          record
7 & 8   - Relative position of next 256 byte block from beginning of file
9-264   - 256 byte buffer for data transfer (RRW)
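
The File Descriptor Block layout above can be modeled with a short Python sketch that decodes a 265-byte image, such as one captured by PEEKing from the address that VARPTR(#file) returns. The function name, the dictionary keys, and the short device labels are hypothetical. The 2-byte fields are assumed to be stored low byte first, which the list above does not state explicitly. Bytes 1 and 5 are not described above and are left uninterpreted.

```python
STATUS = {0: "not open", 1: "open for input", 2: "open for output or append"}

# Device codes from the list above; the short labels are illustrative only.
DEVICES = {248: "RAM", 249: "MDM", 250: "LPT", 251: "WAND",
           252: "COM", 253: "CAS", 254: "CRT", 255: "LCD"}

def decode_file_descriptor_block(block):
    """Decode a 265-byte File Descriptor Block image into a dictionary."""
    if len(block) != 265:
        raise ValueError("a File Descriptor Block is 265 bytes long")

    def word(lo):
        # Assumption: 16-bit fields are stored low byte first.
        return block[lo] | (block[lo + 1] << 8)

    device = block[4]
    return {
        "status": STATUS.get(block[0], "unknown"),
        "dir_entry_plus_1": word(2),        # address of directory entry + 1
        "device": DEVICES.get(device, f"device {device}"),
        "record_offset": block[6],          # offset into the data buffer
        "block_position": word(7),          # position of next 256-byte block
        "buffer": bytes(block[9:265]),      # 256-byte data transfer buffer
    }

# Hypothetical image, hand-built for illustration only.
image = bytearray(265)
image[0] = 1                       # open for input
image[2], image[3] = 0x9A, 0xF9    # F99AH, low byte first
image[4] = 248                     # RAM
image[6] = 12
image[9:12] = b"ABC"
info = decode_file_descriptor_block(bytes(image))
print(info["status"], info["device"], hex(info["dir_entry_plus_1"]))
```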