BASIC Variable Format

From Bitchin100 DocGarden

Per Ron Wiesen, from a discussion on the Bitchin100 Mailing List:

                          - Variable Format -  (RRW)
For M100:
The format of BASIC variables as they appear in the variable table is
shown here.  Where BASIC encounters a variable name, it searches the
variable table for the variable name of the appropriate variable type:
numeric double-precision, numeric single-precision, numeric integer,
or string.  Where the name is not found, BASIC expands the table and
appends the variable at the end of the variable table (i.e., defines the
previously undefined variable).  Where the name is found, the variable
content follows in the case of numeric variable types and a descriptor
of content follows in the case of the string variable type.
Each variable defined within the variable table consumes a 3-byte header
and from 2 to 8 more bytes depending on the variable type.  The header
contains a "Type" byte and a "Name" in 2-character form.  The value of
"Type" also serves as a skip-count (relative to first byte of content)
to the next (if any) header.
                     |     DP    |    SP    |  Integer |  String   |
                     |   Format  |  Format  |  Format  |  Format   |
 Byte  0   Type      |     8     |    4     |    2     |    3      |
 Byte  1   Name, 1st character                                     |
 Byte  2   Name, 2nd character - NUL for 1-character names         |
 Byte  3<-<(VARPTR)  |   S & E   |  S & E   |   LSB    |    Len    |
 Byte  4             |   BCD M   |  BCD M   |   MSB    | Addr Low  |
 Byte  5             |   BCD #   |  BCD #   |    -     | Addr High |
 Byte  6             |   BCD #   |  BCD L   |    -     |     -     |
 Byte  7             |   BCD #   |    -     |    -     |     -     |
 Byte  8             |   BCD #   |    -     |    -     |     -     |
 Byte  9             |   BCD #   |    -     |    -     |     -     |
 Byte 10             |   BCD L   |    -     |    -     |     -     |
  LSB =  Least significant byte of integer.
  MSB =  Most significant byte of integer.  Bit 7 contains the sign
         of the integer.
BCD L =  Least significant BCD byte, contains least significant pair
         of 4-bit BCD digits.
BCD M =  Most significant BCD byte, contains most significant pair of
         4-bit BCD digits.
BCD # =  Middle BCD bytes.  Each digit of the number is represented
         by one of the 4-bit values in the two nibbles in each byte.
S & E =  Sign and exponent of each number.  Bit 7 contains the sign
         of the floating point number.  Bits 0-5 determine where the
         decimal point is to be inserted.  For example, if this byte
         contained a 65 (64 + 1), the sign would be positive (bit 7
         clear) and the decimal point would be placed after the 1st
         digit (value 1 in bits 0-5) and before the second digit
         (#.############# in DP Format or #.##### in SP Format).  The
         purpose of bit 6 (value 64) is unknown, but it may be a marker
         for the "currently selected" variable.
 Addr =  Address of string content.  This can point to a string constant
         within a BASIC program statement (e.g., V$ = "constant") or
         it can point within the BASIC string area (e.g., V$ = SPACE$(2)).
  Len =  Length of string.  The LEN(var$) function returns this value.
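
The layout above can be exercised with a short Python sketch that walks a
raw byte dump of the variable table.  It is an illustration under stated
assumptions, not a tested M100 tool: every function name and the sample
bytes are hypothetical, the integer is assumed to be two's complement, and
the floating point decoder follows only the S & E description given above
(bits 0-5 taken as the count of digits before the decimal point), so it
does not attempt values smaller than 1.

```python
# Sketch: walk a raw dump of the variable table as laid out above.
# All names and the sample bytes are hypothetical.
TYPE_NAMES = {8: "double", 4: "single", 2: "integer", 3: "string"}


def decode_bcd_float(content):
    """Render S & E plus packed BCD digits as a decimal string."""
    s_and_e = content[0]
    negative = bool(s_and_e & 0x80)   # bit 7 holds the sign
    point_pos = s_and_e & 0x3F        # bits 0-5: digits before the point
    digits = "".join(f"{b >> 4}{b & 0x0F}" for b in content[1:])
    digits = digits.ljust(point_pos, "0")
    text = (digits[:point_pos] or "0") + "." + digits[point_pos:]
    return "-" + text if negative else text


def decode_content(vtype, content):
    if vtype == 2:
        # Assumption: two's complement, so bit 7 of the MSB is the sign.
        return int.from_bytes(content, "little", signed=True)
    if vtype == 3:
        return content[0], content[1] | (content[2] << 8)  # (Len, Addr)
    return decode_bcd_float(content)


def walk_variable_table(table):
    entries = []
    pos = 0
    while pos < len(table):
        vtype = table[pos]
        if vtype not in TYPE_NAMES:
            raise ValueError(f"unexpected Type byte {vtype} at offset {pos}")
        name = table[pos + 1:pos + 3].rstrip(b"\x00").decode("ascii")
        content = table[pos + 3:pos + 3 + vtype]
        entries.append((TYPE_NAMES[vtype], name, decode_content(vtype, content)))
        pos += 3 + vtype   # Type doubles as the skip-count past the content
    return entries


SAMPLE = bytes([
    2, 0x41, 0x00, 0xFE, 0xFF,              # A%  : LSB FEH, MSB FFH
    3, 0x53, 0x00, 5, 0x23, 0x81,           # S$  : Len 5, Addr 8123H
    4, 0x50, 0x49, 0x41, 0x31, 0x41, 0x59,  # PI  : S & E 65, digits 314159
])

for entry in walk_variable_table(SAMPLE):
    print(entry)
```

The walker never needs a per-type size table because the Type byte is
itself the content length, which is the skip-count property noted above.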

                           - Array Format -  (RRW)
For M100:  
SV$(1,2,3) is an example of a subscripted variable name of the string
type [the name is SV, type identifier $ means string].  Element (1,2,3)
belongs to a set of variables which are organized into a 3-dimensional
array [three indices separated by commas].  Where BASIC encounters a
subscripted variable name, it searches the array table for the array
name and type: numeric double-precision, numeric single-precision,
numeric integer, or string.
Where the name and type combination is not found, BASIC checks if all
the indices are less than 11.  BASIC issues a Bad Subscript (BS) error
where any index exceeds 10.  Where all the indices are less than 11,
BASIC expands the array table by appending an array variable structure
that allocates a maximum index of 10 for every dimension and then BASIC
assigns the variable content of the particular element of the array
(i.e., defines the previously undefined variable).  The statement
SV$(1,2,3)="hello" for example appends an 11 by 11 by 11 array (1331
elements) to the array table and then defines the content of element
(1,2,3) as the string "hello".
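
The arithmetic behind the 1331-element figure can be checked with a small
Python sketch.  The function name is hypothetical; the per-element sizes
are the Type values from the variable table, and the byte count covers the
element list only, not the array header.

```python
# Sketch: size of the array BASIC creates when an undimensioned
# subscripted variable is first used (maximum index 10 per dimension).
ELEMENT_SIZE = {"double": 8, "single": 4, "integer": 2, "string": 3}


def default_array_allocation(ndims, kind):
    """Return (element count, bytes of element list) for an implicit array."""
    elements = 11 ** ndims            # indices 0 through 10 in each dimension
    return elements, elements * ELEMENT_SIZE[kind]


print(default_array_allocation(3, "string"))
```

For the SV$(1,2,3)="hello" case this gives 1331 elements and 3993 bytes of
descriptors, which shows how quickly an implicit multi-dimension array can
consume RAM compared with an explicit DIM.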
Where the name and type combination is found, BASIC compares the index
to the maximum index for each dimension.  BASIC issues a Bad Subscript
(BS) error where any index exceeds its maximum.  Where all subscripts
are within range, BASIC uses the subscripts to locate the variable
content of a particular element.
The list of all elements is located at the end of an array structure.
The order of the elements is an interleaved arrangement by ascending
index and dimension.  The format for the content (in the case of numeric
types) or for the descriptor of content (in the case of the string
type) is the same as for the simple unsubscripted variables (see the
variable table shown in the prior email message "Re: [M100] Strange
behavior of VARPTR(FN$)").
The format of BASIC arrays as they appear in the array table is shown
here.  The array table contains an array header and appended list of
array variable content for each array that is dimensioned.  The array
header contains 6 bytes plus 2 bytes for each dimension in descending
dimensional order.  The format for an array header is shown below.
 Byte  0   Type: 8=double-precision, 4=single-precision, 2=integer, 3=string
 Byte  1   Name, 1st character
 Byte  2   Name, 2nd character - NUL for 1-character names
 Byte  3   Length (LSB) of array relative to Byte 5
 Byte  4   Length (MSB) of array relative to Byte 5
 Byte  5   Number of dimensions
 Byte  6   Maximum index (LSB) for last dimension
 Byte  7   Maximum index (MSB) for last dimension
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
  .                      (LSB)     next dimension
  .                      (MSB)     next dimension
  .                      (LSB)      1st dimension
  .                      (MSB)      1st dimension
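
A minimal Python sketch of reading such a header follows.  The function
name and the sample bytes are made up for illustration; the 2-byte fields
are read LSB first as the layout shows, and the sample Length value is a
placeholder rather than a figure taken from a real machine.

```python
# Sketch: parse an array header laid out as described above.
def parse_array_header(data):
    ndims = data[5]
    # Dimension words are stored last dimension first, 2 bytes each.
    stored = [data[6 + 2 * i] | (data[7 + 2 * i] << 8) for i in range(ndims)]
    return {
        "type": data[0],
        "name": data[1:3].rstrip(b"\x00").decode("ascii"),
        "length": data[3] | (data[4] << 8),
        "dims": list(reversed(stored)),   # reorder to 1st dimension first
        "header_size": 6 + 2 * ndims,
    }


# Hypothetical header for a 3-dimension string array named SV.
SAMPLE_HEADER = bytes([
    3, 0x53, 0x56,      # Type 3 (string), name "SV"
    79, 0,              # Length word (placeholder value)
    3,                  # number of dimensions
    4, 0, 3, 0, 2, 0,   # last, middle, and 1st dimension words
])

print(parse_array_header(SAMPLE_HEADER))
```

Reversing the stored words is only a convenience so the result reads in the
same order as the subscripts in a BASIC statement.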
The list of array elements appends the array header.  The number of
bytes in the list depends upon the maximum index for each dimension and
the value of "Type" which indicates the number of bytes for each
element.  For example, the DIM SV$(1,2,3) statement defines a
3-dimension array of the string type with a maximum index of: 1 for the
first dimension, 2 for the second dimension, and 3 for the last
dimension.  Counting index 0, the 72-byte list of the SV$ array has 24
elements (2 x 3 x 4) where each string element is a 3-byte descriptor
(Len, Addr Low, and Addr High) of content.  The ascending index order
of the elements within this list is indicated below.
Byte00 SV$(0,0,0)  Byte18 SV$(0,0,1)  Byte36 SV$(0,0,2)  Byte54 SV$(0,0,3)
Byte03 SV$(1,0,0)  Byte21 SV$(1,0,1)  Byte39 SV$(1,0,2)  Byte57 SV$(1,0,3)
Byte06 SV$(0,1,0)  Byte24 SV$(0,1,1)  Byte42 SV$(0,1,2)  Byte60 SV$(0,1,3)
Byte09 SV$(1,1,0)  Byte27 SV$(1,1,1)  Byte45 SV$(1,1,2)  Byte63 SV$(1,1,3)
Byte12 SV$(0,2,0)  Byte30 SV$(0,2,1)  Byte48 SV$(0,2,2)  Byte66 SV$(0,2,3)
Byte15 SV$(1,2,0)  Byte33 SV$(1,2,1)  Byte51 SV$(1,2,2)  Byte69 SV$(1,2,3)
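
The byte offsets in the listing above follow a simple rule: the first
subscript varies fastest.  The Python sketch below reproduces them; the
function name is hypothetical, and "extents" means the number of elements
per dimension (2, 3, and 4 for this array).

```python
# Sketch: byte offset of an element within the element list, with the
# first subscript varying fastest as in the listing above.
def element_offset(indices, extents, element_size):
    offset = 0
    stride = element_size
    for index, extent in zip(indices, extents):
        if not 0 <= index < extent:
            raise IndexError("Bad Subscript (BS)")
        offset += index * stride
        stride *= extent      # each later dimension steps over whole blocks
    return offset


EXTENTS = (2, 3, 4)           # element counts for the SV$ example
for subscripts in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 2, 3)]:
    print(subscripts, element_offset(subscripts, EXTENTS, 3))
```

Checking a few entries against the listing (for instance SV$(0,0,1) at
Byte18 and SV$(1,2,3) at Byte69) is a quick way to confirm the ordering.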
As regards File Control Blocks (FCBs), I refer to them as File Buffers (FBs).  And I make a distinction between: the always allocated "Inner File Buffer", versus the "Outer File Buffer(s)" which are allocated via the MAXFILES command.  Please refer to the URL listed below and see its first illustration.  Note the upper division of RAM (seen at bottom of illustration) where the Inner File Buffer is located in RAM just above the $tring Space while the Outer File Buffers, if any are allocated, append the Inner File Buffer and adjoin the Point of Himem.
[ ]
See below for related information, and especially for granular detail about the 265-cell internal structure of a File Buffer.  Bear in mind that for every allocated File Buffer (which includes the Inner File Buffer), 2 additional cells of RAM are consumed in the upper division of RAM; consequently, each File Buffer exhibits a total RAM consumption of 267 cells.
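
As a quick check of the 267-cell figure, the Python sketch below totals the
RAM consumed for a given MAXFILES setting.  The function name is
hypothetical, and the 0 to 15 range for MAXFILES is taken from the RAM map
that follows.

```python
# Sketch: RAM consumed by File Buffers for a given MAXFILES setting.
FILE_BUFFER_SIZE = 265   # 9-byte header plus 256-byte data buffer
POINTER_SIZE = 2         # one pointer-list entry per File Buffer


def file_buffer_ram(maxfiles):
    if not 0 <= maxfiles <= 15:
        raise ValueError("MAXFILES is expected to be 0 through 15")
    buffers = maxfiles + 1   # the Inner File Buffer is always allocated
    return buffers * (FILE_BUFFER_SIZE + POINTER_SIZE)


for setting in (0, 1, 15):
    print(setting, file_buffer_ram(setting))
```

With MAXFILES=0 only the Inner File Buffer exists and the cost is 267
cells; each additional Outer File Buffer adds another 267.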
           - RAM Dynamic Areas & Related Pointers -  (RRW)
For M100:
The operating system within the standard ROM manages the dynamics of RAM.
The entire RAM is divided into several areas. The MAXRAM protected area
is bound to the highest RAM address, which is address FFFFH. Peppered
throughout the MAXRAM protected area are address pointers that relate to
the other areas of RAM. The memory boundaries and sizes of the areas of
RAM are dynamic, that is, the boundaries move and the sizes change as a
consequence of operation. Thus the related pointers change, but always
reflect congruence of the areas within RAM.
RAM is partitioned into three regions. In order by ascending address,
these regions are:
  an "upward" growth region, bound to the lowest equipped RAM address.
  an "unused" region shrinks due to growth in the other regions.
  a "downward" growth region, bound to the highest RAM address of FFFFH.
Below is a RAM map which identifies each area, its related pointer or
algorithm for memory address, and its description. The map lists areas
in ascending address order: the various areas within the upward growth
region, the start and end boundaries of the unused region, and the
various areas within the downward growth region. The left side of the
map lists pointers and algorithms where the following symbology appears.
[     ]   Means the 16-bit word content of whatever appears between the
          brackets.
(     )   Means the 8-bit byte content of whatever appears between the
          parentheses.
     H    Means preceding value is expressed in hexadecimal notation.
          Values that aren't followed by this symbol are expressed in
          decimal notation.
 + - * =  Are conventional math operators which appear in algorithms.
Symbols __________ and --  --  -- denote, respectively, the congruence
of adjoined areas, and subordinate areas or conditional parts of an
area.
[FAC0H]-> Start of .BA area. Also is lowest equipped RAM address which
          is determined by a cold start: 8000H for a 32K RAM, A000H for
          a 24K RAM. Noname.BA always exists at end of this area.
         --  --  --
[F99AH]-> Noname.BA which is the so-called "unsaved" BASIC program.  Also
          designated as Suzuki. Minimum size 2 bytes - the zero doublet
          which is an end of program marker.
[FBAEH]-> Start of .DO area which follows the end of Noname.BA. Noname.DO
          always exists at end of this area.
         --  --  --
[F9A5H]-> Noname.DO which is the so-called "PASTE" buffer.  Also designated
          as Hayashi. Minimum size 1 byte - the EOF which is end of file
[FBB0H]-> Start of .CO area which follows the end of Noname.DO.  Minimum
          size 0 bytes. Each .CO file in this area has a 6-byte header
          where the 2nd and 3rd bytes are the "Len" of the code image
          that appends the header. So the end of a .CO has no specific
          mark - the Len in the header indicates file length.
[FBB2H]-> Variable Table, see Variable Format. Minimum size is 0 bytes.
[FBB4H]-> Array Table, see Array Format. Minimum size is 0 bytes.
[FBB6H]-> Start of system unused RAM. The start boundary of this area 
          increases (size reduces) due to growth in the areas below it.
[F678H]-> End of system unused RAM. The end boundary of this area
          decreases (size reduces) due to growth in the areas above it.
  +1     __________
  = ----> Start of BASIC string area. Size is 256 bytes at invocation
          of BASIC, thereafter size is per CLEAR ss statement (ss is
          the String area Size argument).
[FC83H]-> File Buffer descriptor pointer list. Size is per MAXFILES=mn
          statement, 2 bytes per File Buffer (mn is the Maximum Number
          argument).  Minimum size is 2 bytes, minimum set for MAXFILES=0
          which assigns File Buffer #0. Maximum size is 32 bytes,
          maximum set for MAXFILES=15 which assigns File Buffer #0
          through File Buffer #15 inclusive.
          The algorithm [FC83H] + 2*(FC82H) is the address of File
          Buffer #0.  Note that byte (FC82H) is the quantity MAXFILES.
          Each 2-byte address within the File Buffer descriptor pointer
          table is the address of a respective File Buffer.  The address
  +       of a particular File Buffer is also expressed by the algorithm
  2       addr of fbn = [FC83H] + 2*(FC82H) + 265*fbn where fbn is the
  *       particular file buffer's number.
(FC82H)  __________
  = ----> File Buffer #0. Size is fixed at 265 bytes: a 9-byte header
          for device and file status, followed by a 256-byte buffer for
          transfer of device/file data.  File Buffer #0 is always
          assigned. Where other File Buffers are assigned, they have the
          same 265-byte structure and directly follow File Buffer #0
          (ascending address in ascending order by file buffer number).
         --  --  --
  +265 -> File Buffer #1. If assigned, size is 265 bytes.
  +265 ->  "     "    #..
  +265 -> File Buffer #15.
[F5F4H]-> HIMEM protected area. Size is per CLEAR statement.  Minimum
          size is 0 bytes, minimum set for CLEAR ss,MAXRAM (ss is the
          String area Size argument that isn't relevant but must precede
          the relevant argument).
 Note 1   MAXRAM protected area. Size is 2576 bytes after a cold start
          and ranges from a lower start boundary of F5F0H to highest
          RAM address of FFFFH which is the fixed end boundary.
1.  There is no pointer or simple algorithm for the address of this start
    boundary.  The hook for the MAXRAM function is relevant.  Certain
    software (e.g., FLOPPY by Tandy) reside in an expanded MAXRAM area,
    which consequently lowers the address locale of the HIMEM area, for
    protection from overwrite by operations that are HIMEM dependent and
    not MAXRAM dependent.  For example the RUNM"MYPROG.CO" statement copies
    image code from file MYPROG.CO (which lies in the .CO area) into
    memory according to items within its file header: begin at "Top"
    address, copy to the extent of "Len" bytes. Where "Top" equals or
    exceeds HIMEM, copy occurs and may extend into the MAXRAM area.
    The typical installation process sets the MAXRAM function hook as well
    as several hooks for vector into the software so that some or all the
    HIMEM dependent operations are also made MAXRAM dependent.  Operations
    proceed to overwrite within the MAXRAM area on condition that the
    portion(s) of it where the software resides be avoided.  Otherwise the
    software provides an abort of the operation so no overwrite occurs.
    The typical removal process restores the hooks to their cold start
    state for vector into the standard ROM where the operations are not
    MAXRAM dependent (may extend overwrite into the MAXRAM area).
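
The [ ] and ( ) notation used in the map translates directly into word and
byte reads on a 64K memory image.  The Python sketch below is illustrative
only: the helper names are hypothetical, the memory image is fabricated
rather than dumped from a machine, words are read LSB first, and the end of
unused RAM is treated as inclusive because the map places the BASIC string
area at that address +1.

```python
# Sketch: evaluate a few map entries against a 64K memory image.
def word(mem, addr):
    """[addr] in the map's notation: 16-bit word, LSB first."""
    return mem[addr] | (mem[addr + 1] << 8)


def byte(mem, addr):
    """(addr) in the map's notation: 8-bit byte."""
    return mem[addr]


def unused_ram(mem):
    start = word(mem, 0xFBB6)    # start of system unused RAM
    end = word(mem, 0xF678)      # end of system unused RAM (inclusive)
    return end - start + 1


def string_area_start(mem):
    return word(mem, 0xF678) + 1


# Size of the MAXRAM protected area after a cold start, F5F0H to FFFFH.
MAXRAM_SIZE = 0xFFFF - 0xF5F0 + 1

# Fabricated image: unused RAM from 9000H through EFFFH.
mem = bytearray(0x10000)
mem[0xFBB6:0xFBB8] = (0x9000).to_bytes(2, "little")
mem[0xF678:0xF67A] = (0xEFFF).to_bytes(2, "little")

print(unused_ram(mem), hex(string_area_start(mem)), MAXRAM_SIZE)
```

The same two helpers are enough to evaluate any other pointer in the map
against a real memory dump.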
   - File Descriptor Block (Address Given by VARPTR(#file)) Format -
      0 - File status (0-not open, 1-open for input, 2-open for
          output or append)
  2 & 3 - Address of file directory entry +1 (RRW)
      4 - File device (248-RAM, 249-MoDeM, 250-LinePrinTer,
          251-WAND, 252-COM, 253-CASsette, 254-CRT, 255-LCD)
            0 to 8 undefined for an open buffer (Err=50 IE)(RRW)
            9 to 255 legitimate devices for an open buffer (RRW)
      6 - Offset from buffer start (see bytes 9 to 264) for start of next
  7 & 8 - Relative position of next 256 byte block from
          beginning of file
  9-264 - 256 byte buffer for data transfer (RRW)
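
A minimal Python sketch of decoding this 265-byte block is given below.
The function and field names are hypothetical, the 2-byte fields are
assumed to be stored LSB first, the short device labels are abbreviations
of the names listed above, and bytes 1 and 5 are skipped because the
listing does not describe them.

```python
# Sketch: decode a File Descriptor Block copied out of RAM.
STATUS = {0: "not open", 1: "input", 2: "output or append"}
DEVICES = {248: "RAM", 249: "MDM", 250: "LPT", 251: "WAND",
           252: "COM", 253: "CAS", 254: "CRT", 255: "LCD"}


def parse_file_block(block):
    if len(block) != 265:
        raise ValueError("a File Buffer is 265 bytes long")
    return {
        "status": STATUS.get(block[0], "unknown"),
        "dir_entry_plus_1": block[2] | (block[3] << 8),
        "device": DEVICES.get(block[4], f"device {block[4]}"),
        "buffer_offset": block[6],
        "block_position": block[7] | (block[8] << 8),
        "data": bytes(block[9:265]),
    }


# Fabricated block: a RAM file open for input.
sample = bytearray(265)
sample[0] = 1
sample[2:4] = (0xF9BB).to_bytes(2, "little")
sample[4] = 248
sample[6] = 5
sample[9:14] = b"HELLO"

info = parse_file_block(sample)
print(info["status"], info["device"], hex(info["dir_entry_plus_1"]))
```

In BASIC the block would be located with VARPTR(#file); here the bytes are
simply built by hand so the sketch runs on its own.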