
Ed Davis

  • Guest
Scanner challenge
« on: February 18, 2016, 07:50:31 PM »
Problem statement:

Write a scanner, for the simple language Tiny, whose lexical structure is defined below.

Language elements:
                                       token-type
integers:     [0-9]+                   Integer
char literal: 'x'                      Integer
identifiers:  [_a-zA-Z][_a-zA-Z0-9]*   Variable

Notes: For char literals, '\n' is supported as the newline character (value 10).
To represent a backslash, use '\\'.
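As a sketch of the char-literal rule, here is a small Python helper that turns a complete literal (quotes included) into the Integer value the scanner should print. The name `char_literal_value` is hypothetical, not part of the challenge, and it assumes the literal has already been cut out of the source text.

```python
def char_literal_value(literal: str) -> int:
    """Return the Integer value of a complete Tiny char literal, e.g. "'!'" -> 33."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not a char literal: {literal!r}")
    body = literal[1:-1]          # text between the quotes
    if body == "":
        raise ValueError("empty character constant")
    if body == "\\n":             # '\n' is a newline, ASCII 10
        return 10
    if body == "\\\\":            # '\\' is a single backslash, ASCII 92
        return ord("\\")
    if body.startswith("\\"):
        raise ValueError(f"unknown escape sequence: {body}")
    if len(body) != 1:
        raise ValueError("multi-character constant")
    return ord(body)
```

Any other escape, an empty literal, or more than one character is reported as an error, mirroring the error messages used in the FreeBASIC version.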

operators:

*          multiply                Mul
/          divide                  Div
+          plus                    Add
-          minus                   Sub
<          less than               Lss
>          greater than            Gtr
=          assign                  Assign


symbols:

(          left parenthesis        Lparen
)          right parenthesis       Rparen
{          left brace              Lbrace
}          right brace             Rbrace
;          semi colon              Semi


keywords:

if         keyword                 If
while      keyword                 While
putc       keyword                 Putc

comments:    /* ... */   (multi-line)
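Because a comment starts with `/`, the scanner must look one character past a `/` to tell Div from a comment. If the whole source is held in one string, skipping a comment is a single search for `*/`, as in this sketch. `skip_comment` is a hypothetical helper, and it assumes comments do not nest (the FreeBASIC version treats them that way). Starting the search two characters after the opening `/*` means `/*/` does not close the comment, while `/* ... **/` does.

```python
def skip_comment(src: str, i: int) -> int:
    """src[i:i+2] must be "/*"; return the index just past the matching "*/"."""
    if not src.startswith("/*", i):
        raise ValueError("no comment starts here")
    end = src.find("*/", i + 2)   # comments do not nest: the first "*/" ends it
    if end == -1:
        raise ValueError("EOF in comment")
    return end + 2
```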

Input to your program:

A text file containing the above symbols

Output:

A text file containing the following:

line 0 col 0 token-type [value n]

Where
  • line denotes the line number
  • col  denotes the column number
  • token-type is the token-type listed above
  • value is printed only when token-type is Integer (including char literals), Variable, or EOI. It is the value of the integer constant or char literal, or the offset of the Variable (see below for EOI).
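The column widths in the sample output match the `print using` mask in the FreeBASIC version below: five digits each for line and column, a nine-character field for the token name, then an eight-digit value. A Python sketch of one output line might look like this. `format_token` is a hypothetical name, and trailing blanks are stripped when no value is printed.

```python
def format_token(line: int, col: int, token_type: str, value=None) -> str:
    """Format one scanner output line; value is None for tokens that print none."""
    text = f"line {line:5d} col {col:5d} {token_type:<9}"
    if value is not None:
        text += f" {value:8d}"
    return text.rstrip()
```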

Additionally, each unique identifier (that is not a keyword) should be noted, and the first one encountered should get an offset value of 0, the 2nd 1, the 3rd 2, and so on.

The EOI token should get the number of unique identifiers found, which will be the highest offset plus 1.
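A sketch of that bookkeeping in Python, assuming a dict keyed by identifier (the class and method names are hypothetical). Keywords are checked first so they never consume an offset, and because offsets are handed out in order of first appearance, the EOI value is simply the number of entries.

```python
class SymbolTable:
    """Assigns offsets 0, 1, 2, ... to identifiers in order of first appearance."""

    KEYWORDS = {"if": "If", "while": "While", "putc": "Putc"}

    def __init__(self):
        self.offsets = {}          # identifier -> offset

    def classify(self, name: str):
        """Return (token_type, value) for a scanned word."""
        if name in self.KEYWORDS:  # keywords never get an offset
            return self.KEYWORDS[name], None
        if name not in self.offsets:
            self.offsets[name] = len(self.offsets)
        return "Variable", self.offsets[name]

    def eoi_value(self) -> int:
        """Number of unique identifiers, i.e. the highest offset plus 1."""
        return len(self.offsets)


table = SymbolTable()
for word in ["var1", "var2", "var3", "var1", "var2"]:   # Example 2, in order
    print(word, table.classify(word))
print("EOI", table.eoi_value())
```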

The scanner does not care about any syntactic issues, only whether the lexical structure of the program is correct.

Code: [Select]
{
/*
 Example 1
 */
    a = '!';
    putc(a);
}
Expected output:

line     1 col     1 Lbrace
line     5 col     5 Variable         0
line     5 col     7 Assign
line     5 col     9 Integer         33
line     5 col    12 Semi
line     6 col     5 Putc
line     6 col     9 Lparen
line     6 col    10 Variable         0
line     6 col    11 Rparen
line     6 col    12 Semi
line     7 col     1 Rbrace
line     7 col     3 EOI              1

Notes:
  • a is a variable, and as the first one, gets an offset of 0.
  • '!' is a char literal, and thus an Integer, with a value of 33: the ASCII value of '!'.
  • The EOI token has a value of 1, which is the number of unique identifiers/variables defined.

Code: [Select]
{
/*
 Example 2
 */
    var1 = 5;
    var2 = 10;
    var3 = var1 * var2;
}
Expected output:

line     1 col     1 Lbrace
line     5 col     5 Variable         0
line     5 col    10 Assign
line     5 col    12 Integer          5
line     5 col    13 Semi
line     6 col     5 Variable         1
line     6 col    10 Assign
line     6 col    12 Integer         10
line     6 col    14 Semi
line     7 col     5 Variable         2
line     7 col    10 Assign
line     7 col    12 Variable         0
line     7 col    17 Mul
line     7 col    19 Variable         1
line     7 col    23 Semi
line     8 col     1 Rbrace
line     8 col     3 EOI              3

Notes:
  • The EOI token has a value of 3, which is the number of unique identifiers/variables defined.

Code: [Select]
/*
 Example 3 - this makes no sense syntactically, but is lexically
 correct
 */

if while putc + - * / } { ) (
Expected output:

line     6 col     1 If
line     6 col     4 While
line     6 col    10 Putc
line     6 col    15 Add
line     6 col    17 Sub
line     6 col    19 Mul
line     6 col    21 Div
line     6 col    23 Rbrace
line     6 col    25 Lbrace
line     6 col    27 Rparen
line     6 col    29 Lparen
line     7 col     2 EOI              0


A FreeBASIC version follows.  I would especially like to see versions in Pike, Python, Ruby, Perl, Lua, SmallBASIC, SpecBASIC, ScriptBASIC and any other language you can think of.

Code: [Select]
enum Token_type
    tk_eoi = 1
    tk_if
    tk_putc
    tk_while
    tk_lbrace
    tk_rbrace
    tk_lparen
    tk_rparen
    tk_uminus
    tk_mul
    tk_div
    tk_add
    tk_sub
    tk_lss
    tk_gtr
    tk_semi
    tk_assign
    tk_integer
    tk_variable
end enum

' where we store keywords and variables
type Symbol
    s_name as string
    tok as Token_type
    offset as integer
end type

dim shared symtab() as Symbol
dim shared tok_list(1 to tk_variable) as string

dim shared max_offset as integer
dim shared cur_line as string
dim shared cur_ch as string
dim shared line_num as integer
dim shared col_num as integer

function is_digit(byval ch as string) as long
    is_digit = (ch <> "") and Asc(ch) >= Asc("0") and Asc(ch) <= Asc("9")
end function

function is_alnum(byval ch as string) as long
    is_alnum = (ch <> "") and ((Asc(UCase$(ch)) >= Asc("A") and Asc(UCase$(ch)) <= Asc("Z")) or (is_digit(ch)))
end function

sub error_msg(byval eline as integer, byval ecol as integer, byval msg as string)
    print "("; eline; ":"; ecol; ")"; " "; msg
    system
end sub

' add an identifier to the symbol table
function install(byval s_name as string, byval tok as Token_type) as integer
    dim n as integer

    n = ubound(symtab)
    redim preserve symtab(n + 1)
    n = ubound(symtab)

    symtab(n).s_name = s_name
    symtab(n).tok    = tok
    if tok = tk_variable then
        symtab(n).offset = max_offset
        max_offset += 1
    end if
    return n
end function

' search for an identifier in the symbol table
function lookup(byval s_name as string) as integer
    dim i as integer

    for i = lbound(symtab) to ubound(symtab)
        if symtab(i).s_name = s_name then return i
    next
    return -1
end function

sub next_line()         ' read the next line of input from the source file
    cur_line = ""
    cur_ch  = ""        ' empty cur_ch means end-of-file
    if eof(1) then exit sub
    line input #1, cur_line
    cur_line = cur_line + chr$(10)
    line_num += 1
    col_num = 1
end sub

sub next_char()         ' get the next char
    cur_ch = ""
    col_num += 1
    if col_num > len(cur_line) then next_line()
    if col_num <= len(cur_line) then cur_ch = mid$(cur_line, col_num, 1)
end sub

sub gettok(byref err_line as integer, byref err_col as integer, byref tok as Token_type, byref v as integer)
    restart:
    ' skip whitespace
    do
        if cur_ch = "" then exit do
        if cur_ch <> " " and cur_ch <> chr$(9) and cur_ch <> chr$(10) then exit do
        next_char()
    loop

    err_line = line_num
    err_col  = col_num

    select case cur_ch
        case "":  tok = tk_eoi: v = max_offset: exit sub
        case "+": tok = tk_add:    next_char(): exit sub
        case "-": tok = tk_sub:    next_char(): exit sub
        case "*": tok = tk_mul:    next_char(): exit sub
        case "(": tok = tk_lparen: next_char(): exit sub
        case ")": tok = tk_rparen: next_char(): exit sub
        case "{": tok = tk_lbrace: next_char(): exit sub
        case "}": tok = tk_rbrace: next_char(): exit sub
        case "<": tok = tk_lss:    next_char(): exit sub
        case ">": tok = tk_gtr:    next_char(): exit sub
        case ";": tok = tk_semi:   next_char(): exit sub
        case "=": tok = tk_assign: next_char(): exit sub
        case "/": ' div or comment
            next_char()
            if cur_ch <> "*" then
                tok = tk_div
                exit sub
            end if
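            ' skip comments; a run of '*' before the closing '/' (as in **/) still ends one
            next_char()
            do
                if cur_ch = "" then error_msg(err_line, err_col, "EOF in comment")
                if cur_ch = "*" then
                    next_char()
                    if cur_ch = "/" then
                        next_char()
                        goto restart  ' end of comment, start all over
                    end if
                else
                    next_char()
                end if
            loop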
        case "'"    ' single char literals
            next_char()
            v = Asc(cur_ch)
            if cur_ch = "'" then error_msg(err_line, err_col, "empty character constant")
            if cur_ch = "\" then
                next_char()
                if cur_ch = "n" then
                    v = 10
                elseif cur_ch = "\" then
                    v = Asc("\")
                else error_msg(err_line, err_col, "unknown escape sequence: " + cur_ch)
                end if
            end if
            next_char()
            if cur_ch <> "'" then error_msg(err_line, err_col, "multi-character constant")
            next_char()
            tok = tk_integer
            exit sub
        case else   ' integers or identifiers
            dim s as string = ""
            dim is_number as boolean = is_digit(cur_ch)
            do while is_alnum(cur_ch) orelse cur_ch = "_"
                if not is_digit(cur_ch) then is_number = false
                s += cur_ch
                next_char()
            loop
            if len(s) = 0 then error_msg(err_line, err_col, "unknown character: " + cur_ch)
            if is_digit(mid(s, 1, 1)) then
                if not is_number then error_msg(err_line, err_col, "invalid number: " + s)
                v = val(s)
                tok = tk_integer
                exit sub
            end if
            dim index as integer
            index = lookup(s)
            if index = -1 then index = install(s, tk_variable)
            v = symtab(index).offset
            tok = symtab(index).tok
            exit sub
    end select
end sub

sub init_lex(byval filein as string)
    install("if",    tk_if)
    install("putc",  tk_putc)
    install("while", tk_while)

    tok_list( 1) = "EOI"
    tok_list( 2) = "If"
    tok_list( 3) = "Putc"
    tok_list( 4) = "While"
    tok_list( 5) = "Lbrace"
    tok_list( 6) = "Rbrace"
    tok_list( 7) = "Lparen"
    tok_list( 8) = "Rparen"
    tok_list( 9) = "Uminus"
    tok_list(10) = "Mul"
    tok_list(11) = "Div"
    tok_list(12) = "Add"
    tok_list(13) = "Sub"
    tok_list(14) = "Lss"
    tok_list(15) = "Gtr"
    tok_list(16) = "Semi"
    tok_list(17) = "Assign"
    tok_list(18) = "Integer"
    tok_list(19) = "Variable"

    open filein for input as #1

    max_offset = 0
    cur_line = ""
    line_num = 0
    col_num = 0
    next_char()
end sub

sub scanner()
    dim err_line as integer
    dim err_col as integer
    dim tok as Token_type
    dim v as integer

    do
        gettok(err_line, err_col, tok, v)
        print using "line ##### col ##### \       \"; err_line; err_col; tok_list(tok);
        if tok = tk_integer orelse tok = tk_variable orelse tok = tk_eoi then print using " ########"; v;
        print
    loop until tok = tk_eoi
end sub

sub main()
    dim filein as string

    filein = command$(1)
    if filein = "" then input "enter filein: ", filein
    if filein = "" then system

    init_lex(filein)
    scanner()
end sub

main()
system

I'm sure some things above may not be clear.  Please ask if something doesn't make sense.  And stay tuned for part 2: a parser for Tiny.
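Since Python is on the wish list, here is a minimal sketch of a Python port of the FreeBASIC scanner above. It is not the original author's code: `Scanner`, `scan`, `SINGLE`, `ESCAPES` and the other names are hypothetical, errors are raised as `ValueError` instead of ending the program, and it assumes the whole input file fits in memory. It keeps the FreeBASIC line/column bookkeeping (each line gets a newline appended, and the column is not reset at end of input), so it should reproduce the EOI positions in the examples. Its comment loop treats `**/` as the end of a comment and reports an unterminated comment as an error.

```python
import string
import sys

SINGLE = {"+": "Add", "-": "Sub", "*": "Mul", "<": "Lss", ">": "Gtr", "=": "Assign",
          "(": "Lparen", ")": "Rparen", "{": "Lbrace", "}": "Rbrace", ";": "Semi"}
KEYWORDS = {"if": "If", "while": "While", "putc": "Putc"}
ESCAPES = {"n": 10, "\\": ord("\\")}              # the only escapes Tiny allows
IDENT_CHARS = set(string.ascii_letters + string.digits + "_")


class Scanner:
    def __init__(self, text):
        self.lines = iter([ln + "\n" for ln in text.splitlines()])  # keep a "\n"
        self.cur_line, self.ch, self.line_num, self.col = "", "", 0, 0
        self.symbols = {}              # identifier -> offset (0, 1, 2, ... by first use)
        self.next_char()

    def next_char(self):
        self.ch, self.col = "", self.col + 1       # "" means end of input
        if self.col > len(self.cur_line):          # past this line: fetch the next
            self.cur_line = next(self.lines, "")   # "" once the input is used up
            if self.cur_line:
                self.line_num, self.col = self.line_num + 1, 1
        if self.col <= len(self.cur_line):
            self.ch = self.cur_line[self.col - 1]

    def gettok(self):
        """Return (line, col, token_type, value); value is None if not printed."""
        while True:
            while self.ch in (" ", "\t", "\n"):
                self.next_char()
            line, col, ch = self.line_num, self.col, self.ch
            if ch == "":
                return line, col, "EOI", len(self.symbols)
            if ch in SINGLE:
                self.next_char()
                return line, col, SINGLE[ch], None
            if ch == "'":
                return self.char_literal(line, col)
            if ch != "/":
                return self.word(line, col)
            self.next_char()
            if self.ch != "*":
                return line, col, "Div", None
            self.next_char()                       # step past the '*' of "/*"
            while True:                            # skip the comment body
                if self.ch == "":
                    raise ValueError(f"({line}:{col}) EOF in comment")
                was_star = self.ch == "*"
                self.next_char()
                if was_star and self.ch == "/":    # "*/" (or "**/") ends it
                    self.next_char()
                    break                          # then scan for a real token

    def char_literal(self, line, col):
        self.next_char()                           # step past the opening quote
        if self.ch in ("", "'"):
            raise ValueError(f"({line}:{col}) empty character constant")
        value = ord(self.ch)
        if self.ch == "\\":
            self.next_char()
            if self.ch not in ESCAPES:
                raise ValueError(f"({line}:{col}) unknown escape sequence: {self.ch}")
            value = ESCAPES[self.ch]
        self.next_char()
        if self.ch != "'":
            raise ValueError(f"({line}:{col}) multi-character constant")
        self.next_char()
        return line, col, "Integer", value

    def word(self, line, col):
        text = ""
        while self.ch in IDENT_CHARS:              # "" is never in the set
            text += self.ch
            self.next_char()
        if not text:
            raise ValueError(f"({line}:{col}) unknown character: {self.ch}")
        if text[0].isdigit():
            if not text.isdigit():
                raise ValueError(f"({line}:{col}) invalid number: {text}")
            return line, col, "Integer", int(text)
        if text in KEYWORDS:
            return line, col, KEYWORDS[text], None
        return line, col, "Variable", self.symbols.setdefault(text, len(self.symbols))


def scan(text):
    scanner, out, name = Scanner(text), [], ""
    while name != "EOI":
        line, col, name, value = scanner.gettok()
        head = f"line {line:5d} col {col:5d} "
        out.append(head + name if value is None else f"{head}{name:<9} {value:8d}")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as src:
        print("\n".join(scan(src.read())))
```

Usage, with a hypothetical file name: `python tiny_scan.py example_1` should print the Example 1 output shown earlier. Unlike the FreeBASIC version, lines without a value carry no trailing padding.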

jj2007

  • Guest
Re: Scanner challenge
« Reply #1 on: February 19, 2016, 03:30:15 AM »
Looks exciting, Ed  :)

If I just had more time...

In assembler, I would use a jump table for each char:
- see a quote? Jump to the tkQuotes proc, which scans until it finds the second one
- see *? Jump to tkMul...

etc - straightforward and fast. But I am sure you have tested that one already, and are aware of some other options ;-)
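In a high-level language the same jump-table idea is usually a dictionary lookup on the current character. Below is a minimal Python sketch for Tiny's single-character tokens; the table and function names are hypothetical. `/` needs one character of lookahead because `/*` opens a comment, and anything not in the table falls through to the char-literal, number and identifier code.

```python
SINGLE_CHAR_TOKENS = {
    "*": "Mul", "/": "Div", "+": "Add", "-": "Sub",
    "<": "Lss", ">": "Gtr", "=": "Assign",
    "(": "Lparen", ")": "Rparen", "{": "Lbrace", "}": "Rbrace", ";": "Semi",
}


def single_char_token(ch: str, next_ch: str = ""):
    """Return the token type for ch, or None if ch needs other handling."""
    if ch == "/" and next_ch == "*":   # "/*" opens a comment, not a Div
        return None
    return SINGLE_CHAR_TOKENS.get(ch)
```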

wang renxin

  • Guest
Re: Scanner challenge
« Reply #2 on: February 19, 2016, 05:00:23 AM »
Looking forward to reading part 2, Ed.

ScriptBasic

  • Guest
Re: Scanner challenge
« Reply #3 on: February 19, 2016, 07:22:13 AM »
Hi Ed,

If you find time, you might enjoy helping out with the C BASIC project. Please join the forum either way as your input would be very helpful.

John

P.S.

I'll try to give your scanner challenge a go in Script BASIC.


ScriptBasic

  • Guest
Re: Scanner challenge
« Reply #4 on: February 20, 2016, 12:21:33 PM »
Ed,

I thought it might be interesting to show how Script BASIC tokenizes its scripts and allows them to be compiled to C as well.

Code: [Select]
' Example

var1 = 5
var2 = .5
var3 = "Five"

Code: [Select]
unsigned long ulGlobalVariables=3;
unsigned long ulNodeCounter=15;
unsigned long ulStartNode=5;
unsigned long ulStringTableSize=13;
unsigned char szCommandArray[] ={
0xDA, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x0A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0xDA, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x0A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xE0, 0x3F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0xDA, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0C, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x0A, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0E, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0B, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00 };
char szStringTable[]={
0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46, 0x69, 0x76, 0x65, 0x00,
0x00 };

Ed Davis

  • Guest
Re: Scanner challenge
« Reply #5 on: February 20, 2016, 01:44:52 PM »
Quote from: ScriptBasic on February 20, 2016, 12:21:33 PM
I thought it might be interesting to show how Script BASIC tokenizes its scripts and allows them to be compiled to C as well.

What I would much rather see is a Script BASIC version of the very simple scanner I presented.  I think it is pretty much standard BASIC, and it should translate readily to other BASICs, at least those as powerful as the old Microsoft QBASIC of the early 1990s. It translates to QBASIC pretty easily, if FreeBASIC's language=QB can be trusted :-)

After perusing the code, this is all it uses:

    enums
    user defined type
    arrays
    integers
    functions
    subs
    file input
    file output or output to stdout
    assignment, selection and iteration statements

The only thing I see that isn't perhaps standard BASIC is the use of enums, but you can replace those with constants.

I don't even really know BASIC very well, but I was able to translate this from C to FreeBASIC in about 45 minutes.  Most of that time was spent figuring out file I/O, how to re-dimension arrays, and whether arrays and strings are 0- or 1-based.

I'm glad to help if you hit a roadblock.  Just let me know where you are stuck, and I'll take a look.

Ed Davis

  • Guest
Re: Scanner challenge
« Reply #6 on: February 20, 2016, 06:35:18 PM »
So I created a Script BASIC version.  Not a bad language.  Needs much better syntax error messages, and it really shouldn't crash if it can't grok your program.  For instance, the following program crashes Scriba on my Win 7 machine:

Code: [Select]
sub main()
    filein = command$(1)
    if filein = "" then input "enter filein: ", filein
    if filein = "" then system
end sub

main()
system

Before the crash, it offers:


crash\lex3.bas(1): error &H77:syntax error during checking the line and also noting that the syntax error so serious that none of the other syntax defintions can match the current line


Which is not very helpful :-)  Oh, and don't dare put comments on the same line as code!  That got me for a while, before I saw the part in the manual about not doing that :-)

Anyway, here it is.  Improvements welcomed!

Code: [Select]
declare option DeclareVars

const tk_eoi      =  1
const tk_if       =  2
const tk_putc     =  3
const tk_while    =  4
const tk_lbrace   =  5
const tk_rbrace   =  6
const tk_lparen   =  7
const tk_rparen   =  8
const tk_uminus   =  9
const tk_mul      = 10
const tk_div      = 11
const tk_add      = 12
const tk_sub      = 13
const tk_lss      = 14
const tk_gtr      = 15
const tk_semi     = 16
const tk_assign   = 17
const tk_integer  = 18
const tk_variable = 19

' no user defined types, so use these in pseudo associative arrays
const sym_name   = "name"
const sym_tok    = "tok"
const sym_offset = "offset"

global symtab, max_offset, cur_line, cur_ch, line_num, col_num

function is_digit(ch)
    is_digit = ch >= "0" and ch <= "9"
end function

function is_alnum(ch)
    is_alnum = (ch <> "") and ((UCase(ch) >= "A" and UCase(ch) <= "Z") or (is_digit(ch)))
end function

sub error_msg(err_line, err_col, msg)
    print "(", err_line, ":", err_col, ") ", msg, "\n"
    end
end sub

sub install(s_name, tok)
    symtab{s_name, sym_name} = s_name
    symtab{s_name, sym_tok}  = tok
    if tok = tk_variable then
        symtab{s_name, sym_offset} = max_offset
        max_offset += 1
    end if
end sub

function lookup(s_name)
    lookup = undef
    if symtab{s_name, sym_name} = s_name then
        lookup = s_name
    end if
end function

sub next_line
    cur_line = ""
    cur_ch  = ""
    if eof(1) then exit sub
    line input #1, cur_line
    ' LINE INPUT keeps the newline at the end of the line, so nothing is appended
    line_num += 1
    col_num = 1
end sub

sub next_char
    cur_ch = ""
    col_num += 1
    if col_num > len(cur_line) then
        next_line()
    end if
    if col_num <= len(cur_line) then
        cur_ch = mid(cur_line, col_num, 1)
    end if
end sub

sub gettok(err_line, err_col, tok, v)
    while (cur_ch = " " or cur_ch = chr(9) or cur_ch = chr(10)) and (cur_ch <> "")
        next_char()
    wend
    err_line = line_num
    err_col  = col_num

    if cur_ch = "" then
        tok = tk_eoi
        v   = max_offset
        exit sub
    elseif cur_ch = "+" then
        tok = tk_add
        next_char()
        exit sub
    elseif cur_ch = "-" then
        tok = tk_sub
        next_char()
        exit sub
    elseif cur_ch = "*" then
        tok = tk_mul
        next_char()
        exit sub
    elseif cur_ch = "(" then
        tok = tk_lparen
        next_char()
        exit sub
    elseif cur_ch = ")" then
        tok = tk_rparen
        next_char()
        exit sub
    elseif cur_ch = "{" then
        tok = tk_lbrace
        next_char()
        exit sub
    elseif cur_ch = "}" then
        tok = tk_rbrace
        next_char()
        exit sub
    elseif cur_ch = "<" then
        tok = tk_lss
        next_char()
        exit sub
    elseif cur_ch = ">" then
        tok = tk_gtr
        next_char()
        exit sub
    elseif cur_ch = ";" then
        tok = tk_semi
        next_char()
        exit sub
    elseif cur_ch = "=" then
        tok = tk_assign
        next_char()
        exit sub
    elseif cur_ch = "/" then
        next_char()
        if cur_ch <> "*" then
            tok = tk_div
            exit sub
        end if
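        ' skip comments; a run of '*' before the closing '/' (as in **/) still ends one
        next_char()
        while true
            if cur_ch = "" then error_msg(err_line, err_col, "EOF in comment")
            if cur_ch = "*" then
                next_char()
                if cur_ch = "/" then
                    next_char()
                    gettok(err_line, err_col, tok, v)
                    exit sub
                endif
            else
                next_char()
            endif
        wend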
    elseif cur_ch = "'" then
        next_char()
        v = Asc(cur_ch)
        if cur_ch = "'" then error_msg(err_line, err_col, "empty character constant")
        if cur_ch = "\\" then
            next_char()
            if cur_ch = "n" then
                v = 10
            elseif cur_ch = "\\" then
                v = Asc("\\")
            else
                error_msg(err_line, err_col, "unknown escape sequence: " & cur_ch)
            endif
        end if
        next_char()
        if cur_ch <> "'" then error_msg(err_line, err_col, "multi-character constant")
        next_char()
        tok = tk_integer
        exit sub
    else
    ' integers or identifiers
        local s, is_number
        s = ""
        is_number = is_digit(cur_ch)
        while is_alnum(cur_ch) or cur_ch = "_"
            if not is_digit(cur_ch) then is_number = false
            s &= cur_ch
            next_char()
        wend
        if len(s) = 0 then error_msg(err_line, err_col, "unknown character: " & cur_ch)
        if is_digit(mid(s, 1, 1)) then
            if not is_number then error_msg(err_line, err_col, "invalid number: " & s)
            v = val(s)
            tok = tk_integer
            exit sub
        end if
        if lookup(s) = undef then install(s, tk_variable)
        tok = symtab{s, sym_tok}
        v = symtab{s, sym_offset}
        exit sub
    endif
end sub

sub init_lex(filein)
    install("if", tk_if)
    install("while", tk_while)
    install("putc", tk_putc)

    max_offset = 0
    cur_line   = ""
    line_num   = 0
    col_num    = 0

    open filein for input as #1
    next_char()
end sub

sub scanner
    local err_line, err_col, tok, v, tok_list

    tok_list[tk_eoi     ] = "EOI"
    tok_list[tk_if      ] = "If"
    tok_list[tk_putc    ] = "Putc"
    tok_list[tk_while   ] = "While"
    tok_list[tk_lbrace  ] = "Lbrace"
    tok_list[tk_rbrace  ] = "Rbrace"
    tok_list[tk_lparen  ] = "Lparen"
    tok_list[tk_rparen  ] = "Rparen"
    tok_list[tk_uminus  ] = "Uminus"
    tok_list[tk_mul     ] = "Mul"
    tok_list[tk_div     ] = "Div"
    tok_list[tk_add     ] = "Add"
    tok_list[tk_sub     ] = "Sub"
    tok_list[tk_lss     ] = "Lss"
    tok_list[tk_gtr     ] = "Gtr"
    tok_list[tk_semi    ] = "Semi"
    tok_list[tk_assign  ] = "Assign"
    tok_list[tk_integer ] = "Integer"
    tok_list[tk_variable] = "Variable"

    do
        gettok(err_line, err_col, tok, v)
        ' no print using, how to format???
        print "line ", err_line, " col ", err_col, " ", tok_list[tok]
        if tok = tk_integer or tok = tk_variable or tok = tk_eoi then print " ", v
        print "\n"
    loop until tok = tk_eoi
end sub

sub main
    ' cannot put comments on same line as a command!!!
    if command() = "" then
        print "filename required"
        end
    end if
    init_lex(command())
    scanner()
end sub

main()


ScriptBasic

  • Guest
Re: Scanner challenge
« Reply #7 on: February 20, 2016, 08:05:18 PM »
Thanks Ed for giving Script BASIC a try.

I agree that the docs need updating. English is only one of the 12 languages Peter speaks; his native language is Hungarian. Mark's (B+) challenge got me to try SWAP, which I thought hadn't been implemented yet. POP is another undocumented keyword; it pops the GOSUB stack.

What version / OS of Script BASIC are you using?

If you put error.bas in your Script BASIC include directory, errors should be more descriptive.

Code: [Select]
' This file was automatically generated by the program generrh.pl
' using the error code definition file errors.def from the
' ScriptBasic distribution
'
' This file is part of the ScriptBasic distribution and is
' specific to the actual build it was shipped. Do not use
' this file for any version or build of the ScriptBasic
' interpreter other than the one this file was shipped.
'
' THIS FILE IS FOR V1.0 BUILD 1
'
Global Const sbErrorOK = 0
Global Const sbErrorMemory = 1

Global Const sbErrorNoarray = 2
' Function can not return a whole array
' -------------------------------------

Global Const sbErrorDiv = 3
' Division by zero or other calculation error
' -------------------------------------------

Global Const sbErrorUndefop = 4
' Argument to operator is undefined
' ---------------------------------

Global Const sbErrorBadCall = 5
' The command or sub was called the wrong way
' -------------------------------------------

Global Const sbErrorFewArgs = 6
' There are not enough arguments of the module function.
' ------------------------------------------------------

Global Const sbErrorArgumentType = 7
' The argument passed to a module function is not the needed type.
' ----------------------------------------------------------------

Global Const sbErrorArgumentRange = 8
' The argument passed to a module function is out of the accepted range.
' ----------------------------------------------------------------------

Global Const sbErrorFileRead = 9
' The module experiences difficulties reading the file
' ----------------------------------------------------

Global Const sbErrorFileWrite = 10
' The module experiences difficulties writing the file.
' -----------------------------------------------------

Global Const sbErrorFile = 11
' The module experiences handling the file.
' -----------------------------------------

Global Const sbErrorCircular = 12
' There is a circular reference in memory.
' ----------------------------------------

Global Const sbErrorModuleNotLoaded = 13
' The module can not be unloaded, because it was not loaded.
' ----------------------------------------------------------

Global Const sbErrorPartialUnload = 14
' Some modules were active and could not be unloaded.
' ---------------------------------------------------

Global Const sbErrorModuleActive = 15
' The module can not be unloaded, because it is currently active.
' ---------------------------------------------------------------

Global Const sbErrorModuleLoad = 16
' The requested module can not be loaded.
' ---------------------------------------

Global Const sbErrorModuleFunction = 17
' The requested function does not exist in the module.
' ----------------------------------------------------

Global Const sbErrorModuleInitialize = 18
' The module did not initialize correctly
' ---------------------------------------

Global Const sbErrorModuleVersion = 19
' The module was developed for a different version of ScriptBasic.
' ----------------------------------------------------------------

Global Const sbErrorBadFileNumber = 20
' File number is out of range, it should be between 1 and 512
' -----------------------------------------------------------

Global Const sbErrorFileNumberIsUsed = 21
' The file number is already used.
' --------------------------------

Global Const sbErrorFileCannotBeOpened = 22
' The file can not be opened.
' ---------------------------

Global Const sbErrorFileIsNotOpened = 23
' The file is not opened.
' -----------------------

Global Const sbErrorInvalidLock = 24
' The lock type is invalid.
' -------------------------

Global Const sbErrorPrintFail = 25
' The print command failed. The file may be locked by another process.
' --------------------------------------------------------------------

Global Const sbErrorMkdirFail = 26
' Directory can not be created.
' -----------------------------

Global Const sbErrorDeleteFail = 27
' The directory or file could not be deleted.
' -------------------------------------------

Global Const sbErrorNotimp = 28
' Command is not implemented and no currently loaded extension module defined behaviour for it
' --------------------------------------------------------------------------------------------

Global Const sbErrorInvalidJoker = 29
' The character can not be a joker or wild card character.
' --------------------------------------------------------

Global Const sbErrorNoResume = 30
' The code tried to execute a resume while not being in error correction code.
' ----------------------------------------------------------------------------

Global Const sbErrorInvalidDirectoryName = 31
' The directory name in open directory is invalid.
' ------------------------------------------------

Global Const sbErrorInvalidOptionDirOpen = 32
' Invalid option for directory open.
' ----------------------------------

Global Const sbErrorDirectoryNoOpen = 33
' The directory can not be opened.
' --------------------------------

Global Const sbErrorBadRecordLength = 34
' The record length is invalid in the open statements (undefined, zero or negative)
' ---------------------------------------------------------------------------------

Global Const sbErrorNoCurrentDirectory = 35
' The current directory can not be retrieved for some reason.
' -----------------------------------------------------------

Global Const sbErrorChDirUndef = 36
' The directory name in chdir can not be undef.
' ---------------------------------------------

Global Const sbErrorChDir = 37
' Cannot change the current working directory to the desired directory.
' ---------------------------------------------------------------------

Global Const sbErrorReturnWithoutGosub = 38
' The command RETURN can not be executed, because there is no where to return.
' ----------------------------------------------------------------------------

Global Const sbErrorInvalidArgumentForFunctionAddress = 39
' The argument for the function address is invalid.
' -------------------------------------------------

Global Const sbErrorSetfileInvalidAttribute = 40
' The attribute value or symbol is invalid in the set file command.
' -----------------------------------------------------------------

Global Const sbErrorChownInvalidUser = 41
' The user does not exist.
' ------------------------

Global Const sbErrorChownNotSupported = 42
' The chown command is not supported on Win95 and Win98
' -----------------------------------------------------

Global Const sbErrorChownSetOwner = 43
' Can not change owner.
' ---------------------

Global Const sbErrorInvalidFileName = 44
' The file name is invalid.
' -------------------------

Global Const sbErrorSetCreateTime = 45
' Setting the create time of the file has failed.
' -----------------------------------------------

Global Const sbErrorSetModifyTime = 46
' Setting the modify time of the file has failed.
' -----------------------------------------------

Global Const sbErrorSetAccessTime = 47
' Setting the access time of the file has failed
' ----------------------------------------------

Global Const sbErrorInvalidTimeFormat = 48
' The specified time format is invalid
' ------------------------------------

Global Const sbErrorInvalidTime = 49
' The time is not valid, cannot be earlier than January 1, 1970. 00:00
' --------------------------------------------------------------------

Global Const sbErrorExtensionSpecific = 50
' Extension specific error: %s
' ----------------------------

Global Const sbErrorSocketFile = 51
' The operation can be done on files only and not on sockets.
' -----------------------------------------------------------

Global Const sbErrorInvalidCode = 52
' The embedding application tried to start the code at an invalid location
' ------------------------------------------------------------------------

Global Const sbErrorMandarg = 53
' Mandatory argument is missing
' -----------------------------

Global Const sbErrorTimeout = 54
' Subprocess did not finish within time limits
' --------------------------------------------

Global Const sbErrorStaysInMemory = 55
' The module can not be unloaded
' ------------------------------

Global Const sbErrorPreprocessorAbort = 56
' The preprocessor said to abort program compilation or execution.
' ----------------------------------------------------------------

Here is the code you were having an issue with, corrected. For some reason END doesn't like being used in a single-line IF, so it goes in a block IF instead.

Code: [Select]
sub main
    filein = command()
    if filein = "" then
      print  "enter filein: "
      line input filein
      ' strip the newline that LINE INPUT leaves on the end
      filein = chomp(filein)
    end if
    if filein = "" then
     end
    end if
end sub

main()
end

jrs@laptop:~/sb/sb22/test$ scriba edtest.sb
enter filein: Ed
jrs@laptop:~/sb/sb22/test$

« Last Edit: February 20, 2016, 08:18:20 PM by John »

ScriptBasic

  • Guest
Re: Scanner challenge
« Reply #8 on: February 20, 2016, 09:00:26 PM »
Ed,

I noticed you like using a MAIN function. The primary namespace in Script BASIC is main, and the following code shows what SB sees under the covers when no namespace is given. MODULE / END MODULE is used to create other namespaces in your script; this is how most extension modules are used. I mention this because if you embed Script BASIC and call its functions or access its global variables from the host program, the main:: prefix is needed. Also, use lower-case names when accessing them through the embedding interface.

Code: [Select]
FUNCTION main::test(arg1)
  PRINT arg1,"\n"
END FUNCTION

main::v = 99
main::test(v)

jrs@laptop:~/sb/sb22/test$ scriba edtest2.sb
99
jrs@laptop:~/sb/sb22/test$


John
« Last Edit: February 20, 2016, 09:11:02 PM by John »

ScriptBasic

  • Guest
Re: Scanner challenge
« Reply #9 on: February 20, 2016, 11:32:15 PM »
Nice job!
I'll see what I can improve on.

Code: [Select]
{
/*
 Example 2
 */
    var1 = 5;
    var2 = 10;
    var3 = var1 * var2;
}


jrs@laptop:~/sb/sb22/Ed$ time scriba scaned.sb example_2
line 1 col 1 Lbrace
line 5 col 5 Variable 0
line 5 col 10 Assign
line 5 col 12 Integer 5
line 5 col 13 Semi
line 6 col 5 Variable 1
line 6 col 10 Assign
line 6 col 12 Integer 10
line 6 col 14 Semi
line 7 col 5 Variable 2
line 7 col 10 Assign
line 7 col 12 Variable 0
line 7 col 17 Mul
line 7 col 19 Variable 1
line 7 col 23 Semi
line 8 col 1 Rbrace
line 9 col 1 EOI 3

real   0m0.018s
user   0m0.018s
sys   0m0.000s
jrs@laptop:~/sb/sb22/Ed$
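For comparison, here is a minimal sketch of the same lexical rules written in Python, since the challenge itself does not fix a language. It is not the scaned.sb code. It assumes 1-based line and column numbers and reports EOI at the position just past the last character, which matches the output above if the example_2 file ends with a newline. The keyword token names If, While and Putc are an assumption borrowed from the tk{} map posted further down this thread; the problem statement does not name them.

```python
import string
import sys

# Single-character operators and symbols -> token-type names from the challenge.
SIMPLE = {
    "*": "Mul", "/": "Div", "+": "Add", "-": "Sub",
    "<": "Lss", ">": "Gtr", "=": "Assign",
    "(": "Lparen", ")": "Rparen", "{": "Lbrace", "}": "Rbrace", ";": "Semi",
}
# Keyword token names are assumed (taken from the tk{} map in this thread).
KEYWORDS = {"if": "If", "while": "While", "putc": "Putc"}
IDENT_START = set(string.ascii_letters + "_")
IDENT_CHAR = IDENT_START | set(string.digits)


def scan(text):
    """Return one 'line L col C Type [value]' string per token, ending with EOI."""
    out = []
    offsets = {}  # identifier -> offset, numbered in order of first appearance
    i, line, col = 0, 1, 1
    n = len(text)

    def advance(k=1):
        # Consume k characters, keeping the 1-based line/column in step.
        nonlocal i, line, col
        for _ in range(k):
            if text[i] == "\n":
                line, col = line + 1, 1
            else:
                col += 1
            i += 1

    def emit(ln, cl, kind, value=None):
        out.append(f"line {ln} col {cl} {kind}" + ("" if value is None else f" {value}"))

    while True:
        # Skip whitespace and /* ... */ comments (which may span lines).
        while i < n:
            if text[i].isspace():
                advance()
            elif text.startswith("/*", i):
                end = text.find("*/", i + 2)
                if end < 0:
                    raise SyntaxError(f"unterminated comment at line {line} col {col}")
                advance(end + 2 - i)
            else:
                break

        ln, cl = line, col  # position of the token's first character
        if i >= n:
            emit(ln, cl, "EOI", len(offsets))  # number of unique identifiers
            return out

        ch = text[i]
        if ch in string.digits:
            j = i
            while j < n and text[j] in string.digits:
                j += 1
            emit(ln, cl, "Integer", int(text[i:j]))
            advance(j - i)
        elif ch == "'":
            # 'x', '\n' (newline) or '\\' (backslash); the value is the char code.
            if text.startswith("'\\n'", i):
                value, length = 10, 4
            elif text.startswith("'\\\\'", i):
                value, length = 92, 4
            elif i + 2 < n and text[i + 1] not in "'\\\n" and text[i + 2] == "'":
                value, length = ord(text[i + 1]), 3
            else:
                raise SyntaxError(f"bad char literal at line {ln} col {cl}")
            emit(ln, cl, "Integer", value)
            advance(length)
        elif ch in IDENT_START:
            j = i
            while j < n and text[j] in IDENT_CHAR:
                j += 1
            word = text[i:j]
            if word in KEYWORDS:
                emit(ln, cl, KEYWORDS[word])
            else:
                emit(ln, cl, "Variable", offsets.setdefault(word, len(offsets)))
            advance(j - i)
        elif ch in SIMPLE:
            emit(ln, cl, SIMPLE[ch])
            advance()
        else:
            raise SyntaxError(f"unrecognized character {ch!r} at line {ln} col {cl}")


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        print("\n".join(scan(f.read())))
```

Saved under a name of your choice and run as `python <name>.py example_2`, it should print the same 17 lines as the scriba run above. Comments are skipped before single-character tokens are matched, so `/*` is never mistaken for Div followed by Mul.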


ScriptBasic

  • Guest
Re: Scanner challenge
« Reply #10 on: February 21, 2016, 04:37:46 AM »
Ed,

Here are a couple of options you may want to consider for your Script BASIC scanner.

This merges your associative arrays into one. The merged array can also be walked by numeric index: as the output below shows, tk[x] holds a key and tk[x+1] holds its value.

Code: [Select]
tk{"eoi"} = "EOI"
tk{"if"} = "If"
tk{"putc"} = "Putc"
tk{"while"} = "While"
tk{"lbrace"} = "Lbrace"
tk{"rbrace"} = "Rbrace"
tk{"lparen"} = "Lparen"
tk{"rparen"} = "Rparen"
tk{"uminus"} = "Uminus"
tk{"mul"} = "Mul"
tk{"div"} = "Div"
tk{"add"} = "Add"
tk{"sub"} = "Sub"
tk{"lss"} = "Lss"
tk{"gtr"} = "Gtr"
tk{"semi"} = "Semi"
tk{"assign"} = "Assign"
tk{"integer"} = "Integer"
tk{"variable"} = "Variable"

FOR x = 0 TO UBOUND(tk) STEP 2
  PRINT tk[x]," - ",tk[x+1],"\n"
NEXT


jrs@laptop:~/sb/sb22/Ed$ scriba scan_fb1.sb
eoi - EOI
if - If
putc - Putc
while - While
lbrace - Lbrace
rbrace - Rbrace
lparen - Lparen
rparen - Rparen
uminus - Uminus
mul - Mul
div - Div
add - Add
sub - Sub
lss - Lss
gtr - Gtr
semi - Semi
assign - Assign
integer - Integer
variable - Variable
jrs@laptop:~/sb/sb22/Ed$


If you want to be more verbose, you could create a token map array.

Code: [Select]
tk[1]{"tk_value"} = 1       
tk[1]{"tk_name"} = "eoi"
tk[1]{"listname"} = "EOI"   

There are no limits (other than available memory) on array size, the number of indices, or how you mix them in your definition. The bounds of each index can be checked with LBOUND/UBOUND, and arrays can be joined through assignment.
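The record-per-token layout above can be sketched outside Script BASIC too. The Python below is only a rough analogue, not Script BASIC semantics: a dict keyed by token number plays the role of tk[n], each record carries the tk_value/tk_name/listname fields from the post, and min()/max() over the keys stand in loosely for LBOUND/UBOUND. The numbering (1 to 19, in the order of the merged tk{} list) is illustrative, and the by_name reverse index is a hypothetical helper, not something Script BASIC provides.

```python
# Token names in the order of the merged tk{} list; numbering is illustrative.
TOKEN_NAMES = [
    ("eoi", "EOI"), ("if", "If"), ("putc", "Putc"), ("while", "While"),
    ("lbrace", "Lbrace"), ("rbrace", "Rbrace"), ("lparen", "Lparen"),
    ("rparen", "Rparen"), ("uminus", "Uminus"), ("mul", "Mul"),
    ("div", "Div"), ("add", "Add"), ("sub", "Sub"), ("lss", "Lss"),
    ("gtr", "Gtr"), ("semi", "Semi"), ("assign", "Assign"),
    ("integer", "Integer"), ("variable", "Variable"),
]

# tk[n] -> record with the same three fields as tk[n]{"..."} in the post.
tk = {
    value: {"tk_value": value, "tk_name": name, "listname": listname}
    for value, (name, listname) in enumerate(TOKEN_NAMES, start=1)
}

# Hypothetical reverse index: look a record up by its lower-case name.
by_name = {rec["tk_name"]: rec for rec in tk.values()}

if __name__ == "__main__":
    print(min(tk), max(tk))              # prints: 1 19 (rough LBOUND/UBOUND)
    print(tk[1]["listname"])             # prints: EOI
    print(by_name["while"]["tk_value"])  # prints: 4
```

Keeping both a numeric index and a name index means the scanner can print a token's list name from its number, while the parser can still refer to tokens by name.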

« Last Edit: February 21, 2016, 04:57:56 AM by John »

ScriptBasic

  • Guest
Re: Scanner challenge
« Reply #11 on: February 21, 2016, 09:39:16 AM »
Quote from: ed
Scriba on my Win 7 machine

Are you using Dave's Script BASIC GUI IDE/Debugger?



Dave's Github Code Repository

Update COM/OLE IDE/Debugger Project - Bitbucket (Themed, 2.2 Script BASIC source, ...)

Windows Setup Install
« Last Edit: February 21, 2016, 09:51:31 AM by John »

ScriptBasic

  • Guest
Re: Scanner challenge
« Reply #12 on: February 25, 2016, 07:36:51 AM »
I noticed that one of Script BASIC's commercial clients created a help (.chm) file. The version of Script BASIC it documents is fairly old (2003 vintage), but it covers most of the syntax still in use today.

Quote
Script Basic Programming for Web Devices and Custom Gateways

Script Basic is an implementation of old fashioned "standard" Basic, the way it was before visual everything. It is ideal for creating applications that need to interface a device with a proprietary ASCII protocol to an open control system (Modbus or BACnet). Script Basic is available in Internet I/O Models IB-100 and IB-110, AddMe Jr. Basic, AddMe Jr. Data Manager, and Babel Buster SP Custom.