- Documented the `LexerInterface`'s methods (its API)
This commit is contained in:
Tristan B. Velloza Kildaire 2023-07-09 16:06:32 +02:00
parent 2a015a60bb
commit 7b3bee59f1
2 changed files with 68 additions and 9 deletions

View File

@ -8,13 +8,43 @@ tokens are built - this set of rules is called a *grammar*.
### Grammar
TODO: Add link to other section or remove this
TODO: fix the link
The grammar is described in the [language section](31-grammar.md) and
can be viewed alongside this section for some context.
### Overview of implementation
### Overview of files
The source code for the lexical analysis part of the compiler is located
in `source/tlang/lexer.d` which contains two important class
definitions:
in `source/tlang/compiler/lexer/` which contains a few important module
and class definitions.
#### Lexer API
The **Lexer API** describes the required methods that a tokenizer must
implement in order to be used within the TLang compiler infrastructure.
The reason for describing such an interface is so that improved
tokenizers can easily be integrated down the line without much churn in
the parts of the code base that use the lexer (namely the parser).
The API is described in the table below; the file in question,
`source/tlang/compiler/lexer/core/lexer.d`, contains the
`LexerInterface` definition:
| Method name | Return type | Description |
|---------------------|-------------|-------------------------------------------------------------------------------|
| `getCurrentToken()` | `Token` | Returns the `Token` at the current cursor position |
| `nextToken()` | `void` | Moves the cursor forward once |
| `previousToken()` | `void` | Moves the cursor backwards once |
| `setCursor(ulong)`  | `void`      | Sets the cursor's position to the given index                                  |
| `getCursor()`       | `ulong`     | Returns the cursor's current position                                          |
| `hasTokens()` | `bool` | Returns `true` if there are more tokens to be consumed, otherwise `false` |
| `getLine()` | `ulong` | Returns the line number the lexer is at |
| `getColumn()` | `ulong` | Returns the column number the lexer is at |
| `getTokens()`       | `Token[]`   | Exhausts the lexer's token stream and returns all gathered tokens in an array |
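
For illustration only, the interface above could be declared in D
roughly as follows. This is a sketch reconstructed from the method
table, not a copy of the real `lexer.d`; the parameter name and the
placeholder `Token` stub are assumptions made so the snippet stands on
its own.

```d
// Sketch of LexerInterface reconstructed from the method table above;
// not a copy of the real lexer.d.

// Minimal placeholder so the sketch compiles on its own; the real Token
// class (described below) carries the token string and more.
class Token
{
    private string token;

    this(string token)
    {
        this.token = token;
    }

    public string getToken()
    {
        return this.token;
    }
}

public interface LexerInterface
{
    public Token getCurrentToken();   // the Token at the current cursor position
    public void nextToken();          // move the cursor forward once
    public void previousToken();      // move the cursor backwards once
    public void setCursor(ulong idx); // set the cursor's position to the given index
    public ulong getCursor();         // the cursor's current position
    public bool hasTokens();          // true whilst more tokens remain to be consumed
    public ulong getLine();           // line number the lexer is at
    public ulong getColumn();         // column number the lexer is at
    public Token[] getTokens();       // exhaust the token stream, return all tokens
}
```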
Some auxiliary types which are used in `LexerInterface` are the
following:
- `Token` - This represents a token
- Complete with the token string itself, `token`. Retrievable with a
@ -29,7 +59,10 @@ definitions:
- …would evaluate to `true`, rather than `false` as it would under
  reference equality (the default in D)
- `Lexer` - The token builder
### Implementation: `BasicLexer`
- `BasicLexer` - The token builder (a short usage sketch follows this list)
- `sourceCode`, the whole input program (as a string) to be tokenized
- `position`, holds the index to the current character in the string
array `sourceCode`
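
With any implementation of `LexerInterface` (such as `BasicLexer`), a
consumer can drive tokenization through the interface alone. The sketch
below is illustrative, not code taken from the TLang parser; it assumes
the `LexerInterface` and `Token` declarations sketched above.

```d
import std.stdio : writeln;

// Walk the token stream using only the LexerInterface methods from the
// table above, printing each token with the position the lexer reports.
void dumpTokens(LexerInterface lexer)
{
    while(lexer.hasTokens())
    {
        Token tok = lexer.getCurrentToken();

        writeln("'", tok.getToken(), "' at line ", lexer.getLine(),
                ", column ", lexer.getColumn());

        lexer.nextToken();
    }
}
```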

View File

@ -4,11 +4,33 @@ Lexical analysis is the process of taking a program as an input string $A$ and s
### Grammar
TODO: Add link to other section or remove this
TODO: fix the link
The grammar is described in the [language section](31-grammar.md) and can be viewed alongside this section for some context.
### Overview of implementation
### Overview of files
The source code for the lexical analysis part of the compiler is located in `source/tlang/lexer.d` which contains two important class definitions:
The source code for the lexical analysis part of the compiler is located in `source/tlang/compiler/lexer/` which contains a few important module and class definitions.
#### Lexer API
The **Lexer API** describes the required methods that a tokenizer must implement in order to be used within the TLang compiler infrastructure. The reason for describing such an interface is so that improved tokenizers can easily be integrated down the line without much churn in the parts of the code base that use the lexer (namely the parser).
The API is described in the table below; the file in question, `source/tlang/compiler/lexer/core/lexer.d`, contains the `LexerInterface` definition:
| Method name | Return type | Description |
|---------------------------|---------------|-------------------------------------------------------------------------------|
| `getCurrentToken()` | `Token` | Returns the `Token` at the current cursor position |
| `nextToken()` | `void` | Moves the cursor forward once |
| `previousToken()` | `void` | Moves the cursor backwards once |
| `setCursor(ulong)`         | `void`        | Sets the cursor's position to the given index                                  |
| `getCursor()` | `ulong` | Returns the cursor's current position |
| `hasTokens()` | `bool` | Returns `true` if there are more tokens to be consumed, otherwise `false` |
| `getLine()`                | `ulong`       | Returns the line number the lexer is at                                        |
| `getColumn()`              | `ulong`       | Returns the column number the lexer is at                                      |
| `getTokens()` | `Token[]` | Exhausts the lexer's token stream and returns all gathered tokens in an array |
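
The cursor methods also make simple backtracking possible: a consumer such as a speculative parser can remember the cursor with `getCursor()` and rewind with `setCursor(ulong)` if an attempt fails. The sketch below is hypothetical; `attempt` is an invented placeholder delegate, not a real TLang function.

```d
// Remember the cursor, try a speculative parse, and rewind on failure.
// Purely illustrative; LexerInterface is the interface described in the
// table above and `attempt` is an invented placeholder.
bool speculate(LexerInterface lexer, bool delegate(LexerInterface) attempt)
{
    ulong savedCursor = lexer.getCursor(); // remember where we were

    if(attempt(lexer))
    {
        return true; // success: keep whatever position the attempt reached
    }

    lexer.setCursor(savedCursor); // failure: rewind to the saved position
    return false;
}
```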
Some auxiliary types which are used in `LexerInterface` are the following:
* `Token` - This represents a token
* Complete with the token string itself, `token`. Retrievable with a call to `getToken()`
@ -18,7 +40,11 @@ The source code for the lexical analysis part of the compiler is located in `sou
new Token("int") == new Token("int")
```
* ...would evaluate to `true`, rather than `false` as it would under reference equality (the default in D)
* `Lexer` - The token builder
### Implementation: `BasicLexer`
* `BasicLexer` - The token builder (a rough sketch of how its fields fit together follows this list)
* `sourceCode`, the whole input program (as a string) to be tokenized
* `position`, holds the index to the current character in the string array `sourceCode`
* `currentChar`, the current character at index-`position`
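
For intuition only, the hypothetical sketch below (not the real `BasicLexer`) shows how `sourceCode`, `position` and `currentChar` could interact when stepping through the input one character at a time; the class name and `advance()` helper are invented for this illustration.

```d
// Hypothetical illustration of the three fields described above; this is
// NOT the real BasicLexer, only a sketch of the character-advance step.
class SketchLexer
{
    private string sourceCode; // the whole input program to be tokenized
    private ulong position;    // index of the current character in sourceCode
    private char currentChar;  // the character at sourceCode[position]

    this(string sourceCode)
    {
        this.sourceCode = sourceCode;
        this.position = 0;
        this.currentChar = sourceCode.length ? sourceCode[0] : '\0';
    }

    // Advance to the next character, returning false once input is exhausted.
    private bool advance()
    {
        position++;

        if(position >= sourceCode.length)
        {
            return false;
        }

        currentChar = sourceCode[position];
        return true;
    }
}
```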