- Attempt to fix the tables rendering

This commit is contained in:
Tristan B. Velloza Kildaire 2023-04-19 17:36:14 +02:00
parent 0bca21937b
commit 68d7ce5b8e
18 changed files with 102 additions and 110 deletions

View File

@ -1,10 +1,11 @@
## Lexical analysis
Lexical analysis is the process of taking a program as an input string
*A* and splitting it into a list of *n* sub-strings
*A*<sub>1</sub>, *A*<sub>2</sub>, …, *A*<sub>*n*</sub> called tokens. The
length *n* of this list is dependent on several rules that determine
how, when and where new tokens are built - this set of rules is called a
*grammar*.
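For example, lexing a simple declaration might look as follows (a
sketch; the `performLex()` and `getTokens()` method names are
assumptions based on the API described on this page):

``` d
// A sketch of lexing a simple declaration; method names are assumptions
Lexer lexer = new Lexer("int i = 10;");
lexer.performLex();

// The grammar would split the input into: ["int", "i", "=", "10", ";"]
Token[] tokens = lexer.getTokens();
```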
### Grammar
@ -27,7 +28,7 @@ definitions:
new Token("int") == new Token("int")
```
- ...would evaluate to `true`, rather than false by reference
equality (the default in D)
- `Lexer` - The token builder
- `sourceCode`, the whole input program (as a string) to be
@ -38,7 +39,7 @@ definitions:
- Contains a list of the currently built tokens, `Token[] tokens`
- Current line and column numbers as `line` and `column`
respectively
- A "build up" - this is the token (in string form) currently
- A “build up” - this is the token (in string form) currently
being built - `currentToken`
### Implementation
@ -124,13 +125,10 @@ character == ':';
!!! error Finish this page
`;` `,` `(` `)` `[` `]` `+` `-` `/` `%` `*` `&` `{` `}` `=` `|` `^` `!` `\n` `~` `.` `:`
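A sketch of a check over the splitter characters listed above (an
illustration only; the lexer's actual method may differ):

``` d
// Sketch: returns true when `character` ends the current token build-up
private bool isSpliter(char character)
{
    return character == ';'  || character == ','  || character == '(' ||
           character == ')'  || character == '['  || character == ']' ||
           character == '+'  || character == '-'  || character == '/' ||
           character == '%'  || character == '*'  || character == '&' ||
           character == '{'  || character == '}'  || character == '=' ||
           character == '|'  || character == '^'  || character == '!' ||
           character == '\n' || character == '~'  || character == '.' ||
           character == ':';
}
```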
Whenever this method returns `true` it generally means you should flush
the current token, start a new token, add the offending splitter token and

View File

@ -2,7 +2,7 @@
Once we have generated a list of tokens (instances of `Token`) from the
`Lexer` instance we need to turn these into a structure that represents
our program's source code *but* using in-memory data-structures which we
can traverse and process at a later stage.
### Overview
@ -12,7 +12,7 @@ sub-structures of a TLang program and returning different data types
generated by these methods. The parser has the ability to move back and
forth between the token stream provided and fetch the current token
(along with analysing it to return the type of symbol the token
represents - known as the `SymbolType` (TODO: Cite the “Symbol types”
section).
For example, the method `parseIf()` is used to parse if statements, it
@ -30,7 +30,7 @@ proper module support
### API
The API exposed by the parser is rather minimal as there isn't much more to a
parser than controlling the token stream pointer (the position in the
token stream), fetching the token and acting upon the type or value of
said token. Therefore we have the methods summarised below:
@ -132,7 +132,7 @@ within the `parsing/data/check.d` and contains the following methods:
that you provide it a `SymbolType` and it will return the
corresponding string that is of that type.
- This will work only for back-mapping a sub-section of tokens as
you won't get anything back if you provide
`SymbolType.IDENT_TYPE` as there are infinite possibilities for
that - not a fixed token.
@ -280,7 +280,7 @@ while (hasTokens())
```
Following this we now have several checks that make use of
`getSymbolType(Token)` in order to determine what the token's type is
and then in our case if the token is `"if"` then we will make a call to
`parseIf()` and append the returned Statement-sub-type to the body of
statements (`Statement[]`):
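A sketch of this check (the `SymbolType` member and helper names here
are assumptions):

``` d
/* Fetch the current token and resolve its symbol type */
SymbolType symbol = getSymbolType(getCurrentToken());

/* If the token is "if" then parse the statement and append it */
if(symbol == SymbolType.KEYWORD_IF) // assumed member name
{
    statements ~= parseIf();
}
```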

View File

@ -12,7 +12,7 @@ time but also makes the implementation bloated as all logic must be in
one file to support each stage. There are also other disbenefits:
1. Symbol definitions
- Doing a single pass means you haven't stored all symbols in the
program yet, hence resolution of some will fail unless you do
some sort of over-complicated lookahead to find them - cache
them - and then retry. In general it makes all sorts of
@ -24,28 +24,28 @@ one file to support each stage. There are also other disbenefits:
2. Dependencies
- Some instructions must be generated which do not have a
syntactical mapping. I.e. the static initialization of a class
doesn't have a parser/AST node equivalent. Therefore our
multi-stage system of parse-node-to-dependency conversions allows
us to convert all AST nodes to dependency nodes and add extra
dependency nodes (such as `ClassStaticInit`) into the dependency
tree despite them having no AST equivalent.
- Splitting this up also lets us, once again, more easily reason about
symbols that are defined but require static initializations, and
looping structures which must be resolved and can easily be done
if we know all symbols (we just walk the AST tree)
*And the list goes on...*
Hopefully now one understands why a multi-pass compiler is both
easier to write (as the code is more modular) and easier to reason about
in terms of symbol resolution. It is for this reason that a lot of the code
you see in the dependency processor looks like a duplicate of the parser
processor but in reality it's doing something different - it's generating
the actual executable atoms that must be typechecked and have code
generated for - taking into account looping structures and so forth.
> The dependency processor adds execution to the AST tree and the
> ability to reason about visited nodes and “already-initted” structures
### What gets accomplished?
@ -60,19 +60,19 @@ and creation process provides us:
mark them as visited hence a use-before-declare situation is
easy to detect and report to the end-user
2. Tree of execution
- When the dependency tree is fully created it can be “linearized”
or left-hand leaf visited whereby each left-hand leaf node is
appended into an array (see the sketch after this list).
- This array then provides us a list of `DNode`s we walk through
in the typechecker and can effectively generate instructions
from and perform typechecking
- It's an easy-to-walk-through “process - typecheck - code gen”.
3. Non-AST equivalents
- There is no equivalent AST node that represents a “static
allocation” - that is something derived from the AST tree,
therefore we need a list of **concrete** “instructions” which
precisely tell the code generator what to do - this is one of
those cases where an AST tree wouldn't help us - or we would
effectively have to implement this all in the parser which leads
to an overly complex parser.
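To make point 2 above concrete, below is a sketch of such a
linearization pass (the method and field names here are illustrative
assumptions, not the actual implementation):

``` d
// Illustrative linearization: visit dependencies (left-hand leaves)
// first, only then append the node itself
void linearize(DNode node, ref DNode[] linearized)
{
    node.markVisited(); // assumed marker, prevents re-visiting

    foreach(DNode dependency; node.dependencies)
    {
        if(!dependency.isVisited()) // assumed visitation check
        {
            linearize(dependency, linearized);
        }
    }

    // The node lands after everything it depends on - the execution order
    linearized ~= node;
}
```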
@ -104,7 +104,7 @@ wraps the following methods and fields within it:
needed, therefore a second visitation state is required. See
`tree()`.
7. `DNode[] dependencies`
- The current `DNode`'s array of dependencies which themselves are
`DNode`s
8. `performLinearization()`
- Performs the linearization process on the dependency tree,
@ -133,7 +133,7 @@ wraps the following methods and fields within it:
The DNodeGenerator is used to generate dependency node objects
(`DNode`s) based on the current state of the type checker. It will use
the type checker's facilities to lookup the `Module` that is contained
within and use this container-based entity to traverse the entire parse
tree of the container and process each different possible type of
`Statement` found within, step-by-step generating a dependency node for
@ -156,7 +156,7 @@ TODO: Discuss the `DNodeGenerator`
### Pooling
Pooling is the technique of mapping a given parse node, let's say some
kind-of `Statement`, to the same `DNode` every time and if no mapping
exists then creating a `DNode` for the respective parse node once off
and then returning that same dependency node on successive requests.
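A minimal sketch of pooling, assuming a simple map from parse nodes to
their dependency nodes (the actual `DNodeGenerator` keeps its own state
and its names may differ):

``` d
// Pool map: each parse node maps to exactly one DNode
private DNode[Statement] nodePool;

DNode pool(Statement pNode)
{
    // Seen before: return the same DNode as last time
    if(pNode in nodePool)
    {
        return nodePool[pNode];
    }

    // First request: create the DNode once-off and remember the mapping
    DNode newNode = new DNode(pNode); // assumed constructor
    nodePool[pNode] = newNode;
    return newNode;
}
```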
@ -176,7 +176,7 @@ status of said `DNode` during processing.
Below we have an example of what this process looks like. In this case
we would have done something akin to the following. Our scenario is that
we have some sort of parse node, let's assume it was a `Variable` parse
node which would represent a variable declaration.
![](/projects/tlang/graphs/pandocplot11037938885968638614.svg)
@ -190,7 +190,7 @@ it and then confirmed that the `varDNode.entity` is equal to that of the
(`varPNode`) in order to show the returned dependency node will be the
same as that referenced by `varDNode`.
``` d
Variable varPNode = <... fetch node>;
DNode varDNode = pool(varPNode);

View File

@ -23,7 +23,7 @@ instructions and contains some common methods used by all of them:
that if such context is needed during further code generation
(or even emit) it can then be accessed
2. `Context getContext()`
- Returns this instruction's associated context via its `Context`
object
3. `string produceToStrEnclose(string addInfo)`
- Returns a string containing the additional info provided through
@ -58,12 +58,12 @@ be found in
### Code generation
The method of code generation and type checking starts by being provided
a so-called "action list" which is a linear array of dependency-nodes
(or `DNode`s for code's sake), this list is then iterated through by a
a so-called “action list” which is a linear array of dependency-nodes
(or `DNode`s for codes sake), this list is then iterated through by a
for-loop, and each `DNode` is passed to a method called
`typeCheckThing(DNode)`:
``` d
foreach(DNode node; actionList)
{
    /* Type-check/code-gen this node */
    typeCheckThing(node);
}
```
@ -75,7 +75,7 @@ The handling of every different instruction type and its associated
typechecking requirements are handled in one huge if-statement within
the `typeCheckThing(DNode)` method. This method will analyse a given
dependency-node and perform the required typechecking by extracting the
`DNode`'s embedded parser-node, whilst doing so if a type check passes
then code generation takes place by generating the corresponding
instruction and adding this to some position in the code queue
(discussed later).
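A rough sketch of the shape of this method (the accessor used to
extract the embedded parser-node is an assumption; the real method
handles many more cases):

``` d
private void typeCheckThing(DNode node)
{
    // Extract the parser-node embedded within the dependency-node
    // (accessor name assumed)
    Statement statement = node.getEntity();

    // One large if-statement dispatches on the concrete type; each branch
    // type-checks and, on success, generates the matching instruction
    if(cast(Variable)statement)
    {
        /* type-check the declaration, generate its instruction and
           place it into the code queue */
    }
    else
    {
        /* ... one branch per supported construct ... */
    }
}
```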
@ -86,8 +86,8 @@ TODO: Add information on this
The code queue is used as a stack and a queue in order to facilitate
instruction generation. Certain instructions are produced once off and
then added to the back of the queue (*“consuming”* instructions) whilst
others are produced and pushed onto the top of the queue (*“producing”*
instructions) for consumption by other consuming instructions later.
An example of this would be the following T code which uses a binary

View File

@ -94,7 +94,7 @@ else if(cast(VariableDeclaration)instruction)
What we have here is some code which will extract the name of the
variable being declared via `varDecInstr.varName` which is then used to
lookup the parser node of type `Variable`. The `Variable` object
contains information such as the variable's type and also if a variable
assignment is attached to this declaration or not.
TODO: Insert code regarding assignment checking
@ -129,7 +129,7 @@ usage. In this case we want to translate the symbol of the entity named
`simple_variables_decls_ass`. Therefore we provide both pieces of
information into the function `symbolLookup`:
``` d
// The relative container of this variable is the module
Container container = tc.getModule();
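// Provide both pieces of information - the container and the entity's
// name - to symbolLookup (a sketch; the exact signature is an assumption)
string symbolName = symbolLookup(container, "simple_variables_decls_ass");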

View File

@ -21,9 +21,9 @@ following methods:
1. `this(string sourceCode, File emitOutFile)`
- Constructs a new compiler object with the given source code and
the file to write the emitted code out to
- A newly initialized `File` struct that doesn't contain a valid
file handle can be passed in in the case whereby the emitter
won't be used but an instance of the compiler is required
2. `doLex()`
- Performs the tokenization of the input source code,
`sourceCode`.
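A hypothetical usage sketch of this API (the class name `Compiler` is
an assumption from context):

``` d
import std.stdio : File;

File emitOut; // no valid handle - fine, the emitter won't be used here

Compiler compiler = new Compiler("int i = 10;", emitOut);
compiler.doLex(); // tokenize `sourceCode`
```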
@ -118,7 +118,7 @@ The types that can be stored and their respective methods are:
Below is an example of the usage of the `ConfigEntry`s in the
`CompilerConfiguration` system, here we add a few entries:
``` d
/* Enable Behaviour-C fixes */
config.addConfig(ConfigEntry("behavec:preinline_args", true));
@ -138,7 +138,7 @@ Later on we can retrieve these entries, the below is code from the
`DGen` class which emits the C code), here we check for any object files
that should be linked in:
``` d
//NOTE: Change to system compiler (maybe, we need to choose a good C compiler)
string[] compileArgs = ["clang", "-o", "tlang.out", file.name()];

View File

@ -3,15 +3,15 @@
Despite my eagerness to jump directly into the subject matter at hand I
believe there is something of even greater importance. Despite
there being a myriad of reasons I embarked upon this project something
more important than the stock-and-standard “I needed it to solve a
problem of mine” reasoning comes to mind. There is indeed a better
reason for embarking on something than the mere technical *requirement
thereof* - I did this **because I can**. This sentiment is something
that I really hold dear to my heart despite it being a seemingly obvious
one. Of course you can do what you want with your code - it's a free
country. One would not be wrong to make such a statement but mention
your ideas online and you get hounded down by others saying “that's
dumb, just use X” or “your implementation will be inefficient”. These
statements are not entirely untrue but they miss the point that this is
an exercise in scientific thinking and an artistic approach at it in
that as well.
@ -23,5 +23,5 @@ expectations but luckily I do not require the external feedback of the
mass - just some close few friends who can appreciate my work and join
the endeavor with me.
*Don't let people stop you, you only have one life - take it by the
horns and fly*

View File

@ -13,7 +13,7 @@ filter.
Tristan aims to be able to support all of these but with certain limits,
this is after all mainly an imperative language with those paradigms as
*"extra features"*. Avoiding feature creep in other systems-levels
*“extra features”*. Avoiding feature creep in other systems-levels
languages such as C++ is something I really want to stress about the
design of this language, I do not want a big and confusing mess that has
an extremely steep learning curve and way too many moving parts.
@ -117,7 +117,7 @@ in my viewpoint and hence we support such features as:
- Pointers
- The mere *support* of pointers allowing one to take a
memory-level view of objects in memory rather than the normal
"safe access" means
“safe access” means
- Inline assembly
- Inserting of arbitrary assembler is allowed, providing the
programmer with access to systems level registers,
@ -125,7 +125,7 @@ in my viewpoint and hence we support such features as:
- Custom byte-packing
- Allowing the user to deviate from the normal struct packing
structure in favor of a tweaked packing technique
- Custom packing on a system that doesn't agree with the alignment
of your data **is** allowed but the default is to pack
accordingly to the respective platform

View File

@ -1,6 +1,6 @@
# Language
This page serves as an official manual for both users of TLang and
those who want to understand/develop the internals of the compiler and
runtime (the language itself).

View File

@ -2,7 +2,7 @@
The grammar for the language is still a work in progress and may take
some time to become concrete but every now and then I update this page
to add more to it or fix any incongruencies with the parser's actual
implementation. The grammar starts from the simplest building blocks and
then progresses to the more complex (heavily composed) ones and these
are placed into sections whereby they are related.

View File

@ -12,22 +12,22 @@ attributes:
### Integral types
| Type   | Width (bits) | Intended interpretation         |
|--------|--------------|---------------------------------|
| byte   | 8            | signed byte (two's complement)  |
| ubyte  | 8            | unsigned byte                   |
| short  | 16           | signed short (two's complement) |
| ushort | 16           | unsigned short                  |
| int    | 32           | signed int (two's complement)   |
| uint   | 32           | unsigned int                    |
| long   | 64           | signed long (two's complement)  |
| ulong  | 64           | unsigned long                   |
#### Conversion rules
1. TODO: Sign/zero extension
2. Promotion?
3. Precedence in interpretation when the first two don't apply
### Decimal

View File

@ -32,7 +32,7 @@ else if(val == 3)
```
In the case the conditions are not true for any of the `if` or `else if`
branches then "default" code can be run in the `else` branch as such:
branches then “default” code can be run in the `else` branch as such:
``` d
int val = 2;

View File

@ -3,7 +3,7 @@
Arrays allow us to have one name refer to multiple instances of the same
type. Think of an array like having multiple variables of the same type
tightly packed next to one-another but being able to refer to this group
by a *single name* and *each instance* by a number - an *“offset”* so to
speak.
### Stack arrays
@ -13,7 +13,7 @@ Stack arrays are what we refer to when we allocate an array
stack space of the current stack frame (the space for the current
function call).
``` d
module simple_stack_arrays4;
int function()

View File

@ -2,14 +2,14 @@
Pointers are just like any other variable one would declare but what is
important is that their values can be used in certain operations. A
pointer's value is an address of another variable and one can use a
pointer to indirectly refer to such a variable and indirectly fetch or
update its value.
A pointer type is written in the form of `<type>*`, for example one may
write `int*` which is read as “a pointer to an `int`”.
TODO: All pointers are 64-bit values - the size of addresses on one's
system.
### Pointer syntax
@ -17,22 +17,16 @@ system.
There are a few operators that can be used on pointers which are shown
below, most specific of which are the `*` and `&` unary operators:
| Operator | Description                                                    | Example                     |
|----------|----------------------------------------------------------------|-----------------------------|
| `&`      | Gets the address of the identifier                             | `int* myVarPtr = &myVar`    |
| `*`      | Gets the value at the address held in the referred identifier  | `int myVarVal = *myVarPtr`  |
Below we will declare a module-level global variable `j` of type `int`
and then use a function to indirectly update its value by the use of a
pointer to this integer - in other words an `int*`:
``` d
module simple_pointer;
int j;
@ -58,7 +52,7 @@ named `ptr`. This can hold the address of memory that points to an
function with the argument `&j` which means it is passing a pointer to
the `j` variable in.
``` d
int function(int* ptr)
{
*ptr = 2+2;
@ -81,15 +75,15 @@ What `int function(int* ptr)` does is two things:
Some of the existing operators such as those used for arithmetic have
special usage when used on pointers:
| Operator | Description                                                       | Example |
|----------|-------------------------------------------------------------------|---------|
| `+`      | Allows one to offset the pointer by a `+ offset*sizeof(ptrType)`  | `ptr+1` |
| `-`      | Allows one to offset the pointer by a `- offset*sizeof(ptrType)`  | `ptr-1` |
Below we show how one can use pointer arithmetic and the casting of
pointers to work on sub-sections of data referenced to by a pointer:
``` d
module simple_pointer_cast_le;
int j;
@ -123,7 +117,7 @@ access the 4 byte integer byte-by-byte, on x86 we would be starting with
the least-significant byte. What we have done here is updated said byte
to the value of `2+2`:
``` d
byte* bytePtr = cast(byte*)ptr;
*bytePtr = 2+2;
```
@ -133,7 +127,7 @@ which would increment the address by `1`, resultingly pointing to the
second least significant byte, we then use the dereference operator `*`
to set this byte to `1`:
``` d
*(bytePtr+1) = 1;
```
@ -142,7 +136,7 @@ should (TODO: we can explain the memory here) become the result of
`256+4` (that is `260`). After this we then return that number with two
added to it:
``` d
return (*ptr)+1*2;
```
@ -151,7 +145,7 @@ return (*ptr)+1*2;
One can even mix these if they want, for example we can do the
following:
``` d
module simple_stack_arrays3;
void function()

View File

@ -18,7 +18,7 @@ struct <name>
}
```
Note: Assignments to these variables within the struct's body are not
allowed.
#### Example

View File

@ -1,21 +1,21 @@
## External symbols
Some times it is required that a symbol be processed at a later stage
that is not within the T compiler's symbol processing stage but rather
at the linking stage. This is known as late-binding at link time where
such symbols are only resolved then which can help one link their T
program to some symbol in an ELF file (linked in with extra `gcc`
arguments to `DGen`) or in a C standard library automatically included
in the DGen's emitted C code.
In order to use such a feature one can make use of the `extern` keyword
which lets us specify either a function's signature or variable that should
be resolved during C compilation time **but** such that we can still use
it in our T program with typechecking and all.
One could take a C program such as the following:
``` c
#include<unistd.h>
int ctr = 2;
@ -29,14 +29,14 @@ unsigned int doWrite(unsigned int fd, unsigned char* buffer, unsigned int count)
and then compile it to an object file named `file_io.o` with the
following command:
``` bash
gcc source/tlang/testing/file_io.c -c -o file_io.o
```
And then link this with your T program using the command (take note of
the flag `-ll file_io.o` which specifies the object to link in):
``` bash
./tlang compile source/tlang/testing/simple_extern.t \
-sm HASHMAPPER \
-et true \
@ -47,16 +47,16 @@ the flag `-ll file_io.o` which specifies the object to link in):
### External functions
To declare an external function use the `extern efunc ...` clause
followed by a function's signature. Below we have an example of the
`doWrite` function from our C program (seen earlier) being specified:
``` d
extern efunc uint doWrite(uint fd, ubyte* buffer, uint count);
```
We can now go ahead and use this function as a call such as with:
``` d
extern efunc uint write(uint fd, ubyte* buffer, uint count);
void test()
@ -75,13 +75,13 @@ followed by the variable declaration (type and name). Below we have an
example of the `ctr` variable from our C program seen earlier being
specified:
``` d
extern evar int ctr;
```
We have the same program as before where we then refer to it with:
``` d
...
extern evar int ctr;

View File

@ -11,7 +11,7 @@ Primitive data types are the building blocks of which other more complex types are
### Integral types
| Type  | Width (bits) | Intended interpretation         |
|-------|--------------|---------------------------------|
| byte  | 8            | signed byte (two's complement)  |
| ubyte | 8            | unsigned byte                   |
| short | 16           | signed short (two's complement) |

View File

@ -37,7 +37,7 @@ function generateMarkdown()
outputFile="docs/$(echo $doc | cut -b 9-)"
pandoc -F pandoc-plot -M plot-configuration=pandoc-plot.conf -f markdown -t gfm "$doc" -o "$outputFile"
echo "$(cat $outputFile | sed -e s/docs\\//\\/projects\\/tlang\\//)" > "$outputFile"
cat "$outputFile"