python-plus-plus/plans/do-while.md
2024-08-11 13:07:05 +10:00

Do While

I have just noticed something about the grammar of this language. I use do-while without the while for blocks of code within while/for loops and the branches of if conditions. But this also means that you can't put a while loop directly after any of them, because the parser will interpret the while condition as the tail of the do block belonging to the preceding loop/if. Consider the following:

if <condition1> do {
	<block1>
}
while <condition2> do {
	<block2>
}

In this example, if condition1 is true, then block1 will be executed at least once and repeated while condition2 is true. block2 will then be executed once, regardless of whether condition2 is true (or possibly more times, if another while loop follows it as well).

Correction

Ok, um, actually... I just checked and this is not true. It will instead just cause a syntax error, claiming that the do keyword in the while loop after the if should be a semicolon, because you need a semicolon after the while condition of a do-while loop. If you then put a semicolon before the do, it will have the behaviour described above, but you probably shouldn't do that. I guess the error also makes it a bit more obvious what is happening, because from it you might infer that the while condition is being interpreted as part of the do block. For ifs there is a workaround, which is to put else {} at the end; this ensures that the parser separates the do block from the while condition. But this is not a good solution, and this is definitely a problem that I will need to solve.
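To make the failure mode concrete, here is a minimal sketch of a recursive-descent parser for a toy version of this grammar. It is not the language's actual parser: the Parser class, the parse helper, the tuple-shaped AST, and the restriction of conditions to single identifiers are all assumptions made for illustration. It only models the rules described above: if/while parse a condition and then exactly one statement, and a do block greedily takes a trailing while.

```python
import re

KEYWORDS = {"if", "else", "while", "do"}
PUNCTUATION = {"{", "}", ";"}


class Parser:
    """Hypothetical miniature of the grammar described in this note."""

    def __init__(self, src):
        self.toks = re.findall(r"[A-Za-z_]\w*|[{};]", src)
        self.i = 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.i += 1

    def expression(self):
        # Assumption: an expression is a single identifier in this sketch.
        tok = self.peek()
        if tok is None or tok in KEYWORDS or tok in PUNCTUATION:
            raise SyntaxError(f"expected an expression, found {tok!r}")
        self.i += 1
        return tok

    def statement(self):
        tok = self.peek()
        if tok == "{":
            self.i += 1
            body = []
            while self.peek() != "}":
                body.append(self.statement())
            self.i += 1
            return ("block", body)
        if tok == "if":
            self.i += 1
            cond = self.expression()
            then = self.statement()  # exactly one statement
            otherwise = None
            if self.peek() == "else":
                self.i += 1
                otherwise = self.statement()
            return ("if", cond, then, otherwise)
        if tok == "while":
            self.i += 1
            cond = self.expression()
            return ("while", cond, self.statement())
        if tok == "do":
            self.i += 1
            body = self.statement()
            if self.peek() == "while":
                # A trailing while always binds to the preceding do block.
                self.i += 1
                cond = self.expression()
                self.expect(";")
                return ("do_while", body, cond)
            return ("do", body)
        name = self.expression()
        self.expect(";")
        return ("expr", name)


def parse(src):
    parser = Parser(src)
    program = []
    while parser.peek() is not None:
        program.append(parser.statement())
    return program


try:
    parse("if c1 do { b1; } while c2 do { b2; }")
except SyntaxError as err:
    print(err)  # expected ';', found 'do'

# The else {} workaround ends the do block before the while is seen.
print([node[0] for node in parse("if c1 do { b1; } else {} while c2 do { b2; }")])
# ['if', 'while']
```

In this toy version, adding the semicolon (while c2; do { b2; }) parses as an if whose body is a do-while, followed by a separate do block, which matches the behaviour described before the correction.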

I remember the original reason I decided to do this hack with do blocks: I needed some keyword to distinguish a block from a struct instantiation, because the syntax for structs is MyStruct{arg1=val1, arg2=val2, ...}.

Paths forward

From what I see, there are a few ways that I can solve this.

  1. Use a keyword to indicate the end of the condition. For if blocks this can obviously be done with a then keyword, but the keyword is not so obvious for while and for loops. I could keep the do keyword, but have these loops explicitly check for the presence of the do keyword before continuing. Currently, they parse the while/for keyword, then parse the condition as an expression, then parse just one statement. You can use a block as the body by writing a do block, because the entire do block counts as one statement. The do block, like these loops, also parses only one statement for its body, but a sequence of statements wrapped in { and } counts as one statement to the do block. The reason you can't just directly do

    if <condition> {
        <block>
    }
    

    is because the code that parses the condition will attempt to parse the { as the start of a struct instantiation.

  2. Modify when struct instantiation is valid. Currently, you can use { and } to start a struct instantiation after any expression, and it is only checked at runtime whether the value of the expression is a type object that is a struct. I could make this a check in the grammar, by requiring the expression to be a single identifier token, but I still don't think this would work, because what if the identifier token is a boolean variable? I want to be able to do:

    if flag {
        <block>
    }
    

    But the parser will attempt to treat flag as the identifier naming the type for a struct. So I don't think this will work. The reason the check on whether the expression is a struct type happens only at runtime is that, when I originally wrote it, I think I was anticipating some kind of expression that returns new types, and I wanted you to be allowed to instantiate structs constructed in this way. Looking back, I don't think that was a good idea. But in the runtime code, the only thing that I can really do is make the AST node for a struct instantiation require a single identifier instead of a whole expression, and that doesn't solve the issue of how I am going to differentiate between

    if flag {
        <block>
    }
    

    and

    if MyStruct {
        arg1=val1,
        arg2=val2,
        ...
    } {
        <block>
    }
    

    where you are checking the truthiness of your instance of MyStruct.

  3. I could just require parentheses around the condition. Honestly, though, I don't really want to, because I just like the aesthetic of not needing parentheses around conditions. So I probably won't do this.

  4. Change how struct instantiation works syntactically. I could just use ( and ) instead of { and }, so MyStruct(arg1=val1, arg2=val2, ...). But honestly, I don't like this for the same reason as the previous fix. However, just as I am writing this, I think I have come up with the solution: a single period (.) would separate the type and the open curly brace, so MyStruct.{arg1=val1, arg2=val2, ...}. Honestly, I think that this will be what I go with.

    It also allows me to solve my problem concerning the types of array literals. I'm currently doing some very weird stuff with empty variable types, using the first element to indicate the type of the rest, which isn't always ideal if the first element is a subtype of the type you want for the array. It's just a whole ordeal, which I think this syntax can solve. I can do something like Jai (I think it does this? I haven't checked), where you prepend the array with the type and then a period, so int.[0, 1, 2, 3]. This will also allow you to indicate whether or not integer literals should be interpreted as floats, for example. I think that if you tried to create an array of floats using integer literals, it wouldn't work, and you would have to do something like putting a .0 at the end of all the integers.

    Back on the topic of the period: this syntax is basically already being used for Enums, where you have to specify the type and then the member. I had always been planning on allowing the type to be omitted if the context was enough to deduce which enum type was meant; for example, if you are in a function, the language can use the return type to deduce the type. I can now extend this idea to structs and arrays, where you don't need to specify the type if it is determined by something like the return type of a function. I guess there would just be a few wrinkles to iron out depending on what kind of expression is allowed to the left of the period. I will probably have to make type expressions just regular expressions (i.e. normal expressions, not RegEx), and then have the parser check whether the expression to the left of the period is a type expression. The same check will also have to be done in type declarations in function arguments and whatnot.

    I think out of all the paths forward, this will be what I will do, if I ever come back to this language. Ultimately, though, this is just syntax bikeshedding, and there are certainly more important things that I should be focusing on at the moment if I want to get this language into a functional state.
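The struct ambiguity behind options 1, 2 and 4 can be sketched in a few lines. This is a toy model, not the real parser: the Parser class, its struct_opener parameter, and the tuple-shaped results are hypothetical, conditions are limited to an identifier optionally followed by a struct body, and block statements are just identifier; pairs. Setting struct_opener to "{" mimics the current grammar, while ".{" mimics the proposed one.

```python
import re


class Parser:
    """Hypothetical sketch: parses `if <expr> { ident; ... }` only."""

    def __init__(self, src, struct_opener):
        # ".{" is listed before "{" so it is matched as a single token.
        self.toks = re.findall(r"[A-Za-z_]\w*|\.\{|[{}=,;]", src)
        self.i = 0
        self.struct_opener = struct_opener  # "{" (current) or ".{" (proposed)

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            wanted = repr(expected) if expected else "a token"
            raise SyntaxError(f"expected {wanted}, found {tok!r}")
        self.i += 1
        return tok

    def expression(self):
        name = self.take()  # assumed to be an identifier; not validated
        if self.peek() != self.struct_opener:
            return ("name", name)
        # Struct instantiation: Name{a=x, ...} or Name.{a=x, ...}
        self.take()
        fields = {}
        while self.peek() != "}":
            field = self.take()
            self.take("=")
            fields[field] = self.take()
            if self.peek() == ",":
                self.take()
        self.take("}")
        return ("struct", name, fields)

    def block(self):
        self.take("{")
        body = []
        while self.peek() != "}":
            body.append(self.take())
            self.take(";")
        self.take("}")
        return body

    def if_statement(self):
        self.take("if")
        cond = self.expression()
        return ("if", cond, self.block())


# Current grammar: the block's "{" is taken as a struct body.
try:
    Parser("if flag { run; }", "{").if_statement()
except SyntaxError as err:
    print(err)  # expected '=', found ';'

# Proposed grammar: only ".{" starts a struct, so both forms parse.
print(Parser("if flag { run; }", ".{").if_statement())
# ('if', ('name', 'flag'), ['run'])
print(Parser("if MyStruct.{arg1=val1, arg2=val2} { run; }", ".{").if_statement())
# ('if', ('struct', 'MyStruct', {'arg1': 'val1', 'arg2': 'val2'}), ['run'])
```

The point of the sketch is that with ".{" the parser can decide between a block and a struct body from a single token, without knowing at parse time whether the name on the left is a type.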

I haven't worked on this language in like half a year or so, and I only really came back to check it out because I started learning to use Emacs, and I just wanted to explore my old projects through Emacs. I will probably not work on it soon, but once everything that is taking up my time has been dealt with, I plan on working on the language further. As you can probably tell from the opening of this, I wrote this without really checking whether or not what I wrote was true. This problem with do-while just came to me as a random thought, and I just decided to write this markdown file as some documentation for my future self in case I ever came back to this language. I think it was certainly productive, because throughout the course of writing this, I think I came across something even better for the language, inspired by Jai.

I originally wanted this language to be at the same level as Python, as this language came out of my frustrations with Python, but on the whole I think I now want to eventually make this language closer to C/Rust/Jai, and I will need to do a lot of work to get there. The project (which, like I said, has been untouched for half a year) is currently nowhere near that, and is, as originally planned, closer to where Python is, especially in how it deals with dicts and lists. There is currently no memory management at all, and you can't even decide the size of ints. So yeah, this project has a long way to go.