•   almost 5 years ago

I'd like to try writing a parser for C

I'm a bit familiar with compilers (though I've only used tools like lex and yacc for tokenizing and parsing), and, well, I only know how to code in C. I'd love for more people to learn C btw, and I'd love to help out.

But unfortunately, I don't really understand the requirements. I get that I should understand CoffeeScript. That seems fine. But I don't really understand the rest. Plus, what sort of features would be required for our parser? To be honest, I'm not really sure what questions I need to ask.

Can someone give me a slightly more detailed explanation of the requirements? I'd guess that'll give me an idea of the scale of the problem and whether I can even try tackling it.

  • 8 comments

  • Moderator   •   almost 5 years ago

    Hi chotighoti!

    C would be a great language to add, although because it's so different from JavaScript, it's not exactly clear what all the tricky bits will be. Have you seen the Mozilla abstract syntax tree format that other JS tools like Esprima and Acorn generate from JavaScript code or that the CoffeeScript Redux compiler generates from CoffeeScript code? Your C parser should do two things:

    1) Using JavaScript/CoffeeScript, parse C code into this AST format.
    2) Add a JavaScript/CoffeeScript runtime library that adds functionality needed to run the C code (types, language features, basic standard libraries).
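To make step 1 a bit more concrete, here is a minimal sketch of what "parse C code into this AST format" could look like for a single declaration. The node shapes (VariableDeclaration, VariableDeclarator, Identifier, Literal) follow the Mozilla format; the function name parseDeclaration and the extra cType field are assumptions made up for this illustration, and a real parser would need a proper grammar rather than a regular expression.

```javascript
// Hypothetical helper: parses one declaration such as "int foo = 8;".
// Only integer literal initializers are handled in this sketch.
function parseDeclaration(source) {
  const match = /^\s*(int|float)\s+([A-Za-z_]\w*)\s*=\s*(\d+)\s*;\s*$/.exec(source);
  if (!match) {
    throw new SyntaxError('Unsupported declaration: ' + source);
  }
  const [, cType, name, literal] = match;
  return {
    type: 'VariableDeclaration',
    kind: 'var',
    declarations: [{
      type: 'VariableDeclarator',
      id: {type: 'Identifier', name: name},
      init: {type: 'Literal', value: Number(literal), raw: literal}
    }],
    // Not part of the Mozilla format: an assumed extra field so that
    // later passes still know the declared C type.
    cType: cType
  };
}

const ast = parseDeclaration('int foo = 8;');
console.log(JSON.stringify(ast, null, 2));
```

Anything the pattern does not recognize throws a SyntaxError, which is roughly how an unsupported piece of C could be reported back to the player.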

    For a trivial example, imagine this code:

    int foo = 8;
    float bar = 5 * sizeof foo; // bar = 5 * 4 = 20.0 (assuming a 4-byte int)
    float baz = bar / foo; // baz = 20.0 (float) / 8 (int) = 2.5 (foo is converted to float; truncation only happens when both operands are integers)

    Your code might translate it into an AST which, when turned back into concrete JavaScript code, could end up like this:

    // We'll need to store the types of all the variables we define.
    var __types = {'foo': __c.types.int, 'bar': __c.types.float, 'baz': __c.types.float};
    var foo, bar, baz;

    // This part is easy, at least!
    foo = 8;

    // We'll need to use a JS function to emulate C's sizeof operator.
    bar = 5 * __c.functions.sizeof(__types['foo']);

    // We'll need to use another function to emulate C's typed division (it truncates only when both operands are integers).
    baz = __c.functions.divide({value: bar, type: __types['bar']}, {value: foo, type: __types['foo']});

    So for this example, your runtime library (__c) would need to define functions.sizeof, functions.divide, and all the types.
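As a rough sketch of such a runtime library, under some loud assumptions: the descriptor fields size and integral are invented for this illustration, int and float are both assumed to be 4 bytes, and floats are left as ordinary JS doubles rather than rounded to 32-bit precision.

```javascript
// Hypothetical runtime library; the names mirror the generated code above.
const __c = {
  types: {
    int: {name: 'int', size: 4, integral: true},     // assumes a 4-byte int
    float: {name: 'float', size: 4, integral: false} // precision is not emulated
  },
  functions: {
    // Emulates sizeof for a type descriptor.
    sizeof: function (type) {
      return type.size;
    },
    // Emulates C division: truncate toward zero only when both
    // operands have integer types; otherwise divide normally.
    divide: function (left, right) {
      const quotient = left.value / right.value;
      if (left.type.integral && right.type.integral) {
        return Math.trunc(quotient);
      }
      return quotient;
    }
  }
};

const intType = __c.types.int;
const floatType = __c.types.float;
const size = __c.functions.sizeof(intType);
const intQuotient = __c.functions.divide({value: 20, type: intType}, {value: 8, type: intType});
const mixedQuotient = __c.functions.divide({value: 20, type: floatType}, {value: 8, type: intType});
const negativeQuotient = __c.functions.divide({value: -7, type: intType}, {value: 2, type: intType});
console.log(size, intQuotient, mixedQuotient, negativeQuotient); // 4 2 2.5 -3
```

Math.trunc is used rather than Math.floor so that negative quotients round toward zero, the way C integer division does.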

    You wouldn't need to handle every possible piece of C code. It's not like we care about writing files and doing stdio and such things, and if some of the really advanced semantics are a bit different, we can live with that. But the basic stuff you might use to play CodeCombat levels should be there. And since CodeCombat's API is object-oriented, there'll have to be some syntax for emulating things like this:

    this.attack(this.getNearbyEnemy());
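One possible convention, and it is only an assumption rather than anything decided here, would be to let players write plain C-style calls such as attack(getNearbyEnemy()) and have a pass over the AST rewrite calls to functions the player did not define into method calls on this. A minimal sketch, where bindCallsToThis is a hypothetical name:

```javascript
// Hypothetical pass: rewrite calls to free functions that are not defined
// locally, e.g. attack(x), into method calls on `this`, e.g. this.attack(x).
// Returns a new tree and leaves the input untouched.
function bindCallsToThis(node, localFunctions) {
  if (node === null || typeof node !== 'object') return node;
  if (Array.isArray(node)) {
    return node.map(child => bindCallsToThis(child, localFunctions));
  }
  const result = {};
  for (const key of Object.keys(node)) {
    result[key] = bindCallsToThis(node[key], localFunctions);
  }
  if (result.type === 'CallExpression' &&
      result.callee.type === 'Identifier' &&
      !localFunctions.has(result.callee.name)) {
    result.callee = {
      type: 'MemberExpression',
      computed: false,
      object: {type: 'ThisExpression'},
      property: {type: 'Identifier', name: result.callee.name}
    };
  }
  return result;
}

// AST for the C-style call: attack(getNearbyEnemy());
const call = {
  type: 'CallExpression',
  callee: {type: 'Identifier', name: 'attack'},
  arguments: [{
    type: 'CallExpression',
    callee: {type: 'Identifier', name: 'getNearbyEnemy'},
    arguments: []
  }]
};
const bound = bindCallsToThis(call, new Set());
console.log(JSON.stringify(bound, null, 2));
```

Functions the player declares would go into the localFunctions set so that calls to them are left alone.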

    Does this make any more sense?

  •   •   almost 5 years ago

    Ah! It does make sense. I'll give this a shot and hopefully return with more queries!

  •   •   almost 5 years ago

    It probably doesn't need to be that complex for typing, as I thought types were checked at compile time in C, which would be the parsing step here. The biggest problems with C off the top of my head would be things like pointer arithmetic or other features that let you get really low-level, although that hopefully falls under the complex category. The differences in how it deals with even something like strings could be a problem as well, though.

    Just some thoughts on it.

  •   •   almost 5 years ago

    Yeah, although it seems that we'd need only a subset of the features in C to play the game as such. (Although admittedly, I haven't checked out the game or its API - Note to self)

    I also need to learn CoffeeScript and see how I'd implement the more basic features before I start hitting the big problems :D. With regard to types, even basic typecasting of a variable to another type could probably cause sleepless nights during implementation.

    Or not.

    I really have no clue about CoffeeScript and the abstract syntax tree format. So my first order of business would be figuring out these two things. Then I'll move on to parsing basic arithmetic and then start iterations where I add language features of increasing complexity. Hopefully before the May deadline :D

    Thanks for your thoughts btw. I really appreciate them.

  • Moderator   •   almost 5 years ago

    Xavion, good point about the compile-time type checking. That would probably be good to do as part of the parser as well. I think you'll still need the runtime type information, though, because when two ints are divided, JavaScript's Number type will happily give a floating-point result: it doesn't have integers and won't truncate. So to mimic actual C runtime behavior, you'd have to mimic the separate float and int types so that the truncation would happen.

    Now it's not like you'd have to start with this or to handle every single C type and the differences between how they interact and how the simpler JS types would do it, but it's not really going to be very C-like if it's just C syntax with JS primitive types.

    If you don't want to learn CoffeeScript, you could write your parser in vanilla JavaScript. I just recommend CoffeeScript because I like it, I think it's more productive and expressive, and the rest of the CodeCombat stack is using it. But because this parser would be a separate open source project, it's up to you whether to do it in JS or CS.

  •   •   almost 5 years ago

    You shouldn't need to pass any types to the runtime. For the float/int division, since the only two things it can take as inputs are literals and variables, you wouldn't need to pass anything along: you'll know the types of the variables, and literals are easy to check. Just call an int division function if they use two ints, or normal division otherwise.

    It would get more complex when using longs or other types, but I'm pretty sure in most cases you shouldn't need to pass info through the parser that the actual language wouldn't send through its compiler.
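A rough sketch of that idea, choosing the division at translation time from the statically known types. The helper names staticTypeOf and emitDivision and the shape of the types map are assumptions made for this illustration, and only literals and variables are handled as operands.

```javascript
// Hypothetical helper: static C type of an operand, 'int' or 'float'.
// `types` maps variable names to their declared types.
function staticTypeOf(node, types) {
  if (node.type === 'Literal') {
    // A literal made only of digits is an int; anything else (e.g. 8.0) is floating.
    return /^\d+$/.test(node.raw) ? 'int' : 'float';
  }
  if (node.type === 'Identifier') {
    if (!Object.prototype.hasOwnProperty.call(types, node.name)) {
      throw new Error('Undeclared variable: ' + node.name);
    }
    return types[node.name];
  }
  throw new Error('Unsupported operand: ' + node.type);
}

// Emits JavaScript source for left / right, truncating only for int / int.
function emitDivision(left, right, types) {
  const source = node => (node.type === 'Literal' ? node.raw : node.name);
  const bothInts = staticTypeOf(left, types) === 'int' &&
                   staticTypeOf(right, types) === 'int';
  const quotient = source(left) + ' / ' + source(right);
  return bothInts ? 'Math.trunc(' + quotient + ')' : quotient;
}

const declared = {foo: 'int', bar: 'float'};
const id = name => ({type: 'Identifier', name: name});
const floatByInt = emitDivision(id('bar'), id('foo'), declared);
const intByInt = emitDivision(id('foo'), {type: 'Literal', value: 3, raw: '3'}, declared);
console.log(floatByInt); // bar / foo
console.log(intByInt);   // Math.trunc(foo / 3)
```

With this approach the generated code carries no type objects at all; the trade-off is that every expression form needs a static typing rule in the translator.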

  • Moderator   •   almost 5 years ago

    Fair enough! So can I infer from this that you have interest in writing a parser for the challenge? :)

  •   •   almost 5 years ago

    That would be a fairly safe assumption, yes. I've been looking at Python, as it's what I know best, so it will probably be the easiest of the more complex ones for me. I might take a crack at something else once I get the hang of it, though.
