DEV Community

Lahari Tenneti
Lahari Tenneti

Posted on

LLVM #1 - Lexer, Parser, Codegen

The Lexer/Scanner and Parser aren’t very different from what we’ve done earlier. Both are initial/front-end phases of compilation, after which the code is converted into LLVM IR.

What I built: Commits 15710fe, 2562e61


What I understood:

1) The Lexer:

  • Reads raw text input one character at a time and groups them into meaningful chunks of code called Tokens.
  • enum Token defines the kinds of words the language can process. gettok() loops through characters, skips whitespaces, recognises keywords/numbers, and handles comments too (by skipping text after #)

2) AST:

  • Once the tokens are ready, we need to understand how they relate to each other. A tree is the best way to do the same.
  • ExprAST is the base/parent class for all the nodes.
  • public: virtual ~ExprAST() = default;: Here, Expr objects will later be manipulated through pointers to the base (ExprAST) class.
  • Without virtual, if we wish to delete a subclass node, only the base class' destructor will be called, not the subclass' destructor. Any data stored in the subclass won't be deleted and can affect other areas through leaks.
  • Thus, we use virtual to delete/release all memory/resources used by the derived (sub)class.
  • Similarly, NumberExprAST' represents a number,VariableExprASTrepresents a variable,BinaryExprASTrepresents an operator with two operands,CallExprASTrepresents a function call,PrototypeASTrepresents a function signature (not the body), andFunctionAST` represents a full function (prototype + body)

3) The Parser:

  • Uses operator-precedence parsing for building AST objects for binary expressions and recursive descent parsing for everything else.
  • Ex: On seeing a number token, it creates a NumberExprAST

4) Code Generation:

  • This is LLVM specific. We traverse through the AST we've built and ask each node to codegen()
  • Ex: In NumberExprAST, we use ConstantFP::get to create a floating-point constant in LLVM's internal format.
  • NamedValues is a symbol table/dictionary for remembering where (in which specific memory location/register) a variable is stored.
  • Note: LLVM uses Static Single Assignment (SSA), meaning its "virtual registers" can only be assigned once. It's why we see names like %multmp and %multmp1 in our results.
  • BinaryExprAST::codegen is for generating code for the left and right sides recursively.
  • Then, we use Builder to create the math instruction (like Builder->CreateFAdd for float addition)
  • For <, it converts the result to a float (0.0 or 1.0) as Kaleidoscope only uses doubles.
  • Similarly, FunctionAST::codegen creates a new function in the LLVM Module
  • Module here is like a project folder; a container that holds all our functions together so they can see and call each other.
  • It creates a block called entry. LLVM code lives inside such basic blocks (chunks of instructions)
  • It tells the Builder to start writing instructions into this new block.
  • It adds the function arguments to NamedValues so the body can use them.
  • Finally, it creates a ret instruction to finish the function.

5) Driver Code:

  • The MainLoop is an infinite loop printing ready> waiting for the user to type.
  • It switches based on what is typed (def, `extern, or just math) and calls the appropriate handler.
  • HandleDefinition parses the code, generates the LLVM IR, and prints it to the screen.
  • It also sets up the operator precedence so the math logic works correctly, initialises the module, and starts the loop.

Let’s understand this through an example:

  • def foo(x) x + 1 is converted by the Lexer into [tok_def] [tok_identifier “foo”] [ ( ] [tok_identifier “x”] [ ) ] [tok_identifier “x”] [ + ] [tok_number1]
  • The Parser turns it into a FunctionAST object.
  • The codegen turns that object into LLVM IR define double @foo(double %x) { … }

Results:


What's next: Optimising and Just-in-time (JIT) compilation!


Musings:
The one thing that calms me the most is playing the piano. You see, mindfulness and being focused is a rather difficult thing, especially in the attention economy. So it feels particularly wonderful and refreshing when an activity demands our time, patience, and full focus, lest we compromise on the quality of its outcome. “Right hand first, left hand next, and then do it together.” I feel like every single inch of my brain is dedicated to getting that one bar right. And nothing feels more rewarding than seeing (or maybe hearing?) the fruits of our labour. Whenever I’ve felt bored, lost, confused, overwhelmed, or even happy—the piano has helped me move on feeling better. My mom wanted us to learn some or the other instrument. I was merely “okay” with the idea of learning the piano, but never could I have fathomed how important it would become to me. Life’s little surprises are often like that, with some of the most beautiful experiences coming to us when we least expect it. So it doesn’t hurt to try and be a little open.

Top comments (0)