The Lexer/Scanner and Parser aren’t very different from what we’ve done earlier. Both are initial/front-end phases of compilation, after which the code is converted into LLVM IR.
What I built: Commits 15710fe, 2562e61
What I understood:
1) The Lexer:
- Reads raw text input one character at a time and groups them into meaningful chunks of code called Tokens.
-
enum Tokendefines the kinds of words the language can process.gettok()loops through characters, skips whitespaces, recognises keywords/numbers, and handles comments too (by skipping text after#)
2) AST:
- Once the tokens are ready, we need to understand how they relate to each other. A tree is the best way to do the same.
-
ExprASTis the base/parent class for all the nodes. -
public: virtual ~ExprAST() = default;: Here,Exprobjects will later be manipulated through pointers to the base (ExprAST) class. - Without
virtual, if we wish to delete a subclass node, only the base class' destructor will be called, not the subclass' destructor. Any data stored in the subclass won't be deleted and can affect other areas through leaks. - Thus, we use
virtualto delete/release all memory/resources used by the derived (sub)class. - Similarly,
NumberExprAST' represents a number,VariableExprASTrepresents a variable,BinaryExprASTrepresents an operator with two operands,CallExprASTrepresents a function call,PrototypeASTrepresents a function signature (not the body), andFunctionAST` represents a full function (prototype + body)
3) The Parser:
- Uses operator-precedence parsing for building AST objects for binary expressions and recursive descent parsing for everything else.
-
Ex: On seeing a number token, it creates a
NumberExprAST
4) Code Generation:
- This is LLVM specific. We traverse through the AST we've built and ask each node to
codegen() -
Ex: In
NumberExprAST, we useConstantFP::getto create a floating-point constant in LLVM's internal format. -
NamedValuesis a symbol table/dictionary for remembering where (in which specific memory location/register) a variable is stored. -
Note: LLVM uses Static Single Assignment (SSA), meaning its "virtual registers" can only be assigned once. It's why we see names like
%multmpand%multmp1in our results. -
BinaryExprAST::codegenis for generating code for the left and right sides recursively. - Then, we use
Builderto create the math instruction (likeBuilder->CreateFAddfor float addition) - For
<, it converts the result to a float (0.0 or 1.0) as Kaleidoscope only uses doubles. - Similarly,
FunctionAST::codegencreates a new function in the LLVMModule -
Modulehere is like a project folder; a container that holds all our functions together so they can see and call each other. - It creates a block called
entry. LLVM code lives inside such basic blocks (chunks of instructions) - It tells the
Builderto start writing instructions into this new block. - It adds the function arguments to
NamedValuesso the body can use them. - Finally, it creates a
retinstruction to finish the function.
5) Driver Code:
- The
MainLoopis an infinite loop printingready>waiting for the user to type. - It switches based on what is typed (
def, `extern, or just math) and calls the appropriate handler. -
HandleDefinitionparses the code, generates the LLVM IR, and prints it to the screen. - It also sets up the operator precedence so the math logic works correctly, initialises the module, and starts the loop.
Let’s understand this through an example:
-
def foo(x) x + 1is converted by the Lexer into[tok_def] [tok_identifier “foo”] [ ( ] [tok_identifier “x”] [ ) ] [tok_identifier “x”] [ + ] [tok_number1] - The Parser turns it into a
FunctionASTobject. - The codegen turns that object into LLVM IR
define double @foo(double %x) { … }
Results:
What's next: Optimising and Just-in-time (JIT) compilation!
Musings:
The one thing that calms me the most is playing the piano. You see, mindfulness and being focused is a rather difficult thing, especially in the attention economy. So it feels particularly wonderful and refreshing when an activity demands our time, patience, and full focus, lest we compromise on the quality of its outcome. “Right hand first, left hand next, and then do it together.” I feel like every single inch of my brain is dedicated to getting that one bar right. And nothing feels more rewarding than seeing (or maybe hearing?) the fruits of our labour. Whenever I’ve felt bored, lost, confused, overwhelmed, or even happy—the piano has helped me move on feeling better. My mom wanted us to learn some or the other instrument. I was merely “okay” with the idea of learning the piano, but never could I have fathomed how important it would become to me. Life’s little surprises are often like that, with some of the most beautiful experiences coming to us when we least expect it. So it doesn’t hurt to try and be a little open.





Top comments (0)