This thread is a place to think through the concept of "Melosynthos," which is closer to a compiler generator than to a strict parser generator.
Initially, I was absorbed in a fork of Python Lark while also developing a unique shader language, primarily for Vulkan Compute (SPIR-V, to be precise) and aimed at machine learning (with the intent of replacing the PyTorch framework). Lark's parser generator appealed to me because of its simple grammar syntax, which prompted me to write a fork in C. This new version was designed to support a top-down LL(k) parsing algorithm and to generate the corresponding ASTs.
Once I got it working, it dawned on me how complex and challenging iterative compiler development can be. Designing a programming language, writing the compiler, and eventually implementing a Language Server Protocol (LSP) server seemed daunting for a single developer.
This realization sparked a question: could we streamline the entire process, from the parser generator all the way to the compilation output target? That led to the Meta-AST and, subsequently, the Melosynthos project.
The Meta-AST scripting language is designed to operate on the generated raw AST, providing traversal and visitor syntax. This lets users enhance, refine, or correct the "raw" AST with richer data, such as type information or context-free grammar support.
The Melosynthos compiler generator project involves three main stages: a standard Backus-Naur Form grammar that generates the lexer/parser and the raw AST, a Meta-AST script that interacts with that AST, and a final compilation output stage that reads the AST and prints it out.
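To make the three stages a bit more concrete, here is a minimal Python sketch of how they could fit together. Everything in it is an assumption for illustration, not Melosynthos itself: the toy grammar `sum: NUMBER ("+" NUMBER)*`, the hand-written `parse` function standing in for a generated lexer/parser, the `annotate` pass standing in for a Meta-AST script, and the made-up stack-machine text that `emit` produces.

```python
import copy
import re

# Hypothetical toy grammar:  sum: NUMBER ("+" NUMBER)*
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<PLUS>\+))")


def parse(source):
    """Stage 1: stand-in for a generated lexer/parser; returns a raw AST."""
    source = source.rstrip()
    pos, tokens = 0, []
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        # Columns are 1-based, like the token positions a lexer would report.
        tokens.append((kind, match.group(kind), match.start(kind) + 1))
        pos = match.end()

    if not tokens or tokens[0][0] != "NUMBER":
        raise SyntaxError("expected NUMBER at start of input")
    operands = [tokens[0]]
    i = 1
    while i < len(tokens):
        has_operand = i + 1 < len(tokens) and tokens[i + 1][0] == "NUMBER"
        if tokens[i][0] != "PLUS" or not has_operand:
            raise SyntaxError(f"unexpected token at column {tokens[i][2]}")
        operands.append(tokens[i + 1])
        i += 2
    return {"sum": [{"NUMBER": {"Content": text, "Col": col}}
                    for _, text, col in operands]}


def annotate(raw_ast):
    """Stage 2: stand-in for a Meta-AST script; enriches a copy of the raw AST."""
    ast = copy.deepcopy(raw_ast)
    for node in ast["sum"]:
        token = node["NUMBER"]
        token["Type"] = "int"              # assumed type annotation
        token["Value"] = int(token["Content"])
    return ast


def emit(ast):
    """Stage 3: read the annotated AST and print it out as target text."""
    lines = []
    for index, node in enumerate(ast["sum"]):
        lines.append(f"push {node['NUMBER']['Value']}")
        if index > 0:
            lines.append("add")
    return "\n".join(lines)
```

Calling `emit(annotate(parse("1 + 2 + 30")))` runs all three stages in order. The point of the split is that only the middle stage needs to know about types, while the first stage stays a mechanical product of the grammar.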
Envision a scenario where everything is streamlined from the start, so that any dialect or feature of the language can be generated as a full compiler, accompanied by an LSP server. Despite extensive searching, I couldn't find any existing tool that does this.
So I began thinking about how Meta-AST might be structured and how it would work, with an emphasis on readability and familiarity for compiler designers. It borrows from regex (such as "^" for the start of an array and "$" for the end) and from functional programming for pure-function transformation and analysis, and it distinguishes between "dialects."
Consider the following example of an AST represented in JSON:
{
  "rule": {
    "HELLO": { "Content": "Hello", "Line": "1", "Col": "1" },
    "WORLD": { "Content": "World", "Line": "1", "Col": "6" },
    "SET_OF_EXCLAIMATION_MARK": [
      { "EXCLAIMATION_MARK": { "Content": "!", "Line": "1", "Col": "12" } },
      { "EXCLAIMATION_MARK": { "Content": "!", "Line": "1", "Col": "13" } },
      { "EXCLAIMATION_MARK": { "Content": "!", "Line": "1", "Col": "14" } }
    ]
  }
}
For a basic analysis of this AST, we could annotate it with the following script:
local myAST = .; // You are making a copy of the current AST
myAST.rule.SET_OF_EXCLAIMATION_MARK.summarize(
    MarkCount = this.Count,
    StartColumn = this[^].Col,
    EndColumn = this[$].Col,
    StartLine = this[^].Line,
    EndLine = this[$].Line
);
This would add the exclamation mark count, the start and end columns for this particular rule in the grammar, and the start and end lines for diagnostic purposes.
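Since Meta-AST does not exist yet, here is a rough Python emulation of what that script might do, just to pin down the intended semantics. Several things are my own assumptions: the helper names `resolve_anchor`, `token_of`, and `summarize_marks` are hypothetical, I assume `this[^]` and `this[$]` see through the single-token wrapper object to reach `Col` and `Line`, and I assume the results are stored under a made-up `Summary` key next to the list.

```python
import copy
import json

# The raw AST from the JSON example, loaded as plain Python data.
RAW_AST = json.loads("""
{
  "rule": {
    "HELLO": { "Content": "Hello", "Line": "1", "Col": "1" },
    "WORLD": { "Content": "World", "Line": "1", "Col": "6" },
    "SET_OF_EXCLAIMATION_MARK": [
      { "EXCLAIMATION_MARK": { "Content": "!", "Line": "1", "Col": "12" } },
      { "EXCLAIMATION_MARK": { "Content": "!", "Line": "1", "Col": "13" } },
      { "EXCLAIMATION_MARK": { "Content": "!", "Line": "1", "Col": "14" } }
    ]
  }
}
""")


def resolve_anchor(items, anchor):
    """Map the regex-style anchors to list elements: '^' is first, '$' is last."""
    if not items:
        raise ValueError("cannot resolve an anchor on an empty list")
    if anchor == "^":
        return items[0]
    if anchor == "$":
        return items[-1]
    raise ValueError(f"unknown anchor: {anchor!r}")


def token_of(node):
    """Unwrap {"EXCLAIMATION_MARK": {...}} to the token data (assumed behaviour)."""
    (token,) = node.values()
    return token


def summarize_marks(ast):
    """Emulate the summarize(...) call on a copy, like `local myAST = .;`."""
    annotated = copy.deepcopy(ast)
    marks = annotated["rule"]["SET_OF_EXCLAIMATION_MARK"]
    first = token_of(resolve_anchor(marks, "^"))
    last = token_of(resolve_anchor(marks, "$"))
    # "Summary" is a hypothetical location for the new attributes.
    annotated["rule"]["Summary"] = {
        "MarkCount": len(marks),
        "StartColumn": first["Col"],
        "EndColumn": last["Col"],
        "StartLine": first["Line"],
        "EndLine": last["Line"],
    }
    return annotated
```

`summarize_marks(RAW_AST)` returns an annotated copy and leaves `RAW_AST` untouched, which matches the pure-function flavour described earlier. Note that `Line` and `Col` stay strings here because that is how they appear in the JSON.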
I'm sharing this here to think the concept through and to encourage discussion around it. I hope it sparks some interest in the topic.