Building JSONiq Language Tools, Part 4

In the previous post, we transformed the ANTLR parse tree into a simpler parser AST. This isolates most parser-specific details inside the adapters and makes it easier to support different parsers at runtime. For example:

jsoniq

let $x := 1
return (
    let $x := 2
    return $x
)

flowchart TB
    M["module"]

    M --> F1["flowr-expression"]
    F1 --> L1["let-binding<br/>name: $x"]
    F1 --> F2["flowr-expression"]

    F2 --> L2["let-binding<br/>name: $x"]
    F2 --> R1["variable-reference<br/>name: $x"]

This parser AST, while useful, is still not enough to provide a good developer experience. It records a variable reference named $x, but it does not know which declaration that reference denotes. Features such as "go to definition" and "find all references" require declarations and references to be resolved according to the language's scope rules.

H2 Analysis AST

Based on the parser AST, an analysis AST can be built to represent the semantic structure of the program. The analysis builder traverses the parser AST, but the transformation is not one-to-one. Some parser nodes create analysis nodes, some only affect scope construction, and others disappear after their children are processed.

Consequently, the analysis AST has fewer node types than the parser AST. For example, a flowr-expression parser node creates a scope so references can be resolved correctly, but it does not produce a corresponding analysis node. Different declarations share the same analysis node shape, while the attached definition preserves whether the declaration is a variable, parameter, function, namespace, or type. The following node types are enough for the current semantic model:

module
declaration
reference
function-call
argument

For the example above, the analysis AST would look like this:

flowchart TB
    M["module"]
    M --> D1["declaration<br/>outer let $x"]
    M --> D2["declaration<br/>inner let $x"]
    M --> R["reference<br/>$x"]

    R -. "Resolution points to" .-> D2

H2 Definitions

The analysis model contains several types of source definitions:

Variable declarations: introduced by let, for, count, group by, declare variable, and other binding constructs.
Function declarations: introduced by declare function.
Parameter declarations: introduced by function parameters.
Namespace declarations: introduced by declare namespace.
Type declarations: introduced by declare type.

Built-in functions also participate in reference resolution even though they are not declared in the source program. They belong to the definition model, but they are not inserted as declaration nodes in the source analysis AST.

Even though source definitions share the same declaration node type in the analysis AST, the attached definition preserves kind-specific information. For example, a function definition has a list of parameters, while a variable definition does not.

A definition can be modeled as:

typescript

export type VariableKind =
    | "declare-variable"
    | "let"
    | "for"
    | "for-position"
    | "group-by"
    | "count"
    | "catch-variable";

export type DeclarationKind = VariableKind | "namespace" | "type" | "parameter" | "function";
export type DefinitionKind = DeclarationKind | "builtin-function";

interface AbstractDefinition<K extends DefinitionKind> {
    name: DefinitionNameByKind[K];
    kind: K;

    // List of references that resolve to this declaration.
    references: ResolvedReference[];

    isBuiltin: boolean;
}

H2 Scope model

To resolve references, the analysis introduces the concept of scope. A scope is a region of the program containing a set of declarations. Scopes can be nested, and a reference resolves to the most recently registered matching declaration in the current scope or, if none exists, in a parent scope.

Scopes can be modeled as a tree. Each scope stores the definitions declared within it and a list of nested child scopes. Lookup continues through its parent when no matching definition is found locally.

In the previous example, we have two scopes: the outer scope that contains the declaration of $x, and the inner scope that contains the declaration of $x and the reference to $x. The inner scope is nested within the outer scope.

flowchart TB
    M["module"]

    subgraph S1["Outer FLOWR scope"]
        D1["declaration<br/>let $x"]

        subgraph S2["Inner FLOWR scope"]
            D2["declaration<br/>let $x"]
            R1["reference<br/>$x"]
            R1 -. "resolves to" .-> D2
        end
    end

    M --> D1

Analysis starts with a module scope, which is the root scope of the program. New scopes are created when entering constructs such as function bodies, FLOWR expressions, and catch clauses. When the construct has been processed, analysis returns to the parent scope.

A source definition also has a visibleFrom field containing the source offset from which it should be returned by position-based queries. Normal reference resolution does not need this field; it depends on the order in which declarations are registered during traversal. visibleFrom is used later by features such as completion to filter definitions according to the cursor position.

H2 Visibility rules

Traversal order is important for reference resolution. For let, for, and similar bindings, the expression that produces the value is visited before the new declaration is registered. A reference inside the initializer therefore resolves using the declarations that were already available, while references in subsequent clauses can use the new binding.

Function declarations are handled differently to support recursion. The function definition is registered in the module scope before the function body is visited. Its visibleFrom value is the end of its selectionRange, which is used by position-based queries; recursion itself works because the declaration has already been registered before analysis enters the function body.

Consequently, this example resolves correctly:

jsoniq

declare function local:f($x) {
    local:f($x)
};

In the following example, the $x in the initializer is unresolved because the new binding has not yet been registered. The $x in the return expression does resolve to the newly declared variable:

jsoniq

let $x := $x + 1
return $x

H2 Qualified-name resolution

In the previous chapter, names were divided into three forms: unprefixed, prefixed, and URI-qualified:

jsoniq

a
local:f
Q{https://example.com}a

URI-qualified names already contain their namespace URI and can be used directly. Unprefixed names retain only their local name at this stage. Prefixed names require a namespace lookup.

The analysis begins with a module-level map of built-in namespace prefixes and extends it with namespace declarations found in the source. When a prefixed name is encountered, its prefix is looked up in this map. A matching declaration adds the namespace URI to the resolved QName; an unknown prefix produces a warning diagnostic.

The resolved representation keeps the local name and includes a namespace URI when one is available. It may also retain the original prefix for display purposes.

H2 Resolving references

Once the name has been resolved, the current scope is searched for a matching declaration. Function references first check the built-in function definitions and then fall back to lexical scope lookup.

If no matching definition is found, the reference node is still added to the analysis AST, but its resolution remains undefined. The analysis builder also emits an unresolved-variable or unresolved-function diagnostic. Keeping the unresolved node allows other features to continue operating on the surrounding tree.

flowchart TD
    A["Lexical reference"] --> B["Resolve namespace"]
    B --> C["Search current scope"]
    C --> D{"Declaration found?"}

    D -->|Yes| E["Create resolved reference"]
    D -->|No| F{"Parent scope exists?"}

    F -->|Yes| C
    F -->|No| G["Report unresolved reference"]

H2 Features built on top of the analysis AST

Once we have the analysis AST, we can implement a lot of features that are useful for developers. The following table shows some of the features that can be implemented on top of the analysis AST:

Analysis information	Feature
Resolved declaration	Go to definition
Definition reference list	Find references
Declaration and references	Rename
Visible scope contents	Completion
Resolved function and arguments	Signature help
Definition kind and range	Semantic tokens and symbols

These features will be discussed in more detail in the next post.