Skip to content

Proposal: Flatten the AST to speed up tsgo #3259

@no-yan

Description

@no-yan

We may be able to improve tsgo performance by changing the AST representation from a pointer-based tree to a flat array.

Problem

More than 20% of tsgo’s execution time is spent in GC, so optimizing GC should make it run faster.
For example, when running against the VSCode repository with GOGC=off, execution becomes 1.24 times faster:

Repository default (s) GOGC=off (s) Ratio Command
VSCode 6.567 5.292 1.24 ± 0.11 times faster npm run compile-check-ts-native

According to pprof, most of the GC time is spent scanning objects, and the parser AST accounts for about 50% of inuse_space. This suggests that reducing the scan cost of the AST could reduce GC time.

Proposed solution

I propose adopting a flat AST. Flattening an AST means converting its tree data structure into a simple slice.

As background, Go’s garbage collector does not scan the backing array of a slice when the slice element type contains no pointers. This optimization can eliminate the scan cost of large arrays.

If we can apply this to the AST, we should be able to reduce the amount of memory scanned during each GC cycle, which could reduce GC time.

As a first step, we could remove pointers from the Node struct like this:

type Arena struct {  
	Node    []Node  
	Data    []nodeData // still contains 2 pointers because of interface boxing  
}  
  
type NodeId uint32  
  
type Node struct {  
	Parent      NodeId // replace the parent pointer  
	Kind        Kind  
	Flags       NodeFlags  
	Loc         core.TextRange  
}

This would reduce the number of pointers in the AST and allow scanning to be skipped for []Node.

Note that nodeData still contains pointers. It may be possible to optimize this further by making Arena hold separate arrays for each type.

Trade-offs

Pros:

  • Better locality, which may improve CPU cache efficiency
  • Pointer-free Node

Cons:

  • Less intuitive API
  • More complex dynamic updates and deletion in arrays

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions