-
Notifications
You must be signed in to change notification settings - Fork 884
Proposal: Flatten the AST to speed up tsgo #3259
Description
We may be able to improve tsgo performance by changing the AST representation from a pointer-based tree to a flat array.
Problem
More than 20% of tsgo’s execution time is spent in GC, so optimizing GC should make it run faster.
For example, when running against the VSCode repository with GOGC=off, execution becomes 1.24 times faster:
| Repository | default (s) | GOGC=off (s) | Ratio | Command |
|---|---|---|---|---|
| VSCode | 6.567 | 5.292 | 1.24 ± 0.11 times faster | npm run compile-check-ts-native |
According to pprof, most of the GC time is spent scanning objects, and the parser AST accounts for about 50% of inuse_space. This suggests that reducing the scan cost of the AST could reduce GC time.
Proposed solution
I propose adopting a flat AST. Flattening an AST means converting its tree data structure into a simple slice.
As background, Go’s garbage collector does not scan the backing array of a slice when the slice element type contains no pointers. This optimization can eliminate the scan cost of large arrays.
If we can apply this to the AST, we should be able to reduce the amount of memory scanned during each GC cycle, which could reduce GC time.
As a first step, we could remove pointers from the Node struct like this:
type Arena struct {
Node []Node
Data []nodeData // still contains 2 pointers because of interface boxing
}
type NodeId uint32
type Node struct {
Parent NodeId // replace the parent pointer
Kind Kind
Flags NodeFlags
Loc core.TextRange
}
This would reduce the number of pointers in the AST and allow scanning to be skipped for []Node.
Note that nodeData still contains pointers. It may be possible to optimize this further by making Arena hold separate arrays for each type.
Trade-offs
Pros:
- Better locality, which may improve CPU cache efficiency
- Pointer-free
Node
Cons:
- Less intuitive API
- More complex dynamic updates and deletion in arrays