SBNovaScript/go-check-python
gocheck

A Python syntax checker and linter written entirely in Go. It parses Python source files, reports syntax errors, detects lint issues concurrently, and fixes them in place using an operational transform engine.

Why gocheck exists

Python's ecosystem has excellent linters (pylint, flake8, ruff), but they all require a Python runtime. gocheck is a single static binary with no dependencies. It leverages Go's concurrency model to check files in parallel and run lint rules concurrently within each file. The operational transform engine resolves conflicting fixes by priority so that all non-overlapping fixes are applied in a single pass.

Installation

go install gocheck/cmd/gocheck@latest

Or build from source:

git clone <repo-url>
cd go-check-python
go build -o gocheck ./cmd/gocheck/

Usage

gocheck [flags] <files or directories...>

Directories are walked recursively for .py files.

Flags

| Flag | Default | Description |
|------|---------|-------------|
| `--fix` | `false` | Apply fixes in-place instead of showing a diff |
| `--rules` | all | Comma-separated list of rules to enable |
| `--exclude` | none | Comma-separated list of rules to disable |
| `--workers` | CPU count | Maximum concurrent file workers |
| `--quiet` | `false` | Only output errors and fix summaries |
| `--verbose` | `false` | Show skipped fixes and per-rule details |
| `--no-color` | `false` | Disable colored output |
| `--version` | | Print version and exit |

Exit codes

| Code | Meaning |
|------|---------|
| 0 | No issues, or all issues fixed with `--fix` |
| 1 | Syntax errors found |
| 2 | Lint issues found (diff mode only) |
| 3 | Internal error (bad arguments, I/O failure) |

Examples

Check a single file:

gocheck app.py

Check an entire project:

gocheck src/

Auto-fix all issues:

gocheck --fix src/

Run only whitespace and import rules:

gocheck --rules trailing-whitespace,indentation,unused-import src/

Disable a specific rule:

gocheck --exclude broad-except src/

Output: diff mode (default)

src/app.py:3:5: [trailing-whitespace] trailing whitespace
src/app.py:8:1: [unused-import] imported name 'os' is unused
--- src/app.py
+++ src/app.py (fixed)
@@ -1,5 +1,4 @@
-import os
 import sys
-x = 1
+x = 1
 y = 2

Output: fix mode (--fix)

Fixed src/app.py:
  - [trailing-whitespace] line 3: trailing whitespace
  - [unused-import] line 1: imported name 'os' is unused
  2 fixes applied

Pipeline

Each file goes through two passes:

Source bytes
  --> Tokenizer (lexer)
  --> Iterative parser --> syntax errors? --> report and stop
                       --> AST
  --> Concurrent lint rules (goroutine per rule)
  --> Operational Transform engine (conflict resolution + apply)
  --> Fixed source bytes (write or diff)

Concurrency operates at two levels. A worker pool processes multiple files in parallel (bounded by --workers). Within each file, all lint rules run as concurrent goroutines (bounded by GOMAXPROCS).
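
The two-level fan-out can be sketched as a bounded worker pool. Here `checkFile` is a hypothetical stand-in for the per-file pipeline; the real worker runs the full tokenize/parse/lint sequence:

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// checkFile stands in for gocheck's per-file pipeline (hypothetical).
func checkFile(path string) string {
	return "checked " + path
}

// checkAll fans files out to a bounded pool of workers, mirroring --workers.
func checkAll(files []string, workers int) []string {
	jobs := make(chan string)
	results := make(chan string, len(files))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				results <- checkFile(path)
			}
		}()
	}
	for _, f := range files {
		jobs <- f
	}
	close(jobs) // workers drain the channel and exit
	wg.Wait()
	close(results)

	out := make([]string, 0, len(files))
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // deterministic output despite concurrent workers
	return out
}

func main() {
	fmt.Println(checkAll([]string{"a.py", "b.py", "c.py"}, runtime.NumCPU()))
}
```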

Lint rules

Whitespace

| Rule | Detects | Fix |
|------|---------|-----|
| `trailing-whitespace` | Lines ending with spaces or tabs | Strips trailing whitespace |
| `blank-line-spacing` | Fewer or more than 2 blank lines between top-level function/class definitions | Normalizes to exactly 2 blank lines |
| `indentation` | Tab characters used for indentation | Replaces each tab with 4 spaces |
| `no-trailing-newline` | File missing a final newline | Appends `\n` |

Imports

| Rule | Detects | Fix |
|------|---------|-----|
| `unused-import` | Imported name never referenced anywhere in the file | Removes the import statement |
| `import-sorting` | Import blocks not sorted alphabetically | Re-sorts the entire import block |
| `duplicate-import` | Same module imported more than once | Removes the duplicate |

Naming

| Rule | Detects | Fix |
|------|---------|-----|
| `function-naming` | Function names not in snake_case (skips dunder methods) | Renames to snake_case |
| `class-naming` | Class names not in PascalCase | Renames to PascalCase |
| `constant-naming` | Module-level names containing uppercase characters that are not in UPPER_SNAKE_CASE | Renames to UPPER_SNAKE_CASE |
| `variable-naming` | Variables inside functions not in snake_case (skips `_`-prefixed and dunder names) | Renames to snake_case |

Style

| Rule | Detects | Fix |
|------|---------|-----|
| `none-comparison` | `== None` or `!= None` | Rewrites to `is None` / `is not None` |
| `bool-comparison` | `== True`, `== False`, `!= True`, `!= False` | Simplifies to `x`, `not x`, etc. |
| `not-is-not-in` | `not x in y` or `not x is y` | Rewrites to `x not in y` / `x is not y` |
| `unnecessary-parens` | Redundant parentheses after `return`, `if`, `while`, `assert` | Removes the outer parentheses |

Safety

| Rule | Detects | Fix |
|------|---------|-----|
| `mutable-default-arg` | Mutable literals (`[]`, `{}`) as default argument values | Replaces default with `None` and inserts a guard at the top of the function body |
| `bare-except` | `except:` with no exception type | Rewrites to `except Exception:` |
| `broad-except` | `except Exception:` (overly broad) | Report only, no auto-fix |

Dead code

| Rule | Detects | Fix |
|------|---------|-----|
| `unreachable-code` | Statements after `return`, `break`, `continue`, or `raise` in the same block | Removes the unreachable statements |
| `unused-variable` | Assigned variable never read in the same function (skips `_`-prefixed names) | Renames to `_` |

Algorithms

Iterative parser with Pratt expression parsing

The parser avoids recursion entirely. It maintains an explicit stack of frames, where each frame is a state machine for a single grammar production (e.g., if statement, function definition, binary expression). The main loop repeatedly peeks at the top frame, advances its state by consuming tokens or pushing child frames, and pops completed frames back to their parents.

This design eliminates stack overflow on deeply nested Python and makes error recovery straightforward: on a syntax error, the parser skips tokens until a synchronization point (a newline at indent level 0, or a keyword like def, class, if) and continues parsing from there to collect multiple errors in a single pass.
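
A toy illustration of the explicit-stack idea, using nested parentheses in place of the Python grammar (the real frames carry full per-production state machines):

```go
package main

import "fmt"

// frame is one in-progress grammar production; state would index its
// state machine in a real parser.
type frame struct {
	kind     string
	state    int
	children []string
}

// parseGroups parses nested "(" ... ")" groups iteratively. Depth is
// bounded only by heap memory, never by the call stack.
func parseGroups(tokens []string) (maxDepth int, err error) {
	stack := []frame{{kind: "root"}}
	for _, tok := range tokens {
		switch tok {
		case "(":
			stack = append(stack, frame{kind: "group"}) // push child frame
			if d := len(stack) - 1; d > maxDepth {
				maxDepth = d
			}
		case ")":
			if len(stack) == 1 {
				return maxDepth, fmt.Errorf("unbalanced )")
			}
			done := stack[len(stack)-1]
			stack = stack[:len(stack)-1] // pop completed frame back to parent
			parent := &stack[len(stack)-1]
			parent.children = append(parent.children, done.kind)
		}
	}
	if len(stack) != 1 {
		return maxDepth, fmt.Errorf("unclosed (")
	}
	return maxDepth, nil
}

func main() {
	d, err := parseGroups([]string{"(", "(", ")", "(", ")", ")"})
	fmt.Println(d, err) // 2 <nil>
}
```

The same input nested 100,000 levels deep would overflow a recursive-descent parser's call stack but runs fine here.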

Expression parsing uses the Pratt algorithm (top-down operator precedence). Each token type maps to a binding power that determines how tightly it binds to its operands. The parser compares the current operator's binding power against the minimum binding power of the current frame to decide whether to extend the current expression or return it to the parent:

Binding power (lowest to highest):
  lambda < ternary < or < and < not < comparisons
  < bitwise or < xor < bitwise and < shifts
  < add/sub < mul/div < unary < power < await
  < call/subscript/attribute

Chained comparisons (a < b < c) are handled specially: instead of nesting binary operators, the parser collects all operators and comparators into a single Compare node, matching Python's semantics where each operand is evaluated once.

Operational Transform engine

When multiple lint rules run concurrently, they each produce fix operations against the original source bytes. Fixes may overlap (e.g., an unused-import rule deleting a line that an import-sorting rule also wants to reorder). The OT engine resolves this in three steps:

1. Sort. Fixes are sorted by start offset (ascending), with ties broken by priority (descending). This ensures a deterministic left-to-right sweep.

2. Conflict resolution. A single sweep identifies clusters of overlapping fixes. Two fixes conflict when the second starts before the first ends. Within each cluster, the fix with the highest priority wins and the rest are discarded (reported as "skipped due to conflict"). After this pass, all remaining fixes are non-overlapping.

3. Apply. The non-overlapping fixes are applied left-to-right in a single pass over the source bytes. A cursor tracks the current position; for each fix, the engine copies unchanged bytes from the cursor to the fix's start offset, writes the replacement text, and advances the cursor past the fix's end offset. This produces the corrected source in a single allocation.

The entire process is O(n log n) for n fixes (dominated by the sort), plus a single O(m) pass over the m source bytes.
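
The three steps can be sketched as below. The `Fix` shape and the simplified conflict sweep (which keeps the earlier fix, breaking same-offset ties by priority) are illustrative, not gocheck's exact types:

```go
package main

import (
	"fmt"
	"sort"
)

// Fix is one replacement against the original source bytes.
type Fix struct {
	Start, End int    // byte offsets in the original source
	Text       string // replacement text
	Priority   int    // higher wins on overlap
	Rule       string
}

// applyFixes performs the three OT steps: sort, drop conflicts, apply.
func applyFixes(src []byte, fixes []Fix) (out []byte, skipped []Fix) {
	// 1. Sort by start offset ascending, ties broken by priority descending.
	sort.Slice(fixes, func(i, j int) bool {
		if fixes[i].Start != fixes[j].Start {
			return fixes[i].Start < fixes[j].Start
		}
		return fixes[i].Priority > fixes[j].Priority
	})
	// 2. Single sweep: a fix conflicts if it starts before the last kept fix ends.
	var kept []Fix
	end := -1
	for _, f := range fixes {
		if f.Start < end {
			skipped = append(skipped, f) // "skipped due to conflict"
			continue
		}
		kept = append(kept, f)
		end = f.End
	}
	// 3. Left-to-right apply with a cursor over the original bytes.
	cur := 0
	for _, f := range kept {
		out = append(out, src[cur:f.Start]...)
		out = append(out, f.Text...)
		cur = f.End
	}
	out = append(out, src[cur:]...)
	return out, skipped
}

func main() {
	src := []byte("import os\nx=1 \n")
	fixes := []Fix{
		{Start: 13, End: 14, Text: "", Rule: "trailing-whitespace"},
		{Start: 0, End: 10, Text: "", Priority: 1, Rule: "unused-import"},
	}
	out, skipped := applyFixes(src, fixes)
	fmt.Printf("%q %d\n", out, len(skipped)) // "x=1\n" 0
}
```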

Unified diff with LCS

The diff generator compares the original and fixed source to produce standard unified diff output. It uses the Longest Common Subsequence (LCS) algorithm to identify which lines are shared between the two versions.

LCS computation uses dynamic programming. A 2D table is built where table[i][j] holds the length of the longest common subsequence of the first i lines of the old file and the first j lines of the new file:

  • If old[i-1] == new[j-1]: table[i][j] = table[i-1][j-1] + 1
  • Otherwise: table[i][j] = max(table[i-1][j], table[i][j-1])

Backtracking from table[m][n] recovers the actual subsequence. Lines present in the old file but not in the LCS are removals; lines present in the new file but not in the LCS are additions; LCS lines are context.
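
A straightforward sketch of the DP table and backtracking, operating on illustrative line slices:

```go
package main

import "fmt"

// lcsLines returns the longest common subsequence of two line slices
// using the classic dynamic-programming table plus backtracking.
func lcsLines(a, b []string) []string {
	m, n := len(a), len(b)
	table := make([][]int, m+1)
	for i := range table {
		table[i] = make([]int, n+1)
	}
	for i := 1; i <= m; i++ {
		for j := 1; j <= n; j++ {
			if a[i-1] == b[j-1] {
				table[i][j] = table[i-1][j-1] + 1
			} else if table[i-1][j] >= table[i][j-1] {
				table[i][j] = table[i-1][j]
			} else {
				table[i][j] = table[i][j-1]
			}
		}
	}
	// Backtrack from table[m][n] to recover the shared lines (in reverse).
	var rev []string
	for i, j := m, n; i > 0 && j > 0; {
		switch {
		case a[i-1] == b[j-1]:
			rev = append(rev, a[i-1])
			i--
			j--
		case table[i-1][j] >= table[i][j-1]:
			i--
		default:
			j--
		}
	}
	lcs := make([]string, len(rev))
	for k, s := range rev {
		lcs[len(rev)-1-k] = s
	}
	return lcs
}

func main() {
	old := []string{"import os", "import sys", "x = 1 ", "y = 2"}
	fixed := []string{"import sys", "x = 1", "y = 2"}
	fmt.Println(lcsLines(old, fixed)) // [import sys y = 2]
}
```

In the example, `x = 1 ` (trailing space) and `x = 1` differ, so that line shows up as a removal plus an addition; the two lines in the LCS become context.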

Hunk generation groups changes with 3 lines of surrounding context. Adjacent changes close enough to share context are merged into a single hunk. The output follows the standard unified diff format compatible with patch and git diff.

Concurrent rule engine

The lint engine fans out each registered rule as a separate goroutine. A semaphore channel (capacity = GOMAXPROCS) bounds the number of rules executing simultaneously to avoid over-subscribing CPU cores. Each rule receives the AST and the raw source bytes and returns a slice of fix operations. A mutex protects the shared result slice during collection. After all goroutines complete, the combined fixes are passed to the OT engine.

Rules are pure functions with no shared mutable state, making concurrent execution safe without fine-grained locking.
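
A minimal sketch of this fan-out pattern, with a hypothetical `rule` type standing in for the real interface:

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// rule is a pure function from source text to findings (hypothetical shape;
// the real interface receives the AST as well).
type rule func(src string) []string

func runRules(src string, rules []rule) []string {
	sem := make(chan struct{}, runtime.GOMAXPROCS(0)) // bounds concurrency
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		findings []string
	)
	for _, r := range rules {
		wg.Add(1)
		go func(r rule) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			out := r(src)
			mu.Lock() // the mutex protects the shared result slice
			findings = append(findings, out...)
			mu.Unlock()
		}(r)
	}
	wg.Wait()
	sort.Strings(findings) // deterministic order despite concurrent rules
	return findings
}

func main() {
	rules := []rule{
		func(src string) []string { return []string{"rule-a: ok"} },
		func(src string) []string { return []string{"rule-b: ok"} },
	}
	fmt.Println(runRules("x = 1\n", rules))
}
```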

Project structure

go-check-python/
  cmd/gocheck/         CLI entry point — flag parsing, file walking, output
  token/               Token type definitions, positions, spans
  lexer/               Python tokenizer — source bytes to token stream
  ast/                 AST node types and tree traversal utilities
  parser/              Iterative stack-based Python parser
    parser.go            Infrastructure — stack, frames, main loop
    expr.go              Pratt expression parser
    stmt.go              Statement production state machines
  linter/
    engine.go            Concurrent rule execution engine
    rules/
      rule.go            Rule interface, Fix type, global registry
      whitespace.go      4 whitespace rules
      imports.go         3 import rules
      naming.go          4 naming convention rules
      style.go           4 style rules
      safety.go          3 safety rules
      deadcode.go        2 dead code rules
  fixer/
    ot.go                Operational Transform — sort, conflict resolve, apply
    diff.go              Unified diff generation with LCS
  testdata/              Test fixtures for integration tests

Testing

go test ./...

The test suite covers:

  • Lexer (21 tests): token types, positions, indentation, strings, numbers, operators, line joining
  • Parser (44 tests): every expression and statement type, precedence, error recovery
  • OT engine (8 tests): single/multiple fixes, conflict resolution, insert/delete, offset transform
  • Diff (2 tests): unified diff output, no-change case
  • Lint rules (57 tests): per-rule detection and fix verification
  • Lint engine (4 tests): concurrent execution, rule filtering
  • Integration (4 tests): end-to-end syntax check, diff mode, fix mode, clean file

License

MIT
