fix(ext/web): handle Windows file paths in URL parsing #33097
renezander030 wants to merge 2 commits into denoland:main from
Implement WHATWG URL spec change (url#874) to detect Windows drive letter patterns (e.g., C:\path\file.txt) in the URL parser's scheme start state and automatically convert them to file:/// URLs (file:///C:/path/file.txt). The spec adds a check: when parsing encounters a single ASCII alpha letter as the scheme buffer, followed by ':' and '\', it recognizes this as a Windows drive path rather than a URL scheme. The parser then sets the scheme to "file", the host to empty string, and transitions to path state with backslashes normalized to forward slashes. This is implemented as a preprocessing step in the Rust parse_url function, before the input reaches the rust-url crate parser.
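For illustration, the preprocessing step described above could look roughly like the sketch below. This is not the code from the PR. The function name `convert_windows_path_to_file_url` and its `Option<String>` return type are hypothetical, and the sketch assumes the input has already been trimmed and that percent-encoding is left to the downstream parser.

```rust
/// Hypothetical stand-in for the PR's preprocessing helper (not the actual code).
/// Returns `Some(file URL string)` when `input` starts with a single ASCII
/// letter followed by ':' and '\', and `None` otherwise so the caller can
/// pass the input to the regular URL parser unchanged.
fn convert_windows_path_to_file_url(input: &str) -> Option<String> {
    let bytes = input.as_bytes();

    // Drive pattern: exactly one ASCII letter, then ':', then a backslash.
    let looks_like_drive_path = bytes.len() >= 3
        && bytes[0].is_ascii_alphabetic()
        && bytes[1] == b':'
        && bytes[2] == b'\\';
    if !looks_like_drive_path {
        return None;
    }

    // Normalize every backslash to a forward slash, then prefix with
    // "file:///" (scheme "file", empty host, path starting at the drive).
    let normalized = input.replace('\\', "/");
    Some(format!("file:///{}", normalized))
}
```

Under these assumptions, `C:\path\file.txt` maps to `file:///C:/path/file.txt`, while inputs such as `https://deno.land/` or `C:/path` (no backslash after the colon) return `None` and are parsed as before.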
Please check this comment:
Thanks for the pointer. I'm aware of that comment. The difference here is that this implements the WHATWG URL spec change (url#874), which is being tracked by Chromium, Gecko, and WebKit as well. The upstream Rust The implementation is intentionally minimal and isolated (one function). Easy to remove once the upstream crate picks it up. Happy to hear from the maintainers on whether they'd prefer to wait for upstream or ship early.
Summary
Implements the WHATWG URL spec change (whatwg/url#874) to handle Windows-style file paths in URL parsing.
When `new URL("C:\\path\\file.txt")` is called, the parser now detects the Windows drive letter pattern (single ASCII alpha + `:` + `\`) and converts it to a `file:///` URL with normalized forward slashes: `file:///C:/path/file.txt`.
Changes
- `ext/web/url.rs`: Added `maybe_convert_windows_path_to_file_url()` preprocessing function that detects Windows drive letter patterns before the input reaches `rust-url`'s parser. Called from `parse_url()` for both `op_url_parse` and `op_url_parse_with_base` code paths.
- `tests/unit/url_test.ts`: Added tests covering basic paths, different drive letters, lowercase drives, mixed separators, paths with base URL, and `URL.parse()`.
Spec details
The WHATWG URL spec change adds a check in the "scheme start state": when the parser encounters a single ASCII alpha character as a potential scheme, followed by `:` and `\`, it recognizes this as a Windows drive path rather than a URL scheme. It then sets scheme to `file`, host to empty string, and transitions to path state.
Test plan
`tests/unit/url_test.ts`
Fixes #30363