Use WAL with SQLite cache, closes #21154
hauntsaninja wants to merge 4 commits into python:master
Conversation
This is the more modern way to manage concurrency with SQLite. In our case, it means concurrent mypy runs using the cache will wait for each other rather than fail. SQLite also claims this is faster, but I haven't yet done a good profile (if you are profiling this, note that WAL is a persistent setting, so you will want to delete the cache). Finally, I also explicitly close the connection in main. This is relevant to this change because it forces checkpointing of the WAL, which reduces disk space and means the cache.db remains a single self-contained file in regular use.
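For context, switching to WAL boils down to a pragma on the connection. Here is a minimal sketch of the idea, not the actual mypy code; the `connect_cache` helper name and the 30s timeout are assumptions for illustration:

```python
import sqlite3


def connect_cache(path: str) -> sqlite3.Connection:
    """Hypothetical helper: open a cache db in WAL mode with a busy timeout."""
    conn = sqlite3.connect(path)
    # WAL is a persistent, database-level setting: once set, it survives
    # reconnects, which is why profiling before/after requires deleting
    # the old cache file first.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait (here up to 30s) for a competing writer instead of failing
    # immediately with "database is locked".
    conn.execute("PRAGMA busy_timeout=30000")
    return conn
```

Note that WAL requires an on-disk database; it has no effect on `:memory:` connections.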
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
Can you give some more concrete examples? Also, the problem is, as I mentioned in the other issue, that even if there are such uses, the current incremental logic was not designed for this, and it would be tricky to guarantee correctness.
Anyway, I am not strongly against this per se; as I mentioned in #13916 (comment), IMO the key point is to give a loud warning, and then the exact best-effort semantics are less important. Also, to be clear: although this will make the SQLite cache behave like the FS cache, it will not solve other concurrency-related crashes like #14521 or #18473, while disabling the cache completely would fix those.
Sure, I can talk a little more about the work use case. We have a monorepo with lots of first-party projects. These often have similar dependency graphs, and sharing cache across them helps, e.g.

Note this behaviour will still be a little different from the FS cache (and so might reduce the likelihood of those issues): once a connection has started writing, it will block others until it commits at the end of the build. If we do want the fallback behaviour to match the FS cache, we should set
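The writer-blocks-writer (but not readers) behaviour described here is easy to demonstrate with two connections. A minimal sketch, not mypy code; the table name and the zero busy timeout are just for illustration:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "cache.db")

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN IMMEDIATE below controls the transaction directly.
writer = sqlite3.connect(db, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE cache (k TEXT PRIMARY KEY, v TEXT)")
writer.execute("BEGIN IMMEDIATE")  # hold the write lock for the whole "build"
writer.execute("INSERT INTO cache VALUES ('a', '1')")

other = sqlite3.connect(db, isolation_level=None)
other.execute("PRAGMA busy_timeout=0")  # fail fast instead of waiting

# Readers are unaffected in WAL mode, even with a write transaction open;
# they just see the last committed snapshot (so the uncommitted row is absent).
rows = other.execute("SELECT count(*) FROM cache").fetchone()[0]

# A second writer, however, waits for the lock -- or with a zero busy
# timeout, errors out immediately:
try:
    other.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError:  # "database is locked"
    second_writer_blocked = True

writer.execute("COMMIT")  # after this, the second writer would succeed
```

With a nonzero `busy_timeout`, the second `BEGIN IMMEDIATE` would instead block for up to that long, which is the "wait for each other rather than fail" behaviour the PR description talks about.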
OK, in such cases parallel mypy invocations will likely work, unless you use different options. But even in such cases I would give a warning, to make it clear that we can't guarantee correctness in general, and that users do this at their own risk.
And how exactly will this work for parallel type checking? In that case we want the workers to be reading/writing ~freely (because the coordinator is already making sure they are scheduled in a way that guarantees correctness).
This seems like a better option, but I vaguely remember trying it at some point and not liking its performance, though it may well be that I did something wrong. In general, it would be good to have performance measurements for parallel checking (say, on self check with a cold cache) for both of these options vs the status quo. This is arguably a niche use case, and I don't want to sacrifice performance for everyone else because of it.
Btw, a couple of notes on performance measurements for parallel checking:
Another concurrent case is running mypy on the CLI whilst the IDE runs it in the background via a language server, which the user often isn't explicitly aware of. A
This is the more modern way to manage concurrency with SQLite. Relevant to the current discussion, it means concurrent mypy runs using the cache will wait for each other rather than fail.
SQLite also claims this is significantly faster, but I haven't yet done a good profile (if you are profiling this, note that WAL is a persistent setting, so you will want to delete the cache). This might also allow removing the `PRAGMA synchronous=OFF`. Finally, I also explicitly close the connection in main. This is relevant to this change because it forces checkpointing of the WAL, which keeps reads fast, reduces disk space, and means the cache.db remains a single self-contained file under regular use.
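The checkpoint-on-close behaviour can be seen directly: while connections are open, writes accumulate in a `cache.db-wal` side file, and when the last connection closes cleanly, SQLite checkpoints the WAL back into the main file. A small sketch of that effect, illustrative only, not part of this change:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = sqlite3.connect(db)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE cache (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO cache VALUES ('a', '1')")
conn.commit()

# While a connection is open, committed writes live in the -wal side file.
assert os.path.exists(db + "-wal")

# Closing the last connection checkpoints the WAL into cache.db, so the
# cache remains a single self-contained file between runs.
conn.close()
wal_leftover = os.path.exists(db + "-wal") and os.path.getsize(db + "-wal") > 0
```

This is why explicitly closing the connection in main matters: if the process is killed instead, the `-wal` file is left behind until the next run checkpoints it.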
Fixes #21136, see also discussion in #13916
For what it's worth, I feel there are many legitimate uses of concurrent mypy. At work, we often share a cache between multiple projects. At home, I often end up with parallel runs under a debugger while working on mypy (although this PR just makes those hang waiting for the lock lol).