feat: Add support for LEFT JOIN LATERAL #21352
neilconway wants to merge 2 commits into apache:main
crm26
left a comment
Reviewed the approach. The three-way ON clause handling (Inner → post-join filter, LEFT without scalar agg → merge into join ON, LEFT with scalar agg → CASE WHEN nullification) is well-reasoned and avoids the count-bug compensation conflict.
We have 8 production views that use LATERAL JOIN for date range expansion (generate_series patterns) that currently require manual refactoring for DataFusion. This PR unblocks them directly.
Tested the existing basic LATERAL support — clean. Looking forward to LEFT JOIN LATERAL landing.
crm26
left a comment
Follow-up: validated locally beyond the initial review.
- Full optimizer test suite (656 tests): all pass, no regressions
- All sqllogictest files (lateral_join.slt, joins.slt): pass
- Clippy clean
- Code reviewed: three-way ON clause handling is correct — INNER → post-filter, LEFT without scalar agg → merged ON, LEFT with scalar agg → CASE WHEN nullification
- Edge cases verified: empty right side, NULL join keys, nested laterals, non-trivial ON with COUNT, multi-scope correlation guard
Clean merge against current main (8 commits behind, no conflicts in optimizer or lateral code).
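The three-way ON clause handling summarized above can be sketched as a small decision function. This is only an illustration of the rule as stated in this review; the names `OnStrategy` and `choose_on_strategy` are hypothetical and are not DataFusion's API.

```python
from enum import Enum, auto


class OnStrategy(Enum):
    """Hypothetical labels for where the user's ON condition ends up."""
    POST_JOIN_FILTER = auto()    # INNER: apply as a filter after the join
    MERGE_INTO_JOIN_ON = auto()  # LEFT, no scalar agg: fold into the join's ON
    CASE_WHEN_NULLIFY = auto()   # LEFT, scalar agg: CASE WHEN nullification


def choose_on_strategy(join_type: str, has_scalar_agg: bool) -> OnStrategy:
    """Pick a strategy from the join type and whether the lateral
    subquery is a scalar aggregate (the case that needs count-bug
    compensation)."""
    if join_type == "INNER":
        return OnStrategy.POST_JOIN_FILTER
    if join_type == "LEFT":
        if has_scalar_agg:
            return OnStrategy.CASE_WHEN_NULLIFY
        return OnStrategy.MERGE_INTO_JOIN_ON
    raise ValueError(f"unsupported lateral join type: {join_type}")


print(choose_on_strategy("LEFT", has_scalar_agg=True))
```

The scalar-aggregate case is kept separate because merging the condition into the join's ON clause would make rows that fail the condition look like unmatched rows to the count-bug compensation.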
@crm26 Thanks for the review! And I'm glad to hear that you'll find lateral joins to be helpful. If you have more feedback on the feature in the future, please share it!
Which issue does this PR close?
Rationale for this change
This PR adds support for LEFT join semantics for lateral joins. This is a bit tricky because of how it interacts with compensation for the "count bug". This might be easiest to illustrate with an example; consider this query (Q1):
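As a rough, runnable illustration of the problem, the sketch below assumes hypothetical tables `t1(id)` and `t2(t1_id)` and a query of the shape shown in the comment; the tables, data, and the `cnt < 2` condition are assumptions made for this example and are not taken from the PR. SQLite has no LATERAL, so the script runs hand-decorrelated forms that correspond to the stages discussed in this section: the naive decorrelation, the count-bug compensation via `__always_true`, the incorrect variant that merges the user's condition into the join's ON clause, and the outer CASE that re-checks the condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (t1_id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (1), (2);
""")

# Hypothetical lateral query (not executed here, SQLite lacks LATERAL):
#   SELECT t1.id, sub.cnt
#   FROM t1 LEFT JOIN LATERAL
#        (SELECT count(*) AS cnt FROM t2 WHERE t2.t1_id = t1.id) AS sub
#        ON sub.cnt < 2
# Intended result: id 1 -> NULL (cnt = 2 fails ON), id 2 -> 1, id 3 -> 0.

# Decorrelated aggregate with a marker column that is NULL only when
# the left join finds no matching group.
AGG = ("(SELECT t1_id, count(*) AS cnt, 1 AS __always_true "
       "FROM t2 GROUP BY t1_id) AS sub")


def run(sql):
    return conn.execute(sql + " ORDER BY id").fetchall()


# Naive decorrelation: unmatched t1 rows get NULL instead of 0.
naive = run(f"SELECT t1.id AS id, sub.cnt AS cnt "
            f"FROM t1 LEFT JOIN {AGG} ON sub.t1_id = t1.id")

# Count-bug compensation: substitute count's default when the marker is NULL.
COMPENSATED = (
    "SELECT t1.id AS id, "
    "CASE WHEN sub.__always_true IS NULL THEN 0 ELSE sub.cnt END AS cnt "
    f"FROM t1 LEFT JOIN {AGG} ON sub.t1_id = t1.id"
)
compensated = run(COMPENSATED)

# Incorrect: merging the user's condition into the join's ON clause makes
# id 1 look unmatched, so the compensation turns it into 0.
merged_on = run(COMPENSATED + " AND sub.cnt < 2")

# Outer CASE re-checks the user's condition and yields NULL when it fails.
final = run("SELECT id, CASE WHEN cnt < 2 THEN cnt END AS cnt "
            f"FROM ({COMPENSATED}) AS q")

print(naive)        # [(1, 2), (2, 1), (3, None)]
print(compensated)  # [(1, 2), (2, 1), (3, 0)]
print(merged_on)    # [(1, 0), (2, 1), (3, 0)]
print(final)        # [(1, None), (2, 1), (3, 0)]
```

Only the last form distinguishes the two situations: an empty group yields 0 (count-bug compensation), while a group that fails the user's condition yields NULL (left join semantics).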
The initial decorrelation (Q2), ignoring the user's original ON clause for now, is:
This initial query is wrong, because t1 rows that don't have a match in t2 will get all-NULL values, not 0 for count(*) of an empty set. This is the "count bug", and we compensate for it by checking for rows where __always_true is NULL, and replacing the agg value with the default for that agg (Q3):
Now we just need to handle the user's original ON clause. We can't add this to the rewritten ON clause in Q3, because we don't want the count-bug compensation to fire for rows that only fail the user's condition. But we also can't just add it to the WHERE clause, because we need left join semantics. So we can instead wrap another CASE that re-checks the ON condition and substitutes NULL for every right-side column:
What changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?
Support for a new feature.