You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
According to the SQL spec, when handling EXCEPT ALL the number of copies returned of a given record is the maximum of 0 OR the number of copies in the LHS minus the RHS.
Specifically:
b) If a set operator is specified, then the result of applying
the set operator is a table containing the following rows:
i) Let R be a row that is a duplicate of some row in T1 or of
some row in T2 or both. Let m be the number of duplicates
of R in T1 and let n be the number of duplicates of R in
T2, where m � 0 and n � 0.
...
iii) If ALL is specified, then
Case:
1) If UNION is specified, then the number of duplicates of
R that T contains is (m + n).
2) If EXCEPT is specified, then the number of duplicates of
R that T contains is the maximum of (m - n) and 0.
3) If INTERSECT is specified, then the number of duplicates
of R that T contains is the minimum of m and n.
DataFusion currently removes all copies of a record if it is present in the RHS.
To Reproduce
The following query
➜ ~ datafusion-cli
DataFusion CLI v42.0.0
> SELECT * FROM VALUES ('a'), ('b'), ('b'), ('c'), ('c'), ('c')
EXCEPT ALL
SELECT * FROM VALUES ('b'), ('c');
+---------+
| column1 |
+---------+
| a |
+---------+
returns 0 copies of ('b') and ('c') which does not match the behaviour from the spec.
Expected behavior
According to the SQL spec there should be 1 copy of ('b') and 2 copies of ('c')
Describe the bug
According to the SQL spec, when handling EXCEPT ALL the number of copies returned of a given record is the maximum of 0 OR the number of copies in the LHS minus the RHS.
Specifically:
DataFusion currently removes all copies of a record if it is present in the RHS.
To Reproduce
The following query
returns 0 copies of
('b')
and('c')
which does not match the behaviour from the spec.Expected behavior
According to the SQL spec there should be 1 copy of
('b')
and 2 copies of('c')
Additional context
See DB Fiddle for Postgres, which showcases the expected results
https://www.db-fiddle.com/f/ja4BG5CfyEvak5ScoBwCZr/1
The text was updated successfully, but these errors were encountered: