
Functional dependency shouldn't be preserved in UNION logical plan #12980

Open
Sevenannn opened this issue Oct 17, 2024 · 0 comments · May be fixed by #12979
Labels
bug Something isn't working

Comments

@Sevenannn
Contributor

Describe the bug

When the DataFusion logical planner builds the Aggregate plan, it adds extra columns to group_expr based on functional dependencies. However, for queries that aggregate over a table obtained through a UNION, the functional dependencies of the inputs are still preserved in the schema of the Union plan, even though they no longer hold after the UNION. As a result, wrong columns are added as GROUP BY columns in the aggregation plan.
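To make the mechanism concrete, here is a minimal Python sketch of FD-based GROUP BY expansion. It is a simplified, hypothetical model (the names `expand_group_by` and `FunctionalDependency` are illustrative, not DataFusion's API or data structures): when the grouping keys contain a determinant, every dependent column is appended to the keys. That is only safe if the dependency actually holds for the input.

```python
from typing import List, Tuple

# A functional dependency (determinant -> dependents) over column names.
# Hypothetical, simplified model; not DataFusion's actual representation.
FunctionalDependency = Tuple[Tuple[str, ...], Tuple[str, ...]]


def expand_group_by(group_by: List[str],
                    fds: List[FunctionalDependency]) -> List[str]:
    """Append every column functionally determined by the GROUP BY keys.

    If the keys contain a determinant, each dependent column has exactly one
    value per group, so adding it as a key leaves the grouping unchanged,
    but only as long as the dependency really holds for the input rows.
    """
    result = list(group_by)
    for determinant, dependents in fds:
        if set(determinant) <= set(result):
            for col in dependents:
                if col not in result:
                    result.append(col)
    return result


# Each CTE in the reproducer groups by i_manufact_id, so i_manufact_id -> extra.
cte_fds = [(("i_manufact_id",), ("extra",))]

# If the Union keeps its inputs' dependencies, `extra` leaks into the keys:
print(expand_group_by(["i_manufact_id"], cte_fds))  # ['i_manufact_id', 'extra']
# If the Union drops them, the keys stay as written in the query:
print(expand_group_by(["i_manufact_id"], []))       # ['i_manufact_id']
```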

To Reproduce

Any query that aggregates over a UNION triggers the issue. For example, the query below:

with t1 as (
    select i_manufact_id, count(*) as extra from item
    group by i_manufact_id
),
t2 as (
    select i_manufact_id, count(*) as extra from item
    group by i_manufact_id
)
select i_manufact_id, sum(extra)
 from  (select * from t1
        union all
        select * from t2) tmp1
 group by i_manufact_id
 order by i_manufact_id;

This produces a logical plan whose Aggregate wrongly includes the extra column as a group-by column:
Aggregate: groupBy=[[tmp1.i_manufact_id, tmp1.extra]], aggr=[[sum(tmp1.extra)]]

Expected behavior

The Union logical plan shouldn't retain functional dependencies from its input tables. In the example below, both Table 1 and Table 2 have the functional dependency col1 -> col2. However, for select * from table1 UNION select * from table2, the functional dependency col1 -> col2 no longer holds.
Table 1:

col1 | col2
-----|-----
a    | 1
b    | 2

Table 2:

col1 | col2
-----|-----
a    | 2
b    | 4
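The claim can be checked directly on the two tables above with a small Python sketch (the helper `fd_holds` is hypothetical and only for illustration). UNION ALL concatenates the rows; plain UNION would give the same four rows here because none of them are duplicates, so the conclusion is the same.

```python
from typing import Dict, Hashable, Sequence

Row = Dict[str, Hashable]


def fd_holds(rows: Sequence[Row], lhs: str, rhs: str) -> bool:
    """Return True if every value of `lhs` maps to exactly one value of `rhs`."""
    seen: Dict[Hashable, Hashable] = {}
    for row in rows:
        key, val = row[lhs], row[rhs]
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True


table1 = [{"col1": "a", "col2": 1}, {"col1": "b", "col2": 2}]
table2 = [{"col1": "a", "col2": 2}, {"col1": "b", "col2": 4}]

# UNION ALL simply concatenates the rows of both inputs.
union_all = table1 + table2

print(fd_holds(table1, "col1", "col2"))     # True
print(fd_holds(table2, "col1", "col2"))     # True
print(fd_holds(union_all, "col1", "col2"))  # False: "a" maps to both 1 and 2
```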

Additional context

This bug causes wrong results for TPC-DS queries 33, 56, 60, and 66: duplicate groups appear in the results.
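The duplicate-group symptom can be illustrated with a Python sketch over the Table 1 / Table 2 rows from the expected-behavior section (this is not TPC-DS data, and the function names are hypothetical). Grouping the union by col1 alone gives one row per key, while also grouping by col2, as happens when the stale dependency col1 -> col2 is kept, splits each col1 value into several groups.

```python
from collections import defaultdict

# (col1, col2) rows from Table 1 and Table 2, combined with UNION ALL.
table1 = [("a", 1), ("b", 2)]
table2 = [("a", 2), ("b", 4)]
union_all = table1 + table2


def sum_by_col1(rows):
    """Intended plan: GROUP BY col1, SUM(col2)."""
    totals = defaultdict(int)
    for col1, col2 in rows:
        totals[col1] += col2
    return dict(totals)


def sum_by_col1_col2(rows):
    """Buggy plan: col2 was added to the keys via the stale col1 -> col2."""
    totals = defaultdict(int)
    for col1, col2 in rows:
        totals[(col1, col2)] += col2
    return dict(totals)


print(sorted(sum_by_col1(union_all).items()))
# [('a', 3), ('b', 6)]
print(sorted(sum_by_col1_col2(union_all).items()))
# [(('a', 1), 1), (('a', 2), 2), (('b', 2), 2), (('b', 4), 4)]

# Projecting only col1 from the buggy result shows each key twice.
print(sorted(key[0] for key in sum_by_col1_col2(union_all)))
# ['a', 'a', 'b', 'b']
```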

Sevenannn added the bug label on Oct 17, 2024