Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move ProjectionPushdown into datafusion-physical-optimizer crate #14184

Closed
Tracked by #11502
alamb opened this issue Jan 18, 2025 · 13 comments
Closed
Tracked by #11502

Move ProjectionPushdown into datafusion-physical-optimizer crate #14184

alamb opened this issue Jan 18, 2025 · 13 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

Is your feature request related to a problem or challenge?

Historically DataFusion was one (very) large crate datafusion, and as it grew bigger we extracted various functionality into separate crates. This leads to both faster compile times (as the crates can be compiled in parallel) as well easier to navigate code (as the crates force a cleaner dependency separation)

One project we have not yet completed is to extract the physical optimizer passes out #11502

Describe the solution you'd like

Extract the ProjectionPushdown from the datafusion core crate to the datafusion-physical-optimizer crate

Describe alternatives you've considered

Move the code

Notes that due to dependencies (e.g. on SessionContext or functions), you may have to move the tests into the core_integration tests here:

Additional context

Here are some example PRs

I think this is a good first issue as it is mostly mechanical, has several good examples, and does not require deep internals knowledge

@alamb alamb added the enhancement New feature or request label Jan 18, 2025
@alamb alamb added the good first issue Good for newcomers label Jan 18, 2025
@logan-keede
Copy link
Contributor

logan-keede commented Jan 18, 2025

take

@logan-keede
Copy link
Contributor

I think this would require making a separate crate for datasource since ProjectionPushdown has a dependency on datasource in core unless I am missing some major and glaring fact.

use crate::datasource::physical_plan::CsvExec;

If it is as I believe, I think that deserves a whole issue to itself.
On a side note, would it make sense to open an issue to update imports all over the core module(don't use it if it is not necessary)? I think that should make the whole refactoring a bit less daunting.

- use crate::error::Result;
+ use datafusion_common::error::Result;

@logan-keede
Copy link
Contributor

@alamb Please let me know your thoughts on this and #14190, whenever possible,
Thanks,
Logan

@logan-keede
Copy link
Contributor

cc @berkaysynnada

@berkaysynnada
Copy link
Contributor

There are 2 options:
Plan A:

  1. Decrease the functionality, and take back projections pushdown feature over CsvExec's.
  2. Then, this rule has no dependency with datasource, and can be easily moved to PhysicalOptimizer crate.
  3. Add an API to ExecutionPlan like handles_projection_pushdown, and implement it for all operators, including CsvExec. That will bring projection pushdown over CsvExec back.

Plan B:

  1. We first add that handles_projection_pushdown API in ExecutionPlan's, and eliminate the dependency on datasource in projection_pushdown.rs
  2. Then complete the migration

@buraksenn
Copy link
Contributor

take

@buraksenn
Copy link
Contributor

I think I understand @berkaysynnada proposal, currently working on moving swapping to ExecutionPlan trait and will open a PR soon to verify approach

@buraksenn
Copy link
Contributor

Opened #14235 for this please review when you find time @berkaysynnada @alamb

@logan-keede
Copy link
Contributor

take

@alamb
I thought this issue was assigned to me, is this an issue with the bot?

@buraksenn
Copy link
Contributor

buraksenn commented Jan 23, 2025

take

@alamb I thought this issue was assigned to me, is this an issue with the bot?

I did not saw any assignments to this sorry, that's why I took it. I can unassign. My PR actually does not move ProjectionPushdown so I think after that you can move it

EDIT: I don't see unassign me option not sure why

Image

@logan-keede
Copy link
Contributor

I did not saw any assignments to this sorry, that's why I took it. I can unassign. My PR actually does not move ProjectionPushdown so I think after that you can move it

no biggies, I had not really started working on it so that's fine, I might have needed some help either way.

maybe we can work on this together(if that is fine with you), so no need to unassign.

EDIT: I don't see unassign me option not sure why

I am not really sure if this behaviour of unassign/take is intended or not.

@alamb
Copy link
Contributor Author

alamb commented Jan 24, 2025

I did not saw any assignments to this sorry, that's why I took it. I can unassign. My PR actually does not move ProjectionPushdown so I think after that you can move it

No worries -- we tried to document the norms here: https://datafusion.apache.org/contributor-guide/index.html#open-contribution-and-assigning-tickets

Basically what you both are doing (good communication) in my mind is the key thing -- exactly who is assigned in github is just a sometimes convenience means for that communication, not any sort of requirement

@alamb
Copy link
Contributor Author

alamb commented Jan 26, 2025

@alamb alamb closed this as completed Jan 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants