Question on the scope of groupby #865
Comments
That sounds good, no reason it can't work. You can group by multiple dims now but as separate axes, not as a mask. I guess we should allow grouping by any arbitrary Do you also imagine the dims of the mask are different to the main array? Like we have to use selectors like Near?
I just realised we already have that behaviour of passing a DimArray a the same as
No, I have not thought that far yet. Currently, having everything share the same dims should be good enough, at least for my use cases; otherwise there is the possibility to explicitly interpolate the mask.
The only thing I am pondering is whether it would be possible to create a groupby object with unknown classes and an unknown number of classes, because the mask array itself might be huge, so you only know the resulting groups after doing the grouped aggregation. I have a few benchmarks of DAE against this library: https://flox.readthedocs.io/en/latest/intro.html . In their main example they also group by an array larger than memory. In flox, they have a keyword argument
That is a cool idea. How do we store the group indices for masks that big? That array could also get large. Or is it also doing online stats? (Ok, seems like that's what flox does)
The idea is to do online stats, but it would still assume that the aggregated results fit in memory. Of course users would have to make sure the mask array does not contain millions of classes...
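To make the online-stats idea concrete, here is a minimal sketch in plain Julia of a grouped mean where the classes are only discovered while streaming over chunks. It is not the flox, DimensionalData.jl, or DiskArrayEngine.jl implementation; `GroupAcc`, `update!`, and `group_means` are hypothetical names, and the two small vectors stand in for blocks read from a larger-than-memory array. Only one running sum and count per class is kept, so memory scales with the number of classes rather than with the mask size.

```julia
# Hypothetical accumulator: running total and count for one class.
struct GroupAcc
    total::Float64
    n::Int
end

# Fold one chunk of values and their class labels into the accumulators.
# New classes are added to the Dict the first time they are seen.
function update!(acc::Dict{K,GroupAcc}, values, labels) where {K}
    for (v, l) in zip(values, labels)
        old = get(acc, l, GroupAcc(0.0, 0))
        acc[l] = GroupAcc(old.total + v, old.n + 1)
    end
    return acc
end

# Turn the accumulated totals and counts into per-class means.
group_means(acc) = Dict(k => s.total / s.n for (k, s) in acc)

acc = Dict{Int,GroupAcc}()
# Two "chunks" standing in for blocks of a much larger array and mask.
update!(acc, [1.0, 2.0, 3.0], [10, 20, 10])
update!(acc, [5.0, 4.0], [20, 30])
means = group_means(acc)
```

Class `30` only appears in the second chunk, which is the case where the groups are not known before the aggregation runs.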
Ok, I guess we will need a groupby/combine function for that. Any idea what to call it?
I am really bad at naming things. What about
Of course there will be some overlap with the
I'm going to add (Where
Nice, I think this might work. I will try to implement a DiskArrayEngine-based combine that builds on this interface.
I just realised allowing this will let us have the DataFrames.jl
I am trying to implement a bit more `groupby` logic into DiskArrayEngine and was wondering if you have any thoughts on allowing grouping on other DimArrays in the future. Currently we can only group by single dimensions, which is already useful. However, imagine I have a 3d DimArray `a` (lon x lat x Ti) and a 2d mask DimArray `m` (lon x lat), e.g. containing country codes; ideally I would like to be able to write `mean.(groupby(a, m))` to get country means where both. Is something like this on your radar or does it already exist in DD and just have a different name? Just interested to hear your thoughts before continuing to work on this in DAE to make sure the interface is somewhat compatible.
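As an illustration of the result the request above is after, the following sketch computes per-class means with plain Julia arrays rather than DimArrays. It assumes the mask shares the first two dimensions of the data array; `mask_group_means` is a hypothetical helper and not an existing function in DimensionalData.jl or DiskArrayEngine.jl.

```julia
# Hypothetical helper: `a` is lon x lat x time, `m` is a lon x lat mask of
# class labels (e.g. country codes). Returns class => mean over all matching
# cells and all time steps.
function mask_group_means(a::AbstractArray{<:Real,3}, m::AbstractMatrix)
    size(a)[1:2] == size(m) ||
        throw(DimensionMismatch("mask must match the first two dims of `a`"))
    result = Dict{eltype(m),Float64}()
    for class in unique(m)
        idx = findall(==(class), m)   # CartesianIndex{2} positions of this class
        vals = a[idx, :]              # selected cells x time
        result[class] = sum(vals) / length(vals)
    end
    return result
end

a = ones(2, 2, 3)
a[1, 1, :] .= 3.0      # one cell with a different value
m = [1 2; 2 2]         # class 1 covers only cell (1, 1)
country_means = mask_group_means(a, m)
```

This eager version needs the whole mask in memory to call `unique`, which is exactly the limitation the larger-than-memory discussion in the comments is concerned with.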