Dataset.putmask mangles the data if columns are not in the right order #262

Open
MarcMassar opened this issue Dec 1, 2021 · 0 comments
Labels: area-dataset, bug

Calling Dataset.putmask with a values dataset whose columns differ from the target's, or whose columns match but appear in a different order, silently produces garbage.

The issue happens because Dataset.putmask(mask, values) only validates that self.shape == values.shape, and then operates on the columns by index rather than by name.

Here is a realistic example (rt.merge_lookup puts the 'on' columns first, which causes the issue):

>>> import riptable as rt
>>> rt.__version__
'1.1.4'
>>> prices = rt.Dataset({'price':[100., 200.], 'stock_id':[1, 2]})
>>> updates = rt.Dataset({'price':[101., 201.], 'stock_id':[1, 1]})
>>> updates_aligned = rt.merge_lookup(prices, updates, on='stock_id', columns_left=[], keep='last')
>>> prices.putmask(updates_aligned.price.isfinite(), updates_aligned)
>>> prices
#    price   stock_id
-   ------   --------
0     1.00        201
1   200.00          2
[2 rows x 2 columns] total bytes: 32.0 B
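
The by-index versus by-name difference can be illustrated without riptable. The following is a minimal sketch using plain dicts of lists as stand-in datasets; it is not riptable's implementation. The helper names `putmask_by_position` and `putmask_by_name` are hypothetical, and the stand-in for `updates_aligned` assumes the merge puts `stock_id` first and leaves the unmatched row's price as NaN, as the output above suggests.

```python
def putmask_by_position(target, mask, values):
    """Mimic the reported behavior: pair columns by position, ignoring names."""
    for t_name, v_name in zip(list(target), list(values)):
        for i, flag in enumerate(mask):
            if flag:
                target[t_name][i] = values[v_name][i]


def putmask_by_name(target, mask, values):
    """Pair columns by name, and refuse to run if the column names differ."""
    if set(target) != set(values):
        raise ValueError("target and values must have the same column names")
    for name in target:
        for i, flag in enumerate(mask):
            if flag:
                target[name][i] = values[name][i]


def copy_columns(dataset):
    """Copy each column so the two variants start from identical data."""
    return {name: list(column) for name, column in dataset.items()}


# Stand-ins for the datasets in the report (plain dicts, not rt.Dataset).
prices = {"price": [100.0, 200.0], "stock_id": [1, 2]}
# Assumed shape of the merge result: 'on' column first, NaN where no update matched.
updates_aligned = {"stock_id": [1, 2], "price": [201.0, float("nan")]}
mask = [True, False]  # rows where the aligned price is finite

mangled = copy_columns(prices)
putmask_by_position(mangled, mask, updates_aligned)
# mangled is now {'price': [1, 200.0], 'stock_id': [201.0, 2]}

fixed = copy_columns(prices)
putmask_by_name(fixed, mask, updates_aligned)
# fixed is now {'price': [201.0, 200.0], 'stock_id': [1, 2]}
```

The positional variant reproduces the swapped values shown in the output above, while the by-name variant updates only the intended price. Raising on mismatched column names, instead of proceeding after a shape check alone, is one possible design choice for catching the "different columns" case as well.
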
jack-pappas added the area-dataset and bug labels on Dec 7, 2021.