Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More validation for pointer casts? Use buffer protocol for importing data? #78

Open
kylebarron opened this issue Jan 9, 2025 · 2 comments

Comments

@kylebarron
Copy link

I was reading through the code for accessing the numpy data:

zarrs-python/src/lib.rs

Lines 157 to 177 in cca07fc

fn py_untyped_array_to_array_object<'a>(
value: &Bound<'a, PyUntypedArray>,
) -> &'a PyArrayObject {
let array_object_ptr: *mut PyArrayObject = value.as_array_ptr();
unsafe {
// SAFETY: array_object_ptr cannot be null
&*array_object_ptr
}
}
fn nparray_to_slice<'a>(value: &'a Bound<'_, PyUntypedArray>) -> &'a [u8] {
let array_object: &PyArrayObject = Self::py_untyped_array_to_array_object(value);
let array_data = array_object.data.cast::<u8>();
let array_len = value.len() * Self::pyarray_itemsize(value);
let slice = unsafe {
// SAFETY: array_data is a valid pointer to a u8 array of length array_len
debug_assert!(!array_data.is_null());
std::slice::from_raw_parts(array_data, array_len)
};
slice
}

I'm not sure I'm well versed enough to assert that it could be unsound, but it looks a little quick to cast the pointer. It seems to expect that the input array is C-order contiguous, but doesn't provide any validation to that effect.

Instead of doing raw unsafe casts through pointers, you could use the safe as_slice method to access the underlying &[u8] of a numpy array.

Alternatively, you could use the buffer protocol to access data. Which is unsafe for different reasons, and requires the Python user to not mutate your buffers, but more flexible for varied inputs. https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/

@LDeakin
Copy link
Collaborator

LDeakin commented Jan 9, 2025

I'm not sure I'm well versed enough to assert that it could be unsound

Neither 🙃. I rely on miri for stuff like this, but we haven't run it over this crate. I don't even know if we can with all the interop.

Instead of doing raw unsafe casts through pointers, you could use the safe as_slice method to access the underlying &[u8] of a numpy array.

Looks like PyReadonlyArray is type-dependent, so it might be a lot of fluff to get there from a PyUntypedArray.

It seems to expect that the input array is C-order contiguous, but doesn't provide any validation to that effect

Indeed! We are validating before calling the function, but validation should be inside it

@kylebarron
Copy link
Author

Instead of doing raw unsafe casts through pointers, you could use the safe as_slice method to access the underlying &[u8] of a numpy array.

Looks like PyReadonlyArray is type-dependent, so it might be a lot of fluff to get there from a PyUntypedArray.

It is. I'm not well versed enough in your code to know what kind of array you'd expect there. However if you only ever need a &[u8] here, then one way to fix this is to call array.view("uint8") in Python, and then you'll always be able to extract the output view as a PyReadonlyArray<u8>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants