sds performance peculiarities #284
zhangyingmath changed the title from "why sds has big performance difference on float versus int?" to "why sds has big performance difference on int64 versus int32?" on Jan 11, 2022
rt version 1.0.57. I expanded the above experiment a bit. Here is a table: N = int(80e6)
To compare, here is a similar table with h5. Write with the hdf5 package, with default compression blosc:lz4, comp level 6, written as a dict of numpy arrays. Load with rt.load_h5, which calls the same underlying package.
zhangyingmath changed the title from "why sds has big performance difference on int64 versus int32?" to "sds performance peculiarities" on Jan 11, 2022
This is not necessarily a bug, but it really surprised me as a user.
Create a test dataset of 80 million rows with one float64 column:
ds = rt.Dataset({'f' : [1.0 for i in range(int(8e7))]})
save to sds: 139ms. Load back: 220ms. Size on disk: 60KB.
Create a similar dataset, but with an int32 column.
save to sds: 69ms. Load back: 116ms. Size on disk: 31KB.
Create a similar dataset, but with an int64 column:
ds = rt.Dataset({'f' : [1 for i in range(int(8e7))]})
save to sds: 1.5s. Load back: 1.1s. Size on disk: 20M.
Q: Why is int64 about 10 times slower than float64, with a file about 300 times larger?
To compare, when I use h5, the performance is similar regardless of float or int: save takes 460ms, load takes 350ms, and the size on disk is 2.6MB.
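For a rough sense of scale, the raw size of each column and how well constant data compresses can be checked with the Python standard library alone. This is only a sketch under stated assumptions: `zlib` stands in for whatever compressor sds actually uses, `raw_and_compressed` is a hypothetical helper, and N is reduced from 80 million to 1 million so it runs quickly. It says nothing about sds internals; it only shows that a constant column of any of these dtypes is highly compressible in principle.

```python
import zlib
from array import array


def raw_and_compressed(typecode, value, n):
    """Return (raw_bytes, zlib_compressed_bytes) for a constant column.

    Hypothetical helper for illustration; zlib is a stand-in compressor,
    not necessarily what sds uses.
    """
    buf = array(typecode, [value]) * n  # n copies of the same value
    raw = buf.tobytes()
    return len(raw), len(zlib.compress(raw, 6))


N = 1_000_000  # reduced from 80 million to keep the run short

sizes = {
    "float64": raw_and_compressed("d", 1.0, N),  # 8-byte C double
    "int32": raw_and_compressed("i", 1, N),      # C int, 4 bytes on common platforms
    "int64": raw_and_compressed("q", 1, N),      # 8-byte C long long
}

for name, (raw, comp) in sizes.items():
    print(f"{name}: raw={raw} bytes, compressed={comp} bytes, ratio={raw / comp:.0f}x")
```

At the full 80 million rows, the raw data is 640MB for float64 or int64 and 320MB for int32, so the 60KB and 31KB files reflect very high compression ratios, while the 20M int64 file is the outlier.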