TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. #306

Open
Jacobsonradical opened this issue Jan 16, 2025 · 3 comments

Comments

@Jacobsonradical

Describe the bug
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/run_scoring.py", line 294, in _run_scorer_parallelizable
    scoringResults = scorer.prescore(scoringArgs, preserveRatings=not runParallel)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/scorer.py", line 301, in prescore
    noteScores, userScores, metaScores = self._prescore_notes_and_users(
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 554, in _prescore_notes_and_users
    ) = self._run_stable_matrix_factorization(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 449, in _run_stable_matrix_factorization
    return self._run_regular_matrix_factorization(ratingsForTraining)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 424, in _run_regular_matrix_factorization
    return self._mfRanker.run_mf(ratingsForTraining)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py", line 560, in run_mf
    self._lossModule = NormalizedLoss(
                       ^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/matrix_factorization/normalized_loss.py", line 108, in __init__
    assert all(ratings[labelCol].values == targets.numpy())
                                           ^^^^^^^^^^^^^^^
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
"""

To Reproduce
I ran the following command in the shell:

python3 main.py \
  --enrollment /root/community-note/enrollment/2024-12-29_20-02.tsv \
  --notes /root/community-note/note/2024-12-29_20-02.tsv \
  --ratings /root/community-note/rating/ \
  --status /root/community-note/status/2024-12-29_20-02.tsv \
  --outdir /root/community-note/notescore \
  --parallel

Expected behavior
I believe that this is due to normalized_loss.py, line 108
assert all(ratings[labelCol].values == targets.numpy())

I am not sure if I should change it to
assert all(ratings[labelCol].values == targets.cpu().numpy())
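
For reference, here is a minimal standalone sketch of why the `.cpu()` call matters. It is not the scorer's code: the helper name `to_numpy` and the sample values are hypothetical, and `labels` and `targets` only stand in for `ratings[labelCol].values` and the `targets` tensor from the traceback. It falls back to the CPU when no GPU is present, so it should run on either kind of machine.

```python
import numpy as np
import torch


def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Hypothetical helper: move a tensor to host memory, then convert it.

    .cpu() returns the tensor itself when it already lives on the CPU,
    so this is safe to call regardless of the device.
    """
    return t.detach().cpu().numpy()


# Use the GPU when one is visible, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins for ratings[labelCol].values and the targets tensor.
labels = np.array([1.0, 0.0, 0.5], dtype=np.float32)
targets = torch.tensor(labels, device=device)

# targets.numpy() raises the TypeError from this issue when targets is on
# a CUDA device; going through the helper works on both devices.
assert all(labels == to_numpy(targets))
```

The `.detach()` call is only needed for tensors that require gradients; it is included here so the helper does not fail on those either.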

Environment

  1. Same venv as in requirement
  2. NVIDIA H100 80GB HBM3 X2
  3. CUDA 12.2
  4. python 3.11.9
  5. Intel(R) Xeon(R) Platinum 8462Y+
  6. 516GB RAM
@avalanchesiqi
Contributor

We ran into the same issue. @tuler you successfully ran the code a few days ago. Did you encounter the same issue?

@Jacobsonradical
Author

@avalanchesiqi
I think there are two ways to solve this.

  1. Install the CPU-only build of PyTorch, so that PyTorch computes everything on the CPU and no tensor transfer is needed
  2. change the line to
    assert all(ratings[labelCol].values == targets.cpu().numpy())
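
A possible variant of option 1 that avoids reinstalling PyTorch is to hide the GPUs from the process with the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below only shows the effect on PyTorch itself. Whether it avoids the error in the scorer depends on how the scoring code picks its device, which is an assumption here and has not been checked against the repository.

```python
import os

# Hide all CUDA devices from this process. Set this before CUDA is first
# initialized; the simplest way is to do it before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

# With no visible devices, PyTorch reports CUDA as unavailable.
assert not torch.cuda.is_available()

# Code that selects its device this way now ends up on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(3, device=device)
print(x.device)  # cpu
```

Worker processes inherit the environment of the parent process, so setting the variable in the shell before launching `python3 main.py` should have the same effect.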

@tuler
tuler commented Jan 25, 2025

> We ran into the same issue. @tuler you successfully ran the code a few days ago. Did you encounter the same issue?

No, I ran on CPU only.
