TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. #306

Jacobsonradical · 2025-01-16T19:48:56Z

Describe the bug
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/run_scoring.py", line 294, in _run_scorer_parallelizable
    scoringResults = scorer.prescore(scoringArgs, preserveRatings=not runParallel)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/scorer.py", line 301, in prescore
    noteScores, userScores, metaScores = self._prescore_notes_and_users(
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 554, in _prescore_notes_and_users
    ) = self._run_stable_matrix_factorization(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 449, in _run_stable_matrix_factorization
    return self._run_regular_matrix_factorization(ratingsForTraining)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/mf_base_scorer.py", line 424, in _run_regular_matrix_factorization
    return self._mfRanker.run_mf(ratingsForTraining)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/matrix_factorization/matrix_factorization.py", line 560, in run_mf
    self._lossModule = NormalizedLoss(
                       ^^^^^^^^^^^^^^^
  File "/root/community-note/communitynotes/sourcecode/scoring/matrix_factorization/normalized_loss.py", line 108, in __init__
    assert all(ratings[labelCol].values == targets.numpy())
                                           ^^^^^^^^^^^^^^^
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
"""

To Reproduce
I ran the following command in a shell:

python3 main.py \
  --enrollment /root/community-note/enrollment/2024-12-29_20-02.tsv \
  --notes /root/community-note/note/2024-12-29_20-02.tsv \
  --ratings /root/community-note/rating/ \
  --status /root/community-note/status/2024-12-29_20-02.tsv \
  --outdir /root/community-note/notescore \
  --parallel

Expected behavior
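I believe this is caused by line 108 of normalized_loss.py:
assert all(ratings[labelCol].values == targets.numpy())

As the error message says, targets is on cuda:0, so .numpy() cannot convert it until it has been copied to host memory. I am not sure whether I should change the line to:
assert all(ratings[labelCol].values == targets.cpu().numpy())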
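
For reference, here is a minimal sketch of a device-agnostic conversion, assuming `targets` is a PyTorch tensor that may live on either the CPU or a GPU. The helper name `tensor_to_numpy` is hypothetical and not part of the repository; the `detach()` call is an extra precaution for tensors that track gradients and is not something the traceback requires.

```python
def tensor_to_numpy(tensor):
    """Convert a PyTorch tensor to a NumPy array regardless of its device.

    Hypothetical helper, not part of the communitynotes code base.
    - detach() returns a tensor without autograd tracking, since numpy()
      raises on tensors that require grad.
    - cpu() copies a GPU tensor to host memory and returns a CPU tensor
      unchanged.
    - numpy() then succeeds because the data is on the host.
    """
    return tensor.detach().cpu().numpy()
```

With such a helper, the check on line 108 could read `assert all(ratings[labelCol].values == tensor_to_numpy(targets))`, which should behave the same on CPU-only and CUDA machines.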

Environment

Same virtual environment as specified in the requirements
NVIDIA H100 80GB HBM3 X2
CUDA 12.2
python 3.11.9
Intel(R) Xeon(R) Platinum 8462Y+
516GB RAM

avalanchesiqi · 2025-01-25T02:11:12Z

We ran into the same issue. @tuler you successfully ran the code a few days ago. Did you encounter the same issue?

Jacobsonradical · 2025-01-25T19:52:14Z

@avalanchesiqi
I think there are two ways to solve this.

1. Install the CPU-only build of PyTorch. PyTorch then computes everything on the CPU automatically, so no tensor transfer is needed.
2. Change the line to
assert all(ratings[labelCol].values == targets.cpu().numpy())
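
A variation on the first option that avoids reinstalling PyTorch is to hide the GPUs from the process by setting `CUDA_VISIBLE_DEVICES` to an empty string, which makes CUDA report no devices. The sketch below is an assumption-laden illustration: `cpu_only_env` and `run_cpu_only` are hypothetical names, and whether the scorer then stays entirely on the CPU depends on how it selects its device, which this thread does not show.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden.

    Hypothetical helper. An empty CUDA_VISIBLE_DEVICES means no GPU is
    visible to CUDA in any process started with this environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_cpu_only(args):
    """Run the current Python interpreter with the given arguments and
    GPUs hidden. Child worker processes inherit the same environment."""
    return subprocess.run([sys.executable, *args], env=cpu_only_env(), check=True)
```

For example, `run_cpu_only(["main.py", "--parallel"])` together with the remaining arguments from the original command would start the scorer with no visible GPUs. The variable has to be set before CUDA is initialized, which is why it is applied to the environment of a new process rather than changed midway through a run.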

tuler · 2025-01-25T20:02:57Z

> We ran into the same issue. @tuler you successfully ran the code a few days ago. Did you encounter the same issue?

No, I ran on CPU only.
