Search API not returning variables and observations for tabular data #11182

Open
ekraffmiller opened this issue Jan 24, 2025 · 3 comments
Labels
GREI Re-arch Issues related to the GREI Dataverse rearchitecture SPA These changes are required for the Dataverse SPA Type: Bug a defect

Comments

@ekraffmiller
Contributor

The Search API doesn't return the variables and observations of newly ingested tabular files. It seems to need a subsequent update to the file, such as publishing the dataset or editing the file metadata, before it returns the ingested version of the file.

This is a problem for the SPA because it uses the Search API to get all the file information it displays on the Collection Page.

What steps does it take to reproduce the issue?

Create a Dataset and add a tabular file to the Dataset. Go to the JSF Dataverse page, and see that the file has been ingested and that it is showing tabular data information, for example:

[Image: screenshot of the ingested file showing tabular data information]

Retrieve the file information from the Search API:

http://localhost:8000/api/v1/search?q=*&subtree=ingested-file-collection&type=file
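For anyone scripting the reproduction, the same request can be built and sent from Python. This is only a minimal sketch using the standard library. The helper names `build_search_url` and `fetch_search` are hypothetical (they are not part of Dataverse or any client library), and the base URL and collection alias are the ones from this report.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_search_url(base_url: str, subtree: str,
                     query: str = "*", item_type: str = "file") -> str:
    """Build a Search API URL restricted to one collection and item type."""
    params = {"q": query, "subtree": subtree, "type": item_type}
    # safe="*" keeps the wildcard readable instead of encoding it as %2A.
    return f"{base_url.rstrip('/')}/api/v1/search?{urlencode(params, safe='*')}"


def fetch_search(url: str) -> dict:
    """Send the request and decode the JSON body (needs a running instance)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Base URL and collection alias taken from the report above.
    print(build_search_url("http://localhost:8000", "ingested-file-collection"))
```

Calling `fetch_search` on that URL right after the ingest finishes, and again after publishing, should give the two responses to compare.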

The returned file is not the ingested version; it doesn't contain variables or observations:

{
  "status": "OK",
  "data": {
    "q": "*",
    "total_count": 1,
    "start": 0,
    "spelling_alternatives": {},
    "items": [
      {
        "name": "pums_1000.csv",
        "type": "file",
        "url": "http://localhost:8080/api/access/datafile/24",
        "file_id": "24",
        "description": "",
        "file_type": "Comma Separated Values",
        "file_content_type": "text/comma-separated-values",
        "size_in_bytes": 16969,
        "md5": "874aef476ffb8a1e98d68fcf5f780990",
        "checksum": {
          "type": "MD5",
          "value": "874aef476ffb8a1e98d68fcf5f780990"
        },
        "dataset_name": "Ingested File Dataset",
        "dataset_id": "23",
        "dataset_persistent_id": "doi:10.5072/FK2/SDRRRM",
        "dataset_citation": "Admin, Dataverse, 2025, \"Ingested File Dataset\", https://doi.org/10.5072/FK2/SDRRRM, Root, DRAFT VERSION",
        "restricted": false,
        "canDownloadFile": true,
        "publicationStatuses": [
          "Unpublished",
          "Draft"
        ],
        "releaseOrCreateDate": "2025-01-24T13:42:42Z"
      }
    ],
    "count_in_response": 1
  }
}
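A quick way to spot this symptom in a response is to look for file items that lack the `variables` and `observations` keys. The sketch below assumes the response shape shown above. The function name `files_missing_tabular_fields` is hypothetical, and the sample dictionary is a trimmed copy of the response from this report. The check cannot tell a stale index entry apart from a file that was never ingestable, since both lack these keys.

```python
TABULAR_FIELDS = ("variables", "observations")


def files_missing_tabular_fields(search_response: dict) -> dict:
    """Map each file item's name to the tabular fields it lacks."""
    missing = {}
    for item in search_response.get("data", {}).get("items", []):
        if item.get("type") != "file":
            continue  # only file items can carry tabular metadata
        absent = [field for field in TABULAR_FIELDS if field not in item]
        if absent:
            missing[item["name"]] = absent
    return missing


# Trimmed copy of the response above (most fields omitted for brevity).
response = {
    "status": "OK",
    "data": {
        "items": [
            {
                "name": "pums_1000.csv",
                "type": "file",
                "file_id": "24",
                "file_content_type": "text/comma-separated-values",
            }
        ]
    },
}

print(files_missing_tabular_fields(response))
# → {'pums_1000.csv': ['variables', 'observations']}
```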
  • When does this issue occur?
    Search API

  • Which page(s) does it occur on?
    NA

  • What happens?
    The Search API doesn't return the ingested version of the file.

  • To whom does it occur (all users, curators, superusers)?
    API users

  • What did you expect to happen?
    Should return Tabular file type, variables and observations.

Which version of Dataverse are you using?
The unstable Dataverse Docker image

@ekraffmiller ekraffmiller added the Type: Bug a defect label Jan 24, 2025
@ekraffmiller ekraffmiller added SPA These changes are required for the Dataverse SPA GREI Re-arch Issues related to the GREI Dataverse rearchitecture labels Jan 24, 2025
@pdurbin
Member

pdurbin commented Jan 24, 2025

> It seems to need a subsequent update to the file, such as publishing the dataset, or editing the file metadata, for the search API to return the ingested version of the file.

Interesting. If it's handy, can you please also post the JSON of how it looks after such an update, with the extra fields you expect?

@ekraffmiller
Contributor Author

> > It seems to need a subsequent update to the file, such as publishing the dataset, or editing the file metadata, for the search API to return the ingested version of the file.
>
> Interesting. If it's handy, can you please also post the JSON of how it looks after such an update, with the extra fields you expect?

After publishing, this is the result. The same thing happens if I edit the file metadata, for example adding a description or path.

{
  "status": "OK",
  "data": {
    "q": "*",
    "total_count": 1,
    "start": 0,
    "spelling_alternatives": {},
    "items": [
      {
        "name": "pums_1000.tab",
        "type": "file",
        "url": "http://localhost:8080/api/access/datafile/24",
        "file_id": "24",
        "description": "",
        "published_at": "2025-01-24T14:04:27Z",
        "file_type": "Tab-Delimited",
        "file_content_type": "text/tab-separated-values",
        "size_in_bytes": 16936,
        "md5": "874aef476ffb8a1e98d68fcf5f780990",
        "checksum": {
          "type": "MD5",
          "value": "874aef476ffb8a1e98d68fcf5f780990"
        },
        "unf": "UNF:6:wN9L6WbdfDIkHDrElCKz2w==",
        "dataset_name": "Ingested File Dataset",
        "dataset_id": "23",
        "dataset_persistent_id": "doi:10.5072/FK2/SDRRRM",
        "dataset_citation": "Admin, Dataverse, 2025, \"Ingested File Dataset\", https://doi.org/10.5072/FK2/SDRRRM, Root, V1, UNF:6:wN9L6WbdfDIkHDrElCKz2w== [fileUNF]",
        "restricted": false,
        "variables": 6,
        "observations": 1000,
        "canDownloadFile": true,
        "publicationStatuses": [
          "Published"
        ],
        "releaseOrCreateDate": "2025-01-24T14:04:27Z"
      }
    ],
    "count_in_response": 1
  }
}
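To make the difference between the two responses explicit, the file items can be compared key by key. This is a rough sketch, not part of Dataverse. `diff_items` is a hypothetical helper, and the two dictionaries are trimmed copies of the items posted in this thread.

```python
def diff_items(before: dict, after: dict) -> tuple:
    """Return (keys added in `after`, values that changed between the two)."""
    added = sorted(key for key in after if key not in before)
    changed = {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }
    return added, changed


# Trimmed copies of the file items before and after publishing.
before = {
    "name": "pums_1000.csv",
    "file_type": "Comma Separated Values",
    "file_content_type": "text/comma-separated-values",
    "size_in_bytes": 16969,
    "md5": "874aef476ffb8a1e98d68fcf5f780990",
}
after = {
    "name": "pums_1000.tab",
    "file_type": "Tab-Delimited",
    "file_content_type": "text/tab-separated-values",
    "size_in_bytes": 16936,
    "md5": "874aef476ffb8a1e98d68fcf5f780990",
    "unf": "UNF:6:wN9L6WbdfDIkHDrElCKz2w==",
    "variables": 6,
    "observations": 1000,
}

added, changed = diff_items(before, after)
print(added)            # → ['observations', 'unf', 'variables']
print(sorted(changed))  # → ['file_content_type', 'file_type', 'name', 'size_in_bytes']
```

The `md5` value is identical in both responses, so it does not show up as changed. On the full items, `published_at` is also added, and `dataset_citation`, `publicationStatuses`, and `releaseOrCreateDate` differ because the dataset was published.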

@pdurbin
Member

pdurbin commented Jan 24, 2025

Interesting. Thanks. Definitely sounds like a bug! 🐞

2 participants