Dataset without files are incorrectly labeled as info:eu-repo/semantics/openAccess within exports #11154

Open
johannes-darms opened this issue Jan 14, 2025 · 4 comments
Labels
Type: Bug a defect

Comments

@johannes-darms
Contributor

  • What steps does it take to reproduce the issue?

1. Create a dataset in Dataverse without uploading any files because the data is sensitive (e.g., patient-level data).
2. Optionally, apply a custom license that restricts access to the dataset. This has no effect on the exported property.
3. Export the dataset metadata in the DataCite or OpenAIRE format (XmlMetadataTemplate or OpenAireExportUtil).
4. Check the exported metadata and observe that the access type is incorrectly labeled as info:eu-repo/semantics/openAccess.
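
The check in the last step can be scripted. The following is a minimal sketch, not part of Dataverse: the sample XML is a hypothetical, heavily shortened export, and the helper name `access_rights_uris` is made up for illustration. It assumes the access type appears as a `rightsURI` attribute on a `rights` element, as in DataCite-style exports.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened export of a dataset that has no files.
SAMPLE_EXPORT = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <titles><title>Patient-level study (metadata only)</title></titles>
  <rightsList>
    <rights rightsURI="info:eu-repo/semantics/openAccess"/>
    <rights>Custom terms: access on request</rights>
  </rightsList>
</resource>"""


def access_rights_uris(xml_text):
    """Return every info:eu-repo/semantics/* rightsURI found in the export."""
    root = ET.fromstring(xml_text)
    uris = []
    for element in root.iter():
        # Strip the "{namespace}" prefix so the check works for any schema version.
        local_name = element.tag.rsplit("}", 1)[-1]
        uri = element.get("rightsURI", "")
        if local_name == "rights" and uri.startswith("info:eu-repo/semantics/"):
            uris.append(uri)
    return uris


print(access_rights_uris(SAMPLE_EXPORT))
```

Running the helper against a real export of a file-less dataset should show whether the openAccess value is present.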

  • When does this issue occur?

This issue occurs during the metadata export process for datasets without uploaded files. The code that would place a different value in the export depends on the presence of a file, cf. method .
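
To illustrate why a missing file leads to openAccess, here is a simplified model of file-driven inference. It is an assumption made for this report, not the actual Dataverse code: the `DataFile` class, the function name, and the rule "any restricted file changes the value" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    restricted: bool


def infer_access_rights(files: List[DataFile], access_requests_allowed: bool) -> str:
    """Hypothetical model: the value only changes while iterating over files."""
    rights = "openAccess"  # default set before any file is inspected
    for data_file in files:
        if data_file.restricted:
            rights = "restrictedAccess" if access_requests_allowed else "closedAccess"
            break
    return "info:eu-repo/semantics/" + rights


# With no files the loop body never runs, so the default is exported unchanged.
print(infer_access_rights([], access_requests_allowed=False))
```

In this model the dataset's custom terms are never consulted, which matches the behaviour described above.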

  • Which page(s) does it occur on?

This issue affects the metadata export functionality and is not directly tied to a specific UI page but impacts the exported files and integrations.

  • What happens?

The export incorrectly includes info:eu-repo/semantics/openAccess in the metadata. A dataset without files is not automatically open access, because the means of accessing the data can be defined in custom terms.

  • To whom does it occur (all users, curators, superusers)?

This issue affects all users who rely on metadata exports, including administrators, curators, and external systems consuming the exported metadata.

  • What did you expect to happen?

The export should reflect the dataset's actual access status or, if that status is not known, omit this property.
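
A minimal sketch of the "omit if not known" behaviour, assuming the exporter builds the `rightsList` element itself. The function name `add_access_rights` and the use of `None` for an unknown status are illustrative choices, not Dataverse API.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def add_access_rights(rights_list: ET.Element, access_status: Optional[str]) -> None:
    """Append an access-rights element only when the status is actually known."""
    if access_status is None:
        return  # unknown status: leave the property out instead of guessing
    ET.SubElement(
        rights_list,
        "rights",
        {"rightsURI": "info:eu-repo/semantics/" + access_status},
    )


unknown = ET.Element("rightsList")
add_access_rights(unknown, None)

known = ET.Element("rightsList")
add_access_rights(known, "restrictedAccess")

print(len(unknown), len(known))
```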

  • Which version of Dataverse are you using?

6.5

  • Any related open or closed issues to this bug report?

I found none.

  • Are you thinking about creating a pull request for this issue?

We welcome contributions to this issue.

@johannes-darms johannes-darms added the Type: Bug a defect label Jan 14, 2025
@jggautier
Contributor

Thanks for catching this @johannes-darms!

Do you think #5920 is related? If I had been thinking of datasets that didn't have files when I opened it, I think I would have included it as another case of those Access Rights being misapplied.

@johannes-darms
Contributor Author

This is definitely related, and I think the current implementation is quite effective. It bases decisions on actively chosen options (e.g. restricted files and access requests managed by Dataverse) rather than relying on static, outdated metadata values. However, this approach assumes that datasets are managed exclusively within a Dataverse instance, without regard to the terms attached.

If the openness or closedness of a dataset is not to be inferred from active settings (as is currently the case), then this information must be derived from the terms associated with the file or dataset. Since a machine cannot interpret the terms and automatically infer these properties*, the submitter or administrator would have to select the appropriate property manually, either when configuring the licence list or when providing a custom licence.

IMHO: To address this, we need a feature flag to toggle between the current method of inferring this property and a user-selectable drop-down as part of the terms.

(* There are ways to specify data use restrictions in a machine readable way... see https://www.ga4gh.org/product/data-use-ontology-duo)

@jggautier
Contributor

This is really interesting, thanks!

> This is definitely related, and I think the current implementation is quite effective. It bases decisions on actively chosen options (e.g. restricted files and access requests managed by Dataverse) rather than relying on static, outdated metadata values.

Could you write more about relying on static, outdated metadata values? Are there examples of this being done now?

@johannes-darms
Contributor Author

In Dataverse, this feature is not available. By "static outdated metadata," I am referring to properties set during the initial creation or upload of a dataset or data file, such as license details or custom terms. These settings are typically chosen once and are rarely, if ever, updated. On the other hand, settings like "file restriction" and "support data reuse requests" are more likely to be maintained or correctly configured because they directly influence the software's behavior and the user's actions. These settings provide immediate feedback to the uploader about their decisions, which helps identify and resolve misconfigurations. In contrast, incorrect metadata properties are often overlooked because they do not directly impact functionality, making them less noticeable.

However, as mentioned earlier, this approach works only if the data file is managed within Dataverse and the data reuse requests are also handled directly through Dataverse. To address this (special case?), Dataverse needs a flag that either enables the "calculation" of data openness or delegates this responsibility to the terms and conditions properties of the dataset or data file.
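
The proposed flag could be sketched as follows. Everything here is hypothetical: the mode names, the function signature, and the choice to return `None` (meaning "omit the property") when nothing is known are assumptions for discussion, not an existing Dataverse setting.

```python
from enum import Enum
from typing import Optional, Sequence


class AccessRightsMode(Enum):
    INFERRED = "inferred"  # calculate openness from file settings
    DECLARED = "declared"  # take the value selected alongside the terms


def resolve_access_rights(
    mode: AccessRightsMode,
    file_restrictions: Sequence[bool],
    access_requests_allowed: bool,
    declared: Optional[str],
) -> Optional[str]:
    """Return an access status, or None when the export should omit it."""
    if mode is AccessRightsMode.DECLARED:
        return declared  # None if the submitter selected nothing
    if not file_restrictions:
        return None  # no files: nothing to infer from
    if any(file_restrictions):
        return "restrictedAccess" if access_requests_allowed else "closedAccess"
    return "openAccess"


# A file-less dataset whose custom terms were declared as closed.
print(resolve_access_rights(AccessRightsMode.DECLARED, [], False, "closedAccess"))
```

Keeping both modes behind one function would let an installation retain the current inference for data managed inside Dataverse while letting others delegate the decision to the terms.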
