Feature Request: Statistics about metadata fields in search API #11173

Open
vera opened this issue Jan 21, 2025 · 0 comments
Labels
Type: Feature (a feature request)


vera commented Jan 21, 2025

Overview of the Feature Request

We are interested in making statistics about metadata fields available through the search API.

Solr supports this via the stats and stats.field query parameters for indexed fields of numeric, string and date types: https://solr.apache.org/guide/6_6/the-stats-component.html

By default, Solr returns the following stats for each requested field: min, max, sum, count, missing, sumOfSquares, mean and stddev.

Example: If I pass the following query to Solr:

http://localhost:8983/solr/#/collection1/query?q=*:*&q.op=OR&indent=true&fq=publicationStatus:%22Published%22&fq=dvObjectType:%22datasets%22&stats=true&stats.field=resource.design.population.obtainedSampleSize

I receive the following stats for the "obtained sample size" metadata field in addition to the query results:

"stats":{
    "stats_fields":{
      "resource.design.population.obtainedSampleSize":{
        "min":-1.0,
        "max":9.9999999E7,
        "count":25593,
        "missing":1540,
        "sum":1.41567565E8,
        "sumOfSquares":1.002115122884504E16,
        "mean":5531.495526120423,
        "stddev":625733.9594302336
      }
    }
  }

This feature could be implemented relatively easily by letting the search API user supply one or more stats.field values, forwarding them to Solr together with stats=true, and returning Solr's stats output in the API response.
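To make the pass-through idea concrete, here is a minimal sketch in Python (chosen for brevity, not because Dataverse uses it). The function names `build_solr_params` and `extract_stats` are hypothetical and do not exist in Dataverse; only the Solr parameters `stats` and `stats.field` and the response keys `stats` / `stats_fields` are taken from the example above.

```python
import json
from urllib.parse import urlencode


def build_solr_params(query, filter_queries, stats_fields):
    """Return Solr query parameters as a list of (key, value) pairs.

    Hypothetical helper: stats_fields stands in for whatever the
    search API user would pass to request statistics.
    """
    params = [("q", query)]
    params += [("fq", fq) for fq in filter_queries]
    if stats_fields:
        # stats=true enables the stats component; stats.field may repeat.
        params.append(("stats", "true"))
        params += [("stats.field", field) for field in stats_fields]
    return params


def extract_stats(solr_response):
    """Return the per-field stats from a parsed Solr response, or {} if absent."""
    return solr_response.get("stats", {}).get("stats_fields", {})


params = build_solr_params(
    "*:*",
    ['publicationStatus:"Published"', 'dvObjectType:"datasets"'],
    ["resource.design.population.obtainedSampleSize"],
)
query_string = urlencode(params)

# Abbreviated copy of the Solr response fragment shown above.
sample = json.loads(
    '{"stats": {"stats_fields": {'
    '"resource.design.population.obtainedSampleSize": '
    '{"min": -1.0, "max": 9.9999999E7, "count": 25593, "missing": 1540}}}}'
)
field_stats = extract_stats(sample)
```

Returning `{}` when the `stats` key is missing keeps the API response shape stable for requests that do not ask for statistics. A real implementation would probably also check that each requested field is one the search API is allowed to expose.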

What kind of user is the feature intended for?

API users

What inspired the request?

In our custom UI, we have a range input for searches on numeric fields. We would like to use each field's min and max values as the default lower and upper bounds of that input.

[Image: range input for the numeric field "obtained sample size"]
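As an illustration of how a client might consume the stats, the sketch below derives range input defaults from one field's stats object. `range_input_defaults` is a hypothetical helper, and rounding outward to whole numbers is an assumption about what the UI wants, not something Solr or Dataverse prescribes.

```python
import math


def range_input_defaults(field_stats):
    """Return (lower, upper) defaults for a numeric range input, or None.

    field_stats is one entry of Solr's stats_fields object, e.g.
    {"min": -1.0, "max": 9.9999999E7, ...}.
    """
    if not field_stats:
        return None
    lower, upper = field_stats.get("min"), field_stats.get("max")
    if lower is None or upper is None:
        # Assumed to mean that no documents carry a value for this field.
        return None
    # Round outward so the input never excludes the real extremes.
    return math.floor(lower), math.ceil(upper)


defaults = range_input_defaults({"min": -1.0, "max": 9.9999999e7})
```

With the stats from the example above, a minimum of -1.0 would become the lower bound as-is, so a client may want to filter out placeholder values like that before using the result.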

What existing behavior do you want changed?

None.

Any brand new behavior do you want to add to Dataverse?

Any open or closed issues related to this feature request?

Didn't find any

Are you thinking about creating a pull request for this feature?

Yes

vera added the "Type: Feature" label Jan 21, 2025