Customizing Frozen Data Embeddings in Vespa

This sample application demonstrates how to adapt frozen embeddings from foundational embedding models. Using frozen data embeddings from foundational models is an emerging industry practice that reduces the complexity of maintaining and versioning embeddings. The frozen data embeddings are reused across tasks such as classification, search, and recommendation.

Read the blog post.

Quick start

The following is a quick start recipe for this application. Requirements:

  • Docker Desktop installed and running. 4 GB of available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
  • Alternatively, deploy using Vespa Cloud
  • Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
  • Architecture: x86_64 or arm64
  • Homebrew to install the Vespa CLI, or download a Vespa CLI release from GitHub releases.

Validate the Docker resource settings; at least 4 GB of memory should be available:

$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
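
To turn the manual check above into a pass/fail test, the memory line can be parsed in a small shell function. This is a minimal sketch: the function name has_min_docker_memory is hypothetical, and it assumes docker info prints the value in GiB (for example "Total Memory: 7.75GiB"). The podman memTotal output uses a different format and is not handled here.

```shell
# Succeeds if the "Total Memory" line read from stdin reports at least
# min_gib GiB (default 4). Assumption: the value ends in "GiB";
# lines with any other unit are treated as a failure.
has_min_docker_memory() {
  min_gib="${1:-4}"
  awk -v min="$min_gib" '
    /Total Memory/ {
      value = $NF                          # last field, e.g. "7.75GiB"
      if (sub(/GiB$/, "", value) && value + 0 >= min + 0) ok = 1
    }
    END { exit ok ? 0 : 1 }
  '
}

# Usage:
#   docker info | has_min_docker_memory 4 && echo "memory OK"
```

Docker may report slightly less than the configured limit, so a threshold a little under 4 can be more practical.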

Install Vespa CLI:

$ brew install vespa-cli

For local deployment using the Docker image:

$ vespa config set target local

Pull and start the Vespa Docker container image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa

Verify that the configuration service (deploy API) is ready:

$ vespa status deploy --wait 300
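
The --wait option above already polls until the service is up. For scripts that need the same behavior around other commands, a generic retry helper can be sketched as follows. The name wait_until is hypothetical, and the health URL in the usage comment is an assumption about the config server's state API, not something this README specifies.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Arguments: max attempts, delay in seconds, then the command to run.
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while :; do
    "$@" >/dev/null 2>&1 && return 0       # command succeeded
    [ "$i" -ge "$attempts" ] && return 1   # out of attempts
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (assumed endpoint), polling for up to about 5 minutes:
#   wait_until 60 5 curl -sf http://localhost:19071/state/v1/health
```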

Download this sample application:

$ vespa clone custom-embeddings my-app && cd my-app

Download a frozen embedding model file; see text embeddings made easy for details:

$ mkdir -p models
$ curl -L -o models/tokenizer.json \
  https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json

$ curl -L -o models/frozen.onnx \
  https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx

$ cp models/frozen.onnx models/tuned.onnx

In this case, we re-use the frozen model as the tuned model to demonstrate functionality.
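
Before deploying, it can help to confirm that the three files created above are in place. The following is a minimal sketch with a hypothetical function name, check_model_files. It only detects missing or empty files and does not validate their contents.

```shell
# Succeeds only if every expected file exists in the given directory
# (default: models) and is non-empty; reports each problem on stderr.
check_model_files() {
  dir="${1:-models}"
  status=0
  for name in tokenizer.json frozen.onnx tuned.onnx; do
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty: $dir/$name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_model_files models && vespa deploy --wait 300
```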

Deploy the application:

$ vespa deploy --wait 300

Deployment note

It is possible to deploy this app to Vespa Cloud.

Indexing sample documents

vespa document ext/1.json
vespa document ext/2.json
vespa document ext/3.json
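
With more than a handful of files, the same feed step can be written as a loop. This is a sketch under stated assumptions: feed_dir and the FEED_CMD variable are hypothetical names introduced here (FEED_CMD is not a Vespa setting), and each file is assumed to hold a single document operation, as the ext/*.json files above do.

```shell
# Feed every JSON file in a directory, one document operation per file.
# FEED_CMD is a hypothetical override hook for dry runs; it defaults to
# the "vespa document" command used above.
feed_dir() {
  dir="${1:-ext}"
  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue              # glob matched nothing
    ${FEED_CMD:-vespa document} "$file" || return 1
  done
}

# Usage:
#   feed_dir ext
#   FEED_CMD=echo feed_dir ext    # dry run: print the files only
```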

Query and ranking examples

The examples below use the Vespa CLI; add -v to see the equivalent curl command for the HTTP API.

A simple query that retrieves all documents with undefined ranking:

vespa query 'yql=select * from doc where true' \
'ranking=unranked'

Notice the relevance of each hit, which is assigned by the rank profile.

Using the frozen query tower

vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
'input.query(q)=embed(frozen, "space contains many suns")'

Using the tuned query tower

vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
'input.query(q)=embed(tuned, "space contains many suns")'

In this case, the tuned model is equivalent to the frozen query tower that was used for document embeddings.

Using the simple weight transformation query tower

vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
'input.query(q)=embed(tuned, "space contains many suns")' \
'ranking=simple-similarity'

This invokes the simple-similarity ranking model, which performs the query transformation to the tuned embedding.

Using the Deep Neural Network similarity

vespa query 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
'input.query(q)=embed(tuned, "space contains many suns")' \
'ranking=custom-similarity'

Note that this only demonstrates the functionality; the custom similarity model is initialized with random weights.
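
To compare the rank profiles side by side, the same query can be run once per profile. The sketch below reuses the query from the examples above. The function name query_profiles and the VESPA_BIN variable are hypothetical, and the query text is assumed not to contain double quotes.

```shell
# Run the same nearest-neighbor query once per rank profile.
# VESPA_BIN is a hypothetical override hook for dry runs; it defaults
# to "vespa". The query text must not contain double quotes.
query_profiles() {
  text="$1"; shift
  for profile in "$@"; do
    echo "== ranking=$profile"
    "${VESPA_BIN:-vespa}" query \
      'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding, q)' \
      "input.query(q)=embed(tuned, \"$text\")" \
      "ranking=$profile" || return 1
  done
}

# Usage:
#   query_profiles "space contains many suns" simple-similarity custom-similarity
```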

Dump all embeddings

This is useful for training routines that need the frozen document embeddings exported from Vespa:

vespa visit --field-set "[all]" > ../vector-data.jsonl
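
A quick sanity check on the dump can catch an empty or partial export before training starts. This sketch assumes the file holds one JSON document per line and that the vector field is named embedding, as in the queries above. The function name summarize_dump is hypothetical.

```shell
# Print "<documents> <with_embedding>" for a JSON Lines dump: the number
# of non-empty lines and how many of them mention an "embedding" field.
summarize_dump() {
  file="$1"
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  total="$(grep -c . "$file" || true)"
  with_embedding="$(grep -c '"embedding"' "$file" || true)"
  echo "$total $with_embedding"
}

# Usage:
#   summarize_dump ../vector-data.jsonl
```

The two numbers should match if every exported document carries an embedding.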

Get a specific document and its embedding(s):

curl "http://localhost:8080/document/v1/doc/doc/docid/1?fieldSet=\[all\]"

Cleanup

Tear down the running container:

$ docker rm -f vespa