Version: 0.6.0

Embedding Models

Seaplane currently supports one embedding model: all-mpnet-base-v2. As with any other model, you access it through the model DAG (directed acyclic graph). You can learn more about the model DAG here.

The embedding model takes any text input and turns it into a 768-dimensional vector.

tip

To use the Seaplane vector store with all-mpnet-base-v2 embeddings, set the dimensions of your vector store index to 768:

vector_store.create_index('my-index', 768)

Input

The embedding model expects the following JSON as input. The input can come from any upstream task or DAG.

Example input of the embeddings model
{
  "model": "embeddings",
  "text": "<TEXT TO EMBED>"
}
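A minimal sketch of how an upstream task might build this input in Python, assuming the two keys shown above are all the model DAG needs. The helper name `build_embedding_request` is hypothetical and not part of the Seaplane SDK.

```python
import json


def build_embedding_request(text: str) -> str:
    """Serialize one text snippet into the JSON input shown above.

    Hypothetical helper for illustration; not a Seaplane SDK function.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    # "model" selects the embeddings model, "text" carries the snippet to embed
    return json.dumps({"model": "embeddings", "text": text})


payload = build_embedding_request("What is a Seaplane?")
```

Rejecting empty strings before the request is sent is a design choice made here for illustration, not a documented requirement of the model.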

Output

The Seaplane model DAG performs an asynchronous request to the embeddings model. Once the request completes, it sends the result to the downstream task or DAG defined in your application.

The output of the model DAG contains the input_data and the generated embeddings (output).

Example output of the embeddings model
{
  "input_data": {
    "model": "embeddings",
    "text": "What is a Seaplane?"
  },
  "output": [0.03557444363832474, -0.004008828196674585, 0.020864153280854225,
    ...
    -0.008185186423361301, 0.00842976849526167, -0.017890378832817078]
}
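A downstream task could unpack this output roughly as follows. This is a sketch that assumes the task receives the JSON above as a string; the function name `parse_embedding` is hypothetical, and the dimension check relies only on the 768-dimension figure stated earlier.

```python
import json

EXPECTED_DIM = 768  # vector size stated above for all-mpnet-base-v2


def parse_embedding(body: str, expected_dim: int = EXPECTED_DIM):
    """Return (text, vector) from the model DAG output shown above.

    Hypothetical helper for illustration; not a Seaplane SDK function.
    """
    data = json.loads(body)
    text = data["input_data"]["text"]
    vector = [float(x) for x in data["output"]]
    # Fail early if the vector would not fit an index created with 768 dimensions
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return text, vector
```

Checking the length before writing to a vector store surfaces a mismatched index dimension at the point where it is easiest to diagnose.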

Embed Multiple Text Snippets

In most cases, you will want to embed multiple sections of text. For example, when embedding texts to use as retrieval-augmented generation (RAG) input data, you often split entire documents into chunks and turn each chunk into a vector.

To achieve this on Seaplane, we recommend you debatch your data. For example, assume you have a list of strings that you want to embed. You can achieve this by creating a task in front of the model DAG and sending the text snippets one by one in a loop.

Sample app to embed multiple texts
import json

# App import path assumed from the Seaplane SDK; adjust to your installed version
from seaplane.apps import App

# create a task definition
def debatch_text(msg):

    # load input data
    data = json.loads(msg.body)

    # loop through the texts and send each one to the model DAG
    for text in data["texts"]:
        yield json.dumps({"model": "embeddings", "text": text})

# create app and dag
app = App("debatched-embedding")
dag = app.dag('debatched-embedding-dag')

# wire task to input
text = dag.task(debatch_text, [app.input()], instance_name='debatcher')

# send to model dag
embedding = app.model_hub('model-dag-instance', [text])

# set response and run app
dag.respond(embedding)
app.run()

This application creates an API endpoint that takes the following JSON object as input and turns each text element in the list into an embedding.

{
  "texts": ["embed this text", "also embed this text", "and this one too"]
}
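The `texts` list in this request could be produced by chunking a document first. The sketch below splits on whitespace into fixed-size word windows with a small overlap; the function name `chunk_text` and the `max_words` and `overlap` defaults are illustrative assumptions, not Seaplane recommendations.

```python
import json


def chunk_text(document: str, max_words: int = 100, overlap: int = 20) -> list:
    """Split a document into overlapping word-based chunks.

    Hypothetical helper for illustration; not a Seaplane SDK function.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = document.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # stop once the current window has reached the end of the document
        if start + max_words >= len(words):
            break
    return chunks


def build_request_body(document: str, **chunk_kwargs) -> str:
    """Build the {"texts": [...]} JSON body shown above from one document."""
    return json.dumps({"texts": chunk_text(document, **chunk_kwargs)})
```

Overlapping windows keep sentences that straddle a chunk boundary present in both neighbouring chunks, at the cost of embedding some words twice.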