Langchain
Seaplane offers native integration with Langchain for some of its core chatbot functionality. Specifically, we have integrated our vector store and local MPT-30B model with Langchain's ConversationalRetrievalChain and RecursiveCharacterTextSplitter. This lets you build powerful chatbots in as few lines of code as possible.
Built-in Wrappers
Seaplane supports two built-in wrappers around Langchain that take care of all of the heavy lifting. One to process and store your PDF documents, and one to easily query your LLM with a Conversational Retrieval Chain.
Process and Store PDFs
The store.save() method, powered by the Seaplane vector store, takes a PDF file (local path or URL) as input, runs it through a RecursiveCharacterTextSplitter and an embeddings model, and stores the result in the vector store.
You can use store.save() inside any task of type='vectordb'. The method takes two input arguments:

- filename (string)
- file_url (string) - Local file path or URL.
```python
from seaplane import app, task, start, config

@task(type="vectordb", id='pdf-processor', index_name='<INDEX-NAME>')
def pdf_chat_processor_task(data, store):
    # get the PDF and turn it into documents using the Seaplane embeddings
    return store.save(data['pdf'], data['pdf'])
```
The result is a vector representation of your input PDF inside the Seaplane vector store.
Conversational Retrieval Chain
store.query() is a wrapper around Langchain's conversational retrieval chain. It creates the chain using the Seaplane vector store and Substation, our large-model platform (specifically MPT-30B), and runs your query and chat history through it.
You can use store.query() inside any task of type='vectordb'. The method takes two input arguments:

- Query (string) - The question you want the LLM and vector store to answer.
- Chat History (list) - A list of query and result tuples, i.e., [(query, result["answer"])]. Can be empty if there is no chat history.
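Because the chat history is plain Python data, maintaining it across turns is just appending tuples. A minimal sketch (the helper name and sample answer are illustrative, not part of the Seaplane SDK):

```python
def record_turn(chat_history, query, result):
    # append the (query, answer) tuple in the shape store.query() expects
    chat_history.append((query, result["answer"]))
    return chat_history

chat_history = []  # first turn: no history yet
result = {"answer": "Seaplane is a platform for data pipelines."}
record_turn(chat_history, "What is Seaplane?", result)
# chat_history now holds one (query, answer) tuple to pass into the next call
```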
```python
from seaplane import app, task, start, config

@task(type="vectordb", id='chatbot-task', index_name='<INDEX-NAME>')
def pdf_chatbot_task(data, store):
    # answer the question
    return store.query(data['question'], data['history'])
```
Directly Using Langchain
Alternatively, if you do not have a PDF as input or want more control over your application, you can implement your chatbot and processor application directly using Langchain and Seaplane.
Storing Your Documents as Vectors
In this example, new data is fed to the application endpoint as a body parameter (text) of the POST request.

First, create a new index in the vector store. This example uses MPT-30B and the Seaplane embeddings function, which produces vectors with a dimension of 768.
```python
from seaplane import task
from seaplane.vector import vector_store

# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist;
    # 768 dimensions for the Seaplane embeddings
    vector_store.create_index("chat-documents", 768)
```
Next, split the input text into chunks using the RecursiveCharacterTextSplitter from Langchain. Create documents from the chunks and embed them using the Seaplane embeddings.
```python
from seaplane import app, task, start
from seaplane.vector import vector_store
from langchain.text_splitter import RecursiveCharacterTextSplitter
from seaplane.integrations.langchain import seaplane_embeddings

# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist;
    # 768 dimensions for the Seaplane embeddings
    vector_store.create_index("chat-documents", 768)

    # create the text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        length_function=len,
        add_start_index=True,
    )

    # create documents from the input string
    texts = text_splitter.create_documents([data['text']])

    # embed the documents
    vectors = seaplane_embeddings.embed_documents([page.page_content for page in texts])
```
The create_documents function expects a list as input, hence the [] around data['text']. embed_documents expects a list of text chunks as input; you create it inside the function call using a list comprehension that extracts page_content from each previously created document.
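The chunk_size and chunk_overlap semantics can be illustrated in isolation with a naive, pure-Python splitter. Note that RecursiveCharacterTextSplitter itself is smarter: it prefers to break on paragraph and sentence boundaries, while this sketch only counts characters:

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    # each chunk starts chunk_size - chunk_overlap characters after the
    # previous one, so consecutive chunks share chunk_overlap characters
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_with_overlap("a" * 2500)
# chunks start at offsets 0, 900, and 1800, with lengths 1000, 1000, and 700
```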
Finally, transform your vectors into a format the vector store understands. This includes the vector itself, an ID (UUID4), the metadata, and a data component containing the text representation of the vector.
```python
from seaplane import task
from seaplane.vector import vector_store
from langchain.text_splitter import RecursiveCharacterTextSplitter
from seaplane.integrations.langchain import seaplane_embeddings
from seaplane.model import Vector
import uuid

# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist;
    # 768 dimensions for the Seaplane embeddings
    vector_store.create_index("chat-documents", 768)

    # create the text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        length_function=len,
        add_start_index=True,
    )

    # create documents from the input string
    texts = text_splitter.create_documents([data['text']])

    # embed the documents
    vectors = seaplane_embeddings.embed_documents([page.page_content for page in texts])

    # create the vector representation the vector store understands
    vectors = [
        Vector(
            id=str(uuid.uuid4()),
            vector=vector,
            metadata={
                "page_content": texts[idx].page_content,
                "metadata": texts[idx].metadata,
            },
        )
        for idx, vector in enumerate(vectors)
    ]

    # insert the vectors into the vector store
    return vector_store.insert("chat-documents", vectors)
```
Querying Your Documents Using a ConversationalRetrievalChain
The example below creates a ConversationalRetrievalChain by hand, powered by the Langchain Seaplane integration and using Seaplane-hosted MPT-30B as the LLM of choice.
```python
from seaplane import task
from langchain.chains import ConversationalRetrievalChain
from seaplane.integrations.langchain import SeaplaneLLM, langchain_vectorstore

# the chat task that performs the document search and feeds the results to the LLM
@task(type="inference", id='chat-task')
def chat_task(data):
    # create a vector store instance with the Langchain integration
    vectorstore = langchain_vectorstore(index_name="chat-documents")

    # create the chain
    pdf_qa_hf = ConversationalRetrievalChain.from_llm(
        llm=SeaplaneLLM(),
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
    )
```
Make sure you select the correct index that holds your documents for in-context learning. Notice that SeaplaneLLM() is passed as the llm parameter of the ConversationalRetrievalChain. In short, this ensures you use the Seaplane vector store and Seaplane-hosted MPT-30B as your LLM.
Finally, you can query the LLM with the user-provided question.
```python
from seaplane import task
from langchain.chains import ConversationalRetrievalChain
from seaplane.integrations.langchain import SeaplaneLLM, langchain_vectorstore

# the chat task that performs the document search and feeds the results to the LLM
@task(type="inference", id='chat-task')
def chat_task(data):
    # create a vector store instance with the Langchain integration
    vectorstore = langchain_vectorstore(index_name="chat-documents")

    # create the chain
    pdf_qa_hf = ConversationalRetrievalChain.from_llm(
        llm=SeaplaneLLM(),
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
    )

    # answer the question using MPT-30B
    result = pdf_qa_hf({"question": data["query"], "chat_history": data['chat_history']})

    # return only the answer to the user
    return result["answer"].split("\n\n### Response\n")[1]
```
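The final split assumes the instruct-style prompt format used by MPT-30B, in which the model output echoes the prompt and places the actual answer after a ### Response marker. In isolation (the raw string below is a made-up example of such output):

```python
# raw model output echoing the prompt, followed by the answer
raw = "### Instruction\nWhat is Seaplane?\n\n### Response\nSeaplane is a platform for data pipelines."

# keep only the text after the response marker
answer = raw.split("\n\n### Response\n")[1]
# answer == "Seaplane is a platform for data pipelines."
```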
The above assumes that the user adds the question and chat history to the POST request as query and chat_history respectively.
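As a sketch, a request body for this task could look like the following (the question text and history contents are illustrative only):

```python
# example POST body for the chat task above; the task reads
# data["query"] and data["chat_history"]
body = {
    "query": "What does the document say about refunds?",
    "chat_history": [
        ("What is this document?", "It is the company's terms of service."),
    ],
}
```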