Version: 0.3.94

LangChain

Seaplane offers native integration with LangChain for some of its core chatbot functionality. Specifically, we have integrated our vector store and our hosted MPT-30B model with LangChain's ConversationalRetrievalChain and RecursiveCharacterTextSplitter. This lets you build powerful chatbots in as few lines of code as possible.

Built-in Wrappers

Seaplane provides two built-in wrappers around LangChain that take care of the heavy lifting: one to process and store your PDF documents, and one to query your LLM through a Conversational Retrieval Chain.

Process and Store PDFs

The store.save() method, powered by the Seaplane vector store, takes a PDF file (local path or URL) as input, runs it through a RecursiveCharacterTextSplitter and an embeddings model, and stores the result in the vector store.

You can use store.save() inside any task of type='vectordb'. The method takes two input arguments.

  • filename (string)
  • file_url (string) - Local file path or URL.
from seaplane import app, task, start, config

@task(type="vectordb", id='pdf-processor', index_name='<INDEX-NAME>')
def pdf_chat_processor_task(data, store):
    # get the PDF and turn it into documents using the Seaplane embeddings
    return store.save(data['pdf'], data['pdf'])

The result is a vector representation of your input PDF inside the Seaplane vector store.
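For illustration, a request to the app that feeds this task might look like the sketch below. The endpoint URL and the exact body shape are assumptions for this example; only the pdf key is what the task above reads.

import requests

# placeholder endpoint; replace with the URL of your deployed Seaplane app
ENDPOINT = "https://<YOUR-APP-ENDPOINT>"

# the task above reads data['pdf'] for both the filename and the file URL
payload = {"pdf": "https://example.com/whitepaper.pdf"}

response = requests.post(ENDPOINT, json=payload)
print(response.status_code)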

Conversational Retrieval Chain

store.query() is a wrapper around LangChain's ConversationalRetrievalChain. It creates the conversational retrieval chain using the Seaplane vector store and Substation, our large model platform (specifically MPT-30B), and runs your query and chat history through it.

You can use store.query() inside any task of type='vectordb'. The method takes two input arguments.

  • Chat History (list) - A list of (query, answer) tuples, e.g., [(query, result["answer"])]. Can be empty if there is no chat history.
  • Query (string) - The question you want the LLM and vector store to answer.
from seaplane import app, task, start, config

@task(type="vectordb", id='chatbot-task', index_name='<INDEX-NAME>')
def pdf_chatbot_task(data, store):
    # answer the question using the conversational retrieval chain
    return store.query(data['question'], data['history'])
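Your client is responsible for accumulating the chat history between turns. The sketch below shows how the list of (query, answer) tuples grows across requests; the ask() helper is a hypothetical stand-in for whatever sends the request body to your app, and result["answer"] follows the format described in the argument list above.

# start with an empty history for a new conversation
chat_history = []

question = "What is the document about?"
result = ask({"question": question, "history": chat_history})  # hypothetical request helper

# append the (query, answer) tuple so the next turn has context
chat_history.append((question, result["answer"]))

question = "Summarize the key findings."
result = ask({"question": question, "history": chat_history})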

Directly Using LangChain

Alternatively, if you do not have a PDF as input or want more control over your application, you can implement your chatbot and processing pipeline directly with LangChain and Seaplane.

Storing Your Documents as Vectors

This example assumes new data is fed to the application endpoint as a body parameter (text) of the POST request.

First, create a new index in the vector store. This example uses MPT-30B and the Seaplane embeddings function, which produces vectors with 768 dimensions.

from seaplane import task
from seaplane.vector import vector_store

# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist; 768 dimensions for Seaplane embeddings
    vector_store.create_index("chat-documents", 768)

Next, split the input text into chunks using LangChain's RecursiveCharacterTextSplitter. Create documents from the chunks and embed them using the Seaplane embeddings.

from seaplane import app, task, start
from seaplane.vector import vector_store
from langchain.text_splitter import RecursiveCharacterTextSplitter
from seaplane.integrations.langchain import seaplane_embeddings


# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist; 768 dimensions for Seaplane embeddings
    vector_store.create_index("chat-documents", 768)

    # create the text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        length_function=len,
        add_start_index=True,
    )

    # create documents from the input string
    texts = text_splitter.create_documents([data['text']])

    # embed the documents
    vectors = seaplane_embeddings.embed_documents([page.page_content for page in texts])

The create_documents function expects a list as input, hence the [] around data['text']. The embed_documents function expects a list of text chunks as input; you build it inside the function call with a list comprehension that extracts page_content from each previously created document.
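As a quick sanity check, you can confirm that each embedding matches the 768 dimensions of the index created earlier. A minimal sketch using the same seaplane_embeddings import as the task above:

from seaplane.integrations.langchain import seaplane_embeddings

# embed two short chunks and confirm each vector has the expected dimension
sample_vectors = seaplane_embeddings.embed_documents(["first chunk", "second chunk"])

assert len(sample_vectors) == 2
assert all(len(vector) == 768 for vector in sample_vectors)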

Finally, transform your embeddings into the vector format the vector store understands. Each entry includes the vector itself, an ID (UUID4), and metadata containing the text representation of the vector.

from seaplane import task
from seaplane.vector import vector_store
from langchain.text_splitter import RecursiveCharacterTextSplitter
from seaplane.integrations.langchain import seaplane_embeddings
from seaplane.model import Vector
import uuid

# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist; 768 dimensions for Seaplane embeddings
    vector_store.create_index("chat-documents", 768)

    # create the text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        length_function=len,
        add_start_index=True,
    )

    # create documents from the input string
    texts = text_splitter.create_documents([data['text']])

    # embed the documents
    vectors = seaplane_embeddings.embed_documents([page.page_content for page in texts])

    # create the vector representation the vector store understands
    vectors = [
        Vector(
            id=str(uuid.uuid4()),
            vector=vector,
            metadata={
                "page_content": texts[idx].page_content,
                "metadata": texts[idx].metadata,
            },
        )
        for idx, vector in enumerate(vectors)
    ]

    # insert the vectors into the vector store
    return vector_store.insert("chat-documents", vectors)

Querying Your Documents Using a ConversationalRetrievalChain

The example below creates a ConversationalRetrievalChain by hand, powered by the LangChain Seaplane integration, using Seaplane-hosted MPT-30B as the LLM of choice.

from seaplane import task
from langchain.chains import ConversationalRetrievalChain
from seaplane.integrations.langchain import SeaplaneLLM, langchain_vectorstore

# the chat task that performs the document search and feeds the results to the LLM
@task(type="inference", id='chat-task')
def chat_task(data):
    # create a vector store instance with the LangChain integration
    vectorstore = langchain_vectorstore(index_name="chat-documents")

    # create the chain
    pdf_qa_hf = ConversationalRetrievalChain.from_llm(
        llm=SeaplaneLLM(),
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
    )

Make sure you select the correct index that holds your documents for in-context learning. Notice that SeaplaneLLM() is passed as the llm parameter of the ConversationalRetrievalChain. Together, this ensures you use the Seaplane vector store for retrieval and Seaplane-hosted MPT-30B as your LLM.

Finally, you can query the LLM with the user-provided question.

from seaplane import task
from langchain.chains import ConversationalRetrievalChain
from seaplane.integrations.langchain import SeaplaneLLM, langchain_vectorstore

# the chat task that performs the document search and feeds the results to the LLM
@task(type="inference", id='chat-task')
def chat_task(data):
    # create a vector store instance with the LangChain integration
    vectorstore = langchain_vectorstore(index_name="chat-documents")

    # create the chain
    pdf_qa_hf = ConversationalRetrievalChain.from_llm(
        llm=SeaplaneLLM(),
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
    )

    # answer the question using MPT-30B
    result = pdf_qa_hf({"question": data["query"], "chat_history": data['chat_history']})

    # return only the answer to the user
    return result["answer"].split("\n\n### Response\n")[1]

The above assumes that the user adds the question and chat history to the POST request as query and chat_history respectively.
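For reference, a request to this app might look like the sketch below. The endpoint URL is a placeholder assumption; the query and chat_history keys match what the task above expects.

import requests

# placeholder endpoint; replace with the URL of your deployed Seaplane app
ENDPOINT = "https://<YOUR-APP-ENDPOINT>"

payload = {
    "query": "What are the key findings?",
    # (query, answer) pairs from earlier turns; JSON serializes these tuples as arrays
    "chat_history": [
        ("What is the document about?", "It covers quarterly sales performance."),
    ],
}

response = requests.post(ENDPOINT, json=payload)
print(response.json())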