Langchain
Seaplane offers native integration with Langchain for some of its core chatbot functionality. Specifically, we have integrated our vector store and local MPT-30B model with Langchain's ConversationalRetrievalChain and RecursiveCharacterTextSplitter. This lets you build powerful chatbots in as few lines of code as possible.
Built-in Wrappers
Seaplane supports two built-in wrappers around Langchain that take care of all of the heavy lifting. One to process and store your PDF documents, and one to easily query your LLM with a Conversational Retrieval Chain.
Process and Store PDFs
The store.save() method, powered by the Seaplane vector store, takes a PDF file (local path or URL) as input, runs it through a RecursiveCharacterTextSplitter and an embeddings model, and stores the result in the vector store.
You can use store.save() inside any task of type='vectordb'. The method takes two input arguments:

- filename (string)
- file_url (string) - Local file path or URL.
```python
from seaplane import app, task, start, config

@task(type="vectordb", id='pdf-processor', index_name='<INDEX-NAME>')
def pdf_chat_processor_task(data, store):
    # get the PDF and turn it into documents using the Seaplane embeddings
    return store.save(data['pdf'], data['pdf'])
```
The result is a vector representation of your input PDF inside the Seaplane vector store.
Conversational Retrieval Chain
store.query() is a wrapper around Langchain's conversational retrieval chain. It creates the chain using the Seaplane vector store and Substation, our large-model platform (specifically MPT-30B), and runs your query and chat history through it.
You can use store.query() inside any task of type='vectordb'. The method takes two input arguments:

- Query (string) - The question you want the LLM and vector store to answer.
- Chat History (list) - A list of query and result tuples, i.e., [(query, result["answer"])]. Can be empty if there is no chat history.
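Because the chat history is plain Python data, maintaining it across turns is just appending tuples. A minimal sketch (the helper name and sample answer are illustrative, not part of the Seaplane SDK):

```python
def record_turn(chat_history, query, result):
    # append the (query, answer) tuple in the shape store.query() expects
    chat_history.append((query, result["answer"]))
    return chat_history

chat_history = []  # first turn: no history yet
result = {"answer": "Seaplane is a platform for data pipelines."}
record_turn(chat_history, "What is Seaplane?", result)
# chat_history now holds one (query, answer) tuple to pass into the next call
```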
```python
from seaplane import app, task, start, config

@task(type="vectordb", id='chatbot-task', index_name='<INDEX-NAME>')
def pdf_chatbot_task(data, store):
    # answer the question
    return store.query(data['question'], data['history'])
```
Directly Using Langchain
Alternatively, if you do not have a PDF as input or want more control over your application, you can implement your chatbot and processor application directly using Langchain and Seaplane.
Storing Your Documents as Vectors
In this example, new data is fed to the application endpoint as a body parameter (text) of the POST request.

First, create a new index in the vector store. This example uses MPT-30B and the Seaplane embeddings function, which produces vectors with a dimension of 768.
```python
from seaplane import task
from seaplane.vector import vector_store

# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist;
    # 768 dimensions for the Seaplane embeddings
    vector_store.create_index("chat-documents", 768)
```
Next, split the input text into chunks using the RecursiveCharacterTextSplitter from Langchain. Create documents from the chunks and embed them using the Seaplane embeddings.
```python
from seaplane import app, task, start
from seaplane.vector import vector_store
from langchain.text_splitter import RecursiveCharacterTextSplitter
from seaplane.integrations.langchain import seaplane_embeddings

# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist;
    # 768 dimensions for the Seaplane embeddings
    vector_store.create_index("chat-documents", 768)

    # create the text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        length_function=len,
        add_start_index=True,
    )

    # create documents from the input string
    texts = text_splitter.create_documents([data['text']])

    # embed the documents
    vectors = seaplane_embeddings.embed_documents([page.page_content for page in texts])
```
The create_documents function expects a list as input, hence the [] around data['text']. embed_documents expects a list of text chunks as input; you create it inside the function call using a list comprehension that extracts page_content from each previously created document.
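The chunk_size and chunk_overlap semantics can be illustrated in isolation with a naive, pure-Python splitter. Note that RecursiveCharacterTextSplitter itself is smarter: it prefers to break on paragraph and sentence boundaries, while this sketch only counts characters:

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    # each chunk starts chunk_size - chunk_overlap characters after the
    # previous one, so consecutive chunks share chunk_overlap characters
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_with_overlap("a" * 2500)
# chunks start at offsets 0, 900, and 1800, with lengths 1000, 1000, and 700
```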
Finally, transform your vectors into a format the vector store understands. This includes the vector itself, an ID (UUID4), the metadata, and a data component containing the text representation of the vector.
```python
from seaplane import task
from seaplane.vector import vector_store
from langchain.text_splitter import RecursiveCharacterTextSplitter
from seaplane.integrations.langchain import seaplane_embeddings
from seaplane.model import Vector
import uuid

# the processing task
@task(type="compute", id='chat-processor')
def process_data(data):
    # create the index if it does not yet exist;
    # 768 dimensions for the Seaplane embeddings
    vector_store.create_index("chat-documents", 768)

    # create the text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=100,
        length_function=len,
        add_start_index=True,
    )

    # create documents from the input string
    texts = text_splitter.create_documents([data['text']])

    # embed the documents
    vectors = seaplane_embeddings.embed_documents([page.page_content for page in texts])

    # create the vector representation the vector store understands
    vectors = [
        Vector(
            id=str(uuid.uuid4()),
            vector=vector,
            metadata={
                "page_content": texts[idx].page_content,
                "metadata": texts[idx].metadata,
            },
        )
        for idx, vector in enumerate(vectors)
    ]

    # insert the vectors into the vector store
    return vector_store.insert("chat-documents", vectors)
```
Querying Your Documents Using a ConversationalRetrievalChain
The example below creates a ConversationalRetrievalChain by hand, powered by the Langchain Seaplane integration and using Seaplane-hosted MPT-30B as the LLM of choice.
```python
from seaplane import task
from langchain.chains import ConversationalRetrievalChain
from seaplane.integrations.langchain import SeaplaneLLM, langchain_vectorstore

# the chat task that performs the document search and feeds the results to the LLM
@task(type="inference", id='chat-task')
def chat_task(data):
    # create a vector store instance with the Langchain integration
    vectorstore = langchain_vectorstore(index_name="chat-documents")

    # create the chain
    pdf_qa_hf = ConversationalRetrievalChain.from_llm(
        llm=SeaplaneLLM(),
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
    )
```
Make sure you select the correct index that holds your documents for in-context learning. Notice that SeaplaneLLM() is passed as the llm parameter of the ConversationalRetrievalChain. In short, this ensures you use the Seaplane vector store and Seaplane-hosted MPT-30B as your LLM.
Finally, you can query the LLM with the user-provided question.
```python
from seaplane import task
from langchain.chains import ConversationalRetrievalChain
from seaplane.integrations.langchain import SeaplaneLLM, langchain_vectorstore

# the chat task that performs the document search and feeds the results to the LLM
@task(type="inference", id='chat-task')
def chat_task(data):
    # create a vector store instance with the Langchain integration
    vectorstore = langchain_vectorstore(index_name="chat-documents")

    # create the chain
    pdf_qa_hf = ConversationalRetrievalChain.from_llm(
        llm=SeaplaneLLM(),
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
    )

    # answer the question using MPT-30B
    result = pdf_qa_hf({"question": data["query"], "chat_history": data['chat_history']})

    # return only the answer to the user
    return result["answer"].split("\n\n### Response\n")[1]
```
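The final split assumes the instruct-style prompt format used by MPT-30B, in which the model output echoes the prompt and places the actual answer after a ### Response marker. In isolation (the raw string below is a made-up example of such output):

```python
# raw model output echoing the prompt, followed by the answer
raw = "### Instruction\nWhat is Seaplane?\n\n### Response\nSeaplane is a platform for data pipelines."

# keep only the text after the response marker
answer = raw.split("\n\n### Response\n")[1]
# answer == "Seaplane is a platform for data pipelines."
```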
The above assumes that the user adds the question and chat history to the POST request as query and chat_history respectively.
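As a sketch, a request body for this task could look like the following (the question text and history contents are illustrative only):

```python
# example POST body for the chat task above; the task reads
# data["query"] and data["chat_history"]
body = {
    "query": "What does the document say about refunds?",
    "chat_history": [
        ("What is this document?", "It is the company's terms of service."),
    ],
}
```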