A local Q&A engine using Llama and FAISS
FAISS implements efficient similarity search over vectors. Sentence Transformers encodes sentences into vectors that FAISS can index. A local Llama model can then answer questions using the most relevant chunks retrieved from a FAISS vector store. By combining all of the above we can build a local, GenAI-powered assistant that answers semantic queries over a local corpus of documents. Let's see how.
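To see what the first two pieces do under the hood, here is a minimal sketch independent of LangChain: encode a few sentences with Sentence Transformers, index the vectors in FAISS, and run a nearest-neighbour search. The sentences and the query are made up purely for illustration.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
sentences = ['Water the plants twice a week.',
             'Leave the keys with the neighbour.']
vectors = model.encode(sentences)  # one 384-dimensional vector per sentence

index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2-distance index
index.add(vectors)

query = model.encode(['What do I do with the keys?'])
distances, ids = index.search(query, 1)  # 1 nearest neighbour per query
print(sentences[ids[0][0]])  # -> 'Leave the keys with the neighbour.'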
For this example I'm using llama-2-7b-chat.ggmlv3.q4_0.bin and a document (data/notes.txt) I'd like my agent to answer questions about. I'm also using faiss-gpu, as I have a compatible graphics card, but the CPU version (faiss-cpu) works too.
# conda install faiss-gpu
# pip install llama-index langchain ctransformers sentence-transformers unstructured
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import CTransformers
from langchain_core.vectorstores import VectorStoreRetriever
from langchain import PromptTemplate
from langchain.chains import RetrievalQA
# Encode the document into FAISS
loader = UnstructuredFileLoader('data/notes.txt')
documents = loader.load()
# Split the document into overlapping chunks small enough for the context window
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(documents)
# Embed each chunk and index the resulting vectors in FAISS
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vector_store = FAISS.from_documents(text_chunks, embeddings)
# Retrieve the 2 chunks most similar to each query
retriever = VectorStoreRetriever(vectorstore=vector_store, search_kwargs={'k': 2})
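# Optional sanity check: query the retriever directly to inspect which chunks
# FAISS returns before involving the LLM (the query string is just an example)
docs = retriever.get_relevant_documents('What should I do with the windows?')
for doc in docs:
    print(doc.page_content)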
# Load Llama via ctransformers (runs the quantized GGML model locally)
llm = CTransformers(
    model="llama-2-7b-chat.ggmlv3.q4_0.bin",
    model_type="llama",
    config={'max_new_tokens': 500, 'temperature': 0.1, 'context_length': 2048})
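# Optional smoke test: prompt the model directly before building the chain,
# just to confirm it loads and generates
print(llm('What is semantic search? Answer in one sentence.'))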
# Define the agent behaviour
template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say you don't know; don't try to make up an answer.
Context: {context}
Question: {question}
Only return the helpful answer below and nothing else.
Helpful answer:
"""
# Create the agent; 'stuff' simply stuffs the retrieved chunks into the prompt's context
qa_prompt = PromptTemplate(template=template, input_variables=['context', 'question'])
chain = RetrievalQA.from_chain_type(llm=llm,
                                    chain_type='stuff',
                                    retriever=retriever,
                                    return_source_documents=True,
                                    chain_type_kwargs={'prompt': qa_prompt})
# Example question
question = "How should windows be left before leaving the flat?"
result = chain({'query': question})
print(result['result'])
# response: Windows should be cleaned prior to departure.
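# Since return_source_documents=True, the result also carries the chunks the
# answer was based on; printing them helps verify the answer is grounded in notes.txt
for doc in result['source_documents']:
    print(doc.page_content)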