A local Q&A engine using Llama and FAISS
FAISS implements efficient similarity search over vectors. Sentence Transformers encodes sentences into vectors that FAISS can index. A local Llama model can then answer questions using the most relevant chunks retrieved from a FAISS vector store. By combining all of the above we can build a local, GenAI-powered assistant that answers semantic queries over a local corpus of documents. Let's see how.
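To see what the first two pieces do under the hood, here is a minimal sketch independent of LangChain: encode a few sentences with Sentence Transformers, index the vectors in FAISS, and run a nearest-neighbour search. The sentences and the query are made up purely for illustration.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
sentences = ['Water the plants twice a week.',
             'Leave the keys with the neighbour.']
vectors = model.encode(sentences)  # one 384-dimensional vector per sentence

index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2-distance index
index.add(vectors)

query = model.encode(['What do I do with the keys?'])
distances, ids = index.search(query, 1)  # 1 nearest neighbour per query
print(sentences[ids[0][0]])  # -> 'Leave the keys with the neighbour.'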
For this example I'm using llama-2-7b-chat.ggmlv3.q4_0.bin and a document (data/notes.txt) I'd like my agent to answer questions about. I'm also using faiss-gpu, as I have a compatible graphics card, but the CPU version (faiss-cpu) works too.
# conda install faiss-gpu
# pip install llama-index langchain ctransformers sentence-transformers unstructured
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import CTransformers
from langchain_core.vectorstores import VectorStoreRetriever
from langchain import PromptTemplate
from langchain.chains import RetrievalQA
# Encode the document into FAISS
loader = UnstructuredFileLoader('data/notes.txt')
documents = loader.load()
# Split the document into overlapping chunks small enough for the context window
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(documents)
# Embed each chunk and index the resulting vectors in FAISS
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vector_store = FAISS.from_documents(text_chunks, embeddings)
# Retrieve the 2 chunks most similar to each query
retriever = VectorStoreRetriever(vectorstore=vector_store, search_kwargs={'k': 2})
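# Optional sanity check: query the retriever directly to inspect which chunks
# FAISS returns before involving the LLM (the query string is just an example)
docs = retriever.get_relevant_documents('What should I do with the windows?')
for doc in docs:
    print(doc.page_content)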
# Load Llama via ctransformers (runs the quantized GGML model locally)
llm = CTransformers(
    model="llama-2-7b-chat.ggmlv3.q4_0.bin",
    model_type="llama",
    config={'max_new_tokens': 500, 'temperature': 0.1, 'context_length': 2048})
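# Optional smoke test: prompt the model directly before building the chain,
# just to confirm it loads and generates
print(llm('What is semantic search? Answer in one sentence.'))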
# Define the agent behaviour
template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say you don't know; don't try to make up an answer.
Context: {context}
Question: {question}
Only return the helpful answer below and nothing else.
Helpful answer:
"""
# Create the agent; 'stuff' simply stuffs the retrieved chunks into the prompt's context
qa_prompt = PromptTemplate(template=template, input_variables=['context', 'question'])
chain = RetrievalQA.from_chain_type(llm=llm,
                                    chain_type='stuff',
                                    retriever=retriever,
                                    return_source_documents=True,
                                    chain_type_kwargs={'prompt': qa_prompt})
# Example question
question = "How should windows be left before leaving the flat?"
result = chain({'query': question})
print(result['result'])
# response: Windows should be cleaned prior to departure.
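# Since return_source_documents=True, the result also carries the chunks the
# answer was based on; printing them helps verify the answer is grounded in notes.txt
for doc in result['source_documents']:
    print(doc.page_content)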