rag_skeleton package

Submodules

rag_skeleton.data_processing module

class rag_skeleton.data_processing.DataProcessor(vectordb_path, data_path='data/raw', embedding_model='Alibaba-NLP/gte-large-en-v1.5')

Bases: object

Handles loading, processing, and creating vector databases for documents.

create_vector_db(docs)

Creates and stores the vector database in ChromaDB.

Parameters:

docs: list, document chunks to vectorize and store.

load_documents(enrich_metadata=False)

Loads PDF documents from the specified data path and optionally enriches metadata.

Parameters:

enrich_metadata (bool): If True, add metadata to each document (e.g., name and year).

Returns:

list: List of loaded documents with optional metadata.

process_and_create_db()

Main entry point: loads the documents, splits them into chunks, and creates the vector database.

split_documents(docs, chunk_size=1500, chunk_overlap=100)

Splits documents into chunks for vectorization.

Parameters:

docs: list, documents to split.
chunk_size: int, size of each chunk. Default is 1500.
chunk_overlap: int, overlap between chunks. Default is 100.

Returns:

list: List of document chunks.
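
The interaction between chunk_size and chunk_overlap can be illustrated with a minimal sketch. It assumes a plain character-based sliding window; the helper name split_text is hypothetical and not part of rag_skeleton, and the package's own splitter may well break on separators rather than at fixed offsets.

```python
def split_text(text, chunk_size=1500, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; not part of rag_skeleton.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk after the first repeats the last chunk_overlap characters of its predecessor, so text that straddles a boundary remains visible in at least one chunk.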

rag_skeleton.generation module

class rag_skeleton.generation.TextGenerator(model_name='meta-llama/Llama-3.2-3B-Instruct', device=None, load_mode='local', api_token=None)

Bases: object

Generates text responses using a specified LLM model.

load_model()

Loads the LLM model and tokenizer and sets up the text generation pipeline.

If load_mode is set to “api”, it uses the Hugging Face API to load the model and requires an API token. If load_mode is “local”, it loads the model and tokenizer locally from the Hugging Face repository.

Raises: ValueError – If load_mode is “api” and api_token is not provided.

Configurations:

For both “api” and “local” modes, specific parameters such as temperature, do_sample, repetition_penalty, and max_new_tokens are set to control text generation behavior.

For the local model, the eos_token_id parameter is set to stop generation at specified tokens, ensuring response clarity.
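
The documented ValueError condition can be sketched as a small validation step. The function name check_load_mode is hypothetical; only the rule that “api” requires a token comes from the description above, and rejecting unknown modes is an added assumption.

```python
def check_load_mode(load_mode="local", api_token=None):
    """Validate the load_mode / api_token combination.

    Hypothetical helper; not the package's actual code.
    """
    if load_mode not in ("local", "api"):
        # Assumption: the docs do not say how unknown modes are handled.
        raise ValueError(f"unknown load_mode: {load_mode!r}")
    if load_mode == "api" and not api_token:
        # Documented rule: API mode needs a Hugging Face token.
        raise ValueError("api_token is required when load_mode is 'api'")
    return load_mode


print(check_load_mode("local"))
print(check_load_mode("api", api_token="hf_example_token"))
```

Validating before any model download means a missing token fails fast rather than partway through loading.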

rag_skeleton.rag module

class rag_skeleton.rag.RAGPipeline(vectordb_path='vectordb', embedding_model_name='Alibaba-NLP/gte-large-en-v1.5', model_name='meta-llama/Llama-3.2-3B-Instruct', load_mode='local', api_token=None, use_history=True, max_history=4)

Bases: object

Combines document retrieval and text generation to create a Retrieval-Augmented Generation (RAG) pipeline with conversation history.

format_docs_with_history(docs)

Formats retrieved documents and conversation history for the prompt context.

Parameters:

docs: list of documents retrieved for the query.

Returns:

tuple: (str, list) - formatted document text with history, and list of sources.
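
A rough sketch of the (str, list) return shape follows. It is not the package's implementation: the function name format_with_history, the dict-based document layout with "page_content" and "metadata" keys, the "source" metadata key, and the reading of max_history=4 as "keep the four most recent turns" are all assumptions made for illustration.

```python
from collections import deque


def format_with_history(docs, history):
    """Return (context_text, sources) for a prompt.

    Hypothetical stand-in: docs are dicts with "page_content" and "metadata"
    keys, and history is an iterable of (question, answer) pairs.
    """
    sources = []
    for doc in docs:
        source = doc.get("metadata", {}).get("source", "unknown")
        if source not in sources:  # keep each source once, in retrieval order
            sources.append(source)

    parts = []
    if history:
        turns = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
        parts.append("Conversation history:\n" + turns)
    parts.append("Context:\n" + "\n\n".join(doc["page_content"] for doc in docs))
    return "\n\n".join(parts), sources


# Assumption: max_history=4 means only the four most recent turns are kept.
history = deque(maxlen=4)
for i in range(6):
    history.append((f"question {i}", f"answer {i}"))

docs = [
    {"page_content": "Chunk A", "metadata": {"source": "a.pdf"}},
    {"page_content": "Chunk B", "metadata": {"source": "a.pdf"}},
    {"page_content": "Chunk C", "metadata": {"source": "b.pdf"}},
]
text, sources = format_with_history(docs, history)
print(sources)
```

A bounded deque discards the oldest turn automatically, which keeps the prompt length from growing without limit over a long conversation.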

get_response(question)

Fetches a response to the user’s query using the RAG pipeline, including conversation history.

Parameters:

question: str, the question to be answered.

Returns:

str: The generated response with sources for reference.

preview_prompt(question)

Returns the prompt that will be passed to the LLM without invoking the generation step.

Parameters:

question: str, the question to be previewed.

Returns:

str: The formatted prompt with history and context.

setup_pipeline()

Sets up the full RAG pipeline with retrieval, prompt formatting, and text generation.

This method configures the RAG pipeline by:

Defining the prompt template that guides the language model on how to respond to user queries.

Initializing the PromptTemplate with the defined format to structure the question, context, and conversation history.

Setting up the rag_chain pipeline, which includes:

Document retrieval: Retrieves relevant documents based on the user’s query.

Context formatting: Incorporates retrieved documents and conversation history.

Language model invocation: Passes the formatted prompt to the language model for generating a response.

Output parsing: Structures the final output format for the response.

Raises: ValueError – If any pipeline component is misconfigured.

The final rag_chain processes queries end-to-end, combining retrieval and generation.
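
The four stages listed above can be sketched as plain function composition. This is a simplified illustration under stated assumptions, not the package's rag_chain: build_chain, stub_llm, and the template text are hypothetical, and the retriever and model are replaced by stubs so the example runs without any models.

```python
def build_chain(retrieve, format_context, template, llm, parse):
    """Compose retrieval, formatting, generation, and parsing into one callable."""
    def chain(question):
        docs = retrieve(question)        # 1. document retrieval
        context = format_context(docs)   # 2. context formatting
        prompt = template.format(context=context, question=question)
        raw = llm(prompt)                # 3. language model invocation
        return parse(raw)                # 4. output parsing
    return chain


prompts_seen = []


def stub_llm(prompt):
    """Stand-in for a real model: records the prompt and returns a fixed string."""
    prompts_seen.append(prompt)
    return "  stub answer  "


chain = build_chain(
    retrieve=lambda question: ["Paris is the capital of France."],
    format_context=lambda docs: "\n".join(docs),
    template="Context:\n{context}\n\nQuestion: {question}\nAnswer:",
    llm=stub_llm,
    parse=str.strip,
)
answer = chain("What is the capital of France?")
print(answer)
```

Because every stage is an ordinary callable, the prompt can be inspected (as preview_prompt does) by running the first two stages and the template without calling the model.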

rag_skeleton.retrieval module

class rag_skeleton.retrieval.DocumentRetriever(vectordb_path='vectordb', embedding_model_name='Alibaba-NLP/gte-large-en-v1.5')

Bases: object

Retrieves documents from the ChromaDB vector database using an embedding model.

get_retriever(search_type='similarity', search_kwargs={'k': 5})

Returns a retriever instance for retrieving similar documents.

Parameters:

search_type: str, type of search (Can be “similarity”, “mmr”, or “similarity_score_threshold”). Default is “similarity”.
search_kwargs: dict, additional search parameters. Default is {'k': 5}, which returns the five most similar documents.

Returns:

retriever: a retriever instance for document retrieval.
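
To make the role of k concrete, here is a minimal sketch of what a “similarity” search does conceptually: rank stored vectors by cosine similarity to the query and keep the top k. The functions cosine and top_k and the toy two-dimensional vectors are hypothetical; the real retriever delegates this work to ChromaDB and the embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=5):
    """Return the texts of the k entries most similar to query_vec.

    indexed is a list of (text, vector) pairs; illustrative only.
    """
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


indexed = [
    ("cats", [1.0, 0.0]),
    ("dogs", [0.9, 0.1]),
    ("stocks", [0.0, 1.0]),
]
print(top_k([1.0, 0.0], indexed, k=2))
```

A larger k gives the language model more context at the cost of a longer prompt and a higher chance of including marginally relevant chunks.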

rag_skeleton.run module

rag_skeleton.run.main(data_path=None, load_mode='local', model_name='meta-llama/Llama-3.2-3B-Instruct', api_token=None, vectordb_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ragskeleton/checkouts/latest/src/rag_skeleton/data/vectordb'))

Initializes the RAG pipeline, ensuring the vector database is available or created.

Parameters:

data_path (str): Optional path to a directory of PDF files to process and build a new vector database.
load_mode (str): Specifies whether to load the model locally or via Hugging Face API. Options are ‘local’ (default) or ‘api’.
model_name (str): The name of the language model to use for generation. Default is ‘meta-llama/Llama-3.2-3B-Instruct’.
api_token (str, optional): Hugging Face API token, required if load_mode is set to ‘api’.
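vectordb_path (Path, optional): Path to store the vector database. Defaults to data/vectordb inside the package directory; the absolute path shown in the signature comes from the documentation build environment.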
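
The "available or created" check can be sketched as follows. This is an assumption-laden illustration, not the package's logic: ensure_vectordb and fake_build are hypothetical names, and raising FileNotFoundError when neither a database nor a data_path exists is a guess at reasonable behavior.

```python
import tempfile
from pathlib import Path


def ensure_vectordb(vectordb_path, data_path=None, build=None):
    """Build a new database when data_path is given, otherwise reuse an existing one.

    Hypothetical helper; build is a callable taking (data_path, vectordb_path).
    """
    vectordb_path = Path(vectordb_path)
    if data_path is not None:
        build(data_path, vectordb_path)
        return "built"
    if vectordb_path.is_dir() and any(vectordb_path.iterdir()):
        return "reused"
    # Assumption: with nothing to load and nothing to build from, fail loudly.
    raise FileNotFoundError(f"no vector database at {vectordb_path}")


def fake_build(data_path, vectordb_path):
    """Stand-in for the real processing step: writes a placeholder file."""
    vectordb_path.mkdir(parents=True, exist_ok=True)
    (vectordb_path / "index.bin").write_text("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "vectordb"
    first = ensure_vectordb(db, data_path="pdfs", build=fake_build)
    second = ensure_vectordb(db)
    print(first, second)
```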
