rag_skeleton package

Submodules

rag_skeleton.data_processing module

class rag_skeleton.data_processing.DataProcessor(vectordb_path, data_path='data/raw', embedding_model='Alibaba-NLP/gte-large-en-v1.5')

Bases: object

Handles loading, processing, and creating vector databases for documents.

create_vector_db(docs)

Creates and stores the vector database in ChromaDB.

Parameters:

  • docs: list, document chunks to vectorize and store.

load_documents(enrich_metadata=False)

Loads PDF documents from the specified data path and optionally enriches metadata.

Parameters:

  • enrich_metadata (bool): If True, add metadata to each document (e.g., name and year).

Returns:

  • list: List of loaded documents with optional metadata.

process_and_create_db()

Main entry point: loads the documents, splits them into chunks, and creates the vector database.

split_documents(docs, chunk_size=1500, chunk_overlap=100)

Splits documents into chunks for vectorization.

Parameters:

  • docs: list, documents to split.

  • chunk_size: int, size of each chunk. Default is 1500.

  • chunk_overlap: int, overlap between chunks. Default is 100.

Returns:

  • list: List of document chunks.
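
To illustrate how chunk_size and chunk_overlap interact, here is a minimal sketch of a sliding-window splitter over plain strings. It assumes sizes are measured in characters and uses a hypothetical helper, split_text; it is not the package's implementation, which may split on separators and operates on document objects rather than raw strings.

```python
def split_text(text, chunk_size=1500, chunk_overlap=100):
    """Hypothetical helper: fixed-size character windows with overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 4000
chunks = split_text(sample)
print([len(c) for c in chunks])  # [1500, 1500, 1200]
```

With the defaults, each window starts 1400 characters after the previous one, so consecutive chunks share 100 characters.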

rag_skeleton.generation module

class rag_skeleton.generation.TextGenerator(model_name='meta-llama/Llama-3.2-3B-Instruct', device=None, load_mode='local', api_token=None)

Bases: object

Generates text responses using a specified LLM model.

load_model()

Loads the LLM model and tokenizer and sets up the text generation pipeline.

If load_mode is set to “api”, it uses the Hugging Face API to load the model and requires an API token. If load_mode is “local”, it loads the model and tokenizer locally from the Hugging Face repository.

Raises:

ValueError – If load_mode is “api” and api_token is not provided.

Configurations:

  • For both “api” and “local” modes, specific parameters such as temperature, do_sample, repetition_penalty, and max_new_tokens are set to control text generation behavior.

  • For the local model, the eos_token_id parameter is set so that generation stops at the specified end-of-sequence tokens, which keeps responses from running past their natural end.
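
The documented ValueError can be pictured as a pre-flight check on the constructor arguments. The sketch below is an assumption about the shape of that check, written as a hypothetical standalone function, check_load_mode; the token string is a placeholder, and the package's actual validation lives inside load_model().

```python
def check_load_mode(load_mode="local", api_token=None):
    """Hypothetical pre-flight check mirroring the documented ValueError."""
    if load_mode == "api" and not api_token:
        raise ValueError("api_token is required when load_mode is 'api'")
    return load_mode


print(check_load_mode("local"))                         # local
print(check_load_mode("api", api_token="hf_example"))  # api (placeholder token)
```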

rag_skeleton.rag module

class rag_skeleton.rag.RAGPipeline(vectordb_path='vectordb', embedding_model_name='Alibaba-NLP/gte-large-en-v1.5', model_name='meta-llama/Llama-3.2-3B-Instruct', load_mode='local', api_token=None, use_history=True, max_history=4)

Bases: object

Combines document retrieval and text generation to create a Retrieval-Augmented Generation (RAG) pipeline with conversation history.

format_docs_with_history(docs)

Formats retrieved documents and conversation history for the prompt context.

Parameters:

  • docs: list of documents retrieved for the query.

Returns:

  • tuple: (str, list) - formatted document text with history, and list of sources.
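
As a rough picture of the (str, list) return value, the sketch below formats a bounded conversation history together with retrieved chunks. Several things here are assumptions rather than the package's behavior: the Doc stand-in class, the "source" metadata key, the text layout, the extra history argument, and the reading of max_history as a number of question/answer turns.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (assumed shape)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_history(docs, history):
    """Return (context_text, sources) for the prompt; illustrative only."""
    history_text = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    docs_text = "\n\n".join(d.page_content for d in docs)
    # Collect each source once, preserving retrieval order.
    sources = list(dict.fromkeys(d.metadata.get("source", "unknown") for d in docs))
    context = f"History:\n{history_text}\n\nDocuments:\n{docs_text}"
    return context, sources


history = deque(maxlen=4)  # mirrors max_history=4: the oldest turns drop off
for i in range(6):
    history.append((f"question {i}", f"answer {i}"))

docs = [Doc("Chunk A", {"source": "a.pdf"}), Doc("Chunk B", {"source": "a.pdf"})]
context, sources = format_docs_with_history(docs, history)
print(sources)  # ['a.pdf']
```

A bounded deque keeps the prompt from growing without limit as the conversation continues.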

get_response(question)

Fetches a response to the user’s query using the RAG pipeline, including conversation history.

Parameters:

  • question: str, the question to be answered.

Returns:

  • str: The generated response with sources for reference.

preview_prompt(question)

Returns the prompt that will be passed to the LLM without invoking the generation step.

Parameters:

  • question: str, the question to be previewed.

Returns:

  • str: The formatted prompt with history and context.

setup_pipeline()

Sets up the full RAG pipeline with retrieval, prompt formatting, and text generation.

This method configures the RAG pipeline by:

  • Defining the prompt template that guides the language model on how to respond to user queries.

  • Initializing the PromptTemplate with the defined format to structure the question, context, and conversation history.

  • Setting up the rag_chain pipeline, which includes:

    • Document retrieval: Retrieves relevant documents based on the user’s query.

    • Context formatting: Incorporates retrieved documents and conversation history.

    • Language model invocation: Passes the formatted prompt to the language model for generating a response.

    • Output parsing: Structures the final output format for the response.

Raises:

ValueError – If any pipeline component is misconfigured.

The final rag_chain processes queries end-to-end, combining retrieval and generation.
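
The four stages listed above can be sketched as plain function composition. This is an illustration under stated assumptions, not the package's code: build_chain, TEMPLATE, and the stub stages are hypothetical, and the real rag_chain is built from the retriever, PromptTemplate, and language model described in this section.

```python
def build_chain(retrieve, format_context, llm, parse):
    """Compose the four documented stages into one callable (illustrative)."""
    stages = [("retrieve", retrieve), ("format_context", format_context),
              ("llm", llm), ("parse", parse)]
    for name, stage in stages:
        if not callable(stage):
            raise ValueError(f"pipeline component {name!r} is misconfigured")

    def chain(question):
        docs = retrieve(question)                # document retrieval
        prompt = format_context(question, docs)  # context formatting
        raw = llm(prompt)                        # language model invocation
        return parse(raw)                        # output parsing

    return chain


TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

rag_chain = build_chain(
    retrieve=lambda q: ["Paris is the capital of France."],
    format_context=lambda q, docs: TEMPLATE.format(context="\n".join(docs), question=q),
    llm=lambda prompt: "  Paris.  ",  # stub standing in for the language model
    parse=str.strip,
)
print(rag_chain("What is the capital of France?"))  # Paris.
```

Keeping each stage a separate callable makes it possible to inspect the formatted prompt without calling the model, which is the role preview_prompt plays.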

rag_skeleton.retrieval module

class rag_skeleton.retrieval.DocumentRetriever(vectordb_path='vectordb', embedding_model_name='Alibaba-NLP/gte-large-en-v1.5')

Bases: object

Retrieves documents from the ChromaDB vector database using an embedding model.

get_retriever(search_type='similarity', search_kwargs={'k': 5})

Returns a retriever instance for retrieving similar documents.

Parameters:

  • search_type: str, type of search: “similarity”, “mmr”, or “similarity_score_threshold”. Default is “similarity”.

  • search_kwargs: dict, additional search parameters. Default is {'k': 5}.

Returns:

  • retriever: a retriever instance for document retrieval.
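
To show what a “similarity” search with k results means, the sketch below ranks a few hand-written vectors by cosine similarity. It is a self-contained toy: the helpers cosine and top_k and the two-dimensional vectors are hypothetical, whereas the actual retriever delegates to ChromaDB with embeddings produced by the configured model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=5):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


indexed = [
    ("about cats", [1.0, 0.0]),
    ("about dogs", [0.0, 1.0]),
    ("about pets", [0.7, 0.7]),
]
print(top_k([1.0, 0.1], indexed, k=2))  # ['about cats', 'about pets']
```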

rag_skeleton.run module

rag_skeleton.run.main(data_path=None, load_mode='local', model_name='meta-llama/Llama-3.2-3B-Instruct', api_token=None, vectordb_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/ragskeleton/checkouts/latest/src/rag_skeleton/data/vectordb'))

Initializes the RAG pipeline, ensuring the vector database is available or created.

Parameters:

  • data_path (str): Optional path to a directory of PDF files to process and build a new vector database.

  • load_mode (str): Specifies whether to load the model locally or via Hugging Face API. Options are ‘local’ (default) or ‘api’.

  • model_name (str): The name of the language model to use for generation. Default is ‘meta-llama/Llama-3.2-3B-Instruct’.

  • api_token (str, optional): Hugging Face API token, required if load_mode is set to ‘api’.

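  • vectordb_path (Path, optional): Path to store the vector database. Defaults to data/vectordb inside the installed package directory; the absolute path shown in the signature comes from the documentation build environment.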
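
One way to read "ensuring the vector database is available or created" is a reuse-or-build check on vectordb_path. The sketch below is only an assumption about that logic: ensure_vectordb and the build callback are hypothetical, and raising FileNotFoundError when neither a database nor data_path exists is a choice made for this example, not documented behavior.

```python
import tempfile
from pathlib import Path


def ensure_vectordb(vectordb_path, data_path=None, build=None):
    """Hypothetical helper: reuse an existing database or build a new one."""
    vectordb_path = Path(vectordb_path)
    exists = vectordb_path.is_dir() and any(vectordb_path.iterdir())
    if data_path is None and exists:
        return "reused"
    if data_path is None:
        raise FileNotFoundError("no vector database found and no data_path given")
    vectordb_path.mkdir(parents=True, exist_ok=True)
    build(Path(data_path), vectordb_path)  # e.g. load, split, and embed the PDFs
    return "built"


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "vectordb"
    # Stub build step: writes an empty file where a real index would go.
    fake_build = lambda src, dst: (dst / "index.bin").write_bytes(b"")
    first = ensure_vectordb(db, data_path=tmp, build=fake_build)
    second = ensure_vectordb(db)
print(first, second)  # built reused
```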

Module contents