RAGSkeleton

Figure: Schematic of a Retrieval-Augmented Generation (RAG) system. Adapted from Elevating Your Retrieval Game: Insights from Real-world Deployments.
RAGSkeleton: A foundational, modular framework for building customizable Retrieval-Augmented Generation (RAG) systems across any domain.
This project was originally developed as a Retrieval-Augmented Generation (RAG) system for materials science literature. It can be adapted to other domains with minimal modifications: users can adjust the system prompt to tailor responses to their field, which makes the setup a flexible skeleton for a RAG system.
With this skeleton, users can swap out individual components (such as the embedding model, vector database, or LLM for response generation) for their preferred options and quickly build a RAG system suited to their data and needs. This reduces the time and effort required to create a custom RAG system.
Installation
To set up the necessary environment:
Create the Conda Environment: Use the provided environment.yml file to create a new conda environment with all necessary dependencies:
conda env create -f environment.yml
Activate the Environment:
conda activate rag_skeleton
NOTE: The conda environment will have RAGSkeleton installed in editable mode. Some changes, e.g., in setup.cfg, might require you to run pip install -e . again.
Optional Setup for Git Hooks:
Install pre-commit git hooks with:
pre-commit install
This configuration can be modified in .pre-commit-config.yaml.
Install nbstripout to automatically remove notebook output cells in committed files:
nbstripout --install --attributes notebooks/.gitattributes
Setting Up the RAG System
1. Prepare Documents
Place your raw PDF files in the data/raw directory. Ensure the folder contains the files you want to process and query.
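Before processing, it can help to confirm that the folder actually contains the PDFs you expect. The helper below is a minimal sketch for that check; `list_raw_pdfs` is a hypothetical name and is not part of the RAGSkeleton package.

```python
from pathlib import Path


def list_raw_pdfs(raw_dir="data/raw"):
    """Return the PDF files found directly inside raw_dir, sorted by path.

    Hypothetical helper for illustration; not part of RAGSkeleton.
    """
    folder = Path(raw_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Directory not found: {folder}")
    # Compare the extension case-insensitively so "paper.PDF" is not missed.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `list_raw_pdfs()` from the project root returns the files that would be candidates for processing, and raises an error if data/raw does not exist.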
2. Process and Index Documents
To process and index your documents, navigate to the project root directory and open a Python interactive session by typing:
python
Then, run the following code to load, process, and create the vector database:
from src.rag_skeleton.data_processing import DataProcessor
# Initialize and run data processing
process_data = DataProcessor()
process_data.process_and_create_db()
After running these commands, exit the Python session by typing:
exit()
This step will:
Load the PDF files.
Parse and split them into chunks of text.
Create an indexable vector database for efficient document retrieval.
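To make the split-and-index steps concrete, here is a toy sketch using only the standard library. It is not how `DataProcessor` is implemented (its internals are not shown here): the chunk size, the overlap, the bag-of-words `embed` function, and the `ToyVectorStore` class are all assumptions chosen for illustration, standing in for a real text splitter, embedding model, and vector database.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """In-memory index; embed_fn is injectable so the embedding can be swapped."""

    def __init__(self, embed_fn=embed):
        self.embed_fn = embed_fn
        self.items = []  # list of (chunk, vector) pairs

    def add(self, chunk):
        self.items.append((chunk, self.embed_fn(chunk)))

    def search(self, query, k=2):
        """Return the k stored chunks most similar to the query."""
        q = self.embed_fn(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

Passing the embedding function in as a parameter mirrors the skeleton's idea that components are replaceable: a different `embed_fn` changes how chunks are compared without touching the indexing or search code.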
Default Components:
The default vector store used is Chroma, an open-source vector database optimized for speed and flexibility.
The default embedding model is "Alibaba-NLP/gte-large-en-v1.5", chosen for its smaller size and good Retrieval Average on the Massive Text Embedding Benchmark (MTEB) Leaderboard at the time of this project.
Note: The MTEB leaderboard is frequently updated with new and improved models. While "Alibaba-NLP/gte-large-en-v1.5" was a top choice during development, you may want to explore updated models on the MTEB leaderboard to identify the best current model for your specific needs.
Running the RAG System
You can run the RAG system in two modes: local and API. The default mode is local.
Option 1: Local Mode
In local mode, the model is loaded from your local machine. This option is suitable if you have the necessary computational resources.
Run the Chatbot Locally
Before running locally, ensure you have access to the Hugging Face model. You’ll need to log in to Hugging Face and download the model the first time you use it. Run the following command to log in to Hugging Face from your terminal:
huggingface-cli login
This will prompt you for your Hugging Face access token. After logging in, you can proceed to load the model locally.
python src/rag_skeleton/run.py --load_mode local --model_name "your_model_name"
--load_mode: Set to local to load the model locally (default is local).
--model_name: Specify the model name you want to use. The default is "meta-llama/Llama-3.2-3B-Instruct".
Example:
python src/rag_skeleton/run.py --load_mode local --model_name "meta-llama/Llama-3.2-3B-Instruct"
Option 2: API Mode
In API mode, the model is accessed via the Hugging Face API. This is useful if you prefer not to host the model locally or lack the computational resources.
With the API, you can opt for larger models that might otherwise be challenging to run locally. However, keep in mind that the Hugging Face Free API has a model size limit of 10GB. If you need to use larger models, consider a paid API plan or explore model optimization techniques.
Run the Chatbot via Hugging Face API
python src/rag_skeleton/run.py --load_mode api --model_name "your_model_name" --api_token YOUR_API_TOKEN
--load_mode: Set to api to use the Hugging Face API.
--model_name: Specify the model name available on Hugging Face Hub.
--api_token: Your Hugging Face API token (required when using api mode).
Example:
python src/rag_skeleton/run.py --load_mode api --model_name "meta-llama/Meta-Llama-3-8B-Instruct" --api_token YOUR_API_TOKEN
Note: You must have a Hugging Face account and an API token to use the API mode. You can obtain an API token by signing up at Hugging Face and navigating to your account settings.
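The three flags described for the two modes could be parsed roughly as follows. This is a sketch of one way to express the documented behavior with `argparse`, not the actual contents of src/rag_skeleton/run.py; the function names `build_parser` and `parse_cli` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags (illustrative, not run.py itself)."""
    parser = argparse.ArgumentParser(description="Run the RAG chatbot.")
    parser.add_argument(
        "--load_mode",
        choices=["local", "api"],
        default="local",
        help="Load the model locally or access it via the Hugging Face API.",
    )
    parser.add_argument(
        "--model_name",
        default="meta-llama/Llama-3.2-3B-Instruct",
        help="Model name on the Hugging Face Hub.",
    )
    parser.add_argument(
        "--api_token",
        default=None,
        help="Hugging Face API token (required in api mode).",
    )
    return parser


def parse_cli(argv=None):
    """Parse arguments and enforce that api mode comes with a token."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.load_mode == "api" and not args.api_token:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--api_token is required when --load_mode is api")
    return args
```

Validating the token at parse time means a missing `--api_token` fails immediately with a usage message rather than later, when the first request is sent.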
Usage
When you run the RAG system, you’ll be greeted with:
Welcome to the RAG chatbot. Type 'exit' to quit.
Type your questions into the chatbot interface. The system will provide answers grounded in the content of your documents.
The response will include sources for the context used in generating the answer, helping you trace the information back to the original documents. These sources will be listed at the end of the response, labeled with document titles or other metadata for easy reference.
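As an illustration of how sources might be attached to an answer, the sketch below appends a numbered, de-duplicated source list to the answer text. The function name `format_response` and the exact layout are assumptions; the project's real output format may differ.

```python
def format_response(answer, sources):
    """Append a de-duplicated, numbered source list to an answer string.

    Illustrative only: the layout is an assumption, not RAGSkeleton's format.
    """
    unique = []
    for src in sources:
        if src not in unique:  # keep first occurrence, preserve order
            unique.append(src)
    if not unique:
        return answer
    lines = [answer, "", "Sources:"]
    lines += [f"  [{i}] {src}" for i, src in enumerate(unique, start=1)]
    return "\n".join(lines)
```

De-duplicating matters because several retrieved chunks often come from the same document, and listing that document once keeps the reference list short.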
To exit, simply type exit.
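The interaction described in this section amounts to a simple read-answer loop. The sketch below shows one possible shape for it; `chat_loop` and `answer_fn` are hypothetical names, and treating `exit` case-insensitively and ending the session on end-of-input are assumptions rather than documented behavior.

```python
def chat_loop(answer_fn, read=input, write=print):
    """Read questions until the user types 'exit', printing one answer each.

    answer_fn maps a question string to an answer string; read and write are
    injectable so the loop can be driven without a terminal.
    """
    write("Welcome to the RAG chatbot. Type 'exit' to quit.")
    while True:
        try:
            question = read("> ").strip()
        except EOFError:
            break  # assumption: treat end of input like 'exit'
        if question.lower() == "exit":
            break
        if not question:
            continue  # ignore empty lines
        write(answer_fn(question))
```

Injecting `read` and `write` keeps the loop independent of the terminal, so the same function can be exercised from a script with canned questions.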
Documentation
Full documentation is available on Read the Docs.
Dependency Management & Reproducibility
Always keep your abstract (unpinned) dependencies updated in environment.yml and eventually in setup.cfg if you want to ship and install your package via pip later on.
Create concrete dependencies as environment.lock.yml for the exact reproduction of your environment with:
conda env export -n rag_skeleton -f environment.lock.yml
For multi-OS development, consider using --no-builds during the export.
Update your current environment with respect to a new environment.lock.yml using:
conda env update -f environment.lock.yml --prune
Project Organization
├── AUTHORS.md <- List of developers and maintainers.
├── CHANGELOG.md <- Changelog to keep track of new features and fixes.
├── CONTRIBUTING.md <- Guidelines for contributing to this project.
├── Dockerfile <- Build a docker container with `docker build .`.
├── LICENSE.txt <- License as chosen on the command-line.
├── README.md <- The top-level README for developers.
├── configs <- Directory for configurations of model & application.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── docs <- Directory for Sphinx documentation in rst or md.
├── environment.yml <- The conda environment file for reproducibility.
├── models <- Trained and serialized models, model predictions,
│ or model summaries.
├── notebooks <- Jupyter notebooks. Naming convention is a number (for
│ ordering), the creator's initials and a description,
│ e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml <- Build configuration. Don't change! Use `pip install -e .`
│ to install for development or to build `tox -e build`.
├── references <- Data dictionaries, manuals, and all other materials.
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated plots and figures for reports.
├── scripts <- Analysis and production scripts which import the
│ actual PYTHON_PKG, e.g. train_model.
├── setup.cfg <- Declarative configuration of your project.
├── setup.py <- [DEPRECATED] Use `python setup.py develop` to install for
│ development or `python setup.py bdist_wheel` to build.
├── src
│ └── rag_skeleton <- Actual Python package where the main functionality goes.
├── tests <- Unit tests which can be run with `pytest`.
├── .coveragerc <- Configuration for coverage reports of unit tests.
├── .isort.cfg <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
Feedback
Any questions, comments, or suggestions are welcome! This project is a flexible foundation for RAG-based applications, and we’re open to improvements that can make it even more useful across various domains.
Note
This project has been set up using PyScaffold 4.6 and the dsproject extension 0.7.2.