RAG Frameworks | AI Builder Resource Map

Publication status

This page is generated from the seed database and is marked Needs verification. It is useful for local discovery, but final public ranking and indexing should wait for manual source review.

Start here

Move from problem framing to a shortlist by checking required skills first, then tools, repositories, MCP resources, and caveats.

Skills

15 mapped skills

Tools

19 candidate tools

Repos

20 GitHub records

Required and useful skills

API Literacy

Engineering

Understanding authentication, rate limits, request shapes, errors, and source attribution.

Required or useful skill for resources in this use case.

Skill | Needs verification | Engineering | Required
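As an illustration of handling rate limits and errors, here is a minimal sketch of retrying a call with exponential backoff. The `RateLimited` exception and the `call_with_backoff` helper are hypothetical names for this example, not part of any listed tool; a real client would raise its own error type for an HTTP 429 response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error that a client wrapper raises when it is rate limited."""


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with delays of base_delay * 2**attempt."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.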

Data Ingestion

Data

Loading, cleaning, chunking, and normalizing documents or structured data.

Required or useful skill for resources in this use case.

Skill | Needs verification | Data | Required
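For a sense of what cleaning and chunking involve, here is a minimal sketch that normalizes whitespace and splits text into overlapping word windows. The function names, the window size, and the overlap are assumptions chosen for illustration; the tools listed on this page use their own splitters.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to `size` words, sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    cleaned = normalize(text)
    words = cleaned.split(" ") if cleaned else []
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some duplicated text in the index.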

Data Privacy

Governance

Handling private code, prompts, logs, documents, and user data safely.

Skill linked from curated resource requirements.

Skill | Needs verification | Governance | Required

Deployment Basics

Ops

Shipping static sites, APIs, and background jobs with clear environment boundaries.

Required or useful skill for resources in this use case.

Skill | Needs verification | Ops | Recommended

Documentation Writing

Documentation

Writing concise setup, usage, troubleshooting, and reference documentation.

Skill linked from curated resource requirements.

Skill | Needs verification | Documentation | Recommended

Evaluation

Quality

Testing LLM quality, retrieval quality, task completion, and regression behavior.

Required or useful skill for resources in this use case.

Skill | Needs verification | Quality | Recommended
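One common way to measure retrieval quality is recall at k: the share of known-relevant documents that appear in the top k results. The sketch below assumes hand-labeled relevant document ids per query; the function names and the data shape are illustrative, not taken from any listed tool.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant ids found among the first k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)
```

Tracking the mean across a fixed query set over time gives a simple regression signal when the chunking, embedding model, or index changes.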

Frontend Development

Web

Building usable interfaces with component systems, routing, and responsive layouts.

Skill linked from curated resource requirements.

Skill | Needs verification | Web | Recommended

LLM Application Architecture

AI

Designing data flow, model calls, tools, memory, and evaluation boundaries.

Required or useful skill for resources in this use case.

Skill | Needs verification | AI | Recommended

Prompt Design

AI

Structuring instructions, context, and examples for reliable AI outputs.

Required or useful skill for resources in this use case.

Skill | Needs verification | AI | Required

Python

Programming

General Python programming for automation, data, AI, and backend scripts.

Required or useful skill for resources in this use case.

Skill | Needs verification | Programming | Required

Retrieval-Augmented Generation

AI

Connecting LLMs to external knowledge with retrieval, ranking, and grounded responses.

Required or useful skill for resources in this use case.

Skill | Needs verification | AI | Required
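To make retrieval, ranking, and grounding concrete, here is a toy sketch that scores documents against a query with bag-of-words cosine similarity, keeps the top results, and assembles a prompt that asks for cited answers. Word counts stand in for embeddings here, and all function names and the prompt wording are assumptions for illustration; production systems typically use an embedding model and a vector database such as those listed below.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k highest-scoring (doc_id, score) pairs for the query."""
    q = Counter(tokenize(query))
    scored = [(doc_id, cosine(q, Counter(tokenize(text)))) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, docs: dict[str, str], k: int = 2) -> str:
    """Assemble a grounded prompt that quotes the retrieved sources by id."""
    hits = retrieve(query, docs, k)
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id, _ in hits)
    return (
        "Answer using only the sources below and cite their ids.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

The prompt string would then be sent to whichever model the application uses; that call is left out because it depends on the provider.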

Self-Hosting Basics

Ops

Running open-source tools locally or on controlled infrastructure.

Required or useful skill for resources in this use case.

Skill | Needs verification | Ops | Optional

Recommended tools

AnythingLLM

RAG App Platforms

Self-hosted workspace and RAG interface for documents and LLM workflows.

Curated tool relationship for future one-stop directory pages.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

Chainlit

AI App Frameworks

Open-source Python framework for building conversational AI interfaces.

Curated tool relationship for future one-stop directory pages.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

Chroma

Vector Databases

Embedding database commonly used for local and application-level RAG prototypes.

Seeded from manual curation; metadata enriched where possible.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

Crawl4AI

Scraping Crawling Tools

Open-source crawler and scraper designed for LLM-friendly web extraction.

Curated tool relationship for future one-stop directory pages.

Tool | Needs verification | Open source | Self-hosted | GitHub linked

Dify

RAG App Platforms

Open-source platform for building LLM apps, agents, workflows, and RAG systems.

Curated tool relationship for future one-stop directory pages.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

Firecrawl

Scraping Crawling Tools

Tool and API for turning websites into LLM-ready markdown or structured data.

Curated tool relationship for future one-stop directory pages.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

Flowise

RAG App Platforms

Low-code visual builder for LLM apps, agent flows, and RAG workflows.

Curated tool relationship for future one-stop directory pages.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

Haystack

RAG Frameworks

Open-source framework for building production-style search, RAG, and NLP pipelines.

Seeded from manual curation; metadata enriched where possible.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

LangChain

AI Agent Frameworks

Framework ecosystem for building LLM applications, agents, and RAG workflows.

Seeded from manual curation; metadata enriched where possible.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

Langflow

RAG App Platforms

Visual framework for building LLM apps, agents, and RAG pipelines.

Curated tool relationship for future one-stop directory pages.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

LangGraph

AI Agent Frameworks

Framework for building stateful, controllable LLM agent workflows.

Curated tool relationship for future one-stop directory pages.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

LlamaIndex

RAG Frameworks

Framework focused on connecting private or domain data to LLM applications.

Seeded from manual curation; metadata enriched where possible.

Tool | Needs verification | Open source | Self-hosted | Freemium | GitHub linked

Repository candidates

ollama/ollama

Go

Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

Official GitHub repository metadata fetched where API allowed.

Repository | Auto enriched | Go | MIT | High stars | Active signal

langchain-ai/langchain

Python

The agent engineering platform.

Official GitHub repository metadata fetched where API allowed.

Repository | Auto enriched | Python | MIT | High stars | Active signal

run-llama/llama_index

Python

LlamaIndex is the leading document agent and OCR platform

Official GitHub repository metadata fetched where API allowed.

Repository | Auto enriched | Python | MIT | High stars | Active signal

qdrant/qdrant

Rust

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Official GitHub repository metadata fetched where API allowed.

Repository | Auto enriched | Rust | Apache-2.0 | Active signal

chroma-core/chroma

Rust

Search infrastructure for AI

Official GitHub repository metadata fetched where API allowed.

Repository | Auto enriched | Rust | Apache-2.0 | Active signal

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

Official GitHub repository metadata fetched where API allowed.

Repository | Auto enriched | MDX | Apache-2.0 | Active signal

milvus-io/milvus

Go

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Official GitHub repository metadata fetched where API allowed.

Repository | Auto enriched | Go | Apache-2.0 | Active signal

langgenius/dify

TypeScript

Production-ready platform for agentic workflow development.

High-star GitHub discovery seed from query `topic:rag stars:>1000`.

Repository | Auto enriched | TypeScript | High stars | Active signal
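For readers who want to reproduce a discovery query such as `topic:rag stars:>1000`, here is a minimal sketch that builds a request URL for the GitHub REST API repository search endpoint. How this page actually ran its query is not stated; the `sort` and `order` values and the helper name are assumptions for illustration, and the sketch only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# GitHub REST API endpoint for repository search.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(query: str, sort: str = "stars", order: str = "desc") -> str:
    """Return a repository search URL with the query percent-encoded."""
    params = {"q": query, "sort": sort, "order": order}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("topic:rag stars:>1000")
```

Unauthenticated search requests are rate limited, so a real crawler would add an access token header and handle paging.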

open-webui/open-webui

Python

User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

High-star GitHub discovery seed from query `topic:rag stars:>1000`.

Repository | Auto enriched | Python | High stars | Active signal

Shubhamsaboo/awesome-llm-apps

Python

100+ AI Agent & RAG apps you can actually run — clone, customize, ship.

High-star GitHub discovery seed from query `topic:rag stars:>1000`.

Repository | Auto enriched | Python | Apache-2.0 | High stars | Active signal

infiniflow/ragflow

Python

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

High-star GitHub discovery seed from query `topic:rag stars:>1000`.

Repository | Auto enriched | Python | Apache-2.0 | High stars | Active signal

PaddlePaddle/PaddleOCR

Python

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

High-star GitHub discovery seed from query `topic:rag stars:>1000`.

Repository | Auto enriched | Python | Apache-2.0 | High stars | Active signal

Caveats and failure modes

Records are useful for discovery but still need source, license, pricing, and summary review before public ranking.