FlagEmbedding
Retrieval and Retrieval-augmented LLMs
📊 Repository data
🔗 Related tools
txtai
Open source ⭐ 13k github.com/neuml/txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
RAG-Anything
Open source ⭐ 20k github.com/hkuds/rag-anything
"RAG-Anything: All-in-One RAG Framework"
langchain4j
Open source ⭐ 12k github.com/langchain4j/langchain4j
LangChain4j is an idiomatic, open-source Java library for building LLM-powered applications on the JVM. It offers a unified API over popular LLM providers and vector stores, and makes implementing too
Vearch
Free ⭐ 2.3k github.com/vearch/vearch
Distributed vector search for AI-native applications
MarkItDown
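Open source ⭐ 118k ↑+412 github.com/microsoft/markitdown
Microsoft's open-source, general-purpose file-to-Markdown converter. It supports dozens of formats, including PDF, Word, PPT, Excel, HTML, audio, and image OCR, and is designed for LLM and RAG data preprocessing. It is extensible through a plugin system and released under the MIT license.
🎯 Document format conversion, content preprocessing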
Firecrawl
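Free + paid ⭐ 113k ↑+173 github.com/firecrawl/firecrawl
AI-friendly web scraping API that converts URLs to Markdown or structured data, with 110K+ stars. Designed for LLM applications, it automatically handles JS rendering, pagination, and anti-scraping measures, making it an ideal data source for RAG systems.
🎯 RAG system data sources, AI training data collection, website content extraction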