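QAnything (Question and Answer based on Anything) is a local knowledge base question-answering system designed to support files of any format as well as databases. It can be installed and used offline.
You can drop in local files of any format and get accurate, fast, and reliable answers.
Currently supported formats: PDF, Word (doc/docx), PPT, Markdown, Eml, TXT, images (jpg, png, etc.), and web links. More formats are coming.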
Features
Architecture
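Two-stage retrieval has a clear advantage when the knowledge base is large. With only first-stage embedding retrieval, retrieval quality degrades as the amount of data grows, as the green line in the accompanying figure shows. With second-stage reranking added, accuracy increases steadily: the more data, the better the results.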
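To make the two-stage idea concrete, here is a minimal sketch in pure Python. It is not QAnything's implementation: `embed`, `cosine`, `rerank_score`, and `two_stage_search` are hypothetical names, the bag-of-words "embedding" and the term-overlap "reranker" are toy stand-ins for real models, and the passages are made up for illustration. Only the control flow (a cheap recall of `recall_k` candidates, then a more careful re-ordering of that short list) reflects the architecture described above.

```python
import math
from collections import Counter

# Made-up passages and query, for illustration only.
PASSAGES = [
    "run.sh starts the service on gpu 0 by default",
    "gpu gpu gpu driver install notes",
    "close.sh stops the service",
    "supported file formats include pdf and word",
]
QUERY = "start the service on gpu 0"


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, passage):
    """Toy stand-in for a reranker: share of distinct query terms found in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def two_stage_search(query, passages, recall_k=3, top_n=2):
    """Stage 1: recall recall_k candidates by vector similarity.
    Stage 2: re-order only those candidates with the pairwise scorer."""
    q_vec = embed(query)
    recalled = sorted(
        passages, key=lambda p: cosine(q_vec, embed(p)), reverse=True
    )[:recall_k]
    return sorted(
        recalled, key=lambda p: rerank_score(query, p), reverse=True
    )[:top_n]


if __name__ == "__main__":
    for passage in two_stage_search(QUERY, PASSAGES):
        print(passage)
```

The design point is that the second scorer sees the query and the passage together, so it can afford to be more expensive, because it only runs on the handful of candidates that survive the first stage.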
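BCEmbedding is a library of bilingual (Chinese-English) and cross-lingual semantic representation models developed by NetEase Youdao. It contains two kinds of base models: EmbeddingModel and RerankerModel. EmbeddingModel is dedicated to generating semantic vectors and plays a key role in semantic search and question answering, while RerankerModel refines semantic search results and re-orders them by semantic relevance.
BCEmbedding is the cornerstone of Youdao's retrieval-augmented generation (RAG) applications and plays an especially important role in QAnything. QAnything is an open-source NetEase Youdao project that is already used in many Youdao products, such as Youdao Speed Reading (有道速读) and Youdao Translate (有道翻译).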
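BCEmbedding, the retrieval component used by QAnything, has very strong bilingual and cross-lingual capabilities and removes the gap between Chinese and English in semantic retrieval.
EmbeddingModel supports Chinese and English (more languages will be supported later); RerankerModel supports Chinese, English, Japanese, and Korean.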
Model | Retrieval | STS | PairClassification | Classification | Reranking | Clustering | AVG |
---|---|---|---|---|---|---|---|
bge-base-en-v1.5 | 37.14 | 55.06 | 75.45 | 59.73 | 43.05 | 37.74 | 47.20 |
bge-base-zh-v1.5 | 47.60 | 63.72 | 77.40 | 63.38 | 54.85 | 32.56 | 53.60 |
bge-large-en-v1.5 | 37.15 | 54.09 | 75.00 | 59.24 | 42.68 | 37.32 | 46.82 |
bge-large-zh-v1.5 | 47.54 | 64.73 | 79.14 | 64.19 | 55.88 | 33.26 | 54.21 |
jina-embeddings-v2-base-en | 31.58 | 54.28 | 74.84 | 58.42 | 41.16 | 34.67 | 44.29 |
m3e-base | 46.29 | 63.93 | 71.84 | 64.08 | 52.38 | 37.84 | 53.54 |
m3e-large | 34.85 | 59.74 | 67.69 | 60.07 | 48.99 | 31.62 | 46.78 |
bce-embedding-base_v1 | 57.60 | 65.73 | 74.96 | 69.00 | 57.29 | 38.95 | 59.43 |
Model | Reranking | AVG |
---|---|---|
bge-reranker-base | 57.78 | 57.78 |
bge-reranker-large | 59.69 | 59.69 |
bce-reranker-base_v1 | 60.06 | 60.06 |
Language: en
Task Type: Reranking
Model | AskUbuntuDupQuestions | MindSmallReranking | SciDocsRR | StackOverflowDupQuestions | AVG |
---|---|---|---|---|---|
bge-reranker-base | 54.70 | 28.48 | 67.09 | 37.55 | 46.96 |
bge-reranker-large | 58.73 | 28.84 | 71.30 | 39.04 | 49.48 |
bce-reranker-base_v1 | 56.54 | 30.73 | 75.79 | 42.88 | 51.48 |
Summary on en
Model | Reranking | AVG |
---|---|---|
bge-reranker-base | 46.96 | 46.96 |
bge-reranker-large | 49.48 | 49.48 |
bce-reranker-base_v1 | 51.48 | 51.48 |
Language: zh
Task Type: Reranking
Model | T2Reranking | MMarcoReranking | CMedQAv1 | CMedQAv2 | AVG |
---|---|---|---|---|---|
bge-reranker-base | 67.28 | 35.46 | 81.27 | 84.10 | 67.03 |
bge-reranker-large | 67.60 | 37.64 | 82.14 | 84.18 | 67.89 |
bce-reranker-base_v1 | 70.25 | 34.13 | 79.64 | 81.31 | 66.33 |
Summary on zh
Model | Reranking | AVG |
---|---|---|
bge-reranker-base | 67.03 | 67.03 |
bge-reranker-large | 67.89 | 67.89 |
bce-reranker-base_v1 | 66.33 | 66.33 |
Language: en-zh
Task Type: Reranking
Model | T2RerankingEn2Zh | MMarcoRerankingEn2Zh | AVG |
---|---|---|---|
bge-reranker-base | 60.45 | 64.41 | 62.43 |
bge-reranker-large | 61.64 | 67.17 | 64.41 |
bce-reranker-base_v1 | 63.63 | 67.92 | 65.78 |
Summary on en-zh
Model | Reranking | AVG |
---|---|---|
bge-reranker-base | 62.43 | 62.43 |
bge-reranker-large | 64.41 | 64.41 |
bce-reranker-base_v1 | 65.78 | 65.78 |
Language: zh-en
Task Type: Reranking
Model | T2RerankingZh2En | MMarcoRerankingZh2En | AVG |
---|---|---|---|
bge-reranker-base | 63.94 | 63.79 | 63.87 |
bge-reranker-large | 64.13 | 67.89 | 66.01 |
bce-reranker-base_v1 | 65.38 | 67.23 | 66.31 |
Summary on zh-en
Model | Reranking | AVG |
---|---|---|
bge-reranker-base | 63.87 | 63.87 |
bge-reranker-large | 66.01 | 66.01 |
bce-reranker-base_v1 | 66.31 | 66.31 |
Summary on all langs: ['en', 'zh', 'en-zh', 'zh-en']
Model | Reranking (12) | AVG (12) |
---|---|---|
bge-reranker-base | 59.04 | 59.04 |
bge-reranker-large | 60.86 | 60.86 |
bce-reranker-base_v1 | 61.29 | 61.29 |
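The "(12)" in the last header appears to be the number of datasets: 4 for en, 4 for zh, 2 for en-zh, and 2 for zh-en. As a sanity check, the sketch below recomputes the overall average as the plain mean of the 12 per-dataset scores listed in the tables above. The names `SCORES` and `overall_average` are only illustrative. Averaging the already-rounded per-language averages can differ in the last digit, so the raw per-dataset scores are used.

```python
# Per-dataset reranking scores copied from the four per-language tables above,
# in the order: en (4 datasets), zh (4), en-zh (2), zh-en (2).
SCORES = {
    "bge-reranker-base": [
        54.70, 28.48, 67.09, 37.55,
        67.28, 35.46, 81.27, 84.10,
        60.45, 64.41,
        63.94, 63.79,
    ],
    "bge-reranker-large": [
        58.73, 28.84, 71.30, 39.04,
        67.60, 37.64, 82.14, 84.18,
        61.64, 67.17,
        64.13, 67.89,
    ],
    "bce-reranker-base_v1": [
        56.54, 30.73, 75.79, 42.88,
        70.25, 34.13, 79.64, 81.31,
        63.63, 67.92,
        65.38, 67.23,
    ],
}


def overall_average(scores):
    """Plain mean over all datasets, rounded to two decimals like the table."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: {overall_average(scores)} over {len(scores)} datasets")
```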
NOTE:
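The large language model in the open-source version of QAnything is based on Tongyi Qianwen (Qwen) and fine-tuned on a large number of professional question-answering datasets, which greatly strengthens its question-answering ability compared with the base Qwen model.
For commercial use, please follow Qwen's license. For details, see: Tongyi Qianwen (Qwen).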
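git clone https://github.com/netease-youdao/QAnything.git
cd QAnything
bash run.sh # starts on GPU 0 by default
cd QAnything
bash run.sh 0 # start on GPU 0; GPU numbering starts at 0. Windows machines usually have only one card, so only GPU 0 can be specified
cd QAnything
bash run.sh 0,1 # start on GPUs 0 and 1; make sure multiple GPUs are available. At most two cards are supported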
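The rules stated in the comments above (default GPU 0, numbering from 0, at most two cards) can be summarized as a small validator. This is only an illustrative sketch: `parse_gpu_ids` is a hypothetical helper, not part of QAnything, and it does not claim to mirror how `run.sh` parses its argument.

```python
from typing import List


def parse_gpu_ids(arg: str = "0", max_gpus: int = 2) -> List[int]:
    """Parse a comma-separated GPU list such as "0" or "0,1".

    Hypothetical helper: encodes the documented constraints only
    (non-negative ids, no duplicates, at most max_gpus cards).
    """
    ids = [int(part) for part in arg.split(",")]  # raises ValueError on junk
    if any(gpu_id < 0 for gpu_id in ids):
        raise ValueError("GPU ids start at 0")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate GPU id")
    if len(ids) > max_gpus:
        raise ValueError(f"at most {max_gpus} GPUs are supported")
    return ids


if __name__ == "__main__":
    print(parse_gpu_ids())       # no argument: default GPU
    print(parse_gpu_ids("0,1"))  # two cards
```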
Web front end: your_host:5052/qanything/
API: your_host:8777/api/
bash close.sh
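Once the service is up, it can be handy to check that the two ports mentioned above (5052 for the /qanything/ front end, 8777 for /api/) are reachable. The sketch below uses only the standard library. The `http://` scheme and the helper names `service_urls` and `is_port_open` are assumptions made for illustration, and no QAnything API endpoint is called.

```python
import socket

FRONTEND_PORT = 5052  # port of the /qanything/ front end, from the text above
API_PORT = 8777       # port of the /api/ service, from the text above


def service_urls(host):
    """Build both addresses for a host (the http scheme is an assumption)."""
    return {
        "frontend": f"http://{host}:{FRONTEND_PORT}/qanything/",
        "api": f"http://{host}:{API_PORT}/api/",
    }


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = "localhost"  # replace with your_host
    ports = {"frontend": FRONTEND_PORT, "api": API_PORT}
    for name, url in service_urls(host).items():
        state = "reachable" if is_port_open(host, ports[name]) else "not reachable"
        print(f"{name}: {url} ({state})")
```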
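# Check whether the script file is in DOS or Unix format: lines in a DOS-format file end with ^M$, lines in a Unix-format file end with $:
cat -A scripts/run_for_local.sh # verify the file format
sed -i "s/\r//" scripts/run_for_local.sh
sed -i "s/^M//" scripts/run_for_local.sh # ^M is the literal carriage-return character (typed as Ctrl+V, Ctrl+M), not a caret followed by M
cat -A scripts/run_for_local.sh # verify the file format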
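Where `sed` is not available, the same check and conversion can be sketched in Python. This is an illustrative alternative, not part of QAnything, and the function names are hypothetical. It replaces each CRLF pair with LF, which matches the effect of the `sed` commands above on files whose only carriage returns are at line ends.

```python
from pathlib import Path


def has_dos_line_endings(data: bytes) -> bool:
    """True if the content contains at least one CRLF line ending."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path) -> bool:
    """Convert a file in place; return True if it was changed."""
    file_path = Path(path)
    data = file_path.read_bytes()
    if not has_dos_line_endings(data):
        return False
    file_path.write_bytes(to_unix_line_endings(data))
    return True


if __name__ == "__main__":
    script = Path("scripts/run_for_local.sh")  # path taken from the commands above
    if script.exists():
        print("converted" if convert_file(script) else "already in Unix format")
```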
After a question is entered on the front-end page, the returned result reports an error: Triton Inference Error (error_code: 4)
After a question is entered on the front-end page, the returned result is garbled text like the following: omiteatures贶.scrollHeight㎜eaturesodo Curse.streaming pulumi窟IDI贶沤贶.scrollHeight贶贶贶eatures谜.scrollHeight她是
The service fails to start, and api.log shows: mysql.connector.errors.DatabaseError: 2003 (HY000): Can't connect to MySQL server on 'mysql-container-local:3306' (111)
The service fails to start with: ERROR: for qanything-container-local Cannot start service qanything_local: could not select device driver "nvidia" with capabilities: [[gpu]]
The service fails to start with: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/xxxxxx/libnvidia-ml.so.1: file exists: unknown
Reference:
https://github.com/netease-youdao/QAnything/blob/master
For more content, follow the WeChat official account 汀丶人工智能, which shares related resources and articles for free.