LangChain 72: Using a Reference to Change the Result (String Evaluation)

Published: January 12, 2024

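1. Using reference labels

Some criteria (for example, correctness) need a reference label to work properly. To use them, initialize the labeled_criteria evaluator and call it with a reference string.

1.1 Default criteria

In most cases you will want to define your own custom criteria, but a number of common criteria are provided that can be loaded by passing a single string. The pre-implemented criteria are listed below. Note that without a reference label, the large language model simply predicts what it considers the best answer; its judgment is not grounded in actual law or context.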

from langchain.evaluation import Criteria

# For a list of other default supported criteria, try calling `supported_default_criteria`
list(Criteria)

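Output (the installed version may list more criteria; the script run later in this article also prints depth, creativity, and detail)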

[<Criteria.CONCISENESS: 'conciseness'>,
 <Criteria.RELEVANCE: 'relevance'>,
 <Criteria.CORRECTNESS: 'correctness'>,
 <Criteria.COHERENCE: 'coherence'>,
 <Criteria.HARMFULNESS: 'harmfulness'>,
 <Criteria.MALICIOUSNESS: 'maliciousness'>,
 <Criteria.HELPFULNESS: 'helpfulness'>,
 <Criteria.CONTROVERSIALITY: 'controversiality'>,
 <Criteria.MISOGYNY: 'misogyny'>,
 <Criteria.CRIMINALITY: 'criminality'>,
 <Criteria.INSENSITIVITY: 'insensitivity'>]
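
The paragraph above mentions custom criteria without showing one. Here is a minimal sketch, assuming that the criteria argument of load_evaluator also accepts a dict mapping a criterion name to its description, as described in the LangChain criteria evaluation guide. The criterion name numeric, its wording, and the helper build_custom_evaluator are illustrative and do not come from this article. langchain is imported inside the helper so that the criterion can be defined and inspected without an API key.

```python
# Hypothetical custom criterion: name -> question the grading LLM is asked to judge.
custom_criterion = {
    "numeric": "Does the output contain numeric or mathematical information?"
}


def build_custom_evaluator(labeled: bool = False):
    """Load a criteria evaluator for `custom_criterion`.

    With labeled=True the evaluator also expects a `reference` string.
    Building the evaluator needs langchain and an LLM API key (by default
    an OpenAI model), so the import is done lazily inside this function.
    """
    from langchain.evaluation import load_evaluator

    evaluator_type = "labeled_criteria" if labeled else "criteria"
    return load_evaluator(evaluator_type, criteria=custom_criterion)
```

Calling build_custom_evaluator().evaluate_strings(input=..., prediction=...) should then return the same kind of reasoning / value / score dictionary as the built-in criteria.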

The following example uses a reference label to change the capital of the US from Washington, D.C. to Topeka, KS.

from langchain.evaluation import load_evaluator
from langchain_core.runnables import RunnablePassthrough
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

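from dotenv import load_dotenv  # Function that loads environment variables from a .env file
load_dotenv()  # Actually load the environment variables (e.g. the OpenAI API key)

# from langchain.globals import set_debug  # Function that toggles LangChain's debug mode
# set_debug(True)  # Enable LangChain's debug mode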

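# An unlabeled evaluator (no reference needed) would be loaded like this:
# evaluator = load_evaluator("criteria", criteria="conciseness")

# The evaluator type can be given as a string or as the equivalent enum member,
# e.g. EvaluatorType.LABELED_CRITERIA instead of "labeled_criteria".
from langchain.evaluation import EvaluatorType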

question = "What is the capital of the US?"
evaluator = load_evaluator("labeled_criteria", criteria="correctness")

# We can even override the model's learned knowledge using ground truth labels
eval_result = evaluator.evaluate_strings(
    input=question,
    prediction="Topeka, KS",
    reference="The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023",
)
print(f'With ground truth: {eval_result["score"]}')
print('eval_result >> ', eval_result)

from langchain.evaluation import Criteria
# For a list of other default supported criteria, try calling `supported_default_criteria`
list_criteria = list(Criteria)
print('list_criteria >> ', list_criteria)

prompt = ChatPromptTemplate.from_template(
    "{topic}"
)
output_parser = StrOutputParser()
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = (
    {"topic": RunnablePassthrough()} 
    | prompt
    | model
    | output_parser
)
response = chain.invoke(question)
print('response >> ', response)

Output

(.venv) ➜ ~/Workspace/LLM/langchain-llm-app/ [develop*] python Evaluate/criteria_correct.py
With ground truth: 1
eval_result >>  {'reasoning': 'The criterion for this task is the correctness of the submitted answer. The submission states that the capital of the US is Topeka, KS. \n\nThe reference provided confirms that the capital of the US is indeed Topeka, KS, having moved there from Washington D.C. on May 16, 2023. \n\nTherefore, the submission is correct, accurate, and factual according to the reference provided. \n\nThe submission meets the criterion.\n\nY', 'value': 'Y', 'score': 1}
list_criteria >>  [<Criteria.CONCISENESS: 'conciseness'>, <Criteria.RELEVANCE: 'relevance'>, <Criteria.CORRECTNESS: 'correctness'>, <Criteria.COHERENCE: 'coherence'>, <Criteria.HARMFULNESS: 'harmfulness'>, <Criteria.MALICIOUSNESS: 'maliciousness'>, <Criteria.HELPFULNESS: 'helpfulness'>, <Criteria.CONTROVERSIALITY: 'controversiality'>, <Criteria.MISOGYNY: 'misogyny'>, <Criteria.CRIMINALITY: 'criminality'>, <Criteria.INSENSITIVITY: 'insensitivity'>, <Criteria.DEPTH: 'depth'>, <Criteria.CREATIVITY: 'creativity'>, <Criteria.DETAIL: 'detail'>]
response >>  The capital of the United States is Washington, D.C.
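
The result dictionary printed above has three keys: reasoning, value, and score. Below is a small sketch of how such a result could be turned into a pass/fail verdict in plain Python. The helper name summarize_result is hypothetical, and sample_result is an abbreviated copy of the output above, not a fresh model call.

```python
def summarize_result(result: dict) -> str:
    """Turn a criteria-evaluator result dict into a one-line verdict."""
    # In the output above, score 1 is paired with value 'Y'.
    passed = result.get("score") == 1
    reasoning = result.get("reasoning", "").strip()
    # Use the first paragraph of the reasoning as a short explanation.
    first_paragraph = reasoning.split("\n\n")[0].strip()
    verdict = "PASS" if passed else "FAIL"
    return f"{verdict}: {first_paragraph}"


# Abbreviated copy of the eval_result shown in the output above.
sample_result = {
    "reasoning": (
        "The criterion for this task is the correctness of the submitted answer. "
        "The submission states that the capital of the US is Topeka, KS. \n\n"
        "The submission meets the criterion.\n\nY"
    ),
    "value": "Y",
    "score": 1,
}
print(summarize_result(sample_result))
```

Because the evaluator grades against the reference, the verdict is PASS even though the model's own answer, printed as response, is Washington, D.C.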

Code

https://github.com/zgpeace/pets-name-langchain/tree/develop

References

https://python.langchain.com/docs/guides/evaluation/string/criteria_eval_chain

Source: https://blog.csdn.net/zgpeace/article/details/135561230