
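When working with large language models, even a prompt that explicitly asks for a fixed output format such as JSON often gets back extra text around the requested content, which makes the output hard to parse into structured data. LangChain provides tools that handle this structuring step well.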
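To make the problem concrete, here is a small illustration using only the Python standard library. The raw_reply string is an invented model reply, not real output; it shows that one sentence of chatter around the JSON is enough to make json.loads fail.

```python
import json

# Hypothetical model reply: valid JSON wrapped in extra commentary.
raw_reply = 'Sure! Here is the result:\n{"answer": "Paris"}\nHope this helps.'

try:
    json.loads(raw_reply)
    parsed_ok = True
except json.JSONDecodeError:
    # The leading "Sure! ..." text is not valid JSON, so parsing fails.
    parsed_ok = False

print(parsed_ok)
```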
For LangChain's structured output, see https://python.langchain.com/docs/modules/model_io/output_parsers/ . Of the parsers listed there, the one we care most about is the Structured output parser. It is used by defining a StructuredOutputParser, as follows:

    from langchain.output_parsers import StructuredOutputParser, ResponseSchema

    # Each ResponseSchema describes one field of the expected JSON output
    response_schemas = [
        ResponseSchema(name="answer", description="answer to the user's question"),
        ResponseSchema(name="source", description="source used to answer the user's question, should be a website.")
    ]
    output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

A First Example

First, let's look at how to write a simple prompt for structured output:

    # -*- coding: utf-8 -*-
    from langchain.output_parsers import StructuredOutputParser, ResponseSchema
    from langchain.prompts import PromptTemplate

    # Declare which fields the generated output must contain and the type of each
    response_schemas = [
        ResponseSchema(type="string", name="bad_string", description="This a poorly formatted user input string"),
        ResponseSchema(type="string", name="good_string", description="This is your response, a reformatted response")
    ]

    # Initialize the parser
    output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

    # Generate the format instructions
    format_instructions = output_parser.get_format_instructions()
    print(format_instructions)

    # Embed them in the template
    template = """
    You will be given a poorly formatted string from a user.
    Reformat it and make sure all the words are spelled correctly

    {format_instructions}

    % USER INPUT:
    {user_input}

    YOUR RESPONSE:
    """

    # Insert the format instructions into the prompt so the LLM knows what output format we expect
    prompt = PromptTemplate(
        input_variables=["user_input"],
        partial_variables={"format_instructions": format_instructions},
        template=template
    )

    promptValue = prompt.format(user_input="welcom to califonya!")
    print(promptValue)

At this point, the format instructions printed by the parser are:

    The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

    ```json
    {
        "bad_string": string  // This a poorly formatted user input string
        "good_string": string  // This is your response, a reformatted response
    }
    ```

The full prompt is:

    You will be given a poorly formatted string from a user.
    Reformat it and make sure all the words are spelled correctly

    The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

    ```json
    {
        "bad_string": string  // This a poorly formatted user input string
        "good_string": string  // This is your response, a reformatted response
    }
    ```

    % USER INPUT:
    welcom to califonya!

    YOUR RESPONSE:

We then send this prompt to an LLM. Example Python code (continuing from the code above):

    import os

    from langchain.llms import OpenAI

    # Set the API key (placeholder value)
    os.environ["OPENAI_API_KEY"] = 'sk-xxx'

    # Model used in the original run; it may no longer be available
    llm = OpenAI(model_name="text-davinci-003")

    llm_output = llm(promptValue)
    print(llm_output)

    # Parse the generated content with the output parser
    print(output_parser.parse(llm_output))

The output is:

    ```json
    {
        "bad_string": "welcom to califonya!",
        "good_string": "Welcome to California!"
    }
    ```
    {'bad_string': 'welcom to califonya!', 'good_string': 'Welcome to California!'}

As shown, the model's output is in exactly the JSON format we asked for, and the fields, data types, and values all match expectations.
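To see roughly what the parsing step involves, here is a minimal standard-library sketch. It is not LangChain's implementation: parse_json_markdown is a hypothetical helper that pulls the JSON out of the fenced snippet requested by the format instructions and checks that the expected keys are present. The sample llm_output string mirrors the model output shown above.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick markdown fence

def parse_json_markdown(text, expected_keys):
    """Extract the first fenced JSON snippet from text and check its keys."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    # Fall back to the whole text when the model omitted the fence.
    payload = match.group(1) if match else text
    data = json.loads(payload.strip())
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys in model output: {missing}")
    return data

# Sample reply in the same shape as the output shown above.
llm_output = (
    FENCE + "json\n"
    '{\n'
    '    "bad_string": "welcom to califonya!",\n'
    '    "good_string": "Welcome to California!"\n'
    '}\n' + FENCE
)

result = parse_json_markdown(llm_output, ["bad_string", "good_string"])
print(result["good_string"])
```

Raising an error on missing keys, rather than returning a partial dict, makes a malformed reply visible at the point where it is parsed.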

結(jié)構(gòu)化抽取

??有了上述的結(jié)構(gòu)化輸出，我們嘗試對文本進行結(jié)構(gòu)化抽取，并使得抽取結(jié)果按JSON格式輸出。示例代碼如下：

    # -*- coding: utf-8 -*-
    import os

    from langchain.output_parsers import StructuredOutputParser, ResponseSchema
    from langchain.prompts import PromptTemplate
    from langchain.llms import OpenAI

    # Set the API key (placeholder value)
    os.environ["OPENAI_API_KEY"] = 'sk-xxx'

    # Note: gpt-3.5-turbo is a chat model; depending on the LangChain version,
    # a chat model class may be required instead
    llm = OpenAI(model_name="gpt-3.5-turbo")

    # Declare which fields the generated output must contain and the type of each
    response_schemas = [
        ResponseSchema(type="number", name="number", description="文本中的數字"),  # "the number in the text"
        ResponseSchema(type="string", name="people", description="文本中的人物"),  # "the person in the text"
        ResponseSchema(type="string", name="place", description="文本中的地點"),  # "the place in the text"
    ]

    # Initialize the parser
    output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

    # Generate the format instructions
    format_instructions = output_parser.get_format_instructions()
    print(format_instructions)

    # The first line of the template says:
    # "Given the text below, find the specified structured information."
    template = """
    給定下面的文本，找出特定的結構化信息。

    {format_instructions}

    % USER INPUT:
    {user_input}

    YOUR RESPONSE:
    """

    # Build the prompt
    prompt = PromptTemplate(
        input_variables=["user_input"],
        partial_variables={"format_instructions": format_instructions},
        template=template
    )

    promptValue = prompt.format(user_input="張曉明今天在香港坐了2趟地鐵。")
    print(promptValue)
    llm_output = llm(promptValue)
    print(llm_output)

    # Parse the generated content with the output parser
    print(output_parser.parse(llm_output))

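In this example we ask for the fields number, people, and place to be extracted from the input text, with data types number, string, and string, and with descriptions asking for the number, person, and place in the text. Note that the data types (type) in ResponseSchema follow the JSON data types.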
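As far as we can tell, the type argument only shapes the format instructions, and the parser itself checks for the expected keys rather than the value types; treat that as an assumption to verify against your LangChain version. If the types matter downstream, a small check on the parsed dict is one option. The sketch below uses only the standard library; check_types and JSON_TYPES are hypothetical names introduced here for illustration.

```python
# Map JSON type names (as passed to ResponseSchema's type argument) to Python types.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

def check_types(parsed, schema):
    """Return the names of fields whose value does not match the declared JSON type."""
    mismatched = []
    for name, json_type in schema.items():
        value = parsed.get(name)
        expected = JSON_TYPES[json_type]
        # bool is a subclass of int in Python, so reject it for "number" explicitly.
        if json_type == "number" and isinstance(value, bool):
            mismatched.append(name)
        elif not isinstance(value, expected):
            mismatched.append(name)
    return mismatched

schema = {"number": "number", "people": "string", "place": "string"}
good = {"number": 2, "people": "張曉明", "place": "香港"}
bad = {"number": "2", "people": "張曉明", "place": "香港"}  # number returned as a string

print(check_types(good, schema))
print(check_types(bad, schema))
```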
Running the extraction on two sample texts gives the following results:

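Input: 張曉明今天在香港坐了2趟地鐵。 (Zhang Xiaoming took the subway twice in Hong Kong today.)
Extraction result: {'number': 2, 'people': '張曉明', 'place': '香港'}

Input: 昨天B站14周年的分享會上，B站CEO陳睿對這個指標做了官方的定義，用戶觀看視頻所花費的時間，也就是播放分鐘數。 (At yesterday's Bilibili 14th-anniversary event, Bilibili CEO Chen Rui gave this metric an official definition: the time users spend watching videos, that is, minutes played.)
Extraction result: {'number': 14, 'people': '陳睿', 'place': 'B站'}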

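The results mostly match expectations. The exception is the second sentence, where B站 (Bilibili, a video platform) is identified as a place, which is not reasonable. Our prompt is admittedly very simple; readers can experiment with more detailed field descriptions, which may improve extraction quality.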
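One direction to try is making each field description say what does not count as a match. The sketch below is an untested assumption, not a verified improvement: the descriptions are invented for illustration, and render_schema is a hypothetical helper that only prints them in the same layout as the format instructions shown earlier, so the wording can be reviewed before it goes into ResponseSchema.

```python
# Hypothetical, more specific field descriptions (not evaluated against a model).
fields = [
    ("number", "number", "a count or quantity stated in the text; null if none"),
    ("people", "string", "name of a person mentioned in the text; null if none"),
    ("place", "string", "a geographic location such as a city or country; "
                        "not a company, website, or product; null if none"),
]

def render_schema(fields):
    """Render (name, type, description) triples in the format-instruction layout."""
    lines = [
        f'    "{name}": {json_type}  // {description}'
        for name, json_type, description in fields
    ]
    return "{\n" + "\n".join(lines) + "\n}"

print(render_schema(fields))
```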

結(jié)構(gòu)化抽取進階

??在這個例子中，我們使用結(jié)構(gòu)化輸出，來實現(xiàn)NLP中的常見任務(wù)：命名實體識別（NER），我們需要從文本中抽取出其中的時間、人物、地點、組織機構(gòu)等，并以JSON格式輸出，每個字段都以列表形式呈現(xiàn)。
??實現(xiàn)的Python代碼如下:

    # -*- coding: utf-8 -*-
    import os

    from langchain.output_parsers import StructuredOutputParser, ResponseSchema
    from langchain.prompts import PromptTemplate
    from langchain.llms import OpenAI

    # Set the API key (placeholder value)
    os.environ["OPENAI_API_KEY"] = 'sk-xxx'

    # Note: gpt-3.5-turbo is a chat model; depending on the LangChain version,
    # a chat model class may be required instead
    llm = OpenAI(model_name="gpt-3.5-turbo")

    # Declare which fields the generated output must contain and the type of each
    response_schemas = [
        ResponseSchema(type="array", name="time", description="文本中的日期時間列表"),  # "list of dates/times in the text"
        ResponseSchema(type="array", name="people", description="文本中的人物列表"),  # "list of people in the text"
        ResponseSchema(type="array", name="place", description="文本中的地點列表"),  # "list of places in the text"
        ResponseSchema(type="array", name="org", description="文本中的組織機構列表"),  # "list of organizations in the text"
    ]

    # Initialize the parser
    output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

    # Generate the format instructions
    format_instructions = output_parser.get_format_instructions()
    print(format_instructions)

    # The first line of the template says:
    # "Given the text below, find the specified entities and return them as structured data."
    template = """
    給定下面的文本，找出特定的實體信息，并以結構化數據格式返回。

    {format_instructions}

    % USER INPUT:
    {user_input}

    YOUR RESPONSE:
    """

    # Build the prompt
    prompt = PromptTemplate(
        input_variables=["user_input"],
        partial_variables={"format_instructions": format_instructions},
        template=template
    )

    promptValue = prompt.format(user_input="6月26日，廣汽集團在科技日上首次公開展示飛行汽車項目，飛行汽車GOVE完成全球首飛。廣汽研究院院長吳堅表示，GOVE可以垂直起降，并搭載雙備份多旋翼飛行系統，保障飛行安全。")
    print(promptValue)
    llm_output = llm(promptValue)
    print(llm_output)

    # Parse the generated content with the output parser
    print(output_parser.parse(llm_output))

We run the experiment on three sample texts to see how the model performs on NER:

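Input: 6月26日周一，烏克蘭總統澤連斯基視察了烏克蘭軍隊東線司令部總部，而就在幾小時前，俄羅斯宣布控制該國東部頓涅茨克以南的利夫諾波爾。 (On Monday, June 26, Ukrainian President Zelensky inspected the headquarters of the Ukrainian army's eastern command; just hours earlier, Russia had announced control of Rivnopil, south of Donetsk in the country's east.)
Extraction result: {'time': ['6月26日周一'], 'people': ['澤連斯基'], 'place': ['烏克蘭軍隊東線司令部總部', '頓涅茨克', '利夫諾波爾'], 'org': ['烏克蘭總統', '俄羅斯']}

Input: 日前，馬自達找來梁家輝代言汽車，引發了業內的熱議。相信很多人對于馬自達的品牌認知來自梁家輝的那部電影。 (Recently, Mazda brought in Tony Leung Ka-fai to endorse its cars, sparking heated discussion in the industry. Many people's awareness of the Mazda brand probably comes from that film of his.)
Extraction result: {'time': [], 'people': ['梁家輝'], 'place': [], 'org': ['馬自達']}

Input: 6月26日，廣汽集團在科技日上首次公開展示飛行汽車項目，飛行汽車GOVE完成全球首飛。廣汽研究院院長吳堅表示，GOVE可以垂直起降，并搭載雙備份多旋翼飛行系統，保障飛行安全。 (On June 26, GAC Group publicly showed its flying car project for the first time at its Tech Day, and the flying car GOVE completed its global maiden flight. Wu Jian, president of the GAC R&D Center, said GOVE can take off and land vertically and carries a dual-redundant multi-rotor flight system to ensure flight safety.)
Extraction result: {'time': ['6月26日'], 'people': ['吳堅'], 'place': [], 'org': ['廣汽集團', '廣汽研究院']}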

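As shown, with only a simple prompt and structured output, the model's extraction results are broadly satisfactory, though not perfect: in the first sample, for instance, the title 烏克蘭總統 (President of Ukraine) is returned as an organization.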
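To put a number on how well the extraction worked, one option is to score the output against a hand-labeled answer. The sketch below computes micro-averaged precision, recall, and F1 over (field, entity) pairs. The predicted dict is the model output for the first sample above; the gold dict is a hypothetical annotation written for this illustration, not a reference label from any dataset, and entity_prf is a name introduced here.

```python
def entity_prf(predicted, gold):
    """Micro-averaged precision, recall and F1 over (field, entity) pairs."""
    pred_pairs = {(field, ent) for field, ents in predicted.items() for ent in ents}
    gold_pairs = {(field, ent) for field, ents in gold.items() for ent in ents}
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Model output for the first sample text.
predicted = {
    "time": ["6月26日周一"],
    "people": ["澤連斯基"],
    "place": ["烏克蘭軍隊東線司令部總部", "頓涅茨克", "利夫諾波爾"],
    "org": ["烏克蘭總統", "俄羅斯"],
}
# Hypothetical gold annotation, for illustration only.
gold = {
    "time": ["6月26日周一"],
    "people": ["澤連斯基"],
    "place": ["頓涅茨克", "利夫諾波爾"],
    "org": ["烏克蘭軍隊東線司令部"],
}

precision, recall, f1 = entity_prf(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact string matching is strict: an entity with a slightly different span counts as both a false positive and a miss. A different gold annotation would give different scores.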

Summary

This article introduced LangChain's structured output, which is useful whenever model output needs to be parsed into structured data. With the help of structured output, we can also carry out common NLP tasks such as NER.

References

LangChain 中文入門教程 (LangChain Chinese introductory tutorial): https://liaokong.gitbook.io/llm-kai-fa-jiao-cheng/
Structured output parser: https://python.langchain.com/docs/modules/model_io/output_parsers/structured

NLP (56): Structured Output with LangChain
