StarCoder: A Code Generation Language Model
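StarCoder (15.5 billion parameters) is a free large language model released by Hugging Face together with ServiceNow. It was trained mainly to generate code and is meant to compete with AI-based programming tools such as GitHub Copilot and Amazon CodeWhisperer.
Its training data covers more than 80 programming languages, along with text extracted from GitHub.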
Installation
First, install all of the libraries listed in requirements.txt:
pip install -r requirements.txt
Code generation
The following snippet loads the model and tokenizer, then generates a completion for a prompt:
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "bigcode/starcoder"
device = "cuda"  # "cuda" for GPU usage, "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# To save memory, consider fp16 or bf16: import torch and pass torch_dtype=torch.float16 (or torch.bfloat16) to from_pretrained
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
# Without an explicit limit, generate() may stop after a short default length; max_new_tokens raises it
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0]))
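The comment in the snippet above recommends fp16 or bf16 to save memory. A rough estimate shows why: the weights alone need about (number of parameters) × (bytes per parameter). This is a back-of-the-envelope sketch that assumes 15.5 billion parameters. It ignores activations, the KV cache and framework overhead, so real usage will be higher. The helper name weight_memory_gib is hypothetical.

```python
# Rough weight-memory estimate for StarCoder, assuming ~15.5e9 parameters.
# Activations, KV cache and framework overhead are NOT included.
PARAMS = 15.5e9

# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(PARAMS, dtype):.1f} GiB")
```

Under these assumptions the weights take roughly 57.7 GiB in float32 and roughly 28.9 GiB in float16 or bfloat16. Half precision therefore roughly halves the footprint before any quantization is considered.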
Alternatively, use the text-generation pipeline:
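from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
checkpoint = "bigcode/starcoder"
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device=0 runs the pipeline on the first GPU; use device=-1 for CPU
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
# The result is a list of dicts; "generated_text" holds the prompt followed by the completion
print(pipe("def hello():", max_new_tokens=64))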
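By default, the pipeline's "generated_text" field repeats the prompt and then gives the completion. The model may also keep writing past the function you asked for. Below is a minimal post-processing sketch that keeps only the new code up to the next top-level definition. The names extract_completion and DEFAULT_STOPS are hypothetical. The stop sequences are an illustrative choice, not something StarCoder or transformers defines.

```python
from collections.abc import Sequence

# Hypothetical stop sequences: a new top-level def/class/statement usually
# means the model has moved past the function we asked for.
DEFAULT_STOPS = ("\ndef ", "\nclass ", "\nif __name__", "\nprint(")


def extract_completion(result: list[dict], prompt: str,
                       stops: Sequence[str] = DEFAULT_STOPS) -> str:
    """Return only the newly generated code from a text-generation pipeline result."""
    text = result[0]["generated_text"]
    # The pipeline echoes the prompt in front of the completion by default.
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Cut at the earliest stop sequence that appears, if any.
    positions = [text.find(s) for s in stops]
    cut = min((p for p in positions if p != -1), default=len(text))
    return text[:cut]


# Usage with a hand-written result shaped like the output of pipe("def hello():").
prompt = "def hello():"
fake_result = [{"generated_text": prompt + '\n    print("Hello")\n\n\ndef other():\n    pass'}]
print(prompt + extract_completion(fake_result, prompt))
```

The usage example passes a hand-written result instead of calling the model, so it runs without downloading the checkpoint. With real output you would pass the list returned by pipe(...) directly.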
