This news may be bigger than GPT-4

March 15, 2023

GPT-4 has been released, so let's wait and see. Any market is only fun when it has at least two players.

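Stanford University has developed a language model called "Alpaca". The news is big not because Stanford is fond of alpacas, but because this "Alpaca" may let the cheapest small device around you run an AI 100 times stronger than ChatGPT.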

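Stanford fine-tuned the 7B LLaMA model on only 52K examples and reached results similar to text-davinci-003 (text-davinci-003 is the GPT technology OpenAI is proudest of, and the most expensive model sold on the OpenAI API). The model can also run on consumer-grade devices such as a Raspberry Pi, and someone has already got it working.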

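To quote the influencer Orange.ai: "This model has not gone through ethics training, so it may say things that violate the taboos of people in various countries. If everyone ends up with their own local language model, censorship will fail completely. Its training cost is remarkably low: the data generation process produced 52K unique instructions and their corresponding outputs for less than $500 using the OpenAI API. Fine-tuning a 7B LLaMA model on 8 80GB A100s takes 3 hours, which costs less than $100 on most cloud providers."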

The details of Alpaca follow (you can also go to GitHub: https://github.com/tatsu-lab/stanford_alpaca)

Stanford Alpaca: An Instruction-following LLaMA model

This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:

A web demo to interact with our Alpaca model
The 52K data used for fine-tuning the model
The code for generating the data

Overview

The current Alpaca model is fine-tuned from a 7B LLaMA model [1] on 52K instruction-following data generated by the techniques in the Self-Instruct [2] paper, with some modifications that we discuss in the next section. In a preliminary human evaluation, we found that the Alpaca 7B model behaves similarly to the text-davinci-003 model on the Self-Instruct instruction-following evaluation suite [2].

Alpaca is still under development, and there are many limitations that have to be addressed. Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless. We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.

Our initial release contains the data generation procedure, dataset, and training recipe. We intend to release the model weights if we are given permission to do so by the creators of LLaMA. For now, we have chosen to host a live demo to help readers better understand the capabilities and limits of Alpaca, as well as a way to help us better evaluate Alpaca's performance on a broader audience.

Please read our release blog post for more details about the model, our discussion of the potential harm and limitations of Alpaca models, and our thought process behind an open-source release.

[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1

[2]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560

Data Release

alpaca_data.json contains the 52K instruction-following data we used for fine-tuning the Alpaca model. This JSON file is a list of dictionaries; each dictionary contains the following fields:

instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
input: str, optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
output: str, the answer to the instruction as generated by text-davinci-003.
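
The record layout described above can be checked with a few lines of Python. This is only a minimal sketch: the two records are hypothetical examples written for illustration (they are not taken from the real alpaca_data.json), and it assumes that a missing input is stored as an empty string.

```python
import json

# Two hypothetical records in the documented shape (not from the real dataset).
sample_json = """
[
  {"instruction": "Summarize the following article.",
   "input": "Stanford released a fine-tuned 7B LLaMA model.",
   "output": "Stanford published a fine-tuned LLaMA model."},
  {"instruction": "Name three primary colors.",
   "input": "",
   "output": "Red, yellow, and blue."}
]
"""

REQUIRED_FIELDS = ("instruction", "input", "output")


def summarize(records):
    """Check that every record has the three fields and report basic counts."""
    for record in records:
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record is missing fields: {missing}")
    # Assumption: an absent input is represented as an empty string.
    with_input = sum(1 for record in records if record["input"].strip())
    unique_instructions = len({record["instruction"] for record in records})
    return {
        "total": len(records),
        "with_input": with_input,
        "unique_instructions": unique_instructions,
    }


records = json.loads(sample_json)
stats = summarize(records)
print(stats)
```

On the real file, the same function could be pointed at the list loaded from alpaca_data.json to compare the share of records with an input against the "around 40%" figure quoted above.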

We used the following prompts for fine-tuning the Alpaca model:

for examples with a non-empty input field:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:

for examples with an empty input field:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
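
As a rough illustration of how the two templates above might be applied to a record, here is a short Python sketch. The function name build_prompt and the trailing newline after "### Response:" are assumptions made for this example; the project's actual training code may differ.

```python
# Template text copied from the two prompts above; the trailing newline
# after "### Response:" is an assumption of this sketch.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(example):
    """Pick the template based on whether the input field is non-empty."""
    if example.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=example["instruction"], input=example["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=example["instruction"])


# Hypothetical records, for illustration only.
with_input = build_prompt(
    {"instruction": "Summarize the following article.", "input": "Some article text."}
)
without_input = build_prompt({"instruction": "Name three primary colors.", "input": ""})
print(with_input)
print(without_input)
```

During fine-tuning, the output field would be appended after the prompt as the text the model is trained to produce.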

Data Generation Process

We built on the data generation pipeline from self-instruct and made the following modifications:

We used text-davinci-003 to generate the instruction data instead of davinci.
We wrote a new prompt (prompt.txt) that explicitly gave the requirement of instruction generation to text-davinci-003.
We adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation.
We simplified the data generation pipeline by discarding the difference between classification and non-classification instructions.
We only generated a single instance for each instruction, instead of 2 to 3 instances as in [2].

This produced an instruction-following dataset with 52K examples obtained at a much lower cost (less than $500). In a preliminary study, we also find our 52K generated data to be much more diverse than the data released by self-instruct. We plot the below figure (in the style of Figure 2 in the self-instruct paper) to demonstrate the diversity of our data. The inner circle of the plot represents the root verb of the instructions, and the outer circle represents the direct objects.
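
To put the batch size and cost figures above in perspective, here is a back-of-the-envelope calculation in Python. It is only a sketch: it treats "52K" as the round number 52,000 and assumes every request returned exactly 20 usable instructions, so the request count is a lower bound (any filtered or discarded generations would add requests).

```python
import math

num_examples = 52_000            # "52K" taken as a round number (assumption)
instructions_per_request = 20    # batch decoding size stated in the text
cost_upper_bound_usd = 500.0     # "less than $500" stated in the text

# Fewest requests needed if no generated instruction were discarded.
min_requests = math.ceil(num_examples / instructions_per_request)

# Upper bound on the average cost of one instruction-output pair.
max_cost_per_example = cost_upper_bound_usd / num_examples

print(f"at least {min_requests} requests")
print(f"under ${max_cost_per_example:.4f} per example")
```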

Fine-tuning

We fine-tune our model using standard huggingface training code with the following hyperparameters:

Hyperparameter    Value
Batch size        128
Learning rate     2e-5
Epochs            3
Max length        512
Weight decay      1

We are waiting for huggingface to officially support the llama models (i.e. this PR to be merged) before we release a stable version of the finetuning code.
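
The hyperparameters above can be collected in a small configuration object to see what they imply for the length of a training run. This is a hedged sketch in plain Python, not the project's training script. The field names follow the parameter names used by Hugging Face TrainingArguments, the class FineTuneConfig is hypothetical, and the split of the batch size of 128 into 4 examples per device, 4 accumulation steps, and 8 GPUs is an assumption (only the total of 128 and the 8 A100s come from the text).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the listed hyperparameters."""
    learning_rate: float = 2e-5
    num_train_epochs: int = 3
    max_length: int = 512
    weight_decay: float = 1.0             # value as listed in the table
    per_device_train_batch_size: int = 4  # assumption
    gradient_accumulation_steps: int = 4  # assumption
    num_gpus: int = 8                     # from the quoted 8 x A100 setup

    @property
    def effective_batch_size(self) -> int:
        # Examples consumed per optimizer step across all devices.
        return (
            self.per_device_train_batch_size
            * self.gradient_accumulation_steps
            * self.num_gpus
        )

    def optimizer_steps(self, num_examples: int) -> int:
        # Assumes the last partial batch of each epoch is kept, not dropped.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.num_train_epochs


config = FineTuneConfig()
print(config.effective_batch_size)
print(config.optimizer_steps(52_000))  # 52K taken as a round number
```

Under these assumptions the whole run is only on the order of a thousand optimizer steps, which fits the short training time quoted earlier.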

Authors

All grad students below contributed equally and the order is determined by random draw.

Rohan Taori
Ishaan Gulrajani
Tianyi Zhang
Yann Dubois
Xuechen Li

All advised by Tatsunori B. Hashimoto. Yann is also advised by Percy Liang and Xuechen is also advised by Carlos Guestrin.

Citation

Please cite the repo if you use the data or code in this repo.

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].

Reader Q&A (tool sharing)

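A friend asked how I manage to get AI news first-hand. I guess our habits differ: I like to keep many browser tabs open to see what has been updated. Google's own browser is fairly resource-hungry, so you could try other browsers such as sigmaOS or Sidekick. They use fewer resources, have a more compact tab layout, and are based on Chrome, so they can install Chrome extensions.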

Here is a Sidekick invitation link:

https://join.meetsidekick.com/hgike

Also, several friends have recently messaged me asking why they cannot log in to ChatGPT or how to install it. Here is one reply for everyone:

You can see how to use ChatGPT in my Udemy course (a free ChatGPT 101 course):

https://www.udemy.com/user/da-li-26

Licensed under CC BY-NC-ND 2.0

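Assisi shares content that cannot be shared on WeChat Moments, Weibo, and other media inside the firewall: history, philosophy, psychology, and more.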
