This blog post is based on Datawhale materials and an excellent survey.
Following pre-training, LLMs acquire general capabilities for addressing a wide range of tasks. However, a growing body of research shows that these abilities can be further tailored to specific objectives. This blog presents two primary methods for adapting pre-trained LLMs: instruction tuning and alignment tuning. The former primarily seeks to enhance or unlock the capabilities of LLMs, while the latter aims to align their behaviors with human values or preferences. In addition, this blog covers efficient tuning and quantization for model adaptation in resource-constrained environments. These topics are broad, so readers who want to study them in more depth should consult the survey.
Essentially, instruction tuning involves fine-tuning pre-trained LLMs using a set of formatted instances in natural language, which is closely related to supervised fine-tuning and multi-task prompted training. To carry out instruction tuning, the first step is to gather or create instances formatted as instructions. These formatted instances are then used to fine-tune LLMs in a supervised learning manner, such as training with sequence-to-sequence loss. Following instruction tuning, LLMs can exhibit enhanced abilities to generalize to unseen tasks, even in multilingual settings.
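To make the procedure concrete, below is a minimal sketch of this supervised objective, assuming a Hugging Face causal LM; the model name and the instruction instance are only illustrative, and a real run would loop over many instances with an optimizer.

```python
# A minimal sketch of instruction tuning with a sequence-to-sequence style loss,
# assuming a Hugging Face causal LM (the checkpoint and data are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One formatted instance: an instruction (optionally with an input) plus the desired output.
instruction = "Translate the following sentence to French."
task_input = "The weather is nice today."
output = "Il fait beau aujourd'hui."
prompt = f"Instruction: {instruction}\nInput: {task_input}\nResponse: "

# Tokenize the prompt and the full sequence; mask the prompt tokens so the loss
# is computed only on the response (the usual supervised fine-tuning objective).
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + output, return_tensors="pt").input_ids
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt positions in the loss

loss = model(input_ids=full_ids, labels=labels).loss
loss.backward()  # in practice, wrap this in an optimizer loop over many instances
```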
Instruction tuning covers:

- Formatted Instance Construction
- Instruction Tuning Strategies
- The Effect of Instruction Tuning
- Empirical Analysis for Instruction Tuning
This section first provides an overview of alignment, including its definition and criteria, then discusses how human feedback data are collected for aligning LLMs, and finally presents the key technique of reinforcement learning from human feedback (RLHF) for alignment tuning; a small sketch of the reward-model objective used in RLHF follows the list below.

Alignment tuning covers:

- Alignment Criteria
- Collecting Human Feedback
- Reinforcement Learning from Human Feedback
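As a concrete illustration of one stage of the RLHF pipeline, here is a minimal sketch of the pairwise (Bradley-Terry style) loss commonly used to train the reward model on human preference data; the backbone model and the example preference pair are placeholders, not the setup of any particular paper.

```python
# A minimal sketch of the pairwise reward-model objective used in RLHF,
# assuming a Hugging Face sequence-classification head as the reward model
# (the backbone and the example preference pair are illustrative).
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # placeholder reward-model backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

prompt = "Explain why the sky is blue."
chosen = prompt + " Sunlight is scattered by air molecules, and blue light scatters the most."
rejected = prompt + " Because it just is."

def score(text):
    # The reward model maps a prompt-response pair to a scalar reward.
    ids = tokenizer(text, return_tensors="pt", truncation=True)
    return reward_model(**ids).logits.squeeze(-1)

# Bradley-Terry style loss: the human-preferred response should get a higher reward.
loss = -F.logsigmoid(score(chosen) - score(rejected)).mean()
loss.backward()  # in practice, iterate over a dataset of human preference pairs
```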
Prior research has paid significant attention to parameter-efficient fine-tuning, which seeks to reduce the number of trainable parameters while keeping performance as close as possible to that of full fine-tuning.
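As an illustration, below is a minimal sketch of one popular parameter-efficient method, LoRA, using the Hugging Face PEFT library; the checkpoint, rank, and target modules are illustrative choices rather than recommended settings.

```python
# A minimal sketch of parameter-efficient fine-tuning with LoRA via the PEFT library,
# assuming a GPT-2 checkpoint (rank and target modules are illustrative).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Freeze the base model and inject small low-rank adapter matrices into the
# attention projections; only these adapters receive gradient updates.
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection layer
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```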
Because of their enormous number of parameters, LLMs have a large memory footprint at inference time, which makes real-world deployment costly. Model quantization addresses this by storing weights (and sometimes activations) in lower-precision formats.
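As an illustration, here is a minimal sketch of loading a model with 4-bit weight quantization through the transformers + bitsandbytes integration; the checkpoint and quantization settings are placeholders, and running it requires a GPU with bitsandbytes installed.

```python
# A minimal sketch of post-training quantization for inference, assuming the
# Hugging Face transformers + bitsandbytes integration (checkpoint is illustrative).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

# Loading the weights in 4-bit roughly quarters the memory footprint
# compared with 16-bit weights, at some cost in accuracy.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```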
Finally, I have collected several surveys on these topics; readers interested in this field can read them for further study:
| Domain | Title | Paper URL | Project URL | Release Month |
| --- | --- | --- | --- | --- |
| Instruction Tuning | Is Prompt All You Need? No. A Comprehensive and Broader View of Instruction Learning | https://arxiv.org/pdf/2303.10475.pdf | https://github.com/RenzeLou/awesome-instruction-learning | 2023.03 |
| Instruction Tuning | Instruction Tuning for Large Language Models: A Survey | https://arxiv.org/pdf/2308.10792.pdf | None | 2023.08 |
| Instruction Tuning | Vision-Language Instruction Tuning: A Review and Analysis | https://arxiv.org/pdf/2311.08172.pdf | https://github.com/palchenli/VL-Instruction-Tuning | 2023.11 |
| Human Alignment for LLM | Aligning Large Language Models with Human: A Survey | https://arxiv.org/pdf/2307.12966.pdf | https://github.com/GaryYufei/AlignLLMHumanSurvey | 2023.07 |
| Human Alignment for LLM | From Instructions to Intrinsic Human Values – A Survey of Alignment Goals for Big Models | https://arxiv.org/pdf/2308.12014.pdf | https://github.com/ValueCompass/Alignment-Goal-Survey | 2023.08 |
| Human Alignment for LLM | Large Language Model Alignment: A Survey | https://arxiv.org/pdf/2309.15025.pdf | None | 2023.09 |
| Human Alignment for LLM | AI Alignment: A Comprehensive Survey | https://arxiv.org/pdf/2310.19852 | https://www.alignmentsurvey.com/ | 2023.10 |
| Efficient LLMs | The Efficiency Spectrum of Large Language Models: An Algorithmic Survey | https://arxiv.org/pdf/2310.10844.pdf | https://github.com/tding1/Efficient-LLM-Survey | 2023.12 |
| Efficient LLMs | Efficient Large Language Models: A Survey | https://arxiv.org/pdf/2312.03863 | https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey | 2023.12 |