
Generating a synthetic dataset for instruction finetuning (#245)

* Generating a synthetic dataset for instruction finetuning

* fix link
Sebastian Raschka, 1 year ago
parent
commit
dbcdc7593b

+ 1 - 0
README.md

@@ -106,6 +106,7 @@ Several folders contain optional materials as a bonus for interested readers:
 - **Chapter 7:**
   - [Dataset Utilities for Finding Near Duplicates and Creating Passive Voice Entries](ch07/02_dataset-utilities)
   - [Evaluating Instruction Responses Using the OpenAI API and Ollama](ch07/03_model-evaluation)
+  - [Generating a Dataset for Instruction Finetuning](ch07/05_dataset-generation)
 
 <br>
 &nbsp

+ 6 - 0
ch07/05_dataset-generation/README.md

@@ -0,0 +1,6 @@
+# Generating a Dataset for Instruction Finetuning
+
+This folder contains utility code that can be used for generating a dataset for instruction finetuning.
+
+- [llama3-ollama.ipynb](llama3-ollama.ipynb): A notebook that creates a synthetic instruction finetuning dataset using Llama 3 and Ollama
+

File diff suppressed because it is too large
+ 3 - 0
ch07/05_dataset-generation/instruction-data-llama3-7b.json


File diff suppressed because it is too large
+ 457 - 0
ch07/05_dataset-generation/llama3-ollama.ipynb
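
The notebook's diff is not shown here, but its core idea, querying a locally running Llama 3 model through Ollama's REST API to synthesize instruction/response pairs, can be sketched roughly as follows. The `/api/chat` endpoint, default port, and `stream`/`options` fields follow Ollama's documented API; the prompt wording and the `generate_entry` helper are illustrative assumptions, not the notebook's actual code:

```python
import json
import urllib.request

# Ollama's default local endpoint for chat-style requests
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_payload(prompt, model="llama3"):
    """Build the JSON body for a non-streaming Ollama chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return the full reply as a single JSON object
        "options": {"seed": 123, "temperature": 1.0},  # reproducible sampling
    }


def query_model(prompt, model="llama3"):
    """Send a prompt to the local Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


def generate_entry(topic, model="llama3"):
    """Ask the model for one instruction, then answer it (hypothetical helper)."""
    instruction = query_model(
        f"Write a single, clear instruction a user might give about {topic}. "
        "Respond with the instruction only.",
        model=model,
    )
    return {"instruction": instruction, "output": query_model(instruction, model=model)}
```

Running `generate_entry` in a loop over a list of topics and dumping the resulting dicts with `json.dump` yields a dataset in the same instruction/output shape as the `instruction-data-llama3-7b.json` file added in this commit.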


+ 4 - 0
ch07/README.md

@@ -9,3 +9,7 @@
 - [02_dataset-utilities](02_dataset-utilities) contains utility code that can be used for preparing an instruction dataset.
 
 - [03_model-evaluation](03_model-evaluation) contains utility code for evaluating instruction responses using a local Llama 3 model and the GPT-4 API.
+
+- [04_preference-tuning-with-dpo](04_preference-tuning-with-dpo) implements code for preference finetuning with DPO (in progress)
+
+- [05_dataset-generation](05_dataset-generation) contains code to generate synthetic datasets for instruction finetuning

Some files were not shown because too many files changed in this diff