@@ -260,7 +260,7 @@
"id": "0f3d7ea2-637f-4490-bc76-e361fc81ae98"
},
"source": [
- "### 5.1.2 Calculating the text generation loss: cross entropy, and perplexity"
+ "### 5.1.2 Calculating the text generation loss: cross-entropy and perplexity"
]
},
{
@@ -558,7 +558,7 @@
"metadata": {},
"source": [
"- In deep learning, instead of maximizing the average log-probability, it's a standard convention to minimize the *negative* average log-probability value; in our case, instead of maximizing -10.7722 so that it approaches 0, in deep learning, we would minimize 10.7722 so that it approaches 0\n",
- "- The value negative of -10.7722, i.e., 10.7722, is also called cross entropy loss in deep learning"
+ "- The negative of -10.7722, i.e., 10.7722, is also called the cross-entropy loss in deep learning"
]
},
{
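For readers skimming the diff without the notebook open, a minimal sketch of the relationship this cell describes; the probability values below are made up for illustration (the notebook derives -10.7722 from its own example batch):

```python
import torch

# Hypothetical probabilities the model assigned to the correct target tokens
# (illustrative values only, not the ones computed in the notebook)
target_probas = torch.tensor([1e-5, 2e-5, 5e-6])

avg_log_proba = torch.mean(torch.log(target_probas))   # a negative number
neg_avg_log_proba = -avg_log_proba                      # the cross-entropy loss
print(neg_avg_log_proba)
```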
@@ -601,7 +601,7 @@
"id": "e8aaf9dd-3ee6-42bf-a63f-6e93dbfb989d",
"metadata": {},
"source": [
- "- Before we apply the cross entropy function, let's check the shape of the logits and targets"
+ "- Before we apply the `cross_entropy` function, let's check the shape of the logits and targets"
]
},
{
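A rough sketch of the shape check this bullet refers to; the batch size, sequence length, and vocabulary size here are assumptions, not the notebook's actual values:

```python
import torch

# Assumed dimensions: batch of 2 sequences, 3 tokens each, vocabulary of 50,257
logits = torch.randn(2, 3, 50257)            # (batch_size, num_tokens, vocab_size)
targets = torch.randint(0, 50257, (2, 3))    # (batch_size, num_tokens)

print("Logits shape:", logits.shape)    # torch.Size([2, 3, 50257])
print("Targets shape:", targets.shape)  # torch.Size([2, 3])
```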
@@ -638,7 +638,7 @@
"id": "1d3d65f0-6566-4865-93e4-0c0bcb10cd06",
"metadata": {},
"source": [
- "- For the cross `entropy_loss` function in PyTorch, we want to flatten these tensors by combining them over the batch dimension:"
+ "- For the `cross_entropy` function in PyTorch, we want to flatten these tensors by combining them over the batch dimension:"
]
},
{
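A minimal sketch of the flattening step, assuming logits of shape (batch_size, num_tokens, vocab_size) and targets of shape (batch_size, num_tokens); `torch.nn.functional.cross_entropy` expects 2D logits and 1D targets:

```python
import torch

# Assumed shapes; the notebook works with its own batch
logits = torch.randn(2, 3, 50257)            # (batch_size, num_tokens, vocab_size)
targets = torch.randint(0, 50257, (2, 3))    # (batch_size, num_tokens)

logits_flat = logits.flatten(0, 1)           # (batch_size * num_tokens, vocab_size)
targets_flat = targets.flatten()             # (batch_size * num_tokens,)

loss = torch.nn.functional.cross_entropy(logits_flat, targets_flat)
print(loss)
```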
@@ -709,8 +709,8 @@
"id": "0f15ce17-fd7b-4d8e-99da-b237523a7a80",
"metadata": {},
"source": [
- "- A concept related to the cross entropy loss is the perplexity of an LLM\n",
- "- The perplexity is simply the exponential of the cross entropy loss"
+ "- A concept related to the cross-entropy loss is the perplexity of an LLM\n",
+ "- The perplexity is simply the exponential of the cross-entropy loss"
]
},
{
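The relationship between the two quantities in a one-liner, using the loss value quoted earlier in the diff:

```python
import torch

loss = torch.tensor(10.7722)     # cross-entropy loss from the example above
perplexity = torch.exp(loss)     # roughly 47,700
print(perplexity)
```

Intuitively, this perplexity says the model is about as uncertain as if it had to pick uniformly among that many vocabulary entries at each step.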
@@ -1077,7 +1077,7 @@
"id": "5c3085e8-665e-48eb-bb41-cdde61537e06",
"metadata": {},
"source": [
- "- Next, we implement a utility function to calculate the cross entropy loss of a given batch\n",
+ "- Next, we implement a utility function to calculate the cross-entropy loss of a given batch\n",
"- In addition, we implement a second utility function to compute the loss for a user-specified number of batches in a data loader"
]
},
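A sketch of what these two utilities might look like, assuming a model that maps token-ID inputs of shape (batch_size, num_tokens) to logits of shape (batch_size, num_tokens, vocab_size); the names `calc_loss_batch` and `calc_loss_loader` and the exact signatures are illustrative rather than a verbatim copy of the notebook's code:

```python
import torch

def calc_loss_batch(input_batch, target_batch, model, device):
    # Cross-entropy loss for a single (input, target) batch
    input_batch = input_batch.to(device)
    target_batch = target_batch.to(device)
    logits = model(input_batch)
    loss = torch.nn.functional.cross_entropy(
        logits.flatten(0, 1), target_batch.flatten()
    )
    return loss

def calc_loss_loader(data_loader, model, device, num_batches=None):
    # Average loss over a user-specified number of batches in a data loader
    total_loss = 0.0
    if len(data_loader) == 0:
        return float("nan")
    if num_batches is None:
        num_batches = len(data_loader)
    else:
        num_batches = min(num_batches, len(data_loader))
    for i, (input_batch, target_batch) in enumerate(data_loader):
        if i < num_batches:
            loss = calc_loss_batch(input_batch, target_batch, model, device)
            total_loss += loss.item()
        else:
            break
    return total_loss / num_batches
```

The `num_batches` argument is what makes the second utility cheap to call on a subset of a large data loader, e.g. for periodic evaluation during training.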