
Best practices for normalizing training, validation, and test sets

Problem Detail: 

I was reading up on how to normalize my training, validation, and test sets for a neural network, when I read this snippet:

An important point to make about the preprocessing is that any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. Instead, the mean must be computed only over the training data and then subtracted equally from all splits (train/val/test).


Does this mean the following?

  1. Split my training set T into training set T1 & validation set V1
  2. Compute the mean and variance of T1: mean_T1 and var_T1
  3. Normalize T1, V1, and my testing set with mean_T1, var_T1.
  4. Train & test accordingly...
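
The four steps above can be sketched in NumPy roughly as follows. This is only an illustration under stated assumptions: the random data, the 80/20 split, the array names `T`, `test`, `T1_n`, `V1_n`, `test_n`, and the small `1e-8` term guarding against division by zero are all made up for the example and do not come from the quoted snippet.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 training examples and 20 test examples, 3 features each.
T = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# 1. Split T into a training set T1 and a validation set V1 (80/20 is an assumption).
perm = rng.permutation(len(T))
T1, V1 = T[perm[:80]], T[perm[80:]]

# 2. Compute the statistics from T1 only.
mean_T1 = T1.mean(axis=0)
var_T1 = T1.var(axis=0)
std_T1 = np.sqrt(var_T1 + 1e-8)  # small constant avoids division by zero

# 3. Normalize every split with the T1 statistics.
T1_n = (T1 - mean_T1) / std_T1
V1_n = (V1 - mean_T1) / std_T1
test_n = (test - mean_T1) / std_T1

# 4. Train on T1_n, tune on V1_n, and report results on test_n.
```

Only `T1_n` is guaranteed to have zero mean and unit variance per feature. `V1_n` and `test_n` will usually be close to that but not exactly there, which is expected.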


Asked By : sir_thursday
Answered By : D.W.

Yes, that's what it means. In effect, mean_T1 and var_T1 become part of the model that you're learning. Just as you fit the model's parameters using only the training set, you compute the mean and variance using only the training set, and then apply those fixed values unchanged to the validation and test data.
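
One way to picture the statistics as "part of the model" is a small object that learns them once and then reuses them. The sketch below is a hypothetical helper, not an existing library class: the name `Standardizer`, its `fit`/`transform` methods, and the `eps` parameter are assumptions made for illustration.

```python
import numpy as np


class Standardizer:
    """Hypothetical helper that stores the mean/variance learned from the training split."""

    def __init__(self, eps=1e-8):
        self.eps = eps      # guards against division by zero for constant features
        self.mean_ = None
        self.var_ = None

    def fit(self, X):
        # Learn the statistics. Call this with the training split only.
        self.mean_ = X.mean(axis=0)
        self.var_ = X.var(axis=0)
        return self

    def transform(self, X):
        # Apply the stored statistics to any split without recomputing them.
        if self.mean_ is None:
            raise RuntimeError("call fit() on the training split first")
        return (X - self.mean_) / np.sqrt(self.var_ + self.eps)


# Toy training split with two features.
train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
scaler = Standardizer().fit(train)

# A later validation/test point is scaled with the stored training statistics.
new_point = np.array([[3.0, 30.0]])
new_n = scaler.transform(new_point)
```

Because the statistics live inside the fitted object, they can be saved and shipped together with the trained network, so data seen at prediction time is scaled exactly as the training data was.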
