Definition
In machine learning, a parameter of a model is typically derived THROUGH training,
while a hyperparameter is a parameter that controls the training itself: it may specify the type or the complexity of the model, set the decision threshold (such as the one I mentioned when generating the ROC curve), choose the regularization (such as Lasso or Ridge), and so on. A hyperparameter is predetermined BEFORE training starts.
(Parameters are obtained through training; hyperparameters are predetermined.)
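As an illustration of the definition, here is a minimal sketch, assuming scikit-learn and a made-up dataset: the regularization strength C of a logistic regression and the decision threshold are hyperparameters fixed outside of training, while coef_ and intercept_ are regular parameters learned by fitting.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, random_state=0)  # illustrative data

    C = 0.5          # hyperparameter: regularization strength, chosen before training
    threshold = 0.3  # hyperparameter: decision threshold, not learned from the data

    model = LogisticRegression(C=C).fit(X, y)   # coef_ and intercept_ are learned parameters
    proba = model.predict_proba(X)[:, 1]
    labels = (proba >= threshold).astype(int)   # threshold applied on top of the trained model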
Example
One example is the polynomial model. The maximum degree of a polynomial model is a hyperparameter. It decides the complexity of the model (1 being linear, 2 being quadratic, 3 being cubic, etc.). You need to specify it before training and keep it fixed during training. The coefficients and the bias, however, are regular parameters that need to be derived through training.
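A minimal sketch of this polynomial example, assuming numpy and scikit-learn with made-up data: degree is the hyperparameter specified before training, and the fitted coefficients and intercept are the regular parameters derived through training.

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=(100, 1))
    y = 1.5 * x[:, 0] ** 2 - 2.0 * x[:, 0] + rng.normal(scale=0.5, size=100)  # illustrative data

    degree = 2                                       # hyperparameter: fixed before training
    features = PolynomialFeatures(degree=degree, include_bias=False)
    model = LinearRegression().fit(features.fit_transform(x), y)

    print(model.coef_, model.intercept_)             # regular parameters: derived through training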
A hyperparameter is typically evaluated AFTER a few rounds of training, on the validation dataset. A validation set is obtained by splitting the data 3:1:1 prior to training, purely at random: 3 parts for training, 1 for validation (to adjust models and hyperparameters), and 1 for test (to perform the final performance evaluation of the models).
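One way to realize such a 3:1:1 (i.e. 60/20/20) random split is sketched below, assuming scikit-learn's train_test_split and a made-up dataset; applying the split twice yields the three subsets.

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(100, 5)        # illustrative feature matrix
    y = np.random.randint(0, 2, 100)  # illustrative labels

    # first split: 60% training, 40% held out
    X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.4, random_state=0)
    # second split: halve the held-out 40% into 20% validation and 20% test
    X_val, X_test, y_val, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)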
Lastly, in the cross-entropy loss section, there is a \theta in the function h(x_i). That \theta is a normal parameter, since it will be updated by minimizing the value of the loss function. The loss function is typically NOT used as a performance measure (although you can indeed report the loss value on inputs and outputs from the test set as a part of the model evaluation). Instead, it is a part of the machine learning model that is crucial for training. The normal parameter (i.e. not a hyperparameter) is an integral part of the loss function and will be updated to minimize it.
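To make the role of \theta concrete, here is a minimal numpy sketch, assuming h(x_i) is the logistic (sigmoid) model from that section and made-up data: the cross-entropy loss is a function of \theta, and \theta is the quantity updated by gradient descent to minimize it (the learning rate lr, in contrast, is a hyperparameter).

    import numpy as np

    def h(X, theta):
        return 1.0 / (1.0 + np.exp(-X @ theta))      # model prediction, parameterized by theta

    def cross_entropy(X, y, theta):
        p = h(X, theta)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def gradient_step(X, y, theta, lr=0.1):          # lr is a hyperparameter (learning rate)
        grad = X.T @ (h(X, theta) - y) / len(y)      # gradient of the loss w.r.t. theta
        return theta - lr * grad                     # theta is updated to minimize the loss

    X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.3]])  # made-up design matrix with a bias column
    y = np.array([1.0, 0.0, 1.0])
    theta = np.zeros(2)                              # normal parameter, initialized then trained
    for _ in range(100):
        theta = gradient_step(X, y, theta)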