Training Identification Nets
This example shows how to train networks with an identification loss on popular datasets.
The objective of training an identification net is to learn a good feature representation for persons. If the features of the same person are similar while the features of different people are dissimilar, then retrieving a target person from a gallery database becomes easy.
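To make the retrieval idea concrete, here is a minimal sketch of ranking a gallery by feature distance. The 2-D feature values, the identity labels, and the helper names `euclidean` and `rank_gallery` are made up for illustration; real features would come from the trained network, and Euclidean distance is only one possible choice of metric.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))


# Toy features (hypothetical values): gallery entries 0 and 2 share the
# query's identity "A", so a good representation places them closest.
gallery_features = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.0, 1.0]]
gallery_ids = ["A", "B", "A", "B"]
query_feature = [1.0, 0.0]

ranking = rank_gallery(query_feature, gallery_features)
ranked_ids = [gallery_ids[i] for i in ranking]
print(ranked_ids)
```

When same-identity features cluster together like this, the correct matches surface at the top of the ranking without any further processing.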
Different loss functions can be adopted for this purpose, for example:
- Softmax cross entropy loss [zheng2016person] [xiao2016learning]
- Triplet loss [hermans2017in]
- Online instance matching (OIM) loss [xiaoli2017joint]
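As a rough illustration of the first two losses, the sketch below computes them for a single sample in plain Python. It is not the repository's implementation: the function names and the margin value of 0.5 are assumptions chosen for the example, and the OIM loss is not sketched here.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true identity under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The margin of 0.5 is an arbitrary illustrative value.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)


# Uniform logits over 4 identities give a loss of ln(4).
print(softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], label=1))
# The positive coincides with the anchor and the negative is far away: zero loss.
print(triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]))
```

The softmax loss treats each identity as a class, whereas the triplet loss acts directly on distances between features.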
Head First Example
After cloning the repository, we can start by training an Inception net on VIPeR from scratch with a softmax loss:
python examples/softmax_loss.py -d viper -b 64 -j 2 -a inception --logs-dir logs/softmax-loss/viper-inception
This script automatically downloads the VIPeR dataset and starts training, with a batch size of 64 and two processes for data loading. Softmax cross entropy is used as the loss function. The training log is printed to the screen and also saved to logs/softmax-loss/viper-inception/log.txt. When training ends, the script evaluates the best model (the one with the best validation performance) on the test set and reports several commonly used metrics.
Training Options
Many training options are available through command line arguments. See all the options by running python examples/softmax_loss.py -h. Here we elaborate on several commonly used options.
Datasets
Specify the dataset by -d name, where name can currently be one of cuhk03, cuhk01, market1501, dukemtmc, and viper. For datasets that cannot be downloaded automatically, running the script will raise an error with a link to the dataset. You may need to download it manually and put it in the directory given by the error message.
Model Architectures
Specify the model architecture by -a name, where name can currently be one of resnet18, resnet34, resnet50, resnet101, resnet152, and inception. For resnet*, running the scripts will download an ImageNet pretrained model automatically and then finetune from it. For inception, the scripts train the net from scratch.
Multi-GPU and Batch Size
All the examples support data parallel training on multiple GPUs. By default, the program uses all the GPUs listed by nvidia-smi. To control which GPUs are used, set the environment variable CUDA_VISIBLE_DEVICES before running the Python script. For example:
# 4 GPUs, with effective batch size of 256
CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/softmax_loss.py -d viper -b 256 --lr 0.1 ...
# 1 GPU, reduce the batch size to 64, lr to 0.025
CUDA_VISIBLE_DEVICES=0 python examples/softmax_loss.py -d viper -b 64 --lr 0.025 ...
Note that the effective batch size specified by the -b option is automatically divided by the number of GPUs. For example, 4 GPUs with -b 256 will have 64 minibatch samples on each one.
In the second command above, we reduce both the batch size and the initial learning rate to 1/4 of their original values, in order to adapt the 4-GPU setting to a single GPU.
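The arithmetic behind these two commands can be sketched as follows. The helper names are hypothetical, and the divisibility check is an assumption of this sketch rather than documented behavior of the scripts. The learning rate is scaled in proportion to the batch size, which reproduces the numbers used above.

```python
def per_gpu_batch_size(effective_batch_size, num_gpus):
    """Samples each GPU receives when the -b value is split evenly."""
    if effective_batch_size % num_gpus != 0:
        # Assumption of this sketch: require an even split.
        raise ValueError("batch size should be divisible by the number of GPUs")
    return effective_batch_size // num_gpus


def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Scale the learning rate in proportion to the effective batch size."""
    return base_lr * new_batch_size / base_batch_size


# 4 GPUs with -b 256: each GPU sees 64 samples per minibatch.
print(per_gpu_batch_size(256, 4))
# Moving to 1 GPU with -b 64: lr shrinks from 0.1 by the same factor of 4.
print(scale_learning_rate(0.1, 256, 64))
```

Keeping the per-GPU minibatch at 64 in both settings means the memory footprint on each device stays the same.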
Resume from Checkpoints
After each training epoch, the script saves the latest checkpoint.pth.tar in the specified logs directory, and updates model_best.pth.tar if the model achieves the best validation performance so far. To resume from this checkpoint, run the script with --resume /path/to/checkpoint.pth.tar.
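The bookkeeping described above can be sketched in a few lines. This is an illustration only: JSON stands in for real model serialization, the validation scores are made up, and `save_checkpoint` is a hypothetical helper rather than the repository's function. Only the two file names come from the text above.

```python
import json
import os
import shutil
import tempfile


def save_checkpoint(state, is_best, logs_dir):
    """Overwrite the latest checkpoint; copy it to model_best if it is the best."""
    os.makedirs(logs_dir, exist_ok=True)
    path = os.path.join(logs_dir, "checkpoint.pth.tar")
    with open(path, "w") as f:
        json.dump(state, f)  # stand-in for real model serialization
    if is_best:
        shutil.copyfile(path, os.path.join(logs_dir, "model_best.pth.tar"))
    return path


logs_dir = tempfile.mkdtemp()
best_score = float("-inf")
for epoch, val_score in enumerate([0.30, 0.45, 0.40]):  # made-up validation scores
    is_best = val_score > best_score
    best_score = max(best_score, val_score)
    save_checkpoint({"epoch": epoch, "val_score": val_score}, is_best, logs_dir)
```

After this loop, the latest checkpoint holds the last epoch, while the best-model file still holds the epoch with the highest validation score.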
Evaluate a Trained Model
To evaluate a trained model, run the script with --resume /path/to/model_best.pth.tar --evaluate. Note that different evaluation metrics, especially different versions of CMC, can lead to drastically different numbers.
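For reference, one simple CMC variant can be sketched as below: for each query, the gallery is ranked by distance, and rank-k accuracy is the fraction of queries whose first correct match appears within the top k. This is an illustrative sketch with made-up distances, not the repository's evaluation code. Other variants differ, for instance, in how multiple gallery images of the same identity are handled, which is one reason reported numbers are hard to compare.

```python
def cmc(dist_matrix, query_ids, gallery_ids, topk=3):
    """Return [rank-1, ..., rank-topk] accuracy for one simple CMC variant."""
    hits = [0] * topk
    valid_queries = 0
    for q, row in enumerate(dist_matrix):
        # Gallery indices ordered from smallest to largest distance.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if True not in matches:
            continue  # this query's identity does not appear in the gallery
        valid_queries += 1
        first_hit = matches.index(True)
        for k in range(first_hit, topk):
            hits[k] += 1
    if valid_queries == 0:
        raise ValueError("no query identity appears in the gallery")
    return [h / valid_queries for h in hits]


# Made-up distances: 2 queries (rows) against 3 gallery images (columns).
dist = [[0.2, 0.9, 0.5],
        [0.1, 0.6, 0.3]]
scores = cmc(dist, query_ids=[1, 2], gallery_ids=[1, 3, 2])
print(scores)
```

Here the first query is matched at rank 1 and the second at rank 2, so rank-1 accuracy is 0.5 and rank-2 accuracy is 1.0.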
Tips and Tricks
Training a baseline network can be tricky. Many options and parameters could (significantly) affect the reported performance number. Here we list some tips and tricks for experiments.
- Combine train and val
  One can first use separate training and validation sets to tune the hyperparameters, then fix the hyperparameters and combine both sets to train a final model. This can be done by appending the option --combine-trainval, and could lead to much better performance on the test set.
- Input size
  A larger input image size could benefit performance, depending on the network architecture. You may specify it with --height and --width. By default, we use 256x128 for resnet* and 144x56 for inception.
- Multi-scale multi-crop test
  Using multi-scale multi-crop at test time usually improves performance, but it slows down evaluation significantly. We have not implemented this yet.
- Classifier initialization for softmax cross entropy loss
  We found that initializing the softmax classifier weights from a normal distribution with std=0.001 generally leads to better performance. It is also important to use a larger learning rate for the classifier if the underlying CNN is already pretrained.
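One way to see the effect of the small-std classifier initialization is sketched below in plain Python. This is an illustration, not the authors' stated rationale or the repository's code: the dimensions, the all-ones feature vector, and the helper names are assumptions. With weights drawn at std=0.001, the initial logits are close to zero, so the softmax starts near uniform and the initial loss is close to ln(num_classes).

```python
import math
import random


def init_classifier(num_classes, feat_dim, std=0.001, seed=0):
    """Draw a num_classes x feat_dim weight matrix from N(0, std^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(feat_dim)]
            for _ in range(num_classes)]


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[label]


num_classes, feat_dim = 10, 8          # hypothetical sizes
weights = init_classifier(num_classes, feat_dim)
feature = [1.0] * feat_dim             # hypothetical input feature
logits = [sum(w * x for w, x in zip(row, feature)) for row in weights]
loss = softmax_cross_entropy(logits, label=3)
print(loss, math.log(num_classes))
```

A much larger std would spread the initial logits widely, so the classifier would begin training with confident but arbitrary predictions.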
References
[zheng2016person]
[xiao2016learning]
[xiaoli2017joint]
[hermans2017in]