1. Use cuDNN 5.1 (cuDNN 6 is ~10% slower).
2. Reduce the `--net_resolution` (e.g. to 320x176), at the cost of lower accuracy. For example, one measurement went from 0.2 FPS at 656x368 to 0.9 FPS at 320x176.
3. For face, reduce the `--face_net_resolution`. A resolution of 320x320 usually works reasonably well.
4. Use the `MPI_4_layers` model (lower accuracy and fewer body parts).
5. Replace GPU rendering with CPU rendering (`--render_pose 1`) to gain approximately 0.5 FPS.
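
As a rough illustration, several of the body-pose tips above can be combined in a single invocation. This is only a sketch: the binary path `./build/examples/openpose/openpose.bin` is an assumption about a typical Linux build and may differ on your system, and `--model_pose` is assumed to be the flag that selects the `MPI_4_layers` model. The actual speed-up depends on your GPU and input.

```shell
# Assumed path to the OpenPose demo binary; adjust it to match your build.
OPENPOSE_BIN="./build/examples/openpose/openpose.bin"

# Speed-oriented flags from the list above:
#   lower network resolution, lighter MPI model, CPU rendering.
OPENPOSE_ARGS="--net_resolution 320x176 --model_pose MPI_4_layers --render_pose 1"

if [ -x "$OPENPOSE_BIN" ]; then
    # Left unquoted on purpose so the flags are split into separate arguments.
    "$OPENPOSE_BIN" $OPENPOSE_ARGS
else
    echo "OpenPose binary not found at $OPENPOSE_BIN" >&2
fi
```

Adding the flags one at a time and noting the FPS after each change makes it easier to see which trade-off is worth its accuracy cost on your hardware.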