深度学习推荐阅读的论文

最新推荐文章于 2022-04-01 09:00:00 发布

kexinxin1

最新推荐文章于 2022-04-01 09:00:00 发布

阅读量1.6k

点赞数 2

本文链接：https://blog.csdn.net/kexinxin1/article/details/93013863

版权

本文精选了深度学习领域的关键论文，涵盖从基础概念到高级应用，以及硬件加速器设计的重要进展。包括LeCun等人的深度学习综述、LeNet、AlexNet等经典模型，以及Google TPU、微软Adam项目等高性能硬件解决方案。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

Papers to Read

General Introduction

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.nature14539.pdf

[This is a general introduction by three towering figures of the field]

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.LeNet 00726791.pdf

[This is the original LeNet by Yann Le Cun]
Graves, A., Mohamed, A. R., & Hinton, G. (2013, May). Speech recognition with deep recurrent neural networks. In Acoustics, speech and signal processing (icassp), 2013 ieee international conference on (pp. 6645-6649). IEEE.Speech RNN Hinton 06638947.pdf

[This work boosted MicroSoft's Speech Technology]

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems(pp. 3104-3112).sequence-to-sequence-learning-with-neural-networks NIPS 2014 .pdf

[This leads to Google's better speech understanding, Gmail answers, ..]

Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807-814).ReLU icml2010_NairH10.pdf

[ReLU is better than Sigmoid dealing with the vanishing gradient problem]

Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of machine learning research, 15(1), 1929-1958.Dropout srivastava14a.pdf

[Dropout (Brain-Damage) gives robust net work]

Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456).Batch Normalization icml2015_ioffe15.pdf

[This technique makes training faster]
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).Generative Adversarial Nets.pdf

[Many said GAN is the most important paper over the past few years]

ImageNet Challenge Winners

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).AlexNet-imagenet-classification-with-deep-convolutional-neural-networks.pdf [AlexNet]
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. VGGNet 1409.1556.pdf [VGGNet]

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015, June). Going deeper with convolutions. CVPR 2015 GoogLeNet Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf [GoogLeNet]

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).ResNet He_Deep_Residual_Learning_CVPR_2016_paper.pdf [ResNet]

Diannoa Family

Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y., & Temam, O. (2014, February). Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ACM Sigplan Notices (Vol. 49, No. 4, pp. 269-284). ACM.DianNao p269-chen.pdf

Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., ... & Temam, O. (2014, December). Dadiannao: A machine-learning supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 609-622). IEEE Computer Society.DadianNao p609-chen.pdf

Liu, D., Chen, T., Liu, S., Zhou, J., Zhou, S., Teman, O., ... & Chen, Y. (2015, March). Pudiannao: A polyvalent machine learning accelerator. In ACM SIGARCH Computer Architecture News (Vol. 43, No. 1, pp. 369-381). ACM.

Du, Z., Fasthuber, R., Chen, T., Ienne, P., Li, L., Luo, T., ... & Temam, O. (2015, June). ShiDianNao: Shifting vision processing closer to the sensor. In ACM SIGARCH Computer Architecture News (Vol. 43, No. 3, pp. 92-104). ACM.ShiDiannaop92-du.pdf

Chen, Y., Chen, T., Xu, Z., Sun, N., & Temam, O. (2016). DianNao family: energy-efficient hardware accelerators for machine learning. Communications of the ACM, 59(11), 105-112.DianNao Family p105-che.pdf

Liu, S., Du, Z., Tao, J., Han, D., Luo, T., Xie, Y., ... & Chen, T. (2016, June). Cambricon: An instruction set architecture for neural networks. In Proceedings of the 43rd International Symposium on Computer Architecture(pp. 393-405). IEEE Press.Cambricon 07551409.pdf

Zhang, S., Du, Z., Zhang, L., Lan, H., Liu, S., Li, L., ... & Chen, Y. (2016, October). Cambricon-X: An accelerator for sparse neural networks. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE.Cabricon X 07783723.pdf

Lu, W., Yan, G., Li, J., Gong, S., Han, Y., & Li, X. (2017, February). FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on (pp. 553-564). IEEE.Flexflow HPCA2017 07920855.pdf

Vivienne Sze (MIT)

Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. (2017). Efficient processing of deep neural networks: A tutorial and survey. arXiv preprint arXiv:1703.09039.MIT Sze Survey 1703.09039.pdf

Chen, Y. H., Krishna, T., Emer, J. S., & Sze, V. (2017). Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1), 127-138.Eyeriss 07738524.pdf

Chen, Y. H., Emer, J., & Sze, V. (2017). Using Dataflow to Optimize Energy Efficiency of Deep Neural Network Accelerators. IEEE Micro, 37(3), 12-21.MIT DataFlow IEEE Micro 07948671.pdf

Han Song and Bill Dally (Stanford)

Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv preprint arXiv:1602.07360.SqueezNet 1602.07360.pdf

Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149.Deep Compression 1510.00149.pdf

Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems (pp. 1135-1143).Han Song 5784-learning-both-weights-and-connections-for-efficient-neural-network.pdf

Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M. A., & Dally, W. J. (2016, June). EIE: efficient inference engine on compressed deep neural network. In Proceedings of the 43rd International Symposium on Computer Architecture (pp. 243-254). IEEE Press.EIE p243-han.pdf

Han, S., Kang, J., Mao, H., Hu, Y., Li, X., Li, Y., ... & Yang, H. (2017, February). ESE: Efficient speech recognition engine with sparse lstm on fpga. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 75-84). ACM.ESE Sparse LSTM FPGA.pdf

More Compression Approaches

Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 27-40. DOI: https://doi.org/10.1145/3079856.3080254SCNN ISCA 2017 p27-Parashar.pdf

Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N. E., & Moshovos, A. (2016, June). Cnvlutin: ineffectual-neuron-free deep neural network computing. In Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on (pp. 1-13). IEEE.Cnvlutin 07551378.pdf

Judd, P., Delmas, A., Sharify, S., & Moshovos, A. (2017). Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing. arXiv preprint arXiv:1705.00125.Cnvlutin2 1705.00125.pdf

Google TPU

Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., ... & Boyle, R. (2017). In-datacenter performance analysis of a tensor processing unit. arXiv preprint arXiv:1704.04760.TPU ISCA 2017 1704.04760.pdf

Microsoft

Chilimbi, T. M., Suzue, Y., Apacible, J., & Kalyanaraman, K. (2014, October). Project Adam: Building an Efficient and Scalable Deep Learning Training System. In OSDI (Vol. 14, pp. 571-582).Adam osdi14-paper-chilimbi.pdf

Ovtcharov, K., Ruwase, O., Kim, J. Y., Fowers, J., Strauss, K., & Chung, E. S. (2015). Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper, 2(11).15 DCNN hardware.pdf

Putnam, A., Caulfield, A. M., Chung, E. S., Chiou, D., Constantinides, K., Demme, J., ... & Haselman, M. (2014, June). A reconfigurable fabric for accelerating large-scale datacenter services. In Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on (pp. 13-24). IEEE.microsoft catapult 2014 06853195.pdf

FPGA

Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., & Cong, J. (2015, February). Optimizing fpga-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 161-170). ACM.UCLA Cong p161-zhang.pdf

Sharma, H., Park, J., Mahajan, D., Amaro, E., Kim, J. K., Shao, C., ... & Esmaeilzadeh, H. (2016, October). From high-level deep neural models to FPGAs. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE.HL DNN to FPGA 2016 Micro 07783720.pdf

Suda, N., Chandra, V., Dasika, G., Mohanty, A., Ma, Y., Vrudhula, S., ... & Cao, Y. (2016, February). Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 16-25). ACM.OpenCL FPGA p16-suda.pdf

Wang, C., Gong, L., Yu, Q., Li, X., Xie, Y., & Zhou, X. (2017). DLAU: A scalable deep learning accelerator unit on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 36(3), 513-517.DLAU Yuan Xie 07505926.pdf

Peemen, M., Setio, A. A., Mesman, B., & Corporaal, H. (2013, October). Memory-centric accelerator design for convolutional neural networks. In Computer Design (ICCD), 2013 IEEE 31st International Conference on (pp. 13-19). IEEE.Memory Centric CNN 06657019.pdf

Various Acceleration Approaches

Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). Deep learning with limited numerical precision. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 1737-1746).Num Precision gupta15.pdf

Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., & Bengio, Y. (2016). Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830.Binary 1602.02830.pdf

Vanhoucke, V., Senior, A., & Mao, M. Z. (2011, December). Improving the speed of neural networks on CPUs. In Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop (Vol. 1, p. 4).CPU Improvement Google VanhouckeNIPS11.pdf

Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, and Anand Raghunathan. 2017. ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 13-26. DOI: https://doi.org/10.1145/3079856.3080244ScaleDeep ISCA 2017 p13-Venkataramani.pdf

Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, and Scott Mahlke. 2017. Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 548-560. DOI: https://doi.org/10.1145/3079856.3080215Scalpel ISCA 2017 p548-Yu.pdf

Judd, P., Albericio, J., Hetherington, T., Aamodt, T. M., & Moshovos, A. (2016, October). Stripes: Bit-serial deep neural network computing. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE.Stripes bit serial Micro 2016 07783722.pdf
Kim, Y. D., Park, E., Yoo, S., Choi, T., Yang, L., & Shin, D. (2015). Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv preprint arXiv:1511.06530.Yoo Compressed CNN 1511.06530.pdf

Song, L., Wang, Y., Han, Y., Zhao, X., Liu, B., & Li, X. (2016, June). C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. In Design Automation Conference (DAC), 2016 53nd ACM/EDAC/IEEE (pp. 1-6). IEEE.C-Brain.pdf

Bang, S., Wang, J., Li, Z., Gao, C., Kim, Y., Dong, Q., ... & Mudge, T. (2017, February). 14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence. In Solid-State Circuits Conference (ISSCC), 2017 IEEE International (pp. 250-251). IEEE.Michigan 2017 ISSCC.pdf

Song, L., Qian, X., Li, H., & Chen, Y. (2017, February). PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on (pp. 541-552). IEEE.PipeLayer.pdf

Aydonat, U., O'Connell, S., Capalija, D., Ling, A. C., & Chiu, G. R. (2017). An OpenCL (TM) Deep Learning Accelerator on Arria 10. arXiv preprint arXiv:1701.03534.OpenCL Toronto.pdf

Venkatesh, G., Nurvitadhi, E., & Marr, D. (2017, March). Accelerating Deep Convolutional Networks using low-precision and sparsity. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on(pp. 2861-2865). IEEE.Sparsity Intel.pdf

You, Y. (2016). Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA (A thesis) .sparsity Analysis.pdf

Wei, X., Yu, C. H., Zhang, P., Chen, Y., Wang, Y., Hu, H., ... & Cong, J. (2017, June). Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Design Automation Conference (DAC), 2017 54th ACM/EDAC/IEEE (pp. 1-6). IEEE.Systolic Array Synthesis Cong.pdf

Fang, J., Fu, H., Zhao, W., Chen, B., Zheng, W., & Yang, G. (2017, May). swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight. In Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International (pp. 615-624). IEEE.swDNN Taihu Light.pdf

Guan, Y., Yuan, Z., Sun, G., & Cong, J. (2017, January). FPGA-based accelerator for long short-term memory recurrent neural networks. In Design Automation Conference (ASP-DAC), 2017 22nd Asia and South Pacific (pp. 629-634). IEEE.LSTM FPGA Cong.pdf

Edstrom, J., Gong, Y., Chen, D., Wang, J., & Gong, N. (2017). Data-Driven Intelligent Efficient Synaptic Storage for Deep Learning. IEEE Transactions on Circuits and Systems II: Express Briefs.Eff Synap Storage.pdf

Du, L., Du, Y., Li, Y., Su, J., Kuan, Y. C., Liu, C. C., & Chang, M. C. F. (2017). A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things. IEEE Transactions on Circuits and Systems I: Regular Papers.Reconfig DCNN Frank Chang.pdf
Song, L., Wang, Y., Han, Y., Zhao, X., Liu, B., & Li, X. (2016, June). C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. In Design Automation Conference (DAC), 2016 53nd ACM/EDAC/IEEE (pp. 1-6). IEEE.C-Brain.pdf

Bang, S., Wang, J., Li, Z., Gao, C., Kim, Y., Dong, Q., ... & Mudge, T. (2017, February). 14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence. In Solid-State Circuits Conference (ISSCC), 2017 IEEE International (pp. 250-251). IEEE.Michigan 2017 ISSCC.pdf

Song, L., Qian, X., Li, H., & Chen, Y. (2017, February). PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on (pp. 541-552). IEEE.PipeLayer.pdf

Aydonat, U., O'Connell, S., Capalija, D., Ling, A. C., & Chiu, G. R. (2017). An OpenCL (TM) Deep Learning Accelerator on Arria 10. arXiv preprint arXiv:1701.03534.OpenCL Toronto.pdf

Venkatesh, G., Nurvitadhi, E., & Marr, D. (2017, March). Accelerating Deep Convolutional Networks using low-precision and sparsity. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on(pp. 2861-2865). IEEE.Sparsity Intel.pdf

You, Y. (2016). Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA (A thesis) .sparsity Analysis.pdf

Wei, X., Yu, C. H., Zhang, P., Chen, Y., Wang, Y., Hu, H., ... & Cong, J. (2017, June). Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Design Automation Conference (DAC), 2017 54th ACM/EDAC/IEEE (pp. 1-6). IEEE.Systolic Array Synthesis Cong.pdf

Fang, J., Fu, H., Zhao, W., Chen, B., Zheng, W., & Yang, G. (2017, May). swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight. In Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International (pp. 615-624). IEEE.swDNN Taihu Light.pdf

Guan, Y., Yuan, Z., Sun, G., & Cong, J. (2017, January). FPGA-based accelerator for long short-term memory recurrent neural networks. In Design Automation Conference (ASP-DAC), 2017 22nd Asia and South Pacific (pp. 629-634). IEEE.LSTM FPGA Cong.pdf

Edstrom, J., Gong, Y., Chen, D., Wang, J., & Gong, N. (2017). Data-Driven Intelligent Efficient Synaptic Storage for Deep Learning. IEEE Transactions on Circuits and Systems II: Express Briefs.Eff Synap Storage.pdf

Du, L., Du, Y., Li, Y., Su, J., Kuan, Y. C., Liu, C. C., & Chang, M. C. F. (2017). A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things. IEEE Transactions on Circuits and Systems I: Regular Papers.Reconfig DCNN Frank Chang.pdf

Additional References

****************************************************************************************************

Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural networks, 61, 85-117.Schmidhuber Overview NN1-s2.0-S0893608014002135-main.pdf
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., & Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems (pp. 396-404).handwritten-digit-recognition-with-a-back-propagation-network.pdf

Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.RNN Comparison 1412.3555.pdf

Goldberg, Y., & Levy, O. (2014). word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.word2vec explained 1402.3722.pdf

Lin, H. W., Tegmark, M., & Rolnick, D. (2016). Why does deep and cheap learning work so well?. Journal of Statistical Physics, 1-25.Why DL Work10.1007\s10955-017-1836-5.pdf

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec), 3371-3408.vincent10a.pdf

Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8), 1798-1828.Learning Representation 06472238.pdf

Ota, K., Dao, M. S., Mezaris, V., & De Natale, F. G. (2017). Deep Learning for Mobile Multimedia: A Survey. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s), 34.DL mobile survey.pdf

Video

Tutorials by Bill Dally and Song Han (nVidia/Stanford)
- Tutorial: High-Performance Hardware for Machine Learning https://youtu.be/J-GOkwiwg4c
- Efficient Methods and Hardware for Deep Learning https://youtu.be/eZdOkDtYMoo

Tutorials by Vivienne Sze of MIT
- Efficient Processing for Deep Learning: Challenges and Opportunitieshttps://www.youtube.com/watch?v=kYEUkpHpOKA&t=61s
Tutorials by Professor Sungjoo Yoo of SNU

4-1 Deep Learning Algorithms, Optimization Methods, and Hardware Accelerators (Prof. Sungjoo Yoo) https://youtu.be/ebqVpK4c3cw

4-2 Example of Object Detection Result (Prof. Sungjoo Yoo)https://youtu.be/MEgwTaUdmqw

4-3 Convoiution with Matrix Multiplication (Prof. Sungjoo Yoo)https://youtu.be/2ExjsudgDU4

4-4 High Performance Accelerator Architecture (Prof. Sungjoo Yoo)https://youtu.be/hAZ2t0a7rdU