Papers to Read
General Introduction
-
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.nature14539.pdf
[This is a general introduction by three towering figures of the field]
-
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.LeNet 00726791.pdf
[This is the original LeNet by Yann Le Cun]
-
Graves, A., Mohamed, A. R., & Hinton, G. (2013, May). Speech recognition with deep recurrent neural networks. In Acoustics, speech and signal processing (icassp), 2013 ieee international conference on (pp. 6645-6649). IEEE.Speech RNN Hinton 06638947.pdf
[This work boosted MicroSoft's Speech Technology]
-
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems(pp. 3104-3112).sequence-to-sequence-learning-with-neural-networks NIPS 2014 .pdf
[This leads to Google's better speech understanding, Gmail answers, ..]
-
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807-814).ReLU icml2010_NairH10.pdf
[ReLU is better than Sigmoid dealing with the vanishing gradient problem]
-
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of machine learning research, 15(1), 1929-1958.Dropout srivastava14a.pdf
[Dropout (Brain-Damage) gives robust net work]
-
Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456).Batch Normalization icml2015_ioffe15.pdf
[This technique makes training faster]
-
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).Generative Adversarial Nets.pdf
[Many said GAN is the most important paper over the past few years]
ImageNet Challenge Winners
-
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).AlexNet-imagenet-classification-with-deep-convolutional-neural-networks.pdf [AlexNet]
-
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. VGGNet 1409.1556.pdf [VGGNet]
-
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015, June). Going deeper with convolutions. CVPR 2015 GoogLeNet Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf [GoogLeNet]
-
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).ResNet He_Deep_Residual_Learning_CVPR_2016_paper.pdf [ResNet]
Diannoa Family
-
Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y., & Temam, O. (2014, February). Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ACM Sigplan Notices (Vol. 49, No. 4, pp. 269-284). ACM.DianNao p269-chen.pdf
-
Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., ... & Temam, O. (2014, December). Dadiannao: A machine-learning supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 609-622). IEEE Computer Society.DadianNao p609-chen.pdf
-
Liu, D., Chen, T., Liu, S., Zhou, J., Zhou, S., Teman, O., ... & Chen, Y. (2015, March). Pudiannao: A polyvalent machine learning accelerator. In ACM SIGARCH Computer Architecture News (Vol. 43, No. 1, pp. 369-381). ACM.
-
Du, Z., Fasthuber, R., Chen, T., Ienne, P., Li, L., Luo, T., ... & Temam, O. (2015, June). ShiDianNao: Shifting vision processing closer to the sensor. In ACM SIGARCH Computer Architecture News (Vol. 43, No. 3, pp. 92-104). ACM.ShiDiannaop92-du.pdf
-
Chen, Y., Chen, T., Xu, Z., Sun, N., & Temam, O. (2016). DianNao family: energy-efficient hardware accelerators for machine learning. Communications of the ACM, 59(11), 105-112.DianNao Family p105-che.pdf
-
Liu, S., Du, Z., Tao, J., Han, D., Luo, T., Xie, Y., ... & Chen, T. (2016, June). Cambricon: An instruction set architecture for neural networks. In Proceedings of the 43rd International Symposium on Computer Architecture(pp. 393-405). IEEE Press.Cambricon 07551409.pdf
-
Zhang, S., Du, Z., Zhang, L., Lan, H., Liu, S., Li, L., ... & Chen, Y. (2016, October). Cambricon-X: An accelerator for sparse neural networks. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE.Cabricon X 07783723.pdf
-
Lu, W., Yan, G., Li, J., Gong, S., Han, Y., & Li, X. (2017, February). FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on (pp. 553-564). IEEE.Flexflow HPCA2017 07920855.pdf
Vivienne Sze (MIT)
-
Sze, V., Chen, Y. H., Yang, T. J., & Emer, J. (2017). Efficient processing of deep neural networks: A tutorial and survey. arXiv preprint arXiv:1703.09039.MIT Sze Survey 1703.09039.pdf
-
Chen, Y. H., Krishna, T., Emer, J. S., & Sze, V. (2017). Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits, 52(1), 127-138.Eyeriss 07738524.pdf
-
Chen, Y. H., Emer, J., & Sze, V. (2017). Using Dataflow to Optimize Energy Efficiency of Deep Neural Network Accelerators. IEEE Micro, 37(3), 12-21.MIT DataFlow IEEE Micro 07948671.pdf
Han Song and Bill Dally (Stanford)
-
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv preprint arXiv:1602.07360.SqueezNet 1602.07360.pdf
-
Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149.Deep Compression 1510.00149.pdf
-
Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems (pp. 1135-1143).Han Song 5784-learning-both-weights-and-connections-for-efficient-neural-network.pdf
-
Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M. A., & Dally, W. J. (2016, June). EIE: efficient inference engine on compressed deep neural network. In Proceedings of the 43rd International Symposium on Computer Architecture (pp. 243-254). IEEE Press.EIE p243-han.pdf
-
Han, S., Kang, J., Mao, H., Hu, Y., Li, X., Li, Y., ... & Yang, H. (2017, February). ESE: Efficient speech recognition engine with sparse lstm on fpga. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 75-84). ACM.ESE Sparse LSTM FPGA.pdf
More Compression Approaches
-
Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 27-40. DOI: https://doi.org/10.1145/3079856.3080254SCNN ISCA 2017 p27-Parashar.pdf
-
Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N. E., & Moshovos, A. (2016, June). Cnvlutin: ineffectual-neuron-free deep neural network computing. In Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on (pp. 1-13). IEEE.Cnvlutin 07551378.pdf
-
Judd, P., Delmas, A., Sharify, S., & Moshovos, A. (2017). Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing. arXiv preprint arXiv:1705.00125.Cnvlutin2 1705.00125.pdf
Google TPU
-
Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., ... & Boyle, R. (2017). In-datacenter performance analysis of a tensor processing unit. arXiv preprint arXiv:1704.04760.TPU ISCA 2017 1704.04760.pdf
Microsoft
-
Chilimbi, T. M., Suzue, Y., Apacible, J., & Kalyanaraman, K. (2014, October). Project Adam: Building an Efficient and Scalable Deep Learning Training System. In OSDI (Vol. 14, pp. 571-582).Adam osdi14-paper-chilimbi.pdf
-
Ovtcharov, K., Ruwase, O., Kim, J. Y., Fowers, J., Strauss, K., & Chung, E. S. (2015). Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper, 2(11).15 DCNN hardware.pdf
-
Putnam, A., Caulfield, A. M., Chung, E. S., Chiou, D., Constantinides, K., Demme, J., ... & Haselman, M. (2014, June). A reconfigurable fabric for accelerating large-scale datacenter services. In Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on (pp. 13-24). IEEE.microsoft catapult 2014 06853195.pdf
FPGA
-
Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., & Cong, J. (2015, February). Optimizing fpga-based accelerator design for deep convolutional neural networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 161-170). ACM.UCLA Cong p161-zhang.pdf
-
Sharma, H., Park, J., Mahajan, D., Amaro, E., Kim, J. K., Shao, C., ... & Esmaeilzadeh, H. (2016, October). From high-level deep neural models to FPGAs. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE.HL DNN to FPGA 2016 Micro 07783720.pdf
-
Suda, N., Chandra, V., Dasika, G., Mohanty, A., Ma, Y., Vrudhula, S., ... & Cao, Y. (2016, February). Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 16-25). ACM.OpenCL FPGA p16-suda.pdf
-
Wang, C., Gong, L., Yu, Q., Li, X., Xie, Y., & Zhou, X. (2017). DLAU: A scalable deep learning accelerator unit on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 36(3), 513-517.DLAU Yuan Xie 07505926.pdf
-
Peemen, M., Setio, A. A., Mesman, B., & Corporaal, H. (2013, October). Memory-centric accelerator design for convolutional neural networks. In Computer Design (ICCD), 2013 IEEE 31st International Conference on (pp. 13-19). IEEE.Memory Centric CNN 06657019.pdf
Various Acceleration Approaches
-
Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). Deep learning with limited numerical precision. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 1737-1746).Num Precision gupta15.pdf
-
Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., & Bengio, Y. (2016). Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830.Binary 1602.02830.pdf
-
Vanhoucke, V., Senior, A., & Mao, M. Z. (2011, December). Improving the speed of neural networks on CPUs. In Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop (Vol. 1, p. 4).CPU Improvement Google VanhouckeNIPS11.pdf
-
Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, and Anand Raghunathan. 2017. ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 13-26. DOI: https://doi.org/10.1145/3079856.3080244ScaleDeep ISCA 2017 p13-Venkataramani.pdf
-
Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, and Scott Mahlke. 2017. Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 548-560. DOI: https://doi.org/10.1145/3079856.3080215Scalpel ISCA 2017 p548-Yu.pdf
-
Judd, P., Albericio, J., Hetherington, T., Aamodt, T. M., & Moshovos, A. (2016, October). Stripes: Bit-serial deep neural network computing. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (pp. 1-12). IEEE.Stripes bit serial Micro 2016 07783722.pdf
-
Kim, Y. D., Park, E., Yoo, S., Choi, T., Yang, L., & Shin, D. (2015). Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv preprint arXiv:1511.06530.Yoo Compressed CNN 1511.06530.pdf
-
Song, L., Wang, Y., Han, Y., Zhao, X., Liu, B., & Li, X. (2016, June). C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. In Design Automation Conference (DAC), 2016 53nd ACM/EDAC/IEEE (pp. 1-6). IEEE.C-Brain.pdf
-
Bang, S., Wang, J., Li, Z., Gao, C., Kim, Y., Dong, Q., ... & Mudge, T. (2017, February). 14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence. In Solid-State Circuits Conference (ISSCC), 2017 IEEE International (pp. 250-251). IEEE.Michigan 2017 ISSCC.pdf
-
Song, L., Qian, X., Li, H., & Chen, Y. (2017, February). PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on (pp. 541-552). IEEE.PipeLayer.pdf
-
Aydonat, U., O'Connell, S., Capalija, D., Ling, A. C., & Chiu, G. R. (2017). An OpenCL (TM) Deep Learning Accelerator on Arria 10. arXiv preprint arXiv:1701.03534.OpenCL Toronto.pdf
-
Venkatesh, G., Nurvitadhi, E., & Marr, D. (2017, March). Accelerating Deep Convolutional Networks using low-precision and sparsity. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on(pp. 2861-2865). IEEE.Sparsity Intel.pdf
-
You, Y. (2016). Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA (A thesis) .sparsity Analysis.pdf
-
Wei, X., Yu, C. H., Zhang, P., Chen, Y., Wang, Y., Hu, H., ... & Cong, J. (2017, June). Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Design Automation Conference (DAC), 2017 54th ACM/EDAC/IEEE (pp. 1-6). IEEE.Systolic Array Synthesis Cong.pdf
-
Fang, J., Fu, H., Zhao, W., Chen, B., Zheng, W., & Yang, G. (2017, May). swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight. In Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International (pp. 615-624). IEEE.swDNN Taihu Light.pdf
-
Guan, Y., Yuan, Z., Sun, G., & Cong, J. (2017, January). FPGA-based accelerator for long short-term memory recurrent neural networks. In Design Automation Conference (ASP-DAC), 2017 22nd Asia and South Pacific (pp. 629-634). IEEE.LSTM FPGA Cong.pdf
-
Edstrom, J., Gong, Y., Chen, D., Wang, J., & Gong, N. (2017). Data-Driven Intelligent Efficient Synaptic Storage for Deep Learning. IEEE Transactions on Circuits and Systems II: Express Briefs.Eff Synap Storage.pdf
-
Du, L., Du, Y., Li, Y., Su, J., Kuan, Y. C., Liu, C. C., & Chang, M. C. F. (2017). A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things. IEEE Transactions on Circuits and Systems I: Regular Papers.Reconfig DCNN Frank Chang.pdf
-
Song, L., Wang, Y., Han, Y., Zhao, X., Liu, B., & Li, X. (2016, June). C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. In Design Automation Conference (DAC), 2016 53nd ACM/EDAC/IEEE (pp. 1-6). IEEE.C-Brain.pdf
-
Bang, S., Wang, J., Li, Z., Gao, C., Kim, Y., Dong, Q., ... & Mudge, T. (2017, February). 14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence. In Solid-State Circuits Conference (ISSCC), 2017 IEEE International (pp. 250-251). IEEE.Michigan 2017 ISSCC.pdf
-
Song, L., Qian, X., Li, H., & Chen, Y. (2017, February). PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on (pp. 541-552). IEEE.PipeLayer.pdf
-
Aydonat, U., O'Connell, S., Capalija, D., Ling, A. C., & Chiu, G. R. (2017). An OpenCL (TM) Deep Learning Accelerator on Arria 10. arXiv preprint arXiv:1701.03534.OpenCL Toronto.pdf
-
Venkatesh, G., Nurvitadhi, E., & Marr, D. (2017, March). Accelerating Deep Convolutional Networks using low-precision and sparsity. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on(pp. 2861-2865). IEEE.Sparsity Intel.pdf
-
You, Y. (2016). Sparsity Analysis of Deep Learning Models and Corresponding Accelerator Design on FPGA (A thesis) .sparsity Analysis.pdf
-
Wei, X., Yu, C. H., Zhang, P., Chen, Y., Wang, Y., Hu, H., ... & Cong, J. (2017, June). Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Design Automation Conference (DAC), 2017 54th ACM/EDAC/IEEE (pp. 1-6). IEEE.Systolic Array Synthesis Cong.pdf
-
Fang, J., Fu, H., Zhao, W., Chen, B., Zheng, W., & Yang, G. (2017, May). swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight. In Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International (pp. 615-624). IEEE.swDNN Taihu Light.pdf
-
Guan, Y., Yuan, Z., Sun, G., & Cong, J. (2017, January). FPGA-based accelerator for long short-term memory recurrent neural networks. In Design Automation Conference (ASP-DAC), 2017 22nd Asia and South Pacific (pp. 629-634). IEEE.LSTM FPGA Cong.pdf
-
Edstrom, J., Gong, Y., Chen, D., Wang, J., & Gong, N. (2017). Data-Driven Intelligent Efficient Synaptic Storage for Deep Learning. IEEE Transactions on Circuits and Systems II: Express Briefs.Eff Synap Storage.pdf
-
Du, L., Du, Y., Li, Y., Su, J., Kuan, Y. C., Liu, C. C., & Chang, M. C. F. (2017). A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things. IEEE Transactions on Circuits and Systems I: Regular Papers.Reconfig DCNN Frank Chang.pdf
Additional References
****************************************************************************************************
-
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural networks, 61, 85-117.Schmidhuber Overview NN1-s2.0-S0893608014002135-main.pdf
-
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., & Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems (pp. 396-404).handwritten-digit-recognition-with-a-back-propagation-network.pdf
-
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.RNN Comparison 1412.3555.pdf
-
Goldberg, Y., & Levy, O. (2014). word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.word2vec explained 1402.3722.pdf
-
Lin, H. W., Tegmark, M., & Rolnick, D. (2016). Why does deep and cheap learning work so well?. Journal of Statistical Physics, 1-25.Why DL Work10.1007\s10955-017-1836-5.pdf
-
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec), 3371-3408.vincent10a.pdf
-
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8), 1798-1828.Learning Representation 06472238.pdf
-
Ota, K., Dao, M. S., Mezaris, V., & De Natale, F. G. (2017). Deep Learning for Mobile Multimedia: A Survey. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s), 34.DL mobile survey.pdf
Video
-
Tutorials by Bill Dally and Song Han (nVidia/Stanford)
-
Tutorial: High-Performance Hardware for Machine Learning https://youtu.be/J-GOkwiwg4c
-
Efficient Methods and Hardware for Deep Learning https://youtu.be/eZdOkDtYMoo
-
-
Tutorials by Vivienne Sze of MIT
- Efficient Processing for Deep Learning: Challenges and Opportunitieshttps://www.youtube.com/watch?v=kYEUkpHpOKA&t=61s
- Tutorials by Professor Sungjoo Yoo of SNU
4-1 Deep Learning Algorithms, Optimization Methods, and Hardware Accelerators (Prof. Sungjoo Yoo) https://youtu.be/ebqVpK4c3cw
4-2 Example of Object Detection Result (Prof. Sungjoo Yoo)https://youtu.be/MEgwTaUdmqw
4-3 Convoiution with Matrix Multiplication (Prof. Sungjoo Yoo)https://youtu.be/2ExjsudgDU4
4-4 High Performance Accelerator Architecture (Prof. Sungjoo Yoo)https://youtu.be/hAZ2t0a7rdU