Chinese
1. CTW: a dataset annotated jointly by Tencent and Tsinghua University
2. RCTW: the dataset from the ICDAR 2017 competition
RCTW-17 is a competition on reading Chinese text in images. For training and testing, the organizers provide a large-scale dataset that consists of various kinds of images, including street views, posters, menus, and indoor scenes.
http://rctw.vlrlab.net/dataset/
3. SCUT_FORU_Chinese
4. MSRA-TD500
500 natural images (300 training + 200 testing) with resolutions ranging from 1296x864 to 1920x1280; the text is in Chinese, English, or a mixture of both.
C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu. Detecting texts of arbitrary orientations in natural images. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR'12), Providence, Rhode Island, 2012. (MSRA-TD 500 Dataset)
http://pages.ucsd.edu/~ztu/publication/MSRA-TD500.zip