[1] Johnson, M., Schuster, M., Le, Q.V., Krikun, M., Wu, Y., Chen, Z., et al. (2017) Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. Transactions of the Association for Computational Linguistics, 5, 339-351.

[2] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., et al. (2014) Microsoft COCO: Common Objects in Context. In: Fleet, D., Pajdla, T., Schiele, B. and Tuytelaars, T., Eds., European Conference on Computer Vision, Springer, Cham, 740-755.

[3] Flickr Image Dataset. Kaggle.com. https://www.kaggle.com/hsankesara/flickr-image-dataset

[4] Vinyals, O., Toshev, A., Bengio, S. and Erhan, D. (2015) Show and Tell: A Neural Image Caption Generator. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 7-12 June 2015, 3156-3164.

[5] Karpathy, A. and Li, F.-F. (2015) Deep Visual-Semantic Alignments for Generating Image Descriptions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 7-12 June 2015, 3128-3137.

[6] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., et al. (2017) Attention Is All You Need. In: Advances in Neural Information Processing Systems, 5998-6008.

[7] Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., et al. (2015) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. International Conference on Machine Learning, June 2015, 2048-2057.

[8] You, Q., Jin, H., Wang, Z., Fang, C. and Luo, J. (2016) Image Captioning with Semantic Attention. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 27-30 June 2016, 4651-4659.

[9] Lu, J., Xiong, C., Parikh, D. and Socher, R. (2017) Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 21-26 July 2017, 375-383.

[10] Chen, L., Zhang, H., Xiao, J., Nie, L., Shao, J., Liu, W. and Chua, T.S. (2017) SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 21-26 July 2017, 5659-5667.

[11] Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S. and Zhang, L. (2018) Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 18-23 June 2018, 6077-6086.

[12] He, C. and Hu, H. (2019) Image Captioning with Text-Based Visual Attention. Neural Processing Letters, 49, 177-185.

[13] He, X., Yang, Y., Shi, B. and Bai, X. (2019) VD-SAN: Visual-Densely Semantic Attention Network for Image Caption Generation. Neurocomputing, 328, 48-55.

[14] Fang, H., Gupta, S., Iandola, F., Srivastava, R.K., Deng, L., Dollár, P., et al. (2015) From Captions to Visual Concepts and Back. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 7-12 June 2015, 1473-1482.

[15] Li, N. and Chen, Z. (2018) Image Captioning with Visual-Semantic LSTM. IJCAI, July 2018, 793-799.

[16] Wang, Y., Lin, Z., Shen, X., Cohen, S. and Cottrell, G.W. (2017) Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 21-26 July 2017, 7272-7281.

[17] Ren, Z., Wang, X., Zhang, N., Lv, X. and Li, L.J. (2017) Deep Reinforcement Learning-Based Image Captioning with Embedding Reward. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 21-26 July 2017, 290-298.

[18] Zhang, L., Sung, F., Liu, F., Xiang, T., Gong, S., Yang, Y. and Hospedales, T.M. (2017) Actor-Critic Sequence Training for Image Captioning. arXiv preprint arXiv:1706.09601.

[19] Simonyan, K. and Zisserman, A. (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.

[20] Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9, 1735-1780.

[21] Papineni, K., Roukos, S., Ward, T. and Zhu, W.J. (2002) BLEU: A Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, July 2002, 311-318.

[22] Lin, C.Y. and Och, F.J. (2004) Looking for a Few Good Metrics: ROUGE and Its Evaluation. NTCIR Workshop, Tokyo, 2-4 June 2004.

[23] Vedantam, R., Zitnick, C.L. and Parikh, D. (2015) CIDEr: Consensus-Based Image Description Evaluation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 7-12 June 2015, 4566-4575.

[24] Sun, J. (2012) Jieba Chinese Word Segmentation Tool. https://github.com/fxsjy/jieba

[25] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. and Li, F.-F. (2009) ImageNet: A Large-Scale Hierarchical Image Database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, 20-25 June 2009, 248-255.

[26] Ling, W., Dyer, C., Black, A.W. and Trancoso, I. (2015) Two/Too Simple Adaptations of Word2Vec for Syntax Problems. Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, CO, May-June 2015, 1299-1304.

[27] gensim: Topic Modelling for Humans. Radimrehurek.com. https://radimrehurek.com/gensim/models/word2vec.html