Review on Textual Description of Image Contents
Vasundhara Kadam, Ramesh M. Kagalkar "Review on Textual Description of Image Contents". International Journal of Computer Trends and Technology (IJCTT) V30(4):213-217, December 2015. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract -
Relating visual images to visually
descriptive language is a major challenge for
computer vision, and it is becoming increasingly
relevant as recognition and detection techniques
begin to work reliably. This paper reviews
techniques used for image description, such as
modeling the associations between objects present
in an image. Additionally, the paper briefly presents
an approach to automatically generate natural
language descriptions from images. The proposed
system consists of two parts: content planning and
surface realization. The first part, content planning,
smooths the output of computer vision-based
recognition and detection algorithms with statistics
extracted from large corpora of visually descriptive
text, in order to determine the best content words
for describing an image. The second step, surface
realization, selects words to build natural language
sentences based on the predicted content and
overall statistics from natural language.
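The two-stage pipeline above can be illustrated with a minimal sketch. The detector scores, corpus frequencies, weighting scheme, and sentence template below are all hypothetical illustrations, not the reviewed authors' implementation: content planning combines a detector's confidence for each candidate word with how often that word occurs in descriptive text, and surface realization drops the chosen words into a simple template.

```python
# Stage 1: content planning -- smooth vision detector scores with corpus
# statistics and keep the top-scoring content words. The linear mix with
# weight `alpha` is an illustrative assumption.
def plan_content(detections, corpus_freq, alpha=0.5, top_k=2):
    scored = {
        word: alpha * det_score + (1 - alpha) * corpus_freq.get(word, 0.0)
        for word, det_score in detections.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# Stage 2: surface realization -- place the chosen content words into a
# simple sentence template (real systems use richer language models).
def realize(content_words):
    if len(content_words) == 1:
        return f"This picture shows a {content_words[0]}."
    return ("This picture shows a "
            + ", a ".join(content_words[:-1])
            + f" and a {content_words[-1]}.")

detections = {"dog": 0.9, "frisbee": 0.6, "lamp": 0.2}   # vision outputs
corpus_freq = {"dog": 0.8, "frisbee": 0.5, "lamp": 0.1}  # text statistics
words = plan_content(detections, corpus_freq)
print(realize(words))  # -> "This picture shows a dog and a frisbee."
```

The corpus statistics pull the ranking toward words people actually use in descriptions, so a weak but commonly described object can outrank a confidently detected but rarely mentioned one.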
References
[1] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik
Dhar, Siming Li, Alexander C. Berg, and Tamara L. Berg,
“BabyTalk: Understanding and Generating Simple Image
Descriptions”, IEEE Transactions on Pattern Analysis and
Machine Intelligence., vol. 35, no. 12, December 2013.
[2] P.F. Felzenszwalb, R.B. Girshick, and D. McAllester,
“Discriminatively Trained Deformable Part Models,
Release 4,” http://people.cs.uchicago.edu/pff/latentrelease4/,
2012.
[3] P. Kuznetsova, V. Ordonez, A.C. Berg, T.L. Berg, and Y.
Choi, “Collective Generation of Natural Image Descriptions,” Proc. Conf. Assoc. for Computational
Linguistics, 2012.
[4] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A.C.
Berg, and T.L. Berg, “Babytalk: Understanding and
Generating Simple Image Descriptions,” Proc. IEEE Conf.
Computer Vision and Pattern Recognition, 2011.
[5] S. Li, G. Kulkarni, T.L. Berg, A.C. Berg, and Y. Choi,
“Composing Simple Image Descriptions Using Web-Scale
n-Grams,” Proc. 15th Conf. Computational Natural
Language Learning, pp. 220-228, June 2011.
[6] V. Ordonez, G. Kulkarni, and T.L. Berg, “Im2text:
Describing Images Using 1 Million Captioned
Photographs,” Proc. Neural Information Processing
Systems, 2011.
[7] Y. Yang, C.L. Teo, H. Daume, and Y. Aloimonos,
“Corpus-Guided Sentence Generation of Natural Images,”
Proc. Conf. Empirical Methods in Natural Language
Processing, 2011.
[8] A. Aker and R. Gaizauskas, “Generating Image
Descriptions Using Dependency Relational Patterns,” Proc.
28th Ann. Meeting Assoc. for Computational Linguistics,
pp. 1250-1258, 2010.
[9] T.L. Berg, A.C. Berg, and J. Shih, “Automatic Attribute
Discovery and Characterization from Noisy Web Data,”
Proc. European Conf. Computer Vision, 2010.
[10] A. Farhadi, M. Hejrati, A. Sadeghi, P. Young, C.
Rashtchian, J. Hockenmaier, and D.A. Forsyth, “Every
Picture Tells a Story: Generating Sentences for Images,”
Proc. European Conf. Computer Vision, 2010.
[11] Y. Feng and M. Lapata, “How Many Words Is a Picture
Worth? Automatic Caption Generation for News Images,”
Proc. Assoc. for Computational Linguistics, pp. 1239-1249,
2010.
[12] S. Gupta and R.J. Mooney, “Using Closed Captions as
Supervision for Video Activity Recognition,” Proc. 24th
AAAI Conf. Artificial Intelligence, pp. 1083-1088, July
2010.
[13] C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier,
“Collecting Image Annotations Using Amazon's
Mechanical Turk,” Proc. NAACL HLT Workshop Creating
Speech and Language Data with Amazon's Mechanical
Turk, 2010.
[14] A. Torralba, K.P. Murphy, and W.T. Freeman, “Using the
Forest to See the Trees: Exploiting Context for Visual
Object Detection and Localization,” Comm. ACM, vol. 53,
pp. 107-114, Mar. 2010.
[15] M.-C. de Marneffe and C.D. Manning, Stanford Typed
Dependencies Manual, 2009.
[16] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-
Fei, “ImageNet: A Large-Scale Hierarchical Image
Database,” Proc. IEEE Conf. Computer Vision and Pattern
Recognition, 2009.
[17] C. Desai, D. Ramanan, and C. Fowlkes, “Discriminative
Models for Multi-Class Object Layout,” Proc. 12th IEEE
Int'l Conf. Computer Vision, 2009.
[18] A. Farhadi, I. Endres, D. Hoiem, and D.A. Forsyth,
“Describing Objects by Their Attributes,” Proc. IEEE Conf.
Computer Vision and Pattern Recognition, 2009.
[19] A. Gupta, P. Srinivasan, J. Shi, and L.S. Davis,
“Understanding Videos Constructing Plots: Learning a
Visually Grounded Storyline Model from Annotated
Videos,” Proc. IEEE Conf. Computer Vision and Pattern
Recognition, 2009.
[20] S. Gupta and R. Mooney, “Using Closed Captions to Train
Activity Recognizers that Improve Video Retrieval,” Proc.
IEEE Computer Vision and Pattern Recognition Workshop
Visual and Contextual Learning from Annotated Images
and Videos, June 2009.
[21] N. Kumar, A.C. Berg, P.N. Belhumeur, and S.K. Nayar,
“Attribute and Simile Classifiers for Face Verification,”
Proc. 12th IEEE Int'l Conf. Computer Vision, 2009.
[22] C. Lampert, H. Nickisch, and S. Harmeling, “Learning to
Detect Unseen Object Classes by Between-Class Attribute
Transfer,” Proc. IEEE Conf. Computer Vision and Pattern
Recognition, 2009.
[23] L.-J. Li and L. Fei-Fei, “OPTIMOL: Automatic Online
Picture Collection via Incremental Model Learning,” Int'l J.
Computer Vision, vol. 88, pp. 147-168, 2009.
[24] J. Shotton, J. Winn, C. Rother, and A. Criminisi,
“TextonBoost for Image Understanding: Multi-Class
Object Recognition and Segmentation by Jointly Modeling
Texture, Layout, and Context,” Int'l J. Computer Vision,
vol. 81, pp. 2-23, Jan. 2009.
[25] J. Sivic, M. Everingham, and A. Zisserman, ““Who Are
You?” Learning Person Specific Classifiers from Video,”
Proc. IEEE Conf. Computer Vision and Pattern
Recognition, 2009.
[26] J. Wang, K. Markert, and M. Everingham, “Learning
Models for Object Recognition from Natural Language
Descriptions,” Proc. British Machine Vision Conf., 2009.
Keywords
Computer vision, image description
generation, content planning, surface realization.