How to Cite?
Kavitha Srinivasan, Shanmuga Velayutham V, Vignesh G, Subash R, "Object Recognition for Visually Impaired People," International Journal of Computer Trends and Technology, vol. 68, no. 8, pp. 33-38, 2020. Crossref, https://doi.org/10.14445/22312803/IJCTT-V68I8P105
Abstract
Deep learning techniques are evolving rapidly in computer vision for many real-time applications, namely object detection, recognition, classification, segmentation, prediction and analysis. In this paper, an object recognition model for visually impaired people is proposed and validated on multiple datasets using deep learning techniques. The proposed model identifies multiple objects in a frame together with their corresponding text labels, and the identified objects are converted into speech to guide visually impaired people in real time. Object identification is carried out using a bounding box technique and a single convolutional neural network. Bounding boxes whose probability falls below the threshold are eliminated, and the remaining objects are identified using a pre-trained Darkflow model. The identified objects are then mapped to relevant text and converted to speech using a Text-to-Speech (TTS) tool. The proposed model has been validated on four datasets: the Pascal VOC dataset, the COCO dataset, the BROID challenge dataset and the Auto Rickshaw detection challenge dataset. The novelty of this work is threefold: a modified intersection over union algorithm for better recognition, the use of datasets with different sets of images, and a modified weight file that recognizes the objects of the challenge dataset. OpenCV and Compute Unified Device Architecture (CUDA) are used for image manipulation and graphics processing, along with TensorFlow. The final output is produced in audio format by applying Pyttsx, a Python package that converts text to speech, to the identified objects.
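The filtering and text-mapping steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses the standard intersection over union (IoU) rather than the modified algorithm the paper proposes, and the detection format (dictionaries with `label`, `confidence` and `box` keys), the function names and both threshold values are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections: List[Dict],
                      conf_threshold: float = 0.5,
                      iou_threshold: float = 0.5) -> List[Dict]:
    """Drop low-probability boxes, then suppress same-label overlaps."""
    # Step 1: eliminate boxes whose probability is below the threshold.
    candidates = sorted(
        (d for d in detections if d["confidence"] >= conf_threshold),
        key=lambda d: d["confidence"],
        reverse=True,
    )
    # Step 2: keep a box only if it does not heavily overlap an
    # already-kept, higher-confidence box with the same label.
    kept: List[Dict] = []
    for det in candidates:
        if all(k["label"] != det["label"]
               or iou(k["box"], det["box"]) < iou_threshold
               for k in kept):
            kept.append(det)
    return kept


def describe(detections: List[Dict]) -> str:
    """Map the kept detections to a sentence for a TTS engine."""
    labels = [d["label"] for d in detections]
    if not labels:
        return "No objects detected"
    return "Detected: " + ", ".join(labels)


def speak(text: str) -> None:
    """Speak the text aloud. Assumes the pyttsx3 package is installed."""
    import pyttsx3  # imported lazily so the rest runs without it
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

In use, the detections produced by the detector for one frame would be passed through `filter_detections`, and the string returned by `describe` would be handed to `speak`. Suppressing overlaps only within the same label is a design choice of this sketch, so that two different objects that overlap in the frame are both announced.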
Keywords
Object identification, Object recognition, YOLO, SSD, Intersection over Union, Darkflow, Text to speech.