An image caption is a textual description of an image, widely used in applications that need to extract information from images automatically in text form. We analyze three components of the captioning process: convolutional neural networks (CNNs), recurrent neural networks (RNNs), and sentence generation. This paper develops a model that decomposes both images and sentences into their constituent elements, aligning image regions with language fragments using an LSTM model and NLP methods, and presents an implementation of the LSTM method with additional efficiency features. Both the Gated Recurrent Unit (GRU) and the LSTM are evaluated; in tests using the BLEU metric, the LSTM performs best with 80% accuracy. This method improves on the best results on the Visual Genome region-caption dataset.
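Since the comparison between GRU and LSTM above is scored with the BLEU metric, the following is a minimal sketch of how a sentence-level BLEU score can be computed from scratch: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. This is a simplified illustration, not the paper's evaluation code; the function name and the choice of the shortest reference for the brevity penalty are assumptions made here for brevity.

```python
from collections import Counter
import math

def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions times a brevity penalty (illustrative sketch)."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        # Count candidate n-grams.
        cand_ngrams = Counter(tuple(cand[i:i + n])
                              for i in range(len(cand) - n + 1))
        # For each n-gram, take the maximum count over all references.
        max_ref = Counter()
        for ref in refs:
            ref_ngrams = Counter(tuple(ref[i:i + n])
                                 for i in range(len(ref) - n + 1))
            for g, c in ref_ngrams.items():
                max_ref[g] = max(max_ref[g], c)
        # Clip candidate counts by the reference counts.
        clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty (simplified: compare against the shortest reference).
    ref_len = min(len(r) for r in refs)
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0, while a partially overlapping caption scores lower, which is how candidate captions from the GRU and LSTM decoders can be ranked against human-written references.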
Software And Hardware