EST - Undergraduate Final Project (Trabalho de Conclusão de Curso)
Permanent URI for this collection: https://ri.uea.edu.br/handle/riuea/4795
Aerial scene classification in remote sensing: an approach using image and sound data and self-supervised learning (original title: Classificação de cenas aéreas em sensoriamento remoto: Uma abordagem utilizando dados de imagem e som e self-supervised learning). Universidade do Estado do Amazonas, 2024-02-08. Ayres, Talissa Moura; Figueiredo, Carlos Maurício Serodio; Figueiredo, Carlos Maurício Serodio; Pantoja, Antônio Luiz Alencar; Cardoso, Fábio de Sousa

Scene classification is a computer vision task in which a model interprets an entire context or environment instead of classifying a single object, as image classification does. It is currently an area of extensive research because it underpins important tasks such as content-based retrieval and smart content moderation. When performed on remote sensing data, it is also crucial for understanding our surroundings, with applications such as city monitoring and land-use classification. In aerial scene classification in particular, many studies rely on convolutional neural networks and therefore on large numbers of annotated images. For this reason, newer training techniques such as self-supervised learning (SSL), in which the model first learns to generate representations from pseudo-labels before performing the main task, have been adopted more widely in recent literature. In addition, the ADVANCE and SoundingEarth datasets have shown that multimodal data combining geolocated images and sounds can improve model performance on this task. This work therefore combines SSL and audiovisual remote sensing data with vision transformers, a recent deep learning architecture based on attention mechanisms, to generate embeddings. First, the model was pre-trained on SoundingEarth with a batch triplet loss that pulls positive (matching) image-sound pairs together and pushes non-matching pairs apart.
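The pre-training objective described above can be illustrated with a minimal sketch. The abstract does not state which variant of batch triplet loss, which distance, or which margin was used, so this assumes a batch-hard formulation (for each image, the closest non-matching audio clip in the batch is the negative), Euclidean distance, and a hypothetical margin of 0.2. The function name is also hypothetical, and only the image-to-audio direction is computed; this is not the author's implementation.

```python
import math
from typing import Sequence


def batch_hard_triplet_loss(
    image_emb: Sequence[Sequence[float]],
    audio_emb: Sequence[Sequence[float]],
    margin: float = 0.2,  # hypothetical margin, not taken from the source
) -> float:
    """Cross-modal batch-hard triplet loss (image anchors, audio positives/negatives).

    Assumes image_emb[i] and audio_emb[i] were recorded at the same location
    (the positive pair); every audio_emb[j] with j != i is a negative for image i.
    """
    n = len(image_emb)
    if n < 2 or len(audio_emb) != n:
        raise ValueError("need at least two aligned image/audio pairs")

    # Euclidean distance between every image embedding and every audio embedding.
    dist = [[math.dist(image_emb[i], audio_emb[j]) for j in range(n)] for i in range(n)]

    total = 0.0
    for i in range(n):
        positive = dist[i][i]
        # Hardest negative: the closest audio clip that is not the true match.
        hardest_negative = min(dist[i][j] for j in range(n) if j != i)
        # Hinge: no penalty once the negative is at least `margin` farther than the positive.
        total += max(0.0, positive - hardest_negative + margin)
    return total / n


# Aligned pairs that are already well separated give zero loss.
print(batch_hard_triplet_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))  # 0.0
```

If the audio clips are swapped so that each image sits closest to the wrong sound, every term becomes the positive distance plus the margin, and the gradient of this loss would push the encoders to fix that ordering. A symmetric version would also add the audio-to-image direction.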
These representations were then fed to a logistic regression model to classify the aerial scenes in ADVANCE. Models trained on combined image and sound embeddings achieved precision, recall, and F1-score above 80%. With image embeddings alone, these metrics were also above 80%; with audio embeddings alone, they were above 40%.
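The evaluation step (a logistic regression on frozen embeddings, scored with precision, recall and F1) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's code: it assumes image and audio embeddings are fused by simple concatenation, fits a multinomial logistic regression by full-batch gradient descent, and reports macro-averaged metrics. The helper names (`fuse`, `train_logreg`, `predict`, `macro_prf`), the hyperparameters, and the toy data are all hypothetical. In practice, scikit-learn's `LogisticRegression` and `precision_recall_fscore_support` cover the same ground.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fuse(image_emb: Sequence[float], audio_emb: Sequence[float]) -> List[float]:
    # Hypothetical fusion strategy: concatenate the image and audio embeddings.
    return list(image_emb) + list(audio_emb)


def _softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _scores(W: Matrix, b: List[float], x: Sequence[float]) -> List[float]:
    # One linear score (logit) per class.
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int], num_classes: int,
                 lr: float = 0.5, epochs: int = 300) -> Tuple[Matrix, List[float]]:
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, label in zip(X, y):
            probs = _softmax(_scores(W, b, x))
            for k in range(num_classes):
                # d(cross-entropy)/d(logit_k) = p_k - 1[k == label]
                err = probs[k] - (1.0 if k == label else 0.0)
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * x[j]
        # Average the gradient over the batch and take one descent step.
        for k in range(num_classes):
            b[k] -= lr * gb[k] / n
            for j in range(d):
                W[k][j] -= lr * gW[k][j] / n
    return W, b


def predict(W: Matrix, b: List[float], x: Sequence[float]) -> int:
    scores = _scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


def macro_prf(y_true: Sequence[int], y_pred: Sequence[int],
              num_classes: int) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1: every class weighs the same."""
    p_sum = r_sum = f_sum = 0.0
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        p_sum, r_sum, f_sum = p_sum + precision, r_sum + recall, f_sum + f1
    return p_sum / num_classes, r_sum / num_classes, f_sum / num_classes


# Toy usage: two classes, 2-d "image" and 1-d "audio" embeddings (made-up values).
images = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
audios = [[1.0], [0.8], [0.0], [0.2]]
labels = [0, 0, 1, 1]
X = [fuse(i, a) for i, a in zip(images, audios)]
W, b = train_logreg(X, labels, num_classes=2)
preds = [predict(W, b, x) for x in X]
print(macro_prf(labels, preds, num_classes=2))  # (1.0, 1.0, 1.0)
```

Keeping the classifier linear means the reported scores mostly reflect how separable the pre-trained embeddings already are, which is why comparing image-only, audio-only and fused inputs under the same probe is informative. Macro averaging is an assumption here; a weighted average would let frequent scene classes dominate the scores.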