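Aerial scene classification in remote sensing: An approach using image and sound data and self-supervised learning (original title: Classificação de cenas aéreas em sensoriamento remoto: Uma abordagem utilizando dados de imagem e som e self-supervised learning)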

Ayres, Talissa Moura

Files

tcc_uea.pdf (5.55 MB)

Date

2024-02-08

Authors

Ayres, Talissa Moura

Publisher

Universidade do Estado do Amazonas

Abstract

Scene classification is a computer vision task in which a model interprets a whole context or environment rather than classifying a single object, as in image classification. It is an active research area because it supports important applications such as content-based retrieval and smart content moderation. When performed on remote sensing data, it is also crucial for understanding the environment around us, with applications such as city monitoring and land use classification. Many studies on aerial scene classification rely on convolutional neural networks and therefore depend on a large number of image annotations. For this reason, recent literature has increasingly applied training techniques such as self-supervised learning (SSL), in which the model first learns to generate representations from pseudolabels before performing the main task. In addition, the ADVANCE and SoundingEarth datasets have shown that multimodal data combining geolocated images and sounds can improve model performance on this task. This work therefore demonstrates the use of SSL and audiovisual remote sensing data together with vision transformers, a recent deep learning architecture based on attention mechanisms, to generate embeddings. First, pre-training was conducted on SoundingEarth, using batch triplet loss to pull matching (positive) image and sound pairs closer together and push non-matching pairs apart. These representations were then fed to a logistic regression model to classify aerial scenes from ADVANCE. Models trained with both image and sound embeddings achieved precision, recall, and F1-score above 80%. With image embeddings alone, results were also above 80% on these metrics; with audio embeddings alone, they were above 40%.
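
The pre-training objective mentioned in the abstract can be illustrated with a minimal sketch of a cross-modal batch triplet loss. This is not the thesis implementation: the Euclidean distance, the margin value, the "all negatives in the batch" strategy, and the names `euclidean` and `batch_triplet_loss` are assumptions made for illustration only. Each image embedding acts as the anchor, the sound embedding recorded at the same location is the positive, and every other sound embedding in the batch is a negative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_triplet_loss(image_emb, audio_emb, margin=1.0):
    """Mean hinge loss over all (anchor, positive, negative) triplets in a batch.

    Assumption: image_emb[i] and audio_emb[i] form a positive pair, and
    audio_emb[j] with j != i is used as a negative for anchor image_emb[i].
    """
    if len(image_emb) != len(audio_emb):
        raise ValueError("expected one audio embedding per image embedding")
    n = len(image_emb)
    losses = []
    for i in range(n):
        d_pos = euclidean(image_emb[i], audio_emb[i])
        for j in range(n):
            if j == i:
                continue  # skip the positive itself
            d_neg = euclidean(image_emb[i], audio_emb[j])
            # Penalize triplets where the negative is not at least
            # `margin` farther from the anchor than the positive.
            losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy 2-D embeddings (made up for this example).
images = [[0.0, 0.0], [3.0, 4.0]]
aligned_audio = [[0.0, 0.0], [3.0, 4.0]]   # positives coincide with anchors
swapped_audio = [[3.0, 4.0], [0.0, 0.0]]   # positives are far from anchors

print(batch_triplet_loss(images, aligned_audio))  # 0.0
print(batch_triplet_loss(images, swapped_audio))  # 6.0
```

The loss is zero when matching pairs are already closer than non-matching pairs by at least the margin, and it grows when they are not. In actual training the same quantity would be computed on tensors in an automatic differentiation framework so that gradients can update the encoders.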

Keywords

Remote sensing, Aerial scene classification, Deep learning, Self-supervised learning, Multimodal models

URI

https://ri.uea.edu.br/handle/riuea/6275

Collections

Bachelor's Degree in Electrical Engineering
