Deformable patch-based-multi-layer perceptron mixer model for forest fire aerial image classification

Payal Mittal, Akashdeep Sharma*, Raman Singh

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



Unmanned aerial vehicles (UAVs) with mounted camera sensors enable a wide range of remote sensing applications. Because UAVs can explore remote locations such as forests, situational awareness has undergone a paradigm change in applications such as wildfire search and rescue, estimation of endangered flora and fauna, and emergency response. A multi-layer perceptron (MLP)-Mixer architecture is proposed for classifying burned piles in dense forests. Convolutional neural networks (CNNs) and the more recent attention-based transformer models have produced state-of-the-art results in image prediction. The MLP-Mixer architecture is adopted in an effort to overcome the drawbacks of convolutions and attention and to improve performance. The proposed MLP-Mixer model includes a new DePatch module, which splits the input images in a deformable pattern to identify forest fires at an early stage; this avoids both the shallow learning of CNN layers and the fixed-size patch embedding of transformers. The proposed classification system was evaluated on a dataset of pile photos taken during the burning of debris piles in an Arizona pine forest. The results suggest that the DePatch-based MLP classification model outperforms transformer approaches, achieving an accuracy of 77.23%.
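The abstract gives no implementation details, so the following is only a minimal NumPy sketch of a generic MLP-Mixer block operating on fixed-size patches; it is not the authors' DePatch model. All names and sizes here (split_into_patches, mixer_block, the 32x32 input, 8x8 patches, 64 channels, 2 output classes) are illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

def split_into_patches(image, patch):
    """Cut an (H, W, C) image into non-overlapping patch x patch tiles, one flat row per tile."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)       # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)  # (num_patches, patch*patch*C)

def layer_norm(x, eps=1e-6):
    """Normalize over the last axis (no learned scale or shift in this sketch)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def init_mlp(rng, dim, hidden):
    """Random weights for a two-layer MLP mapping dim -> hidden -> dim."""
    return (rng.normal(0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0, 0.02, (hidden, dim)), np.zeros(dim))

def mlp(x, w1, b1, w2, b2):
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(tokens, token_params, channel_params):
    """One Mixer block on tokens of shape (num_patches, channels)."""
    # Token mixing: transpose so the MLP acts along the patch axis.
    y = layer_norm(tokens).T                     # (channels, num_patches)
    tokens = tokens + mlp(y, *token_params).T    # residual connection
    # Channel mixing: the MLP acts along the channel axis of each patch.
    tokens = tokens + mlp(layer_norm(tokens), *channel_params)
    return tokens

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                  # stand-in for an aerial image
patches = split_into_patches(image, patch=8)     # (16, 192)
embed = rng.normal(0, 0.02, (patches.shape[1], 64))
tokens = patches @ embed                         # (16, 64) per-patch embeddings

token_params = init_mlp(rng, dim=tokens.shape[0], hidden=32)
channel_params = init_mlp(rng, dim=tokens.shape[1], hidden=128)
out = mixer_block(tokens, token_params, channel_params)

# Global average pooling over patches, then a linear head (2 hypothetical classes).
head = rng.normal(0, 0.02, (64, 2))
logits = out.mean(axis=0) @ head
print(patches.shape, out.shape, logits.shape)
```

In a deformable variant such as the one the abstract describes, the fixed grid in split_into_patches would be replaced by patches whose layout adapts to the image content; that step is not sketched here because the abstract does not specify it.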
Original language: English
Article number: 022203
Number of pages: 13
Journal: Journal of Applied Remote Sensing
Issue number: 2
Publication status: Published - 7 Nov 2022


  • aerial image classification
  • convolutional neural networks
  • transformers
  • MLP mixer
  • computer vision
  • deep learning

