
      BoatNet: Automated Small Boat Composition Detection using Deep Learning on Satellite Imagery

      Preprint
      research-article

            Revision notes

            We would like to thank Dr Stefan Peters for his assessment of our work and the useful suggestions. As discussed below, we have considered all suggestions and made relevant edits and improvements to the manuscript.

1. The authors did not provide details on the imagery source and did not discuss the resolution of the satellite imagery extracted from Google Earth Pro. For the applied eye altitude (200 m), Google Earth Pro usually uses mosaicked true-colour composites derived from Digital Globe's WorldView-1/2/3 series, GeoEye-1 and Airbus' Pleiades, all of which provide data at around 0.5 m spatial resolution. How does this relate to the resolution you extracted at 200 m eye altitude?

Response: (1) We answer the question of the imagery source in the following comment (i.e. comment 2). (2) Resolution: as mentioned in the third paragraph of page 5, the resolution of the images extracted from Google Earth Pro was 4800 pixels x 2908 pixels. (3) Eye altitude: the eye altitude in Google Earth Pro is not related to the resolution of the satellite image. A 50 cm spatial resolution means that no object smaller than 50 cm can be recognised; for objects such as small boats, however, 50 cm resolution is adequate. In this work, we fixed all eye altitudes to 200 metres because we wanted to fix the scale of the map, the benefits of which are discussed in detail in “D. Object Measurement and Classification” on page 6.
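To make the fixed-scale benefit concrete, here is a minimal sketch of scale-based measurement in Python. The METRES_PER_PIXEL value is hypothetical and purely illustrative; the paper's actual calibration is described in Section D.

```python
# Minimal sketch of scale-based measurement, assuming a hypothetical
# METRES_PER_PIXEL calibrated once for the fixed 200 m eye altitude.
# The actual calibration procedure is described in Section D of the paper.

METRES_PER_PIXEL = 0.05  # hypothetical value, for illustration only

def boat_length_m(bbox_pixels: tuple[int, int, int, int]) -> float:
    """Estimate boat length in metres from a bounding box (x1, y1, x2, y2).

    With a fixed map scale, every image shares the same metres-per-pixel
    factor, so pixel distances convert directly to ground distances.
    """
    x1, y1, x2, y2 = bbox_pixels
    longest_side_px = max(x2 - x1, y2 - y1)
    return longest_side_px * METRES_PER_PIXEL

print(boat_length_m((100, 200, 260, 240)))  # 160 px -> 8.0 m
```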

2. Google Earth Pro does display the satellite imagery source, for instance “Image © 2022 CNES / Airbus”, which refers to Airbus' Pleiades imagery. However, the authors did not mention the satellite imagery source(s) of the 694 high-resolution images used.

Response: Thank you very much for the reminder. We appreciate the importance of disclosing imagery sources and have therefore added them to the version 2 preprint manuscript, for instance “Image © 2022 CNES/Airbus”. We also corrected another minor error: the number of images of the Gulf of California from 2018 to 2021 is 690, not 694.

3. The paper also leaves a few further open questions: What would be the benefit of additional spectral bands in the IR part of the EMS, if available? WorldView-3, for example, comes with 29 bands, PlanetScope with 5 bands. What would be the minimum required spatial resolution? Which multispectral (high-resolution) satellite imagery is available for free, and which is not? A discussion of satellite imagery access, availability, costs and in particular resolution (spatial, spectral, temporal) would be beneficial for this research.

Response: Thank you very much for the recommendation. We recognise the value of discussing bands, availability and costs of satellite imagery; however, this is somewhat outside the scope of the paper, because we did all the image extraction work directly in Google Earth Pro. We will consider your suggestion for a future comparative paper on satellite imagery access, availability and costs.

            4. Did you account for the count of duplicates?

Response: Yes, we checked for duplication under the methodology developed for this work. Using fixed timestamps and coordinates, and avoiding overlapping areas when extracting images from Google Earth Pro, prevented duplication or double-counting of the small boats.
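As an illustration of this kind of check, a minimal sketch in Python; the record fields and values are hypothetical, not the paper's actual extraction metadata:

```python
# Sketch of a duplicate check over extraction metadata, assuming each
# image is identified by a (timestamp, latitude, longitude) record.
# Field names and values are hypothetical, for illustration only.

extractions = [
    {"timestamp": "2021-06-01", "lat": 24.142, "lon": -110.312},
    {"timestamp": "2021-06-01", "lat": 24.150, "lon": -110.298},
    {"timestamp": "2021-06-01", "lat": 24.142, "lon": -110.312},  # duplicate
]

seen = set()
duplicates = []
for rec in extractions:
    key = (rec["timestamp"], rec["lat"], rec["lon"])
    if key in seen:
        duplicates.append(rec)
    else:
        seen.add(key)

print(f"{len(duplicates)} duplicate extraction(s) found")
```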

            5. Details on how satellite images were extracted from Google Earth Pro are missing.

            Response: We have now added the details on how satellite images were extracted from Google Earth Pro on page 5.

6. The issue of boats located on two adjacent images could have been addressed by applying tiling with spatial overlap (for instance of 50%, depending on the set tile size) … page 7, second paragraph: “…some large vessels …do not appear fully in an image”.

Response: We are grateful that you mentioned spatial overlap. We did not encounter this spatial issue because we extracted the images manually, which allowed us to filter out images in which small boats did not appear fully.
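For pipelines that do extract images automatically, the reviewer's suggested overlap tiling could look like the following sketch. This is not the method used in the paper; the tile size and overlap are illustrative assumptions:

```python
# Sketch of the reviewer's suggested tiling with spatial overlap.
# NOT the method used in the paper (images were extracted manually);
# shown only to illustrate the alternative. Edge remainders at the
# right/bottom are ignored for simplicity.

def tiles(width: int, height: int, tile: int = 640, overlap: float = 0.5):
    """Yield (x, y) top-left corners of tiles over a width x height
    image, with the given fractional overlap between adjacent tiles."""
    step = max(1, int(tile * (1 - overlap)))
    for y in range(0, max(height - tile, 0) + 1, step):
        for x in range(0, max(width - tile, 0) + 1, step):
            yield x, y

# A 4800 x 2908 image with 640 px tiles and 50% overlap:
print(sum(1 for _ in tiles(4800, 2908)))  # 112 tiles
```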

            7. The authors could also consider masking out land areas.

Response: We appreciate your suggestion of masking out land areas for better detection of small boats. However, because of transfer learning, all images used for the training datasets show boats at sea, so the algorithm does not detect ships on land with high confidence (see, for instance, Figure 13 on page 9).
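To show what filtering detections by confidence looks like in practice, a minimal YOLOv5 inference sketch using the public torch.hub interface; the weight file boatnet.pt and the input image path are hypothetical stand-ins, not artefacts released with the paper:

```python
import torch

# Minimal YOLOv5 inference sketch. "boatnet.pt" is a hypothetical path
# to custom-trained weights; the stock yolov5l model can be loaded the
# same way without the "custom" argument.
model = torch.hub.load("ultralytics/yolov5", "custom", path="boatnet.pt")
model.conf = 0.5  # discard detections below 50% confidence

results = model("satellite_image.jpg")  # hypothetical input image
det = results.xyxy[0]                   # tensor: [x1, y1, x2, y2, conf, cls]

for *box, conf, cls in det.tolist():
    print(f"class {int(cls)} at {box} with confidence {conf:.2f}")
```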

8. Page 6, second paragraph: The authors mention shadows and clouds but discuss only the removal of haze in the next sentence. This leaves the reader with open questions about shadows and clouds (although Google Earth Pro imagery consists almost everywhere of cloud-free mosaics).

Response: (1) This has been addressed in the text, which now makes clear that we are referring to haze clouds. (2) According to the paper “Single Image Haze Removal Using Dark Channel Prior” [1], shadow is one of the three factors that cause low intensity in the dark channel. Thus, while the cited paper focuses on removing haze, shadows can be treated in the same way.
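For reference, a minimal sketch of the dark channel computation from [1], the quantity on which that argument rests; the patch size is the default commonly used in [1], and the random input is only a placeholder:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """Dark channel of an RGB image (H x W x 3, values in [0, 1]),
    following He et al. [1]: per-pixel minimum over the colour channels,
    then a local minimum filter over a patch x patch window. In haze-free
    outdoor images the dark channel is close to zero (shadows being one
    of the causes); haze adds airlight that brightens it, which is what
    the removal algorithm estimates and subtracts."""
    per_pixel_min = img.min(axis=2)
    return minimum_filter(per_pixel_min, size=patch)

# A random image stands in for a satellite extract:
rgb = np.random.rand(64, 64, 3)
print(dark_channel(rgb).shape)  # (64, 64)
```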

            Minor comments

            1. Abstract: 1st sentence: improve wording.

            Response: We believe this sentence is clear.

            2. Abstract: I suggest to replace the term “Techno-activity” with “Technology…”

            Response: We have replaced the term “Techno-activity” with “technological and operational assumptions”.

            3. Abstract: Replace GPS with “Global navigation satellite system (GNSS)”

            Response: Replaced.

            4. Abstract: “…The work produced a methodology named BoatNet that can detect, measure and classify small boats…” – I suggest to also inform about target classes (shipping/leisure)

            Response: Text modified to incorporate the boat classes.

            5. Page 1: Unit ‘Mt’ should be written in full when using the first time: “Megaton (Mt)”

            Response: Replaced.

            6. Same accounts for CO2e: Carbon dioxide equivalent (page 2 last line)

            Response: Replaced.

            7. Page 2: first 3 paragraphs: I suggest adding the respective literature references to back up your statements.

            Response: Added 20 more citations: 14 to 35.

            8. Page 2 – section C.: First sentence: “Bringing deep….is essential.” Why is it essential – what for?

            Response: Thank you for your comment. We have added the argument on why deep learning is essential for satellite image recognition on page 2.

            9. Page 2 – section C.: Third sentence: I suggest using the term ‘resolution’ instead of ‘quality’

Response: Thank you for your comment. In fact, “resolution” is not what is meant here; however, we have deleted this sentence to address your suggestion.

10. Page 2 – section C.: Fourth sentence: I suggest rewording into something like: “Machine learning is widely used for satellite imagery analysis.”

Response: Thank you for your comment. We have deleted this sentence to address your suggestion.

            11. Whole text: You may consider replacing “Satellite image” with “Satellite imagery”

            Response: Thank you for your comment. It has been fixed.

            12. Page 3 – line 8: “…and fuel used data” Did you mean fuel-used data? The sentence wasn’t clear to me.

            Response: Thank you for your comment. It is fuel-used data. Fixed.

            13. Page 3 – line 10: “CO2e” Does “e” refer to estimate? Please write full form when using an abbreviation for the first time in the text.

Response: The “e” refers to “equivalent”; this has been fixed in Section A of Part II on page 3.

            14. Page 3 – section B – paragraph 2: “…each number is neither zero nor new, but…” Are you sure that is correct? What is a “new” number? Please adjust to improve clarity.

            Response: Thank you for your comment. It should be “one”. Fixed.

            15. Page 5 – end of chapter II: I recommend adding a summary of the literature review including argumentation for why Yolo CNN was selected for this work (just before the last paragraph)

Response: Thank you for your comment. We have added a paragraph on page 5 making that argument.

            16. Page 5 – end of chapter II- last paragraph: “…aims at detecting small boats…” I recommend adding the fact that the proposed model intends to detect specific boat types (fishing, recreational)

Response: Thank you for your comment. It has been corrected.

            17. Page 6 - Fig. 5 caption: reference should be replaced by ref 51 (Dwivedi …Yolov5), or at least ref 51 should be added.

Response: Thank you for your comment. You are right: Ref 51 (now Ref 86) is one of the key references for Fig. 5, and we have added it to the caption of Fig. 5.

18. Page 6 - the last paragraph: “To validate… was done beteen BoatNet…” → correct to “between”.

            Response: Corrected.

19. Page 7: what is the loss of prediction (detection) accuracy due to image resizing? Depending on research goals (and other factors), it is sometimes worth running a multiple-day training.

Response: Thank you for your comment. Firstly, the detected imagery is not resized. Secondly, the reason for resizing the training dataset in this work is that we wanted to remove the “blank information” in an image. For instance, in Fig. 4 the two small boats are extremely concentrated; since we labelled only small boats in the training datasets, most of a resized image is unlabelled and thus full of “blank information”. A prediction loss due to image resizing is therefore not the relevant measure here. As for multiple-day training, Colab Pro limits RAM to 32 GB and Pro+ limits RAM to 52 GB, and both limit sessions to 24 hours. Moreover, one of our main aims for the future is to find an efficient way to apply object detection in edge computing, where multiple-day training might well be an option. We take your suggestion as future work on this topic.
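A minimal sketch of the kind of “blank information” removal described above: crop a training image to the region around its labelled boats. The YOLO-style label format and the margin value are assumptions for illustration, not the paper's exact preprocessing:

```python
from PIL import Image

# Sketch: crop a training image to the region around its labelled boats,
# discarding "blank" background. Labels are assumed to be in YOLO txt
# format (class, x_center, y_center, width, height, all normalised to
# [0, 1]); the margin value is hypothetical.

def crop_to_labels(img: Image.Image, labels, margin: float = 0.05) -> Image.Image:
    W, H = img.size
    xs, ys = [], []
    for _cls, xc, yc, w, h in labels:
        xs += [(xc - w / 2) * W, (xc + w / 2) * W]
        ys += [(yc - h / 2) * H, (yc + h / 2) * H]
    left = int(max(0, min(xs) - margin * W))
    top = int(max(0, min(ys) - margin * H))
    right = int(min(W, max(xs) + margin * W))
    bottom = int(min(H, max(ys) + margin * H))
    return img.crop((left, top, right, bottom))

# Example with one hypothetical label centred at (0.5, 0.5):
boxed = crop_to_labels(Image.new("RGB", (4800, 2908)), [(0, 0.5, 0.5, 0.02, 0.01)])
print(boxed.size)
```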

20. Page 8: false counts caused by nearby located boats: couldn't the boat type classification at least allow distinguishing between different (adjacent) boat types?

Response: Thank you for your comment. Most false counts, or the loss in precision, are caused by misdetections within the same boat class, for instance in Fig. 11. The key issue, however, is image quality (our detected imagery is 4800 pixels x 2908 pixels), not the algorithm.

21. Page 8: last paragraph: “Nevertheless, as Figures 11, 12, 13 demonstrate, the model still detects most small boats in poorly detailed satellite images,…” What exactly did you refer to with “poorly detailed”?

Response: Thank you for your comment. “Poorly detailed” means that although the images are 4800 pixels x 2908 pixels, the imagery details are still unclear. For instance, when taking tourist photos of friends with a mobile phone, you may notice that the details of your friends' faces are more visible than those of the snowy mountains in the background.

22. “precision of training can be up to 93.9%,” … why isn't this result explained in the results section? How did you derive 93.9%?

Response: Thank you for your comment. In the results section we chose to focus on (1) fundamental issues behind satellite image detection in the real world, for instance imagery quality, and (2) statistical results relevant to the maritime energy-emissions problem for further research. (3) The precision is derived with the formula Precision = TP / (TP + FP), where TP is the number of true positives and FP the number of false positives. Precision reflects the sensitivity of the classifier to positive categories; high precision indicates that the classifier rarely misclassifies negative categories as positive.
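For concreteness, the formula as a small sketch; the counts are illustrative only, not the paper's confusion matrix:

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): the fraction of predicted positives
    that are truly positive. High precision means the classifier rarely
    misclassifies negatives as positives."""
    return tp / (tp + fp)

# Illustrative counts only, not the paper's confusion matrix:
print(f"{precision(tp=93, fp=7):.1%}")  # 93.0%
```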

23. Page 10, fourth paragraph: “Due to the low data quality of the selected regions, the images are less suitable as training datasets.” What exactly did you mean by “low data quality”?

            Response: Thank you for your comment. Hopefully, Answer 21 has already resolved this question!

[1] He, K., Sun, J. and Tang, X., 2010. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), pp. 2341-2353.

            We would like to thank Dr Irwan Priyanto for his assessment of our work and the useful suggestions. As discussed below, we have considered all suggestions and made relevant edits and improvements to the manuscript.

1. Authors should present the YOLOv5l architecture and cite previous research using YOLOv5l.

            Response: Thank you for your comment. We have added the argument on why we use YOLO and citations of previous research on page 5.

            2. In addition, it is necessary to include GPU resources and the employed framework along with computational cost analysis.

Response: Thank you for your comment. We have added sentences on the GPU resources and the employed framework in Section C of page 6. Regarding the computational cost analysis, Fig. 5 shows the relationship between Average Precision (AP) and GPU speed.

            3. To enrich the analysis, the author should add a comparison of research results with other methods.

Response: Thank you very much for the recommendation. We recognise the value of running the same image-recognition task with different models. While we considered this step in the initial stages of the research, it became clear, as we narrowed down the potential object detection methods, that such a comparison was out of the scope of this paper. We have, however, noted your suggestion of a future comparative paper on different algorithms for the same problem on page 10.

            Abstract

Tracking and measuring national carbon footprints is one of the keys to achieving the ambitious goals set by countries. According to statistics, more than 10% of global transportation carbon emissions result from shipping. However, accurate tracking of the emissions of the small boat segment is not well established. Past research has begun to look into the role played by small boat fleets in terms of greenhouse gases (GHG), but it relies either on high-level technological and operational assumptions or on the installation of global navigation satellite system (GNSS) sensors to understand how this vessel class behaves. This research is undertaken mainly in relation to fishing and recreational boats. Open-access satellite imagery and its ever-increasing resolution can support innovative methodologies that could eventually lead to the quantification of GHG emissions. This work used deep learning algorithms to detect small boats in three cities in the Gulf of California in Mexico. The work produced a methodology named BoatNet that can detect, measure and classify small boats as leisure boats or fishing boats even in low-resolution and blurry satellite images, achieving an accuracy of 93.9% with a precision of 74.0%. Future work should focus on attributing a boat activity to fuel consumption and operational profile to estimate small boat GHG emissions in any given region. The data curated and produced in this study is freely available at https://github.com/theiresearch/BoatNet.

            Content

            Author and article information

            Journal
            UCL Open: Environment Preprint
            UCL Press
            6 January 2023
            Affiliations
[1] UCL Energy Institute, The Bartlett School of Environment, Energy and Resources, University College London, 14 Upper Woburn Place, London WC1H 0NN, UK
            Author notes
            Author information
            https://orcid.org/0000-0003-1640-5443
            https://orcid.org/0000-0002-8787-8531
            https://orcid.org/0000-0002-1925-169X
            Article
            10.14324/111.444/000177.v2

            This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY) 4.0 https://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

            History
Received: 20 July 2022
Accepted: 3 April 2023
            Categories

            The datasets generated during and/or analysed during the current study are available in the repository: https://github.com/theiresearch/BoatNet
Earth & Environmental sciences, Computer science, Statistics, Geosciences
Object Detection, Small boats activity, Energy, Deep Learning, Climate, Climate Change, Transfer Learning, Policy and law, Sustainable development, The Environment, Statistics

            Comments

            Date: 03 April 2023

            Handling Editor: Dr Craig Styan

            Editorial decision: Accept. This revised article has been accepted following peer review and it is suitable for publication in UCL Open: Environment.

            2023-04-03 13:04 UTC

            Date: 06 January 2023

            Handling Editor: Dr Craig Styan

The article has been revised; it remains a preprint and peer review has not been completed. It is under consideration following submission to UCL Open: Environment for open peer review.

            2023-01-16 15:19 UTC
