Kelp Talk

Coast Selection Algorithms

  • Quia by Quia

    A number of folk have asked us about the color skewing, and also asked us about the algorithm we use to select images. Stay tuned, as we’re working with Zooniverse to release the code they use to select images, and then anyone who is interested in having at it – either for their own applications (say, spotting coral reefs in the tropics, and needing to subset out only coastal images) or who is interested in trying to make our process better (and reduce the number of land images while not losing coastline images). We’d love to collaborate with more folk out there!

    From the blog this past November... I am wondering if any progress has been made on this front? It seems this kind of endeavor has already been undertaken to great success, using fairly basic thresholding. Obviously we are not interested in inland bodies of water, but trimming down 'all coastlines' to 'all connected coastlines' is not that big of a step.


  • Quia by Quia

    Well, I've done a bit of work here myself now. Here's a quick run down of the progress I've made.

    I read the Floating Forest blog post that said you guys were hoping to improve the algorithm for selecting images, but I didn't see the algorithm shared anywhere, so I went and poked at the Landsat images. My poor little laptop did not enjoy trying to process those gigantic slices and I had to give that up. I went and read a few papers about locating coastlines from just Landsat imagery, and it actually seemed to be a reasonably simple problem.

    So I wrote a small script in Python that goes through the already sliced and processed Floating Forest images to try to select a smaller set of probable coastlines. It removes about 60% of the junk images, while having a near-zero false negative rate (need more data!).

    It does an HSV threshold to select the water pixels in an image, very broadly defining 'water' as the extremes of the washed-out/dark images, and checks whether at least 5% of the pixels in the image are water. This check misses some edge cases where there is only a small bit of ocean in a corner, so we also check whether any of the edges has > 20 pixels of water.
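
    A rough sketch of that check, not the actual script. The 5% and 20-pixel limits come from the description above; the HSV cutoffs (`DARK_V_MAX`, `WASHED_S_MAX`, `WASHED_V_MIN`) and the function names are made up for illustration, since the real ranges aren't given here. It sticks to the standard library so it runs anywhere; numpy or OpenCV would be much faster on real images.

```python
import colorsys

# Hypothetical cutoffs: the real HSV ranges are not stated in the post.
DARK_V_MAX = 0.15      # "dark" extreme: very low brightness
WASHED_S_MAX = 0.10    # "washed out" extreme: very low saturation...
WASHED_V_MIN = 0.85    # ...together with high brightness

# Limits taken from the description above.
MIN_WATER_FRACTION = 0.05    # at least 5% of all pixels
MIN_EDGE_WATER_PIXELS = 20   # more than 20 water pixels on one edge


def is_water(rgb):
    """Very broad 'water' test for one (r, g, b) pixel with 0-255 channels."""
    r, g, b = (channel / 255.0 for channel in rgb)
    _hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
    return val <= DARK_V_MAX or (sat <= WASHED_S_MAX and val >= WASHED_V_MIN)


def water_mask(image):
    """Turn a list of pixel rows into a list of boolean rows."""
    return [[is_water(pixel) for pixel in row] for row in image]


def looks_like_coast(image):
    """Keep the image if enough of it is water, or if one edge is wet."""
    mask = water_mask(image)
    height, width = len(mask), len(mask[0])

    total_water = sum(sum(row) for row in mask)
    if total_water >= MIN_WATER_FRACTION * height * width:
        return True

    # Catch a small bit of ocean hugging the top, bottom, left or right edge.
    edges = [
        mask[0],
        mask[-1],
        [row[0] for row in mask],
        [row[-1] for row in mask],
    ]
    return any(sum(edge) > MIN_EDGE_WATER_PIXELS for edge in edges)
```

    Because 'water' is defined so broadly, bright clouds and deep shadows pass `is_water` too, which is why some junk images still get through.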

    Using the actual channels from Landsat rather than the combined image would probably reduce the need for such broad thresholds, removing even more false positives. Even better would be to do feature recognition based on those thresholds on the whole Landsat image and extract all the coastlines.

    These are the false negatives out of 240 random images and 46 coastlines selected from Talk. They're not exactly high-quality images for classifying, but I am including them as false negatives because a human can still recognize them as coastline.

    Of course, after doing all that I went looking for who to send it to and found the selection algorithm you're using on the project GitHub... This is more of a post-processing 'does it even help all that much' test setup anyway. The answer is definitely yes: it helps remove a lot of the junk images, with a very computationally inexpensive test that doesn't require reprocessing the whole Landsat images over again.


  • DZM by DZM admin

    Thanks for posting this, @Quia! Looks like you worked darn hard on it. Impressed!

    I've messaged the science team to let them know you put this together.

    Thanks again!


  • jebyrnes by jebyrnes scientist

    This is fantastic! We're talking about this on email right now - and hopefully can get this up and running as part of the flow of getting new images up (we have plans to try and add some new regions soon!) Hopefully one of the devs will comment back here to give their thoughts!


  • Quia by Quia

    Good to hear!

    One thing that may be required to put this to use is making the water threshold range not hardcoded, and instead pulling the min/max values of water from the Landsat image currently being processed. (One of the neat things in that paper is how they automatically get all these values!) It would remove even more junk land images: all the heavily shadowed mountains and rainclouds that still make it through this filter. Also, everything I've read so far says that you should absolutely not try to do this on the whole natural-color images and should only use band 5 (or 4 or 6), because of how much land you will end up selecting as water. That's likely the biggest source of improvement.
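
    To make the "not hardcoded" idea concrete, here is a minimal sketch that picks the water cutoff from the pixel values of a single band. It uses Otsu's method as a stand-in; that is my assumption, not necessarily how the paper gets its values. It relies on water being dark in the infrared bands, and `otsu_threshold` and the 0-255 value range are illustrative choices.

```python
def otsu_threshold(values, levels=256):
    """Pick the cutoff that best splits integer pixel values into two classes.

    Pixels <= the returned value form the dark class (water, in an
    infrared band); the rest form the bright class (land).
    """
    hist = [0] * levels
    for value in values:
        hist[value] += 1

    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_cutoff, best_score = 0, -1.0
    dark_count = 0
    dark_sum = 0
    for cutoff in range(levels):
        dark_count += hist[cutoff]
        dark_sum += cutoff * hist[cutoff]
        if dark_count == 0:
            continue          # no dark pixels yet
        bright_count = total - dark_count
        if bright_count == 0:
            break             # everything is already in the dark class
        dark_mean = dark_sum / dark_count
        bright_mean = (sum_all - dark_sum) / bright_count
        # Between-class variance (up to a constant factor).
        score = dark_count * bright_count * (dark_mean - bright_mean) ** 2
        if score > best_score:
            best_score, best_cutoff = score, cutoff
    return best_cutoff


# Usage: flatten one band of the scene and threshold it.
band = [[9, 10, 11, 180], [8, 12, 150, 220]]   # toy values, not real data
flat = [value for row in band for value in row]
cutoff = otsu_threshold(flat)
water = [[value <= cutoff for value in row] for row in band]
```

    A scene with no water at all (or nothing but water) would still get split in two, so a real pipeline would need a sanity check on the cutoff before trusting it.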

    Going beyond cleaning up the existing output, though, the thresholded images do make great inputs for feature analysis and for building a new database of 'coastline' to feed into the existing selection algorithm, rather than trying to clean up the messy output of the current selection. It's on my list of things to try out; this little program was just the simplest idea to carry out to see how much it helped. (More papers to reference for doing that!)


  • Quia by Quia

    I borrowed a workstation with a bit more RAM. 😃

    If you took the output of the threshold function, made a contour from the ocean/land boundary, and converted it to lat-long coords instead of coords relative to the image, you'd have a lovely, accurate coastline for any Landsat image with 10% cloud cover that you ran it on. That contour could then be used as a much more accurate check of whether a given Floating Forest-sized slice intersects the coastline. I don't want to start on this without knowing what format the current coastline database is in.
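
    A minimal sketch of that idea, under a few assumptions: the mask is a list of boolean rows (True = water), the scene comes with a six-number affine transform in GDAL order, and the names `boundary_pixels` and `pixel_to_map` are made up here. Landsat scenes are typically delivered in a UTM projection, so this yields easting/northing in metres; getting true lat-long needs an extra reprojection step that isn't shown.

```python
def boundary_pixels(mask):
    """Return (row, col) of water pixels that touch land on any of 4 sides."""
    height, width = len(mask), len(mask[0])
    boundary = []
    for row in range(height):
        for col in range(width):
            if not mask[row][col]:
                continue
            for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                n_row, n_col = row + d_row, col + d_col
                inside = 0 <= n_row < height and 0 <= n_col < width
                if inside and not mask[n_row][n_col]:
                    boundary.append((row, col))
                    break
    return boundary


def pixel_to_map(row, col, transform):
    """Map a pixel centre to projected coordinates.

    transform is (x_origin, x_step, row_rotation,
                  y_origin, col_rotation, y_step) in GDAL order.
    """
    x_origin, x_step, row_rot, y_origin, col_rot, y_step = transform
    x = x_origin + (col + 0.5) * x_step + (row + 0.5) * row_rot
    y = y_origin + (col + 0.5) * col_rot + (row + 0.5) * y_step
    return x, y


# Usage with toy values: 30 m pixels, north-up, made-up origin.
transform = (500000.0, 30.0, 0.0, 4000000.0, 0.0, -30.0)
mask = [
    [True, True, False, False],
    [True, True, False, False],
    [True, True, False, False],
    [True, True, False, False],
]
coast = [pixel_to_map(r, c, transform) for r, c in boundary_pixels(mask)]
```

    This gives an unordered set of boundary points rather than a traced line, which is already enough to test whether a slice's bounding box contains any coastline.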

    You could also use something like the script I previously shared to remove heavily clouded images from the subject pool after selecting the coastline.

    Sample output