August 20, 2018
Contact: Heather Soulen

The Rise of Machines: Using CNNs to Help Estimate Skate Abundance

Scientists from the Northeast Fisheries Science Center are presenting their research at the 2018 American Fisheries Society (AFS) annual conference in Atlantic City, New Jersey, from August 19-23. Operations research analyst Dvora Hart’s AFS presentation focuses on her collaborative work using convolutional neural networks (CNNs) to detect skates in images collected during sea scallop HabCam surveys. This feature story is an extension of her presentation for AFS participants and those involved in fisheries science and natural resource management.

When the Conventional Won’t Work

Stock assessment scientists provide high-quality science information to resource managers to answer important questions about current stock status, how much catch is sustainable, and what steps are needed to rebuild depleted stocks. Conventional stock assessments require certain kinds of data including catch -- the removal of fish from a population by fishing and fishing bycatch. Fishermen in the northeast encounter 7 skate species, but the majority of catch is simply reported as “skate,” making it nearly impossible to effectively estimate absolute abundance and biomass by species. Essentially, we’re catching skates without knowing how many skates are out there, and we may be disproportionately fishing a certain species. This is especially concerning given the large, thriving fishery for skates, particularly winter skates (Leucoraja ocellata).

How can stock assessment scientists generate reliable estimates of absolute abundance and biomass for skate species? Scientists at our Science Center are collaborating with computer scientists at Kitware to solve this problem using images taken from the Science Center’s annual sea scallop HabCam surveys and a machine learning technique based on trained convolutional neural networks (CNNs).

The Rise of Machines: CNNs

Convolutional neural networks are a popular machine learning technique commonly used to analyze visual imagery. “CNNs are kind of the big breakthrough in the last 5 years in artificial intelligence. They’re meant to mimic the way the brain works,” said Dvora Hart, operations research analyst at the Science Center. After learning more about CNNs, Hart saw an opportunity to apply them to her stock assessment needs and later teamed up with Kitware to use their specialized software called Video and Image Analytics for a Marine Environment (VIAME), an open-source system for analysis of underwater video and imagery. The goal of VIAME is to enable rapid, low-cost integration of new algorithmic modules, datasets and workflows. VIAME has been developed as part of a nationwide NOAA strategic initiative on automated image analysis.

Working with convolutional neural networks involves training, detection and classification, and calibration steps. During the training phase, people went through a small fraction of the 2015 HabCam survey images and marked the sea scallops, skates, and other organisms of interest they saw. If a skate was identified, the reviewer drew a bounding box around it. This process is referred to as manual annotation. Of the roughly 6 million images collected by HabCam in 2015, reviewers manually annotated around 120,000 images, about 2% of the total. Out of that 2%, only around a thousand images actually had skates present. The 2015 manually annotated images were then used to train the CNN to identify skates. Once training was completed, the CNN was tested using the 4.1 million images collected by HabCam in 2016. People also manually annotated about 110,000 images from 2016. Using the manually annotated images from 2016, Hart evaluated the performance of the CNN and calibrated it for accuracy. The calibrated CNN was then used to estimate skate abundance in 2016 based on all 4.1 million images. She notes that you cannot calibrate and evaluate the performance of a CNN using the images it was trained on, which is why training was completed with 2015 images while evaluation and calibration were completed with 2016 images.
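
The article does not describe how the calibration itself was done. As a rough illustration only, one simple approach would be to compare CNN detections with the human annotations on the held-out images, estimate how often the CNN raises false alarms and how often it misses skates, and then scale the raw count from the full image set accordingly. The Python sketch below assumes that approach; the function name, the class name, and every number in it are hypothetical and are not taken from Hart's analysis or from VIAME.

```python
from dataclasses import dataclass


@dataclass
class CalibrationCounts:
    """Tallies from images that were BOTH human-annotated and run through the CNN.

    These must come from held-out images (e.g. a different survey year),
    never from the images used to train the CNN.
    """
    true_positives: int   # skates found by both the CNN and the reviewer
    false_positives: int  # CNN detections the reviewer did not confirm
    false_negatives: int  # skates the reviewer marked but the CNN missed


def calibrated_count(raw_detections: int, counts: CalibrationCounts) -> float:
    """Scale a raw detection count by estimated precision and recall.

    This is an assumed, simplified correction, not the method used in the study.
    """
    detected = counts.true_positives + counts.false_positives
    actual = counts.true_positives + counts.false_negatives
    if counts.true_positives == 0 or detected == 0 or actual == 0:
        raise ValueError("need at least one confirmed detection to calibrate")
    precision = counts.true_positives / detected  # share of detections that are real
    recall = counts.true_positives / actual       # share of real skates detected
    return raw_detections * precision / recall


# Hypothetical example values, for illustration only.
example = CalibrationCounts(true_positives=90, false_positives=10, false_negatives=30)
print(calibrated_count(1000, example))
```

In this sketch a CNN that raises some false alarms but misses more skates than it invents ends up with its raw count adjusted upward. A real calibration would also need to account for uncertainty in the estimated rates.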

The Pros and Cons

The Pros:

  1. VIAME has a common framework that’s relatively simple for non-computer scientists to use, and the user interfaces (command line or graphical user interface) are similar for all of the different modules.
  2. They have modules for video and still images, and for training and detecting scallops, fish, and skates.
  3. The CNN used for the skate analysis is called YOLOv2, but other CNNs are also available in VIAME. There is also a rapid model building tool, IQR, that, unlike CNNs, doesn’t require a huge number of images for training.
  4. While it does require specialized graphics processing equipment (GPUs, listed in the “Cons” section below), they’re coming down in price (a 3500 core GPU now costs about $700 to $800).
  5. While it takes time to do the manual annotations and learn the expertise required to run CNNs, it’s many times faster than the time it takes to show/train someone to identify skates and then go through millions of images.

The Cons:

  1. CNNs are a bit of a black box: this kind of machine learning isn’t transparent since it’s similar to how human brains work (i.e., human learning is more nebulous).
  2. It requires a lot of images in order to confidently train the CNN.
  3. CNNs aren’t perfect. By comparing the data from the manually annotated images with that generated by VIAME, Hart found 38 images in which the CNN missed a skate. She also found another 25 or so examples where the manually annotated images were wrong and the CNN was right (i.e., human error is also a problem).
  4. CNNs require some expertise and specialized computing equipment: high-end graphics processing units and computers with a large power supply to run them. Hart said that running CNNs on normal computers would take forever: “I tried once on a laptop and it would take 10-20 seconds an image. Now that doesn’t sound too bad but when you multiply this by 4, 5 or 6 million images -- depending on the year -- that works out that it would take it a year to go through those images.”
  5. Network bandwidth is another consideration, since images and other files have to be found and transferred over the network for processing.
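
The laptop timing Hart describes can be checked with simple arithmetic. The short Python sketch below uses only the figures from her quote (10 to 20 seconds per image, and 4, 5 or 6 million images per year) and assumes the laptop runs around the clock on one image at a time; the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def processing_days(n_images: int, seconds_per_image: float) -> float:
    """Days of continuous, one-image-at-a-time processing."""
    return n_images * seconds_per_image / SECONDS_PER_DAY


# Image counts and per-image times taken from the quote above.
for n_images in (4_000_000, 5_000_000, 6_000_000):
    for seconds in (10, 20):
        days = processing_days(n_images, seconds)
        print(f"{n_images:>9,} images at {seconds} s/image: "
              f"{days:,.0f} days ({days / 365:.1f} years)")
```

Even the most favorable combination, 4 million images at 10 seconds each, comes to well over a year of nonstop computing, which is consistent with Hart's point that a laptop is not a practical option.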

Hart says this is all a work in progress, but using CNNs allows the team to outline a path forward -- a way to get closer to being able to estimate absolute abundance for skates. “The first thing to do is to calculate the area where the HabCam goes. We can estimate densities in that area.” She notes that one snag is that there are skates outside the HabCam survey area, such as in the Gulf of Maine and in deeper and shallower waters. She sees a way forward in linking the HabCam survey with the long-running bottom trawl survey conducted annually in the spring and fall at stations across the Northeast continental shelf. Normally, the HabCam and bottom trawl surveys don’t overlap in space and time. However, in 2016, the spring bottom trawl survey was delayed and took place within a month of the HabCam survey. Hart thinks this might just be close enough in time to compare her 2016 CNN skate abundance estimates to the catch data from the bottom trawl survey.

Bottom Line

The development of CNNs was a huge breakthrough for all kinds of fields including fisheries. “A lot of times science works that way. Things are going along, doing a little better, a little better, then all of sudden someone has this great idea and there’s this great leap forward. That’s what happened with this and it’s only going to get better over time. This stuff is improving by leaps and bounds. We couldn’t have done this 5 years ago,” said Hart. “It’s already good enough that we can use it for skates. This is the wave of the future in my opinion. Humans can only look at a tiny percent of the images, but this is the way we can analyze all the images without a human.”

Always Looking to the Future

There are a lot of other potential applications with HabCam images. Hart and her team are looking at how this could be applied for hakes, flounders, scallops, goosefish, and even habitat. “Another thing that we’ve actually talked about is habitat. Habitat like sponges and bryozoans and others. We only have the vaguest idea where all this stuff is. HabCam started in 2012 and if you include images collected through 2016, that amounts to seven years worth of images,” Hart said. “We probably over that time have more than 40 million images. Wouldn’t it be great to look through all those images for sponges, bryozoans, barnacles or mussels? We could potentially identify all these kinds of things.”

The VIAME framework is built on top of open-source tools from Kitware’s Kitware Image and Video Exploitation and Retrieval (KWIVER) toolkit. To view the VIAME project, please visit the VIAME GitHub project page.

For more information, contact Heather Soulen.