Detecting Solar Panels In Imagery Using Semantic Segmentation
Computer vision is at the forefront of machine learning research. Although much attention has gone to the latest and greatest language models (like GPT-3), it would be hard to argue that any sector of machine learning has driven as much business value as computer vision. Whether the model identifies defects on an assembly line, detects traffic signs or pedestrians for a self-driving car, or finds a user's face for a camera's portrait mode, computer vision models are already integral to many systems we take for granted.
Since the first geographic information system (GIS) programs appeared in the 1980s, environmental scientists and militaries have been trying to systematically categorize areas of the earth's surface using aerial imagery. Measuring forested versus non-forested acreage, comparing a region's crop health over time, and identifying possible foreign military bases have produced a robust literature on applying computer vision techniques to aerial imagery. However, the abundance of high-resolution imagery (like that found on Google Earth) is opening up many more applications of object detection, and the old methods often do not scale. In 2010, being able to detect whether an area of forest was highly productive versus moderately productive was exciting; those older techniques typically relied on support vector machines (SVMs) or simply classifying ranges of colors for specific objects of interest. Today, high-resolution imagery and neural networks can identify airplanes, boats, cars, parking spaces, and more.
In this project, I build a deep convolutional semantic segmentation neural network to identify rooftop solar panels in aerial imagery. This notebook contains a brief overview of the model architecture and my methodology, but the code used to prepare the data, build and train the model, and make predictions is in the solar_panel_detection.py file in this notebook's GitHub repository.
Methods
There are several ways to frame a problem like this, and each requires a different model architecture. The immense amount of money and talent in the self-driving car industry has produced several promising flavors of convolutional neural networks, including architectures for semantic segmentation and for object detection. In an object detection model, the prediction is the minimum bounding box containing the object of interest. In a semantic segmentation model, the output is a mask in which each pixel is an integer representing a class of object defined during training. Unlike the older histogram-mapping techniques that have been around for decades, this type of model can learn context by taking the entire scene into consideration. Although I have no evidence that a semantic segmentation model would be more appropriate than an object detection model for this use case, I chose a semantic segmentation architecture to play with here. For the training dataset, I therefore need to manually create masks where solar panel pixels have a value of 1 and all other pixels have a value of 0. Having more classes (building, road, etc.) might lead to better performance since it gives the model more context, but I don't have the time to trace out those features for a large training area! Below is the general workflow:
1) Choose a study area
I chose San Francisco, suspecting that it may have the highest density of rooftop solar panels in the US.
2) Find high resolution imagery for your study area
The USGS Earth Explorer has the largest catalog of free and public imagery datasets. I downloaded the most recent high-resolution ortho-imagery for my area of interest from here. I only downloaded 20 frames since I don't have much time to trace features.
3) Choose a training area
The more the merrier, but it takes a long time to manually draw in features. I chose 4 of the 20 frames for my training area.
4) Draw polygons representing all classes of interest
In my case, I drew a polygon in a GIS around each solar panel and unioned them with the outline polygon of the training area, so the entire coverage of the training area is categorized into the two classes of interest. If you are new to GIS, use QGIS, as it has excellent documentation. Make sure the polygon layer is in the same projection as your imagery and save it as a shapefile. If you get that far, you can use the .py file in this repo to do the rest of the preprocessing. I traced about 500 solar panels, which took me about 3 hours. The output of this step should be a polygon shapefile with the exact extent and projection of the training-area imagery tile, with every portion of the shapefile assigned to one of the classes of interest. Take a look at the shapefiles in the training folder of this repo for an example.
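If you want a quick programmatic check that the label polygons and imagery share a projection before moving on, here is a minimal sketch using geopandas and GDAL (the file names are placeholders, not the ones in this repo):
from osgeo import gdal
import geopandas as gpd

# hypothetical file names, for illustration only
labels = gpd.read_file('train_solar_panels.shp')
img = gdal.Open('train_tile.tif')

# reproject the label polygons to the imagery's projection if they differ
if labels.crs != img.GetProjection():
    labels = labels.to_crs(img.GetProjection())
    labels.to_file('train_solar_panels.shp')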
5) Create a raster of your training area(s) classified by the polygon you just created
The GDAL library, the most robust Python library for manipulating spatial data, contains a tool that converts a polygon layer to a raster. The output should be a raster of your masks with the same format, extent, cell size, and projection as the original imagery tile. Once again, take a look at the code in solar_panel_detection.py to see how this is done.
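A hedged sketch of what that rasterization can look like in Python, assuming the label polygons carry an integer 'class' attribute (file names are placeholders):
from osgeo import gdal, ogr

# hypothetical file names, for illustration only
img = gdal.Open('train_tile.tif')
shp = ogr.Open('train_labels.shp')
layer = shp.GetLayer()

# create a single-band byte raster with the same size, geotransform,
# and projection as the imagery tile
driver = gdal.GetDriverByName('GTiff')
mask = driver.Create('train_mask.tif', img.RasterXSize, img.RasterYSize,
                     1, gdal.GDT_Byte)
mask.SetGeoTransform(img.GetGeoTransform())
mask.SetProjection(img.GetProjection())

# burn each polygon's 'class' attribute value into the raster
gdal.RasterizeLayer(mask, [1], layer, options=['ATTRIBUTE=class'])
mask.FlushCache()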
6) Tile the train and test areas of your input imagery and labels
Cut the images into smaller pieces for faster processing and batching in the model. GDAL contains a nice tool for this.
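For example, one simple way to tile an image with gdal.Translate (the tile size and file names here are assumptions, not necessarily what solar_panel_detection.py uses):
from osgeo import gdal

TILE_SIZE = 256  # assumed tile size in pixels

src = gdal.Open('train_tile.tif')
for x in range(0, src.RasterXSize, TILE_SIZE):
    for y in range(0, src.RasterYSize, TILE_SIZE):
        # clamp the window so edge tiles stay inside the raster
        w = min(TILE_SIZE, src.RasterXSize - x)
        h = min(TILE_SIZE, src.RasterYSize - y)
        gdal.Translate('tiles/tile_{}_{}.tif'.format(x, y), src,
                       srcWin=[x, y, w, h])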
7) Load the tiles as arrays to form a train, test, and labels dataset
The full dataset array in this case will be 4D (num. tiles, tile size, tile size, num. color channels)
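A minimal sketch of that loading step, assuming the tiles sit in one folder and are read with GDAL:
import glob
import numpy as np
from osgeo import gdal

tiles = []
for path in sorted(glob.glob('tiles/*.tif')):
    arr = gdal.Open(path).ReadAsArray()    # (bands, rows, cols)
    tiles.append(np.moveaxis(arr, 0, -1))  # -> (rows, cols, bands)

# stack into (num. tiles, tile size, tile size, num. color channels)
x_train = np.stack(tiles).astype('float32') / 255.0
print(x_train.shape)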
8) Make modified copies of the training examples (augment) to increase train dataset size
Augmenting often leads to better performance, especially when your training dataset is as small as this one.
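For example, flips and 90-degree rotations applied identically to each image and its mask roughly quadruple the dataset; this is only a sketch, and the augmentations actually used are in solar_panel_detection.py:
import numpy as np

def augment(images, masks):
    # images: (N, H, W, C), masks: (N, H, W); append horizontally
    # flipped, vertically flipped, and rotated copies of every example
    aug_images = [images, images[:, :, ::-1], images[:, ::-1],
                  np.rot90(images, axes=(1, 2))]
    aug_masks = [masks, masks[:, :, ::-1], masks[:, ::-1],
                 np.rot90(masks, axes=(1, 2))]
    return np.concatenate(aug_images), np.concatenate(aug_masks)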
9) Build and train the model
10) Get prediction array and convert predictions to georeferenced raster images
Take the shape, projection, and geometry of the test image's tile, overwrite the data with the prediction array from TensorFlow, and save it as a new, georeferenced tif file. This was a bit tricky to figure out, but the required code is surprisingly short.
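The gist looks something like this GDAL sketch (the file paths and function name are placeholders):
from osgeo import gdal
import numpy as np

def save_prediction(pred, src_tile_path, out_path):
    # pred: 2D array of predicted class ids for one tile
    src = gdal.Open(src_tile_path)
    driver = gdal.GetDriverByName('GTiff')
    out = driver.Create(out_path, src.RasterXSize, src.RasterYSize,
                        1, gdal.GDT_Byte)
    # copy the tile's georeferencing so the mask lines up with the imagery
    out.SetGeoTransform(src.GetGeoTransform())
    out.SetProjection(src.GetProjection())
    out.GetRasterBand(1).WriteArray(pred.astype(np.uint8))
    out.FlushCache()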
11) Convert rasters to polygons and extract classes of interest (solar panels)
The desired output of the project is a polygon around each solar panel, so I convert the prediction mask rasters to polygons and merge all of the tiles' panels into a single geojson for convenient viewing.
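A hedged sketch of that vectorization and merge, using gdal.Polygonize and geopandas (file and field names are assumptions):
import glob
from osgeo import gdal, ogr, osr
import geopandas as gpd
import pandas as pd

def polygonize(mask_path, out_shp):
    src = gdal.Open(mask_path)
    band = src.GetRasterBand(1)
    srs = osr.SpatialReference(wkt=src.GetProjection())
    driver = ogr.GetDriverByName('ESRI Shapefile')
    ds = driver.CreateDataSource(out_shp)
    layer = ds.CreateLayer('panels', srs=srs)
    layer.CreateField(ogr.FieldDefn('class', ogr.OFTInteger))
    # turn connected regions of equal pixel value into polygons,
    # writing the pixel value into the 'class' field
    gdal.Polygonize(band, None, layer, 0)
    ds = None

# polygonize every prediction tile, keep only the solar panel class (1),
# and merge everything into one geojson
pieces = []
for path in glob.glob('predictions/*.tif'):
    shp = path.replace('.tif', '.shp')
    polygonize(path, shp)
    gdf = gpd.read_file(shp)
    pieces.append(gdf[gdf['class'] == 1])
merged = gpd.GeoDataFrame(pd.concat(pieces, ignore_index=True), crs=pieces[0].crs)
merged.to_file('predictions/solar_panels.geojson', driver='GeoJSON')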
12) Visualize results on a map over the original imagery
See below!
The Model
I built the model using TensorFlow and Keras. The basic architecture is a pretrained, lightweight ImageNet model (MobileNetV2) used as an encoder to downsample the images; an upsampling decoder then produces a full-resolution output of logits, which is used to make predictions. This type of architecture is described as a U-Net, and my implementation closely follows the TensorFlow documentation found here. Below is a graph of the model as output by tf.keras.utils.plot_model(model, show_shapes=True)
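For readers who want to reproduce the architecture, here is a condensed sketch along the lines of that tutorial; the skip-connection layer names and decoder filter counts are illustrative and not necessarily the exact values used in solar_panel_detection.py.
import tensorflow as tf

def build_unet(input_shape=(128, 128, 3), n_classes=2):
    # pretrained MobileNetV2 acts as the downsampling encoder
    base = tf.keras.applications.MobileNetV2(input_shape=input_shape,
                                             include_top=False)
    skip_names = ['block_1_expand_relu',   # 1/2 resolution
                  'block_3_expand_relu',   # 1/4
                  'block_6_expand_relu',   # 1/8
                  'block_13_expand_relu',  # 1/16
                  'block_16_project']      # 1/32 (bottleneck)
    skip_layers = [base.get_layer(name).output for name in skip_names]
    encoder = tf.keras.Model(inputs=base.input, outputs=skip_layers)
    encoder.trainable = False

    inputs = tf.keras.Input(shape=input_shape)
    *skips, x = encoder(inputs)

    # upsampling decoder with skip connections back to the encoder
    for filters, skip in zip([512, 256, 128, 64], reversed(skips)):
        x = tf.keras.layers.Conv2DTranspose(filters, 3, strides=2,
                                            padding='same', activation='relu')(x)
        x = tf.keras.layers.Concatenate()([x, skip])

    # final upsample back to full resolution; one logit channel per class
    outputs = tf.keras.layers.Conv2DTranspose(n_classes, 3, strides=2,
                                              padding='same')(x)
    return tf.keras.Model(inputs, outputs)

model = build_unet()
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])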
Visualize Results
The code below generates a Folium map of the training areas, test areas, and predicted solar panel polygons.
import folium
import geopandas as gpd
import tensorflow as tf
from tensorflow import keras
from matplotlib import pyplot as plt
import branca
import urllib.request, json
# get the spatial data from Github
project_area = gpd.read_file('https://raw.githubusercontent.com/bimewok/Semantic_Segmentation_Imagery/main/shapes/test_area.geojson')
predictions = gpd.read_file('https://raw.githubusercontent.com/bimewok/Semantic_Segmentation_Imagery/main/predictions/rooftop.geojson')
train_solar_panels = gpd.read_file('https://raw.githubusercontent.com/bimewok/Semantic_Segmentation_Imagery/main/shapes/train_solar_panels.geojson')
train_area = gpd.read_file('https://raw.githubusercontent.com/bimewok/Semantic_Segmentation_Imagery/main/shapes/train_area.geojson')
# convert projection of data to match image overlays
project_outline_json = project_area.to_crs(epsg='4326').to_json()
predictions_json = predictions.to_crs(epsg='4326').to_json()
train_area_json = train_area.to_crs(epsg='4326').to_json()
train_panels_json = train_solar_panels.to_crs(epsg='4326').to_json()
# initialize the map
m = folium.Map([37.708, -122.351],
               zoom_start=10,
               tiles='OpenStreetMap',
               max_zoom=22, control=False)
# load a json text file with coordinate extents
# for each image overlay
with urllib.request.urlopen("https://raw.githubusercontent.com/bimewok/Semantic_Segmentation_Imagery/main/map_tiles/tile_extents.txt") as url:
    data = json.loads(url.read().decode())
# add an image overlay for each imagery tile used
# in the project
for key in data.keys():
    url = 'https://raw.githubusercontent.com/bimewok/Semantic_Segmentation_Imagery/main/map_tiles/{}.jpg?raw=true'.format(key)
    coords = data[key]
    folium.raster_layers.ImageOverlay(url, bounds=coords, control=False).add_to(m)
# define a styling function for each layer
# to set its color in the legend
def outline_style_funct(anything):
    return {"fillOpacity": 0.0, 'weight': 6}

def predictions_style_funct(anything):
    return {"fillOpacity": 0.0, 'weight': 2, 'color': '#BB3C21'}

def train_style_funct(anything):
    return {"fillOpacity": 0.0, 'weight': 6, 'color': '#29930E'}

def train_panel_style_funct(anything):
    return {"fillOpacity": 0.0, 'weight': 2, 'color': '#7C2ED4'}
lgd_txt = '<span style="color: {col};">{txt}</span>'
# add layers to map
test_area_layer = folium.features.GeoJson(
    project_outline_json,
    style_function=outline_style_funct,
    name=lgd_txt.format(txt='Test Area', col='#258BFF'))
predictions_layer = folium.features.GeoJson(
    predictions_json,
    style_function=predictions_style_funct,
    name=lgd_txt.format(txt='Predicted Solar Panels', col='#BB3C21'))
train_area_layer = folium.features.GeoJson(
    train_area_json,
    style_function=train_style_funct,
    name=lgd_txt.format(txt='Train Area', col='#29930E'))
train_panels_layer = folium.features.GeoJson(
    train_panels_json,
    style_function=train_panel_style_funct,
    name=lgd_txt.format(txt='Train Solar Panels', col='#7C2ED4'))
m.add_child(test_area_layer)
m.add_child(predictions_layer)
m.add_child(train_area_layer)
m.add_child(train_panels_layer)
folium.LayerControl(collapsed=False, position='bottomright').add_to(m)
# show map
m