Choosing GIS factors for habitat models

Many habitat models are based on factors such as land cover, topography, and human disturbance, not because these factors fully describe habitat, but because they are the only relevant factors available as GIS layers. Given the low accuracy of land cover layers and the incompleteness of the set of available factors, we guess that most models are no more than 70% successful. We discuss how to deal with the crudeness of habitat models, and we recommend metrics for some factors.

We recommend categorical metrics over continuous ones, and using a few rather than many categories within each categorical factor. For some species, steepness or ruggedness is an important factor, and both are easy to model. Topographic position can be useful, but it requires us to guess how topographic position was defined in the habitat-use studies we rely on. We recommend using distance to roads rather than road density as a measure of human disturbance. If appropriate soil maps are available in your linkage analysis area, we encourage using soil properties as factors in habitat models for some species.

Habitat factors and metrics

Metrics for habitat factors can be categorical (land-cover types, topographic classes) or continuous (percent slope, distance from a cover type or road). When we have the choice between the two, we usually prefer categorical metrics. For example, if habitat suitability is a function of steepness, we find it easier to characterize the suitability of 2 or 3 steepness classes than to estimate intercept, slope, quadratic terms, or inflection points that would be needed for a linear, curvilinear, or step function of a continuous variable.

When using a categorical variable, we usually limit the number of classes based on biological understanding. For example, suppose we are using distance-to-road classes in a habitat model for a snake. We know snakes get killed on roads. This snake's average daily movement spans about 200 m, so snakes up to 200 m from a road might face an increased risk of mortality. We also know that snakes hear through their jaws, and a study has shown that these reptiles can perceive vibrations from cars passing 50 m away. These vibrations may confuse the snake, or may cause it to avoid the area within 50 m of a road. This suggests that 3 classes (0-50 m, 50-200 m, and >200 m from a road) are all we need. We could create 10 classes, but how would we estimate habitat suitability for each of them? The complex model would be no better than the simple one. Let's face it: our model is crude, and making it more complex is just polishing a turd.
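
The three-class scheme above amounts to a simple lookup. This is a minimal Python sketch; the suitability scores (on a hypothetical 0-100 scale) and the function name `road_distance_score` are illustrative assumptions, not values from any study.

```python
# Hypothetical suitability scores (0-100) for the three distance-to-road
# classes; illustrative only, not taken from any study.
ROAD_CLASSES = [
    (50.0, 20),           # 0-50 m: vibration and traffic mortality
    (200.0, 60),          # 50-200 m: within a day's movement of the road
    (float("inf"), 100),  # >200 m: assumed unaffected by the road
]


def road_distance_score(distance_m: float) -> int:
    """Return the score of the first class whose upper bound >= distance_m."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    for upper_bound_m, score in ROAD_CLASSES:
        if distance_m <= upper_bound_m:
            return score
    raise ValueError("distance is not a finite number")  # e.g. NaN


print([road_distance_score(d) for d in (10, 50, 120, 500)])  # [20, 20, 60, 100]
```

Each class needs only one defensible boundary and one score; a continuous curve would need parameters we have no data to estimate.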

GIS layers commonly used in habitat suitability models include land cover, topographic variables, distance to streams, human disturbance, and soils.

Land cover

Land cover is the most important factor in the habitat models of many species. Its importance reflects the fact that land cover is related to food, hiding cover, thermal cover, and (for classes such as urban land use) human disturbance. The term “vegetation type” is sometimes used for this factor, because most land cover classes are named for vegetation communities. However, land cover also includes mines, farms, urban areas, open water, and other classes that make “vegetation type” an inappropriate term.

[Figure: Land cover map]

Land-cover data are usually treated categorically; tree-canopy closure and distance from forest are examples of continuous land-cover metrics. Land cover data may be available in a GIS layer with 20-30 coarse classes (National Land Cover Database in the USA) or 70-100 classes (GAP data layers in the USA). However, we have found it useful to lump the 70-100 GAP classes into 25-50 classes, for two reasons. First, the scientific literature we use to parameterize our models does not distinguish among the habitat suitabilities of several closely related land-cover types, so we'd end up scoring them all the same anyway. Second, the tables of cross-classification accuracy for GAP data layers show that many errors involve confusion between closely related land covers. Pooling these closely related types therefore likely increases the classification accuracy of the map.
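
Why pooling helps can be seen in a cross-classification (confusion) matrix: when two classes are mostly confused with each other, merging them moves those errors onto the diagonal. The Python sketch below uses a made-up three-class matrix (rows are reference classes, columns are mapped classes); the class names and counts are hypothetical, not taken from any GAP accuracy table.

```python
def overall_accuracy(matrix):
    """Share of all samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def pool_classes(matrix, groups):
    """Merge classes; `groups` lists the original class indices in each new class."""
    return [
        [sum(matrix[i][j] for i in ref_group for j in map_group) for map_group in groups]
        for ref_group in groups
    ]


# Rows = reference (ground) class, columns = mapped class. Hypothetical counts.
confusion = [
    [40, 15, 5],  # woodland type A
    [12, 38, 5],  # woodland type B, often confused with A
    [3, 2, 80],   # grassland
]
pooled = pool_classes(confusion, [[0, 1], [2]])
print(overall_accuracy(confusion))  # 0.79
print(pooled)                       # [[105, 10], [5, 80]]
print(overall_accuracy(pooled))     # 0.925
```

Pooling can never lower overall accuracy, because every count already on the diagonal stays there; it can only move confused counts onto it. The cost is thematic detail, which matters only if the literature actually distinguishes the pooled types.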

Most wildlife habitat studies using land cover layers present the data as if they represent reality, although classification accuracy is typically 60% to 80%. Digital maps developed from different remotely sensed images can produce markedly different depictions of vegetation. The GIS analyst should always report the resolution and source of land cover data. The developers of land-cover data layers typically report classification accuracy; you should pass this information on to the users of your models. It is depressing to report that the land cover map, the most important factor in the model, is also more error-ridden than digital elevation models, census layers, or road layers. But transparency is a hallmark of science, and we gotta tell it like it is.

Topographic variables

Elevation

Elevation is a determinant of land cover. It also affects the thermal environment of an animal, the amount of precipitation, and the form (rain or snow) of precipitation. Fortunately, digital elevation models (DEMs) are available for every land area on Earth.

[Figure: Elevation map]

In our models, we typically use elevation as a factor when we have literature stating that the species occurs within a certain range of elevation. Depending on our interpretation of the literature, we often recognize 3 classes (below, within, and above the elevation limits) or 5 classes (if we suspect the literature was a crude generalization and we want to assign intermediate suitability to elevation classes near the reported limits).
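
The 3- and 5-class elevation schemes can be expressed as one function. In this Python sketch, the band width `buffer_m` and the choice to place the intermediate classes just outside the reported limits are our assumptions; the text does not say exactly how the near-limit classes are drawn.

```python
def elevation_class(elev_m, lower_m, upper_m, buffer_m=None):
    """Classify elevation against reported limits (lower_m, upper_m).

    buffer_m=None -> 3 classes: 'below', 'within', 'above'.
    buffer_m given -> 5 classes, adding 'near_below' / 'near_above' bands of
    width buffer_m just outside the limits (an assumed reading of "near").
    """
    if lower_m > upper_m:
        raise ValueError("lower_m must not exceed upper_m")
    if lower_m <= elev_m <= upper_m:
        return "within"
    if elev_m < lower_m:
        if buffer_m is not None and elev_m >= lower_m - buffer_m:
            return "near_below"
        return "below"
    if buffer_m is not None and elev_m <= upper_m + buffer_m:
        return "near_above"
    return "above"


# Hypothetical limits: species reported between 1000 and 2000 m.
for elev in (700, 900, 1500, 2100, 2500):
    print(elev, elevation_class(elev, 1000, 2000),
          elevation_class(elev, 1000, 2000, buffer_m=200))
# 700 below below
# 900 below near_below
# 1500 within within
# 2100 above near_above
# 2500 above above
```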

DEMs are also the basis for several derived variables, including aspect, slope, and topographic position.

Topographic position

Topographic position is correlated with moisture, heat, cover, and vegetation. It is also relevant to the cost of movement, and is therefore an attractive factor to include in a habitat model that will be used as a travel cost model. In scientific papers, some animals are reported to be associated with canyon bottoms, steep slopes, or other topographic positions.

[Figure: Topographic position map]

Topographic position can be estimated by classifying pixels into any number of classes, such as steep slope, ridgetop, or valley bottom. Topographic position algorithms analyze the pattern of elevations within a moving window, the size of which must be specified by the analyst. While it is tempting to scale the moving window size to reflect the way each focal species may perceive the landscape, we caution against this. There have been virtually no studies on how any non-human organism assesses topography. More important, all published habitat-selection studies refer to topographic position as perceived by the human researcher, not the animal! This still leaves the non-trivial issue of estimating the moving window size that human researchers implicitly use to characterize topographic position. We could not find any scientific papers on this topic, but we have found that a moving window of 200-300 m yields reasonable results.
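
One common way to classify topographic position is a topographic position index (TPI): a cell's elevation minus the mean elevation of the surrounding window. This Python sketch assumes that approach, a square window of about 250 m (within the 200-300 m range above), 30-m cells, and a hypothetical ±5 m threshold separating ridges and valleys from everything else; it is not necessarily the algorithm used in our models.

```python
def topographic_position(dem, row, col, cell_m=30.0, window_m=250.0, threshold_m=5.0):
    """Classify one cell by its topographic position index (TPI).

    TPI = cell elevation - mean elevation of the other cells in a square
    window about window_m wide (clipped at the grid edge). threshold_m is a
    hypothetical cut-off separating ridges and valleys from everything else.
    """
    half = max(1, int(window_m // cell_m) // 2)  # 250 m / 30 m -> 4 (9 x 9 cells)
    total, count = 0.0, 0
    for r in range(max(0, row - half), min(len(dem), row + half + 1)):
        for c in range(max(0, col - half), min(len(dem[0]), col + half + 1)):
            if (r, c) != (row, col):
                total += dem[r][c]
                count += 1
    if count == 0:
        raise ValueError("window contains no neighbouring cells")
    tpi = dem[row][col] - total / count
    if tpi > threshold_m:
        return "ridge"
    if tpi < -threshold_m:
        return "valley"
    return "mid_slope_or_flat"


# Hypothetical V-shaped valley: elevation rises 10 m per cell away from column 4.
valley = [[abs(c - 4) * 10.0 for c in range(9)] for _ in range(9)]
ridge = [[-z for z in row] for row in valley]
flat = [[500.0] * 9 for _ in range(9)]
print(topographic_position(valley, 4, 4))  # valley
print(topographic_position(ridge, 4, 4))   # ridge
print(topographic_position(flat, 4, 4))    # mid_slope_or_flat
```

TPI alone cannot tell a mid-slope cell from a flat one; many TPI-based landform classifications add percent slope to separate them.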

Slope and ruggedness

Slope and ruggedness are correlated with protection from predators and with the cost of movement. Two of the best-documented examples are the close association between bighorn sheep and the steep terrain they require to escape predators, and the strong association between pronghorn and gentle slopes.
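
Percent slope comes straight from the DEM. This Python sketch uses central differences at an interior cell and then assigns one of three steepness classes; the break points (10% and 40%) are hypothetical placeholders, not thresholds for bighorn sheep or pronghorn from the literature.

```python
import math


def slope_percent(dem, row, col, cell_m):
    """Percent slope at an interior cell from central differences."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_m)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_m)
    return 100.0 * math.hypot(dz_dx, dz_dy)


# Hypothetical class breaks (percent slope); not from any study.
STEEPNESS_BREAKS = [(10.0, "gentle"), (40.0, "moderate"), (math.inf, "steep")]


def steepness_class(pct):
    """Return the first class whose upper break is >= pct."""
    for upper_pct, label in STEEPNESS_BREAKS:
        if pct <= upper_pct:
            return label
    raise ValueError("slope is not a finite number")


# Planes rising 6 m and 30 m per 30-m cell from west to east.
plane_6m = [[6.0 * c for c in range(3)] for _ in range(3)]
plane_30m = [[30.0 * c for c in range(3)] for _ in range(3)]
for dem in (plane_6m, plane_30m):
    pct = slope_percent(dem, 1, 1, cell_m=30.0)
    print(round(pct, 1), steepness_class(pct))
# 20.0 moderate
# 100.0 steep
```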

Aspect

In temperate zones, aspect is a determinant of solar radiation, and consequently temperature, soil moisture, and vegetation. Few habitat models use aspect, however, because few habitat studies suggest that aspect is directly associated with habitat suitability for animals.

Distance to streams

For some species, distance to water is correlated with access to drinking water, movement, and food. The scientific literature occasionally includes statements that a certain species is usually found within a certain distance of water. In the arid Southwest we have unfortunately found that GIS layers often depict springs or artificial waters (earthen tanks) that do not exist on the ground, and often fail to distinguish perennial from ephemeral sections of mapped watercourses. A site visit and conversations with local land or wildlife managers can greatly increase the accuracy of any water map.

Human disturbance

Most habitat models contain a factor related to human disturbance. All of our models in California and Arizona used either road density or distance to roads.

Disturbance variables related to roads

Many linkage designs use road density within a moving window around the focal pixel. Unfortunately, despite the seeming scale invariance of length per length-squared (km/km²), the calculated value of road density changes erratically and non-intuitively with the size of the moving window. For example, in the image below, a straight road runs through the focal pixel (yellow box). The road density is 6.4 km/km² within the 100-m-radius moving window, 1.3 km/km² within the 500-m window, and 0.6 km/km² within the 1000-m window!
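
The three densities follow from simple geometry, assuming each window is a circle centred on the focal pixel and the road is a straight line through that centre: the road length inside the window is its diameter, 2r, and the window area is πr², so density is 2/(πr). This Python sketch reproduces the values quoted above.

```python
import math


def straight_road_density(radius_km):
    """Road density (km/km^2) in a circular window centred on a straight road.

    Road length inside the window = diameter = 2r; window area = pi * r^2.
    """
    return (2 * radius_km) / (math.pi * radius_km ** 2)


for radius_m in (100, 500, 1000):
    print(radius_m, round(straight_road_density(radius_m / 1000), 1))
# 100 6.4
# 500 1.3
# 1000 0.6
```

Even in this idealized case the value depends entirely on the window radius; with real, branching road networks the relationship is no longer this tidy.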

Thus, it is difficult to reliably estimate resistance for road density classes, and published estimates of animal occurrence with respect to road density cannot be translated to a different moving window size. Because distance to the nearest road avoids this problem, and because results from scientific reports using this metric can be imported directly into a model, we now use it in preference to road density.
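
A distance-to-nearest-road layer is a Euclidean distance surface. The brute-force Python sketch below (the function name and the tiny mask are hypothetical) makes the computation explicit; for real layers you would use a GIS Euclidean-distance tool or a library routine such as SciPy's `scipy.ndimage.distance_transform_edt`, which is far faster.

```python
import math


def distance_to_nearest_road(road_mask, cell_m):
    """Euclidean distance (m) from each cell centre to the nearest road cell.

    Brute force: compares every cell with every road cell, which is fine for a
    sketch but far too slow for real rasters.
    """
    roads = [(r, c) for r, row in enumerate(road_mask)
             for c, is_road in enumerate(row) if is_road]
    if not roads:
        raise ValueError("road_mask contains no road cells")
    return [
        [cell_m * min(math.hypot(r - rr, c - rc) for rr, rc in roads)
         for c in range(len(road_mask[0]))]
        for r in range(len(road_mask))
    ]


# Hypothetical 3 x 5 mask with a road along the western column; 30-m cells.
mask = [[c == 0 for c in range(5)] for _ in range(3)]
for row in distance_to_nearest_road(mask, cell_m=30.0):
    print(row)  # [0.0, 30.0, 60.0, 90.0, 120.0] for every row
```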

Some models assign pixels containing a road a resistance value so high that the pixel is impermeable, or nearly so. However, we advise against this practice because the raster representation of curves in a road will always have spuriously thicker and thinner areas. The “thin” areas will be spuriously modeled as areas of lower resistance. Such distortion can seriously affect the modeled corridor. For example, this would cause a modeled corridor to completely avoid a road that runs partway through the width of the matrix, even if all other habitat characteristics near that road are far superior for the animal. Following “The Cinderella Principle,” we prefer to make the road fit the animal (e.g., by adding underpasses) rather than making the animal's movement fit the road (conserving inferior habitat as a linkage and lengthening the linkage because the large resistance value blinded us to the otherwise optimal route).
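
One way the spurious “thin” spots arise: a one-cell-wide road drawn on a diagonal touches only at cell corners, so a path that may step between diagonal neighbours can slip across without entering a road cell. This Python sketch (hypothetical 5 × 5 grids, 8-neighbour movement assumed) counts the fewest road cells a path must enter, using a 0-1 breadth-first search.

```python
from collections import deque

NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                (0, 1), (1, -1), (1, 0), (1, 1)]


def min_road_cells_crossed(road, start, goal):
    """Fewest road cells an 8-connected path must enter from start to goal.

    0-1 BFS: stepping onto a road cell costs 1, any other cell costs 0.
    """
    rows, cols = len(road), len(road[0])
    best = [[float("inf")] * cols for _ in range(rows)]
    sr, sc = start
    best[sr][sc] = int(road[sr][sc])
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS_8:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = int(road[nr][nc])
            if best[r][c] + step < best[nr][nc]:
                best[nr][nc] = best[r][c] + step
                # Zero-cost moves go to the front so cells are settled in cost order.
                if step == 0:
                    queue.appendleft((nr, nc))
                else:
                    queue.append((nr, nc))
    gr, gc = goal
    return best[gr][gc]


size = 5
horizontal = [[r == 2 for c in range(size)] for r in range(size)]
diagonal = [[r == c for c in range(size)] for r in range(size)]
print(min_road_cells_crossed(horizontal, (0, 0), (4, 4)))  # 1
print(min_road_cells_crossed(diagonal, (0, 4), (4, 0)))    # 0
```

The horizontal road forces every path through a road cell, while the diagonal road leaks; a real road that curves alternates between the two, so an “impermeable” road value is not impermeable everywhere.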

Human density and census-derived variables

Some corridor models use human density, but census blocks are often polygons within which humans are not uniformly distributed. Allocating the mean population density to every pixel in a census block will create errors, especially when the mean density does not occur anywhere in the block! Census data can be useful for corridors at a continental scale, or for assessing potential release sites for reintroducing a wide-ranging animal, but they are not helpful for most linkage designs.
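
A tiny worked example of the census problem, using a hypothetical block of ten equal-area pixels in which everyone lives in two pixels (densities in people per km²):

```python
# Hypothetical census block of ten equal-area pixels (people per km^2).
pixel_density = [0, 0, 0, 0, 0, 0, 0, 0, 400, 600]

block_mean = sum(pixel_density) / len(pixel_density)
errors = [abs(d - block_mean) for d in pixel_density]

print(block_mean)                   # 100.0
print(block_mean in pixel_density)  # False: no pixel has the mean density
print(max(errors))                  # 500.0
print(sum(errors) / len(errors))    # 160.0 (mean absolute error per pixel)
```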

Soils and substrate

Soil texture is important for burrowing species such as kit foxes, badgers, and some toads. Many lizards, rattlesnakes, and pikas are closely associated with rockpiles. However, we know of no linkage design that has included soil as a factor in a habitat model.

There are several problems with most soil maps. First, it is often difficult to find a seamless soil map for any linkage area. Even at the county level, there are several maps, each compiled by a different protocol and each providing an idiosyncratic set of soil classes. Second, metadata are sometimes lacking, leaving the user to guess the meaning of soil attributes. In other cases, each polygon had many attributes, none of which (as near as we could tell) was highly correlated with the presence of rockpiles, soil suitable for burrows, or the other factors we were interested in.

Perhaps more useful maps exist in areas where agriculture is more important than it is where we work (Arizona and southern California). Bottom line: we would like to include soil as a factor, but so far we haven't been able to do so.

GIS-based habitat models are crude and incomplete

Habitat use is driven by availability of food, nest sites, and other resources, safety from predators and other hazards, presence of competitors or facilitating species, and other factors. However, these factors are rarely included in GIS models for linkage design! Instead, these models are typically based on one to five factors, including land cover, one or two factors related to human disturbance, and one or two topographic factors. The models are built on these factors for a simple reason: they are the only relevant factors for which georeferenced data are available for the entire planning area.

As we described above, each of these GIS layers is related to some aspect of food, cover, and other important components of habitat. But these GIS layers don't correspond exactly to habitat factors. Statisticians tell us that any statistical or GIS model that fails to cover all aspects of the problem can give misleading results. We simply do not know how strongly the GIS layers we use are correlated with habitat use or movement by most focal species. We'd be delighted with 90% explanatory power and disappointed with 10%. Given the low accuracy of land cover layers and the incompleteness of the set of available factors, we guess most models are no more than 70% successful. That is much better than letting a monkey with a crayon create a habitat map, but far short of the certainty we'd like to provide to conservation investors who are risking scarce resources to conserve a linkage.

What can be done about the incompleteness of our models? We propose three responses:

  • Simple honesty. We may have no choice but to build models based on factors for which data are available, even if the factors are not comprehensive, but our credibility is strengthened by acknowledging the issue.
  • Sensitivity analysis. Sensitivity analysis can be used to see how much your map of the predicted best corridor or linkage changes as you make different assumptions about the inputs or structure of the model.
  • Develop good GIS maps of soils, rock outcrops, permanent water sources, and other factors known to affect habitat use by focal species. In our work in the southwestern USA, these factors are important for focal species such as pronghorn, bighorn sheep, prairie dogs, and many reptiles. With reliable GIS coverages of such features, we could immediately improve many models.
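
A sensitivity analysis can be as simple as rerunning the corridor model under different assumptions and measuring how much the result moves. This Python sketch uses a least-cost path (Dijkstra's algorithm on a 4-connected grid) as a stand-in for a full corridor model; the 5 × 7 landscape, the resistance values, and the two road-resistance scenarios are all hypothetical. Path overlap is scored with the Jaccard index (shared cells divided by all cells used by either path).

```python
import heapq


def least_cost_path(resistance, start, goal):
    """Dijkstra on a 4-connected grid; each move costs the resistance of the cell entered."""
    rows, cols = len(resistance), len(resistance[0])
    dist = {start: resistance[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + resistance[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def build_resistance(road_resistance):
    """Hypothetical landscape: good habitat along row 2 (resistance 1), poorer
    habitat elsewhere (5), and a road in column 3 that stops short of row 4."""
    grid = [[1 if r == 2 else 5 for _ in range(7)] for r in range(5)]
    for r in range(4):
        grid[r][3] = road_resistance
    return grid


def jaccard(path_a, path_b):
    a, b = set(path_a), set(path_b)
    return len(a & b) / len(a | b)


base = least_cost_path(build_resistance(10), (2, 0), (2, 6))
alt = least_cost_path(build_resistance(1000), (2, 0), (2, 6))
print(len(base), len(alt), jaccard(base, alt))  # 7 11 0.5
```

Here raising the road resistance from 10 to 1000 pushes the path onto a detour through poorer habitat around the end of the road, and only half the cells are shared. In a real analysis you would vary each uncertain input in turn and report which ones the corridor is sensitive to.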

Redundancy among GIS factors is (mostly) a non-issue

Statisticians tell us that a model can give unreliable results when there is redundancy among the factors. If a model included both elevation in feet and elevation in meters, these two factors would be perfectly redundant. Although you are too smart to build a habitat model that silly, your model will include factors that are correlated with each other (land cover is related to elevation, for instance). However, the problem is only serious when a variable in the model is over 90% correlated with the other variables. Furthermore, the main impact is on the ecological interpretation of the model, not on the accuracy of its predictions. In general, predictions improve as variables (even highly redundant ones) are added to the model.