9  Spatial Data Manipulation

In the previous lesson, you learned how to access data on a server via a REST API. The Open Geospatial Consortium (OGC) has adapted the REST API paradigm for geospatial applications. A variety of OGC API standards have been implemented to provide and utilize geospatial data on the web. An overview of the current implementation status can be found here.

In this lesson, we will utilize online data resources in workflows that involve data cleaning, spatial queries, and analyses. Additionally, you will learn to carry out basic raster manipulation operations on a sample raster dataset.

9.1 Data Acquisition

We will work with vector data to correctly identify agricultural land parcels in European Union countries. The Austrian Agricultural Agency (AMA) provides access to Austrian agricultural parcels through an OGC Rest API - Feature interface. The OGC API Features standard is the successor to the OGC Web Feature Service (WFS) specification.

Tip

The R syntax used to interact with OGC APIs is the same as described in the “Data API” section of Lesson Read and Write Data.

Before loading the data into an R script, examine the API’s contents. Visit the web service’s landing page and proceed to the collection page. There you will find an overview of the available layers.

Enter the following URL into your browser:

https://gis.lfrz.gv.at/api/geodata/i009501/ogc/features/v1/collections/i009501:invekos_feldstuecke_2024_1/items?f=json&limit=10

This request returns the first ten features of the i009501:invekos_feldstuecke_2024_1 layer (agricultural land parcels as polygons) in GeoJSON format.

GeoJSON is a JSON-based standard for representing simple geographic features, along with their non-spatial attributes.

Upon inspecting the GeoJSON, you will find the coordinate vertices of the polygon features and attributes such as fs_flaeche_ha (parcel area in hectares) and fnar_bezeichnung (land use).

Use the bbox parameter to filter resources within a specific area:

https://gis.lfrz.gv.at/api/geodata/i009501/ogc/features/v1/collections/i009501:invekos_feldstuecke_2024_1/items?f=json&bbox=14,48,14.02,48.02

For a visual representation of the bounding box, enter the coordinates 14,48,14.02,48.02 into linestrings.com and click “Display box”.

To execute the request in R, use the following script:

library(httr2)
library(geojsonsf)
library(sf)

# Define the new API endpoint
full_url <- "https://gis.lfrz.gv.at/api/geodata/i009501/ogc/features/v1/collections/i009501:invekos_feldstuecke_2024_1/items?f=json&bbox=14,48,14.02,48.02"

# Make the request and convert the response to an sf object
invekos <- request(full_url) %>%   # Create request
  req_perform() %>%                # Execute request
  resp_body_string() %>%           # Extract JSON body as string
  geojsonsf::geojson_sf()          # Convert JSON string to sf object

The above code retrieves polygon features within the specified bounding box and stores them in an object named invekos.

9.2 Data Cleaning

Once the data is loaded into R, we can more closely examine its structure. The dplyr function glimpse, for instance, allows us to preview each column’s name and type:

library(dplyr)
dplyr::glimpse(invekos)
Rows: 100
Columns: 15
$ fart_id           <dbl> 1822, 1822, 1822, 1822, 1822, 1822, 1822, 1822, 1822…
$ gml_geom          <chr> "[B@71f27860", "[B@f69696e", "[B@3aef6174", "[B@775c…
$ gml_length        <dbl> 1086, 883, 786, 1599, 482, 718, 1432, 1094, 884, 116…
$ geom_date_created <chr> "2022-10-01T05:56:50+02:00", "2022-10-01T05:56:50+02…
$ log_pkey          <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ fnar_code         <chr> "A", "A", "G", "A", "G", "A", "A", "A", "A", "A", "G…
$ gml_identifier    <chr> "https://data.inspire.gv.at/0095/33078dd6-05e0-4263-…
$ fs_kennung        <dbl> 111714134, 111714135, 111714132, 111793656, 11179365…
$ geo_part_key      <dbl> 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, …
$ gml_id            <chr> "AT.0095.33078dd6-05e0-4263-a464-f35bd1607ecd.elu.Ex…
$ inspire_id        <chr> "https://data.inspire.gv.at/0095/33078dd6-05e0-4263-…
$ fnar_bezeichnung  <chr> "ACKERLAND", "ACKERLAND", "GRÜNLAND", "ACKERLAND", "…
$ fs_flaeche_ha     <dbl> 1.52789444, 1.76240452, 0.20852359, 7.88961123, 0.02…
$ geo_id            <dbl> 12316762, 12316764, 12316768, 12301276, 12301279, 12…
$ geometry          <POLYGON [°]> POLYGON ((14.01608 48.00954..., POLYGON ((14…

The abbreviation dbl stands for double, a data type that stores numbers with decimal points.

To create a subset of the invekos sf-object, use the following code:

invekos %>%
  head(2) %>%
  dplyr::select(fnar_bezeichnung, fs_flaeche_ha, geo_id, geometry) %>%
  knitr::kable(., format="html")
fnar_bezeichnung fs_flaeche_ha geo_id geometry
ACKERLAND 1.527894 12316762 POLYGON ((14.01608 48.00954...
ACKERLAND 1.762404 12316764 POLYGON ((14.01608 48.00931...

Field names and entries are in German. To rename them, use the base R function colnames():

# Subset of invekos object
invekos.sub <- invekos %>%
  dplyr::select(fnar_bezeichnung, fs_flaeche_ha, geo_id, geometry)

# Renaming fields
colnames(invekos.sub)[1] <- "land_use"
colnames(invekos.sub)[2] <- "area_ha"

Entries are renamed using dplyr’s mutate and case_when functions:

unique(invekos.sub$land_use)  # Unique entries in 'land_use'
[1] "ACKERLAND" "GRÜNLAND" 
# Renaming entries 'ACKERLAND' and 'GRÜNLAND'
invekos.sub <- dplyr::mutate(invekos.sub, land_use = 
                               case_when(land_use == 'ACKERLAND' ~ 'arable land',
                               land_use == 'GRÜNLAND' ~ 'grassland', TRUE ~ 'Other'))

# Display result
invekos.sub %>%
  head(2) %>%
  knitr::kable(., format="html")
land_use area_ha geo_id geometry
arable land 1.527894 12316762 POLYGON ((14.01608 48.00954...
arable land 1.762404 12316764 POLYGON ((14.01608 48.00931...

For an initial visual impression, plot the sf-object using the base R plot() function:

plot(invekos.sub[1], main="Land Use", key.pos = 1, key.width = lcm(1.8))

By default, the plot() function generates a multi-panel plot, with one sub-plot for each field of the object. However, specifying invekos.sub[1] restricts the output to a single plot, specifically showcasing the land_use field. The function parameter key.pos = 1 aligns the legend below the map (1=below, 2=left, 3=above and 4=right). key.width defines the width of the legend.

Next, we can validate the geometry of sf polygons using st_is_valid:

sf::st_is_valid(invekos) %>%
  summary()
   Mode    TRUE 
logical     100 

The st_is_valid function checks the validity of each geometry, returning a logical vector. The summary confirms that all geometries in our dataset are valid. If any invalid geometries are detected, they can be corrected using the st_make_valid function.

Exercise

Let’s closely investigate a single polygon geometry feature that is created by the code below. The validity check flags an invalid geometry. Can you identify the problem?

# Coordinates for the polygon
coords <- matrix(c(-1, -1, 1, -1, 1, 1, 0, -1, -1, -1), ncol = 2, byrow = TRUE)

# List of coordinate matrices
list_of_coords <- list(coords)

# Constructing the polygon
polygon <- st_polygon(list_of_coords)

# Converting to an sf object
error_sf <- st_sf(geometry = st_sfc(polygon))

# Validity check
sf::st_is_valid(error_sf)
[1] FALSE

When plotted, the polygon reveals a sliver polygon, which is invalid due to its shape.

plot(error_sf)

Another common cause of invalid geometries is self-intersection of lines.

Note: A valid polygon requires at least four coordinate points with the first and last points being identical. A polygon with only three points is considered to have an invalid geometry.

9.3 Vector operations

The structure of an sf object is similar to that of data frames, enabling attribute-based filtering operations using dplyr functions, as described in Lesson Data Manipulation.

For example, to extract grassland parcels, the filter() function is used:

invekos.sub %>%
  dplyr::filter(land_use=='grassland') %>%
  {plot(.[1], main="Land Use", key.pos = 1, key.width = lcm(1.8))}

Note the use of curly brackets in the plot() function. This is required because the dot notation (.) cannot directly index a piped value. For a deeper explanation, see this Stack Overflow discussion.

9.3.1 Geometrical operations

The sf package provides a range of geometric predicates such as st_within, st_contains, or st_crosses. A complete list is available here. While these functions are often used between pairs of simple feature geometry sets, they can also operate on a single sf object:

sf::st_intersects(invekos.sub, sparse = TRUE)
Sparse geometry binary predicate list of length 100, where the
predicate was `intersects'
first 10 elements:
 1: 1
 2: 2
 3: 3, 68
 4: 4, 38, 42
 5: 5, 71
 6: 6, 58, 64, 78
 7: 7, 39
 8: 8, 33, 75, 77
 9: 9, 12, 15, 66
 10: 10

This function returns a sparse matrix revealing intersections within features of the same sf object invekos.sub, indicating possible topology errors. For correcting these errors, GIS software like QGIS is recommended as R’s capabilities for handling topology errors are less advanced.

9.3.2 Binary operations

Binary operations are essential for analyzing spatial relationships between different datasets. Here, we demonstrate this by creating an sf object representing farmsteads using data from the AMA Rest API service:

# Define the new API endpoint for farmsteads
farms_url <- "https://gis.lfrz.gv.at/api/geodata/i009501/ogc/features/v1/collections/i009501:invekos_hofstellen_2024_1/items?f=json&bbox=14,48,14.02,48.02&limit=100"

# Make the request and convert the response to an sf object
farms <- httr2::request(farms_url) %>%   # Create request
  httr2::req_perform() %>%               # Execute request
  httr2::resp_body_string() %>%          # Extract JSON body as string
  geojsonsf::geojson_sf()                # Convert JSON string to sf object

# Plot land parcels with farm points overlayed
plot(invekos.sub[1], main = NULL, key.pos = NULL, reset = FALSE, col_graticule = "grey")
plot(farms[1], main = NULL, key.pos = NULL, pch = 7, col = 'red', add = TRUE, cex = 1)

Note: The plot() function is used sequentially to overlay farm points on top of the land parcel polygons. For advanced plotting techniques, refer to the plot() documentation.

We proceed to calculate the distances between farms and land parcels to determine proximity:

dist_m <- sf::st_distance(farms, invekos.sub)

The st_distance function calculates the shortest distance matrix between the geometries of the two sf objects, with distances returned in meters.

Exercise

Distances are computed in meters, which may seem unexpected since the reference system’s units are in degrees. To understand the underlying calculations, examine the coordinate reference system of the sf objects farms and invekos.sub using sf::st_crs(). This will reveal that objects use a projected coordinate system (EPSG:31287), which allows for metric distance calculations.

In this short exercise, we will take a closer look at the algorithm that is implemented in the function st_distance.

Open the documentation of st_distance to find out how metric distances are derived from geographic coordinates.

According to the documentation, metric distances are computed based on the Coordinate Reference System (CRS) of the sf objects. Since both farms and invekos.sub use EPSG:31287 (a projected CRS), the st_distance function calculates planar (Euclidean) distances in meters. This is possible because projected CRSs preserve distance relationships locally, unlike geographic CRSs which use degrees.

For more detailed information on distance calculations, refer to Algorithms for geodesics, Journal of Geodesy.

When plotting the complete distance matrix dist_m, we see that column 1 contains the distances between the first feature (parcel 1) in the invekos.sub sf object and the 100 farm features of the farms sf object. Accordingly, the matrix has 100 columns (one for every parcel) and 100 rows (one for every farm).

The following line returns the first column of the distance matrix as a vector:

dist_m[, 1]
Units: [m]
 [1]  143.1406  813.3666  825.8695  211.4067 1360.1150  964.9589  804.7171
 [8] 1025.2758  204.4024  810.4991  898.8973

To identify the farm that is located closest to parcel 1, we need to query the index of the minimum value in this vector:

which(dist_m[, 1] == min(dist_m[, 1]))
[1] 1

The which function, a base R utility, identifies the indices of elements that satisfy a given condition. In the example, it returns the index of the smallest value within the vector dist_m[, 1]. Consequently, it indicates which farm is the closest to parcel 1.

The demonstrated procedure for detecting closest farms can be executed for every parcel in a for-loop. The number of the closest farm is appended to a vector named closest. This vector is in turn appended as a new column to the sf object invekos.sub. Finally, invekos.sub is plotted together with farms.

There are now two approaches to perform this loop:

Approach 1: Dynamic Loop Using ncol(dist_m)

This method dynamically adjusts the loop to the number of parcels by using ncol(dist_m). It’s flexible and scalable, making it suitable for datasets where the number of parcels may change.

# Approach 1: Dynamic Loop Using ncol(dist_m)
closest <- c()

for (i in 1:ncol(dist_m)){
  
  out <- which(dist_m[, i] == min(dist_m[, i]))
  
  closest <- c(closest, out)
  
}

cbind(invekos.sub, closest) %>%
  {plot(.[4], main = NULL, key.pos = NULL, reset = FALSE)}

plot(farms[1], main = NULL, key.pos = NULL, pch = 7, col = 'red', add = TRUE, cex = 1)

Approach 2: Fixed Loop Using a Hardcoded Value (e.g., 100)

This method uses a fixed number of iterations (100) based on prior knowledge of the dataset size. It’s simpler but less flexible, as it requires manual adjustment if the number of parcels changes.

# Approach 2: Fixed Loop Using a Hardcoded Value (e.g., 100)
closest <- c()

for (i in 1:100){
  
  out <- which(dist_m[, i] == min(dist_m[, i]))
  
  closest <- c(closest, out)
  
}

cbind(invekos.sub, closest) %>%
  {plot(.[4], main = NULL, key.pos = NULL, reset = FALSE)}

plot(farms[1], main = NULL, key.pos = NULL, pch = 7, col = 'red', add = TRUE, cex = 1)

The output map visualizes the closest farm to each agricultural parcel, highlighting the practical application of sf Geometrical operations.

Tip

This chapter focuses on logical matrix outputs from geometric operations. For operations generating new geometries, such as st_union, st_buffer, and st_centroid, see operations returning a geometry. For network operations on sf objects, consider using sfnetworks or the igraph package.

9.4 Raster operations

Having previously discussed the structure of SpatRaster Objects in Lesson 4s and the reading and writing of such objects in Lesson 8, we now turn our attention to raster manipulation operations like resampling and cropping.

For the following examples, we will utilize a sample dataset from the terra package. You can download the sample data here.

9.4.1 Resampling

Resampling is crucial for working with raster datasets. It alters the spatial resolution, allowing you to align multiple rasters with different resolutions. Additionally, it adjusts the level of detail necessary for your analysis.

Let’s demonstrate resampling with the terra sample dataset:

library(terra)

r <- terra::rast("data/terra_sample.tif") # path may be different on your machine
plot(r, main='SpatRaster from file')

Before we change the raster resolution of SpatRaster Object, it is important to know the original resolution of the raster. You can use the res() function to check the original resolution:

res(r)
[1] 40 40

For resampling, we’ll create a target raster by copying the original and setting a new resolution:

r2 <- r
terra::res(r2) <- 80 # Assigning a new resolution of 80x80

Although this operation clears the raster values, r2 can still be used as a target for resampling:

r_resampled <- terra::resample(r, r2, method="bilinear")

res(r2) # Verify the new resolution
[1] 80 80
plot(r_resampled, main='Resampled 80x80')

The bilinear method is used for interpolating new values in r_resampled. Alternative methods are nearest or cubic. The choice of interpolation method for resampling raster data is a crucial consideration, discussed in further detail here.

9.4.2 Crop raster

Raster cropping allows you to select a specific area from a larger raster dataset for targeted analysis. Both SpatRaster and SpatExtent objects can be utilized for cropping operations.

To derive an extent from a SpatRaster, the ext function is used:

terra::ext(r)
SpatExtent : 178400, 181600, 329400, 334000 (xmin, xmax, ymin, ymax)

The output format is <xmin, xmax, ymin, ymax>. Knowing this, we can define a cropping extent and apply it using the crop function:

crop_ext <- terra::ext(180000, 180200, 331000, 331250)
subset_r <- terra::crop(r, crop_ext)
plot(subset_r, main='Subset of r')

If the resulting cropped area doesn’t match expectations due to cell alignment, adjust the cropping extent or raster resolution as necessary.

9.4.3 Raster algebra

Terra provides functionality for algebraic operations on rasters, supporting standard arithmetic and logical operators, and functions like abs and round. For a full list, visit the terra algebra documentation.

Here’s how you can perform some simple operations:

# Add a constant value to each cell
s <- r + 10
# Calculate the square root of cell values
s <- sqrt(r)
# Generate a logical raster based on cell value equality
s1 <- s == 15

plot(s1)

Operators can be applied between SpatRaster objects with matching resolutions and origins. The result covers their spatial intersection.

You can also perform in-place replacements:

r[r == 255] <- 2550
plot(r)

Here, all cells with the value 255 are replaced with 2550.

Tip

A broader range of vector and raster operations can be found in Chapter 5 - Geometry Operations, in the Book Geocomputation with R.

As of writing this module, “Geocomputation with R” uses the raster package, which is expected to be superseded by terra.