library(httr2)
library(geojsonsf)
library(sf)
# Define the new API endpoint
full_url <- "https://gis.lfrz.gv.at/api/geodata/i009501/ogc/features/v1/collections/i009501:invekos_feldstuecke_2024_1/items?f=json&bbox=14,48,14.02,48.02"
# Make the request and convert the response to an sf object
invekos <- request(full_url) %>% # Create request
req_perform() %>% # Execute request
resp_body_string() %>% # Extract JSON body as string
geojsonsf::geojson_sf() # Convert JSON string to sf object
9 Spatial Data Manipulation
In the previous lesson, you learned how to access data on a server via a REST API. The Open Geospatial Consortium (OGC) has adapted the REST API paradigm for geospatial applications. A variety of OGC API standards have been implemented to provide and utilize geospatial data on the web. An overview of the current implementation status can be found here.
In this lesson, we will utilize online data resources in workflows that involve data cleaning, spatial queries, and analyses. Additionally, you will learn to carry out basic raster manipulation operations on a sample raster dataset.
9.1 Data Acquisition
We will work with vector data to correctly identify agricultural land parcels in European Union countries. The Austrian Agricultural Agency (AMA) provides access to Austrian agricultural parcels through an OGC Rest API - Feature interface. The OGC API Features standard is the successor to the OGC Web Feature Service (WFS) specification.
The R syntax used to interact with OGC APIs is the same as described in the “Data API” section of Lesson Read and Write Data.
Before loading the data into an R script, examine the API’s contents. Visit the web service’s landing page and proceed to the collection page. There you will find an overview of the available layers.
Enter the following URL into your browser:
https://gis.lfrz.gv.at/api/geodata/i009501/ogc/features/v1/collections/i009501:invekos_feldstuecke_2024_1/items?f=json&limit=10
This request returns the first ten features of the i009501:invekos_feldstuecke_2024_1
layer (agricultural land parcels as polygons) in GeoJSON format.
GeoJSON is a JSON-based standard for representing simple geographic features, along with their non-spatial attributes.
Upon inspecting the GeoJSON, you will find the coordinate vertices of the polygon features and attributes such as fs_flaeche_ha
(parcel area in hectares) and fnar_bezeichnung
(land use).
Use the bbox
parameter to filter resources within a specific area:
https://gis.lfrz.gv.at/api/geodata/i009501/ogc/features/v1/collections/i009501:invekos_feldstuecke_2024_1/items?f=json&bbox=14,48,14.02,48.02
For a visual representation of the bounding box, enter the coordinates 14,48,14.02,48.02
into linestrings.com and click “Display box”.
To execute the request in R, use the following script:
The above code retrieves polygon features within the specified bounding box and stores them in an object named invekos
.
9.2 Data Cleaning
Once the data is loaded into R, we can more closely examine its structure. The dplyr
function glimpse
, for instance, allows us to preview each column’s name and type:
Rows: 100
Columns: 15
$ fart_id <dbl> 1822, 1822, 1822, 1822, 1822, 1822, 1822, 1822, 1822…
$ gml_geom <chr> "[B@71f27860", "[B@f69696e", "[B@3aef6174", "[B@775c…
$ gml_length <dbl> 1086, 883, 786, 1599, 482, 718, 1432, 1094, 884, 116…
$ geom_date_created <chr> "2022-10-01T05:56:50+02:00", "2022-10-01T05:56:50+02…
$ log_pkey <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ fnar_code <chr> "A", "A", "G", "A", "G", "A", "A", "A", "A", "A", "G…
$ gml_identifier <chr> "https://data.inspire.gv.at/0095/33078dd6-05e0-4263-…
$ fs_kennung <dbl> 111714134, 111714135, 111714132, 111793656, 11179365…
$ geo_part_key <dbl> 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, …
$ gml_id <chr> "AT.0095.33078dd6-05e0-4263-a464-f35bd1607ecd.elu.Ex…
$ inspire_id <chr> "https://data.inspire.gv.at/0095/33078dd6-05e0-4263-…
$ fnar_bezeichnung <chr> "ACKERLAND", "ACKERLAND", "GRÜNLAND", "ACKERLAND", "…
$ fs_flaeche_ha <dbl> 1.52789444, 1.76240452, 0.20852359, 7.88961123, 0.02…
$ geo_id <dbl> 12316762, 12316764, 12316768, 12301276, 12301279, 12…
$ geometry <POLYGON [°]> POLYGON ((14.01608 48.00954..., POLYGON ((14…
The abbreviation
dbl
stands fordouble
, a data type that stores numbers with decimal points.
To create a subset of the invekos
sf-object, use the following code:
invekos %>%
head(2) %>%
dplyr::select(fnar_bezeichnung, fs_flaeche_ha, geo_id, geometry) %>%
knitr::kable(., format="html")
fnar_bezeichnung | fs_flaeche_ha | geo_id | geometry |
---|---|---|---|
ACKERLAND | 1.527894 | 12316762 | POLYGON ((14.01608 48.00954... |
ACKERLAND | 1.762404 | 12316764 | POLYGON ((14.01608 48.00931... |
Field names and entries are in German. To rename them, use the base R function colnames()
:
Entries are renamed using dplyr
’s mutate
and case_when
functions:
unique(invekos.sub$land_use) # Unique entries in 'land_use'
[1] "ACKERLAND" "GRÜNLAND"
# Renaming entries 'ACKERLAND' and 'GRÜNLAND'
invekos.sub <- dplyr::mutate(invekos.sub, land_use =
case_when(land_use == 'ACKERLAND' ~ 'arable land',
land_use == 'GRÜNLAND' ~ 'grassland', TRUE ~ 'Other'))
# Display result
invekos.sub %>%
head(2) %>%
knitr::kable(., format="html")
land_use | area_ha | geo_id | geometry |
---|---|---|---|
arable land | 1.527894 | 12316762 | POLYGON ((14.01608 48.00954... |
arable land | 1.762404 | 12316764 | POLYGON ((14.01608 48.00931... |
For an initial visual impression, plot the sf-object using the base R plot()
function:
By default, the plot()
function generates a multi-panel plot, with one sub-plot for each field of the object. However, specifying invekos.sub[1]
restricts the output to a single plot, specifically showcasing the land_use
field. The function parameter key.pos = 1
aligns the legend below the map (1=below, 2=left, 3=above and 4=right). key.width
defines the width of the legend.
Next, we can validate the geometry of sf polygons using st_is_valid
:
sf::st_is_valid(invekos) %>%
summary()
Mode TRUE
logical 100
The st_is_valid
function checks the validity of each geometry, returning a logical vector. The summary confirms that all geometries in our dataset are valid. If any invalid geometries are detected, they can be corrected using the st_make_valid
function.
Let’s closely investigate a single polygon geometry feature that is created by the code below. The validity check flags an invalid geometry. Can you identify the problem?
# Coordinates for the polygon
coords <- matrix(c(-1, -1, 1, -1, 1, 1, 0, -1, -1, -1), ncol = 2, byrow = TRUE)
# List of coordinate matrices
list_of_coords <- list(coords)
# Constructing the polygon
polygon <- st_polygon(list_of_coords)
# Converting to an sf object
error_sf <- st_sf(geometry = st_sfc(polygon))
# Validity check
sf::st_is_valid(error_sf)
[1] FALSE
When plotted, the polygon reveals a sliver polygon, which is invalid due to its shape.
plot(error_sf)
Another common cause of invalid geometries is self-intersection of lines.
Note: A valid polygon requires at least four coordinate points with the first and last points being identical. A polygon with only three points is considered to have an invalid geometry.
9.3 Vector operations
The structure of an sf
object is similar to that of data frames, enabling attribute-based filtering operations using dplyr
functions, as described in Lesson Data Manipulation.
For example, to extract grassland parcels, the filter()
function is used:
invekos.sub %>%
dplyr::filter(land_use=='grassland') %>%
{plot(.[1], main="Land Use", key.pos = 1, key.width = lcm(1.8))}
Note the use of curly brackets in the
plot()
function. This is required because the dot notation (.) cannot directly index a piped value. For a deeper explanation, see this Stack Overflow discussion.
9.3.1 Geometrical operations
The sf
package provides a range of geometric predicates such as st_within
, st_contains
, or st_crosses
. A complete list is available here. While these functions are often used between pairs of simple feature geometry sets, they can also operate on a single sf
object:
sf::st_intersects(invekos.sub, sparse = TRUE)
Sparse geometry binary predicate list of length 100, where the
predicate was `intersects'
first 10 elements:
1: 1
2: 2
3: 3, 68
4: 4, 38, 42
5: 5, 71
6: 6, 58, 64, 78
7: 7, 39
8: 8, 33, 75, 77
9: 9, 12, 15, 66
10: 10
This function returns a sparse matrix revealing intersections within features of the same sf
object invekos.sub
, indicating possible topology errors. For correcting these errors, GIS software like QGIS is recommended as R’s capabilities for handling topology errors are less advanced.
9.3.2 Binary operations
Binary operations are essential for analyzing spatial relationships between different datasets. Here, we demonstrate this by creating an sf
object representing farmsteads using data from the AMA Rest API service:
# Define the new API endpoint for farmsteads
farms_url <- "https://gis.lfrz.gv.at/api/geodata/i009501/ogc/features/v1/collections/i009501:invekos_hofstellen_2024_1/items?f=json&bbox=14,48,14.02,48.02&limit=100"
# Make the request and convert the response to an sf object
farms <- httr2::request(farms_url) %>% # Create request
httr2::req_perform() %>% # Execute request
httr2::resp_body_string() %>% # Extract JSON body as string
geojsonsf::geojson_sf() # Convert JSON string to sf object
# Plot land parcels with farm points overlayed
plot(invekos.sub[1], main = NULL, key.pos = NULL, reset = FALSE, col_graticule = "grey")
plot(farms[1], main = NULL, key.pos = NULL, pch = 7, col = 'red', add = TRUE, cex = 1)
Note: The
plot()
function is used sequentially to overlay farm points on top of the land parcel polygons. For advanced plotting techniques, refer to theplot()
documentation.
We proceed to calculate the distances between farms and land parcels to determine proximity:
dist_m <- sf::st_distance(farms, invekos.sub)
The st_distance
function calculates the shortest distance matrix between the geometries of the two sf
objects, with distances returned in meters.
Distances are computed in meters, which may seem unexpected since the reference system’s units are in degrees. To understand the underlying calculations, examine the coordinate reference system of the sf
objects farms
and invekos.sub
using sf::st_crs()
. This will reveal that objects use a projected coordinate system (EPSG:31287), which allows for metric distance calculations.
In this short exercise, we will take a closer look at the algorithm that is implemented in the function st_distance
.
Open the documentation of st_distance
to find out how metric distances are derived from geographic coordinates.
According to the documentation, metric distances are computed based on the Coordinate Reference System (CRS) of the sf
objects. Since both farms
and invekos.sub
use EPSG:31287 (a projected CRS), the st_distance
function calculates planar (Euclidean) distances in meters. This is possible because projected CRSs preserve distance relationships locally, unlike geographic CRSs which use degrees.
For more detailed information on distance calculations, refer to Algorithms for geodesics, Journal of Geodesy.
When plotting the complete distance matrix dist_m
, we see that column 1 contains the distances between the first feature (parcel 1) in the invekos.sub
sf
object and the 100 farm features of the farms sf object. Accordingly, the matrix has 100 columns (one for every parcel) and 100 rows (one for every farm).
The following line returns the first column of the distance matrix as a vector:
dist_m[, 1]
Units: [m]
[1] 143.1406 813.3666 825.8695 211.4067 1360.1150 964.9589 804.7171
[8] 1025.2758 204.4024 810.4991 898.8973
To identify the farm that is located closest to parcel 1, we need to query the index of the minimum value in this vector:
The which
function, a base R utility, identifies the indices of elements that satisfy a given condition. In the example, it returns the index of the smallest value within the vector dist_m[, 1]
. Consequently, it indicates which farm is the closest to parcel 1.
The demonstrated procedure for detecting closest farms can be executed for every parcel in a for-loop. The number of the closest farm is appended to a vector named closest
. This vector is in turn appended as a new column to the sf
object invekos.sub
. Finally, invekos.sub
is plotted together with farms
.
There are now two approaches to perform this loop:
Approach 1: Dynamic Loop Using ncol(dist_m)
This method dynamically adjusts the loop to the number of parcels by using ncol(dist_m)
. It’s flexible and scalable, making it suitable for datasets where the number of parcels may change.
# Approach 1: Dynamic Loop Using ncol(dist_m)
closest <- c()
for (i in 1:ncol(dist_m)){
out <- which(dist_m[, i] == min(dist_m[, i]))
closest <- c(closest, out)
}
cbind(invekos.sub, closest) %>%
{plot(.[4], main = NULL, key.pos = NULL, reset = FALSE)}
plot(farms[1], main = NULL, key.pos = NULL, pch = 7, col = 'red', add = TRUE, cex = 1)
Approach 2: Fixed Loop Using a Hardcoded Value (e.g., 100)
This method uses a fixed number of iterations (100) based on prior knowledge of the dataset size. It’s simpler but less flexible, as it requires manual adjustment if the number of parcels changes.
# Approach 2: Fixed Loop Using a Hardcoded Value (e.g., 100)
closest <- c()
for (i in 1:100){
out <- which(dist_m[, i] == min(dist_m[, i]))
closest <- c(closest, out)
}
cbind(invekos.sub, closest) %>%
{plot(.[4], main = NULL, key.pos = NULL, reset = FALSE)}
plot(farms[1], main = NULL, key.pos = NULL, pch = 7, col = 'red', add = TRUE, cex = 1)
The output map visualizes the closest farm to each agricultural parcel, highlighting the practical application of sf
Geometrical operations.
This chapter focuses on logical matrix outputs from geometric operations. For operations generating new geometries, such as st_union
, st_buffer
, and st_centroid
, see operations returning a geometry. For network operations on sf
objects, consider using sfnetworks
or the igraph
package.
9.4 Raster operations
Having previously discussed the structure of SpatRaster Objects
in Lesson 4s and the reading and writing of such objects in Lesson 8, we now turn our attention to raster manipulation operations like resampling and cropping.
For the following examples, we will utilize a sample dataset from the
terra
package. You can download the sample data here.
9.4.1 Resampling
Resampling is crucial for working with raster datasets. It alters the spatial resolution, allowing you to align multiple rasters with different resolutions. Additionally, it adjusts the level of detail necessary for your analysis.
Let’s demonstrate resampling with the terra
sample dataset:
library(terra)
r <- terra::rast("data/terra_sample.tif") # path may be different on your machine
plot(r, main='SpatRaster from file')
Before we change the raster resolution of SpatRaster Object
, it is important to know the original resolution of the raster. You can use the res()
function to check the original resolution:
res(r)
[1] 40 40
For resampling, we’ll create a target raster by copying the original and setting a new resolution:
r2 <- r
terra::res(r2) <- 80 # Assigning a new resolution of 80x80
Although this operation clears the raster values, r2
can still be used as a target for resampling:
The bilinear
method is used for interpolating new values in r_resampled
. Alternative methods are nearest
or cubic
. The choice of interpolation method for resampling raster data is a crucial consideration, discussed in further detail here.
9.4.2 Crop raster
Raster cropping allows you to select a specific area from a larger raster dataset for targeted analysis. Both SpatRaster
and SpatExtent
objects can be utilized for cropping operations.
To derive an extent from a SpatRaster
, the ext
function is used:
terra::ext(r)
SpatExtent : 178400, 181600, 329400, 334000 (xmin, xmax, ymin, ymax)
The output format is <xmin, xmax, ymin, ymax>
. Knowing this, we can define a cropping extent and apply it using the crop
function:
crop_ext <- terra::ext(180000, 180200, 331000, 331250)
subset_r <- terra::crop(r, crop_ext)
plot(subset_r, main='Subset of r')
If the resulting cropped area doesn’t match expectations due to cell alignment, adjust the cropping extent or raster resolution as necessary.
9.4.3 Raster algebra
Terra
provides functionality for algebraic operations on rasters, supporting standard arithmetic and logical operators, and functions like abs
and round
. For a full list, visit the terra algebra documentation.
Here’s how you can perform some simple operations:
# Add a constant value to each cell
s <- r + 10
# Calculate the square root of cell values
s <- sqrt(r)
# Generate a logical raster based on cell value equality
s1 <- s == 15
plot(s1)
Operators can be applied between SpatRaster
objects with matching resolutions and origins. The result covers their spatial intersection.
You can also perform in-place replacements:
r[r == 255] <- 2550
plot(r)
Here, all cells with the value 255
are replaced with 2550
.
A broader range of vector and raster operations can be found in Chapter 5 - Geometry Operations, in the Book Geocomputation with R.
As of writing this module, “Geocomputation with R” uses the raster
package, which is expected to be superseded by terra
.