Lesson 6 Functions
cat()
, print()
) without delving deeply into their mechanics. This lesson aims to demystify functions by guiding you through the creation and understanding of custom functions. We’ll also explore the concepts of global and local variable scopes.
6.1 Defining Functions
Defining a function in R follows a syntax similar to variable assignments or conditional statements. We start with an identifier (e.g., add_one
) on the left of an assignment operator (<-
).
The body of the function follows, beginning with the keyword function, the parameter(s) enclosed in parentheses (e.g., input_value
), and the function’s body within curly braces. The value of the last executed statement is automatically returned:
add_one <- function (input_value) {
output_value <- input_value + 1
return(output_value) # Explicit return
}
## [1] 4
Call this function by specifying its identifier and the necessary parameter(s). For instance, in our example add_one(3)
returns 4
.
6.1.1 Understanding the return()
Function
In R, functions automatically return the value of the last expression evaluated. The decision to use the return()
function explicitly or rely on implicit return is often guided by the specific context of your work, especially in fields like GIS.
Implicit Return:
In R, if return()
is not explicitly used, the function returns the result of the last line of code executed. This is often sufficient, especially in simpler functions. For example:
## [1] 4
Explicit Return:
Using return()
, you can explicitly specify what the function should return. This approach is particularly useful when the function has complex logic, or when you need to return a value before reaching the end of the function. For example:
add_one <- function (input_value) {
result <- input_value + 1
return(result)
}
add_one(3) # Also returns 4
## [1] 4
In both examples, calling add_one(3)
returns 4. The explicit use of return()
in the second example provides clarity about the intended output of the function.
The choice between implicit and explicit return in R functions can depend on various factors, including the complexity of the function and the coding standards of your team or organization. In GIS and similar fields, it’s important to understand both approaches and adapt to the coding practices of your workplace. Always consider the context and ask, “What is the standard here?”
Early Return Example:
Sometimes, it’s necessary to exit a function early, for instance in error handling:
safe_divide <- function (numerator, denominator) {
if (denominator == 0) {
return("Error: Division by zero")
}
numerator / denominator
}
safe_divide(10, 0) # Returns "Error: Division by zero"
## [1] "Error: Division by zero"
So, in GIS and data analysis, adapting to the specific requirements of your project and following established coding standards are key to effective and collaborative work.
6.2 More Parameters
Functions can have multiple parameters, separated by commas.
A function expects as many input values as there are parameters specified in its definition; otherwise, an error is triggered.
The area_rectangle
function includes two parameters (height
and width
), calculates the area by multiplying the inputs, and returns the area as a numeric value:
## [1] 6
Default parameters can enhance a function’s flexibility. For instance, you can modify the area_rectangle
function parameters as (height, width = 3)
. Now, with only one input, the function should return YOUR INPUT * 3
. Specifying both parameters overrides the default width
. Here’s how it looks:
area_rectangle <- function (height, width = 3) {
return(height * width)
}
# Calling with one parameter uses the default width of 3
area_rectangle(3) # Returns 9
## [1] 9
## [1] 6
This approach allows the function to be more flexible and adaptable to different use cases.
6.3 Returning Multiple Values
To return multiple values from a function, encapsulate them in a list. For example, rectangle_metrics
calculates and returns both the area
and perimeter
of a rectangle:
rectangle_metrics <- function (height, width) {
area <- height * width
perimeter <- 2 * (height + width)
return(list(area = area, perimeter = perimeter))
}
Retrieve these values using list indices [[1]]
and [[2]]
, or by simply using their names:
metrics <- rectangle_metrics(3, 2)
# Using list indices
cat("Area (list index): ", metrics[[1]], "\n")
## Area (list index): 6
## Perimeter (list index): 10
## Area (name): 6
## Perimeter (name): 10
6.4 Functions and Control Structures
Functions can contain loops and conditional statements. Here’s a function that uses a loop to calculate the factorial of a number (e.g., factorial of 3 = 1 * 2 * 3 = 6):
factorial <- function (input_value) {
result <- 1
for (i in 1:input_value) {
cat("Current:", result, " | i:", i, "\n")
result <- result * i
}
return(result)
}
factorial(3)
## Current: 1 | i: 1
## Current: 1 | i: 2
## Current: 2 | i: 3
## [1] 6
The function takes a single numeric value as input, defines a variable named result
that is equal to ‘1’ and then creates a loop over all the numbers from 1 (variable result
) to the input_value
. In the loop, the current value of result
is multiplied by the value of the iterator i
.
6.5 Scope
Function parameters are internal variables acting as a bridge between the function and its environment. They are ‘visible’ only within the function’s scope.
The scope of a variable refers to the code region where the variable is accessible.
Understanding scope is crucial for several reasons:
- Avoiding Variable Conflicts: Variables with the same name can exist in different scopes without conflicting with each other. This prevents unexpected behavior caused by variable name overlaps.
- Predictable Behavior: Knowing whether a variable is global or local helps predict its behavior and influence within your script. It clarifies where and how variables can be accessed or modified.
- Resource Management: Local variables are often cleared from memory once the function execution is complete, aiding in efficient resource management in larger scripts or applications.
In R, the scope of variables is defined as follows:
- Global Variables: Defined outside of any function and accessible anywhere in the script, including within functions.
- Local Variables: Defined within a function and accessible only in that function’s scope. This includes function parameters, which act as internal variables bridging the function and its environment.
In the following example, x_value
is a global variable, while new_value
and input_value
are local to the times_x
function. Accessing new_value
or input_value
outside times_x
would result in an error, but x_value
can be referred to within the function:
x_value <- 10
times_x <- function (input_value) {
new_value <- input_value * x_value
return(new_value)
}
times_x(2)
## [1] 20
Understanding these distinctions is essential for writing robust and error-free R code, especially in more complex data analysis or GIS projects.
Referring to external global variables in a function is possible but can be dangerous. At the time of execution, one cannot be sure what the value of the global variable is. For instance, other processes might have changed its value, which affects the behavior of the function. In order to fix this problem, define the variable x_value
as a default attribute of function times_x
.
See solution!
times_x <- function (input_value, x_value = 10) { return(input_value * x_value) }
These lessons have introduced fundamental R programming concepts. The Base R Cheatsheet offers a concise summary of key operations.