All you need to know about working with vectors in R.
Introduction to Vectors
The vector is the basic data object in R, so a thorough understanding of its structure and basic functions will make it easier to work with more complex objects.
What classes of vectors do you know about?
List all the possible classes of vectors you know from R. What kind of information can each class contain?
Vectors have two basic attributes:
class
length
Optionally, the elements of a vector can be named using the function names(). R does not require the names to be unique, but indexing by name returns only the first match, so unique names are strongly advisable.
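As a minimal sketch of these attributes, the following builds a small named vector and queries its class, length, and names. The object names heights and scores and their values are made up for illustration.

```r
# A small named numeric vector (values are arbitrary)
heights <- c(anna = 1.65, ben = 1.80, carla = 1.72)

class(heights)   # "numeric"
length(heights)  # 3
names(heights)   # "anna" "ben" "carla"

# Names can also be assigned after the vector has been created
scores <- c(10, 20, 30)
names(scores) <- c("first", "second", "third")
scores["second"]
```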
Classes of vectors
The basic classes for vectors in R are:
logical
numeric (stored as either double or integer)
complex
character
Here are a few examples of vectors that belong to different classes.
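One possible set of examples, with one vector per class, is sketched below. The object names (v_logical and so on) are chosen here only for illustration.

```r
v_logical   <- c(TRUE, FALSE, NA)
v_double    <- c(1.5, 2, 3.25)
v_integer   <- c(1L, 2L, 3L)      # the L suffix forces integer storage
v_complex   <- c(1+2i, 3-1i)
v_character <- c("a", "b", "c")
```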
Check the outputs of the following functions applied to different vectors:
typeof()
mode()
class()
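One way to compare the three functions side by side is to apply them to a list of small vectors. This is only a sketch: the list examples and the matrix result are illustrative names.

```r
examples <- list(
  logical   = c(TRUE, FALSE),
  double    = c(1.5, 2),
  integer   = 1:3,
  complex   = 1+2i,
  character = "a"
)

# One row per vector, one column per function
result <- t(sapply(examples, function(x) {
  c(typeof = typeof(x), mode = mode(x), class = class(x))
}))
result
```

Note that double and integer vectors share the mode "numeric", while typeof() tells them apart.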
Alternatively, you can use the is.*() functions.
Now apply coercion with the as.*() functions.
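A short sketch of testing and coercion follows. The vector x is invented for illustration; suppressWarnings() is used only to keep the output clean, since R warns when an element cannot be converted.

```r
x <- c("1", "2.5", "three")

is.character(x)  # TRUE
is.numeric(x)    # FALSE

# Coercion to numeric: elements that cannot be parsed become NA
y <- suppressWarnings(as.numeric(x))
y                # 1.0 2.5 NA

# Logical values coerce to 1 and 0
as.integer(c(TRUE, FALSE, TRUE))  # 1 0 1

# Numbers coerce to their text representation
as.character(1:3)                 # "1" "2" "3"
```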
Comment vectors
Commenting vectors can be useful to remember their content or to inform collaborators about it.
comment(letters) <- "These are all lower case letters."
comment(letters)
[1] "These are all lower case letters."
A complex value is a number that includes an imaginary component.
# Square root of -1
sqrt(-1)
Warning in sqrt(-1): NaNs produced
[1] NaN
# Coerce -1 to complex
sqrt(as.complex(-1))
[1] 0+1i
Euler’s Identity
Euler’s identity is often cited as an example of mathematical beauty. It states that
\(e^{i\pi} = -1\)
It can be verified in R as follows:
exp(1i*pi)
[1] -1+0i
# In other words
as.numeric(exp(1i*pi)) + 1 == 0
[1] TRUE
There are a few special types of values. These can be useful, or they can result unexpectedly from various mathematical operations.
NULL (null value)
NA (not available value)
NaN (not a number)
Inf/-Inf (infinite value)
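The following sketch shows how each of these special values arises and how to test for it. The vector x is made up for illustration.

```r
length(NULL)           # 0: NULL is an empty object
c(1, NULL, 2)          # NULL vanishes when combined: 1 2

x <- c(4, NA, 9)
is.na(x)               # FALSE TRUE FALSE
mean(x)                # NA, because the missing value propagates
mean(x, na.rm = TRUE)  # 6.5

0 / 0                  # NaN
1 / 0                  # Inf
-1 / 0                 # -Inf
is.nan(0 / 0)          # TRUE
is.na(NaN)             # TRUE: NaN also counts as missing

is.finite(c(1, Inf, NA, NaN))  # TRUE FALSE FALSE FALSE
```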
Index elements in vectors
To access or extract an element of a vector in R, use square brackets ([]). You can use either integers or logical values as an index. For named vectors, you can also use character values as indices.
# Index using integers
letters[c(1, 5, 20)]
[1] "a" "e" "t"
# Index using logical values
LETTERS[!LETTERS %in% c("A", "B", "M", "Z")]
# Index using characters
A <- seq_along(letters)
names(A) <- letters
A[c("x", "l", "c")]
x l c
24 12 3
You can also use negative integers to exclude elements from a vector.
c(1:5)[-3]
[1] 1 2 4 5
Factors
The factor class is specifically designed to contain categorical variables. Coding categorical variables as factor instead of character allows for more meaningful summaries.
# Character vector
performance <- sample(c("low", "medium", "high"), size = 100, replace = TRUE)
class(performance)
[1] "character"
summary(performance)
Length Class Mode
100 character character
# Transform to factor
performance <- as.factor(performance)
summary(performance)
high low medium
42 29 29
In the previous case, the levels are sorted alphabetically, which does not reflect their natural order. To control the order, specify the levels (classes) of the categorical variable explicitly when creating the factor.
# Back-transformation to character
performance <- as.character(performance)
# Factor with preset levels
performance <- factor(performance, levels = c("low", "medium", "high"))
summary(performance)
low medium high
29 29 42
Ordinal variables are a special case of categorical variables. Here the levels (classes) represent a hierarchy within a scale, but not in a metric sense. Such variables are represented by the ordered class, which is similar to factor but allows some additional operations, such as comparisons between levels.
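A minimal sketch of an ordered factor is shown below. The vector ratings is invented for illustration and is not the performance data used above.

```r
# Hypothetical ratings, invented for illustration
ratings <- c("low", "high", "medium", "low", "high")

ratings_ord <- factor(ratings,
                      levels = c("low", "medium", "high"),
                      ordered = TRUE)

class(ratings_ord)       # "ordered" "factor"

# Comparisons respect the order of the levels
ratings_ord >= "medium"  # FALSE TRUE TRUE FALSE TRUE
max(ratings_ord)         # high

table(ratings_ord)
```

With an unordered factor, the comparison and max() calls would not be meaningful; the ordered class is what makes them valid.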
Further Types
Time-related variables are represented by the class Date for calendar dates and by the POSIX* classes (POSIXct/POSIXlt) for date-times. They are covered in a separate post (see here).
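As a brief preview, the sketch below creates one object of each kind. The date, the time zone, and the object names d and stamp are arbitrary choices for illustration.

```r
d <- as.Date("2024-03-15")
class(d)               # "Date"
d + 30                 # arithmetic on dates is done in days
format(d, "%d.%m.%Y")  # "15.03.2024"

stamp <- as.POSIXct("2024-03-15 14:30:00", tz = "UTC")
class(stamp)           # "POSIXct" "POSIXt"
```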
To complete the R jargon, we need to consider the following elements:
A symbol represents a variable (object or vector) to be accessed, and is usually typed into the console without quotes.
An expression is a piece of unevaluated R code. Among other uses, it can insert mathematical notation into graphics.
A formula is written without quotes and uses a tilde to separate the left-hand term (response) from the right-hand terms (factors), for example response ~ factor1 + factor2.
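The three terms can be illustrated with a short sketch. The names my_vector, sym, ex, and f are hypothetical; response, factor1, and factor2 are taken from the example above and do not need to exist for the formula to be created.

```r
# Symbol: the unquoted name of an object
sym <- as.name("my_vector")
class(sym)   # "name"

my_vector <- 1:3
eval(sym)    # evaluating the symbol looks up the object: 1 2 3

# Expression: unevaluated code, here a piece of mathematical notation
ex <- expression(alpha^2 + beta)
class(ex)    # "expression"

# Formula: response on the left of the tilde, terms on the right
f <- response ~ factor1 + factor2
class(f)     # "formula"
all.vars(f)  # "response" "factor1" "factor2"
```

An expression such as ex can be passed to plotting arguments like main or xlab, where base graphics render it as mathematical notation.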