R Vectors

All you need to know about working with vectors in R.

Introduction to Vectors

The vector is the basic data object in R, so a thorough understanding of its structure and basic functions will make it easier to work with more complex objects.

What classes of vectors do you know about?

List all the possible classes of vectors you know from R. What kind of information can each class contain?

Every vector has two basic properties:

  • class
  • length

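Optionally, the elements of a vector can be named using the function names(). R does not require the names to be unique, but unique names are recommended, because indexing by name returns only the first matching element.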
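As a minimal sketch of these properties, the following lines build a small vector, query its class and length, and name its elements. The object names x and y and the element names are made up for illustration.

```r
# A small numeric vector; the object name x is arbitrary
x <- c(10, 20, 30)
class(x)   # "numeric"
length(x)  # 3

# Assign names to the elements and index by name
names(x) <- c("a", "b", "c")
x["b"]

# R does not reject duplicated names; indexing by name
# then returns only the first matching element
y <- c(a = 1, a = 2, b = 3)
y["a"]
```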

Classes of vectors

The basic classes for vectors in R are:

  • logical
  • numeric
    • double
    • integer
  • complex
  • character

Here are a few examples of vectors that belong to different classes.

# Logical vector
c(1:10) >= 7
 [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
# Integer
c(1:10)
 [1]  1  2  3  4  5  6  7  8  9 10
# Numeric (double)
c(1:10)/2
 [1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
# Character
letters[1:10]
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

Retrieve and coerce

Check the outputs of the following functions applied to different vectors:

  • typeof()
  • mode()
  • class()

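Alternatively, you can test whether a vector belongs to a given class with the is.*() functions, for example is.numeric() or is.character().

Now coerce vectors from one class to another with the as.*() functions, for example as.character() or as.integer().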
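One possible way to work through this exercise is sketched below. The vectors x_int and x_dbl are illustrative and not part of the original exercise.

```r
# Compare the three functions on an integer and a double vector
x_int <- 1:3
x_dbl <- c(1.5, 2, 3)

typeof(x_int)  # "integer"
mode(x_int)    # "numeric"
class(x_int)   # "integer"

typeof(x_dbl)  # "double"
mode(x_dbl)    # "numeric"
class(x_dbl)   # "numeric"

# Test the class with is.*()
is.numeric(x_int)    # TRUE
is.character(x_int)  # FALSE

# Coerce with as.*()
as.character(x_int)        # "1" "2" "3"
as.integer(x_dbl)          # 1 2 3 (decimals are truncated)
as.logical(c(0, 1, 2))     # FALSE TRUE TRUE
as.numeric(c("1", "two"))  # 1 NA, with a warning
```

Note that typeof() separates integer from double, while mode() reports both as numeric.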

Comment vectors

Commenting vectors can be useful to remember their content or to inform collaborators about it.

comment(letters) <- "These are all lower case letters."
comment(letters)
[1] "These are all lower case letters."

A complex value is a number that includes an imaginary component.

# Square root of -1
sqrt(-1)
Warning in sqrt(-1): NaNs produced
[1] NaN
# Coerce -1 to complex
sqrt(as.complex(-1))
[1] 0+1i

Euler’s Identity

Euler’s identity is often described as a mathematical beauty. Its formula is expressed as

\(e^{i\pi} = -1\)

It is calculated in R as follows

exp(1i*pi)
[1] -1+0i
# In other words (Re() extracts the real part without a coercion warning)
Re(exp(1i*pi)) + 1 == 0
[1] TRUE

There are a few special types of values. These can be useful, or they can result unexpectedly from various mathematical operations.

  • NULL (null value)
  • NA (not available value)
  • NaN (not a number)
  • Inf/-Inf (infinite value)
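The following sketch shows how each of these values can arise and how to test for it. The vector x is an illustrative example.

```r
# NULL is an empty object of length zero
length(NULL)   # 0
is.null(NULL)  # TRUE

# NA marks a missing value and propagates through calculations
x <- c(1, NA, 3)
is.na(x)               # FALSE TRUE FALSE
mean(x)                # NA
mean(x, na.rm = TRUE)  # 2

# NaN results from undefined operations
0 / 0          # NaN
is.nan(0 / 0)  # TRUE
is.na(NaN)     # TRUE (NaN also counts as missing)

# Inf and -Inf result, for example, from division by zero
1 / 0             # Inf
-1 / 0            # -Inf
is.finite(1 / 0)  # FALSE
```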

Index elements in vectors

To access or extract an element of a vector in R, use square brackets ([]). You can use either integers or logical values as an index. For named vectors, you can also use character values as indices.

# Index using integer
letters[c(1, 5, 20)]
[1] "a" "e" "t"
# Index using logical values
LETTERS[!LETTERS %in% c("A", "B", "M", "Z")] 
 [1] "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "N" "O" "P" "Q" "R" "S" "T" "U" "V"
[20] "W" "X" "Y"
# Index using characters
A <- seq_along(letters)
names(A) <- letters
A[c("x", "l", "c")]
 x  l  c 
24 12  3 

You can also use negative integers to exclude elements from a vector.

c(1:5)[-3]
[1] 1 2 4 5

Factors

The factor class is specifically designed to contain categorical variables. Coding categorical variables as factors instead of character vectors allows for more meaningful summaries.

# Character vector
performance <- sample(c("low", "medium", "high"), size = 100,
    replace = TRUE)
class(performance)
[1] "character"
summary(performance)
   Length     Class      Mode 
      100 character character 
# Transform to factor
performance <- as.factor(performance)
summary(performance)
  high    low medium 
    42     29     29 

In the previous case, the levels are sorted alphabetically, which often does not reflect their meaning. To control the order, specify the levels (classes) of the categorical variable explicitly when creating the factor.

# back transformation to character
performance <- as.character(performance)

# Factors with preset levels
performance <- factor(performance,
    levels = c("low", "medium", "high"))
summary(performance)
   low medium   high 
    29     29     42 

Ordinal variables are a special case of categorical variables. Here the levels (classes) are ranked along a scale, but the distances between them are not defined in a metric sense. Such variables are represented by the ordered class, which is similar to factor but additionally allows operations that depend on the rank, such as comparisons between levels.
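A minimal sketch of an ordered factor follows. The vector rating is hypothetical data, created only for this illustration.

```r
# Hypothetical data: a short vector of performance ratings
rating <- c("low", "high", "medium", "low", "high")

# Create an ordered factor; levels are listed from lowest to highest
rating <- factor(rating, levels = c("low", "medium", "high"),
    ordered = TRUE)
class(rating)   # "ordered" "factor"
levels(rating)  # "low" "medium" "high"

# Comparisons follow the order of the levels
rating >= "medium"  # FALSE TRUE TRUE FALSE TRUE
max(rating)         # high
```

The same comparisons applied to an unordered factor return NA with a warning, because its levels have no rank.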

Further Types

Time-related variables are represented by the class Date for calendar dates and by the POSIX* classes (POSIXct/POSIXlt) for date-times. They are dealt with in a separate post.

To complete the R jargon, we need to consider the following elements:

  • A symbol is the name of an object (for example a variable or a function). It is typed into the console without quotes.
  • An expression is a piece of unevaluated R code. It is often used to insert mathematical notation into graphics.
  • A formula is written without quotes and uses a tilde to separate the left-hand term (response) from the right-hand terms (factors), for example response ~ factor1 + factor2.
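These three elements can be created and inspected as sketched below. The object names s, e and f are arbitrary, and response, factor1 and factor2 are placeholder variable names that do not need to exist.

```r
# A symbol is the name of an object
s <- as.name("pi")
class(s)  # "name"
eval(s)   # evaluating the symbol returns the value of pi

# An expression stores unevaluated code
e <- expression(sqrt(16) + 1)
class(e)      # "expression"
eval(e[[1]])  # 5

# A formula separates the response from the explanatory terms
f <- response ~ factor1 + factor2
class(f)     # "formula"
all.vars(f)  # "response" "factor1" "factor2"
```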