Creating Factors

We create factors with the factor() function. It takes a vector (typically numeric or character) in which each unique value represents a group, and converts that vector into a factor.

(x <- sample(1:3, 20, TRUE))
 [1] 1 3 2 1 3 1 3 3 2 2 2 3 3 3 2 1 3 3 1 2
(f1 <- factor(x))
 [1] 1 3 2 1 3 1 3 3 2 2 2 3 3 3 2 1 3 3 1 2
Levels: 1 2 3
(y <- rep(c("foo", "bar"), 10))
 [1] "foo" "bar" "foo" "bar" "foo" "bar" "foo" "bar" "foo" "bar" "foo" "bar"
[13] "foo" "bar" "foo" "bar" "foo" "bar" "foo" "bar"
(f2 <- factor(y))
 [1] foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo bar foo
[20] bar
Levels: bar foo
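The examples above use sample(), so their output changes from run to run. The variant below is a minimal reproducible sketch that uses a fixed input vector instead. The names x_fixed and f_fixed are illustrative only. It checks that each distinct input value becomes exactly one level.

```r
# Fixed (illustrative) input so the output does not change between runs
x_fixed <- c(2, 1, 3, 1, 2, 3, 3)
f_fixed <- factor(x_fixed)

# One level per distinct value in the input
nlevels(f_fixed)          # 3
length(unique(x_fixed))   # 3

# The factor still has one element per input element
length(f_fixed) == length(x_fixed)   # TRUE
```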

It may not seem like factor() did much in the examples above, but that’s far from true. Whatever type of input we start with, factor() always gives the resulting object the same three features:

# Factors have their own class
class(x)
[1] "integer"
class(f1)
[1] "factor"
class(y)
[1] "character"
class(f2)
[1] "factor"
# Factors have a levels attribute
attributes(x)
NULL
attributes(f1)
$levels
[1] "1" "2" "3"

$class
[1] "factor"
attributes(y)
NULL
attributes(f2)
$levels
[1] "bar" "foo"

$class
[1] "factor"
# Factors use integer vectors to map each observation to a group
typeof(x)
[1] "integer"
typeof(f1)
[1] "integer"
typeof(y)
[1] "character"
typeof(f2)
[1] "integer"
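The integer mapping can be made visible. Each integer code is an index into the levels vector, so indexing the levels by the codes recovers the original values. The sketch below rebuilds y and f2 as above. The names codes, recovered and f_num are illustrative. It also shows a common pitfall with factors built from numbers.

```r
y <- rep(c("foo", "bar"), 10)
f2 <- factor(y)

# The underlying integer codes: each code indexes into levels(f2)
codes <- as.integer(f2)
head(codes)               # 2 1 2 1 2 1  ("foo" is level 2, "bar" is level 1)

# Indexing the levels by the codes recovers the original values
recovered <- levels(f2)[codes]
identical(recovered, y)   # TRUE

# Caution: as.numeric() on a factor built from numbers returns the codes,
# not the original numbers; convert through as.character() first
f_num <- factor(c(10, 20, 10))
as.numeric(f_num)                  # 1 2 1
as.numeric(as.character(f_num))    # 10 20 10
```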

Factor Levels

The levels name the categories represented by the factor. By default, factor() uses the unique values of the input vector, in sorted order, as the level names. This is why "bar" comes before "foo" in f2.

levels(f1)
[1] "1" "2" "3"
levels(f2)
[1] "bar" "foo"
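By default the levels come out sorted, which is why "bar" precedes "foo" in f2. The sketch below checks that default directly. It then overrides the default with factor()'s levels argument, which sets the level order explicitly. The name f2_ordered is illustrative.

```r
y <- rep(c("foo", "bar"), 10)

# Default levels are the sorted unique values, so "bar" comes first
identical(levels(factor(y)), sort(unique(y)))   # TRUE

# The levels argument sets the order explicitly
f2_ordered <- factor(y, levels = c("foo", "bar"))
levels(f2_ordered)        # "foo" "bar"

# Input values not listed in levels become NA ("baz" here)
factor(c("foo", "bar", "baz"), levels = c("foo", "bar"))
```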

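We can assign our own level names through the labels argument. The labels are applied in the order of the levels, so here 10 becomes "Group 10" and 11 becomes "Group 11".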

(z <- sample(10:11, 10, TRUE))
 [1] 11 11 11 11 10 10 10 11 11 10
(f3 <- factor(z, labels = c("Group 10", "Group 11")))
 [1] Group 11 Group 11 Group 11 Group 11 Group 10 Group 10 Group 10 Group 11
 [9] Group 11 Group 10
Levels: Group 10 Group 11
levels(f3)
[1] "Group 10" "Group 11"
table(z, f3)
    f3
z    Group 10 Group 11
  10        4        0
  11        0        6
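Because labels are matched to levels by position, supplying an explicit levels vector alongside labels makes the mapping unambiguous. This matters when the default sorted order is not the order you want. The sketch below uses hypothetical size data, and the names sizes and f_size are illustrative.

```r
sizes <- c("small", "large", "small", "medium")

# labels are matched to levels by position: small -> S, medium -> M, large -> L
f_size <- factor(sizes,
                 levels = c("small", "medium", "large"),
                 labels = c("S", "M", "L"))
levels(f_size)            # "S" "M" "L"
as.character(f_size)      # "S" "L" "S" "M"
```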

We can also rename the levels of an existing factor by assigning a new character vector to levels().

levels(f3) <- c("g10", "g11")
levels(f3)
[1] "g10" "g11"
table(z, f3)
    f3
z    g10 g11
  10   4   0
  11   0   6
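Assigning to levels() only renames the levels by position. The integer codes underneath stay the same, which is why the table above keeps the same counts. Repeating a name in the new levels vector merges the corresponding levels into one. The sketch below demonstrates both behaviors with small illustrative vectors. The names z_small, f_demo, codes_before and g are hypothetical.

```r
z_small <- c(11, 11, 10, 11, 10)
f_demo <- factor(z_small, labels = c("Group 10", "Group 11"))
codes_before <- as.integer(f_demo)

# Renaming replaces level names by position; the integer codes do not change
levels(f_demo) <- c("g10", "g11")
identical(as.integer(f_demo), codes_before)   # TRUE

# Repeating a name merges the corresponding levels into one
g <- factor(c("apple", "salad", "orange", "apple"))
levels(g)                                # "apple" "orange" "salad"
levels(g) <- c("fruit", "fruit", "veg")
levels(g)                                # "fruit" "veg"
as.character(g)                          # "fruit" "veg" "fruit" "fruit"
```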

Practice

Create a length-20 factor with two levels, “yes” and “no”.

As usual, there are many acceptable solutions. Here are a couple of possibilities.

factor(sample(c("yes", "no"), 20, TRUE))
 [1] no  no  no  no  yes yes yes yes no  yes yes no  no  no  yes yes yes no  yes
[20] yes
Levels: no yes
factor(sample(1:2, 20, TRUE), labels = c("yes", "no"))
 [1] yes no  yes yes yes no  no  no  yes yes no  no  yes no  yes yes no  no  yes
[20] no 
Levels: yes no
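Two more possible solutions are sketched below. The names f_rep and f_yes_first are illustrative. The first is non-random. The second passes levels explicitly, which fixes "yes" as the first level. Explicit levels also guarantee that both levels exist even if the random draw happened to produce only one of the two values.

```r
# Non-random: ten "yes" followed by ten "no"
f_rep <- factor(rep(c("yes", "no"), each = 10))
levels(f_rep)          # "no" "yes"  (sorted by default)

# Random, with "yes" listed as the first level
f_yes_first <- factor(sample(c("yes", "no"), 20, TRUE), levels = c("yes", "no"))
levels(f_yes_first)    # "yes" "no"

# Both meet the requirements: 20 elements, 2 levels
length(f_rep) == 20 && nlevels(f_rep) == 2                  # TRUE
length(f_yes_first) == 20 && nlevels(f_yes_first) == 2      # TRUE
```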