We create data frames using the data.frame() function. This function takes vectors as arguments, and each vector becomes a column in the resulting data frame.
For example, consider a data frame called d1 with 10 rows and 3 columns:
The first column is the integer sequence from 1 to 10.
The second column alternates between -1 and 1.
The third column contains the sequence \(\{0.1, 0.2, \ldots, 1.0\}\).
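The data frame described above could be built with a single data.frame() call. Below is a minimal sketch. The column names a, b, and c are assumptions, since the original call is not shown. The two-element vector c(-1, 1) is recycled to fill all 10 rows because 10 is a multiple of 2.

```r
# Build a 10 x 3 data frame; the column names a, b, c are illustrative
d1 <- data.frame(
  a = 1:10,                  # integer sequence 1, 2, ..., 10
  b = c(-1, 1),              # recycled to -1, 1, -1, 1, ...
  c = seq(0.1, 1, by = 0.1)  # 0.1, 0.2, ..., 1.0
)

dim(d1)       # 10 rows and 3 columns
head(d1, 3)   # peek at the first three rows
```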
One of the greatest strengths of data frames is their ability to store columns of different types. For example, we can create a data frame called d3 with three columns, each of a different type:
A logical vector containing 10 randomly sampled boolean values.
A character vector containing 10 random selections from the set {"foo", "bar"}.
A numeric vector containing 10 values randomly sampled from the interval [0, 1].
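The data frame described above can be sketched as follows. The column names, the seed value, and the use of set.seed() for reproducibility are assumptions. sapply(d3, class) reports each column's class. Since R 4.0.0, data.frame() keeps character vectors as character by default. Earlier versions converted them to factors unless stringsAsFactors = FALSE was supplied, so the sketch passes that argument explicitly.

```r
set.seed(42)  # make the random draws reproducible (seed value is arbitrary)

d3 <- data.frame(
  a = sample(c(TRUE, FALSE), 10, replace = TRUE),  # logical column
  b = sample(c("foo", "bar"), 10, replace = TRUE), # character column
  c = runif(10),                                   # numeric column, values between 0 and 1
  stringsAsFactors = FALSE  # keep b as character even on R versions before 4.0
)

# One class per column: logical, character, numeric
sapply(d3, class)
```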
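When printed, a data frame that mixes column types (here a text column, an integer column, and a complex column) still displays as a single table:
  text integer complex
1 text       8    1+1i
2 text       1    2+2i
3 text       3    3+3i
4 text       2    4+1i
5 text       5    5+2i
6 text       9    1+3i
7 text      10    2+1i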
Structure of Data Frames
Although data frames look like two-dimensional matrices, they are actually lists of equal-length vectors. Each column in a data frame corresponds to an element of the underlying list, and each of these elements holds a vector containing the data for that column. So, all data frames are lists.
d1 <- data.frame(
  a = sample(c(TRUE, FALSE), 10, replace = TRUE),
  b = sample(c("foo", "bar"), 10, replace = TRUE),
  c = runif(10)
)
d2 <- data.frame(a = 1:10, b = c(-1, 1), c = seq(0.1, 1, 0.1))
d3 <- data.frame(x = -5:4, y = c(1, 10), z = seq(1, 1.5, length.out = 10))

# d1, d2, and d3 are all data frames
is.data.frame(d1)
[1] TRUE
is.data.frame(d2)
[1] TRUE
is.data.frame(d3)
[1] TRUE
# But they're also lists
is.list(d1)
[1] TRUE
is.list(d2)
[1] TRUE
is.list(d3)
[1] TRUE
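To look at the list structure directly, base functions such as typeof(), length(), names(), and [[ can be applied to a data frame just as to any list. A short sketch; d2 is recreated here so the block runs on its own.

```r
# Recreate d2 so this block runs on its own
d2 <- data.frame(a = 1:10, b = c(-1, 1), c = seq(0.1, 1, 0.1))

typeof(d2)   # "list": the underlying storage is a list
length(d2)   # 3: one list element per column
names(d2)    # "a" "b" "c": the list names double as column names
d2[["b"]]    # extracting one element returns the plain column vector
```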
All data frames are lists, but not every list is a data frame.
l1 <- list()
l2 <- list(
  a = sample(c(TRUE, FALSE), 10, replace = TRUE),
  b = sample(c("foo", "bar"), 10, replace = TRUE),
  c = runif(10)
)
l3 <- list(d2, d3)

# l1, l2, and l3 are all lists
is.list(l1)
[1] TRUE
is.list(l2)
[1] TRUE
is.list(l3)
[1] TRUE
# But they're not data frames
is.data.frame(l1)
[1] FALSE
is.data.frame(l2)
[1] FALSE
is.data.frame(l3)
[1] FALSE
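Roughly speaking, what separates a data frame from a plain list is its "data.frame" class together with the requirement that all columns have the same number of rows. A sketch using as.data.frame(): the list l2 converts cleanly because every element has length 10. A list whose element lengths cannot be recycled to a common length (here 3 and 2) fails. The names d, bad, and ok are illustrative.

```r
l2 <- list(
  a = sample(c(TRUE, FALSE), 10, replace = TRUE),
  b = sample(c("foo", "bar"), 10, replace = TRUE),
  c = runif(10)
)

class(l2)                # "list"
d <- as.data.frame(l2)   # works: every element has length 10
class(d)                 # "data.frame"

# Lengths 3 and 2 cannot be recycled to a common row count, so conversion fails
bad <- list(x = 1:3, y = 1:2)
ok <- tryCatch({ as.data.frame(bad); TRUE }, error = function(e) FALSE)
ok                       # FALSE
```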
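As you have probably surmised, data frames are not matrices. Although data frames and matrices share many superficial similarities (both have rows, columns, and dimensions), they are different types of objects: a matrix is a single atomic vector with a dim attribute, so all of its elements share one type, whereas a data frame is a list of columns that can each have their own type.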
is.matrix(d1)
[1] FALSE
is.matrix(d2)
[1] FALSE
is.matrix(d3)
[1] FALSE
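One way to see this difference is to compare the underlying storage of the two kinds of object. In the sketch below, the objects m and d are illustrative. The matrix is an integer vector carrying only a dim attribute, while the data frame is a list.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 integer matrix
typeof(m)                    # "integer": one atomic vector underneath
attributes(m)                # only a dim attribute, with value 2 3
is.list(m)                   # FALSE

d <- data.frame(a = 1:2, b = c("x", "y"))
typeof(d)                    # "list"
is.matrix(d)                 # FALSE, even though it has rows and columns
```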
Consequently, you cannot apply matrix operations (such as matrix multiplication) to a data frame.
d2 %*% t(d3)
Error in d2 %*% t(d3): requires numeric/complex matrix/vector arguments
If you want to analyze a data frame as a matrix, you can convert it into one with the as.matrix() function.
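As a sketch, converting both operands of the failed multiplication above first makes %*% work, because every column of d2 and d3 is numeric. The two data frames are recreated so the block runs on its own. The names m2, m3, and p are illustrative.

```r
d2 <- data.frame(a = 1:10, b = c(-1, 1), c = seq(0.1, 1, 0.1))
d3 <- data.frame(x = -5:4, y = c(1, 10), z = seq(1, 1.5, length.out = 10))

m2 <- as.matrix(d2)   # 10 x 3 numeric matrix
m3 <- as.matrix(d3)   # 10 x 3 numeric matrix

p <- m2 %*% t(m3)     # (10 x 3) %*% (3 x 10) gives a 10 x 10 matrix
dim(p)                # 10 10
```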
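Be careful with any such conversion. Because a matrix holds a single type, if your data frame contains mixed column types, as.matrix() will coerce every column to a common compatible type; for example, a single character column turns the whole result into a character matrix.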
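A small sketch of this coercion, using a made-up data frame named mixed: one character column is enough to make the whole matrix character. Without a character column, as.matrix() follows R's usual hierarchy (logical < integer < double < complex), so logical and double columns combine into a double matrix.

```r
mixed <- data.frame(
  n = c(1.5, 2.25),
  s = c("foo", "bar"),
  stringsAsFactors = FALSE
)

m <- as.matrix(mixed)
typeof(m)    # "character": the numbers were converted to strings
m[, "n"]     # the former numeric column, now a character vector

# No character column: logical < double, so the result is a double matrix
typeof(as.matrix(data.frame(l = c(TRUE, FALSE), x = c(0.5, 2))))  # "double"
```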