mean
Basics
mean is a function that calculates the arithmetic mean (average) of a vector of values.
You will often find yourself using the na.rm argument, short for NA removal. Most real-life data will contain missing values somewhere, and na.rm = TRUE removes those values before the result is computed. na.rm = FALSE is the default, so make sure to include na.rm = TRUE if you’re unsure whether your data contains missing values.
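As a minimal sketch of the default versus na.rm = TRUE, the snippet below uses a made-up vector (temps is a hypothetical name, not from any real dataset) with two missing readings:

```r
# Hypothetical data: five readings, two of them missing
temps <- c(21.5, NA, 19.0, 23.5, NA)

# Default (na.rm = FALSE): a single NA makes the whole result NA
mean(temps)
# [1] NA

# na.rm = TRUE drops the missing readings before averaging
mean(temps, na.rm = TRUE)
# [1] 21.33333

# Count how many values were dropped
sum(is.na(temps))
# [1] 2
```

Counting the dropped values alongside the mean is a cheap way to see how much of the data the result actually rests on.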
Examples
How do I get the average of the values in a vector when some of the values are NA or NaN? What happens if I want to include those values?
First, we show what happens when na.rm = TRUE is not included:
mean(c(1,2,3,NaN))
[1] NaN
That’s obviously not what we want. We would only want na.rm = FALSE if we were checking whether missing values are present in the data.
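If the goal really is to check for missing values, one possible approach is sketched below. The vector x is a hypothetical stand-in for real data; is.na, anyNA, and which are base R functions:

```r
# Hypothetical vector standing in for a column of real data
x <- c(1, 2, 3, NaN)

# With the default na.rm = FALSE, a missing result signals missing input
is.na(mean(x))
# [1] TRUE

# anyNA() asks the question directly; NaN counts as missing here too
anyNA(x)
# [1] TRUE

# Positions of the missing values
which(is.na(x))
# [1] 4
```

Asking anyNA directly is usually clearer than inferring missingness from the result of mean.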
Now, the rest of the examples, executed properly:
mean(c(1,2,3,NaN), na.rm=TRUE)
[1] 2
mean(c(1,2,3,NA), na.rm=TRUE)
[1] 2
mean(c(1,2,NA,NaN,4), na.rm=TRUE)
[1] 2.333333
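The last example drops both NA and NaN because is.na treats NaN as missing as well. The sketch below (vals is a hypothetical name) shows how the two differ and reproduces the result by hand:

```r
vals <- c(1, 2, NA, NaN, 4)

# is.na() flags both NA and NaN
is.na(vals)
# [1] FALSE FALSE  TRUE  TRUE FALSE

# is.nan() flags only NaN
is.nan(vals)
# [1] FALSE FALSE FALSE  TRUE FALSE

# Averaging only the non-missing values by hand: (1 + 2 + 4) / 3
mean(vals[!is.na(vals)])
# [1] 2.333333
```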