When we first open R, we see a console. We can type into this console and use it as a calculator. Try.
1 + 1
It adds the two numbers and prints the sum to the console. R can do all sorts of math and arithmetic; try several other operations, for example:
2 - 3; 2 * 3; 4 / 2
You can copy-paste these into the console, but I would not recommend that. For most things in this tutorial, try typing them out. It’s like taking notes in class: it takes more effort, but it helps you actually process what is happening. And, after all, we are here to learn and practice, not just get the assignment done.
If you did copy-paste the snippet above, or if you typed it verbatim, you will notice that R ran all three operations independently from the single input. This is because in R scripting a ; (semicolon) acts like a line break, so you can put multiple lines of a script together on one line. This is not a common practice: it does not add functionality, it makes dense scripts more difficult to read, and its only purpose is aesthetics. However, it is a good thing to be aware of and keep in mind as you write your own scripts.
Try running the above snippet without the semi-colons.
2 - 3 2 * 3 4 / 2
## Error: <text>:1:7: unexpected numeric constant
## 1: 2 - 3 2
## ^
Error: unexpected numeric constant in “2 - 3 2”
Often errors in R (and most scripting/programming languages) are not easily interpreted by beginners. This one is fairly simple: “unexpected numeric constant” refers to the second 2. R reads the line from left to right, and after the 3 there is no operation to tell R how to handle the second 2, so it throws an error.
Everything in R is an object.
You may have heard the term “object-oriented programming”. R supports that style, and it treats everything you create as an object. An object is a representation of information stored in the computer’s memory. Everything you store in R takes memory, which can limit what R is able to do; we will touch on that as we progress through the tutorial.
“Whitespace”, that is spaces between commands, is generally ignored by R. It can handle any amount of whitespace in the middle of operations:
2 - 3
## [1] -1
is the same as
2-3
## [1] -1
is the same as
2- 3
## [1] -1
This works because, as R reads from left to right, it knows what to expect after the ‘-’ and ignores the whitespace until it finds a numeric.
Where we had
2-3 4
R reads from left to right, so it is looking for an operation after the 3 and finds only whitespace before a new numeric.
Alternatively, if you give an operation and don’t follow it up with a second value, R will wait for you to complete it:
2 -
If you don’t want to type out the remaining part, and would rather fix the line in your script and rerun it, you can hit the escape (‘Esc’) key to cancel the command. Escape is the universal ‘STOP’ button in R.
R reads from left to right but it reads the entire line before execution. This allows R to do mathematical order of operations. Try it.
2 + 2 * 3
R adheres to “PEMDAS”: Parentheses, Exponents, Multiplication/Division, Addition/Subtraction.
( )
^
*
/
+
-
2 + 2^3
is evaluated differently than
(2 + 2)^3
Using parentheses can be good practice for ensuring correct order of operations and also making long equations easier to read.
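As a small illustration of the precedence rules above, the lines below can be typed into the console one at a time. The unary-minus case is an extra detail not covered in the text: in R the exponent is applied before a leading minus sign.

```r
# Multiplication is applied before addition
2 + 2 * 3      # 8
(2 + 2) * 3    # 12

# Exponents are applied before addition
2 + 2^3        # 10
(2 + 2)^3      # 64

# A leading minus sign is applied after the exponent
-2^2           # -4
(-2)^2         # 4
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.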
R prints large and small numbers in scientific notation by default:
1/10000
## [1] 1e-04
If for any reason you really do not want R to use scientific notation, one way to prevent this for an entire session is to enter into the console
options(scipen=999)
In short, scipen is a session option that sets a penalty R applies when choosing between fixed and scientific notation: the larger the value, the wider a number must be in fixed notation before R switches to scientific notation. To revert it within the same session use
options(scipen=0)
If you want just a particular command to not be in scientific notation (or to be) use
format(1/10000, scientific=FALSE)
## [1] "0.0001"
or
format(1/10000, scientific=TRUE)
## [1] "1e-04"
Note that this function does not return the number as a numeric; you can tell because the output is contained in quotation marks. R will not convert the text back to a number on its own, but it can be told to:
as.numeric(format(1/10000, scientific=FALSE))
## [1] 1e-04
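To make the difference between the text and the number concrete, here is a short sketch using class(), a base-R function not introduced above that reports the data type of an object. The variable names are made up for illustration.

```r
# format() returns text, not a number
small_text <- format(1/10000, scientific = FALSE)
small_text            # "0.0001"
class(small_text)     # "character"

# as.numeric() converts the text back to a number
small_number <- as.numeric(small_text)
class(small_number)   # "numeric"

# Arithmetic works on the number; small_text * 2 would be an error
small_number * 2
```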
R has many functions. Many are included with R itself (known as base-R), and many more come in packages that can be installed. A function takes in arguments, performs pre-defined operations, and returns a single output. We just saw two functions above (format and as.numeric); calling these functions carries out far more complex operations than that single line suggests.
Another, simpler, example is the function sum()
sum(2,2,4)
performs the operations
2 + 2 + 4
All functions are called with their name with the arguments in parentheses.
Other examples are sin() and log()
sin(1)
gives us the sine of 1 (with the angle in radians).
More complex functions can take arguments that give options for the function, as seen before with format(). sum() can take optional arguments also:
sum(2,2,4,NA)
## [1] NA
This returns NA, but we can tell the function to ignore missing data:
sum(2,2,4,NA, na.rm=TRUE)
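A quick sketch of the same option in action. It also uses is.na() and max(), two base-R functions not mentioned above, to show that na.rm is not specific to sum(). The variable names are only for illustration.

```r
# Without na.rm, the missing value makes the whole result missing
with_missing <- sum(2, 2, 4, NA)
is.na(with_missing)                        # TRUE

# With na.rm = TRUE, the NA is dropped before adding
without_missing <- sum(2, 2, 4, NA, na.rm = TRUE)
without_missing                            # 8

# Other functions accept the same option
largest <- max(2, 2, 4, NA, na.rm = TRUE)
largest                                    # 4
```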
Sometimes functions need these options to be declared. The function runif() (random uniform) is a random number generator. It takes 3 arguments.
runif(1,0,10)
If we run it without any arguments
runif()
We get an error: argument "n" is missing, with no default. What does this mean? Well, every function in R, or in a package for R, has a help page that can be accessed by typing ? followed by the function name into the console (or by searching online).
?runif
Under Arguments we see that n is the number of random numbers the function generates. So let’s correct the error and give a value to n:
runif(n=2)
It runs without error.
But I said it takes 3 arguments, and it does. Most functions are written so that only a few values need to be given; the other arguments have default values. In the help pages these can be seen under Usage, where it says
runif(n, min = 0, max = 1)
By default the function samples values between 0 and 1. We can change this by assigning min and max. You will notice that when I first called the function I did not declare those options explicitly. By default a function takes its arguments in the order they are listed under Usage, so when we call
runif(1,0,10)
it knows that we want n = 1 with min = 0 and max = 10. If we explicitly name the arguments, then the order does not matter.
runif(max = 10 , n= 1, min = 0)
Best-practice is to always be as explicit as reasonable so that there is no ambiguity in reproducibility.
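The sketch below checks both points from the console: the defaults listed under Usage, and the claim that named arguments can be given in any order. It relies on two base-R functions not introduced above: formals(), which lists a function's arguments and defaults, and set.seed(), which makes the random draws repeatable so the two calls can be compared. The seed value 42 and the variable names are arbitrary.

```r
# The arguments of runif(), in the order they are matched by position
names(formals(runif))    # "n" "min" "max"
formals(runif)$min       # 0
formals(runif)$max       # 1

# Same seed, arguments given by position
set.seed(42)
by_position <- runif(1, 0, 10)

# Same seed, arguments given by name in a different order
set.seed(42)
by_name <- runif(max = 10, n = 1, min = 0)

identical(by_position, by_name)   # TRUE
```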
Functions can be called inside of other functions, and will be executed from the inner-most outward. We saw this before
as.numeric(format(1/10000, scientific=FALSE))
It formatted the number first, then made it numeric. Nesting is never necessary but is often very practical; if not used wisely, however, it can lead to some issues.
Another example of nesting
sin(sum(2,3,sum(1,NA,4,na.rm=TRUE)))
## [1] -0.5440211
Hopefully, you can see how this can get messy quickly, become very difficult to follow, and turn into a headache to debug if it does not work correctly. The more familiar you become with R, the easier long nested functions become to read, but if you are collaborating with someone who is not as well-versed it is just easier for everyone to keep it simple. The best way to do that is with the assignment of variables.
In R a variable is used to represent a value or an object. As in algebra (probably not by coincidence), x is a favorite variable for R beginners. Variables can be assigned in two ways
x <- 5
or
x = 5
The use of <- used to be the only way to assign variables in R; I do believe it is a remnant of R’s S heritage. More recently the use of = has been implemented to follow suit with the standard syntax of almost every other language. Arguably <- is still more favored, either by convention, with most people learning from people who have been using it their entire career, or simply because it’s uniquely R and distinguishes it. For assigning a variable on its own line, as above, they are equivalent.
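With assignment available, the nested call from earlier, sin(sum(2,3,sum(1,NA,4,na.rm=TRUE))), can be unpacked one step at a time. This is only a sketch of the idea; the variable names are made up for illustration.

```r
# Innermost call first: the NA is dropped, leaving 1 + 4
inner_total <- sum(1, NA, 4, na.rm = TRUE)   # 5

# Then the middle call
outer_total <- sum(2, 3, inner_total)        # 10

# Finally the outermost call
sin(outer_total)                             # -0.5440211
```

Each intermediate value can now be printed and checked on its own, which is what makes this form easier to debug.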
Naming variables is very important. In very few cases is x actually a useful name. The name should be descriptive of what the variable is representing. The name can be any length, in a combination of letters, numbers, periods (.), and underscores (_).
Variable names can’t start with a number or an underscore, and can start with a period only if it is not followed by a number.
Sometimes it takes a lengthy variable name to describe what it is representing. It can be cumbersome to type but will help immensely when reproducing your code or sharing it with someone else.
vector_1_to_5 <- seq(1,5,1)
is far more informative than x.
It is also important not to assign a variable with the same name as a function. You will very often see users define their dataset as the variable data. data is actually a function in R. If you have no need for that function in your session, it has no consequence, but it is something to keep in mind.
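A minimal sketch of what happens when a variable shares a name with a function. It uses is.function() and rm(), plus the package::name syntax for reaching a function directly; none of these are introduced above, and the result names are made up for illustration.

```r
# Assign a variable named like the built-in data() function
data <- c(1, 2, 3)
masked <- is.function(data)              # FALSE: 'data' now refers to our numbers

# The original function still exists in the utils package
still_there <- is.function(utils::data)  # TRUE

# Removing our variable makes the name point at the function again
rm(data)
restored <- is.function(data)            # TRUE
```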
A variable is not assigned a data-type, it inherits the data-type of its assignment, along with all other properties.
If a variable is assigned the output of a function, it takes only that output value. For example,
x <- sum(2,2,4)
does not take the value sum(2,2,4); it takes the value 8, so here x == 8.
We can demonstrate this with a random number generator, runif.
runif(1, 0, 10)
If we run this several times we are likely to get a different number each time. If we assign it to a variable
x <- runif(1, 0, 10)
we see that every time we call x it has the same value; it is not executing the function every time. This can be exploited for dynamic programming, where stored results are reused instead of recomputed.
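The behaviour described above can be checked directly. In this sketch the variable names are made up, and identical() is a base-R function, not introduced above, that tests whether two objects are exactly the same.

```r
# The function runs once, at the moment of assignment
stored_draw <- runif(1, 0, 10)

# Reading the variable twice gives the same value both times
first_look  <- stored_draw
second_look <- stored_draw
identical(first_look, second_look)   # TRUE

# Calling the function again produces a new random number
fresh_draw <- runif(1, 0, 10)
fresh_draw
```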
An operator is a character that represents an action.
We have seen some arithmetic operators already.
+
-
*
/
^
And there are two others:
%%
%/%
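The two extra operators are not described above, so here is a short sketch: %% gives the remainder after division (the modulus) and %/% gives the whole-number part of the division. The variable names are made up for illustration.

```r
# Remainder after dividing 7 by 2
remainder <- 7 %% 2      # 1

# Whole-number part of 7 divided by 2
whole_part <- 7 %/% 2    # 3

# Together they rebuild the original number
whole_part * 2 + remainder   # 7

# A remainder of 0 means the number divides evenly
10 %% 5                  # 0
```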
We have also seen assignment operators:
=
<-
<<- (use is outside the scope of the basic tutorial)
-> (rarely, if ever, used)
->> (use is outside the scope of the basic tutorial)
Logical operators execute Boolean arguments.
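!
Gives the logical opposite of each element. Numbers are treated as logical values first: 0 is FALSE and any other number is TRUE.
x <- c(FALSE,TRUE,0,1)
!x
&
Compares the two objects element by element and returns TRUE for each pair where both are TRUE.
y <- c(FALSE,TRUE,1,0)
x&y
&&
Compares two single (length-one) values and returns one TRUE or FALSE. It does not check every element of a vector; since R 4.3.0, giving it a longer vector is an error. To check whether all elements are TRUE in both objects, collapse each object with all() first.
x <- c(TRUE,1)
y <- c(TRUE,0)
all(x)&&all(y)
y <- c(TRUE,1)
all(x)&&all(y)
|
Compares the two objects element by element and returns TRUE for each pair that has at least one TRUE value.
y <- c(TRUE,0)
x|y
y <- c(TRUE,1)
x|y
||
Compares two single (length-one) values and returns TRUE if at least one is TRUE. To check whether either object has at least one TRUE value, collapse each object with any() first.
x <- c(FALSE,0)
y <- c(FALSE,0)
any(x)||any(y)
y <- c(TRUE,0)
any(x)||any(y)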
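As a compact comparison of the single and doubled operators, the sketch below uses only TRUE/FALSE values. The variable names are made up, and stop() is a base-R function, not introduced above, that raises an error; it is used here only to show that the doubled operators skip the right-hand side when the left-hand side already decides the answer.

```r
left  <- c(TRUE, FALSE, TRUE)
right <- c(TRUE, TRUE, FALSE)

# Single operators work element by element and return a vector
left & right     # TRUE FALSE FALSE
left | right     # TRUE TRUE TRUE

# Doubled operators work on single values and return one value
TRUE && FALSE    # FALSE
FALSE || TRUE    # TRUE

# The right-hand side is never evaluated here, so no error is raised
skipped <- FALSE && stop("never evaluated")   # FALSE

# all() and any() collapse a vector to a single value
all(left)        # FALSE
any(left)        # TRUE
```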
Relational operators compare two objects with one another.
==
!=
<
<=
>
>=
Operators have a lot of power individually, and even more so when used together. As a very basic example we can check:
x <- 2; y = 1
x < y || x == y
It checks both conditions and evaluates whether either is TRUE, which is the same as
x <= y
But it demonstrates the point.
x < y || x == y || x > y #basically, does x exist?
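The equivalence above can be confirmed for a couple of cases. The last two lines add a caution that is not in the text: == compares decimal numbers exactly, so results of arithmetic can fail the test by a tiny rounding difference, and all.equal() (a base-R function) is one tolerant alternative. The result names are made up for illustration.

```r
# Case 1: x is larger than y, so both forms are FALSE
x <- 2; y <- 1
case_larger <- identical(x < y || x == y, x <= y)   # TRUE

# Case 2: x equals y, so both forms are TRUE
x <- 1; y <- 1
case_equal <- identical(x < y || x == y, x <= y)    # TRUE

# Exact comparison of decimals can surprise you
exact_match <- 0.1 + 0.2 == 0.3                     # FALSE
close_match <- isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE
```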
R is an open-source program. This means that anyone is allowed to contribute to it, and most commonly this is done in the form of packages. Packages are essentially collections of custom functions. They need to be installed separately because they can be rather large and there are over 12,500 packages maintained on CRAN. To keep R lightweight, these packages are installed and loaded into each session on demand, and only base-R is loaded into each session by default.
To demonstrate this, let’s call a commonly used function, melt(). There is no need to use the function, so I will not explain its use in this tutorial; just try calling it in the console.
melt()
## Error in melt(): could not find function "melt"
The session has no function assigned to melt(). Now let’s install the package that defines what melt() is.
install.packages('reshape2')
When using install.packages() the name of the package you want to install must be contained in quotations (R treats single and double quotes the same way). You will notice that R wrote a lot of messages to the console. Mostly this is just transparency about what is going on, and it is typically unimportant unless the last thing it says is ERROR; then something went wrong. If you are positive that the package will install without an error and you don’t care for all that stuff printed to your console, there’s an option for that!
install.packages('reshape2', quiet = TRUE)
Once the package is installed we then need to load it into our session, which is done with
library(reshape2)
Notice quotations are not needed to use library; it will work with or without them.
Now let’s try to call our function again.
melt()
## Error in melt(): argument "data" is missing, with no default
We have a new error! Now it is saying we didn’t supply the correct arguments to the function, so the new function from our package is loaded into our session! Every time you restart R, the packages you wish to use need to be loaded again with library().
Packages make R super powerful because they are a great way to share code, and one of the primary rules of coding is to ‘never write something that’s already been written’. I don’t know for sure if that is actually something that’s taught, but I think it’s a good philosophy.
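Since installing only needs to happen once but loading happens every session, it can be handy to check whether a package is already installed before installing it again. The sketch below is one possible way to do that, not part of the tutorial's own workflow. It uses requireNamespace(), a base-R function that returns TRUE or FALSE, and is_installed is a hypothetical helper name invented for this example.

```r
# Hypothetical helper: TRUE if the package can be found, FALSE otherwise
is_installed <- function(package_name) {
  requireNamespace(package_name, quietly = TRUE)
}

# 'stats' ships with R, so this is TRUE
is_installed("stats")

# Only suggest installing when the package is missing
if (!is_installed("reshape2")) {
  message("reshape2 is not installed; run install.packages('reshape2')")
}
```

After the package is installed, library(reshape2) is still needed in each new session before melt() can be called.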
Continue: Exercise 2: Data-Structures
Schuyler Smith
Ph.D. Student - Bioinformatics and Computational Biology
Iowa State University. Ames, IA.