R Programming Language.
82 lectures 68:52:55
Introduction to R Programming
R:
- R is a programming language
- Free software
- Statistical computing, graphical representation and reporting.
- Designed by: Ross Ihaka and Robert Gentleman; developed at the University of Auckland
- Derived from S and S-plus language (commercial product)
- Typing discipline: Dynamic
- Stable release: 3.5.2 ("Eggshell Igloo") / December 20, 2018
- First appeared: August 1993
- License: GNU GPL
- Functional based language
- Interpreted programming language
- Distributed by CRAN (Comprehensive R Archive Network)
- Open source product (R-Community)
- Functions are available as a package
- Default packages (e.g., base, utils, stats, graphics) are already attached to the R console
- Other installed packages must be attached to the R session before use
- Install add-on packages from CRAN mirrors.
Write a program to print HELLO WORLD in C language:
#include <stdio.h>
int main(void)
{
    printf("HELLO WORLD");
    return 0;
}
Write a program to print HELLO WORLD in Java:
class Hello
{
public static void main(String args[])
{
System.out.println("HELLO WORLD");
}
}
Write a program to print HELLO WORLD in R:
print("HELLO WORLD")
NOTE: The R programming language is simple to learn compared with traditional programming languages (C, C++, C#, Java).
R Installation & Setting R Environment
How to Download & Install R:
- Go to the official R website: search for "R" in Google and click on the first link (The R Project for Statistical Computing).
- Click on "Download R".
- Click on any one of the CRAN mirrors. Eg: https://cloud.r-project.org
- Click on Download R for Windows.
- Click on Install R for the first time.
- Finally click on Download R 3.5.1 for Windows (32/64 bit).
Setting R Environment:
- R comes with a lot of packages.
- By default only some packages will be attached to the R environment.
search()
Displays the currently attached packages
installed.packages()
Displays the installed packages in the machine
library(package name) / require(package name)
Attaches the packages to the R application
install.packages("package name")
Installs the add-on packages from CRAN
detach(package:package name)
Detaches the packages from the R environment
Package - Help
- library(help="package name")
Function - Help
- help(function name)
- or
- ?function name
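The attach and detach cycle above can be sketched as follows. This is a minimal example, assuming the `tools` package (which ships with R) as a stand-in for an add-on package so that nothing has to be downloaded; `ggplot2` is only an example name for an add-on that may or may not be installed.

```r
# 'tools' ships with R but is normally not attached by default
# (assumption made for this illustration).
library(tools)                                   # attach the package
attached <- "package:tools" %in% search()        # now on the search path

detach("package:tools")                          # detach it again
detached <- !("package:tools" %in% search())     # no longer on the search path

# Check whether an add-on package is installed before trying to attach it
# ("ggplot2" is only an example name).
has_ggplot2 <- "ggplot2" %in% rownames(installed.packages())
```

To install a missing add-on, `install.packages("package name")` would be called once, followed by `library()` in every session that needs it.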
Variables, Operators & Data types
Structures
Comments in R:
==============
--> A single-line comment is written using # at the beginning of the statement.
# Comments are like helping text in your R Program
--> R has no multi-line comment syntax; a common workaround is to wrap the text in if(FALSE) { }
if(FALSE) {
"We put such comments inside, either
single or double quote" }
Variable Assignment:
====================
1. print()
2. cat()

print():
--------
--> print() function is used to print the value stored in a variable.
Ex:
a <- 10
print(a)

cat():
------
--> cat() function combines multiple items into one continuous print output.
Ex:
a <- "DataHills"
cat("Welcome to ", a)
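The difference between the two functions can be seen in a short sketch. The `sep` argument and `capture.output()` are standard R, but they are not covered in the notes above and are used here only for illustration.

```r
a <- "DataHills"

print(a)                       # prints the value with quotes and an index prefix
cat("Welcome to", a, "\n")     # items are joined with a single space by default

# sep controls the separator; cat() adds no newline unless "\n" is given
cat("Welcome to ", a, "\n", sep = "")

# capture.output() collects the text that would have been printed
line <- capture.output(cat("Welcome to", a))
```

Because `cat()` already inserts a space between items, a trailing space inside the string (as in "Welcome to ") produces two spaces in the output.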
Datatype of a Variable:
=======================
1. typeof()
2. class()
3. mode()

1. typeof(var_name/value)
-------------------------
--> typeof() determines the (R internal) type or storage mode of any object.
Ex:
typeof(a)
typeof(10)

2. class(var_name/value)
------------------------
--> class() returns the class of an object. R has a simple generic function mechanism that supports an object-oriented style of programming.
--> Method dispatch takes place based on the class of the first argument to the generic function.
Ex:
class(a)
class(10)

3. mode(var_name/value)
-----------------------
--> Get or set the type or storage mode of an object.
Ex:
mode(a)
mode(10)
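The three functions often agree, but not always. The sketch below compares them side by side; `describe` is a hypothetical helper written for this illustration, not a built-in function.

```r
# Hypothetical helper: collect the three descriptions of one object
describe <- function(x) {
  c(typeof = typeof(x), class = class(x)[1], mode = mode(x))
}

describe(10)                  # double / numeric / numeric
describe(10L)                 # integer / integer / numeric
describe("DataHills")         # character in all three
describe(TRUE)                # logical in all three
describe(mean)                # closure / function / function
describe(data.frame(x = 1))   # list / data.frame / list
```

As a rule of thumb, typeof() reports how R stores the object, class() reports what method dispatch will use, and mode() is a coarser grouping kept for compatibility with S.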
Displaying & Deleting Variables in R:
=====================================
1. ls()
2. rm()

1. ls():
--------
--> ls() function displays all the variables currently available in the R environment.
Ex:
ls()

--> ls() can also list only the variables whose names match a pattern, using the pattern argument.
Ex:
# Display the variables whose names contain "a" (use pattern="^a" for names starting with "a")
ls(pattern="a")

--> ls() can also display hidden variables, i.e., variables whose names start with a dot (.), using all.names=TRUE.
Ex: Display all the variables, including the hidden ones
ls(all.names=TRUE)

2. rm():
--------
--> rm() function is used to delete a variable.
Ex:
rm(a)

--> rm() can also delete all the variables when it is used together with ls().
Ex: Remove all the variables at a time
rm(list=ls())
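The listing and deleting calls above can be tried safely in a separate environment, so that the real workspace is not wiped. In this sketch `scratch` and the variable names are hypothetical, and the `envir` argument is standard R that the notes above do not cover.

```r
scratch <- new.env()                      # hypothetical scratch environment
assign("alpha", 1, envir = scratch)
assign("beta", 2, envir = scratch)
assign("gamma", 3, envir = scratch)
assign(".hidden", 4, envir = scratch)     # name starts with a dot

ls(scratch)                               # visible variables only
ls(scratch, pattern = "^a")               # names starting with "a"
ls(scratch, pattern = "a")                # names containing "a"
ls(scratch, all.names = TRUE)             # includes ".hidden"

rm("beta", envir = scratch)                     # delete one variable
rm(list = ls(scratch), envir = scratch)         # delete all visible variables
remaining <- ls(scratch, all.names = TRUE)      # hidden variables survive
```

Note that rm(list=ls()) leaves hidden variables in place, because ls() does not list them unless all.names=TRUE is given.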
Structures/Objects in R:
========================
1. Vectors
2. Lists
3. Matrices
4. Data Frames
5. Arrays
6. Factors
Vectors
Vectors:
========
--> Single dimensional object with homogeneous data types.
--> To create a vector, use the function c().
--> Here "c" means combine.
# If we try it like this:
a <- 10,20,30,40 # Error
# then combine all these values by using c()
a <- c(10,20,30,40)

# to check the internal storage of a
typeof(a)
# to check the internal storage of each value in a
lapply(a,FUN=typeof)
sapply(a,FUN=typeof)
or
lapply(a,typeof) # list of values
sapply(a,typeof) # vector of values

--> Vectors are the most basic R structures/objects.
--> The commonly used types of atomic vectors are:
1. logical
2. integer
3. double
4. complex
5. character

Vector Creation:
================
--> We can create vectors with a single element or with multiple elements.
--> They are
1. Single Element Vector
2. Multiple Elements Vector

Single Element Vector:
======================
--> When we assign a single value to a variable, it becomes a vector of length 1 and belongs to one of the above vector types.
Ex:
a <- 10
b <- 20L
c <- "DataHills"
d <- TRUE
e <- 2+3i

Multiple Elements Vector:
=========================
--> When we assign multiple values to a variable, it becomes a vector of length n and belongs to one of the above vector types.
Ex:
a <- c(10,20,30,40,50)
b <- c(20L,40L,60L,80L)
c <- c("Srinivas","DataHills","DataScience","MachineLearning")
d <- c(T,FALSE,TRUE,F,T,F)
e <- c(2+3i,4+4i,5+6i)

# Heterogeneous data type values are converted into homogeneous data type values:
a <- c(10,20,30,40,"DataHills")
Output:
"10" "20" "30" "40" "DataHills"
# The double values are converted into characters.

Observe with some examples:
a <- c(10L,20) # double
a <- c(T,5) # double
a <- c(2+3i,"DataHills") # character
a <- c(9L,30,4+5i) # complex

Here the data types have a priority, and values are converted on that basis,
i.e., lower data types are converted to the highest data type present (1 is the highest):
1. CHARACTER
2. COMPLEX
3. DOUBLE
4. INTEGER
5. LOGICAL

a <- c(TRUE,30,20L,2+3i,"DataHills") # character
a <- c(TRUE,30,20L,2+3i) # complex
a <- c(TRUE,30,20L) # double
a <- c(TRUE,20L) # integer

To generate a sequence of numeric values
<Start_Value>:<End_Value>
1:10
10:1
3.5:10.5
10.5:3.5
# by using seq() function
Syntax: seq(from=VALUE,to=VALUE,by=VALUE)
Ex: seq(from=1,to=10,by=1)
seq(to=10,by=1,from=1)
seq(by=10,to=100,from=10)
seq(1,10,by=2)
seq(from=1,10,2)
seq(1,to=10,2)
seq(1,10,1)
seq(2,20,2)
seq(10,1,1) # Error
seq(10,1,-1)
seq(1,10,pi)
seq(10)
seq(-10)
seq(1:10)
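A few of the less obvious cases above are worth checking in code. This is a minimal sketch; `tryCatch()` is used only so that the failing call does not stop the script.

```r
up   <- seq(1, 10, by = 2)       # 1 3 5 7 9
down <- seq(10, 1, by = -1)      # counts down from 10 to 1
frac <- 3.5:10.5                 # steps of 1 starting at 3.5
neg  <- seq(-10)                 # same as 1:-10, counts down from 1 to -10

# seq(10, 1, 1) fails because 'by' has the wrong sign for a decreasing sequence
wrong_sign <- tryCatch(seq(10, 1, 1), error = function(e) conditionMessage(e))

# seq(1:10) receives a vector, so it returns the positions 1..length of that vector
same <- identical(seq(1:10), 1:10)
```

The last case matters when the argument is not 1:n. For example, seq(c(20,30,40)) gives the positions 1 2 3, not the values.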
Vector Manipulation & Sub-Setting
# length.out --> desired length of the sequence;
# 'length.out' must be a non-negative number.
# seq_len() is much faster.
seq(length.out=10)
seq_len(10)
seq(1,10,length.out=10)
seq(1,10,length.out=5)
seq(1,10,length.out=11)
# along.with --> take the length from the length of this argument;
# it generates the integer sequence 1,2,....
# seq_along() is much faster.
seq(along.with=10)
seq(along.with=c(20,30,40))
seq(along.with=c("Data",T,2,3,4))
seq(along.with=c("Data",T,2,3,4,5,6,7,8,9,10))
a <- seq(along.with=c("Data",T,2,3,4,5,6,7,8,9,04))
seq(along.with=a)
seq_along(a)
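Besides speed, seq_len() and seq_along() behave better than 1:length(x) when the vector is empty. A small sketch, with `x` and `empty` as example vectors:

```r
x <- c("Data", "Hills", "R")
empty <- character(0)

seq_along(x)               # positions 1 2 3
seq_len(length(x))         # the same positions

# With an empty vector, 1:length() counts down from 1 to 0,
# while seq_along() returns an empty sequence.
risky <- 1:length(empty)
safe  <- seq_along(empty)

for (i in seq_along(empty)) print(i)   # the loop body never runs
```

This is why seq_along(x) is usually preferred over 1:length(x) in for loops.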
Vector Manipulation:
====================
a <- c(4,7,9,12,8,3)
b <- c(2,3,5,7,8,5)
length(a)
length(b)
add <- a+b
sub <- a-b
mul <- a*b
div <- a/b

# If we apply arithmetic operators to two vectors of unequal length, the elements of the shorter vector are recycled to complete the operation.
a <- c(4,7,9,12,8,3)
b <- c(2,3) # recycled as 2,3,2,3,2,3
add <- a+b
sub <- a-b
mul <- a*b
div <- a/b

# Elements in a vector can be sorted using the sort() function.
# (The result is stored as "sorted" so that the name sort is not reused for a variable.)
a <- c(9,3,5,8,1,6,5)
sorted <- sort(a)
rev_sort <- sort(a,decreasing=T)

a <- c("Srinivas","DataHills","Analysis","MachineLearning")
sorted <- sort(a)
rev_sort <- sort(a,decreasing=TRUE)

Sub-setting the Data in Vectors:
================================
--> Sub-setting means extracting the required elements (fields, rows) from an R object.
vector[position/logical index/negative index/name]
---------------------------------------------------------------
a <- c("DataScience","DataAnalysis","MachineLearning","R","Python","Weka")

# Accessing vector elements using position
# Here [ ] brackets are used for indexing.
# Indexing starts with position 1.
a[3]
a[2,4] # Error
a[c(2,4)]
a[c(1,4,5)]
course <- a[c(1,4,5)]

# Accessing vector elements using negative indexing
a[-6]
a[-3,-5] # Error
a[c(-3,-5)]
a[-c(3,5)]
a[-c(4,5,6)]
course <- a[-c(4,5,6)]

# Accessing vector elements using logical indexing
a[c(TRUE,FALSE,TRUE,FALSE,TRUE,FALSE)]
a[c(T,T,T,F,F,F)]
a[T]
a[F]
a[c(T,F)]
a[c(F,T)]

# Accessing vector elements using name
a <- c(a="DataScience",b="DataAnalysis",c="MachineLearning",d="R",e="Python",f="Weka")
a[2]
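a[b] # b is looked up as a variable, not as a name: Error if no variable b exists
a["b"]
a["d","e"] # Error
a[c("d","e")]
a[c("-d","-e")] # Returns NA NA, not an error: no elements are named "-d" or "-e"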
a[c(-"d",-"e")] # Error
a[-c("d","e")] # Error
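The four indexing styles can be compared on the same named vector. The sketch below also shows one way to exclude elements by name, since names cannot be negated; the `%in%` operator is standard R but is not covered in the notes above.

```r
a <- c(a = "DataScience", b = "DataAnalysis", c = "MachineLearning",
       d = "R", e = "Python", f = "Weka")

by_position <- a[c(2, 4)]            # elements 2 and 4
by_negative <- a[-c(3, 5)]           # everything except elements 3 and 5
by_logical  <- a[c(TRUE, FALSE)]     # index is recycled: elements 1, 3, 5
by_name     <- a[c("d", "e")]        # elements named d and e

# Names cannot be negated, so build a logical index from the names instead
without_de <- a[!(names(a) %in% c("d", "e"))]

# An unknown name gives NA rather than an error
missing_name <- a["-d"]
```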
Constants
Constants:
==========
R has a small number of built-in constants.
The following constants are available:
1. LETTERS: the 26 upper-case letters of the Roman alphabet;
2. letters: the 26 lower-case letters of the Roman alphabet;
3. month.abb: the three-letter abbreviations for the English month names;
4. month.name: the English names for the months of the year;
5. pi: the ratio of the circumference of a circle to its diameter.

> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> month.name
[1] "January" "February" "March" "April" "May" "June"
[7] "July" "August" "September" "October" "November" "December"
> month.abb
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> pi
[1] 3.141593

But it is not good to rely on these, as they are ordinary variables that can be masked by assigning a new value.
> pi
[1] 3.141593
> seq(1,10,pi)
> pi <- 10
> pi
[1] 10
> seq(1,10,pi)

LETTERS[24]
LETTERS[2,3,4,5] # Error
LETTERS[c(2,3,4,5)]
LETTERS[seq(2,5,1)]
LETTERS[2:5]
LETTERS[c(10,11,12,13,14,15,16,17,18,19,20)]
LETTERS[10:20]
LETTERS[-10:-20]
LETTERS[-c(10:20)]

a <- c(10,20,30,40,50,60)
names(a)
names(a) <- c("A","B","C","D","E") # 5 names for 6 values: the last name becomes NA
b <- c(70,80,90,100,110,120)
names(b)
names(b) <- LETTERS[21:26]
sales_1 <- c(100,200,300)
names(sales_1) <- c("Jan","Feb","Mar")
names(sales_1) <- month.abb # Error
names(sales_1) <- month.abb[1:3]
names(sales_1) <- month.abb[10:12]
names(sales_1) <- month.abb[c(1,5,10)]
names(sales_1) <- month.abb[seq(1,12,4)]
sales_2 <- c(100,200,300,400,150,250,350,450,120,220,320,420)
names(sales_2) <- month.abb
names(sales_2) <- month.name
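Reassigning a constant does not destroy the built-in value: the assignment creates a new variable that hides it. A minimal sketch of how to recover the original, using `base::pi` and rm(); `sales` is an example vector.

```r
pi <- 10                     # creates a variable that masks the built-in constant
masked <- pi                 # the new value
still_there <- base::pi      # the original is untouched in package base

rm(pi)                       # remove our variable; the built-in is visible again
restored <- pi

# Built-in constants are handy for naming elements
sales <- c(100, 200, 300)
names(sales) <- month.abb[1:3]
```

The same recovery works for LETTERS, letters, month.abb and month.name.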
RStudio Installation & Lists Part 1
RStudio:
========
--> RStudio is a free and open-source integrated development environment (IDE) for R.
--> RStudio requires R 3.0.1+. If you don't already have R, download it.
--> RStudio makes R easier to use.
--> It includes a code editor, debugging & visualization tools.
--> RStudio is a separate piece of software that works with R to make R much more user friendly and also adds some helpful features.
--> RStudio was founded by JJ Allaire.
--> RStudio is written in the C++ programming language.
--> Initial release: 28 February 2011
--> Stable release: 1.1.456 / 19 July 2018
Downloading & Installation RStudio:
===================================
--> Go to the official website of RStudio and click on RStudio Download
--> Click on RStudio Desktop Open Source License (FREE) Download
--> Click on RStudio 1.1.456 - Windows Vista/7/8/10 (85.8 MB Size)
--> The file will be downloaded to our system automatically
--> Installation is easy; it takes less than 2 minutes.
Lists:
======
--> Single dimensional object with heterogeneous data types.
--> To create a list use function list().

# Create a list containing character, complex, double, integer and logical.
a <- list("DataHills",2+3i,10,20L,TRUE)

# to check the internal storage of a
typeof(a)
# to check the internal storage of each value in a
lapply(a,FUN=typeof)
sapply(a,FUN=typeof)
or
lapply(a,typeof) # list of values
sapply(a,typeof) # vector of values
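The contrast with vectors is easiest to see by passing the same values to c() and to list(). A short sketch, with `as_vector` and `as_list` as example names:

```r
as_vector <- c("DataHills", 2+3i, 10, 20L, TRUE)      # everything is coerced to character
as_list   <- list("DataHills", 2+3i, 10, 20L, TRUE)   # each element keeps its own type

typeof(as_vector)                        # one type for the whole vector
types      <- sapply(as_list, typeof)    # character vector, one entry per element
types_list <- lapply(as_list, typeof)    # same information, returned as a list
```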
Lists Part 2
--> Lists are R objects that can contain elements of different types, such as
Characters
Complex
Double
Integer
Logical
Vector
Matrix
Function and
another list inside it.

# Create a list containing vectors
a <- list(c(1,2,3),c("A","B","C"),c("R","Python","Weka"),c(10000,8000,6000))
print(a)
typeof(a)
lapply(a,typeof)

# Create a list containing characters, vector, double
b <- list("DataHills","Srinivas",c(10,20,30),15.5)
print(b)
typeof(b)
lapply(b,typeof)

# Create a list containing a vector, matrix, function and list.
# (search without parentheses stores the function itself; search() would store its result.)
c <- list(c(10,20,30),matrix(c(1,2,3,4),nrow=2),search,list("DataHills",9292005440))
print(c)
typeof(c)
lapply(c,typeof)
Naming List Elements:
=====================
--> The list elements can be given names and they can be accessed using these names.
b <- list(Name1="DataHills",Name2="Srinivas",vector_values=c(10,20,30),single_value=15)
c <- list(c(10,20,30),matrix(c(1,2,3,4),nrow=2),search,list("DataHills",9292005440))
names(c) <- c("values","mat","fun","inner_list")
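Once elements have names, they can be read back by name. The sketch below assumes the standard `$` and `[[ ]]` operators, which these notes have not introduced; the new name "score" is only an example.

```r
b <- list(Name1 = "DataHills", Name2 = "Srinivas",
          vector_values = c(10, 20, 30), single_value = 15.5)

names(b)                        # the four element names
b$Name1                         # the element itself
b[["vector_values"]]            # the numeric vector stored under that name
sub <- b["vector_values"]       # single brackets return a list of length 1

names(b)[4] <- "score"          # rename one element (example name)
```

The difference between `[[ ]]` and `[ ]` is the usual stumbling block: the first returns the element, the second returns a smaller list that still wraps it.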
List Manipulation, Sub-Setting & Merging
List to Vector & Matrix Part 1
Matrix Part 2
matrix(c(1,2,3,4,5,6,7,8,9,10), nrow=5)
matrix(1:10, nrow=5)
# Elements are arranged by row
matrix(1:10, nrow=5, byrow=TRUE)
# Elements are arranged by column
matrix(1:10, nrow=5, byrow=FALSE)
matrix(1:10, ncol=5, byrow=T)
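The effect of byrow is easiest to see by comparing the first row of two matrices built from the same data. This is a minimal sketch; the `[row, ]` indexing and `t()` are standard R but are not covered in the lines above.

```r
by_col <- matrix(1:10, nrow = 5)                 # default: filled column by column
by_row <- matrix(1:10, nrow = 5, byrow = TRUE)   # filled row by row

by_col[1, ]     # first row holds 1 and 6
by_row[1, ]     # first row holds 1 and 2
dim(by_row)     # 5 rows, 2 columns in both cases

# Filling a 2 x 5 matrix by row and transposing it gives the by-column 5 x 2 matrix
same <- identical(by_col, t(matrix(1:10, ncol = 5, byrow = TRUE)))
```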
# Create a matrix with row names and column names
matrix(1:10, ncol=5, byrow=TRUE, dimnames=list(c("A","B"),c("C","D","E","F","G")))
matrix(1:10, ncol=5, byrow=TRUE, dimnames=list(LETTERS[1:2],LETTERS[3:7]))

# To check, define, update or delete the names of rows and columns,
# we have to use the functions
# rownames(var_name)
# colnames(var_name)
a <- matrix(1:10, ncol=5, byrow=TRUE, dimnames=list(LETTERS[1:2],LETTERS[3:7]))
rownames(a)
colnames(a)
rownames(a) <- c("row1","row2")
colnames(a) <- c("col1","col2","col3","col4","col5")
rownames(a)
colnames(a)
print(a)
a <- matrix(1:10, ncol=5, byrow=TRUE)
rownames(a) <-...