What you will learn during this exercise:
Hand in a PDF of at most 3 pages on BlackBoard before the deadline. Put your last name in the file name. Make sure you have a clear layout. Use only one language in the answer, either Norwegian or English. NB: Failure to do so will count negatively in the evaluation.
Questions asked in the exercise text may be repeated in the test on BlackBoard. **You do not need to hand in the questions.**
1- Create a folder in your home directory (go to ‘My Computer’ and click on your ‘Home directory (M:)’) and give it a name such as Geog1004 Exercises. You may want to have individual subfolders, one for each exercise, such as: M: Geog2015 exercises OV1
2- Download the World_90_13.xlsx file from Exercise 1 in LM1 on BlackBoard and save it to the Exercise 1 folder that you created in step 1.
3- Open R.
4- Load the World_90_13.xlsx dataset in R (as you learnt during the tutorials)
library(xlsx) # Provides read.xlsx()
# Read the first sheet of the workbook into a data frame
data <- read.xlsx('data/World_90_13.xlsx', sheetIndex = 1)
- Remember the tutorials: what are the main panes in RStudio, and what are their functions?
If you need a refresher, watch this video.
The data are taken from the World Bank’s development indicators web pages and contain information on life expectancy, annual population growth and purchasing-power-adjusted gross domestic product per capita, for both 1990 and 2013.
head(data) # Shows the first 6 rows
## NA. CntryName CntryCode LifeExp90 LifeExp13 PopGrowth90
## 1 1 Afghanistan AFG 48.56705 60.93141 3.879694
## 2 2 Albania ALB 71.95732 77.53724 1.799086
## 3 3 Algeria DZA 66.75117 71.00966 2.559032
## 4 4 American Samoa ASM NA NA 3.137301
## 5 5 Andorra ADO NA NA 3.856125
## 6 6 Angola AGO 41.13834 51.86617 2.796482
## PopGrowth13 GDP_PPP90 GDP_PPP13
## 1 3.1643363 NA 1876.191
## 2 -0.1077295 4303.374 9910.841
## 3 1.9748145 10289.019 13300.682
## 4 0.1357110 NA NA
## 5 -4.3996839 NA NA
## 6 3.3062046 NA NA
You can also view the entire dataset by clicking on the dataset name in your Environment pane. This opens the full table in a viewer tab.
You can also learn about the type of each variable (numeric, date, factor …) in your dataset using str():
str(data)
## 'data.frame': 219 obs. of 9 variables:
## $ NA. : chr "1" "2" "3" "4" ...
## $ CntryName : chr "Afghanistan " "Albania " "Algeria " "American Samoa " ...
## $ CntryCode : chr "AFG" "ALB" "DZA" "ASM" ...
## $ LifeExp90 : num 48.6 72 66.8 NA NA ...
## $ LifeExp13 : num 60.9 77.5 71 NA NA ...
## $ PopGrowth90: num 3.88 1.8 2.56 3.14 3.86 ...
## $ PopGrowth13: num 3.164 -0.108 1.975 0.136 -4.4 ...
## $ GDP_PPP90 : num NA 4303 10289 NA NA ...
## $ GDP_PPP13 : num 1876 9911 13301 NA NA ...
- How many variables do you have in your dataset? How many rows?
- What is the unit of LifeExp90 and PopGrowth90?
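If you prefer to check the size of a dataset with code rather than reading it off the str() output, the following is a minimal sketch. It uses a small made-up data frame called toy (not the real World Bank data), so the numbers it prints are not the answer to the questions above.

```r
# Toy data frame standing in for the real dataset (values are made up)
toy <- data.frame(
  CntryName = c("Country A", "Country B", "Country C"),
  LifeExp90 = c(50.1, 71.9, NA),
  LifeExp13 = c(60.9, 77.5, 71.0)
)

n_rows <- nrow(toy) # number of rows (observations)
n_cols <- ncol(toy) # number of columns (variables)

dim(toy) # both at once: rows first, then columns
```

Running the same three functions on data should agree with the first line printed by str(data).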
In R, constructing a histogram or other graph is generally very easy. You have 2 options for this:
- the base graphics function hist()
- the ggplot2 package, which significantly enhances the aesthetics of your graph.
Because ggplot2 is more complex, we will stick to the base graphics. Nevertheless, if you are curious about using ggplot2 to produce high-quality graphs, I recommend that you read this website.
Here, we will draw a histogram of gross domestic product per capita in 2013.
hist(data$GDP_PPP13)
You can modify the graph to give a more relevant title and axis names:
hist(data$GDP_PPP13,
main = 'Histogram of the GDP per capita in 2013',
xlab = 'GDP per capita in 2013',
ylab = 'Frequency')
You can also add colors, change the size of the bins … I will let you discover the possibilities by typing ?hist.
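As a starting point for exploring ?hist, here is a minimal sketch that sets a fill color, a border color and a suggested number of bins. The vector gdp holds made-up, right-skewed values and is only a stand-in for data$GDP_PPP13; the color names are arbitrary choices.

```r
# Made-up, right-skewed values standing in for data$GDP_PPP13
set.seed(1)
gdp <- c(rexp(100, rate = 1 / 15000), NA) # one missing value, as in real data

h <- hist(gdp,
          breaks = 20,        # suggested number of bins (R may adjust it)
          col = 'steelblue',  # fill color of the bars
          border = 'white',   # color of the bar outlines
          main = 'Histogram of the GDP per capita in 2013',
          xlab = 'GDP per capita in 2013',
          ylab = 'Frequency')

sum(h$counts) # number of non-missing values that were binned
```

Note that hist() silently drops missing values, so the bar heights add up to the number of countries with data, not to the number of rows.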
- Where in the distribution of GDP per capita in 2013 is Botswana located? Between the minimum and the first quartile, between the first quartile and the median, between the median and the third quartile, or between the third quartile and the maximum? (Hint: you can estimate this by using the function summary().)
- Where in the distribution of GDP per capita in 1990 is Norway located?
- Where in the distribution of GDP per capita in 2013 is Saudi Arabia located?
- Draw a new histogram with the variable: Population growth 2013 (annual %).
- By drawing a histogram for each of the other variables, can you see whether any of them looks normally distributed? Which one?
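The quartile questions above can be approached as in the following sketch. It uses a made-up data frame toy with hypothetical country names and values, so it does not reveal the answers for the real countries. The names carry a trailing space on purpose, because the str() output suggests that CntryName in the real dataset does too; trimws() removes it so that a comparison with == works.

```r
# Made-up values; the real exercise would use data$GDP_PPP13 and data$CntryName
toy <- data.frame(
  CntryName = c("Country A ", "Country B ", "Country C ", "Country D ",
                "Country E ", "Country F ", "Country G "),
  GDP = c(1000, 3000, 6000, 12000, 20000, 40000, NA)
)

toy$CntryName <- trimws(toy$CntryName) # remove leading/trailing spaces

summary(toy$GDP) # minimum, quartiles, mean, maximum and number of NAs

q <- quantile(toy$GDP, na.rm = TRUE) # minimum, Q1, median, Q3, maximum
value <- toy$GDP[toy$CntryName == "Country D"]

# 1 = min to Q1, 2 = Q1 to median, 3 = median to Q3, 4 = Q3 to max
band <- findInterval(value, q, rightmost.closed = TRUE)
band
```

Here the value for "Country D" falls between the median and the third quartile, which is what band equal to 3 means.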
In step 4, we gave a static representation of the distribution of the variables gross domestic product and population growth for 2013.
We now want to look at the changes between 1990 and 2013 in the individual countries for these two variables. We also want to do the same for the variable life expectancy. We must therefore construct three new variables representing these changes.
To add the new variable to the dataset, you can do as follows:
# Create a new column "changes_GDP" representing the difference between GDP in 2013 and GDP in 1990.
data$changes_GDP <- data$GDP_PPP13 - data$GDP_PPP90
We can check that the new variable has been created by, for instance, typing str(data):
str(data)
## 'data.frame': 219 obs. of 10 variables:
## $ NA. : chr "1" "2" "3" "4" ...
## $ CntryName : chr "Afghanistan " "Albania " "Algeria " "American Samoa " ...
## $ CntryCode : chr "AFG" "ALB" "DZA" "ASM" ...
## $ LifeExp90 : num 48.6 72 66.8 NA NA ...
## $ LifeExp13 : num 60.9 77.5 71 NA NA ...
## $ PopGrowth90: num 3.88 1.8 2.56 3.14 3.86 ...
## $ PopGrowth13: num 3.164 -0.108 1.975 0.136 -4.4 ...
## $ GDP_PPP90 : num NA 4303 10289 NA NA ...
## $ GDP_PPP13 : num 1876 9911 13301 NA NA ...
## $ changes_GDP: num NA 5607 3012 NA NA ...
- Add variables representing the changes in population growth and life expectancy.
Note 1: The variable you have constructed is not change in population, but change in population growth. A country that had population growth in 1990 and also had population growth, but somewhat lower, in 2013 will have a negative value for the change variable. Although the country is experiencing growth in both years, growth has slowed. A country that has received a positive value means that population growth in 2013 was greater than population growth in 1990.
Note 2: The gross domestic product variables are adjusted in two ways. First, they are adjusted for purchasing power: with 1 dollar spent in India you can buy much more than with 1 dollar spent in the United States (check Wikipedia). Secondly, the variables have been adjusted to the dollar value as it was in 2011 (since 1 dollar in 1990 is worth something different from 1 dollar in 2013).
Note 3: If one of the variables used to calculate the change variable has a missing value for a country, the change variable will also be missing for that country. Both variables used in the calculation must have a value for the change variable to get a value.
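The behaviour described in Note 3 can be seen in a small sketch. The vectors gdp90 and gdp13 contain made-up values (they are not taken from the dataset); the point is only that a missing value on either side of the subtraction gives a missing result.

```r
# Made-up values for four hypothetical countries
gdp90 <- c(NA, 4000, 10000, 7000)
gdp13 <- c(2000, 9000, 13000, NA)

change <- gdp13 - gdp90
change # NA 5000 3000 NA

is.na(change)       # TRUE where the change could not be computed
sum(!is.na(change)) # number of countries with a usable change value

mean(change, na.rm = TRUE) # summaries need na.rm = TRUE to skip the NAs
```

Without na.rm = TRUE, functions such as mean(), min() and max() return NA as soon as one value is missing.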
Create a histogram for each of your three new change variables as described in point 4. Study the data table and frequency histograms you have created and discuss with a fellow student:
- Which country had the least change in population growth?
Tip: you can use which.max(data$...) and which.min(data$...). This will give you the row number of the maximum and minimum of your variable. You can then search for the corresponding country.
- Has population growth increased or decreased for most countries between 1990 and 2013?
- Has life expectancy increased or decreased for most countries between 1990 and 2013?
- What is the trend in population growth overall?
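The questions above can be explored along the lines of this sketch, which uses the base R functions which.max() and which.min(). The data frame toy and the column name changes_PopGrowth are hypothetical and the values are made up; replace them with your own change variable.

```r
# Made-up change values for four hypothetical countries
toy <- data.frame(
  CntryName = c("Country A", "Country B", "Country C", "Country D"),
  changes_PopGrowth = c(-0.7, NA, 1.2, -2.5)
)

i_max <- which.max(toy$changes_PopGrowth) # row of the largest value (NAs are skipped)
i_min <- which.min(toy$changes_PopGrowth) # row of the smallest (most negative) value

toy$CntryName[i_max] # country with the largest increase
toy$CntryName[i_min] # country with the largest decrease

# Smallest change in absolute terms, i.e. the value closest to zero
i_small <- which.min(abs(toy$changes_PopGrowth))
toy$CntryName[i_small]

# Share of countries (with data) where the value went down
mean(toy$changes_PopGrowth < 0, na.rm = TRUE)
```

Note the difference between the smallest value and the smallest change: which.min() alone returns the most negative value, while wrapping the variable in abs() finds the country whose value changed least.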
Create three histograms for the three change variables. Label the axes with relevant titles and add colors to the histograms.
You should end up with 3 histograms.
To save a graph you will need to select Plots on the lower right pane. Then select Export > Save as image.
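If you would rather save a graph with code than through the Export menu, one option is the base R png() device, as in this sketch. The file name histogram_example.png, the image size and the random values in x are all assumptions made for illustration; the file is written to a temporary folder, so change out_file to a path in your exercise folder.

```r
# Made-up values standing in for one of your change variables
set.seed(1)
x <- rnorm(100)

# Hypothetical output path; replace with a file in your exercise folder
out_file <- file.path(tempdir(), "histogram_example.png")

png(out_file, width = 800, height = 600) # open a PNG file as the plotting device
hist(x,
     main = 'Example histogram',
     xlab = 'Made-up values',
     ylab = 'Frequency',
     col = 'tomato')
dev.off() # close the device so the file is written to disk

file.exists(out_file) # TRUE if the image was saved
```

Everything drawn between png() and dev.off() goes into the file instead of the Plots pane, so remember to call dev.off() before opening the image.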
Note: If you use Norwegian in the exercise, remember to change all text in the figures to Norwegian.
For hand-in exercise 1 you must deliver:
Provide brief, descriptive captions for each chart. Put them under the figure.
Remember to be brief! Make sure that figure text and figure are on the same page. Remember that the answer must be a maximum of three pages long. Remember to convert the Word file to .pdf after you finish. To convert the file to PDF format, go to File and Save as Adobe PDF.
Questions | Level_1 | Level_2 | Level_3 |
---|---|---|---|
Histograms | 1 point: The hand-in has only 1 histogram | 2 points: The hand-in has only 2 histograms | 3 points: The hand-in has 3 histograms |
Figure caption | 1 point: The text describing the charts is not explicit enough | 2 points: The text describing the charts is explicit enough | 3 points: The text describing the charts is explicit enough and each histogram has a different color |
The submission must contain:
Now you are ready to submit your exercise: Upload the (max) 3-page PDF file on BlackBoard as a response to this exercise.
REMEMBER to include your last name in the file name and use only one language in the answer.