rstufio-ECO2008

ECO2008 Tutorial 2
Alan Fernihough
2021-02-05
Data
By definition, unofficial economic activity in what is called the black economy escapes being recorded in a
country’s GDP. It is conceivable that official and unofficial employment are substitutes: if one goes up, the
other comes down. Table 1 displays data on unemployment (%) and an estimate of the size of the black
economy (%) for 7 large economies from the year 1999.
Table 1: Unemployment and the Black Economy
Country          Unemployment (%)   Black Economy (%)
United Kingdom    9.5                7
United States     6.5                7
Italy            11.5               20
Japan             3                  4
France           12                  8
Germany           9                  9
Spain            23                 25
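If official and unofficial employment really are substitutes, countries with higher unemployment should tend to have a larger black economy. A minimal sketch of one way to check this is to compute the correlation between the two columns of Table 1. The vector names `unemployment` and `black_economy` are illustrative and are not part of the tutorial's own code, and with only seven observations a correlation can be suggestive at best.

```r
# Values copied from Table 1. The names `unemployment` and `black_economy`
# are illustrative assumptions, not names used elsewhere in the tutorial.
unemployment  <- c(9.5, 6.5, 11.5, 3, 12, 9, 23)
black_economy <- c(7, 7, 20, 4, 8, 9, 25)

# Pearson correlation coefficient between the two series.
# A value near +1 means the two series tend to rise together.
r <- cor(unemployment, black_economy)
print(round(r, 2))
```

A positive value is consistent with the substitution story, but it does not establish it: the sample is tiny and Spain is an influential observation.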
Create Variables as Vectors
Country <- c("United Kingdom","United States","Italy","Japan","France","Germany","Spain")
Unemply <- c(9.5,6.5,11.5,3,12,9,23)
Blackeco <- c(7,7,20,4,8,9,25)

Some Basic Statistics
mean(Unemply)
## [1] 10.64286
sd(Unemply)
## [1] 6.256425
quantile(Unemply, probs = seq(0, 1, 0.2))
##   0%  20%  40%  60%  80% 100%
##  3.0  7.0  9.2 10.7 11.9 23.0
summary(Unemply)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    3.00    7.75    9.50   10.64   11.75   23.00

Creating a Data Frame
Data frames are useful structures in which to store data. Here we will create one using the data above.
data <- data.frame(Country, Unemply, Blackeco)
data
##          Country Unemply Blackeco
## 1 United Kingdom     9.5        7
## 2  United States     6.5        7
## 3          Italy    11.5       20
## 4          Japan     3.0        4
## 5         France    12.0        8
## 6        Germany     9.0        9
## 7          Spain    23.0       25
summary(data)
##    Country             Unemply         Blackeco
##  Length:7           Min.   : 3.00   Min.   : 4.00
##  Class :character   1st Qu.: 7.75   1st Qu.: 7.00
##  Mode  :character   Median : 9.50   Median : 8.00
##                     Mean   :10.64   Mean   :11.43
##                     3rd Qu.:11.75   3rd Qu.:14.50
##                     Max.   :23.00   Max.   :25.00

Indexing the Data Frame
Use the dollar sign $ to index the data frame, i.e. to refer to a specific variable in it.
data$Unemply
## [1]  9.5  6.5 11.5  3.0 12.0  9.0 23.0
mean(data$Blackeco)
## [1] 11.42857

Create New Variable
Let’s say we want to create a new variable that is a transformed version of an existing variable. Let’s express unemployment as a share (between 0 and 1) rather than as a percentage.
data$UnemplyShare <- data$Unemply/100
data
##          Country Unemply Blackeco UnemplyShare
## 1 United Kingdom     9.5        7        0.095
## 2  United States     6.5        7        0.065
## 3          Italy    11.5       20        0.115
## 4          Japan     3.0        4        0.030
## 5         France    12.0        8        0.120
## 6        Germany     9.0        9        0.090
## 7          Spain    23.0       25        0.230

Tables
table(data$Country)
##
##         France        Germany          Italy          Japan          Spain
##              1              1              1              1              1
## United Kingdom  United States
##              1              1

Europe Dummy Variable
europe <- c("United Kingdom","Italy","France","Germany","Spain")
data$Europe <- ifelse(data$Country %in% europe, 1, 0)
summary(data)
##    Country             Unemply         Blackeco      UnemplyShare
##  Length:7           Min.   : 3.00   Min.   : 4.00   Min.   :0.0300
##  Class :character   1st Qu.: 7.75   1st Qu.: 7.00   1st Qu.:0.0775
##  Mode  :character   Median : 9.50   Median : 8.00   Median :0.0950
##                     Mean   :10.64   Mean   :11.43   Mean   :0.1064
##                     3rd Qu.:11.75   3rd Qu.:14.50   3rd Qu.:0.1175
##                     Max.   :23.00   Max.   :25.00   Max.   :0.2300
##      Europe
##  Min.   :0.0000
##  1st Qu.:0.5000
##  Median :1.0000
##  Mean   :0.7143
##  3rd Qu.:1.0000
##  Max.   :1.0000

Simple Scatterplot
We are going to use ggplot to make beautiful graphics.
library(ggplot2)
ggplot(data, aes(x=Unemply, y=Blackeco)) +
  geom_point()
[Figure: scatterplot of Blackeco (y-axis) against Unemply (x-axis)]

Let’s add loads of bells and whistles.
Look at all of the different options you have here:
https://ggplot2-book.org/index.html
https://rdpeng.github.io/Biostat776/lecture-the-ggplot2-plotting-system-part-1.html
https://tidyverse.github.io/ggplot2-docs/reference/

theme_af <- theme(
  legend.position = "bottom",
  panel.background = element_rect(fill = NA),
  panel.border = element_rect(fill = NA, color = "grey75"),
  axis.ticks = element_line(color = "grey85"),
  panel.grid.major = element_line(color = "grey95", size = 0.2),
  panel.grid.minor = element_line(color = "grey95", size = 0.2),
  legend.key = element_blank(),
  text = element_text(size = 14)
)

library(ggrepel)
# x is unemployment and y is the black economy, so the axis labels must match
ggplot(data, aes(x=Unemply, y=Blackeco, label = Country))+
  geom_smooth(method = "lm", se = FALSE, col = "red")+
  geom_point(col = "steelblue", alpha = 0.5, size = 4) +
  xlab("Unemployment (%)")+
  ylab("Black Economy (%)")+
  geom_text_repel()+
  theme_af
## `geom_smooth()` using formula 'y ~ x'
[Figure: labelled scatterplot of Black Economy (%) against Unemployment (%) with a fitted linear trend line; points are labelled United Kingdom, United States, Italy, Japan, France, Germany and Spain]

ggsave("blackeconunemploy.png", device = "png", width=6, height=4)
## `geom_smooth()` using formula 'y ~ x'

Tasks for Students
Let’s look at a dataset containing the album names, release years, and Pitchfork.com album review scores for the American band Deerhunter.

Deerhunter Albums and Pitchfork Reviews
Album Title                    Year   Score   Best New Music
Fluorescent Grey EP            2007   8.8     Y
Cryptograms                    2007   8.9     Y
Microcastle/Weird Era Cont.    2008   9.2     Y
Rainwater Cassette Exchange    2009   7.5     N
Halcyon Digest                 2010   9.2     Y
Itunes Live from Soho          2011   8.2     N
Monomania                      2013   8.3     Y
Fading Frontier                2015   8.4     Y

1. What are the median, sd, max and min of the “Score” variable?
2. Create a data.frame consisting of the year, score, and best new music variables.
3. Create a dummy variable indicating whether the album was awarded a “Best New Music” tag.
4. What is the average score for albums awarded “Best New Music” compared with the two albums that were not?
5. Create a barplot that shows the individual albums’ scores (see https://www.r-graph-gallery.com/218-basic-barplots-with-ggplot2.html). Please see the below for an example. Note the “fill” color is steelblue.
[Figure: example barplot with steelblue bars]
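As a pointer for task 5, here is a minimal sketch of a ggplot2 barplot with steelblue bars. It deliberately uses the unemployment figures from Table 1 rather than the album data, so the object names `bar_data` and `bar_plot` and the choice to flip the axes are illustrative assumptions, not a model answer.

```r
library(ggplot2)

# Data from Table 1, re-entered so that this block runs on its own.
# `bar_data` and `bar_plot` are illustrative names.
bar_data <- data.frame(
  Country = c("United Kingdom", "United States", "Italy", "Japan",
              "France", "Germany", "Spain"),
  Unemply = c(9.5, 6.5, 11.5, 3, 12, 9, 23)
)

# geom_col() draws one bar per row, with height taken from the y variable.
# `fill` sets the interior color of the bars.
bar_plot <- ggplot(bar_data, aes(x = Country, y = Unemply)) +
  geom_col(fill = "steelblue") +
  xlab("Country") +
  ylab("Unemployment (%)") +
  coord_flip()  # horizontal bars keep long category names readable

print(bar_plot)
```

The same pattern should carry over to the album data by swapping in the album title for the x variable and the score for the y variable.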