version: May 20, 2020

DOI GOES HERE


This is the analysis pipeline for FDEP Roving Diver Disease Surveys conducted throughout the Northern Florida Reef Tract


All analyses performed with R version 3.5.0

Basic setup of R environment


Loading required packages

For the following analyses we will require the use of a number of different R packages. Most of which can be sourced from CRAN, but some may be downloaded from GitHub. We can use the following code to load in the packages and install any packages not previously installed in the R console.

if (!require("pacman")) install.packages("pacman") 
pacman::p_load(ggplot2, stringr, gridExtra, ggpubr, Rmisc, FSA, rcompanion, RColorBrewer, dplyr, vegan, nparcomp, RVAideMemoire, MANOVA.RM, pairwiseAdonis, PMCMR, PMCMRplus, patchwork, plyr, gdata)
pacman::p_load_gh("pmartinezarbizu/pairwiseAdonis/pairwiseAdonis")



Loading data


Loading the data set into R. All analyses are conducted from a master spreadsheet dating back to November 2017. We are using the function read.xls from the package gdata to read the excel file straight into R. Note: this master spreadsheet only has 1 sheet.

This data set contains a count of “TMTC” or “too many to count” so we are adding a line of na.strings = "TMTC" which will take that string and treat it as an NA. This is so we can manipulate that column as integers. na.strings can be expanded to include other qualifiers if necessary. We included the code class(rd$TotalHealthy) to ensure that the column containing the “TMTC” input is being read as an integer and that the na.strings = "TMTC" worked. We are calling it rd for “roving diver” which is the type of surveys these are.

rd <- read.xls("../data/RovingDiverSurveys.xlsx", head = T, na.strings ="TMTC")
class(rd$Healthy)
## [1] "integer"
head(rd)
##         Date   DiverName SiteCode   TimeStart     TimeEnd Depth SurveyArea
## 1 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 2 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 3 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 4 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 5 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
## 6 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
##   SpeciesCode UK DS BB RB YB SCTLD WP WS P PB BL Healthy   Notes
## 1        MCAV  0  0  0  0  0     2  0  0 0  0  0       2  Dead 2
## 2        PCLI  0  0  0  0  0     9  0  0 0  0  0       5 Dead 26
## 3        ISIN  0  0  0  0  0     0  0  0 0  0  0       1        
## 4        PAST  0  0  0  0  0     0  0  0 0  0 14      13        
## 5        PCLI  0  0  0  0  0     2  0  0 0  0  0       4  Dead 2
## 6        SBOU  0  0  0  0  0     0  0  0 0  0  2       0  Dead 1

Data manipulation


One of the purposes of this pipeline is to manipulate the data set a number of ways in order to achieve specific outputs for reports. We are going to mainly do this using the package dplyr. We are doing these manipulations and calculations in R to prevent manipulation of the master database and to ensure continuity and consistency throughout this longterm monitoring project.

Our first step is to format the dates they can be read by R. Reading the data in from the excel spreadsheet, R recognizes the Date column as a Date format (if you read your data in as a csv file this won’t happen). The data is read as YYY/MM/DD, but often it is helpful to parse things out by Month or by Year so we are adding a column using a M-Y format. This same code can be changed to include a YYYY format as well.

rd$MonthYear = format(as.Date(as.Date(rd$Date), format = "%y/%m/%d"), "%m-%y")
rd$Date = as.Date(rd$Date)
class(rd$Date)
## [1] "Date"
head(rd)
##         Date   DiverName SiteCode   TimeStart     TimeEnd Depth SurveyArea
## 1 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 2 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 3 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 4 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 5 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
## 6 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
##   SpeciesCode UK DS BB RB YB SCTLD WP WS P PB BL Healthy   Notes MonthYear
## 1        MCAV  0  0  0  0  0     2  0  0 0  0  0       2  Dead 2     11-17
## 2        PCLI  0  0  0  0  0     9  0  0 0  0  0       5 Dead 26     11-17
## 3        ISIN  0  0  0  0  0     0  0  0 0  0  0       1             11-17
## 4        PAST  0  0  0  0  0     0  0  0 0  0 14      13             11-17
## 5        PCLI  0  0  0  0  0     2  0  0 0  0  0       4  Dead 2     11-17
## 6        SBOU  0  0  0  0  0     0  0  0 0  0  2       0  Dead 1     11-17

We see that we have MonthYear at the end of the dataset and Date is now in a YYYY-MM-DD format.



We also want to add a few more columns like County, Location, Total Observations, and Prevalence. We will start with adding County and Location.

Our dataset includes surveys across three counties, at four different locations, but they are entered into the data set at the individual site level (SiteCode). It is often helpful, especially when reporting, to report at the County level or even by location. SiteCode is the individual sites where surveys occur (ex: SEFL05, T328, BC1, etc), Location is the general area where operations occur (ex: Jupiter, Lauderdale-by-the-Sea), and County is the county where operations occur (ex: Martin County, Palm Beach County, and Broward County).

rd$County = if_else(rd$SiteCode %in% c("SEFL01", "SEFL02", "SEFL 03", "SLR South", "SLR Central", "SLR Ledge", "SLR North"), "Martin", 
                       if_else(rd$SiteCode %in% c("T328", "BC1", "FTL4"), "Broward", "Palm Beach"))

rd$Location= if_else(rd$SiteCode %in% c("SEFL01", "SEFL02", "SEFL03", "SLR South", "SLR Central", "SLR Ledge", "SLR North"), "SLR",
                       if_else(rd$SiteCode %in% c("T328", "BC1", "FTL4"), "PMP",
                       if_else(rd$SiteCode %in% c("SEFL04", "SEFL05", "SEFL06"), "JUP",
                       if_else(rd$SiteCode %in% c("SEFL08", "SEFL09", "SEFL10", "SEFL11", "SEFL12"), "WPB", ""))))
head(rd)
##         Date   DiverName SiteCode   TimeStart     TimeEnd Depth SurveyArea
## 1 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 2 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 3 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 4 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 5 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
## 6 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
##   SpeciesCode UK DS BB RB YB SCTLD WP WS P PB BL Healthy   Notes MonthYear
## 1        MCAV  0  0  0  0  0     2  0  0 0  0  0       2  Dead 2     11-17
## 2        PCLI  0  0  0  0  0     9  0  0 0  0  0       5 Dead 26     11-17
## 3        ISIN  0  0  0  0  0     0  0  0 0  0  0       1             11-17
## 4        PAST  0  0  0  0  0     0  0  0 0  0 14      13             11-17
## 5        PCLI  0  0  0  0  0     2  0  0 0  0  0       4  Dead 2     11-17
## 6        SBOU  0  0  0  0  0     0  0  0 0  0  2       0  Dead 1     11-17
##   County Location
## 1 Martin      SLR
## 2 Martin      SLR
## 3 Martin      SLR
## 4 Martin      SLR
## 5 Martin      SLR
## 6 Martin      SLR

We can see that the County and Location columns have been added to the end of the data set!



Now we are going to tabulate the all of the disease observations and healthy observations into a TotalObservations column. This data set has 11 different keys for “disease” from unknown (UK) to black band disease (BB), to stony coral tissue loss disease (SCTLD), as well as paling (P) and bleaching (BL). We are combinging all of those, along with the Healthy column to have a TotalObservations column.

rd$TotalObservations <- rd$UK + rd$DS + rd$BB + rd$RB + rd$YB + rd$SCTLD + rd$WP + rd$WS + rd$P + rd$PB + rd$BL + rd$Healthy
head(rd)
##         Date   DiverName SiteCode   TimeStart     TimeEnd Depth SurveyArea
## 1 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 2 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 3 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 4 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 5 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
## 6 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
##   SpeciesCode UK DS BB RB YB SCTLD WP WS P PB BL Healthy   Notes MonthYear
## 1        MCAV  0  0  0  0  0     2  0  0 0  0  0       2  Dead 2     11-17
## 2        PCLI  0  0  0  0  0     9  0  0 0  0  0       5 Dead 26     11-17
## 3        ISIN  0  0  0  0  0     0  0  0 0  0  0       1             11-17
## 4        PAST  0  0  0  0  0     0  0  0 0  0 14      13             11-17
## 5        PCLI  0  0  0  0  0     2  0  0 0  0  0       4  Dead 2     11-17
## 6        SBOU  0  0  0  0  0     0  0  0 0  0  2       0  Dead 1     11-17
##   County Location TotalObservations
## 1 Martin      SLR                 4
## 2 Martin      SLR                14
## 3 Martin      SLR                 1
## 4 Martin      SLR                27
## 5 Martin      SLR                 6
## 6 Martin      SLR                 2

Subsetting by Date


For the purposes of reporting, you might want to block certain dates out, and only report on certain dates. Here, as an example, we are subsetting everything before July 3, 2019. This code can be modified to capture any time period you may need. We are calling it rd.early because it encapsulates the early days of our surveys. I am also subsetting for only our longterm monitoring sites.

rd.early <- filter(rd, Date < "2019-07-03") %>%  
  subset(SiteCode %in% c("SEFL01", "SEFL02", "SEFL04", "SEFL05", "SEFL06", "SEFL08", "SEFL11", "SEFL12", "SLR North", "SLR South", "SLR Central", "SLR Ledge", "BC1", "T328", "FTL4")) 

head(rd.early)
##         Date   DiverName SiteCode   TimeStart     TimeEnd Depth SurveyArea
## 1 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 2 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 3 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 4 2017-11-09 Joshua Voss   SEFL01 14:40:00.00 15:12:00.00    15        700
## 5 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
## 6 2017-11-09 Joshua Voss   SEFL02 13:29:00.00 14:05:00.00    20        600
##   SpeciesCode UK DS BB RB YB SCTLD WP WS P PB BL Healthy   Notes MonthYear
## 1        MCAV  0  0  0  0  0     2  0  0 0  0  0       2  Dead 2     11-17
## 2        PCLI  0  0  0  0  0     9  0  0 0  0  0       5 Dead 26     11-17
## 3        ISIN  0  0  0  0  0     0  0  0 0  0  0       1             11-17
## 4        PAST  0  0  0  0  0     0  0  0 0  0 14      13             11-17
## 5        PCLI  0  0  0  0  0     2  0  0 0  0  0       4  Dead 2     11-17
## 6        SBOU  0  0  0  0  0     0  0  0 0  0  2       0  Dead 1     11-17
##   County Location TotalObservations
## 1 Martin      SLR                 4
## 2 Martin      SLR                14
## 3 Martin      SLR                 1
## 4 Martin      SLR                27
## 5 Martin      SLR                 6
## 6 Martin      SLR                 2
tail(rd.early)
##           Date   DiverName SiteCode   TimeStart     TimeEnd Depth SurveyArea
## 573 2019-06-21 Ryan Eckert      BC1 09:55:00.00 10:15:00.00    NA       1500
## 574 2019-06-21 Ryan Eckert      BC1 09:55:00.00 10:15:00.00    NA       1500
## 575 2019-06-21 Ryan Eckert      BC1 09:55:00.00 10:15:00.00    NA       1500
## 576 2019-06-21 Ryan Eckert      BC1 09:55:00.00 10:15:00.00    NA       1500
## 577 2019-06-21 Ryan Eckert      BC1 09:55:00.00 10:15:00.00    NA       1500
## 578 2019-06-21 Ryan Eckert      BC1 09:55:00.00 10:15:00.00    NA       1500
##     SpeciesCode UK DS BB RB YB SCTLD WP WS P PB BL Healthy Notes MonthYear
## 573        PAST  0  0  0  0  0     0  0  0 0  0  0      23           06-19
## 574        SSID  0  0  0  0  0     0  0  0 0  0  0      15           06-19
## 575        OFAV  0  0  0  0  0     0  0  0 0  0  0       3           06-19
## 576        SINT  0  0  0  0  0     0  0  0 0  0  0       3           06-19
## 577        ACER  0  0  0  0  0     0  0  0 0  0  0       2           06-19
## 578        OANN  0  0  0  0  0     0  0  0 0  0  0       1           06-19
##      County Location TotalObservations
## 573 Broward      PMP                23
## 574 Broward      PMP                15
## 575 Broward      PMP                 3
## 576 Broward      PMP                 3
## 577 Broward      PMP                 2
## 578 Broward      PMP                 1

Checking to see that we have our earliest date head(rd.early) and our cut-off date tail(rd.early) to confirm the subset worked.

Tabulating Data


Currently the data set has very high resolution, to species level, which may not be necessary for certain reports. Now, using dplyr we will tabulate the data into a more concise form. In this example we are assuming that we need to report on data from the time frame we subsetted earlier (November 2017 - June 2019), and the report only needs to know prevalence data by County. Sometimes packages mask other packages so I am specifying that I want to use the function select from the package dplyr by using the following: dplyr::select.

countyData <- rd.early %>% dplyr::select(c(SiteCode, County, MonthYear, SpeciesCode, SCTLD, TotalObservations)) %>% group_by(SiteCode,MonthYear,County) %>% summarise_if(is.integer, sum)
countyData
## # A tibble: 60 x 5
## # Groups:   SiteCode, MonthYear [60]
##    SiteCode MonthYear County  SCTLD TotalObservations
##    <fct>    <chr>     <chr>   <int>             <int>
##  1 BC1      03-19     Broward     6               154
##  2 BC1      06-19     Broward    16               307
##  3 BC1      11-18     Broward    26               236
##  4 BC1      12-18     Broward    23               315
##  5 FTL4     03-19     Broward     6                95
##  6 FTL4     05-19     Broward     7               128
##  7 FTL4     06-19     Broward     5               118
##  8 FTL4     11-18     Broward     9                65
##  9 SEFL01   11-17     Martin     11                46
## 10 SEFL02   11-17     Martin      3                10
## # … with 50 more rows

To summarize: I am selecting (dplyr::select()) from the data the columns of SiteCode, County, MonthYear, SpeciesCode, TLD, and TotalObservations. grouping them (group_by()) by SiteCode, MonthYear, and County; and adding (summarise_if(is.integer, sum)) all integers found in that subset (in this case TLD and TotalObservations).



# Calculating Prevalence Data * We want to calculate SCTLD disease prevalence at our monitoring sites. To do this, we will calculate prevalence by diving our SCTLD observations (TLD) from our total number of observations (TotalObservations). The syntax used here is very similar to the way we calculated TotalObservations**.

countyData$Prevalence = countyData$SCTLD/countyData$TotalObservations
head(countyData)
## # A tibble: 6 x 6
## # Groups:   SiteCode, MonthYear [6]
##   SiteCode MonthYear County  SCTLD TotalObservations Prevalence
##   <fct>    <chr>     <chr>   <int>             <int>      <dbl>
## 1 BC1      03-19     Broward     6               154     0.0390
## 2 BC1      06-19     Broward    16               307     0.0521
## 3 BC1      11-18     Broward    26               236     0.110 
## 4 BC1      12-18     Broward    23               315     0.0730
## 5 FTL4     03-19     Broward     6                95     0.0632
## 6 FTL4     05-19     Broward     7               128     0.0547

You can see that we added a Prevalence column to the data set!

Running PERMANOVA in R


Now our report needs some analyses. Here we want to see if our prevalence data varies between counties and through time. We will be using a PERMANOVA in R to assess this. We will use the adonis() function in vegan.

set.seed(999)
adonis(formula = Prevalence ~ County*MonthYear, data = countyData, method = "euclidian")
## 
## Call:
## adonis(formula = Prevalence ~ County * MonthYear, data = countyData,      method = "euclidian") 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##                  Df SumsOfSqs  MeanSqs F.Model      R2 Pr(>F)    
## County            2   0.15666 0.078328 12.2394 0.28648  0.001 ***
## MonthYear        12   0.10287 0.008572  1.3395 0.18811  0.249    
## County:MonthYear  4   0.02493 0.006233  0.9739 0.04559  0.443    
## Residuals        41   0.26239 0.006400         0.47982           
## Total            59   0.54684                  1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

County is significantly different, but Time is not.



Now lets conduct a pairwise comparison to determine which county (or counties) is driving the variation. We will be using pairwise.adonis() from pairwiseAdonis.

set.seed(999)
pairwise.adonis(countyData[c(6)],factors=countyData$County,
                             sim.method='euclidian',p.adjust.m='bonferroni', perm = 999)
##                   pairs Df   SumsOfSqs    F.Model          R2 p.value
## 1     Broward vs Martin  1 0.001822233  0.1137796 0.004922589   0.708
## 2 Broward vs Palm Beach  1 0.085256428 53.6098535 0.538198297   0.001
## 3  Martin vs Palm Beach  1 0.111946987 14.8666035 0.248328828   0.002
##   p.adjusted sig
## 1      1.000    
## 2      0.003   *
## 3      0.006   *

We see that Palm Beach County is significantly different from both Martin County and Broward County. Now its time to visualize this data!



.

Data Visualization


We will now make a number of plots to visualize this prevalence data. The first is plotting disease prevalence at each site over time, grouped by each county. But first we need to subset by County. Just a note, SEFL01 and SEFL02 are actually SLR Central and SLR South, respectively. For plotting purposes I am renaming them. I am also arranging the factor levels chronologically.

martin <- subset(countyData, County == "Martin")

class(martin$MonthYear)
## [1] "character"
martin$MonthYear <- as.factor(martin$MonthYear)
levels(martin$MonthYear)
## [1] "03-19" "04-19" "06-19" "11-17"
martin$MonthYear = factor(martin$MonthYear, levels(martin$MonthYear)[c(4, 1, 2, 3)])
levels(martin$MonthYear)
## [1] "11-17" "03-19" "04-19" "06-19"
levels(martin$SiteCode)[levels(martin$SiteCode) == "SEFL01"] <- "SLR Central"
levels(martin$SiteCode)[levels(martin$SiteCode) == "SEFL02"] <- "SLR South"
levels(martin$SiteCode)
##  [1] "BC1"         "FTL4"        "SLR Central" "SLR South"   "SEFL03"     
##  [6] "SEFL04"      "SEFL05"      "SEFL06"      "SEFL08"      "SEFL09"     
## [11] "SEFL10"      "SEFL11"      "SEFL12"      "SEFL13"      "SEFL14"     
## [16] "SEFL15"      "SEFL16"      "SEFL17"      "SEFL18"      "SEFL19"     
## [21] "SEFL19*"     "SEFL20"      "SEFL21"      "SEFL22"      "SLR Ledge"  
## [26] "SLR North"   "T328"
palm <- subset(countyData, County == "Palm Beach")
class(palm$MonthYear)
## [1] "character"
palm$MonthYear <- as.factor(palm$MonthYear)
levels(palm$MonthYear)
##  [1] "02-19" "03-19" "04-18" "05-18" "05-19" "06-18" "06-19" "08-18" "12-17"
## [10] "12-18"
palm$MonthYear = factor(palm$MonthYear, levels(palm$MonthYear)[c(9, 3, 4, 6, 8, 10, 1, 2, 5, 7)])
levels(palm$MonthYear)
##  [1] "12-17" "04-18" "05-18" "06-18" "08-18" "12-18" "02-19" "03-19" "05-19"
## [10] "06-19"
broward <- subset(countyData, County == "Broward")
class(broward$MonthYear)
## [1] "character"
broward$MonthYear <- as.factor(broward$MonthYear)
levels(broward$MonthYear)
## [1] "03-19" "05-19" "06-19" "11-18" "12-18"
broward$MonthYear = factor(broward$MonthYear, levels(broward$MonthYear)[c(4,5,1,2,3)])
levels(broward$MonthYear)
## [1] "11-18" "12-18" "03-19" "05-19" "06-19"



Plot for Martin County

martin.prev.plot1 <- 
  ggplot(martin, aes(x=MonthYear, y=Prevalence, color = SiteCode, group = SiteCode, label=SiteCode))+
  geom_line(size = 1)+
  geom_point(size = 2)+
  labs(y = "Prevalence")+
    annotate("label", x = 2.45, y = 0.45, label = paste ("Martin County"), size = 5)+
  scale_x_discrete(limits = c("11-17","12-17","01-18", "04-18", "05-18", "06-18", "08-18", "11-18", "12-18", "02-19", "03-19", "04-19", "05-19", "06-19"), labels = c("Nov 17","Dec 17","Jan 18", "Apr 18", "May 18", "Jun 18", "Aug 18", "Nov 18", "Dec 18", "Feb 19", "Mar 19", "Apr 19", "May 19", "Jun 19"))+
  labs(color = "Site")+
  ylim(0,0.5)

# takes the legend and saves it as a separate object (grob)
get_legend<-function(martin.prev.plot1){
  tmp <- ggplot_gtable(ggplot_build(martin.prev.plot1))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  martin.legend <- tmp$grobs[[leg]]
  return(martin.legend)
}
martin.legend=get_legend(martin.prev.plot1)



martin.prev.plot <- martin.prev.plot1 + 
  theme(panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "white"), 
        panel.background = element_rect(fill = '#F5F5F5'),
        plot.title = element_text(hjust = 0.5), 
        axis.line = element_line(colour = "black"), 
        axis.ticks = element_line(color="black"), 
        text = element_text(size=15, color="black"), 
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_text(size=12, color="black")
        )


martin.prev.plot 



Plot for Palm Beach County

palm.prev.plot1 <- 
  ggplot(palm, aes(x=MonthYear, y=Prevalence, color = SiteCode, group = SiteCode, label = SiteCode))+
  geom_line()+
  geom_point()+
  annotate("label", x = 2.45, y = 0.45, label = paste ("Palm Beach County"), size = 5)+
  scale_x_discrete(limits = c("11-17","12-17","01-18", "04-18", "05-18", "06-18", "08-18", "11-18", "12-18", "02-19", "03-19", "04-19", "05-19", "06-19"), labels = c("Nov 17","Dec 17","Jan 18", "Apr 18", "May 18", "Jun 18", "Aug 18", "Nov 18", "Dec 18", "Feb 19", "Mar 19", "Apr 19", "May 19", "Jun 19"))+
    labs(color = "Site")+
  ylim(0,0.5)
  

# takes the legend and saves it as a separate object (grob)
legend<-function(palm.prev.plot1){
  tmp <- ggplot_gtable(ggplot_build(palm.prev.plot1))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  martin.legend <- tmp$grobs[[leg]]
  return(palm.legend)
}
palm.legend=get_legend(palm.prev.plot1)



palm.prev.plot <- palm.prev.plot1+
  theme(panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "white"),
        panel.background = element_rect(fill = '#F5F5F5'),
        plot.title = element_text(hjust = 0.5),
        axis.line = element_line(colour = "black"),
        axis.ticks = element_line(color="black"),
        text = element_text(size=15, color="black"),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.text.y=element_text(size=12, color="black")
        )

palm.prev.plot



Plot for Broward County

broward.prev.plot1 <- 
  ggplot(broward, aes(x=MonthYear, y=Prevalence, color = SiteCode, group = SiteCode, label = SiteCode))+
  geom_line()+
  geom_point()+
  scale_x_discrete(limits = c("11-17","12-17","01-18", "04-18", "05-18", "06-18", "08-18", "11-18", "12-18", "02-19", "03-19", "04-19", "05-19", "06-19"), labels = c("Nov 17","Dec 17","Jan 18", "Apr 18", "May 18", "Jun 18", "Aug 18", "Nov 18", "Dec 18", "Feb 19", "Mar 19", "Apr 19", "May 19", "Jun 19"))+
  ylim(0,0.5)+
  annotate("label", x = 2.45, y = 0.45, label = paste ("Broward County"), size = 5) +
  labs(y = "Prevalence", x = "Month", color = "Site")

 

legend<-function(broward.prev.plot1){
  tmp <- ggplot_gtable(ggplot_build(broward.prev.plot1))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  martin.legend <- tmp$grobs[[leg]]
  return(broward.legend)
}
broward.legend=get_legend(broward.prev.plot1)


broward.prev.plot <- broward.prev.plot1+
  theme(panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "white"),
        panel.background = element_rect(fill = '#F5F5F5'),
        plot.title = element_text(hjust = 0.5),
        axis.line = element_line(colour = "black"),
        axis.ticks = element_line(color="black"),
        text = element_text(size=15, color="black"), 
        axis.text.x=element_text(angle = 45, hjust = 1,size=12, color="black"),
        axis.text.y=element_text(size=12, color="black")
        )

broward.prev.plot



You might have noticed that only the Broward County plot had labels and titles along the x-axis. That is because we are going to stack the plots using patchwork and saving it in our figures folder.

stacked.plot <- (martin.prev.plot / palm.prev.plot / broward.prev.plot) 

ggsave("../figures/site.prevalence.png", plot= stacked.plot, width=12, height=8, units="in", dpi=600)

stacked.plot