CDP Data Project

Tidying the CDP Data - Part 1

My university purchased the CDP database for my academic research. These posts document the steps to convert data in the CDP spreadsheets in to a tidy format for data analysis. Hopefully, they will be useful to others starting with very messy data in multiple Excel spreadsheets. Load the R packages # install.packages('pacman') #uncomment if don't already have library(pacman) # package manager p_load(janitor, readxl, tidyverse) Create variables for emissions years Each year of public CDP responses is in a project year spreadsheet.

Tidying the CDP Data - Part 2

This post documents how to tidy CDP data that includes all companies that CDP asked to respond to its questionnaire. Load the packages library(pacman) p_load(dbplyr, janitor, readxl, tidyverse) path_resp is path to spreadsheet sheet_resp each project year 2011 to 2018 is on a sheet named by the project year path_resp <- "~/Dropbox/data/cdp/investor/cdp_response_status_2010_2018.xlsx" path_resp ## [1] "~/Dropbox/data/cdp/investor/cdp_response_status_2010_2018.xlsx" sheet_resp <- 2011:2018 %>% as.character sheet_resp ## [1] "2011" "2012" "2013" "2014" "2015" "2016" "2017" "2018" create clean names for each variable find_col <- function(path_in, sheet_in) { read_excel(path = path_in, sheet = sheet_in) %>% clean_names() } clean_col <- map2(path_resp, sheet_resp, find_col) clean_col ## [[1]] ## # A tibble: 6,557 x 15 ## account_number_… name company_name_co… industry_activi… isin_company ## <chr> <chr> <chr> <chr> <chr> ## 1 44 3i G… 3i Group Banks, Diverse … GB00B1YW4409 ## 2 46 3i I… 3i Infrastructu… Banks, Diverse … JE00B1RJLF86 ## 3 285 3M C… 3M Company Consumer Durabl… US88579Y1010 ## 4 30 888 … 888 Holdings Hotels, Restaur… GI000A0F6407 ## 5 22750 A & … A & J Mucklow G… Real Estate GB0006091408 ## 6 1535 A.

Tidying the CDP Data - Part 3

This post merges summary of public responses saved in df_summary.csv for the emissions years 2010-2017 and all requests saved in df_responses.csv into df_cdp.csv copy to SQL database library(pacman) p_load(tidyverse) read df_summary.csv and df_responses.csv into R df_summary <- read_csv("~/Dropbox/data/cdp/investor/derived-data/df_summary.csv") df_summary ## # A tibble: 14,551 x 6 ## account year company country sector response ## <dbl> <dbl> <chr> <chr> <chr> <chr> ## 1 1800 2010 Bharat Petroleum Corpor… India Energy public ## 2 5300 2010 EDP - Energias de Portu… Portugal Utilities public ## 3 29900 2010 Ernst & Young LLP UK United King… Banks public ## 4 10200 2010 Kizoo AG Germany Information Tec… public ## 5 22300 2010 Kungsleden AB Sweden Banks public ## 6 13400 2010 Nordea Bank Sweden Banks public ## 7 15400 2010 QLogic Corp.

Tidying the CDP Data - Part 4 Scope 1 Emissions

This post shows how to extract responses for CDP 2018 question q6_1 What were your organization’s gross global Scope 1 emissions in metric tons CO2e? The question number CDP years CDP Yr Question CDP Emissions Yr 2018 6.1 2017 2017 8.2 2016 2016 8.2 2015 2015 8.2 2014 2014 8.2 2013 2013 8.2 2012 2012 8.2 2011 2011 8.