How to Merge Multiple *.csv Files in R

This R script processes a folder of CSV files and combines their data into a single CSV file. It loads the tidyverse, fs, dplyr, and stringr packages, which provide functions for data manipulation and file handling. The input folder, the output file, and the date prefix used in the file names are not prompted for; they are set by editing three variables near the top of the script. The script then reads each file in the input folder, drops the first 291 rows (the value hard-coded in the loop), adds a column holding the file’s name, and appends the result to the output file. Next, it reads the combined file back in, names the columns, and strips the date prefix, the word “Sample”, and the “.csv” extension from the sample names. Finally, it converts the data frame from “long” to “wide” format, so that each sample becomes its own column, and writes the result to the same output path, overwriting the intermediate file.

How to Run the Code

To use the R script provided:

1. Install the necessary packages by running the following lines of code:
   install.packages("tidyverse")
   install.packages("fs")
   install.packages("dplyr")
   install.packages("stringr")
2. Load the packages by running the following lines of code:
   library(tidyverse)
   library(fs)
   library(dplyr)
   library(stringr)
3. Set the folder_path, output_file, and file_date variables by replacing the values in the following lines of the script with your own:
   folder_path <- "C:\\Users\\barbi\\Desktop\\Plasticos"
   output_file <- "C:\\Users\\barbi\\Desktop\\plasticos.csv"
   file_date <- "2023-03-23_"
4. Run the entire script by highlighting all the lines of code and pressing “Run”, or by typing source("filename.R") in the console, replacing “filename.R” with the name of the file containing the script.
5. The script loops through all the files in the specified folder, removes the first 291 rows from each file, and appends the remaining data to a new CSV file at the specified output_file path.
6. The script then reads the newly created CSV file, renames the columns, and removes the date prefix, the word “Sample”, and the “.csv” extension from the values in the “sample_name” column.
7. The script then converts the data frame from long to wide format, using the values in the “sample_name” column as the new column names and the values in the “intensity” column as the new cell values.
8. Finally, the script exports the resulting data frame as a CSV file at the specified output_file path, overwriting the intermediate file.

Note: You may need to adjust the script to fit your specific needs, such as changing the number of rows to be removed or the file paths used.
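The name cleaning and the long-to-wide conversion described in the steps above can be tried on a small, made-up data frame before running the script on real files. The following is a minimal sketch, not part of the original script: the file names, wave numbers, and intensities are hypothetical, and only the column names and the file_date value are taken from the script.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical long-format data: two samples measured at the same two wave numbers.
long_df <- tibble(
  wave_number = c(4000, 3996, 4000, 3996),
  intensity   = c(0.10, 0.12, 0.20, 0.22),
  sample_name = c("2023-03-23_SampleA.csv", "2023-03-23_SampleA.csv",
                  "2023-03-23_SampleB.csv", "2023-03-23_SampleB.csv")
)

file_date <- "2023-03-23_"

# Strip the date prefix, the word "Sample", and the extension from each name.
# fixed() makes every pattern match literally instead of as a regular expression.
clean_df <- long_df %>%
  mutate(
    sample_name = sample_name %>%
      str_remove(fixed(file_date)) %>%
      str_remove(fixed("Sample")) %>%
      str_remove(fixed(".csv"))
  )

# One row per wave number, one column per sample (columns: wave_number, A, B).
wide_df <- pivot_wider(clean_df, names_from = sample_name, values_from = intensity)
print(wide_df)
```

If two rows of the same sample share a wave number, pivot_wider() cannot place both values in one cell and will warn about duplicates, so it is worth checking that each file covers each wave number only once.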

Overall Code


#MERGE MULTIPLE *.csv FILES IN A FOLDER INTO ONE *.csv FILE

#Imports the libraries needed to run the script.
library(tidyverse)
library(fs)
library(dplyr)
library(stringr)


#FILL THESE
folder_path <- "C:\\Users\\barbi\\Desktop\\Plasticos"
output_file <- "C:\\Users\\barbi\\Desktop\\plasticos.csv"
#Resolution 4 cm-1 = 300
#Resolution 1 cm-1 = 1156
#Date prefix of the file names; it is removed from the sample names further down.
file_date <- "2023-03-23_"


#Calculates the number of wavelengths to separate samples
#total_wavenumbers <- (upper_bound-lower_bound)/resolution + 1


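#Creates "all_of_them", a character vector with the full path of every *.csv file
#in the specified folder, and "file_names", the matching file names without the path.
#Deriving file_names from all_of_them keeps both vectors in the same order.
all_of_them <- fs::dir_ls(folder_path, glob = "*.csv")
file_names <- basename(all_of_them)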


#Creates empty data frame with two columns: Wave Number and Intensity.
#write.table(data.frame('Wave Number','Intensity'), file = output_file, sep = ",",
#append = TRUE, quote = FALSE,
#col.names = FALSE, row.names = FALSE)


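#Deletes any output file left over from a previous run, because the loop below
#appends to output_file instead of overwriting it.
if (file.exists(output_file)) file.remove(output_file)

#Loop that reads every file in the folder and saves the data in my_content.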
for (i in seq_along(file_names)) {
  
  my_content <- read_csv(
    file = all_of_them[[i]], show_col_types = FALSE)
  
  #Creates data frame for my_content.
  df <- data.frame(my_content)
  
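  #Stores the name of the current file so it can be added as a column.
  header_df <- file_names[i]
  
  #Removes rows 1 to 291 of each *.csv file (adjust this number to your files).
  new_data <- df[-c(1:291),] 
  new_df <- data.frame(new_data)
  
  #Adds a "sample" column holding the file name.
  new_df <- new_df %>% dplyr::mutate(sample = header_df)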
  
  #Writes table with the data from all the *.csv files and 
  #exports it in the output_file path.
  write.table(new_df, file = output_file, sep = ",",
              append = TRUE, quote = FALSE,
              col.names = FALSE, row.names = FALSE)
}


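#Reads the combined *.csv back in. The file has no header row, so
#col_names = FALSE keeps the first data row from being used as column names.
created_csv <- read_csv(output_file, col_names = FALSE, show_col_types = FALSE)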


#Creates a data frame for created_csv
collapsed_csv <- data.frame(created_csv)


#Creates columns "wave_number", "intensity", and "sample_name"
colnames(collapsed_csv) <- c("wave_number","intensity","sample_name")


#Removes the word "Sample", the date prefix, and the ".csv" extension from the
#sample_name values. fixed() makes each pattern match literally, so the "." in
#".csv" is not treated as a regular-expression wildcard.
collapsed_csv <- collapsed_csv %>%
  mutate(sample_name = sample_name %>%
           str_replace(fixed("Sample"), "") %>%
           str_replace(fixed(file_date), "") %>%
           str_replace(fixed(".csv"), ""))


#Converts the collapsed_csv data frame from long to wide
collapsed_csv <- pivot_wider(collapsed_csv, names_from = sample_name, values_from = intensity) 


#Exports the collapsed_csv data frame in an *.csv file
write.csv(collapsed_csv, output_file, row.names = FALSE)
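As an alternative to appending every file to an intermediate CSV and reading it back, the merge can also be done in memory. The sketch below is one possible variant, not the original author's method. The function name merge_csv_folder and its n_skip argument are hypothetical, and it assumes each file holds exactly two comma-separated columns after the skipped lines. It uses the skip argument of read_csv(), which counts raw lines from the top of the file, so the right value may differ by one from the row count used in the script above, where the first line is consumed as a header.

```r
library(readr)
library(dplyr)

# Hypothetical helper: reads every *.csv in `folder`, skips the first `n_skip`
# lines of each file, and tags each remaining row with the name of its file.
merge_csv_folder <- function(folder, n_skip) {
  paths <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

  read_one <- function(path) {
    read_csv(path,
             col_names = c("wave_number", "intensity"),
             skip = n_skip,
             show_col_types = FALSE) %>%
      mutate(sample_name = basename(path))
  }

  # Stack the per-file tables into one long data frame.
  bind_rows(lapply(paths, read_one))
}

# Usage on two small made-up files in a temporary folder.
tmp <- tempfile("spectra_")
dir.create(tmp)
writeLines(c("meta,line", "4000,0.10", "3996,0.12"), file.path(tmp, "a.csv"))
writeLines(c("meta,line", "4000,0.20", "3996,0.22"), file.path(tmp, "b.csv"))

merged <- merge_csv_folder(tmp, n_skip = 1)
print(merged)
```

Because nothing is appended to disk, this variant can be rerun without first deleting an old output file. The resulting long data frame can then be cleaned and passed to pivot_wider() exactly as in the script.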

References:

Wickham, Hadley; Averick, Mara; Bryan, Jennifer; Chang, Winston; D’Agostino McGowan, Lucy; François, Romain; Grolemund, Garrett; Hayes, Alex; Henry, Lionel; Hester, Jim; Kuhn, Max; Lin Pedersen, Thomas; Miller, Evan; Milton Bache, Stephan; Müller, Kirill; et al. Welcome to the {tidyverse}. The Journal of Open Source Software 2019, 4 (43), 1686. https://doi.org/10.21105/joss.01686.
Hester, Jim; Wickham, Hadley; Csardi, G. Fs: Cross-Platform File System Operations Based on “Libuv.” 2021. https://cran.r-project.org/package=fs.
Wickham, H. Dplyr: A Grammar of Data Manipulation. 2021.
Wickham, H. Stringr: Simple, Consistent Wrappers for Common String Operations. 2019. https://cran.r-project.org/package=stringr.
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria 2021. https://www.r-project.org/.
