Bash Script to Split a CSV File into n Smaller Files
When you work with large CSV files, it is sometimes useful to have a quick way to split a file into smaller pieces so that other applications, processes, or people can work on those pieces in parallel. Here is a nifty bash script that splits a CSV file into multiple pieces and repeats the original header in every piece.
#!/bin/bash
full_filename=$1
num_files=$2
split_folder=$3
mkdir -p "$split_folder"
filename=$(basename -- "$full_filename")
extension="${filename##*.}"
filename="${filename%.*}"
# Count lines using the full path, so the script also works from another directory
num_lines=$(wc -l < "$full_filename" | awk '{print $1}')
echo "$num_lines lines"
# Calculate data lines per file (header excluded), rounded up
data_lines=$((num_lines - 1))
lines_per_file=$(( (data_lines + num_files - 1) / num_files ))
echo "$lines_per_file lines per file"
echo "$num_lines total lines"
# Get the heading line.
heading=`sed -n '1p' < "$full_filename"`
echo "The following heading will be repeated in all files"
echo "$heading"
from_line=2
# Write heading to a test file as well
echo "$heading" > "$split_folder/test_${filename}.${extension}"
for (( i=1; i<=num_files; i++ ))
do
    split_name="${filename}_${i}.${extension}"
    if [ "$i" -eq "$num_files" ]
    then
        # The last file takes whatever lines remain
        to_line=$num_lines
    else
        to_line=$((from_line + lines_per_file - 1))
    fi
    # Write heading to file
    echo "$heading" > "$split_folder/$split_name"
    # Write the split lines from from_line to to_line
    sed -n "${from_line},${to_line}p" < "$full_filename" >> "$split_folder/$split_name"
    # Write to a test file to confirm that the splits add up to the original
    sed -n "${from_line},${to_line}p" < "$full_filename" >> "$split_folder/test_${filename}.${extension}"
    from_line=$((to_line + 1))
    echo "$split_name"
done
You can save this script as split-csv.sh in your bin folder, make it executable, and split files using
split-csv.sh csv_file N split_folder_name
That will split csv_file into N pieces and drop them into split_folder_name. The script also creates the folder if it does not exist.
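The script also writes a combined copy named test_<original file name> into the split folder, so that you can confirm the pieces add up to the original. Here is a minimal sketch of that check, assuming a .csv input and an original file that ends with a newline. The helper name verify_split is hypothetical and not part of the script above.

```shell
#!/bin/bash
# Sketch: confirm that the pieces written by split-csv.sh add up to the original.
# verify_split is a hypothetical helper name, not part of the original script.
verify_split() {
    local csv_file=$1
    local split_folder=$2
    # Assumption: the combined copy is named test_<original file name>
    local test_file
    test_file="$split_folder/test_$(basename -- "$csv_file")"

    # cmp -s prints nothing and exits 0 only when both files are byte-identical
    if cmp -s "$csv_file" "$test_file"; then
        echo "OK: pieces add up to $csv_file"
    else
        echo "MISMATCH: $test_file differs from $csv_file" >&2
        return 1
    fi
}
```

After a split you could run verify_split csv_file split_folder_name. A mismatch is worth investigating before you hand the pieces out. One likely cause is a final line without a trailing newline, which wc -l does not count.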