Python: Splitting up a Large CSV File by Record Count or Column Value
Working with a two million record file
This is a follow-up to the article I wrote awhile back.
In that article, I wrote about loading a CSV file that is too large to load into Excel. Instead, it was loaded into an SQLite database and queries were run against it to create smaller extracts. I also used Pandas to create excel extracts.
Instead of doing this, other options are to split the file into two or more CSV files, or the file can be split by column value in a DataFrame. I’ve done both as various departments at my workplace want their data split up separately from a master spreadsheet report.
Splitting into Two Files
First, let’s demonstrate splitting up the spreadsheet into two files by using Panda’s DataFrame iloc property. I’ll be using the same publicly available dataset ‘2m Sales Records.csv’ that I…