Python: Splitting up a Large CSV File by Record Count or Column Value

Working with a two million record file

Rick McBride
3 min readApr 29, 2022

This is a follow-up to the article I wrote awhile back.

In that article, I wrote about loading a CSV file that is too large to load into Excel. Instead, it was loaded into an SQLite database and queries were run against it to create smaller extracts. I also used Pandas to create excel extracts.

Instead of doing this, other options are to split the file into two or more CSV files, or the file can be split by column value in a DataFrame. I’ve done both as various departments at my workplace want their data split up separately from a master spreadsheet report.

Splitting into Two Files

First, let’s demonstrate splitting up the spreadsheet into two files by using Panda’s DataFrame iloc property. I’ll be using the same publicly available dataset ‘2m Sales Records.csv’ that I…

--

--

Rick McBride
Rick McBride

Written by Rick McBride

I am a news junkie, writer, software developer, artist and dog dad. I have a wide variety of interests and have written for various media sites.

No responses yet