Easily Splitting Large Text Files

Steve Ellwood
2 min read · Nov 24, 2021

This is one of those things that crop up occasionally, and you just need a quick way to resolve the issue. In my case I had a 60 GB table in SQL Server that I wanted to synchronise with another table holding some historical data from the same source. RedGate Data Compare did a sterling job of generating a script to do what I wanted, although it took a good few hours. The problem was that the generated script was 5 GB, and none of my editors would open it comfortably. I didn't want to learn vim or similar just for this one-off task. I believe EmEditor can handle files that size, but I don't have a licence for it; VS Code, Azure Data Studio, SSMS and the like all gave up, and DataGrip just showed the file as a stream of null characters, probably some sort of encoding or related issue. The next suggestion was to use an ETL tool such as FME or Jitterbit, but I don't know either well enough to do the job quickly.

Thankfully, Git of all things had an answer, or more specifically the Git Bash shell. If you don't have this, it comes as part of the Git for Windows installation. Git Bash includes the GNU split command, which has two options relevant here: one to break a file into fixed-size chunks and one to break it down by a set number of lines.
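If you want to confirm split is available before pointing it at a multi-gigabyte file, asking for the version from a Git Bash prompt is enough:

split --version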

This will split the file into 500 MB chunks

split myHugeCompareFile.sql -b 500m
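A quick directory listing from the same shell confirms the sizes of the pieces (split names them with an x prefix by default, more on that below):

ls -lh x*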

and this will split it into files of 10000 lines each, which is the safer option for a SQL script, since a byte-based split can cut a statement in half

split myHugeCompareFile.sql -l 10000
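If you ever need to reverse the process, concatenating the pieces in shell glob order rebuilds the original, and cmp (assuming it is included in your Git Bash install) will verify byte for byte that nothing was lost:

cat x* > recombined.sql
cmp recombined.sql myHugeCompareFile.sql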

The files created are automatically named xaa, xab, xac and so on. In my case it made life easier if they had the .sql extension, so I modified my split command and added the parameter

--additional-suffix=.sql

which produced xaa.sql, xab.sql etc., which is exactly what I wanted.
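Putting that together, a line-based split with the extension applied looks like this, and if you have sqlcmd on your path you can then feed the chunks to the server in order. This last part is a sketch rather than what I actually ran; the server and database names are placeholders, and the -b flag makes sqlcmd exit with an error code if a chunk fails, so the loop stops rather than ploughing on:

split myHugeCompareFile.sql -l 10000 --additional-suffix=.sql

for f in x*.sql; do
  sqlcmd -S myServer -d myDatabase -i "$f" -b || break
done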


Steve Ellwood

Senior Integrations Officer at Doncaster Council. Any views expressed are entirely my own.