Ways to Manage Large Excel Files (through zip, xls, xlsx, xlsb, pivot table caching)
Managing the file size of Excel files can seem a challenge, but there are some fairly simple steps you can take to make it a bit easier.
This post will take you through file management through File Formats, File Formatting and File Content, before finishing with some file sharing tips.
First, a bit of a history lesson…
Once upon a time, you would usually need to zip a file in order to see any notable reduction in file size.
As an example, I’ve got a large data dump, which I’ve saved in xls format, with a file size 31mb
If you then Zip this file you will see a significant file size reduction, around 9mb, so zipping files made a lot of sense (especially as using a proper zip program was also the only way to securely encrypt a file for sharing)
Then, with Office 2007, came the xlsx format. This is a significant change, as Microsoft introduced Office Open XML – an ‘open’ standard that other programs could read and interpret in the same way Excel itself would.
Essentially though, it’s a collection of XML formatted files, held in a file container. And what file container is used? Zip! So, an xlsx file is actually a zip file, and we can see this by renaming our *.xlsx file to *.zip, and then opening the file
Therefore, when we save a file as xlsx, we’re actually saving it as a zip file, and so zipping it further is of little benefit
The same file I had before, saved as XLSX file, is now 10.8mb
Of course, you could use a stronger zip compression (such as zipx format), but given that Microsoft has finally sorted out decent file encryption (with caveats: see my post on File Security here), there’s little benefit to be had zipping a file
Having said that, don’t think that we can’t easily make our file even smaller!
There is another excel format which is often overlooked – *.xlsb – Excel Binary Format
This saves the file in machine-ready binary code, so not only will it save smaller, but will open much faster as well, and there is no reduction in functionality (and is also Macro-Enabled). In theory, files saved in xlsb can’t be opened in other spreadsheet programs, but as I understand it, Microsoft have released the xlsb standards, so it could be implemented in other programs, but have not verified this. I’d stick to this format only if you know it only needs to be opened with Excel
So what does that mean for our file?
Well, our 30mb xls file, is now tiny in comparison
Ok, so you’ve saved your file in the most appropriate file format – but you still want to slim it down?
Pivot Tables / Charts
It’s no secret – pivot tables and charts are great!
But, they’re very data hungry, and by default, all that data is cached within the file.
In my xlsx file, I’ve added three small pivot tables from the three tabs of data in the file.
Suddenly, my file as ballooned from 10.8mb to 16.1mb
This is because it’s caching the data. You can demonstrate this by deleting the original data tabs, then adding additional fields to the pivot table – it still has all that data stored.
What do we do? We can do a few things…
Right click on your Pivot Table, and select PivotTable Options
Select the Data tab, and untick the ‘Save source data with file’ option
This immediately brings my file back down to 10.8mb
You see, the pivot table itself really doesn’t add anything to the file size.
Now, if you’re sharing the pivot table, it might be that you don’t actually need the data behind it anymore, so you could delete those worksheets feeding them giving you a teeny tiny file
End of the day, if you don’t need to send the underlying data, then why keep it? I tend to have the data tabs in a ‘working file’, and then remove them before sharing the outputs.
You could, of course, opt to keep the data cached within the pivot tables, and remove the data tabs. Personally, I wouldn’t recommend it, as you have no way of validating the data in the pivot tables, but it’s an option
Formatting is an interesting one.
If you apply a single set of formats to a spreadsheet, you might see a modest increase in file size.
But if you go crazy and use many different formats, you’ll quickly increase your file size.
Try to think of applying formatting (colours, fonts, number formats etc) as applying a set of instructions.
If you apply the same format to a range – whether it’s a small range or large range, it’s a single instruction so doesn’t make much difference.
If you start giving Excel multiple instructions; that is when you might start seeing a file increase substantially.
Keep formatting simple and consistent to manage file sizes – and it looks better too!
Slightly obvious, but worth mentioning – formulas increase file size!
If I add a column with a simple formula to each of my 3 data pulls (each has 65k row), my file size is now 13.9mb
Paste these as values, and I’m back down again
If you want to keep a formula for use again in the future, why not keep the formula in the first couple of rows, and paste values the rest?
Again it’s fairly obvious, but ensure you don’t have a lot of rows / columns at the end of your file which Excel believes are being used.
Best thing to do is find your last row, highlight the first empty row, [CTRL]+[SHIFT]+[Down Arrow] to the bottom, then delete entire rows.
Repeat with columns to remove any unused columns.
Sharing Large Files
Is your file still too large to email?
If the recipient uses Office 365, as we do, then even if they’re in a different organisation, you can still share files with them. Just save them to one drive, log into the web one drive, setup sharing, and enter their email address.
You can set read only or write access to the file, and they will receive