Big disparity between "Size" and "Size on Disk"

Julian
Lemon Quarter
Posts: 1389
Joined: November 4th, 2016, 9:58 am
Has thanked: 534 times
Been thanked: 677 times

Big disparity between "Size" and "Size on Disk"

#173800

Postby Julian » October 15th, 2018, 9:15 am

My new PC setup continues. I set up Microsoft's File History backup process last night to back up my new PC's data drive to my new Linux (Raspbian) based NAS. File History finished making its initial backup copy overnight and I just looked at the dataset size as a sanity check to see if it went OK. Usually when I right-click on a file or folder in Windows, on either NTFS or ReFS, I see only minor disparities between what the Properties/General tab displays for "Size" and "Size on disk" ("SoD"). In this case, however, I see a big disparity.

Looking at the MS File History save directory on the NAS from my Windows 10 PC, if I right-click to get the properties I see a Size of 32.7 GB and a SoD of 59.9 GB. For comparison, if I look at the size of my local D: data drive (by selecting all the top-level directories in D:\ and pulling up the right-click properties pane) I see a Size of 31.2 GB and a SoD of 31.3 GB, which is the sort of Size/SoD disparity that I would expect due to block sizes. (BTW - I'm not expecting the Size figures on my NAS and PC folders to be exactly the same because File History is also set to copy just a few extra folders from my PC's C: drive as well as all of my D: drive.)

My Windows D: drive is a ReFS file system (actually a Storage Spaces two-way-mirrored Storage Pool) with cluster size 4K and my target NAS file system is a mdadm RAID1 array formatted as Ext4 with 4K block size and advertised over my local wired LAN as a Samba share. If I do a 'du -sh' on the Linux system it returns "33G".
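For anyone wanting to repeat the Linux-side check without relying on du's rounding, here is a minimal sketch in Python. It assumes it is run on the NAS itself (not over the Samba share) and the function name `tree_sizes` is made up for illustration. It sums both the byte length of each file and the space the filesystem says is allocated to it, roughly what GNU `du --apparent-size` and plain `du` report respectively. Unlike du, it counts hard-linked files once per link.

```python
import os


def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for all non-directory entries under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are measured themselves, not their targets.
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            # st_blocks is counted in 512-byte units on Linux, whatever the
            # filesystem block size is. It is not available on Windows.
            allocated += st.st_blocks * 512
    return apparent, allocated
```

Calling `tree_sizes("/path/to/backup")` (the path is a placeholder) and dividing both numbers by `2**30` gives figures in GiB to set against the Windows "Size" and "Size on disk" values.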

This isn't really an issue for me. I have tons of space on my NAS drive, and doing the 'du' on the Linux system seems to indicate that the SoD value displayed in Windows is simply wrong, but I'm curious and would like to know if anyone has an explanation. I'm guessing it's just some bug/glitch/weirdness in the Linux Samba/Win10 implementation. I'll probably never look at this info again so I can easily ignore it, but this is a brand new PC build, so now is a good time to try to understand any anomalies in my new setup.

Any ideas?

P.S. - A big thanks to the site owners here. I've been doing what I always do and keeping a step-by-step log book of all the installation procedures done to set up my new PC, settings changes, what third party software is installed and from where, etc. (I think the logbook thing is a legacy of the summers I spent as a teenager doing part time computer operator jobs at big ICL and Honeywell mainframe sites.) I keep the logbook in a note in Evernote and it's been driving me mad that in Evernote every time I type "D:" it changes it to :smile: and a bit of googling seems to indicate that the behaviour can't be turned off. Thank goodness that doesn't happen here because it is seriously annoying! (The workaround is to type D<space>: and then come back later and delete the spaces so it's obviously just an input-time thing.)

- Julian

gryffron
Lemon Quarter
Posts: 3640
Joined: November 4th, 2016, 10:00 am
Has thanked: 557 times
Been thanked: 1616 times

Re: Big disparity between "Size" and "Size on Disk"

#173810

Postby gryffron » October 15th, 2018, 9:50 am

For each individual file, Size-on-Disk is always rounded up to a complete number of clusters. In your case, a multiple of 4k. So even if you store a 1 byte file, it will still occupy 4k. If you store a 5k file, it will occupy 8k, the lowest multiple it can fit in. A 1,000,001 byte file would occupy 1,004,000[*]. The maximum possible discrepancy between size and size-on-disk is always 4k(-1), regardless of the size of the file.

This means very big files, like movies and photos, result in a small discrepancy between size and size-on-disk. A 1Mbyte photo couldn't be wrong by more than 0.4%. A 100Mbyte movie couldn't be wrong by more than 0.004%.
Very small files can have huge discrepancies. A 1 byte file would be out by about 400,000%.
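The rounding described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using real 4,096-byte clusters rather than the rounded 4,000 in the examples; the function name `size_on_disk` is made up, and it ignores special cases such as filesystems that keep very small files inside their metadata.

```python
def size_on_disk(size_bytes, cluster_bytes=4096):
    """Round a file size up to a whole number of clusters."""
    if size_bytes < 0 or cluster_bytes <= 0:
        raise ValueError("size must be >= 0 and cluster size must be > 0")
    clusters = -(-size_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


for size in (1, 5 * 1024, 1_000_001):
    on_disk = size_on_disk(size)
    slack = on_disk - size
    print(f"{size:>9,} bytes -> {on_disk:>9,} on disk ({slack:,} bytes of slack)")
```

With 4,096-byte clusters the 1,000,001 byte file comes out at 1,003,520 bytes, and the slack for any file is at most 4,095 bytes.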

I don't use file history, so I don't know how it works. But I suspect your archive of file history contains a huge number of tiny files. One for each incremental change? Hence the huge discrepancy between size and size-on-disk. Whereas your data disk has a small number of big files. Look at C:\Windows if you can. That also has lots of small system files so I'd expect it to have a large discrepancy. Or, you can count the number of files anywhere by searching for *.*.

Gryff

[*] For pedants: Actually it wouldn't because 4k is really 4,096 bytes, not 4,000. Same for all the calculations here. I've left all the examples as 4,000 as a more readable illustration.

mc2fool
Lemon Half
Posts: 7893
Joined: November 4th, 2016, 11:24 am
Has thanked: 7 times
Been thanked: 3051 times

Re: Big disparity between "Size" and "Size on Disk"

#173833

Postby mc2fool » October 15th, 2018, 11:06 am

Julian wrote:This isn't really an issue for me, I have tons of space on my NAS drive and doing the 'du' on the Linux system seems to indicate that the SoD value displayed in Windows is simply wrong but I'm curious and would like to know if anyone has an explanation. I'm guessing it's just some bug/glitch/weirdness in the Linux Samba/Win10 implementation and I'll probably never look at this info again so can easily ignore it but this is a brand new PC build so now is a good time to try and understand any anomalies in my new setup just to understand what's what with my new setup.

Any ideas?

Samba implementations usually misreport actual size-on-disk for, I believe, performance reasons. Don't remember the details of why but if you google around I'm sure you'll find the answer.

My router reports the size on disk of files on the SMB served USB flash drive I have plugged into it rounded up to the next 1MB and my NAS drive does so up to the next 128MB! However, in both cases the actual cluster size is 4K.
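To show how much a coarse reporting unit can inflate the total, here is a rough sketch in Python. The file set is entirely hypothetical (20,000 files of 100 KiB each) and `reported_size_on_disk` is a made-up name. It simply rounds every file up to the unit the server reports in and adds up the results. Samba's smb.conf documents an `allocation roundup size` setting that may be what is behind this kind of rounding, but that is an assumption about the devices above, not something confirmed here.

```python
def reported_size_on_disk(file_sizes, unit_bytes):
    """Sum the sizes after rounding each file up to a multiple of unit_bytes."""
    return sum(-(-size // unit_bytes) * unit_bytes for size in file_sizes)


# Hypothetical backup set: 20,000 files of 100 KiB each.
sizes = [100 * 1024] * 20_000
actual = sum(sizes)

for label, unit in (("4 KiB", 4 * 1024), ("1 MiB", 1024**2), ("128 MiB", 128 * 1024**2)):
    total = reported_size_on_disk(sizes, unit)
    print(f"unit {label:>7}: {total / 2**30:10.2f} GiB reported for {actual / 2**30:.2f} GiB of data")
```

The data itself never changes between the three lines of output; only the rounding unit does, which is why a du on the server and the Windows properties pane can disagree so widely.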

Julian
Lemon Quarter
Posts: 1389
Joined: November 4th, 2016, 9:58 am
Has thanked: 534 times
Been thanked: 677 times

Re: Big disparity between "Size" and "Size on Disk"

#173838

Postby Julian » October 15th, 2018, 11:32 am

Thanks, you guys are great.

I thought it must be Samba weirdness so good to have support for that theory. I did consider block size issues on incremental backups but the data in question is currently frozen (as in I have copied it across from my old PC but am making sure that I don't make any modifications for another day or so until I have everything stable) so there are no deltas on the baseline initial snapshot yet.

Gryff's discussion of block sizes(*) just triggered a big flashback to my Unix days when I worked at AT&T in the Unix kernel development team, and as it happens file systems were my main area, although mostly at the abstraction layer level rather than implementing any of the specific file systems themselves. It was pretty much exactly 30 years ago (wow!). I now remember from the Unix file systems I worked on/with that the block size thing isn't always quite as straightforward as you (Gryff) describe. Some Unix file systems (I forget which) actually had the ability to sub-divide a raw FS block into a defined number of smaller chunks and could store the "tail" of a bigger file, or the entirety of a very small file, in one of those chunks of a full FS block. I have no idea if any of that technology made it across to the Linux world, but thanks for prompting that trip down memory lane Gryff (and making me realise how old I'm getting!).

(*) One phrase in my original post ("... which is the sort of Size/SoD disparity that I would expect due to block sizes") was intended to allude to exactly the block size effects that you describe but I see it wasn't 100% clear what I was referring to.

- Julian

chas49
Lemon Quarter
Posts: 1989
Joined: November 4th, 2016, 10:25 am
Has thanked: 221 times
Been thanked: 473 times

Re: Big disparity between "Size" and "Size on Disk"

#174003

Postby chas49 » October 15th, 2018, 9:20 pm

Julian wrote:P.S. - A big thanks to the site owners here. I've been doing what I always do and keeping a step-by-step log book of all the installation procedures done to set up my new PC, settings changes, what third party software is installed and from where, etc. (I think the logbook thing is a legacy of the summers I spent as a teenager doing part time computer operator jobs at big ICL and Honeywell mainframe sites.) I keep the logbook in a note in Evernote and it's been driving me mad that in Evernote every time I type "D:" it changes it to :smile: and a bit of googling seems to indicate that the behaviour can't be turned off. Thank goodness that doesn't happen here because it is seriously annoying! (The workaround is to type D<space>: and then come back later and delete the spaces so it's obviously just an input-time thing.)


This (https://help.evernote.com/hc/en-us/arti ... formatting) suggests you can switch it off...

