Size on disk issue
-
- Posts: 12
- Joined: 2009/10/26 10:21:14
Size on disk issue
I have a problem understanding why a bunch of files that occupied a certain amount of space on one drive occupy much more on another. Here are the details: we have a file server that until today stored the files on a RAID 0 device made of 2 drives, one of 750 GB and the other of 250 GB. The total space available on the md0 device formed by the 2 drives, after formatting (ext3), was somewhere around 900 GB.
The RAID device was almost full, so we purchased a 2 TB drive to replace it. After formatting (ext4 this time) the available space was 1.8 TB.
I copied with rsync the files from the old RAID to the new drive.
Surprise! The space occupied by the files copied from the RAID drive is 1.3 TB!!! How come there is such a difference in the space taken by the same files on the old RAID device and on the new 2 TB drive?
What can I check?
I ran rsync several times with the --delete option enabled to be sure that the files from the RAID device are mirrored exactly, without any leftovers.
But still, the occupied size reported by df -H is the same: 1.3 TB.
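One quick way to see whether sparse files explain a gap like this (a generic check, not from the original post; /mnt/newdisk below is a placeholder for wherever the new drive is mounted):

```shell
# du's default output counts blocks actually allocated on disk, while
# --apparent-size sums the file lengths instead. A large gap between the
# two numbers suggests sparse files were expanded during the copy.
# /mnt/newdisk is a placeholder path.
du -sh /mnt/newdisk
du -sh --apparent-size /mnt/newdisk
```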
-
- Posts: 10642
- Joined: 2005/08/05 15:19:54
- Location: Northern Illinois, USA
Size on disk issue
A RAID 1 formed from a 250G drive and a 750G drive will be less than 250G, not 900G.
Your 1.8T filesystem likely uses a bigger block size than your older filesystem, so files will
take up more disk space.
Re: Size on disk issue
I'd suspect that you had some sparse files on the original file system and you didn't tell rsync to use --sparse to save space on the destination.
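A sketch of what that rsync invocation might look like (the paths are placeholders for this setup; --sparse is the long form of -S):

```shell
# -a preserves permissions, times and ownership; -S (--sparse) makes
# rsync try to re-create holes on the destination instead of writing
# literal zero blocks. Both paths below are placeholders.
rsync -aS --delete /mnt/oldraid/ /mnt/newdisk/
```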
-
- Posts: 12
- Joined: 2009/10/26 10:21:14
Re: Size on disk issue
Sorry, I corrected that. It was RAID 0, not RAID 1.
-
- Posts: 12
- Joined: 2009/10/26 10:21:14
Re: Size on disk issue
[quote]
TrevorH wrote:
I'd suspect that you had some sparse files on the original file system and you didn't tell rsync to use --sparse to save space on the destination.[/quote]
I see ... maybe, I didn't think about that. Any way to solve the issue?
Re: Size on disk issue
If you still have the old disks then you could recopy the data, but I suspect you either haven't got them or the data on the new disk has changed since the original copy. I did find a perl script online that purports to find files occupying fewer blocks than their reported size indicates, but I am not entirely sure that it's bug free :-) However, it looks like this:
[code]
#!/usr/bin/perl -w
use strict;
use warnings;
use File::Find;

sub process_file {
    my $f = $File::Find::name;
    return unless -f $f;
    # stat() reports st_blocks in 512-byte units regardless of the
    # filesystem block size, so compare $blocks * 512 with the size.
    my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
        $atime, $mtime, $ctime, $blksize, $blocks) = stat($f);
    if ($blocks * 512 < $size) {
        printf "\t%s => SZ: %u BLKS: %u = %u bytes on disk\n",
               $f, $size, $blocks, $blocks * 512;
    }
}
find(\&process_file, ("/mnt/whereever"));
[/code]
You'd need to change the last line to start the search where your old disks are mounted. That would find any sparse files under /mnt/whereever and show you the apparent size and then how many bytes were actually used. Once identified, you would then need to work out whether the file has since changed or whether you can recopy it using something that respects sparse files (cp --sparse=auto -p oldfile newfile?)
-
- Posts: 12
- Joined: 2009/10/26 10:21:14
Re: Size on disk issue
Thank you very much for your reply. Fortunately, I have a spare 1.5 TB drive, so I think I am going to copy all the files there and then copy them back with the correct parameters this time.
Regarding block size, as someone suggested earlier, what do you think? I formatted the drive with default settings. I didn't set the block size. Should I change it?
Thanks again.
Re: Size on disk issue
I just tested copying a file containing nothing but binary zeroes, and using --sparse=always with cp does make the new output file a sparse one.
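For anyone who wants to reproduce that test, a minimal version (the file names are arbitrary):

```shell
# Write 10 MB of literal zero bytes (allocated on disk, not a hole),
# then copy with --sparse=always: cp detects the zero runs and punches
# holes in the copy instead of writing them out.
dd if=/dev/zero of=zeros bs=1M count=10 2>/dev/null
cp --sparse=always zeros zeros-sparse
ls -ls zeros zeros-sparse   # first column shows blocks actually allocated
rm -f zeros zeros-sparse
```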
You can find out the block size, if the filesystem is ext3 (or ext4), by running e.g.
[code]
tune2fs -l /dev/mapper/vg500-LogVolTmp | grep Block
[/code]