Degraded IO performance on kernel 2.6.32-504.3.3
Hi all,
It seems that somehow I'm haunted by performance degradation issues, so here's my latest one.
Since kernel 2.6.32-504.3.3 I've had a drastic drop in disk I/O performance.
As an example:
dd if=/dev/md1 of=/dev/null bs=64k count=1000000
gives me about 210 MB/s under 2.6.32-431.29.2,
while it gives less than 120 MB/s under 2.6.32-504.3.3.
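For anyone who wants to reproduce the comparison, here's a sketch of the same read test. The scratch-file default is only so the commands can be tried anywhere; substitute /dev/md1 (or your own array) for a real measurement, and drop the page cache first so both kernels are measured cold:

```shell
# Sequential-read benchmark in the spirit of the dd line above.
# TARGET defaults to a scratch file; point it at /dev/md1 for a real test.
TARGET=${TARGET:-/tmp/ddbench.img}
[ -e "$TARGET" ] || dd if=/dev/zero of="$TARGET" bs=64k count=256 2>/dev/null
# On a real device, drop caches first (needs root) so runs are comparable:
#   echo 3 > /proc/sys/vm/drop_caches
RESULT=$(dd if="$TARGET" of=/dev/null bs=64k 2>&1 | tail -n 1)
echo "$RESULT"
rm -f /tmp/ddbench.img
```

The last line dd prints is the throughput summary, which is the number to compare between the two kernels.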
This has a big impact on tape backup time: on 431 it used to reach top tape speed, while on 2.6.32-504.3.3 dump never surpasses 90 MB/s, and backups take twice as long!
I've also seen a drop in VM I/O performance (and of course boot times).
All my HDDs are in software RAID 5 (mdadm).
What's happening?
Thank You
K.
Re: Degraded IO performance on kernel 2.6.32-504.3.3
So am I the only one this time?
Re: Degraded IO performance on kernel 2.6.32-504.3.3
kernel 2.6.32-504.8.1.el6.x86_64 still has the same issue for me
Re: Degraded IO performance on kernel 2.6.32-504.3.3
I'm following this thread but unfortunately I don't have anything relevant to add.
I still experience slower I/O than expected from time to time, but quite frankly I don't feel like debugging it again. I couldn't find a definitive test case to reproduce the problem, and I don't even know where to start... I don't have time for that, so I upgraded a few servers to SSDs and hope for the best.
Just the other day I got a (host-to-VM) ping timeout warning from one of my VMs (now residing on SSD) while copying an (unrelated) large file over the network to the internal md-RAID6 array. Either the machine is still swapping when the page cache is full, or something else is going wrong... but as I said, I haven't checked it in detail.
Re: Degraded IO performance on kernel 2.6.32-504.3.3
I had a problem that looked similar to the one you're talking about; take a look at this topic:
viewtopic.php?f=14&t=6838&start=20
For the degraded performance, the check that really shows the difference to me is dump + pv.
I currently use dump of an ext4 partition to back up, to LTO5, an rsnapshot backup volume that resides on a dedicated 3-disk RAID 5 array.
It is a good benchmark because:
the partition is unmounted during the dump process (it is mounted only during the rsnapshot run at night);
almost everything happens on dedicated hardware (the RAID is on 3 disks that host only that volume, and those disks are on a SAS controller that has only them and the LTO tape drive).
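For reference, the dump + pv check boils down to a pipeline like the one below. The device names are from my setup, so adjust them; I've also included a dry run with throwaway data (cat standing in for pv) so the plumbing can be tried without a tape drive:

```shell
# The real pipeline (my devices; pv sits in the middle so stalls are visible):
#   dump -0 -b 64 -f - /dev/md1p1 | pv | dd of=/dev/nst0 bs=64k
# Dry run of the same plumbing with throwaway data (cat stands in for pv):
BYTES=$(dd if=/dev/zero bs=64k count=16 2>/dev/null | cat | wc -c | tr -d ' ')
echo "$BYTES"
```

The point of pv in the middle is that you can watch the instantaneous rate, so tape-drive stops and slowdowns show up immediately rather than only in dump's averaged progress lines.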
So this is how dump behaves with 2.6.32-504.8.1:
DUMP: Date of this level 0 dump: Tue Feb 24 09:05:54 2015
DUMP: Dumping /dev/md1p1 (/mnt/OnlineBackup) to standard output
DUMP: Label: OnlineBackup
DUMP: Writing 64 Kilobyte records
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 1337167688 blocks.
DUMP: Volume 1 started with block 1 at: Tue Feb 24 09:06:45 2015
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: 0.82% done at 36439 kB/s, finished in 10:06
DUMP: 2.53% done at 56476 kB/s, finished in 6:24
DUMP: 4.37% done at 64944 kB/s, finished in 5:28
DUMP: 6.40% done at 71297 kB/s, finished in 4:52
DUMP: 8.47% done at 75510 kB/s, finished in 4:30
It seems better than the previous kernel, but the tape drive still has a huge number of slowdowns and at least one stop every minute! (That is really too stressful for the LTO tape.)
pv shows that the speed never gets higher than 90-95 MB/s, averaging about 80 MB/s.
This is how dump behaves with 2.6.32-431.29.2:
DUMP: Date of this level 0 dump: Tue Feb 24 09:56:56 2015
DUMP: Dumping /dev/md1p1 (/mnt/OnlineBackup) to standard output
DUMP: Label: OnlineBackup
DUMP: Writing 64 Kilobyte records
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 1337167688 blocks.
DUMP: Volume 1 started with block 1 at: Tue Feb 24 09:57:52 2015
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: 1.03% done at 45972 kB/s, finished in 7:59
DUMP: 3.56% done at 79339 kB/s, finished in 4:30
DUMP: 6.54% done at 97188 kB/s, finished in 3:34
DUMP: 9.46% done at 105427 kB/s, finished in 3:11
DUMP: 12.72% done at 113380 kB/s, finished in 2:51
DUMP: 16.37% done at 121606 kB/s, finished in 2:33
and pv says that the speed is mostly a stable 135 MB/s, though it sometimes goes up to 195 MB/s (this should happen when compressible data is encountered).
So it should be about 30% slower due to the speed limit of LTO5, but actually it is about 50% slower! And no clue WHY!
I know there is now CentOS 7, but I cannot upgrade in the short term, and I don't know if upgrading would solve this issue.
So I hope that somehow the issue will be solved (like the one I had with iSCSI).
Cya
K.
Re: Degraded IO performance on kernel 2.6.32-504.3.3
Hi all,
I experience the same problem and will have to switch to the old kernel. At first I thought I had lost some tuning options, but nothing helped.
The storage is used as a backup-to-disk-to-tape buffer; it was able to write 10 streams at a combined 200 MB/s with one stream reading at least 140 MB/s.
Now I can only read at 60 MB/s while writing. Without write I/O it is still fast enough to feed the tape effectively.
The write performance seems unchanged.
I am using ext4 on a HW RAID 50.
It seems not related to the filesystem; what are you using?
Thank you
Re: Degraded IO performance on kernel 2.6.32-504.3.3
ext4 on mdraid
Re: Degraded IO performance on kernel 2.6.32-504.3.3
Is it possible the interrupts are not being properly balanced?
https://bugzilla.redhat.com/show_bug.cgi?id=911649
My megasas driver's interrupts are all on cpu0, judging from /proc/interrupts.
It would seem that 6.7 better be out fast...
edit: I have downgraded to irqbalance-1.0.4-9.el6_5.x86_64 and the counters are now rising on other CPUs... I'll see if I/O performance is improved...
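A quick way to check whether the counters are actually spreading (this is just generic /proc/interrupts arithmetic, nothing specific to megasas):

```shell
# Sum the per-CPU columns of /proc/interrupts; one CPU carrying nearly
# everything suggests irqbalance is not distributing the load.
PERCPU=$(awk 'NR == 1 { ncpu = NF; next }
              /^[ ]*[0-9]+:/ { for (i = 2; i <= ncpu + 1; i++) total[i-1] += $i }
              END { for (c = 1; c <= ncpu; c++) printf "CPU%d: %d\n", c-1, total[c] }' /proc/interrupts)
echo "$PERCPU"
```

Run it before and after restarting irqbalance; if cpu0's total keeps climbing while the others stay flat, the IRQs are still pinned.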
Re: Degraded IO performance on kernel 2.6.32-504.3.3
I've got three controllers:
one is Intel SATA, another is ASMedia SATA,
and the last is an ATTO SAS HBA.
At this moment I cannot test on the latest kernel.
I hope to have time to run a test soon.
cya
K.
Re: Degraded IO performance on kernel 2.6.32-504.3.3
Does anyone know if the regression has been resolved?