[SOLVED]High CPU usage somewhat related to SFTP

General support questions
carbonfiber636
Posts: 11
Joined: 2019/07/04 11:21:23

Re: High CPU usage somewhat related to SFTP

Post by carbonfiber636 » 2019/07/04 16:57:18

I tried disabling hpet, I think it's disabled given this:

Code: Select all

[mi@scientific-centos ~]$ dmesg | grep -i hpet
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.21.3.el7.x86_64 root=/dev/mapper/centos_scientific--centos-root ro nouveau.modeset=0 rd.driver.blacklist=nouveau crashkernel=auto rd.lvm.lv=centos_scientific-centos/root rd.lvm.lv=centos_scientific-centos/swap rhgb quiet nouveau.modeset=0 hpet=disable
[    0.000000] ACPI: HPET 00000000000fcb99 00038 (v01 DELL    B10K    00000015 ASL  00000061)
[    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.21.3.el7.x86_64 root=/dev/mapper/centos_scientific--centos-root ro nouveau.modeset=0 rd.driver.blacklist=nouveau crashkernel=auto rd.lvm.lv=centos_scientific-centos/root rd.lvm.lv=centos_scientific-centos/swap rhgb quiet nouveau.modeset=0 hpet=disable
[    0.889842] hpet_acpi_add: no address or irqs in _CRS

also:

Code: Select all

[mi@scientific-centos ~]$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc acpi_pm 
The problem is still there, in /proc/interrupts there is no more mention of hpet interrupts, I'll retry the 10 min test.
But anyway you say I should try a BIOS update, I'll look into it when I have the time.
Is there anything else that could be connected with the SFTP process, like what could it do to stop all those interrupts?
Also could there be a way as a workaround to have an idle sftp process that starts at boot without any connected computer?

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: High CPU usage somewhat related to SFTP

Post by TrevorH » 2019/07/04 18:03:17

I am pretty sure that it's not sftp per se that's doing this. I would guess that it just happens to use timers and that alleviates the problem a bit. Even your numbers for those hpet interrupts when you don't have the problem look high to me. On a fairly idle machine that's been up for 14 days, I get this lot

Code: Select all

  90:       4250          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet2
  91:          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet3
  92:          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet4
  93:          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet5
  94:          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet6
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

carbonfiber636
Posts: 11
Joined: 2019/07/04 11:21:23

Re: High CPU usage somewhat related to SFTP

Post by carbonfiber636 » 2019/07/04 18:42:22

ok yes I get what you're saying.
However I would not say "a bit", I mean when I have SFTP running even if the interrupts are higher than you would expect I get kernel space usage based on top about 0.2-0.6% I can't see it go past 1%(which seems normal to me for an idling machine) while without SFTP running I get 26% kernel space usage which is basically a 2 order of magnitude increase so whatever SFTP is doing it's practically almost killing the problem.

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: High CPU usage somewhat related to SFTP

Post by TrevorH » 2019/07/04 20:07:08

Still, your hpet timer interrupts are way too high even when you're not seeing the problem. This is from another machine that I have, this one in production, been up 14 days and has sent and received ~6TB via the network in that time and has half a dozen fairly active VMs on it:

Code: Select all

  74:       4268          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet2
  75:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet3
  76:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet4
  77:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet5
  78:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0  IR-HPET_MSI-edge      hpet6
So my active production machine that's been up for 2 weeks+, has done 12TB of network traffic+ has taken 4268 hpet interrupts yet yours is taking 900,000 in 10 minutes, 90,000 a minute, 1500 a second. So in my 2 weeks uptime, I've taken the same number of interrupts for this as yours does every 3 seconds. And that's when it's behaving itself!
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

carbonfiber636
Posts: 11
Joined: 2019/07/04 11:21:23

Re: High CPU usage somewhat related to SFTP

Post by carbonfiber636 » 2019/07/05 15:06:53

Yes I agree, so then now that HPET is disabled I did the /proc/interrupts test again with a timebase of 25 minutes after reboot.

Without SFTP:

Code: Select all

[mi@scientific-centos ~]$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       
  0:        188          0          0          0   IO-APIC-edge      timer
  1:          2          1          1          1   IO-APIC-edge      i8042
  7:          0          0          0          0   IO-APIC-edge      parport0
  8:          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          1          1          2          2   IO-APIC-edge      i8042
 16:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 17:         14         19         21         20   IO-APIC-fasteoi   uhci_hcd:usb4, uhci_hcd:usb7, snd_hda_intel:card1
 18:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb8, i801_smbus
 22:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb5
 23:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb6
 27:         85       3584         83     779877   PCI-MSI-edge      enp0s25
 28:       9067       1643      59560       1631   PCI-MSI-edge      0000:00:1f.2
 29:        158        150        157        155   PCI-MSI-edge      snd_hda_intel:card0
 30:       2989         29         22          2   PCI-MSI-edge      nvidia
NMI:        584       1071        574       1091   Non-maskable interrupts
LOC:    1525470    1635084    1657928    1781457   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:        584       1071        574       1091   Performance monitoring interrupts
IWI:        156        215        110        126   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:     523166     446232     619187     437062   Rescheduling interrupts
CAL:       4626       4276       5395       5093   Function call interrupts
TLB:      37078      38531      34598      33057   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          6          6          6          6   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0   Posted-interrupt wakeup event
[mi@scientific-centos ~]$ uptime
 16:28:23 up 25 min,  1 user,  load average: 1.45, 1.57, 1.42

with SFTP:

Code: Select all

[mi@scientific-centos ~]$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       
  0:        186          0          0          0   IO-APIC-edge      timer
  1:          2          1          1          1   IO-APIC-edge      i8042
  7:          0          0          0          0   IO-APIC-edge      parport0
  8:          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          1          1          2          2   IO-APIC-edge      i8042
 16:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 17:         17         15         26         16   IO-APIC-fasteoi   uhci_hcd:usb4, uhci_hcd:usb7, snd_hda_intel:card1
 18:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb8, i801_smbus
 22:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb5
 23:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb6
 27:        102       3578         93     754521   PCI-MSI-edge      enp0s25
 28:       9775       1978      59671       1973   PCI-MSI-edge      0000:00:1f.2
 29:        149        153        150        158   PCI-MSI-edge      snd_hda_intel:card0
 30:       3354         44         17          4   PCI-MSI-edge      nvidia
NMI:        152        203        118        179   Non-maskable interrupts
LOC:     254632     293738     298788     809650   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:        152        203        118        179   Performance monitoring interrupts
IWI:       2186       2506       4277      39026   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:     187839     167326     186857     147548   Rescheduling interrupts
CAL:       5436       5581       5572       6060   Function call interrupts
TLB:      14595      11398      13565      11335   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          6          6          6          6   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0   Posted-interrupt wakeup event
[mi@scientific-centos ~]$ uptime
 16:55:47 up 25 min,  1 user,  load average: 0.04, 0.05, 0.16
 
As far as I can tell Local timer interrupts are much higher in number, as well as Rescheduling interrupts.
TLB shootdowns are also a bit higher.
What would you suggest to do now? I looked into the new BIOS versions for my machine but in the list of fixes there doesn't appear to be something related to timers, considering that I've never done a BIOS update i'd like to try something else first.
Is there any way to get more detailed information on these interrupts?

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: High CPU usage somewhat related to SFTP

Post by TrevorH » 2019/07/05 15:15:12

So since the problem now moved from hpet to local timer interrupts, I am suspecting that you either have a hardware problem (maybe fixed by a BIOS update) or that you have something else running which is generating all those timer interrupts. I still have no idea why running sftp would appear to make it better, especially when you say that running ssh instead does not since sftp uses ssh as its transport.

Can you try booting in single user mode so that nothing except the essential system tasks are running? There wouldn't be any network active so that also may be a good diagnostic - see if the interrupt counts are increasing in single user mode, if they aren't then try starting just the network (and maybe firewalld/iptables first !) and see if that kicks the problem off. If neither make it do it then try starting the things that would normally be active in your usual runlevel and see which of them makes it do it.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

carbonfiber636
Posts: 11
Joined: 2019/07/04 11:21:23

Re: High CPU usage somewhat related to SFTP

Post by carbonfiber636 » 2019/07/05 15:26:51

TrevorH wrote:
2019/07/05 15:15:12
So since the problem now moved from hpet to local timer interrupts, I am suspecting that you either have a hardware problem (maybe fixed by a BIOS update) or that you have something else running which is generating all those timer interrupts. I still have no idea why running sftp would appear to make it better, especially when you say that running ssh instead does not since sftp uses ssh as its transport.

Can you try booting in single user mode so that nothing except the essential system tasks are running? There wouldn't be any network active so that also may be a good diagnostic - see if the interrupt counts are increasing in single user mode, if they aren't then try starting just the network (and maybe firewalld/iptables first !) and see if that kicks the problem off. If neither make it do it then try starting the things that would normally be active in your usual runlevel and see which of them makes it do it.
Just to be sure before doing that(because if I don't have network connection I have to pull the server out and attach a monitor), do you think that a "screenshot" of top could help? There are several kernel processes that I can see in top and htop but I don't know if any of them shouldn't be running or if there is something off there.

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: High CPU usage somewhat related to SFTP

Post by TrevorH » 2019/07/05 15:44:14

A screenshot wouldn't hurt and might help but I am pretty sure that your cpu usage is coming from those interrupts. Now it's "just" a matter of finding out what is causing them and if it's something broken or expected. Even if a piece of hardware has no driver and is not in use, it can generate interrupts which the system has to handle. Yours appears to be generating something like 1500 a second which is quite a lot - one every 0.000666s - so I am not surprised that one core is running flat out. What's still a mystery is why it stops when you have sftp running.

What machine is this? I can see it's a Dell but that only narrows it down to one of about 10,000 types :)
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

carbonfiber636
Posts: 11
Joined: 2019/07/04 11:21:23

Re: High CPU usage somewhat related to SFTP

Post by carbonfiber636 » 2019/07/05 15:57:31

TrevorH wrote:
2019/07/05 15:44:14
A screenshot wouldn't hurt and might help but I am pretty sure that your cpu usage is coming from those interrupts. Now it's "just" a matter of finding out what is causing them and if it's something broken or expected. Even if a piece of hardware has no driver and is not in use, it can generate interrupts which the system has to handle. Yours appears to be generating something like 1500 a second which is quite a lot - one every 0.000666s - so I am not surprised that one core is running flat out. What's still a mystery is why it stops when you have sftp running.

What machine is this? I can see it's a Dell but that only narrows it down to one of about 10,000 types :)
Ok so I think I found the problem, I'm sorry as it turns out it's all my fault and thinking back to it maybe I should have tried it as soon as I saw the connection with SFTP:

I had a script that started at boot that changed the niceness of any sftp-server process from 0 to -14, and once I killed it cpu usage dropped to normal levels.

I realized that I should assume it's a problem I created so I tried thinking of custom scripts I had running so I immediately tried the one that was sftp related.
I don't get why still because I had another one related to ssh virtually identical:

SFTP:

Code: Select all

#!/bin/bash
while true
do
  PID=$(ps -ef | grep sftp-server | grep -v grep | sed s/[a-z\ ]*\ // | sed s/\ .*//)
  if [ "$PID" != "" ]
  then
    renice -14 $PID
    sleep 4m
  fi
done
SSH:

Code: Select all

#!/bin/bash
while true
do
  PID=$(ps -ef | grep sshd | grep -v grep | sed s/[a-z\ ]*\ // | sed s/\ .*//)
  if [ "$PID" != "" ]
  then
    renice -4 $PID
    sleep 4m
  fi
done
they are identical except the name of the process and the niceness value but the ssh one wasn't triggering interrupts like the sftp one.

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: High CPU usage somewhat related to SFTP

Post by TrevorH » 2019/07/05 15:59:34

If there's no sftp process running then it doesn't do the sleep, so it's looping. There is almost always an sshd running so it never does that in the other script.

Glad it's explained!
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

Post Reply