polkit failure on about 1/3 of reboots

mathog
Posts: 258
Joined: 2008/07/09 23:52:06

polkit failure on about 1/3 of reboots

Post by mathog » 2019/07/18 20:20:07

Today I needed to reboot a fully updated CentOS 7 system many times to debug an LSB script which would not start at boot. (Side story: the LSB script needed an account which comes in from ypbind. That account was always available on any ssh login, so manual testing always worked. However, none of the Required-Start parameters in the script enforced that dependency. A wait loop that polls 'ypcat passwd' until it returns a success status solved the problem.) On about 1/3 of these reboots polkit failed to start, and the system is basically unusable without it.
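For anyone hitting the same init-script problem, the wait loop from the side story might look roughly like the sketch below. It is not the exact script used here: the function name wait_for, the 60 tries, and the 2 second delay are illustrative assumptions. Only the 'ypcat passwd' check comes from the description above.

```shell
#!/bin/sh
# wait_for MAX_TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND repeatedly until it exits 0, or gives up after MAX_TRIES attempts.
# (Hypothetical helper; plain POSIX sh, so the variables are not local.)
wait_for() {
    max_tries=$1
    delay=$2
    shift 2
    tries=0
    until "$@" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1    # gave up: the command never succeeded
        fi
        sleep "$delay"
    done
    return 0
}

# In the init script, before starting the daemon (values are illustrative):
# wait_for 60 2 ypcat passwd || exit 1
```

Bounding the number of tries keeps an unreachable NIS server from hanging the boot forever.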

Here are some snippets from /var/log/messages:

Code:

#a boot where polkit came up normally
Jul 18 13:01:45 server polkitd[2399]: Started polkitd version 0.112
Jul 18 13:01:55 server dbus[2390]: [system] Reloaded configuration

#a boot where it failed
Jul 18 11:56:47 server dbus[2385]: [system] Connection has not authenticated soon enough, closing it (auth_timeout=30000ms, elapsed: 30003ms)
Jul 18 11:57:00 server dbus[2385]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Jul 18 11:57:04 server dbus[2385]: [system] Failed to activate service 'org.freedesktop.login1': timed out
Jul 18 11:57:25 server dbus[2385]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Jul 18 11:57:47 server systemd: polkit.service start operation timed out. Terminating.
Jul 18 11:57:47 server systemd: Unit polkit.service entered failed state.
Jul 18 11:57:47 server systemd: polkit.service failed.

When it failed, the only way out was:

Code:

/usr/lib/polkit-1/polkitd --no-debug &
shutdown -r now

The system is using kernel 3.10.0-957.21.3.el7.x86_64. Any ideas on how to figure out why polkit (or dbus) is going belly up?

mathog

Re: polkit failure on about 1/3 of reboots

Post by mathog » 2019/07/18 23:06:33

We also have a small compute cluster of 9 nodes. I rebooted those today too, and 1 of the 9 came up without polkit. All of these machines are Dell PowerEdge T110 or T110 II. But it doesn't "feel" like a hardware issue; it "feels" like a race condition or a similar sort of software problem.

mathog

Re: polkit failure on about 1/3 of reboots

Post by mathog » 2020/04/13 22:11:13

This is still going on, and is still a PITA. In the interim I have learned that the fastest way to fix this without a reboot is to log in remotely over ssh (which is very slow in this state) and run:

Code:

/usr/lib/polkit-1/polkitd --no-debug &
systemctl restart dbus.service
systemctl start polkit.service
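The same three steps can be wrapped so they only run when polkit is actually down. This is a minimal sketch, assuming 'systemctl is-active' still answers correctly while the machine is in this state. The function name recover_polkit and the POLKITD variable are made up for illustration; the commands themselves are the ones listed above.

```shell
#!/bin/sh
# Sketch: run the manual recovery only when polkit.service is not active.
# POLKITD is an illustrative variable; its default is the path used in this thread.
POLKITD="${POLKITD:-/usr/lib/polkit-1/polkitd}"

recover_polkit() {
    # 'is-active --quiet' exits 0 only when the unit is active
    if systemctl is-active --quiet polkit.service; then
        echo "polkit.service is active, nothing to do"
        return 0
    fi
    echo "polkit.service is not active, running recovery"
    "$POLKITD" --no-debug &
    systemctl restart dbus.service
    systemctl start polkit.service
}

# Run as root after logging in:
# recover_polkit
```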
When polkitd fails to start at boot, lots of other things also fail. The system is up, but only barely. Here is part of /var/log/messages following a reboot on a compute node where polkitd did this:

Code:

Apr 13 13:36:47 monkey29 systemd: polkit.service start operation timed out. Terminating.
Apr 13 13:36:47 monkey29 systemd: Failed to start Authorization Manager.
Apr 13 13:36:47 monkey29 systemd: Dependency failed for Dynamic System Tuning Daemon.
Apr 13 13:36:47 monkey29 systemd: Job tuned.service/start failed with result 'dependency'.
Apr 13 13:36:47 monkey29 systemd: Unit polkit.service entered failed state.
Apr 13 13:36:47 monkey29 systemd: polkit.service failed.
Apr 13 13:36:52 monkey29 systemd: tuned.service start operation timed out. Terminating.
Apr 13 13:36:53 monkey29 systemd: Unit tuned.service entered failed state.
Apr 13 13:36:53 monkey29 systemd: tuned.service failed.
Apr 13 13:37:03 monkey29 dbus[1157]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Apr 13 13:37:03 monkey29 systemd-logind: Failed to enable subscription: Failed to activate service 'org.freedesktop.systemd1': timed out
Apr 13 13:37:03 monkey29 systemd-logind: Failed to fully start up daemon: Connection timed out
Apr 13 13:37:03 monkey29 systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
Apr 13 13:37:03 monkey29 systemd: Failed to start Login Service.

It would be really nice to find out how to prevent this!

pjsr2
Posts: 614
Joined: 2014/03/27 20:11:07

Re: polkit failure on about 1/3 of reboots

Post by pjsr2 » 2020/04/14 09:29:28

ypbind failures make polkitd fail and then a lot of other things fail ....
Happened to me and luckily I did not need ypbind anymore and simply disabled it.

Check, double check and triple check your ypbind/nis configuration.

Is there anything helpful for you in this thread: https://access.redhat.com/discussions/3959351 ?
That thread suggests a solution may be in 7.8. Since CentOS 7.8 has not been released yet, you may want to try the CR repos.
See viewtopic.php?f=47&t=73927#p311626

mathog

Re: polkit failure on about 1/3 of reboots

Post by mathog » 2020/04/14 17:56:51

pjsr2 wrote:
2020/04/14 09:29:28
ypbind failures make polkitd fail and then a lot of other things fail ....
Happened to me and luckily I did not need ypbind anymore and simply disabled it.

Check, double check and triple check your ypbind/nis configuration.

Nothing is wrong with the configuration as far as I can tell. They work most of the time; it just fails about 10% of the time (rough estimate) on a reboot, for no apparent reason. (If I had to guess, it would be that the asynchronous way systemd works somehow results in RPC not being fully functional when ypbind needs it, since ypbind fails to connect even though the server and network are fully functional.) The failure seems to hit one machine or another at random, and they are all built from the same image in any case.

I'm not upgrading any software on the production systems at the moment, since I'm working from home and if anything hangs badly on reboot there would be no way to regain control. None would have been rebooted, but there was a brief power failure at work, and those nine machines are not on a UPS, so there was no way around it.
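One way to check the ordering guess would be to look at what systemd has recorded as the ordering and dependencies for ypbind and polkit. This is a minimal sketch, assuming the units are named ypbind.service and polkit.service (polkit.service appears in the logs above; the ypbind unit name is an assumption). show_ordering is just an illustrative helper name.

```shell
#!/bin/sh
# Sketch: print the After= and Wants= lists systemd has recorded for a unit.
# show_ordering is a hypothetical helper name.
show_ordering() {
    systemctl show -p After -p Wants "$1"
}

# Example (unit names as assumed above):
# show_ordering ypbind.service
# show_ordering polkit.service
```

If rpcbind.service does not appear in the After= list for ypbind.service, that would be consistent with the race guess, though it would not prove it.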
