CentOS 7: Complete Freeze at seemingly random times

mschwart
Posts: 6
Joined: 2019/08/23 20:17:17

CentOS 7: Complete Freeze at seemingly random times

Post by mschwart » 2019/08/26 16:37:05

Hello

I am experiencing freezes at random intervals, and I am struggling to diagnose them. When it happens, the computer freezes completely: the display stops updating, mouse and keyboard input does not register, and I cannot SSH into the machine. Interestingly, the red light under the mouse goes out, and pressing Num Lock, Caps Lock or Scroll Lock no longer toggles the corresponding LEDs on the keyboard. The only way to recover is a hard reset with the power button on the case, since the machine accepts no input at all. Sometimes it freezes after a few hours of work, sometimes after 10 minutes, and sometimes before I can even log in. /var/log/messages does not help, because there are no messages near the time of the freeze. How would I go about diagnosing this? I am not the most experienced user.
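When a machine hangs hard, the useful clue is often *when* logging stopped rather than what was logged. A minimal Python 3 sketch, assuming that every boot writes a line containing "kernel: Linux version" to /var/log/messages (the function name is hypothetical, not an existing tool), that reports the last line written before each boot:

```python
import re
from typing import List, Optional

# Assumption: each boot logs a line containing "kernel: Linux version ..."
# to /var/log/messages, so that line marks the start of a new boot.
BOOT_MARKER = re.compile(r"kernel: Linux version ")


def last_lines_before_boots(lines: List[str]) -> List[Optional[str]]:
    """For every boot marker, return the log line that came just before it.

    None means the marker was the very first line, so nothing precedes it.
    """
    results: List[Optional[str]] = []
    previous: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER.search(line):
            results.append(previous)
        previous = line
    return results
```

Run as root, something like last_lines_before_boots(open('/var/log/messages', errors='replace').readlines()) gives one entry per boot; comparing each entry's timestamp with the time the freeze was noticed shows how long before the forced reset the logging stopped.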

Unfortunately, I inherited this machine from the previous person at this desk, so I do not know how long this has been going on. The only change I have made since I got it is updating the Nvidia drivers, and to be honest I am not sure whether that is the cause. When I received the machine, it did not recognize a second monitor; updating the Nvidia drivers fixed that.

I appreciate anyone's time/suggestions/help! Thanks!

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: CentOS 7: Complete Freeze at seemingly random times

Post by TrevorH » 2019/08/26 18:16:35

Are the keyboard LEDs flashing on and off when it freezes?

Are you up to date? Run yum update as root to check. If you are not, you may want to answer N at the prompt and read the Release Notes for each release that has come out since your current one before you continue.

What's the output of uname -r?
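After a yum update, the running kernel (what uname -r prints) is only the newest one once the machine has been rebooted. A minimal Python 3 sketch for checking that, assuming el7-style release strings and kernel images named /boot/vmlinuz-<release>; the helper names are hypothetical and the comparison is a simplification, not a full RPM version comparison:

```python
import os
import re
from typing import List, Tuple


def kernel_version_key(release: str) -> Tuple[int, ...]:
    """Sort key for el7 kernel releases such as '3.10.0-957.27.2.el7.x86_64'.

    Simplification (assumption): only the numeric parts before '.el' are
    compared; this is not a full RPM version comparison.
    """
    base = release.split(".el", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", base))


def newest_kernel(releases: List[str]) -> str:
    """Return the highest release according to kernel_version_key."""
    return max(releases, key=kernel_version_key)


def installed_kernels(boot_dir: str = "/boot") -> List[str]:
    """Kernel releases derived from /boot/vmlinuz-* names, skipping rescue images."""
    prefix = "vmlinuz-"
    return [
        name[len(prefix):]
        for name in os.listdir(boot_dir)
        if name.startswith(prefix) and "rescue" not in name
    ]
```

os.uname().release returns the same string as uname -r; if it differs from newest_kernel(installed_kernels()), a reboot after the update is probably still pending.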

Oh, and how did you install the nvidia drivers? ELRepo?
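Drivers installed from outside the distribution, such as the NVIDIA modules, taint the kernel, and kernel messages report this in a "Tainted:" field. A minimal Python 3 sketch that decodes the bitmask in /proc/sys/kernel/tainted; the bit assignments follow the kernel's "Tainted kernels" documentation, but only a subset of flags is listed and the function names are hypothetical:

```python
from typing import List, Tuple

# Subset of kernel taint flags: bit number -> (letter, meaning).
# Only flags relevant to this thread are listed.
TAINT_FLAGS = {
    0: ("P", "proprietary (non-GPL) module loaded"),
    4: ("M", "machine check exception (hardware error) reported"),
    5: ("B", "bad page reference or unexpected page flags seen"),
    7: ("D", "kernel oops or BUG occurred"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
    14: ("L", "soft lockup occurred"),
}


def decode_taint(value: int) -> List[Tuple[str, str]]:
    """Return (letter, meaning) for each listed flag that is set in the bitmask."""
    return [TAINT_FLAGS[bit] for bit in sorted(TAINT_FLAGS) if value & (1 << bit)]


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the running kernel's taint bitmask (stored as a decimal integer)."""
    with open(path) as fh:
        return int(fh.read().strip())
```

A value of 12289 (bits 0, 12 and 13) decodes to P, O and E, which matches the nvidia(POE) modules listed in the kernel warning later in this thread; an M or B flag would point more directly at hardware.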
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

mschwart
Posts: 6
Joined: 2019/08/23 20:17:17

Re: CentOS 7: Complete Freeze at seemingly random times

Post by mschwart » 2019/08/26 19:47:01

First, thanks for your reply!

1.) The keyboard LEDs are not flashing. They stay constantly on and do not toggle even when I press Caps/Num/Scroll Lock.

2.) I am up to date.

3.) 3.10.0-957.27.2.el7.x86_64

4.) I followed this guide. ( https://www.advancedclustering.com/act_ ... -centos-7/ ).

I may have found something in /var/log/messages from this latest crash. I get several messages like this:

Aug 26 12:57:55 chewie kernel: ------------[ cut here ]------------
Aug 26 12:57:55 chewie kernel: WARNING: CPU: 10 PID: 15211 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
Aug 26 12:57:55 chewie kernel: list_del corruption. prev->next should be ffffeef240e97aa0, but was dead000000000100
Aug 26 12:57:55 chewie kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter mpt2sas mptctl mptbase rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fuse cachefiles fscache sunrpc dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi snd_hda_codec_hdmi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer sg snd mei_me soundcore mei
Aug 26 12:57:55 chewie kernel: lpc_ich ioatdma i2c_i801 ipmi_si acpi_pad ip_tables xfs libcrc32c nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) sr_mod cdrom sd_mod crc_t10dif crct10dif_generic nouveau video mxm_wmi i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci crct10dif_pclmul crct10dif_common crc32c_intel libahci ixgbe drm megaraid_sas libata mpt3sas mdio ptp raid_class pps_core scsi_transport_sas dca wmi drm_panel_orientation_quirks ipmi_devintf ipmi_msghandler
Aug 26 12:57:55 chewie kernel: CPU: 10 PID: 15211 Comm: watch Tainted: P OE ------------ 3.10.0-957.27.2.el7.x86_64 #1
Aug 26 12:57:55 chewie kernel: Hardware name: Supermicro Super Server/X10QBL-4CT, BIOS 3.0a 11/23/2016
Aug 26 12:57:55 chewie kernel: Call Trace:
Aug 26 12:57:55 chewie kernel: [<ffffffffad764147>] dump_stack+0x19/0x1b
Aug 26 12:57:55 chewie kernel: [<ffffffffad098848>] __warn+0xd8/0x100
Aug 26 12:57:55 chewie kernel: [<ffffffffad0988cf>] warn_slowpath_fmt+0x5f/0x80
Aug 26 12:57:55 chewie kernel: [<ffffffffad1c1082>] ? __free_memcg_kmem_pages+0x22/0x50
Aug 26 12:57:55 chewie kernel: [<ffffffffad396991>] __list_del_entry+0xa1/0xd0
Aug 26 12:57:55 chewie kernel: [<ffffffffad3969cd>] list_del+0xd/0x30
Aug 26 12:57:55 chewie kernel: [<ffffffffad1c08ed>] free_pcppages_bulk+0x1bd/0x3a0
Aug 26 12:57:55 chewie kernel: [<ffffffffad1c0d91>] free_hot_cold_page+0x141/0x160
Aug 26 12:57:55 chewie kernel: [<ffffffffad1c0df6>] free_hot_cold_page_list+0x46/0xa0
Aug 26 12:57:55 chewie kernel: [<ffffffffad1c657e>] release_pages+0x24e/0x430
Aug 26 12:57:55 chewie kernel: [<ffffffffad20098d>] free_pages_and_swap_cache+0xad/0xd0
Aug 26 12:57:55 chewie kernel: [<ffffffffad1e582c>] tlb_flush_mmu.part.76+0x8c/0xe0
Aug 26 12:57:55 chewie kernel: [<ffffffffad1e70d5>] tlb_finish_mmu+0x55/0x60
Aug 26 12:57:55 chewie kernel: [<ffffffffad1f42fb>] exit_mmap+0xdb/0x1a0
Aug 26 12:57:55 chewie kernel: [<ffffffffad095247>] mmput+0x67/0xf0
Aug 26 12:57:55 chewie kernel: [<ffffffffad09ee65>] do_exit+0x285/0xa40
Aug 26 12:57:55 chewie kernel: [<ffffffffad771628>] ? __do_page_fault+0x228/0x4f0
Aug 26 12:57:55 chewie kernel: [<ffffffffad09f69f>] do_group_exit+0x3f/0xa0
Aug 26 12:57:55 chewie kernel: [<ffffffffad09f714>] SyS_exit_group+0x14/0x20
Aug 26 12:57:55 chewie kernel: [<ffffffffad776ddb>] system_call_fastpath+0x22/0x27
Aug 26 12:57:55 chewie kernel: ---[ end trace 871273491ee98154 ]---
Aug 26 12:57:56 chewie sh: abrt-dump-oops: Found oopses: 1
Aug 26 12:57:56 chewie sh: abrt-dump-oops: Creating problem directories
Aug 26 12:57:56 chewie sh: abrt-dump-oops: Not going to make dump directories world readable because PrivateReports is on
Aug 26 12:57:56 chewie abrt-server: '/var/spool/abrt/oops-2017-12-13-18:05:07-3035-0' is not a problem directory
Aug 26 12:57:57 chewie kernel: ------------[ cut here ]------------



These messages repeat until messages like this start to appear:

Aug 26 12:58:54 chewie kernel: WARNING: CPU: 10 PID: 13109 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
Aug 26 12:58:54 chewie kernel: list_del corruption, ffffeef240e99560->next is LIST_POISON1 (dead000000000100)
Aug 26 12:58:54 chewie journal: Missed 28 kernel messages
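With many repeated warnings it helps to summarize them rather than read them one by one. A minimal Python 3 sketch, assuming the warning headers look exactly like the ones quoted above (the function name is hypothetical), that counts kernel WARNINGs per CPU and source location:

```python
import re
from collections import Counter
from typing import Iterable

# Matches kernel warning headers such as:
# "kernel: WARNING: CPU: 10 PID: 15211 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0"
WARNING_RE = re.compile(r"WARNING: CPU: (\d+) PID: (\d+) at (\S+)")


def summarize_warnings(lines: Iterable[str]) -> Counter:
    """Count kernel WARNING headers per (CPU number, source location)."""
    counts: Counter = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            cpu, _pid, location = match.groups()
            counts[(int(cpu), location)] += 1
    return counts
```

Fed with /var/log/messages (as root), this shows whether the list_del corruption warnings cluster on one CPU or one code path. Because the journal reports "Missed 28 kernel messages", the counts should be read as a lower bound.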

My suspicion now is that the issue is caused by mounting a network drive that my office uses for its applications: after I mount it, I can keep using the machine for a while before it freezes. However, I do not know how to diagnose this or confirm that it is the cause.

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: CentOS 7: Complete Freeze at seemingly random times

Post by TrevorH » 2019/08/26 22:39:55

Is it mounted via nfs? If so, perhaps specifying a different NFS version with the nfsvers= mount option might bypass the problem. You should be able to see the current mount options in /proc/mounts.
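/proc/mounts has one mount per line (source, mount point, filesystem type, comma-separated options, then two numbers), so the negotiated NFS version can be read from the vers= option. A minimal Python 3 sketch, assuming NFS mounts show up with filesystem type nfs or nfs4 (the function name is hypothetical):

```python
from typing import Dict, Iterable, List, Tuple


def nfs_mounts(lines: Iterable[str]) -> List[Tuple[str, str, Dict[str, str]]]:
    """Return (source, mount point, options) for each nfs/nfs4 line of /proc/mounts."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts line
        source, mountpoint, fstype, opts = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue
        options: Dict[str, str] = {}
        for opt in opts.split(","):
            # Flags such as "rw" have no value; store them with "".
            key, _, value = opt.partition("=")
            options[key] = value
        result.append((source, mountpoint, options))
    return result
```

Calling nfs_mounts(open('/proc/mounts')) and checking options.get('vers') shows the version in use; remounting with -o nfsvers=3 (documented in nfs(5)) would be one way to try a different one.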

vankata
Posts: 2
Joined: 2019/08/27 14:08:15
Location: Bristol, UK

Re: CentOS 7: Complete Freeze at seemingly random times

Post by vankata » 2019/08/27 14:11:06

Hi,
Try disabling Hyper-Threading in the BIOS and testing again.
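To confirm whether Hyper-Threading is actually on or off after changing the BIOS setting, one can compare the "siblings" (logical CPUs per package) and "cpu cores" (physical cores per package) fields of /proc/cpuinfo. A minimal Python 3 sketch, assuming an x86 /proc/cpuinfo that reports both fields (the function name is hypothetical):

```python
from typing import Iterable, Optional


def hyperthreading_enabled(cpuinfo_lines: Iterable[str]) -> bool:
    """True if 'siblings' exceeds 'cpu cores' for the first CPU listed."""
    siblings: Optional[int] = None
    cores: Optional[int] = None
    for line in cpuinfo_lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPUs
        key = key.strip()
        if key == "siblings" and siblings is None:
            siblings = int(value)
        elif key == "cpu cores" and cores is None:
            cores = int(value)
        if siblings is not None and cores is not None:
            return siblings > cores
    raise ValueError("siblings/cpu cores not found in cpuinfo")
```

Running hyperthreading_enabled(open('/proc/cpuinfo')) before and after the BIOS change shows whether the setting took effect; the "Thread(s) per core" line of lscpu reports the same thing.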

Regards,

mschwart
Posts: 6
Joined: 2019/08/23 20:17:17

Re: CentOS 7: Complete Freeze at seemingly random times

Post by mschwart » 2019/08/28 16:16:23

TrevorH wrote (2019/08/26 22:39:55):
"Is it mounted via nfs? If so, perhaps specifying a different NFS version with the nfsvers= mount option might bypass the problem. You should be able to see the current mount options in /proc/mounts."

Unfortunately, it is not related to mounting: the freezes still happen with no network drives mounted.

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: CentOS 7: Complete Freeze at seemingly random times

Post by TrevorH » 2019/08/28 16:28:00

Then I would suggest downloading and running memtest86+ on the machine. It needs to run for several hours.

tunk
Posts: 1205
Joined: 2017/02/22 15:08:17

Re: CentOS 7: Complete Freeze at seemingly random times

Post by tunk » 2019/08/29 09:49:24

You could try to take out the memory modules and reseat them a few times.
Ditto for all cables.

mschwart
Posts: 6
Joined: 2019/08/23 20:17:17

Re: CentOS 7: Complete Freeze at seemingly random times

Post by mschwart » 2019/09/03 16:29:17

TrevorH wrote (2019/08/28 16:28:00):
"Then I would suggest downloading and running memtest86+ on the machine. It needs to run for several hours."

I just wanted to say quickly that I have not abandoned this issue; I am still working on it and greatly appreciate your help. Unfortunately, memtest does not finish when run overnight, so I plan to run it over a weekend, since I need the machine for work during the day. I forgot to start it before I left on Friday, so I will post the memtest results, hopefully on Monday.


I doubt it is related, but I figured I would post anything odd I find in case it sheds some light: the machine responds slowly. I believe it is running GNOME. If I switch workspaces with the keyboard shortcut, it takes 2-3 seconds to switch. If, for example, I switch down several workspaces and then back up, the machine can lock up for 20-30 seconds while it works through all of the queued actions. The same happens if I press Tab too many times trying to autocomplete in the terminal: I have to wait for it to process all of those keypresses before it responds again. Likewise, if I press Backspace too many times trying to delete text that isn't there, the computer freezes up again for a while, depending on how many extra keypresses I gave it. Hopefully all of that makes sense.

Again, thank you for your time. I know this is a difficult issue to troubleshoot, so I greatly appreciate all of your help. It is very frustrating, and it makes it difficult to get any work done.

mschwart
Posts: 6
Joined: 2019/08/23 20:17:17

Re: CentOS 7: Complete Freeze at seemingly random times

Post by mschwart » 2019/09/03 16:30:35

tunk wrote (2019/08/29 09:49:24):
"You could try to take out the memory modules and reseat them a few times.
Ditto for all cables."

For what it's worth, the machine was assembled by a third-party company, which supposedly runs an extensive 24-hour stress test after assembly. I am told this was two years ago. Do you think reseating is still worth doing? My supervisors are hesitant to let me tinker with the hardware, but if it might fix this issue, I may be able to argue for it.
