CentOS server keeps hanging/crashing

brotsalami
Posts: 5
Joined: 2016/09/20 09:39:46

CentOS server keeps hanging/crashing

Post by brotsalami » 2016/09/20 10:16:32

Hello,

I hope somebody can at least point me in the right direction here...

I have a server running at our provider (offering DDNS and FTP for some of our devices and employees). It worked for many years with almost no problems, or at least nothing a reboot couldn't fix.

We installed CentOS 5.11 (back when we started, this was the newest version!)

CentOS release 5.11 (Final)
Kernel 2.6.18-348.6.1.el5 on an x86_64
kernel-2.6.18-412.el5
kernel-headers-2.6.18-412.el5

For about two weeks now, the system becomes unusable after 10-15 minutes of use, and I have no idea why. I haven't touched it in a long time, so I suspect either a hardware issue or that somebody else changed something without my knowledge.

Below are some of the error messages I get when logged in via SSH. It basically keeps repeating the following with different tasks.

Code:

 EDT 2013c version 4.1.2 20080704 (Red Hat 4.1.2-54)) #1 SMP Tue May 21 15:29:55
-bash: syntax error near unexpected token `('
[root@s15412833 ~]# 20534 20533 20532 20527 18465 2528 OK
df -H
Filesystem             Size   Used  Avail Use% Mounted on
/dev/sda1              4.3G   549M   3.8G  13% /
/dev/mapper/vg00-usr    52G   893M    49G   2% /usr
/dev/mapper/vg00-var   100G    62G    33G  66% /var
/dev/mapper/vg00-home
                        52G   148M    50G   1% /home
none                   4.3G   1.9M   4.3G   1% /tmp
[root@s15412833 ~]# INFO: task jbd2/dm-1-8:1719 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-1-8   D ffff810214c11c80     0  1719     91          1720  1711 (L-TLB)
 ffff81023bdd9d60 0000000000000046 ffff81023bdd9d00 ffff81023a79de98
 0000000000000001 000000000000000a ffff81023fb79040 ffff8100284d97b0
 000000993c740af7 00000000000020e9 ffff81023fb79228 000000028008d87a
Call Trace:
 [<ffffffff8002e4b9>] __wake_up+0x38/0x4f
 [<ffffffff884f5030>] :jbd2:jbd2_journal_commit_transaction+0x198/0x10a8
 [<ffffffff800a3ccf>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8003dda8>] lock_timer_base+0x1b/0x3c
 [<ffffffff8004b37e>] try_to_del_timer_sync+0x7f/0x88
 [<ffffffff884f90fa>] :jbd2:kjournald2+0x9a/0x1ec
 [<ffffffff800a3ccf>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800a3ab7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff884f9060>] :jbd2:kjournald2+0x0/0x1ec
 [<ffffffff800a3ab7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032c23>] kthread+0xfe/0x132
 [<ffffffff8005dfc1>] child_rip+0xa/0x11
 [<ffffffff800a3ab7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032b25>] kthread+0x0/0x132
 [<ffffffff8005dfb7>] child_rip+0x0/0x11

INFO: task proftpd:2437 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
proftpd       D ffff81023c458000     0  2437      1          2445  2419 (NOTLB)
 ffff81023a823b88 0000000000000086 ffff81023a823db8 0000000000000001
 0000000000000000 000000000000000a ffff81023f74b080 ffff810009eb7830
 000000a06888ccff 000000000001bfbe ffff81023f74b268 0000000300000000
Call Trace:
 [<ffffffff8008f455>] default_wake_function+0x0/0xe
 [<ffffffff884f3ccc>] :jbd2:start_this_handle+0x2ed/0x3b7
 [<ffffffff800a3ccf>] autoremove_wake_function+0x0/0x2e
 [<ffffffff884f3e39>] :jbd2:jbd2_journal_start+0xa3/0xda
 [<ffffffff885167b0>] :ext4:ext4_dirty_inode+0x1a/0x46
 [<ffffffff80013de5>] __mark_inode_dirty+0x29/0x16e
 [<ffffffff8000c59b>] do_generic_mapping_read+0x347/0x359
 [<ffffffff8000d28d>] file_read_actor+0x0/0x159
 [<ffffffff8000c6f9>] __generic_file_aio_read+0x14c/0x198
 [<ffffffff80016edc>] generic_file_aio_read+0x36/0x3b
 [<ffffffff8000cf51>] do_sync_read+0xc7/0x104
 [<ffffffff800a3ccf>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8003a37d>] fcntl_setlk+0x243/0x273
 [<ffffffff80063af9>] mutex_lock+0xd/0x1d
 [<ffffffff8000b72f>] vfs_read+0xcb/0x171
 [<ffffffff8009b627>] recalc_sigpending+0xe/0x25
 [<ffffffff80011da1>] sys_read+0x45/0x6e
 [<ffffffff8005d116>] system_call+0x7e/0x83

The system boot output looks like this:

Code:

Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.

    GNU GRUB  version 0.97  (639K lower / 3144640K upper memory)

 +-------------------------------------------------------------------------+
 | CentOS (2.6.18-412.el5)                                                 |
 | CentOS (2.6.18-348.6.1.el5)                                             |
 |                                                                         |
 |                                                                         |
 |                                                                         |
 |                                                                         |
 |                                                                         |
 |                                                                         |
 |                                                                         |
 |                                                                         |
 |                                                                         |
 |                                                                         |
 +-------------------------------------------------------------------------+
      Use the ^ and v keys to select which entry is highlighted.
      Press enter to boot the selected OS, 'e' to edit the
      commands before booting, 'a' to modify the kernel arguments
      before booting, or 'c' for a command-line.

   The highlighted entry will be booted automatically in 1 seconds.
  Booting 'CentOS (2.6.18-348.6.1.el5)'

root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /boot/vmlinuz-2.6.18-348.6.1.el5 ro root=/dev/sda1 console=tty0 console=ttyS0,57600
   [Linux-bzImage, setup=0x1e00, size=0x20515c]
initrd /boot/initrd-2.6.18-348.6.1.el5.img
   [Linux-initrd @ 0x37cd5000, 0x31a198 bytes]

Linux version 2.6.18-348.6.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)) #1 SMP Tue May 21 15:29:55 EDT 2013
Command line: ro root=/dev/sda1 console=tty0 console=ttyS0,57600
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000010000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
 BIOS-e820: 00000000bfff0000 - 00000000bfffe000 (ACPI data)
 BIOS-e820: 00000000bfffe000 - 00000000c0000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec03000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000240000000 (usable)
DMI 2.3 present.
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 0-c0000000
SRAT: Node 0 PXM 0 0-240000000
Bootmem setup node 0 0000000000000000-0000000240000000
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
disabling kdump
ACPI: PM-Timer IO Port: 0x508
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Processor #2 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Processor #3 15:1 APIC version 16
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 4, version 17, address 0xfec00000, GSI 0-15
ACPI: IOAPIC (id[0x05] address[0xfec01000] gsi_base[16])
IOAPIC[1]: apic_id 5, version 17, address 0xfec01000, GSI 16-31
ACPI: IOAPIC (id[0x06] address[0xfec02000] gsi_base[32])
IOAPIC[2]: apic_id 6, version 17, address 0xfec02000, GSI 32-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Nosave address range: 000000000009f000 - 00000000000a0000
Nosave address range: 00000000000a0000 - 00000000000e0000
Nosave address range: 00000000000e0000 - 0000000000100000
Nosave address range: 00000000bfff0000 - 00000000bfffe000
Nosave address range: 00000000bfffe000 - 00000000c0000000
Nosave address range: 00000000c0000000 - 00000000fec00000
Nosave address range: 00000000fec00000 - 00000000fec03000
Nosave address range: 00000000fec03000 - 00000000fee00000
Nosave address range: 00000000fee00000 - 00000000fee01000
Nosave address range: 00000000fee01000 - 0000000100000000
Allocating PCI resources starting at c4000000 (gap: c0000000:3ec00000)
SMP: Allowing 4 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 2063254
Kernel command line: ro root=/dev/sda1 console=tty0 console=ttyS0,57600
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Checking aperture...
CPU 0: aperture @ e8000000 size 128 MB
CPU 1: aperture @ e8000000 size 128 MB
ACPI: DMAR not present
Memory: 8236992k/9437184k available (2628k kernel code, 151100k reserved, 1680k data, 224k init)
Calibrating delay loop (skipped), value calculated using timer frequency.. 4788.06 BogoMIPS (lpj=2394033)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0/0 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
Detected 12.468 MHz APIC timer.
SMP alternatives: switching to SMP code
Booting processor 1/4 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4787.34 BogoMIPS (lpj=2393671)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
Dual-Core AMD Opteron(tm) Processor 2216 HE stepping 02
CPU 1: Syncing TSC to CPU 0.
CPU 1: synchronized TSC with CPU 0 (last diff 0 cycles, maxerr 756 cycles)
SMP alternatives: switching to SMP code
Booting processor 2/4 APIC 0x2
Initializing CPU#2
Calibrating delay using timer specific routine.. 4787.39 BogoMIPS (lpj=2393696)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2/2 -> Node 0
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 0
Dual-Core AMD Opteron(tm) Processor 2216 HE stepping 02
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff 0 cycles, maxerr 1092 cycles)
SMP alternatives: switching to SMP code
Booting processor 3/4 APIC 0x3
Initializing CPU#3
Calibrating delay using timer specific routine.. 4787.39 BogoMIPS (lpj=2393695)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3/3 -> Node 0
CPU: Physical Processor ID: 1
CPU: Processor Core ID: 1
Dual-Core AMD Opteron(tm) Processor 2216 HE stepping 02
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff 0 cycles, maxerr 1092 cycles)
Brought up 4 CPUs
NMI watchdog testing PASSED.
Disabling vsyscall due to use of PM timer
time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
time.c: Detected 2394.033 MHz processor.
migration_cost=977,484
checking if image is initramfs... it is
Freeing initrd memory: 3176k freed
MCE: In-kernel MCE decoding enabled.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Enabling HT MSI Mapping on 0000:00:01.0
PCI: Ignoring BAR0-3 of IDE controller 0000:00:02.1
pci 0000:00:01.0: PCI bridge to [bus 01-03]
pci 0000:01:0d.0: PCI bridge to [bus 02-03]
pci 0000:02:03.0: PCI bridge to [bus 03-03]
ACPI: PCI Interrupt Link [LN00] (IRQs 3 4 5 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN01] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN02] (IRQs 1 3 4 5 6 7 9 *11 12 14 15)
ACPI: PCI Interrupt Link [LN03] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN04] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN05] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN06] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN07] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN08] (IRQs 1 3 4 5 6 7 *9 11 12 14 15)
ACPI: PCI Interrupt Link [LN09] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN10] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN11] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN12] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN13] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN14] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN15] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN16] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN17] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN18] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN19] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN20] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN21] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN22] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN23] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN24] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN25] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN26] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN27] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN28] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN29] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LN30] (IRQs 1 3 4 5 6 7 9 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNUS] (IRQs *10)
ACPI: PCI Interrupt Link [LNSA] (IRQs 11) *0
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 11 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
ACPI: DMAR not present
PCI-DMA: Disabling AGP.
PCI-DMA: aperture base @ e8000000 size 131072 KB
PCI-DMA: using GART IOMMU.
PCI-DMA: Reserving 128MB of IOMMU area in the AGP aperture
system 00:07: ioport range 0x800-0x87f could not be reserved
system 00:07: ioport range 0xa80-0xa8f has been reserved
system 00:09: ioport range 0x580-0x58f has been reserved
system 00:09: ioport range 0x590-0x593 has been reserved
system 00:09: ioport range 0x700-0x703 has been reserved
system 00:09: ioport range 0xca0-0xcaf has been reserved
system 00:09: iomem range 0xfec00000-0xfec00fff has been reserved
system 00:09: iomem range 0xfec01000-0xfec01fff has been reserved
system 00:09: iomem range 0xfec02000-0xfec02fff has been reserved
system 00:09: iomem range 0xfee00000-0xfee00fff has been reserved
system 00:0a: iomem range 0x0-0x9ffff could not be reserved
system 00:0a: iomem range 0xe0000-0xfffff could not be reserved
system 00:0a: iomem range 0x100000-0xbfffffff could not be reserved
pci 0000:02:03.0: PCI bridge to [bus 03-03]
pci 0000:02:03.0:   bridge window [io  disabled]
pci 0000:02:03.0:   bridge window [mem 0xff400000-0xff4fffff]
pci 0000:02:03.0:   bridge window [mem 0xf6300000-0xf6afffff 64bit pref]
pci 0000:01:0d.0: PCI bridge to [bus 02-03]
pci 0000:01:0d.0:   bridge window [io  disabled]
pci 0000:01:0d.0:   bridge window [mem 0xff400000-0xff4fffff]
pci 0000:01:0d.0:   bridge window [mem 0xf6300000-0xf6afffff 64bit pref]
pci 0000:00:01.0: PCI bridge to [bus 01-03]
pci 0000:00:01.0:   bridge window [io  disabled]
pci 0000:00:01.0:   bridge window [mem 0xff400000-0xff4fffff]
pci 0000:00:01.0:   bridge window [mem 0xf6300000-0xf6afffff 64bit pref]
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
audit: initializing netlink socket (disabled)
type=2000 audit(1474366918.320:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
alg: No test for crc32c (crc32c-generic)
ksign: Installing public key data
Loading keyring
- Added public key 6D774ACD8167CA19
- User ID: CentOS (Kernel Module GPG key)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Detected use of extended apic ids on hypertransport bus
Detected use of extended apic ids on hypertransport bus
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:06: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
brd: module loaded
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks HT1000: IDE controller at PCI slot 0000:00:02.1
SvrWks HT1000: chipset revision 0
SvrWks HT1000: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI: (supports S0 S1 S4 S5)
Initalizing network drop monitor service
Freeing unused kernel memory: 224k freed
Write protecting the kernel read-only data: 533k
Red Hat nash version 5.1.19.6 starting
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Setting up hotplug.
Creating block device nodes.
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading libata.ko module
Loading ahci.ko module
Loading 3w-sas.ko module
LSI 3ware SAS/SATA-RAID Controller device driver for Linux v3.26.00.028-2.6.18RH.
Loading 3w-xxxx.ko module
3ware Storage Controller device driver for Linux v1.26.03.000-2.6.18RH.
Loading 3w-9xxx.ko module
3ware 9000 Storage Controller device driver for Linux v2.26.08.007-2.6.18RH.
Loading arcmsr.ko module
GSI 16 sharing vector 0xA9 and IRQ 16
ACPI: PCI Interrupt 0000:03:0e.0[A] -> GSI 18 (level, low) -> IRQ 169
Areca RAID Controller0: F/W V1.49 2010-12-02 & Model ARC-1110
scsi0 : Areca SATA Host Adapter RAID Controller
 v1.20.00.15.rhel5 2010/07/27
  Vendor: Areca     Model: ARC-1110-VOL#00   Rev: R001
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 976562176 512-byte hdwr sectors (500000 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
SCSI device sda: 976562176 512-byte hdwr sectors (500000 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3
sd 0:0:0:0: Attached scsi disk sda
  Vendor: Areca     Model: RAID controller   Rev: R001
  Type:   Processor                          ANSI SCSI revision: 00
Loading ata_piix.ko module
Loading ehci-hcd.ko module
ACPI: PCI Interrupt 0000:00:03.2[A] -> GSI 10 (level, low) -> IRQ 10
ehci_hcd 0000:00:03.2: EHCI Host Controller
ehci_hcd 0000:00:03.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:03.2: irq 10, io mem 0xff6b6000
ehci_hcd 0000:00:03.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
Loading ohci-hcd.ko module
ACPI: PCI Interrupt 0000:00:03.0[A] -> GSI 10 (level, low) -> IRQ 10
ohci_hcd 0000:00:03.0: OHCI Host Controller
ohci_hcd 0000:00:03.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:03.0: irq 10, io mem 0xff6b4000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:03.1[A] -> GSI 10 (level, low) -> IRQ 10
ohci_hcd 0000:00:03.1: OHCI Host Controller
ohci_hcd 0000:00:03.1: new USB bus registered, assigned bus number 3
ohci_hcd 0000:00:03.1: irq 10, io mem 0xff6b5000
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
Loading uhci-hcd.ko module
USB Universal Host Controller Interface driver v3.0
Loading jbd.ko module
Loading ext3.ko module
Loading raid1.ko module
md: raid1 personality registered for level 1
Loading mptbase.ko module
Fusion MPT base driver 3.04.20rh1
Copyright (c) 1999-2008 LSI Corporation
Loading scsi_transport_spi.ko module
Loading mptscsih.ko module
Loading mptspi.ko module
Fusion MPT SPI Host driver 3.04.20rh1
Loading dm-mod.ko module
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.6-ioctl (2011-02-18) initialised: dm-devel@redhat.com
Loading dm-log.ko module
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Loading dm-mem-cache.ko module
Loading dm-region_hash.ko module
Loading dm-message.ko module
device-mapper: dm-raid45: initialized v0.2594l
Loading dm-raid45.ko module
Loading sata_nv.ko module
Loading pata_sis.ko module
md: linear personality registered for level -1
Loading sata_sis.ko module
md: multipath personality registered for level -4
Loading linear.ko module
raid5: automatically using best checksumming function: generic_sse
   generic_sse:  7352.000 MB/sec
Loading xor.ko module
raid5: using function: generic_sse (7352.000 MB/sec)
Loading raid456.ko module
raid6: int64x1   2164 MB/s
raid6: int64x2   2804 MB/s
raid6: int64x4   2375 MB/s
raid6: int64x8   2105 MB/s
raid6: sse2x1    3292 MB/s
raid6: sse2x2    4246 MB/s
raid6: sse2x4    4585 MB/s
raid6: using algorithm sse2x4 (4585 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
Loading xfs.ko module
SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
SGI XFS Quota Management subsystem
Waiting for driver initialization.
Scanning and configuring dmraid supported devices
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Trying to resume from /dev/sda2
No suspend signature on swap, not resuming.
Creating root device.
Mounting root filesystem.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Setting up other filesystems.
Setting up new root fs
no fstab.sys, mounting internal defaults
Switching to new root and running init.
unmounting old /dev
unmounting old /proc
unmounting old /sys
SELinux:  Disabled at runtime.
type=1404 audit(1474366944.144:2): selinux=0 auid=4294967295 ses=4294967295
INIT: version 2.86 booting
                Welcome to  CentOS release 5.11 (Final)
                Press 'I' to enter interactive startup.
Setting clock  (utc): Tue Sep 20 10:22:26 UTC 2016 [  OK  ]
Starting udev: [  OK  ]
Loading default keymap (us): [  OK  ]
Setting hostname localhost.localdomain:  [  OK  ]
mdadm: /dev/sda1 has no superblock - assembly aborted
mdadm: /dev/sda3 has no superblock - assembly aborted
No devices found
Setting up Logical Volume Management:   3 logical volume(s) in volume group "vg00" now active
[  OK  ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda1
root: clean, 6427/262944 files, 142324/1050241 blocks
[/sbin/fsck.ext4 (1) -- /usr] fsck.ext4 -a /dev/vg00/usr
usr: recovering journal
usr: clean, 22099/3211264 files, 419365/12845056 blocks
[/sbin/fsck.ext4 (1) -- /var] fsck.ext4 -a /dev/vg00/var
var: recovering journal
var: Clearing orphaned inode 90 (uid=0, gid=0, mode=0140777, size=0)
var: clean, 60384/6160384 files, 15443583/24641536 blocks
[/sbin/fsck.ext4 (1) -- /home] fsck.ext4 -a /dev/vg00/home
home: recovering journal
home: clean, 11/3211264 files, 237603/12845056 blocks
[  OK  ]
Remounting root filesystem in read-write mode:  [  OK  ]
Mounting local filesystems:  [  OK  ]
Enabling /etc/fstab swaps:  [  OK  ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Enabling ondemand cpu frequency scaling: [  OK  ]
[  OK  ] iSCSI daemon: [  OK  ]
[  OK  ]
Applying iptables firewall rules: [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth0:
Determining IP information for eth0... done.
[  OK  ]
Starting system logger: [  OK  ]
Starting kernel logger: [  OK  ]
iscsid (pid  2041) is running...
Setting up iSCSI targets: iscsiadm: No records found
[  OK  ]
Starting named: You need to implement a remote task_setrlimit in your security module and call it directly from this function
WARNING: at security/security.c:51 security_ops_task_setrlimit()

Call Trace:
 [<ffffffff8012f113>] security_ops_task_setrlimit+0x87/0x96
 [<ffffffff8009dcd6>] do_prlimit+0xd7/0x1d2
 [<ffffffff8009ee1f>] sys_setrlimit+0x36/0x43
 [<ffffffff8005d116>] system_call+0x7e/0x83

[  OK  ]
Starting system message bus: [  OK  ]
Mounting other filesystems:  [  OK  ]
Starting monitoring for VG vg00:   3 logical volume(s) in volume group "vg00" monitored
[  OK  ]
Starting sshd: [  OK  ]
Starting ntpd: [  OK  ]
Starting exim: JBD: barrier-based sync failed on dm-0-8 - disabling barriers
[  OK  ]
Starting proftpd: [  OK  ]
Starting crond: [  OK  ]
Starting anacron: [  OK  ]
Starting HAL daemon: [  OK  ]
Starting WRS(Webgate Registration System):
[  OK  ]
sending warning mail.
Plz. wait
2016-09-20 12:22:50 1bmICQ-0000eM-2B Cannot open main log file "/var/log/exim/main.log": Permission denied: euid=93 egid=93
2016-09-20 12:22:50 1bmICQ-0000eM-2B <= root@s15412833.onlinehome-server.info U=root P=local S=423
2016-09-20 12:22:50 1bmICQ-0000eM-2B Cannot open main log file "/var/log/exim/main.log": Permission denied: euid=93 egid=93
exim: could not open panic log - aborting: see message(s) above
wrsd: no process killed
WRS_FTPD Port : 21
WRS Addr : 127.0.0.1
WRS UDP Port : 6837
WRS TCP Port : 6836

CentOS release 5.11 (Final)
Kernel 2.6.18-348.6.1.el5 on an x86_64

As I am not a trained admin and have only limited knowledge, I would appreciate any help on where to start looking for the problem.

I have already tried updating everything via yum, but the problem persists.
One other issue I noticed: somebody had filled the volume I mounted for FTP use up to 100%. I have already deleted that data and talked very sternly with my colleague about not using more than 70G of a 100G volume that is shared with several other people. So this should no longer be the problem.
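To keep an eye on that 70G limit, a small check could be run from cron. This is a minimal sketch, assuming bash and awk on the server; `check_usage` is a hypothetical helper name, not an existing command. It reads `df -P` output because the POSIX format keeps every filesystem on one line, whereas the `df -H` output above wraps the /dev/mapper/vg00-home entry onto two lines, which would confuse a column-based parser.

```shell
#!/bin/bash
# check_usage THRESHOLD  (hypothetical helper, not a standard command)
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for every
# filesystem whose usage is at or above THRESHOLD percent.
check_usage() {
    local threshold="$1"
    awk -v limit="$threshold" '
        NR == 1 { next }               # skip the df header line
        {
            pct = $5                   # column 5 of df -P is the Capacity, e.g. "66%"
            sub(/%$/, "", pct)
            if (pct + 0 >= limit + 0)
                print $6, $5           # column 6 is the mount point
        }'
}

# Example (on the server): warn about anything at or above 70%
# df -P | check_usage 70
```

With the usage shown above, /var (66%) would not be reported at a threshold of 70, but it would be at 60.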

So if anybody has any ideas, I would appreciate the help.

Thank you very much
Last edited by brotsalami on 2016/09/20 10:36:22, edited 1 time in total.

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: CentOS server keeps hanging/crashing

Post by TrevorH » 2016/09/20 10:31:40

Kernel 2.6.18-348.6.1.el5 on an x86_64
That's a very old kernel, and not the one that should come with 5.11. The current 5.11 kernel is 2.6.18-412.el5. Look at /boot/grub/grub.conf and check whether it contains an entry for the -412 kernel and whether the default= line points to it (entries are counted starting at 0 for the first one, 1 for the second, and so on).
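Because default= is a zero-based index into the title entries, it is easy to miscount. A minimal sketch, assuming a GRUB legacy grub.conf with a numeric default= value (`grub_default_title` is a hypothetical helper name), that prints the title of the entry GRUB will boot. With the menu shown in the boot log above (-412 listed first, -348.6.1 second), default=1 boots the old kernel and default=0 the current one.

```shell
#!/bin/bash
# grub_default_title FILE  (hypothetical helper)
# Prints the title of the menu entry GRUB legacy boots by default.
# default=N counts "title" entries starting at 0; a missing default= means 0.
# Assumes a numeric default= (not "default=saved").
grub_default_title() {
    awk '
        BEGIN { def = 0 }
        /^[ \t]*default[ \t]*=/ {
            val = $0
            sub(/^[^=]*=[ \t]*/, "", val)   # keep only the value after "="
            def = val + 0
        }
        /^[ \t]*title[ \t]/ {
            t = $0
            sub(/^[ \t]*title[ \t]+/, "", t)
            titles[n++] = t                 # first title gets index 0
        }
        END {
            if (def in titles) {
                print titles[def]
            } else {
                print "default=" def " has no matching title entry" > "/dev/stderr"
                exit 1
            }
        }' "$1"
}

# Example: grub_default_title /boot/grub/grub.conf
```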
-bash: syntax error near unexpected token `('
Not part of the problem, but if you get that when you log in, then it's likely that you have an error in one of the bash startup files like ~/.bash_profile or ~/.bashrc. If it does this for all users, then it could also be a problem in one of the scripts in /etc/profile.d or in /etc/profile itself.
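bash can check those files for syntax errors without running them: `bash -n` only reads and parses a file. A minimal sketch (`check_startup_syntax` is a hypothetical helper name); note that it only catches parse errors such as an unexpected `(`, not commands that fail at run time.

```shell
#!/bin/bash
# check_startup_syntax FILE...  (hypothetical helper)
# Runs "bash -n" (read and parse, but execute nothing) on each file that
# exists and names every file bash cannot parse. Returns 1 if any failed.
check_startup_syntax() {
    local f status=0
    for f in "$@"; do
        [ -f "$f" ] || continue                # skip files that don't exist
        if ! bash -n "$f" 2>/dev/null; then
            echo "syntax error in: $f"
            status=1
        fi
    done
    return "$status"
}

# Example: the per-user files first, then the system-wide ones
# check_startup_syntax ~/.bash_profile ~/.bashrc /etc/profile /etc/profile.d/*.sh
```

Running `bash -n` directly on a reported file prints the actual error message with its line number.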
[root@s15412833 ~]# INFO: task jbd2/dm-1-8:1719 blocked for more than 120 seconds.
But this does look like it might be part of the problem. It's telling you that a task has been blocked waiting on I/O for more than 2 minutes, and that's about a century in computer terms! This is likely telling you either that there is a hardware problem with your disk subsystem or that your system is so busy that the disks just cannot keep up. If you have opened FTP to the entire world and someone has discovered it and worked out how to log in, then perhaps they are uploading/downloading massive quantities of data and using it as their own private Dropbox, but they'd have to be doing an awful lot to kill the disks like that.
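To see which tasks keep getting stuck, and how often, the hung-task messages can be tallied. A minimal sketch, assuming the messages are still in the kernel ring buffer (dmesg) or in /var/log/messages; `hung_task_summary` is a hypothetical helper name. In the output above the stuck journal thread is named jbd2/dm-1-8, which points at the journal of the device-mapper volume dm-1.

```shell
#!/bin/bash
# hung_task_summary  (hypothetical helper)
# Reads kernel log text on stdin and counts, per task name, the lines of the
# form "INFO: task <name>:<pid> blocked for more than N seconds".
# Output: "<count> <task name>", most frequent first.
hung_task_summary() {
    awk '
        /INFO: task .* blocked for more than/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "task") {
                    name = $(i + 1)              # e.g. "jbd2/dm-1-8:1719"
                    sub(/:[0-9]+$/, "", name)   # drop the ":PID" suffix
                    count[name]++
                    break
                }
            }
        }
        END { for (n in count) print count[n], n }' | sort -rn
}

# Examples: dmesg | hung_task_summary
#           hung_task_summary < /var/log/messages
```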

Do you have a hardware RAID controller in the machine? lspci -nn might tell you if you're not sure. Do you have a /proc/mdstat file, and if so, what does it contain? Try running smartctl -a against all of your hard disks and see if any of them are reporting problems.
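For the /proc/mdstat question: each software RAID array in that file has a status field such as [UU], and an underscore marks a missing or failed member. A minimal sketch that lists such arrays (`degraded_md_arrays` is a hypothetical helper name). On this particular server the boot log shows an Areca ARC-1110 hardware RAID controller and mdadm finding no superblocks, so /proc/mdstat will most likely list no arrays; the hardware array has to be checked with the controller's own tools, and smartctl has to be told about the controller (see the -d option in man smartctl).

```shell
#!/bin/bash
# degraded_md_arrays [FILE]  (hypothetical helper)
# Prints "<array> <status>" for every md array whose member-status field
# (e.g. [UU]) contains "_", i.e. a missing or failed member.
# Reads /proc/mdstat unless another file is given.
degraded_md_arrays() {
    awk '
        /^md[0-9]+[ \t]*:/ { array = $1 }            # e.g. "md0 : active raid1 ..."
        array != "" && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)     # e.g. "[U_]"
            if (index(status, "_")) print array, status
            array = ""
        }' "${1:-/proc/mdstat}"
}

# Example: degraded_md_arrays
```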
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

brotsalami
Posts: 5
Joined: 2016/09/20 09:39:46

Re: CentOS server keeps hanging/crashing

Post by brotsalami » 2016/09/20 10:50:36

Dear TrevorH,

Thank you for your immediate help. That was amazingly fast.

I had already updated the kernel, but as that didn't change anything, I went back to the original one, which I knew had worked before. I will change grub.conf to boot the newer version again.

lspci -nn shows:

Code:

00:01.0 PCI bridge [0604]: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge [1166:0036]
00:02.0 Host bridge [0600]: Broadcom BCM5785 [HT1000] Legacy South Bridge [1166:0205]
00:02.1 IDE interface [0101]: Broadcom BCM5785 [HT1000] IDE [1166:0214]
00:02.2 ISA bridge [0601]: Broadcom BCM5785 [HT1000] LPC [1166:0234]
00:03.0 USB controller [0c03]: Broadcom BCM5785 [HT1000] USB [1166:0223] (rev 01)
00:03.1 USB controller [0c03]: Broadcom BCM5785 [HT1000] USB [1166:0223] (rev 01)
00:03.2 USB controller [0c03]: Broadcom BCM5785 [HT1000] USB [1166:0223] (rev 01)
00:04.0 Ethernet controller [0200]: Intel Corporation 82541GI Gigabit Ethernet Controller [8086:1076] (rev 05)
00:06.0 VGA compatible controller [0300]: XGI Technology Inc. (eXtreme Graphics Innovation) Z7/Z9 (XG20 core) [18ca:0020]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
00:19.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:19.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:19.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:19.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
01:0d.0 PCI bridge [0604]: Broadcom BCM5785 [HT1000] PCI/PCI-X Bridge [1166:0104] (rev c0)
02:03.0 PCI bridge [0604]: Intel Corporation 80331 [Lindsay] I/O processor (PCI-X Bridge) [8086:0335] (rev 0a)
03:0e.0 RAID bus controller [0104]: Areca Technology Corp. ARC-1110 4-Port PCI-X to SATA RAID Controller [17d3:1110]

The content of /proc/mdstat is not much:

Code: Select all

[root@s15412833 proc]# vi mdstat
Personalities :
unused devices: <none>

smartctl doesn't seem to be installed, as I get a "command not found" response.

I have stopped the FTP service for now to check whether it could be the problem. Only two users are actually allowed to write to the FTP server, but I will look into this more deeply.

Thank you so far.

brotsalami
Posts: 5
Joined: 2016/09/20 09:39:46

Re: CentOS server keeps hanging/crashing

Post by brotsalami » 2016/09/20 10:58:24

I installed smartctl, but I am having some problems getting it to work properly:

Code: Select all

[root@s15412833 proc]# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
[root@s15412833 proc]# smartctl -a sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-412.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl open device: sda failed: No such device
[root@s15412833 proc]# smartctl -a /dev/sda

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: CentOS server keeps hanging/crashing

Post by TrevorH » 2016/09/20 10:59:55

So you have an Areca hardware RAID controller, and Linux software RAID is not in use. I did a quick Google search and found https://wiki.debian.org/LinuxRaidForAdmins#arcmsr which might be useful. It's written for Debian rather than CentOS, but it may still help. I'd use the controller's own tools to make sure that all its disks are OK and that none have failed.

For smartctl, read man smartctl and search for areca; it shows you how to query the individual disks behind the controller.
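As a sketch of what the man page describes: smartctl's -d areca,N option addresses disk N (counting from 1) behind an Areca controller. The controller device node and the port count are assumptions here. Pass the /dev/sgN node that man smartctl tells you to use for your controller. The default of 4 ports matches the 4-port ARC-1110 shown by lspci. areca_smart_all is a hypothetical helper; with DRY_RUN=1 it only prints the commands.

```shell
#!/bin/bash
# Query SMART data for each disk behind an Areca controller with smartctl.
# Usage: areca_smart_all DEVICE [PORTS]
#   DEVICE - SCSI generic node for the controller (ASSUMPTION: pick it as
#            described in "man smartctl"; it is not detected here)
#   PORTS  - number of disks to try, default 4 (ASSUMPTION: 4-port ARC-1110)
# Set DRY_RUN=1 to print the commands instead of running them.
areca_smart_all() {
    local dev=$1 ports=${2:-4} n=1
    if [ -z "$dev" ]; then
        echo "usage: areca_smart_all DEVICE [PORTS]" >&2
        return 2
    fi
    while [ "$n" -le "$ports" ]; do          # Areca disk numbers start at 1
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "smartctl -a -d areca,$n $dev"
        else
            echo "=== Areca disk $n ==="
            smartctl -a -d "areca,$n" "$dev"
        fi
        n=$((n + 1))
    done
}
```

For example, DRY_RUN=1 areca_smart_all /dev/sg2 prints four smartctl commands to review before running them for real. /dev/sg2 is only a placeholder.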

brotsalami
Posts: 5
Joined: 2016/09/20 09:39:46

Re: CentOS server keeps hanging/crashing

Post by brotsalami » 2016/09/20 13:08:10

It took me a while to get to the core of the controller: the Areca CLI. FYI, the command is arcmsr_cli64.

The SMART info shows the following:

Code: Select all

CLI> disk info drv=1
Drive Information
===============================================================
IDE Channel                        : 1
Model Name                         : ST3500630AS
Serial Number                      : 5QG0GYGL
Firmware Rev.                      : 3.AAE
Disk Capacity                      : 500.1GB
Device State                       : NORMAL
Timeout Count                      : 0
Media Error Count                  : 0
Device Temperature                 : 30 C
SMART Read Error Rate              : 119(6)
SMART Spinup Time                  : 93(0)
SMART Reallocation Count           : 100(36)
SMART Seek Error Rate              : 83(30)
SMART Spinup Retries               : 100(97)
SMART Calibration Retries          : N.A.(N.A.)
===============================================================
GuiErrMsg<0x00>: Success.
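The Areca CLI prints each SMART attribute as value(threshold). Assume this follows the usual SMART convention, with the normalised current value first and the failure threshold in brackets. Under that assumption, an attribute only counts as failing once its value drops to or below the threshold. The sketch below (areca_smart_check is a hypothetical helper) checks a saved copy of the disk info output:

```shell
#!/bin/bash
# Check Areca CLI "disk info" SMART lines of the form
#   SMART Seek Error Rate              : 83(30)
# ASSUMPTION: the first number is the normalised value and the one in
# brackets is the failure threshold; value <= threshold means failing.
# Lines such as "N.A.(N.A.)" are skipped. Reads the CLI output on stdin.
areca_smart_check() {
    awk -F': *' '
        /^SMART / && $2 ~ /^[0-9]+\([0-9]+\)/ {
            name = $1; sub(/ +$/, "", name)      # trim padding before the colon
            split($2, v, "[()]")                 # v[1] = value, v[2] = threshold
            status = (v[1] + 0 <= v[2] + 0) ? "FAILING" : "ok"
            printf "%s: value %d, threshold %d, %s\n", name, v[1], v[2], status
        }
    '
}
```

For example, save each disk's output to a file and run areca_smart_check < disk1.txt. Under that reading, every attribute in the output above is still above its threshold (Seek Error Rate 83 against 30, for instance). That alone does not rule out a failing disk or controller.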

TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: CentOS server keeps hanging/crashing

Post by TrevorH » 2016/09/20 15:17:00

I'm afraid that means nothing to me :-(

brotsalami
Posts: 5
Joined: 2016/09/20 09:39:46

Re: CentOS server keeps hanging/crashing

Post by brotsalami » 2016/09/21 06:20:25

Dear TrevorH,

Yeah, I assumed as much. The info doesn't give much away.

There is another SMART-related command inside the Areca Rescue CLI, but all I get in response is "Segmentation Failure".

So maybe this is the problem, although that would be strange, as I haven't touched any of the volumes in the last 4-5 years. So why should it happen now?

OK. I have a few more leads that I can investigate, and I will let you know if I can find it.

Thank you very much for your help. You helped me a lot.

brotsalami
