very odd stability problem

bjoyner
Posts: 7
Joined: 2011/06/02 12:24:21

very odd stability problem

Post by bjoyner » 2011/06/02 13:22:02

Let me begin by describing our setup. Before I arrived, our systems were set up on HP small form factor PCs. Because of the complexity of the install and the custom POS system in use, the system admin at the time made copies by mirroring the drive and then breaking the mirror, which leaves two drives. The new disk is then used in the new system. There are no records of how the POS system works or is installed, or of how the server itself is configured, so a fresh install is not possible.

Right before I started here, the company began switching from the HP PCs to ClearCube blades. We previously used the R3150 blades, and the copies work on them. However, there are occasional stability issues, and I'm not sure whether they are due to CentOS not having the proper drivers or, more likely, to the fact that the install is a broken mirror from another computer. Still, 16 R3150 blades are successfully using this method of install.

The problem arises when making copies of those blades for use in the new R3080D blades. During the boot process the system always hangs after starting Bluetooth services and before starting netfs. This also happens occasionally on the R3150 blades. I can enter the operating system via single-user mode and turn off netfs, and the system will then finish the boot process but never show the login screen, just the blue background. (No, we don't have any drives that netfs is actually loading or that need it.) My current kernel is 2.6.18-164.el5PAE. When creating the mirrored copy, the drive has to be placed into another computer; I can't create it from the blade, as it only has one SATA port.


I have tried building a Clonezilla box, but since the disks have a software RAID on them it is not compatible. I cannot remove the RAID.

I have tried installing a fresh system, but due to the complexity of the POS system I can't get that up and running.

I tried upgrading the whole system first, then just the kernel by itself; both times it broke the POS system. The POS system was written in Ruby to use the legacy drivers and software on the servers, and even when I reinstall those it still won't work.

I have tried copies from both working blades and working HP PCs.

I have tried adjusting BIOS settings, boot options, and everything else I can think of.

Does anybody know of a way I can stabilize these installs so they work on the new hardware, or a way to remove the MD software RAID from the disks without losing any data? If I could remove the RAID, I could clone the boxes using Clonezilla right onto the blades.

Cheile
Posts: 29
Joined: 2009/07/29 17:44:02

Re: very odd stability problem

Post by Cheile » 2011/06/03 21:11:49

The first thing I would suggest is to disable any services that you don't need. If you're not using netfs or bluetooth then just turn them off. You can get a list of the services that are enabled for your current runlevel by issuing the following command:


[code]
chkconfig --list | grep `runlevel | awk '{print $2}'`:on
[/code]

Then anything you see that you don't need can be turned off with

[code]
chkconfig ${servicename} off
[/code]
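If there are several services to switch off, the same command can be wrapped in a small loop. This is only a sketch: disable_services and the CHKCONFIG variable are names made up for this example, and the service names in the usage note are the ones mentioned in this thread, so check them against your own chkconfig --list output first.

```shell
# Hypothetical helper (not part of CentOS): switch off each named service.
# CHKCONFIG can be overridden, e.g. CHKCONFIG=echo, to do a dry run first.
CHKCONFIG=${CHKCONFIG:-chkconfig}

disable_services() {
    for svc in "$@"; do
        # "chkconfig NAME off" removes the service from the default runlevels
        "$CHKCONFIG" "$svc" off && echo "disabled: $svc"
    done
}
```

Run as root it would be called like disable_services bluetooth netfs. Note that chkconfig only changes what starts at the next boot; it does not stop a service that is already running.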

Also I would do a

[code]
sed -e 's/rhgb//' -e 's/quiet//' -i /boot/grub/grub.conf
[/code]

This stops CentOS from hiding the boot messages, so you can see what is going on as you boot.
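A slightly more cautious variant of that sed command is sketched below. It keeps a backup copy and only touches the kernel lines, so a title or comment that happens to contain the word "quiet" is left alone. The function name strip_boot_splash is made up for this example, and the default path assumes the stock CentOS 5 location of grub.conf.

```shell
# Hypothetical helper: remove "rhgb" and "quiet" from kernel lines only.
# A copy of the original file is kept next to it with a .bak suffix.
strip_boot_splash() {
    conf=${1:-/boot/grub/grub.conf}
    sed -i.bak \
        -e '/^[[:space:]]*kernel/ s/[[:space:]]rhgb//' \
        -e '/^[[:space:]]*kernel/ s/[[:space:]]quiet//' \
        "$conf"
}
```

Calling strip_boot_splash with no argument edits /boot/grub/grub.conf; passing a path lets you try it on a copy of the file first.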

One thing to check if it is hanging right before netfs is the network setup/drivers. That might be at least a place to start.

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

very odd stability problem

Post by pschaff » 2011/06/05 22:59:24

Welcome to the CentOS fora. Reading [url=https://www.centos.org/modules/newbb/viewforum.php?forum=47]FAQ & Readme First[/url] is recommended for new users.

You can use mdadm to zero the superblock, but I cannot guarantee that will not make the disk unbootable, not having tried it (substitute /dev/hda or whatever device applies):[code]mdadm --zero-superblock /dev/sda[/code]
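Before zeroing anything it may help to see which devices the arrays are actually built from, since the superblocks usually live on the member partitions rather than on the whole disk. The sketch below only reads /proc/mdstat and changes nothing. The function name md_members is made up for this example, and it assumes the usual mdstat layout of "mdN : state level member[index] ...".

```shell
# Hypothetical helper: print "array member" pairs from an mdstat-format file.
# Read-only; defaults to /proc/mdstat when no file is given.
md_members() {
    awk '$1 ~ /^md/ && $2 == ":" {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {       # members look like sda1[0]
                dev = $i
                sub(/\[.*/, "", dev)       # drop the [index] and any flags
                print $1, dev
            }
        }
    }' "${1:-/proc/mdstat}"
}
```

Each member it prints can then be inspected with mdadm --examine /dev/<member> before deciding whether to zero it. Keep in mind that /etc/fstab, grub.conf, and the initrd may still refer to the /dev/mdN devices, so zeroing the superblock on its own may well leave the copy unbootable; try it on a spare copy of the disk first.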

If more help is needed then please [url=http://www.centos.org/modules/newbb/viewtopic.php?topic_id=25128&forum=47]provide more information about your system[/url] by showing the output file from "./getinfo.sh".

bjoyner
Posts: 7
Joined: 2011/06/02 12:24:21

Re: very odd stability problem

Post by bjoyner » 2011/06/06 12:22:06

As I stated before, I turned off netfs, but when the computer finished loading, rather than getting the login prompt I just got the blue background. I could set the runlevel to 3 and go straight to a console, but when I run the startx command I still get just the blue screen. My employees need to use the GUI.

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: very odd stability problem

Post by pschaff » 2011/06/06 15:21:29

Have you tried running system-config-display as root? See [url=http://wiki.centos.org/HowTos/ConfigureNewVideoCard]HowTos/ConfigureNewVideoCard[/url]. Be sure to note any errors on the console or in /var/log/Xorg.*.

Is the problematic KVM switch from your other thread in use?

If more help is needed please supply the information requested in my last post.

Cheile
Posts: 29
Joined: 2009/07/29 17:44:02

Re: very odd stability problem

Post by Cheile » 2011/06/06 20:56:05

[quote]
bjoyner wrote:
As I stated before, I turned off netfs, but when the computer finished loading, rather than getting the login prompt I just got the blue background. I could set the runlevel to 3 and go straight to a console, but when I run the startx command I still get just the blue screen. My employees need to use the GUI.[/quote]

You may be able to find some information about your issue by looking at /var/log/Xorg.0.log. Specifically, grep EE /var/log/Xorg.0.log will find the errors.
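Since the X server marks error lines with "(EE)", the search can be made a bit tighter by matching the marker itself instead of any line containing the letters EE. A small sketch, where xorg_errors is a name made up for this example:

```shell
# Hypothetical helper: show the lines carrying the X server's (EE) marker.
# -F treats the pattern as a fixed string, so the parentheses are literal.
xorg_errors() {
    log=${1:-/var/log/Xorg.0.log}
    grep -F '(EE)' "$log"
}
```

The legend near the top of the log also mentions (EE), so expect that one line in the output even on a healthy system. Swapping the pattern for '(WW)' lists the warnings the same way.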

bjoyner
Posts: 7
Joined: 2011/06/02 12:24:21

Re: very odd stability problem

Post by bjoyner » 2011/06/07 12:19:20

I have resolved this issue. In order to get the server to load properly, I needed to disable Bluetooth, HIDD, AHCPI, Firstboot, and NetFS, and comment out the scripts in the rc.local file. Now I just need to find a different way to run those rc.local scripts, since they are required for our POS. It is odd that they would cause problems on this particular model of blade but not on one that is almost identical and only one version earlier. I still have the issue with the keyboard and mouse becoming disabled after the kernel loads, but that is posted separately in the hardware section.
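One possible way to run those scripts differently is to start them in the background with their output sent to a log file, so that a script that stalls cannot hold up the rest of the boot. This is only a sketch under the assumption that the scripts block or hang on the new blades, which the thread does not confirm. The function name run_detached and the paths in the usage note are made up for illustration.

```shell
# Hypothetical helper: start each script in the background and log its output.
# First argument is the log file; the remaining arguments are the scripts.
run_detached() {
    log=$1
    shift
    for script in "$@"; do
        if [ -x "$script" ]; then
            "$script" >>"$log" 2>&1 &
        else
            echo "skipped (not executable): $script" >>"$log"
        fi
    done
}
```

From rc.local it might be called as run_detached /var/log/pos-startup.log /path/to/first-script /path/to/second-script, with the real script paths filled in. If the POS scripts depend on running in a particular order, backgrounding them like this would break that, so it would need testing on one blade first.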
