Is your ODA disk shelf randomly disconnecting from your compute nodes?

I’ve been doing a lot of work with ODAs (Oracle Database Appliances) lately and I have to say that I’m VERY impressed. It’s amazing how smooth Oracle can make things when they know exactly what hardware and software will be on a machine. I can’t imagine how long it would take to patch an ILOM, BIOS, SSDs, HDs, OS, various controllers, Grid Infrastructure, Database Software, etc. if you were doing everything from scratch. With an ODA, all of that can be done by downloading two files and running a few commands.

We did run into an ‘interesting’ situation where the disk shelf would randomly disconnect from the compute nodes. A restart of ohasd would bring everything back for a while (sometimes weeks, sometimes only hours), but it was really troubling, to say the least. After we had tried a whole bunch of things, Oracle finally asked us to take a picture of the back of our disk shelves… See those service ports below? Someone, at some point in time, had plugged Ethernet cables into those ports. And that was the issue.

Those ports are only used during initial machine configuration and should NOT be used on a running machine. As far as we understand it, a buffer on the controller would fill up, and/or the controller would receive some kind of packet that it didn’t know what to do with, and the controller would reset. Of course, the same thing would happen to the other controller at the same time. If you lose both controllers to your disk shelf simultaneously, bad things tend to happen to your database…

As far as we can tell, those cables were plugged in during installation, and it took over a year before the resets started happening!

The picture below is from an X4-2; your ODA might look a bit different.

X4-2ODA
