Monthly Archives: August 2015

Is your ODA disk shelf randomly disconnecting from your compute nodes?

I’ve been doing a lot of work with ODAs (Oracle Database Appliance) lately and I have to say that I’m VERY impressed. It’s amazing how smooth Oracle can make things when they know exactly what hardware and software will be on a machine. I can’t imagine how long it would take to patch an ILOM, BIOS, SSDs, HDs, OS, various controllers, Grid Infrastructure, Database Software, etc. if you were doing everything from scratch. With an ODA all of that can be done by downloading two files and running a few commands.

We did run into an ‘interesting’ situation where the disk shelf would randomly disconnect from the compute nodes. A restart of ohasd would bring everything back for a while (sometimes weeks, sometimes only hours), but it was really troubling to say the least. After trying a whole bunch of things, Oracle finally asked us to take a picture of the back of our disk shelves… See those service ports below? Someone, at some point in time, had plugged Ethernet cables into those ports. And that was the issue.

Those ports are only used during initial machine configuration and should NOT be used on a running machine. What would happen is that a buffer on the controller would fill up, and/or the controller would receive some kind of packet that it didn’t know what to do with, and the controller would reset. Of course the same thing would happen to the other controller at the same time. If you lose both controllers to your disk shelf simultaneously, bad things tend to happen to your database…

As far as we can tell those cables were plugged in during installation and it took over a year before the resets happened!

The picture below is from an X4-2; your ODA might be a bit different.

[Image: back of an X4-2 ODA disk shelf]


Convert your database from a single instance to RAC using rconfig… Beware of CFS choice if you are using ACFS!

Oracle provides a really cool utility called rconfig that uses a very simple XML file as input. It assumes you’ve already got a clustered file system in place, two compute nodes (at least) with user equivalency set up (the same user exists on both systems with the same user id and typically the same password).

You can find sample rconfig templates in your $ORACLE_HOME/assistants/rconfig/sampleXML directory. If you edit the file it’s all pretty self-explanatory, or at least you think so…

In the file you’ll find the following:

<!--Specify the type of storage to be used by rac database. Allowable values are CFS|ASM.
    The non-rac database should have same storage type.
    ASM credentials are no needed for conversion. --> 
 <n:SharedStorage type="ASM">

Hmmmm you think. Yes, ACFS is based on ASM. Which should I choose? Well ASM is still being used under the covers, but ACFS is the new thing that Oracle wants us to use, so I should probably update the SharedStorage type of ASM to CFS. (And yes, Oracle did misspell not as no.)

The next section says this:

<!--Specify Database Area Location to be configured for rac database.
    If this field is left empty, current storage will be used for rac database.
    For CFS, this field will have directory path. --> 
 <n:TargetDatabaseArea>+ASMDG</n:TargetDatabaseArea> 

And you may think that the Target Database Area should be something like /u02/app/oracle/oradata/datastore/.ACFS/orcl, since the file system is ACFS. This all seems to be logical…

However when you run rconfig convert.xml with the above entries YOU WILL DELETE ALL YOUR DATABASE DATA FILES! (You did take a backup, didn’t you?).

[main] [ 2015-08-05 13:24:06.174 EDT ] [StorageManagement.deleteDataArea:1774]
Deleting new Storage Location:/u02/app/oracle/oradata/datastore/.ACFS/snaps/orcl/ORCL

I’ll let you guess how (and when) I figured this out.

I’ve asked Oracle to log a bug against this, as it really does have the potential to be confusing and the consequences of the mistake are so bad. I did bet the Oracle Support engineer a $25 Kiva donation that Oracle would come back and say “Not a bug”. He didn’t take me up on it…

So what should you do? If you are using ACFS, just leave the SharedStorage type at ASM and empty out the TargetDatabaseArea if your data files are already in their final location. Everything will convert just fine and life will be good.
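As a rough sketch, the safe combination described above would look something like the fragment below. It is not a complete convert.xml: only the two elements quoted earlier are shown, the closing SharedStorage tag is assumed from normal XML structure, and everything else in the sample file is omitted. Check it against the sample template shipped with your own release before using it.

```xml
<!-- Sketch only: ACFS on ASM, with the data files staying where they are. -->
<!-- Keep type="ASM". Switching it to CFS is what led rconfig to delete the data files. -->
<n:SharedStorage type="ASM">
    <!-- Left empty on purpose, so the current storage is used for the RAC database. -->
    <n:TargetDatabaseArea></n:TargetDatabaseArea>
</n:SharedStorage>
```

Whatever the file ends up looking like, take a backup before running rconfig against it.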

Happy RAC’ing!

Update: We’ve got the first step underway! A bug has been logged. Let’s see if it makes it into the next release:

Bug 21644512 – UNCLEAR COMMENTS IN RCONFIG SAMPLE XML FILES CAN LEAD TO DELETION OF ALL DATABAS

Here’s what I’ve asked them to do:

This comment (in all the sample files):

<!--Specify the type of storage to be used by rac database. Allowable values are CFS|ASM. The non-rac database should have same storage type. ASM credentials are no needed for conversion. -->
<n:SharedStorage type="ASM">

Should be this:

<!--Specify the type of storage to be used by the RAC database. Allowable values are CFS|ASM. ASM should be used for both native ASM deployments and ACFS on ASM deployments. The non-RAC database should have same storage type. ASM credentials are not needed for conversion. -->
<n:SharedStorage type="ASM">

Two fixes above: Clarification that ACFS is NOT CFS and then fixed the spelling mistake (“not” instead of “no”) in the last sentence. If you want to be really pedantic then rac should really be RAC.