Monthly Archives: January 2019

Names can be up to 50 BYTES in length…

…said no business user ever.

Business users don’t think in BYTES, they think in CHARACTERS. Tables in databases hold information for business users to use, not (generally) for geeky computer programmers who think in bytes.

Unfortunately, many databases (including Oracle) have their defaults configured to use bytes as the length for table columns. I believe that this is wrong…

Let’s first demonstrate the issue and then we’ll see how it should be fixed and finally how you actually have to fix it.

The Issue

Let’s say we have a business requirement from our customer that reads like the following (Note this is a really bad requirement and any competent DBA should never let a requirement like the below get into a database they manage, but let’s go with it):

“For our business we allow all of our customers to rate each item we sell with with 1 to 5 hearts. The rating should be stored using one of the following values:

  1. ♥♥
  2. ♥♥♥
  3. ♥♥♥♥
  4. ♥♥♥♥♥

We understand that this is ridiculous and that we should really use numbers, but we really want it stored this way.”

Let’s create a table that will store the rating. Note, that this is a horrible table and it’s missing just about everything that makes a table a good table, but we want to focus on the character length issue instead of proper database modeling techniques.

create table t1(c1 varchar2(5));

With our table created, let’s start to add each of the ratings to the table:

insert into table t1 values('♥');
one row inserted
insert into table t1 values('♥♥');
one row inserted
insert into table t1 values('♥♥♥');
error

Hmmm…. Our table should store up to 5 characters, and yet we can’t actually store 5 characters. Why is this? Let’s use SQL Developer (or SQLcl) to take a look at our table using the new(er) info command:

info t1
TABLE: T1 
	 LAST ANALYZED: 
	 ROWS         : 
	 SAMPLE SIZE  : 
	 INMEMORY     :DISABLED 
	 COMMENTS     : 

Columns 
NAME         DATA TYPE          NULL  DEFAULT    COMMENTS
 C1          VARCHAR2(5 BYTE)   Yes

As we can see, table t1 has been defined to allow up to 5 bytes of data to be stored in the column c1. Since the heart symbol is greater than one byte, we can only fit two hearts into the column. When we try to add a row with 3 hearts, we get an error.

The Right Fix (that’s really the wrong fix)

The ‘right’ fix for this is to do the following as a SYSDBA user in the root container of your database (You have moved on from the officially deprecated non-cdb architecture, right?):

alter system set nls_length_semantics = 'CHAR';

However, you’ll see in the documentation that you shouldn’t do this as it ends up breaking things, especially Oracle Text which has quite a few notes in Oracle Support that basically say the only way to get Oracle Text to work is to NOT use the above statement.

Screen Shot 2018-10-16 at 11.26.41 AM

In my opinion, this is a bug. Oracle has finally (starting with Oracle Database 12c) defaulting the character set of a newly created database to AL32UTF8 (which should be the character set used on all newly created databases), and they should in a future release, also fix the NLS_LENGTH_SEMANTICS issue. (At least in my opinion they should! I’m pretty sure it’s not part of the actual plans for future releases.) The default should be CHAR and code that would break if CHARs were used instead of BYTEs (I’m looking at you Oracle Text…) should be fixed to either specify BYTE directly, or fixed to work with CHAR.

The Actual Fix(es)

Since the right fix should work, but really doesn’t, how can you address the issue correctly? There are two different methods to fix the issue:

  1. Correctly set DDL definitions to use CHAR.
  2. Alter the user’s session setting the NLS_LENGTH_SEMANTICS to CHAR.

1. The Correct DDL Fix

This is actually the best way to fix the issue as you can be assured that no matter what the defaults are in the database, your tables are always created correctly. In the same way that you should never trust the date format in a database to be the (also incredibly horrible) DD-MON-RR format and instead always use the to_date function with a correct format mask, you  should always create the length of a character based column with the correct length, and the correct length is character based, not byte based (unless the business requirements are to REALLY store something with a length of bytes which would be strange indeed for 99.99% of business requirements).

So, instead of this:

create table t1(c1 varchar2(5));

Do this:

create table t1(c1 varchar2(5 char));

By explicitly creating the table correctly, everything will work correctly in any database:

drop table t1;
table dropped

create table t1(c1 varchar2(5 char));
table created

insert into table t1 values('♥');
one row inserted
insert into table t1 values('♥♥');
one row inserted
insert into table t1 values('♥♥♥');
one row inserted
insert into table t1 values('♥♥♥♥');
one row inserted
insert into table t1 values('♥♥♥♥♥');
one row inserted

Great! Everything worked just the way we wanted.

2. The Alter Session Fix

Potentially the above fix could require a lot of work if you have a few thousand, or tens of thousands of columns defined without specifying the correct CHAR attribute that should have been specified with each create table statement.

You could, brute force attempt to fix each statement, or instead, just run the following line BEFORE you run all your create table statements that are not as semantically specific as they should be:

alter session set nls_length_semantics = 'CHAR';

Now let’s recreate our t1 table:

drop table t1;
table dropped
create table t1(c1 varchar2(5)); --Still semantically incorrect!
table created
info t1
TABLE: T1 
	 LAST ANALYZED: 
	 ROWS         : 
	 SAMPLE SIZE  : 
	 INMEMORY     :DISABLED 
	 COMMENTS     : 

Columns 
NAME         DATA TYPE          NULL  DEFAULT    COMMENTS
 C1          VARCHAR2(5 CHAR)   Yes

By altering the NLS_LENGTH_SEMANTICS of our session, we’ve managed to make our semantically incorrect statement of  “create table t1(c1 varchar2(5));” work correctly.

Note that unless you work this out with Oracle Support directly ahead of time, you definitely shouldn’t alter your session and adjust your NLS_LENGTH_SEMANTICS with any Oracle supplied scripts. For example it would be wrong to do this before running the Oracle APEX install scripts. But for your OWN tables that you are creating in the database, you should always define them in the most semantically correct way, and if you can’t do that for some reason, then you can alter the session to get the correct storage of your data using semantically incorrect statements.

Remember… No business user ever said “Names can be 50 bytes in length.”

Rich