Unnecessary Field Validations

by Krishna on August 25, 2007

I was reading this interesting article from Haacked.com about email validation. The author, Phil Haack, illustrates how most email validations are too strict because the RFC (Request for Comments) takes a much more liberal view of which characters an email address can contain. Although in practice most mainstream email services (like Yahoo! Mail and Gmail) do follow certain address conventions, a few valid email addresses will not get through such validation.

Here are a couple of validations on other common fields that can be frustrating for people:

  1. Phone numbers: Many novice developers instinctively add an all-digit validation or a masked edit (like “(111) 555-9999”) to the field. This is wrong for many reasons:
    1. You may have mnemonic phone numbers like 800-ASK-ABCD, which use letters.
    2. International phone numbers will require additional digits and characters.
    3. Phone extensions and special codes (#, *, etc.) for reaching people may not pass such validation.
    4. People may want to write phone numbers in their own preferred format, say, 111.555.9999.
  2. Names: The most common interface for names has 3 text boxes: two large ones for first name and last name respectively, and a short one for the middle initial. And the validations are usually alphabet-only. Again, the issues are:
    1. How do you represent a name like “Warnakulasuriya Patabendige Ushantha Joseph Chaminda Vaas”? OK, I am being a little extreme there, but many non-American-born people have more than 3 words in their names, including me.
    2. On the other hand, some people don’t have a middle initial. My wife (with only 2 words in her name) had a problem with the US Citizenship and Immigration Services (USCIS) website; fortunately, it was only a warning and not a complete block.
    3. Names can contain numerals and punctuation marks, like “John Smith, Jr.”, “Daniel D’Souza”, “Mary Wilson-Smith”, “Jack Ford, III”, etc.
    4. European names can contain characters outside the basic ASCII set.
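
To make the phone number objections concrete, here is a minimal sketch of a more forgiving check. The function name `is_plausible_phone`, the set of allowed characters, and the three-character minimum are all my own assumptions for illustration, not a standard; adjust them to what your application really needs.

```python
import re

# Characters commonly seen in real-world phone numbers: digits, letters
# (for mnemonics and "ext"), whitespace, and a few separators and codes.
# This allowed set is an assumption, not a standard.
_ALLOWED = re.compile(r"[0-9A-Za-z\s().+\-#*,;/]+")


def is_plausible_phone(value: str, min_dialable: int = 3) -> bool:
    """Return True unless the input is clearly not a phone number."""
    text = value.strip()
    if not text or not _ALLOWED.fullmatch(text):
        return False
    # Require a few digits or letters so that "---" alone is rejected.
    dialable = sum(ch.isalnum() for ch in text)
    return dialable >= min_dialable
```

With this sketch, “800-ASK-ABCD”, “(111) 555-9999”, “111.555.9999” and an international number with an extension all pass, while a string of pure punctuation does not.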

I am not saying that such user interfaces are dumb. Far from it; there are good reasons why people may want such an interface. In the case of the name fields, for example, administrators may want to search individually on first and last names, so the application must force the end user to specify them separately. Many people are also more comfortable reading a list of users with last names first, like “Smith, John” rather than “John Smith”.

Also, some applications do try to get around some of the issues by having separate fields for country codes and extensions, or even the three parts of the phone number itself. Such a function may be useful if you want to, say, show only people with a phone number in the United States or a regional area code. I can think of salespeople needing such functionality for cold calling purposes.

However, my point remains. Many times, developers add such functionality without any thought about the end user. Usually, they have seen some other similar user interface and think it is the right thing to do. I plead guilty to having done the same thing in the past.

Some questions that may help in determining whether the validation and the field layout are indeed required:

  1. Does it matter if the field contains junk data? For example, if a person enters “!@#$%^&*()” as a callback telephone number in a “Contact Me” form, he obviously does not want you to call him. So do you care? (Aside: Try doing a Google search on that phone number. LOL.)
  2. Does the end user distinguish between the different parts of the data? For example, the extension as a separate field is really meaningless because users are not going to search on phone extension. Or are they?
  3. Does the end user want to force the display of some fields in a certain way? If not, maybe you don’t need masked edit controls or separate fields.
  4. There are other application-specific fields too, which you may want to investigate. What fields have validation they don’t need, including a check for mandatory values?

Finally, a word of caution: I don’t intend to demean all validations. Obviously, many of them are important to maintain data integrity. So analyze your situation and application needs carefully.


orcmid August 26, 2007 at 12:47 pm

I abhor those “800-ASK-ABCD” phone numbers, especially when my dialer is not a telephone keypad. It would be interesting to translate them internally, but then we may get into language issues about keypad alphabets too?

At the same time, I agree that there is too much mindless strictness and format and entry requirements that are simply unfriendly demonstrations of inadequate care.

Krish August 26, 2007 at 7:59 pm

Thanks for your comment, orcmid

The problem with translation is that the original mnemonic is lost, even when it would be more convenient to keep. Perhaps a solution would be to display the corresponding number next to the phone field. But then, as you said, we may be guessing at the keypad alphabets.
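
A rough sketch of that idea: keep what the user typed and compute the dialable form only for display. The function name `dialable_form` is hypothetical, and the letter layout below assumes the common North American keypad, which is exactly the guess mentioned above.

```python
# Letter layout of the common North American telephone keypad.
# Assumption: other keypad alphabets exist and would need their own table.
_KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
_LETTER_TO_DIGIT = {
    letter: digit for digit, letters in _KEYPAD.items() for letter in letters
}


def dialable_form(entered: str) -> str:
    """Map letters to keypad digits, leaving every other character alone."""
    return "".join(_LETTER_TO_DIGIT.get(ch.upper(), ch) for ch in entered)


# The stored value stays "800-ASK-ABCD"; only the hint shown next to it changes.
hint = dialable_form("800-ASK-ABCD")  # "800-275-2223"
```

Because the original text is never overwritten, the mnemonic survives and the translated number is only a convenience.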

BalaIvesian August 31, 2007 at 4:44 am

Hi Krish,

This article is really useful because nowadays many of us do not apply field validations properly, even when using the .NET validation controls. It is especially useful on when to use validations and, mainly, how to avoid unnecessary ones.

Thanks & regards,

Krish August 31, 2007 at 10:13 am

Thanks for your comments, Balaji

Bruce August 31, 2007 at 3:29 pm

One other aspect to keep in mind with field validation is sanitising the user inputs to prevent SQL injection attacks on the server.

For example, if you took the following input from the user:
firstname = "name'; drop table security; --"

and used it directly in an SQL statement:

sql = "select * from security where firstname='" + firstname + "'"

You end up with a missing security table in your database…

Krish September 1, 2007 at 9:00 am

Bruce, what you say is true.

However, such problems should be handled by other means such as using parametrized queries, available in both .NET and Java.
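
As a minimal sketch of what a parametrized query looks like, here is Bruce's example redone in Python with the standard `sqlite3` module (swapped in for .NET or Java only to keep the example self-contained). The `security` table comes from his comment; the function name `find_by_first_name` and the sample row are my own assumptions.

```python
import sqlite3


def find_by_first_name(conn: sqlite3.Connection, firstname: str) -> list:
    # The "?" placeholder passes the value separately from the SQL text,
    # so quotes and semicolons in the input are treated as plain data.
    cursor = conn.execute(
        "SELECT firstname FROM security WHERE firstname = ?", (firstname,)
    )
    return cursor.fetchall()


# In-memory database with one sample row, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE security (firstname TEXT)")
conn.execute("INSERT INTO security (firstname) VALUES (?)", ("Daniel D'Souza",))

hostile = "name'; drop table security; --"
no_match = find_by_first_name(conn, hostile)          # no rows, table untouched
match = find_by_first_name(conn, "Daniel D'Souza")    # apostrophe is preserved
```

Note that the name with an apostrophe goes in and comes out unchanged, so there is no need to reject it at the field level just to protect the database.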

Also, there can be sanitizing and de-sanitizing functions for such input so that the information is preserved while going to and from the database.

If your database code can be called from somewhere other than the place where you validate data, that is a potential security loophole.
