0

Like and Ranges

You will likely be aware of the ‘like’ operator and its use and the common use of the % symbol to denote a string of 0 or greater length of any characters. What you may not know is how to use ranges with the like operator to get some really powerful searches out of a select statement or to validate against the content of a field for use with an insert statement as one may whilst working with large data sets.

CTE statement

The examples in this document will refer to the following CTE to allow us to see explicitly what we’re selecting from. All outputs listed will be as though the select statement is referring to the data below:

WITH example_table AS (
SELECT '23498h' AS DATA UNION ALL
SELECT 'te-tst' UNION ALL
SELECT 'kj&98' UNION ALL
SELECT '9876543' UNION ALL
SELECT 'Joanna1' UNION ALL
SELECT 'abc*' UNION ALL
SELECT '1' UNION ALL
SELECT 'Joanna2' UNION ALL
SELECT 'Joanna3' UNION ALL
SELECT 'adasgh'''
)

% Symbol

To quickly recap the % symbol is used to denote a string of 0 or greater length of any characters and can be used as a wildcard. This means that the query:

SELECT DATA
FROM example_table
WHERE DATA  LIKE 'j&9'

will return no results as the query is looking for a data field that holds the value “j&9” exactly whereas:

SELECT DATA
FROM example_table
WHERE DATA  LIKE '%j&9%'

will return

data
kj&98

Ranges

So whilst writing some bulk data import functionality the other day I found that I had cause to use symbols with my like statements to denote ranges and whilst I had been aware that this could be done for some time I’ve generally found other ways of doing what I’ve needed to. If you’re not aware of ranges they can be used as below to denote different things and ranges can be added to without much fuss:

LIKE '[a-z]'     --One lower case alpha character
LIKE '[a-Z]%'    --Starts with one alpha character of upper or lower case
LIKE '%[A-Z]%'   --Contains at least one upper case alpha character
LIKE '[a-Z0-9]'  --One alpha numeric character (the range a-Z followed by 0-9)

Slightly more useful is the ability to use ranges with varchars as below:

LIKE 'Joanna[1-2]'

Which will pull back the following results

data
Joanna1
Joanna2

From the above examples we can see how ranges can be used in searches to great effect.

Field Validation

Above are some excellent ways to use the like operator but the usages that I discovered for the first time, and that I personally will find useful to be able to come back to are whole-field validations rather than placing emphasis on individual characters. To do this we need to make use of a character to be used within the square brackets to denote a ‘not’ match. This will have the effect of bringing back all results with any characters outside of those noted within the range, here are a few examples:

LIKE '%[^a-z]%'    --Returns all with any non-lowercase alpha characters 
LIKE '%[^a-Z]%'    --Returns all with any non-alpha characters
LIKE '%[^0-9]%'    --Returns all with any non-numerical characters

Which can be very useful for finding and listing fields that are not ‘valid’ for whatever your purposes may be and for denoting why said fields are ‘invalid’. To pull back rows that conform to the format that you’ve indicated in your range statement simply put a not in front of your like statement:

NOT LIKE '%[^a-z]%'     --Returns all with only lowercase alpha characters
NOT LIKE '%[^a-Z]%'     --Returns all with only alpha characters
NOT LIKE '%[^0-9]%'     --Returns all with only numerical characters

Of course the above can be altered if you would rather define a range of undesirable characters rather than desirable ones, for example if you had a varchar field that you were happy to accept special characters but did not want any numbers it would take a lot of code to give a list of desirable ranges:

'[a-Z''-,!ӣ$%^&*():;@~#.....

In that instance it would be far easier to simply code:

'[0-9]

If that approach has been taken then all that needs to be remembered is to remove the ^ symbol and that the following statement:

LIKE '%[0-9]%'

Will bring back only rows that have characters within the range (and as such are ‘invalid’) and:

NOT LIKE '%[0-9]%'

Will only bring back rows that have none of the characters within the range (and are ‘valid’). The table below is a handy way of remembering:

Code basis Like comparison Result
Specify desired format Like ‘%[^a-z]%’ returns invalid rows
Specify desired format Not Like ‘%[^a-z]%’ returns valid rows
Specify unwanted characters Like ‘%[0-9]%’ returns invalid rows
Specify unwanted characters Not Like ‘%[0-9]%’ returns valid rows

And all of this was a very long way for me to go to validate against a surname field with:

WHERE p.surname NOT LIKE '%[^a-Z''-]%'

!

JBaker

Joi is an IT Development Analyst working in London and a consummate technophile. Whilst he isn't coding he can be found reacquainting himself with his guitars, spinning poi, looking for climbing walls or chasing the latest shiny thing to cross his field of vision.

Leave a Reply

Your email address will not be published. Required fields are marked *