You will likely be aware of the ‘like’ operator and its use and the common use of the % symbol to denote a string of 0 or greater length of any characters. What you may not know is how to use ranges with the like operator to get some really powerful searches out of a select statement or to validate against the content of a field for use with an insert statement as one may whilst working with large data sets.
CTE statement
The examples in this document will refer to the following CTE to allow us to see explicitly what we’re selecting from. All outputs listed will be as though the select statement is referring to the data below:
WITH example_table AS ( SELECT '23498h' AS DATA UNION ALL SELECT 'te-tst' UNION ALL SELECT 'kj&98' UNION ALL SELECT '9876543' UNION ALL SELECT 'Joanna1' UNION ALL SELECT 'abc*' UNION ALL SELECT '1' UNION ALL SELECT 'Joanna2' UNION ALL SELECT 'Joanna3' UNION ALL SELECT 'adasgh''' )
% Symbol
To quickly recap the % symbol is used to denote a string of 0 or greater length of any characters and can be used as a wildcard. This means that the query:
SELECT DATA FROM example_table WHERE DATA LIKE 'j&9'
will return no results as the query is looking for a data field that holds the value “j&9” exactly whereas:
SELECT DATA FROM example_table WHERE DATA LIKE '%j&9%'
will return
data |
kj&98 |
Ranges
So whilst writing some bulk data import functionality the other day I found that I had cause to use symbols with my like statements to denote ranges and whilst I had been aware that this could be done for some time I’ve generally found other ways of doing what I’ve needed to. If you’re not aware of ranges they can be used as below to denote different things and ranges can be added to without much fuss:
LIKE '[a-z]' --One lower case alpha character LIKE '[a-Z]%' --Starts with one alpha character of upper or lower case LIKE '%[A-Z]%' --Contains at least one upper case alpha character LIKE '[a-Z0-9]' --One alpha numeric character (the range a-Z followed by 0-9)
Slightly more useful is the ability to use ranges with varchars as below:
LIKE 'Joanna[1-2]'
Which will pull back the following results
data |
Joanna1 |
Joanna2 |
From the above examples we can see how ranges can be used in searches to great effect.
Field Validation
Above are some excellent ways to use the like operator but the usages that I discovered for the first time, and that I personally will find useful to be able to come back to are whole-field validations rather than placing emphasis on individual characters. To do this we need to make use of a character to be used within the square brackets to denote a ‘not’ match. This will have the effect of bringing back all results with any characters outside of those noted within the range, here are a few examples:
LIKE '%[^a-z]%' --Returns all with any non-lowercase alpha characters LIKE '%[^a-Z]%' --Returns all with any non-alpha characters LIKE '%[^0-9]%' --Returns all with any non-numerical characters
Which can be very useful for finding and listing fields that are not ‘valid’ for whatever your purposes may be and for denoting why said fields are ‘invalid’. To pull back rows that conform to the format that you’ve indicated in your range statement simply put a not in front of your like statement:
NOT LIKE '%[^a-z]%' --Returns all with only lowercase alpha characters NOT LIKE '%[^a-Z]%' --Returns all with only alpha characters NOT LIKE '%[^0-9]%' --Returns all with only numerical characters
Of course the above can be altered if you would rather define a range of undesirable characters rather than desirable ones, for example if you had a varchar field that you were happy to accept special characters but did not want any numbers it would take a lot of code to give a list of desirable ranges:
'[a-Z''-,!ӣ$%^&*():;@~#.....
In that instance it would be far easier to simply code:
'[0-9]
If that approach has been taken then all that needs to be remembered is to remove the ^ symbol and that the following statement:
LIKE '%[0-9]%'
Will bring back only rows that have characters within the range (and as such are ‘invalid’) and:
NOT LIKE '%[0-9]%'
Will only bring back rows that have none of the characters within the range (and are ‘valid’). The table below is a handy way of remembering:
Code basis | Like comparison | Result |
Specify desired format | Like ‘%[^a-z]%’ | returns invalid rows |
Specify desired format | Not Like ‘%[^a-z]%’ | returns valid rows |
Specify unwanted characters | Like ‘%[0-9]%’ | returns invalid rows |
Specify unwanted characters | Not Like ‘%[0-9]%’ | returns valid rows |
And all of this was a very long way for me to go to validate against a surname field with:
WHERE p.surname NOT LIKE '%[^a-Z''-]%'
!