Sunday, February 19, 2012

Ignoring spaces!

Hi All,
Wondering if i can tap into your knowlege...
I have 2 lists of ID Codes (users and potential users of a service)
which i need to match together
1 list is of existing users, 1 list of potential users.
I want to find, from the list of potential users, ID codes which are
not in the list of users.
This is simple enough and i'm using a lef join to establish matching
ID codes in the 2 lists, those not matched have not used the service.
My problem is, that ID codes from both lists sometimes have a single
space at random points within the code and these are not constent
between the 2 lists.
What i ideally would like is a piece of code which says to match list
1 with list 2 but ignore anything which is not a-zA-Z0-9, which would
then ignore the ' ' [space].
Any advice'
PS i know i could use the replace ' ', with '' code in both lists to
uniform them, but i don't really want to have to go down that line
everytime i want to do the match.
Thanks!> PS i know i could use the replace ' ', with '' code in both lists to
> uniform them, but i don't really want to have to go down that line
> everytime i want to do the match.
Then fix the problem instead of searching for some magical better
alternative to using replace.
By "fix the problem" I mean:
(a) correct the existing data that shouldn't have spaces; and, more
importantly,
(b) correct the code/app(s) that is putting the spaces into the data in the
first place.
Aaron Bertrand
SQL Server MVP
http://www.sqlblog.com/
http://www.aspfaq.com/5006|||On Fri, 15 Jun 2007 06:00:13 -0700, chriselias271@.gmail.com wrote:

>What i ideally would like is a piece of code which says to match list
>1 with list 2 but ignore anything which is not a-zA-Z0-9, which would
>then ignore the ' ' [space].
ON REPLACE(A.ID, ' ', '') = REPLACE(B.ID, ' ', '')
However, performance will be poor as that can not use indexes. If the
tables are not too large and the match is not run too often
performance might be acceptable, or not.

>PS i know i could use the replace ' ', with '' code in both lists to
>uniform them, but i don't really want to have to go down that line
>everytime i want to do the match.
If you don't fix the data - which would seem to be the ideal solution
- then whatver "piece of code" you use will have to be used every time
the match is performed. I don't know what sort of code could be
simpler than using REPLACE as in the example above.
If it is not practical to remove the spaces, and the match must be run
regularly, then I would consider adding another column to each table
to hold the column without the blank, or adding such a column to a
pair of views on the two tables and indexing them to create indexed
views and match on the views.
Roy Harvey
Beacon Falls, CT|||I agree with Aaron on it's better to fix the data source, but assuming
you cannot...
A user-defined scalar-value function can do the string cleaning...if
you are on 2005 and can use CLR, just a simple wrapper of
Regex.Replace will do the trick in one line...if you are 2000 or no
CLR, then you would have to do t-sql string manipulation to clean it
up...did this last week for an ETL project...not a very good idea
performance wise, as it will scan all your base tables if you are
using it in the join:
-- Returns only the digits contained in @.input
CREATE FUNCTION dbo.VarcharDigits
(
@.input varchar(255)
)
RETURNS varchar(255)
AS
BEGIN
DECLARE @.i int
DECLARE @.cur char
DECLARE @.output varchar(255)
SET @.output = ''
SET @.i = 1
WHILE (@.i <= LEN(@.input))
BEGIN
SET @.cur = SUBSTRING(@.input,@.i,1)
IF (ASCII(@.cur) BETWEEN 48 AND 57) -- Digits only
SET @.output = @.output + @.cur
SET @.i = @.i + 1
END
RETURN @.output
END
On Jun 15, 9:00 am, chriselias...@.gmail.com wrote:
> Hi All,
> Wondering if i can tap into your knowlege...
> I have 2 lists of ID Codes (users and potential users of a service)
> which i need to match together
> 1 list is of existing users, 1 list of potential users.
> I want to find, from the list of potential users, ID codes which are
> not in the list of users.
> This is simple enough and i'm using a lef join to establish matching
> ID codes in the 2 lists, those not matched have not used the service.
> My problem is, that ID codes from both lists sometimes have a single
> space at random points within the code and these are not constent
> between the 2 lists.
> What i ideally would like is a piece of code which says to match list
> 1 with list 2 but ignore anything which is not a-zA-Z0-9, which would
> then ignore the ' ' [space].
> Any advice'
> PS i know i could use the replace ' ', with '' code in both lists to
> uniform them, but i don't really want to have to go down that line
> everytime i want to do the match.
> Thanks!|||On 15 Jun, 15:18, Roy Harvey <roy_har...@.snet.net> wrote:
> On Fri, 15 Jun 2007 06:00:13 -0700, chriselias...@.gmail.com wrote:
> ON REPLACE(A.ID, ' ', '') = REPLACE(B.ID, ' ', '')
> However, performance will be poor as that can not use indexes. If the
> tables are not too large and the match is not run too often
> performance might be acceptable, or not.
>
> If you don't fix the data - which would seem to be the ideal solution
> - then whatver "piece of code" you use will have to be used every time
> the match is performed. I don't know what sort of code could be
> simpler than using REPLACE as in the example above.
> If it is not practical to remove thespaces, and the match must be run
> regularly, then I would consider adding another column to each table
> to hold the column without the blank, or adding such a column to a
> pair of views on the two tables and indexing them to create indexed
> views and match on the views.
> Roy Harvey
> Beacon Falls, CT
Absolutely spot on exactly what i wanted.
Thanks for understanding the problem so well!!

No comments:

Post a Comment