This may not be the most elegant description of what's happening, but I
think you'll get the gist of it:
If you extend your original select to return the ID column from table b as
well as table a (see below), you may be able to see what's happening a
little more clearly:
Quote
> select a.id, a.locationname, b.id
> from locationname a, locationname b
> where a.locationname = b.locationname
> and a.id <> b.id
Your original query returns a row for each possible combination of pairs
where 2(or more) records contain the same locationname(except for the pair
where the record is joined to itself, since we filter out those instances in
the AND clause). That means that the number of rows returned for each
duplicated location name will be equal to n * (n-1), where n is the number
of records that contain the same locationname. Therefore, if there are 2
records with the locationname Wildwood, 2 rows will be returned (2 * 1 = 2).
3 records with the same locationname will return 6 rows for this query (3 *2
= 6). If you had 4 records, you would get 12 rows (4 * 3 = 12), and you
begin to get the picture (I hope). If you eliminated the AND clause from
your query, you would n^2 rows returned. I hope this answers your first
question somewhat.
I don't think I can adequately explain why the other solution previously
posted takes so much longer; suffice it to say that you're forcing the
server to do way more work, and the query optimizer may not be resolving
this query in the most efficient manner
Mike Pasveer
Quote
> Which works fine if there are only two locations with the same name -
> as in Wildwood or Windsor here. However, if you take the case of
> Wilmington, there are three of those in the db and as you can see,
> they are "doubled up" in the original query result set.
> Do you know why this is?
> Your suggestion works perfectly (in approx. 4 seconds for a
> locationname table of 10000 records) - thanks.
> See my other post in this thread for your bonus question 8-)
> Paul...