Board index » delphi » An SQL question - self join in Oracle/MS SQL...

An SQL question - self join in Oracle/MS SQL...

Hi all,

The following query (Oracle)

select ID, LOCATIONNAME from locationname where locationname like
'Paris%'

produces the following results.

ID              LOCATIONNAME                                              
----------      -----------------------------
2432     Paris Orly Apt                                            
3317     Paris Le Bourget Apt                                      
8111     Paris Charles de Gaulle Apt                              
8884     Paris                                                    
9026     Paris                                                    
9189     Paris Cergy Pontoise Airport                              
9246     Paris                                                    

Now, what I would like is to have a list of the names (such as Paris)
which are not different from the others (e.g. records 4, 5 and 7) in
this case, with their IDs beside them

I have tried

select a.id, a.locationname
from locationname a, locationname b
where a.locationname = b.locationname
and a.id <> b.id

But this does not give the right result. I know it's some sort of
fancy self-join, but I can't figure it out.

Anybody any ideas?

Paul...

 

Re:An SQL question - self join in Oracle/MS SQL...


This is untested; but you might try something like:

select a.id, a.locationname
from locationname a
where exists
(select * from locationname b
   where (a.locationname = b.locationname)
   and (a.id <> b.id) )

Clive

Quote
On Tue, 27 Mar 2001 17:17:44 GMT, p...@esat.net (Paul Linehan) wrote:

>Hi all,

>The following query (Oracle)

>select ID, LOCATIONNAME from locationname where locationname like
>'Paris%'

>produces the following results.

>ID          LOCATIONNAME                                              
>----------  -----------------------------
>2432     Paris Orly Apt                                            
>3317     Paris Le Bourget Apt                                      
>8111     Paris Charles de Gaulle Apt                              
>8884     Paris                                                    
>9026     Paris                                                    
>9189     Paris Cergy Pontoise Airport                              
>9246     Paris                                                    

>Now, what I would like is to have a list of the names (such as Paris)
>which are not different from the others (e.g. records 4, 5 and 7) in
>this case, with their IDs beside them

>I have tried

>select a.id, a.locationname
>from locationname a, locationname b
>where a.locationname = b.locationname
>and a.id <> b.id

>But this does not give the right result. I know it's some sort of
>fancy self-join, but I can't figure it out.

>Anybody any ideas?

>Paul...

Re:An SQL question - self join in Oracle/MS SQL...


Paul,
    you don't indicate what incorrect results were produced, but I'm
guessing that the only thing you're missing is a DISTINCT in your select
clause:

select DISTINCT a.id, a.locationname
from locationname a, locationname b
where a.locationname = b.locationname
and a.id <> b.id

If this is not the case, try to be more specific about the incorrect output
you are receiving, as well as the output you would like to receive.

Mike Pasveer

Quote
"Paul Linehan" <p...@esat.net> wrote in message

news:3ac0c835.12923533@newsgroups.borland.com...
Quote

> Hi all,

> The following query (Oracle)

> select ID, LOCATIONNAME from locationname where locationname like
> 'Paris%'

> produces the following results.

> ID LOCATIONNAME
> ---------- -----------------------------
> 2432     Paris Orly Apt
> 3317     Paris Le Bourget Apt
> 8111     Paris Charles de Gaulle Apt
> 8884     Paris
> 9026     Paris
> 9189     Paris Cergy Pontoise Airport
> 9246     Paris

> Now, what I would like is to have a list of the names (such as Paris)
> which are not different from the others (e.g. records 4, 5 and 7) in
> this case, with their IDs beside them

> I have tried

> select a.id, a.locationname
> from locationname a, locationname b
> where a.locationname = b.locationname
> and a.id <> b.id

> But this does not give the right result. I know it's some sort of
> fancy self-join, but I can't figure it out.

> Anybody any ideas?

> Paul...

Re:An SQL question - self join in Oracle/MS SQL...


 "Mike Pasveer" <mjpasv...@hotmail.com> wrote:

Quote
> Paul,
>     you don't indicate what incorrect results were produced, but I'm
> guessing that the only thing you're missing is a DISTINCT in your select
> clause:
> select DISTINCT a.id, a.locationname
> from locationname a, locationname b
> where a.locationname = b.locationname
> and a.id <> b.id
> If this is not the case, try to be more specific about the incorrect output
> you are receiving, as well as the output you would like to receive.

My apologies for not being clearer.

The output of my original query (without distinct) was this (sample)

       ID LOCATIONNAME

---------- ---------------

       997 White River

      7035 White River

      7060 Wildwood

      7055 Wildwood

      2957 Wilmington

      2956 Wilmington

      2951 Wilmington

      2956 Wilmington

      2951 Wilmington

      2957 Wilmington

       872 Windsor

       871 Windsor

Which works fine if there are only two locations with the same name -
as in Wildwood or Windsor here. However, if you take the case of
Wilmington, there are three of those in the db and as you can see,
they are "doubled up" in the  original query result set.

Do you know why this is?

Your suggestion works perfectly (in approx. 4 seconds for a
locationname table of 10000 records) - thanks.

See my other post in this thread for your bonus question  8-)

Paul...

Quote
> Mike Pasveer

Re:An SQL question - self join in Oracle/MS SQL...


 Clive Walden <cli...@clivewalden.com> wrote:

Quote
> This is untested; but you might try something like:
> select a.id, a.locationname
> from locationname a
> where exists
> (select * from locationname b
>    where (a.locationname = b.locationname)
>    and (a.id <> b.id) )

order by a.locationname

I did try this, and it works (have to add an order by clause for the
results to make any sense) - *_but_* (and it's a biggie) this query
takes 2 minutes to run on a table with 10000 records on a server with
500 MB of RAM and a P 750.

I also did (thanks to a suggestion by another poster)

select distinct a.id, a.locationname
from locationname a, locationname b
where a.id <> b.id
and a.locationname = b.locationname

and it takes about 3 seconds.

If I don't have the distinct clause, it doubles every name that has
more than two occurrences, but not those which have two (any idea
why?) and also, why does your first suggestion take soooo long?

I can understand it taking a wee bit longer, but 3 seconds to 2
minutes seems a wee bit extreme...

Paul...

Quote
> Clive

Re:An SQL question - self join in Oracle/MS SQL...


This may not be the most elegant description of what's happening, but I
think you'll get the gist of it:

If you extend your original select to return the ID column from table b as
well as table a (see below),  you may be able to see what's happening a
little more clearly:

Quote
> select a.id, a.locationname, b.id
> from locationname a, locationname b
> where a.locationname = b.locationname
> and a.id <> b.id

Your original query returns a row for each possible combination of pairs
where 2(or more) records contain the same locationname(except for the pair
where the record is joined to itself, since we filter out those instances in
the AND clause).  That means that the number of rows returned for each
duplicated location name will be equal to n * (n-1), where n is the number
of records that contain the same locationname.  Therefore, if there are 2
records with the locationname Wildwood, 2 rows will be returned (2 * 1 = 2).
3 records with the same locationname will return 6 rows for this query (3 *2
= 6).  If you had 4 records, you would get 12 rows (4 * 3 = 12), and you
begin to get the picture (I hope). If you eliminated the AND clause from
your query, you would n^2 rows returned. I hope this answers your first
question somewhat.

I don't think I can adequately explain why the other solution previously
posted takes so much longer;  suffice it to say that you're forcing the
server to do way more work, and the query optimizer may not be resolving
this query in the most efficient manner

Mike Pasveer

Quote
> Which works fine if there are only two locations with the same name -
> as in Wildwood or Windsor here. However, if you take the case of
> Wilmington, there are three of those in the db and as you can see,
> they are "doubled up" in the  original query result set.

> Do you know why this is?

> Your suggestion works perfectly (in approx. 4 seconds for a
> locationname table of 10000 records) - thanks.

> See my other post in this thread for your bonus question  8-)

> Paul...

Re:An SQL question - self join in Oracle/MS SQL...


Paul

Quote
>If I don't have the distinct clause, it doubles every name that has
>more than two occurrences, but not those which have two (any idea
>why?) and also, why does your first suggestion take soooo long?

>I can understand it taking a wee bit longer, but 3 seconds to 2
>minutes seems a wee bit extreme...

I think you already have the answer to the name "doubling".

As for how long it takes.

Without knowing a lot more about your Oracle configuration and the
indexes on the tables and Oracles's Explain Plan, it is impossible to
"know". However, if most entries are NOT duplicated AND there is no
index on the locationname field, then the exists clause is going to
have to do a complete table scan for every unique name.

Is there an index on the locationname field and does the Explain Plan
indicate whether or not it is being used?

Of course, now you have an acceptable solution, you may not want to
pursue this<s>

Clive

Re:An SQL question - self join in Oracle/MS SQL...


Quote
Paul Linehan <p...@esat.net> wrote in message

news:3ac1b31c.73058242@newsgroups.borland.com...
Quote
> I can understand it taking a wee bit longer, but 3 seconds to 2
> minutes seems a wee bit extreme...

Clive has explained why the exists query takes so long.  Using the second
method, I believe that Oracle will first order the records by locationname
so it will only requires two table scans.

Here is another alternative (not tested):

Select id, locationname from locationname
where locationname in
(select locationname from locationname
 group by locationname
 having count(*) > 1)

HTH,
Ping Kam

Other Threads