(fwd) Re: [HACKERS] [6.5.2] join problems ...

Andrey Gerzhov (kittle@freeland.kiev.ua)
Sun, 19 Sep 1999 16:40:32 +0300 (EEST)

-- forwarded message --
Path: freeland.kiev.ua!news.alexradio.kiev.ua!not-for-mail
Message-ID: <19990919071724.20897.rocketmail@web119.yahoomail.com>
Date: Sun, 19 Sep 1999 00:17:24 -0700 (PDT)
From: Mike Mascari <mascarim@yahoo.com>
Subject: Re: [HACKERS] [6.5.2] join problems ...
To: The Hermit Hacker <scrappy@hub.org>
Newsgroups: alex.gated.pgsql.hackers
Lines: 199
Xref: freeland.kiev.ua alex.gated.pgsql.hackers:10553

The query you've presented is rather convoluted, but
if I'm reading your query correctly, it should reduce
to a simple, three-way join:

SELECT c.id, c.name, c.url
FROM aecEntMain a, aecWebEntry b, aecCategory c
WHERE a.status LIKE 'active:ALL%'
AND a.representation LIKE '%:ALL%'
AND b.status LIKE 'active:ALL%'
AND b.indid='$indid'
AND b.divid='$divid'
AND (a.id,a.mid = b.id,b.mid)
AND (b.catid,b.indid,b.divid = c.id,c.ppid,c.pid);

with the following indexes:
aecEntMain: (status) and (id,mid)
aecWebEntry: (status), (indid), (divid), and
(catid,indid,divid)
aecCategory: (id,ppid,pid)

Now, there are some differences between the above and
what you wrote. For example, the above requires that
the status begins with 'active:ALL'. Your query
requires the status begin with 'active' and must also
contain the pattern 'active:ALL'. So for the above
to be equivalent, you can't have a status such as
'active <some stuff> active:ALL'.

With respect to subqueries and PostgreSQL, as you
know, the IN clause requires a nested scan. If you
are going to use subqueries, correlated subqueries
using EXISTS clauses can use indexes:

SELECT c.id, c.name, c.url
FROM aecCategory c
WHERE EXISTS (
SELECT a.status
FROM aecEntMain a, aecWebEntry b
WHERE a.status LIKE 'active:ALL%'
AND a.representation LIKE '%:ALL%'
AND b.status LIKE 'active:ALL%'
AND b.indid='$indid'
AND b.divid='$divid'
AND (a.id,a.mid = b.id,b.mid)
AND (b.catid,b.indid,b.divid = c.id,c.ppid,c.pid));

Unfortunately, the lack of index support in IN
subqueries affects more than just the IN subquery
clause, since INTERSECT/EXCEPT uses the rewriter to
rewrite such queries as UNIONS of two queries with
an IN/NOT IN subquery, respectively. This makes the
INTERSECT/EXCEPT feature functionally useless except
on very small tables.

Hope that helps (and is equivalent),

Mike Mascari (mascarim@yahoo.com)

--- The Hermit Hacker <scrappy@hub.org> wrote:
>
> Morning...
>
> This weekend, up at a clients site working with
> them on improving
> database performance. They are currently running
> MySQL and I'm trying to
> convince them to switch over to PostgreSQL, for
> various features that they
> just don't have with MySQL...
>
> One of the 'queries' that they are currently doing
> with MySQL
> consists of two queries that I can reduce down to
> one using subqueries,
> but its so slow that its ridiculous...so I figured
> I'd throw it out as a
> problem to hopefully solve?
>
> The query I'm starting out with is works out as:
>
> SELECT id, name, url \
> FROM aecCategory \
> WHERE ppid='$indid' \
> AND pid='$divid'";
>
> The results of this get fed into a while look that
> takes the id
> returned and pushes them into:
>
> SELECT distinct b.indid, b.divid, b.catid,
> a.id, a.mid \
> FROM aecEntMain a, aecWebEntry b \
> WHERE (a.id=b.id AND a.mid=b.mid) \
> AND (a.status like 'active%' and b.status
> like 'active%')
> AND (a.status like '%active:ALL%' and
> b.status like '%active:ALL%')
> AND (a.representation like '%:ALL%')
> AND (b.indid='$indid' and
> b.divid='$divid' and b.catid='$catid')";
>
> Now, I can/have rewritten this as:
>
> SELECT id, name, url
> FROM aecCategory
> WHERE ppid='$indid'
> AND pid='$divid'
> AND id IN (
> SELECT distinct c.id
> FROM aecEntMain a, aecWebEntry b, aecCategory c
> WHERE (a.id=b.id AND a.mid=b.mid and b.catid=c.id)
> AND (a.status like 'active%' and b.status like
> 'active%')
> AND (a.status like '%active:ALL%' and b.status
> like '%active:ALL%')
> AND (a.representation like '%:ALL%')
> AND (b.indid='$indid' and b.divid='$divid' and
> b.catid IN (
> SELECT id FROM aecCategory WHERE
> ppid='$indid' AND pid='$divid' )
> ));";
>
> An explain of the above shows:
>
> Index Scan using aeccategory_primary on aeccategory
> (cost=8.28 rows=1 width=36)
> SubPlan
> -> Unique (cost=1283.70 rows=21 width=72)
> -> Sort (cost=1283.70 rows=21 width=72)
> -> Nested Loop (cost=1283.70 rows=21
> width=72)
> -> Nested Loop (cost=1280.70 rows=1
> width=60)
> -> Index Scan using aecwebentry_primary
> on aecwebentry b
> (cost=1278.63 rows=1 width=36)
> SubPlan
> -> Index Scan using
> aeccategory_primary on aeccategory
> (cost=8.28 rows=1
> width=12)
> -> Index Scan using aecentmain_primary on
> aecentmain a
> (cost=2.07 rows=348 width=24)
> -> Index Scan using aeccategory_id on
> aeccategory c
> (cost=3.00 rows=1170 width=12)
>
> Now, a few things bother me with the above explain
> output, based on me
> hopefully reading this right...
>
> The innermost SubPlan reports an estimated rows
> returned of 1...the
> actual query returns 59 rows...slightly off?
>
> The one that bothers me is the one that reports
> 1170 rows returned...if you
> look at the query, the only thing that would/should
> use aeccategory_id is the
> line that goes "SELECT distinct c.id"...if I run
> just that section of the
> query, it yields a result of 55 rows...way off??
>
> All of my queries are currently on static data,
> after a vacuum analyze has
> been performed...everything is faster if I split
> things up and do a SELECT
> on a per id basis on return values, but, as the list
> of 'ids' grow, the
> number of iterations of the while loop required will
> slow down the query...
>
> I'm not sure what else to look at towards
> optimizing the query further,
> or is this something that we still are/need to look
> at in the server itself?
>
> The machine we are working off of right now is an
> idle Dual-PIII 450Mhz with
> 512Meg of RAM, very fast SCSI hard drives on a UW
> controller...and that query
> is the only thing running while we test things...so
> we aren't under-powered :)
>
> ideas?
>
> Marc G. Fournier ICQ#7615664
> IRC Nick: Scrappy
> Systems Administrator @ hub.org
> primary: scrappy@hub.org secondary:
> scrappy@{freebsd|postgresql}.org

__________________________________________________
Do You Yahoo!?
Bid and sell for free at http://auctions.yahoo.com

************

-- end of forwarded message --

-- 
С тем, что не помешает никогда,
                                               Kittle