One query with multiple JOINs vs multiple queries
I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.
I'm considering two approaches: single query with multiple joins or two separate but simpler queries.
At a high level, and simplifying the structure, my schema is:
CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE addresses (
id bigint NOT NULL,
latitude double precision,
longitude double precision,
line1 character varying NOT NULL,
"position" geography(Point,4326),
CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
);
CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");
CREATE TABLE locations (
id bigint NOT NULL,
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
address_id bigint NOT NULL
);
CREATE TABLE shops (
id bigint NOT NULL,
name character varying NOT NULL,
location_id bigint NOT NULL
);
CREATE TABLE inventories (
id bigint NOT NULL,
shop_id bigint NOT NULL,
status character varying NOT NULL
);
The addresses
table holds the geographical data. The position
column is calculated from the lat-lng columns when the rows are inserted or updated.
Each address
is associated to one location
.
Each address
may have many shops
, and each shop
will have one inventory
.
I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.
The tables have a few hundreds of thousands of rows.
With that in place, my main use case can be satisfied by this single query, which searches for addresses
within 1000 meters from a central geographical point (10.0, 10.0
) and returns data from all the tables:
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.uuid AS location_uuid,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
This query works, and EXPLAIN ANALYZE
shows that it does correctly use the GIST
index.
However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:
--- only search for the addresses
SELECT
a.id as addr_id,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE
l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
;
where the values in l.address_id IN (1, 2, 3, 4, 5)
come from the first query.
The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.
I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.
What about memory usage? Or resource contention on the tables? (e.g. locks)
postgresql join postgis
New contributor
add a comment |
I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.
I'm considering two approaches: single query with multiple joins or two separate but simpler queries.
At a high level, and simplifying the structure, my schema is:
CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE addresses (
id bigint NOT NULL,
latitude double precision,
longitude double precision,
line1 character varying NOT NULL,
"position" geography(Point,4326),
CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
);
CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");
CREATE TABLE locations (
id bigint NOT NULL,
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
address_id bigint NOT NULL
);
CREATE TABLE shops (
id bigint NOT NULL,
name character varying NOT NULL,
location_id bigint NOT NULL
);
CREATE TABLE inventories (
id bigint NOT NULL,
shop_id bigint NOT NULL,
status character varying NOT NULL
);
The addresses
table holds the geographical data. The position
column is calculated from the lat-lng columns when the rows are inserted or updated.
Each address
is associated to one location
.
Each address
may have many shops
, and each shop
will have one inventory
.
I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.
The tables have a few hundreds of thousands of rows.
With that in place, my main use case can be satisfied by this single query, which searches for addresses
within 1000 meters from a central geographical point (10.0, 10.0
) and returns data from all the tables:
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.uuid AS location_uuid,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
This query works, and EXPLAIN ANALYZE
shows that it does correctly use the GIST
index.
However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:
--- only search for the addresses
SELECT
a.id as addr_id,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE
l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
;
where the values in l.address_id IN (1, 2, 3, 4, 5)
come from the first query.
The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.
I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.
What about memory usage? Or resource contention on the tables? (e.g. locks)
postgresql join postgis
New contributor
add a comment |
I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.
I'm considering two approaches: single query with multiple joins or two separate but simpler queries.
At a high level, and simplifying the structure, my schema is:
CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE addresses (
id bigint NOT NULL,
latitude double precision,
longitude double precision,
line1 character varying NOT NULL,
"position" geography(Point,4326),
CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
);
CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");
CREATE TABLE locations (
id bigint NOT NULL,
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
address_id bigint NOT NULL
);
CREATE TABLE shops (
id bigint NOT NULL,
name character varying NOT NULL,
location_id bigint NOT NULL
);
CREATE TABLE inventories (
id bigint NOT NULL,
shop_id bigint NOT NULL,
status character varying NOT NULL
);
The addresses
table holds the geographical data. The position
column is calculated from the lat-lng columns when the rows are inserted or updated.
Each address
is associated to one location
.
Each address
may have many shops
, and each shop
will have one inventory
.
I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.
The tables have a few hundreds of thousands of rows.
With that in place, my main use case can be satisfied by this single query, which searches for addresses
within 1000 meters from a central geographical point (10.0, 10.0
) and returns data from all the tables:
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.uuid AS location_uuid,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
This query works, and EXPLAIN ANALYZE
shows that it does correctly use the GIST
index.
However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:
--- only search for the addresses
SELECT
a.id as addr_id,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE
l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
;
where the values in l.address_id IN (1, 2, 3, 4, 5)
come from the first query.
The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.
I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.
What about memory usage? Or resource contention on the tables? (e.g. locks)
postgresql join postgis
New contributor
I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.
I'm considering two approaches: single query with multiple joins or two separate but simpler queries.
At a high level, and simplifying the structure, my schema is:
CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE addresses (
id bigint NOT NULL,
latitude double precision,
longitude double precision,
line1 character varying NOT NULL,
"position" geography(Point,4326),
CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
);
CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");
CREATE TABLE locations (
id bigint NOT NULL,
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
address_id bigint NOT NULL
);
CREATE TABLE shops (
id bigint NOT NULL,
name character varying NOT NULL,
location_id bigint NOT NULL
);
CREATE TABLE inventories (
id bigint NOT NULL,
shop_id bigint NOT NULL,
status character varying NOT NULL
);
The addresses
table holds the geographical data. The position
column is calculated from the lat-lng columns when the rows are inserted or updated.
Each address
is associated to one location
.
Each address
may have many shops
, and each shop
will have one inventory
.
I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.
The tables have a few hundreds of thousands of rows.
With that in place, my main use case can be satisfied by this single query, which searches for addresses
within 1000 meters from a central geographical point (10.0, 10.0
) and returns data from all the tables:
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.uuid AS location_uuid,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
This query works, and EXPLAIN ANALYZE
shows that it does correctly use the GIST
index.
However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:
--- only search for the addresses
SELECT
a.id as addr_id,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE
l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
;
where the values in l.address_id IN (1, 2, 3, 4, 5)
come from the first query.
The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.
I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.
What about memory usage? Or resource contention on the tables? (e.g. locks)
postgresql join postgis
postgresql join postgis
New contributor
New contributor
New contributor
asked 5 mins ago
tompavetompave
101
101
New contributor
New contributor
add a comment |
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
tompave is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f232347%2fone-query-with-multiple-joins-vs-multiple-queries%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
tompave is a new contributor. Be nice, and check out our Code of Conduct.
tompave is a new contributor. Be nice, and check out our Code of Conduct.
tompave is a new contributor. Be nice, and check out our Code of Conduct.
tompave is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f232347%2fone-query-with-multiple-joins-vs-multiple-queries%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown