One query with multiple JOINs vs multiple queries












0















I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.



I'm considering two approaches: single query with multiple joins or two separate but simpler queries.





At a high level, and simplifying the structure, my schema is:



CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS postgis;


CREATE TABLE addresses (
id bigint NOT NULL,
latitude double precision,
longitude double precision,
line1 character varying NOT NULL,
"position" geography(Point,4326),
CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
);

CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");

CREATE TABLE locations (
id bigint NOT NULL,
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
address_id bigint NOT NULL
);

CREATE TABLE shops (
id bigint NOT NULL,
name character varying NOT NULL,
location_id bigint NOT NULL
);

CREATE TABLE inventories (
id bigint NOT NULL,
shop_id bigint NOT NULL,
status character varying NOT NULL
);


The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.



Each address is associated to one location.



Each address may have many shops, and each shop will have one inventory.



I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.



The tables have a few hundreds of thousands of rows.





With that in place, my main use case can be satisfied by this single query, which searches for addresses within 1000 meters from a central geographical point (10.0, 10.0) and returns data from all the tables:



SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.uuid AS location_uuid,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);


This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.



However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:



--- only search for the addresses
SELECT
a.id as addr_id,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);

--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE
l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
;


where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.





The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.



I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.



What about memory usage? Or resource contention on the tables? (e.g. locks)









share







New contributor




tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

























    0















    I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.



    I'm considering two approaches: single query with multiple joins or two separate but simpler queries.





    At a high level, and simplifying the structure, my schema is:



    CREATE EXTENSION "uuid-ossp";
    CREATE EXTENSION IF NOT EXISTS postgis;


    CREATE TABLE addresses (
    id bigint NOT NULL,
    latitude double precision,
    longitude double precision,
    line1 character varying NOT NULL,
    "position" geography(Point,4326),
    CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
    );

    CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");

    CREATE TABLE locations (
    id bigint NOT NULL,
    uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
    address_id bigint NOT NULL
    );

    CREATE TABLE shops (
    id bigint NOT NULL,
    name character varying NOT NULL,
    location_id bigint NOT NULL
    );

    CREATE TABLE inventories (
    id bigint NOT NULL,
    shop_id bigint NOT NULL,
    status character varying NOT NULL
    );


    The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.



    Each address is associated to one location.



    Each address may have many shops, and each shop will have one inventory.



    I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.



    The tables have a few hundreds of thousands of rows.





    With that in place, my main use case can be satisfied by this single query, which searches for addresses within 1000 meters from a central geographical point (10.0, 10.0) and returns data from all the tables:



    SELECT
    s.id AS shop_id,
    s.name AS shop_name,
    i.status AS inventory_status,
    l.uuid AS location_uuid,
    a.line1 AS addr_line,
    a.latitude AS lat,
    a.longitude AS lng
    FROM addresses a
    JOIN locations l ON l.address_id = a.id
    JOIN shops s ON s.location_id = l.id
    JOIN inventories i ON i.shop_id = s.id
    WHERE ST_DWithin(
    a.position, -- the position of each address
    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
    1000, -- radius distance in meters
    true
    );


    This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.



    However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:



    --- only search for the addresses
    SELECT
    a.id as addr_id,
    a.line1 AS addr_line,
    a.latitude AS lat,
    a.longitude AS lng
    FROM addresses a
    WHERE ST_DWithin(
    a.position, -- the position of each address
    ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
    1000, -- radius distance in meters
    true
    );

    --- get the rest of the data
    SELECT
    s.id AS shop_id,
    s.name AS shop_name,
    i.status AS inventory_status,
    l.id AS location_id,
    l.uuid AS location_uuid
    FROM locations l
    JOIN shops s ON s.location_id = l.id
    JOIN inventories i ON i.shop_id = s.id
    WHERE
    l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
    ;


    where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.





    The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.



    I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.



    What about memory usage? Or resource contention on the tables? (e.g. locks)









    share







    New contributor




    tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.























      0












      0








      0








      I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.



      I'm considering two approaches: single query with multiple joins or two separate but simpler queries.





      At a high level, and simplifying the structure, my schema is:



      CREATE EXTENSION "uuid-ossp";
      CREATE EXTENSION IF NOT EXISTS postgis;


      CREATE TABLE addresses (
      id bigint NOT NULL,
      latitude double precision,
      longitude double precision,
      line1 character varying NOT NULL,
      "position" geography(Point,4326),
      CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
      );

      CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");

      CREATE TABLE locations (
      id bigint NOT NULL,
      uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
      address_id bigint NOT NULL
      );

      CREATE TABLE shops (
      id bigint NOT NULL,
      name character varying NOT NULL,
      location_id bigint NOT NULL
      );

      CREATE TABLE inventories (
      id bigint NOT NULL,
      shop_id bigint NOT NULL,
      status character varying NOT NULL
      );


      The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.



      Each address is associated to one location.



      Each address may have many shops, and each shop will have one inventory.



      I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.



      The tables have a few hundreds of thousands of rows.





      With that in place, my main use case can be satisfied by this single query, which searches for addresses within 1000 meters from a central geographical point (10.0, 10.0) and returns data from all the tables:



      SELECT
      s.id AS shop_id,
      s.name AS shop_name,
      i.status AS inventory_status,
      l.uuid AS location_uuid,
      a.line1 AS addr_line,
      a.latitude AS lat,
      a.longitude AS lng
      FROM addresses a
      JOIN locations l ON l.address_id = a.id
      JOIN shops s ON s.location_id = l.id
      JOIN inventories i ON i.shop_id = s.id
      WHERE ST_DWithin(
      a.position, -- the position of each address
      ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
      1000, -- radius distance in meters
      true
      );


      This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.



      However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:



      --- only search for the addresses
      SELECT
      a.id as addr_id,
      a.line1 AS addr_line,
      a.latitude AS lat,
      a.longitude AS lng
      FROM addresses a
      WHERE ST_DWithin(
      a.position, -- the position of each address
      ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
      1000, -- radius distance in meters
      true
      );

      --- get the rest of the data
      SELECT
      s.id AS shop_id,
      s.name AS shop_name,
      i.status AS inventory_status,
      l.id AS location_id,
      l.uuid AS location_uuid
      FROM locations l
      JOIN shops s ON s.location_id = l.id
      JOIN inventories i ON i.shop_id = s.id
      WHERE
      l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
      ;


      where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.





      The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.



      I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.



      What about memory usage? Or resource contention on the tables? (e.g. locks)









      share







      New contributor




      tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.












      I am working on Posrgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.



      I'm considering two approaches: single query with multiple joins or two separate but simpler queries.





      At a high level, and simplifying the structure, my schema is:



      CREATE EXTENSION "uuid-ossp";
      CREATE EXTENSION IF NOT EXISTS postgis;


      CREATE TABLE addresses (
      id bigint NOT NULL,
      latitude double precision,
      longitude double precision,
      line1 character varying NOT NULL,
      "position" geography(Point,4326),
      CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
      );

      CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");

      CREATE TABLE locations (
      id bigint NOT NULL,
      uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
      address_id bigint NOT NULL
      );

      CREATE TABLE shops (
      id bigint NOT NULL,
      name character varying NOT NULL,
      location_id bigint NOT NULL
      );

      CREATE TABLE inventories (
      id bigint NOT NULL,
      shop_id bigint NOT NULL,
      status character varying NOT NULL
      );


      The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.



      Each address is associated to one location.



      Each address may have many shops, and each shop will have one inventory.



      I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.



      The tables have a few hundreds of thousands of rows.





      With that in place, my main use case can be satisfied by this single query, which searches for addresses within 1000 meters from a central geographical point (10.0, 10.0) and returns data from all the tables:



      SELECT
      s.id AS shop_id,
      s.name AS shop_name,
      i.status AS inventory_status,
      l.uuid AS location_uuid,
      a.line1 AS addr_line,
      a.latitude AS lat,
      a.longitude AS lng
      FROM addresses a
      JOIN locations l ON l.address_id = a.id
      JOIN shops s ON s.location_id = l.id
      JOIN inventories i ON i.shop_id = s.id
      WHERE ST_DWithin(
      a.position, -- the position of each address
      ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
      1000, -- radius distance in meters
      true
      );


      This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.



      However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:



      --- only search for the addresses
      SELECT
      a.id as addr_id,
      a.line1 AS addr_line,
      a.latitude AS lat,
      a.longitude AS lng
      FROM addresses a
      WHERE ST_DWithin(
      a.position, -- the position of each address
      ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
      1000, -- radius distance in meters
      true
      );

      --- get the rest of the data
      SELECT
      s.id AS shop_id,
      s.name AS shop_name,
      i.status AS inventory_status,
      l.id AS location_id,
      l.uuid AS location_uuid
      FROM locations l
      JOIN shops s ON s.location_id = l.id
      JOIN inventories i ON i.shop_id = s.id
      WHERE
      l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
      ;


      where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.





      The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.



      I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.



      What about memory usage? Or resource contention on the tables? (e.g. locks)







      postgresql join postgis





      share







      New contributor




      tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.










      share







      New contributor




      tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.








      share



      share






      New contributor




      tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.









      asked 5 mins ago









      tompavetompave

      101




      101




      New contributor




      tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.





      New contributor





      tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






      tompave is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
      Check out our Code of Conduct.






















          0






          active

          oldest

          votes











          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "182"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });






          tompave is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f232347%2fone-query-with-multiple-joins-vs-multiple-queries%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          0






          active

          oldest

          votes








          0






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          tompave is a new contributor. Be nice, and check out our Code of Conduct.










          draft saved

          draft discarded


















          tompave is a new contributor. Be nice, and check out our Code of Conduct.













          tompave is a new contributor. Be nice, and check out our Code of Conduct.












          tompave is a new contributor. Be nice, and check out our Code of Conduct.
















          Thanks for contributing an answer to Database Administrators Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f232347%2fone-query-with-multiple-joins-vs-multiple-queries%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          SQL Server 17 - Attemping to backup to remote NAS but Access is denied

          Always On Availability groups resolving state after failover - Remote harden of transaction...

          Restoring from pg_dump with foreign key constraints