Create index for multiple WHERE EXISTS subqueries

I have to find an efficient way to write a very specific query for an application that I'm developing.

The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".

So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".

I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).

I've come up with this query:

SELECT

    markers.*, 1 AS "species_1",

    1 AS "species_2",

    1 AS "species_3"

FROM

    markers

LEFT JOIN genotypes g ON g.marker_id = markers.id

WHERE

    g.dataset_id = 3

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus1"

    AND taxonomies.species = "species1"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus2"

    AND taxonomies.species = "species2"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus3"

    AND taxonomies.species = "species3"

)

NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.

It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.

I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.

However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.

Does anyone have any suggestions as to how to improve performance?

EDIT:

CREATE TABLE `genotypes` (

    `id` BIGINT (20) NOT NULL AUTO_INCREMENT,

    `marker_id` INT (11) NOT NULL,

    `dataset_id` INT (11) NOT NULL,

    `accession_id` INT (11) NOT NULL,

    `a` VARCHAR (3) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `marker_id` (`marker_id`) USING BTREE,

    KEY `dataset_id` (`dataset_id`) USING BTREE,

    KEY `accession_id` (`accession_id`) USING BTREE,

    KEY `genotypes_marker_dataset_allele1` (

        `marker_id`,

        `dataset_id`,

        `a`

    ) USING BTREE,

    KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,

    CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1



CREATE TABLE `markers` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `marker_name` VARCHAR (45) DEFAULT NULL,

    PRIMARY KEY (`id`)

) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1



CREATE TABLE `accessions` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `name` VARCHAR (255) NOT NULL,

    `taxonomy_id` INT (11) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,

    CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,



) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1



CREATE TABLE `taxonomies` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `genus` VARCHAR (255) NOT NULL DEFAULT '',

    `species` VARCHAR (255) NOT NULL DEFAULT '',

    PRIMARY KEY (`id`),

    KEY `taxonomies_genus` (`genus`) USING BTREE,

    KEY `taxonomies_species` (`species`) USING BTREE,

    KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE

) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary

1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct

6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

2

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

|
show 2 more comments

I have to find an efficient way to write a very specific query for an application that I'm developing.

The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".

So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".

I've come up with this query:

SELECT

    markers.*, 1 AS "species_1",

    1 AS "species_2",

    1 AS "species_3"

FROM

    markers

LEFT JOIN genotypes g ON g.marker_id = markers.id

WHERE

    g.dataset_id = 3

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus1"

    AND taxonomies.species = "species1"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus2"

    AND taxonomies.species = "species2"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus3"

    AND taxonomies.species = "species3"

)

NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.

It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.

However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.

Does anyone have any suggestions as to how to improve performance?

EDIT:

CREATE TABLE `genotypes` (

    `id` BIGINT (20) NOT NULL AUTO_INCREMENT,

    `marker_id` INT (11) NOT NULL,

    `dataset_id` INT (11) NOT NULL,

    `accession_id` INT (11) NOT NULL,

    `a` VARCHAR (3) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `marker_id` (`marker_id`) USING BTREE,

    KEY `dataset_id` (`dataset_id`) USING BTREE,

    KEY `accession_id` (`accession_id`) USING BTREE,

    KEY `genotypes_marker_dataset_allele1` (

        `marker_id`,

        `dataset_id`,

        `a`

    ) USING BTREE,

    KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,

    CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1



CREATE TABLE `markers` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `marker_name` VARCHAR (45) DEFAULT NULL,

    PRIMARY KEY (`id`)

) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1



CREATE TABLE `accessions` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `name` VARCHAR (255) NOT NULL,

    `taxonomy_id` INT (11) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,

    CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,



) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1



CREATE TABLE `taxonomies` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `genus` VARCHAR (255) NOT NULL DEFAULT '',

    `species` VARCHAR (255) NOT NULL DEFAULT '',

    PRIMARY KEY (`id`),

    KEY `taxonomies_genus` (`genus`) USING BTREE,

    KEY `taxonomies_species` (`species`) USING BTREE,

    KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE

) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary

1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct

6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

2

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

|
show 2 more comments

I have to find an efficient way to write a very specific query for an application that I'm developing.

The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".

So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".

I've come up with this query:

SELECT

    markers.*, 1 AS "species_1",

    1 AS "species_2",

    1 AS "species_3"

FROM

    markers

LEFT JOIN genotypes g ON g.marker_id = markers.id

WHERE

    g.dataset_id = 3

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus1"

    AND taxonomies.species = "species1"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus2"

    AND taxonomies.species = "species2"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus3"

    AND taxonomies.species = "species3"

)

NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.

It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.

However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.

Does anyone have any suggestions as to how to improve performance?

EDIT:

CREATE TABLE `genotypes` (

    `id` BIGINT (20) NOT NULL AUTO_INCREMENT,

    `marker_id` INT (11) NOT NULL,

    `dataset_id` INT (11) NOT NULL,

    `accession_id` INT (11) NOT NULL,

    `a` VARCHAR (3) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `marker_id` (`marker_id`) USING BTREE,

    KEY `dataset_id` (`dataset_id`) USING BTREE,

    KEY `accession_id` (`accession_id`) USING BTREE,

    KEY `genotypes_marker_dataset_allele1` (

        `marker_id`,

        `dataset_id`,

        `a`

    ) USING BTREE,

    KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,

    CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1



CREATE TABLE `markers` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `marker_name` VARCHAR (45) DEFAULT NULL,

    PRIMARY KEY (`id`)

) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1



CREATE TABLE `accessions` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `name` VARCHAR (255) NOT NULL,

    `taxonomy_id` INT (11) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,

    CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,



) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1



CREATE TABLE `taxonomies` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `genus` VARCHAR (255) NOT NULL DEFAULT '',

    `species` VARCHAR (255) NOT NULL DEFAULT '',

    PRIMARY KEY (`id`),

    KEY `taxonomies_genus` (`genus`) USING BTREE,

    KEY `taxonomies_species` (`species`) USING BTREE,

    KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE

) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary

1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct

6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

I have to find an efficient way to write a very specific query for an application that I'm developing.

The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".

So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".

I've come up with this query:

SELECT

    markers.*, 1 AS "species_1",

    1 AS "species_2",

    1 AS "species_3"

FROM

    markers

LEFT JOIN genotypes g ON g.marker_id = markers.id

WHERE

    g.dataset_id = 3

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus1"

    AND taxonomies.species = "species1"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus2"

    AND taxonomies.species = "species2"

)

AND EXISTS (

    SELECT

        1

    FROM

        genotypes

    LEFT JOIN accession ON genotypes.accession_id = accession.id

    LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id

    WHERE

        genotypes.marker_id = markers.id

    AND genotypes.a = 1

    AND taxonomies.genus = "genus3"

    AND taxonomies.species = "species3"

)

NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.

It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.

However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.

Does anyone have any suggestions as to how to improve performance?

EDIT:

CREATE TABLE `genotypes` (

    `id` BIGINT (20) NOT NULL AUTO_INCREMENT,

    `marker_id` INT (11) NOT NULL,

    `dataset_id` INT (11) NOT NULL,

    `accession_id` INT (11) NOT NULL,

    `a` VARCHAR (3) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `marker_id` (`marker_id`) USING BTREE,

    KEY `dataset_id` (`dataset_id`) USING BTREE,

    KEY `accession_id` (`accession_id`) USING BTREE,

    KEY `genotypes_marker_dataset_allele1` (

        `marker_id`,

        `dataset_id`,

        `a`

    ) USING BTREE,

    KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,

    CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,

    CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE

) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1



CREATE TABLE `markers` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `marker_name` VARCHAR (45) DEFAULT NULL,

    PRIMARY KEY (`id`)

) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1



CREATE TABLE `accessions` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `name` VARCHAR (255) NOT NULL,

    `taxonomy_id` INT (11) DEFAULT NULL,

    PRIMARY KEY (`id`),

    KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,

    CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,



) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1



CREATE TABLE `taxonomies` (

    `id` INT (11) NOT NULL AUTO_INCREMENT,

    `genus` VARCHAR (255) NOT NULL DEFAULT '',

    `species` VARCHAR (255) NOT NULL DEFAULT '',

    PRIMARY KEY (`id`),

    KEY `taxonomies_genus` (`genus`) USING BTREE,

    KEY `taxonomies_species` (`species`) USING BTREE,

    KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE

) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1



id  select_type table   type    possible_keys   key key_len ref rows    Extra

1   PRIMARY markers ALL PRIMARY             8803    Using where; Using temporary

1   PRIMARY g   ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1  genotypes_marker_dataset_allele1    8   markers.id,const    104 Using index; Distinct

6   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

6   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

6   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

5   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

5   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

5   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

4   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

4   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

4   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

3   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

3   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

3   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

2   DEPENDENT SUBQUERY  taxonomies  ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species    taxonomies_species  257 const   1   Using index condition; Using where

2   DEPENDENT SUBQUERY  genotypes   ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1    marker_id   4   markers.id  115 Using where

2   DEPENDENT SUBQUERY  accessions  eq_ref  PRIMARY,taxonomy_id PRIMARY 4   genotypes.accession_id  1   Using where

mysql index query-performance subquery

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

edited Jun 8 '16 at 10:13

asked Jun 8 '16 at 9:00

Baz

105119

asked Jun 8 '16 at 9:00

Baz

105119

asked Jun 8 '16 at 9:00

Baz

105119

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

bumped to the homepage by Community♦ 13 mins ago

This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

2

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

|
show 2 more comments

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

2

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16

@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29

The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30

@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13

I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31

|
show 2 more comments

1 Answer
1

active

oldest

votes

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f140702%2fcreate-index-for-multiple-where-exists-subqueries%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

g.dataset_id = 3 is killing the left

try this

FROM markers

JOIN genotypes g 

      ON g.marker_id = markers.id

     AND g.dataset_id = 3

The where is killing the left joins in the exists so you can drop that

This may give you better performance

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus1"

        AND taxonomies.species = "species1"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus2"

        AND taxonomies.species = "species2"  

)

and exists 

(  SELECT 1

   FROM genotypes

   join accession 

         ON genotypes.accession_id = accession.id  

        AND genotypes.a = 1 

        AND genotypes.marker_id = markers.id

   join taxonomies 

         ON taxonomies.id = accession.taxonomy_id

        AND taxonomies.genus   = "genus3"

        AND taxonomies.species = "species3"  

)

ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id

SELECT 1

FROM genotypes

join accession 

      ON genotypes.accession_id = accession.id  

     AND genotypes.a = 1 

     AND genotypes.marker_id = markers.id

join taxonomies as t1 

      ON t1.id = accession.taxonomy_id

     AND t1.genus   = "genus1"

     AND t1.species = "species1"

join taxonomies as t2 

      ON t2.id = accession.taxonomy_id

     AND t2.genus   = "genus2"

     AND t2.species = "species2"

join taxonomies as t3 

      ON t3.id = accession.taxonomy_id

     AND t3.genus   = "genus3"

     AND t3.species = "species3"

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

edited Jun 8 '16 at 12:27

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

answered Jun 8 '16 at 10:38

paparazzo

4,6141230

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41

@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Xrhrft