Create index for multiple WHERE EXISTS subqueries












3















I have to find an efficient way to write a very specific query for an application that I'm developing.



The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".



So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".



I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).



I've come up with this query:



SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.



It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.



I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.



However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.



Does anyone have any suggestions as to how to improve performance?





EDIT:



CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where









share|improve this question
















bumped to the homepage by Community 13 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:16













  • @ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

    – Baz
    Jun 8 '16 at 9:29











  • The EXPLAIN output of the query.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:30













  • @ypercubeᵀᴹ Ok, I've added that as well.

    – Baz
    Jun 8 '16 at 10:13






  • 2





    I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

    – ypercubeᵀᴹ
    Jun 8 '16 at 10:31


















3















I have to find an efficient way to write a very specific query for an application that I'm developing.



The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".



So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".



I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).



I've come up with this query:



SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.



It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.



I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.



However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.



Does anyone have any suggestions as to how to improve performance?





EDIT:



CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where









share|improve this question
















bumped to the homepage by Community 13 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
















  • Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:16













  • @ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

    – Baz
    Jun 8 '16 at 9:29











  • The EXPLAIN output of the query.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:30













  • @ypercubeᵀᴹ Ok, I've added that as well.

    – Baz
    Jun 8 '16 at 10:13






  • 2





    I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

    – ypercubeᵀᴹ
    Jun 8 '16 at 10:31
















3












3








3








I have to find an efficient way to write a very specific query for an application that I'm developing.



The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".



So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".



I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).



I've come up with this query:



SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.



It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.



I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.



However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.



Does anyone have any suggestions as to how to improve performance?





EDIT:



CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where









share|improve this question
















I have to find an efficient way to write a very specific query for an application that I'm developing.



The idea is the following: Find all the genetic markers for which there exists at least one plant line per species for which the genotype value is "1".



So I've got, say, 3 plant species: "species_1", "species_2" and "species_3".



I've got these tables: accessions (plant lines), markers (the genetic markers), genotypes (the genotype; one per marker, accession and dataset), datasets (to subdivide genotype values into datasets) and taxonomies (the plant species).



I've come up with this query:



SELECT
markers.*, 1 AS "species_1",
1 AS "species_2",
1 AS "species_3"
FROM
markers
LEFT JOIN genotypes g ON g.marker_id = markers.id
WHERE
g.dataset_id = 3
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
AND EXISTS (
SELECT
1
FROM
genotypes
LEFT JOIN accession ON genotypes.accession_id = accession.id
LEFT JOIN taxonomies ON taxonomies.id = accession.taxonomy_id
WHERE
genotypes.marker_id = markers.id
AND genotypes.a = 1
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


NOTE: If you're wondering about the SELECT 1 AS bits, this is only because I also run the query for 0 and NOT EXISTS. So this is only one example query.



It returns the correct result, but is quite slow on a genotypes table with roughtly 2 million rows.



I've had a look at EXPLAIN of the query and tried to generate indices so that each sub-query has an index to use. The index that did improve performance was on genotypes.marker_id, genotypes.dataset_id, genotypes.a.



However, the query takes around 7-8 seconds to return the result and I think it could run faster if only I can figure out which other index to use or how to restructure the query.



Does anyone have any suggestions as to how to improve performance?





EDIT:



CREATE TABLE `genotypes` (
`id` BIGINT (20) NOT NULL AUTO_INCREMENT,
`marker_id` INT (11) NOT NULL,
`dataset_id` INT (11) NOT NULL,
`accession_id` INT (11) NOT NULL,
`a` VARCHAR (3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `marker_id` (`marker_id`) USING BTREE,
KEY `dataset_id` (`dataset_id`) USING BTREE,
KEY `accession_id` (`accession_id`) USING BTREE,
KEY `genotypes_marker_dataset_allele1` (
`marker_id`,
`dataset_id`,
`a`
) USING BTREE,
KEY `genotypes_marker_allele1` (`marker_id`, `a`) USING BTREE,
CONSTRAINT `genotypes_ibfk_1` FOREIGN KEY (`marker_id`) REFERENCES `markers` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_2` FOREIGN KEY (`dataset_id`) REFERENCES `datasets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `genotypes_ibfk_3` FOREIGN KEY (`accession_id`) REFERENCES `accessions` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = INNODB AUTO_INCREMENT = 2007206 DEFAULT CHARSET = latin1

CREATE TABLE `markers` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`marker_name` VARCHAR (45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE = INNODB AUTO_INCREMENT = 8665 DEFAULT CHARSET = latin1

CREATE TABLE `accessions` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR (255) NOT NULL,
`taxonomy_id` INT (11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `taxonomy_id` (`taxonomy_id`) USING BTREE,
CONSTRAINT `accessions_ibfk_3` FOREIGN KEY (`taxonomy_id`) REFERENCES `taxonomies` (`id`) ON DELETE NO ACTION ON UPDATE CASCADE,

) ENGINE = INNODB AUTO_INCREMENT = 2304 DEFAULT CHARSET = latin1

CREATE TABLE `taxonomies` (
`id` INT (11) NOT NULL AUTO_INCREMENT,
`genus` VARCHAR (255) NOT NULL DEFAULT '',
`species` VARCHAR (255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `taxonomies_genus` (`genus`) USING BTREE,
KEY `taxonomies_species` (`species`) USING BTREE,
KEY `taxonomies_genus_species` (`genus`, `species`) USING BTREE
) ENGINE = INNODB AUTO_INCREMENT = 103 DEFAULT CHARSET = latin1

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY markers ALL PRIMARY 8803 Using where; Using temporary
1 PRIMARY g ref marker_id,dataset_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 genotypes_marker_dataset_allele1 8 markers.id,const 104 Using index; Distinct
6 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
6 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
6 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
5 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
5 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
5 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
4 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
4 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
4 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
3 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
3 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
3 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where
2 DEPENDENT SUBQUERY taxonomies ref PRIMARY,taxonomies_genus,taxonomies_species,taxonomies_genus_species taxonomies_species 257 const 1 Using index condition; Using where
2 DEPENDENT SUBQUERY genotypes ref marker_id,accession_id,genotypes_marker_dataset_allele1,genotypes_marker_allele1 marker_id 4 markers.id 115 Using where
2 DEPENDENT SUBQUERY accessions eq_ref PRIMARY,taxonomy_id PRIMARY 4 genotypes.accession_id 1 Using where






mysql index query-performance subquery






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jun 8 '16 at 10:13







Baz

















asked Jun 8 '16 at 9:00









BazBaz

105119




105119





bumped to the homepage by Community 13 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 13 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.















  • Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:16













  • @ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

    – Baz
    Jun 8 '16 at 9:29











  • The EXPLAIN output of the query.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:30













  • @ypercubeᵀᴹ Ok, I've added that as well.

    – Baz
    Jun 8 '16 at 10:13






  • 2





    I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

    – ypercubeᵀᴹ
    Jun 8 '16 at 10:31





















  • Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:16













  • @ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

    – Baz
    Jun 8 '16 at 9:29











  • The EXPLAIN output of the query.

    – ypercubeᵀᴹ
    Jun 8 '16 at 9:30













  • @ypercubeᵀᴹ Ok, I've added that as well.

    – Baz
    Jun 8 '16 at 10:13






  • 2





    I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

    – ypercubeᵀᴹ
    Jun 8 '16 at 10:31



















Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16







Show us the execution plan and the SHOW CREATE TABLE table_name; output for all the 4 tables.

– ypercubeᵀᴹ
Jun 8 '16 at 9:16















@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29





@ypercubeᵀᴹ I've added the create statements. What do you mean by execution plan?

– Baz
Jun 8 '16 at 9:29













The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30







The EXPLAIN output of the query.

– ypercubeᵀᴹ
Jun 8 '16 at 9:30















@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13





@ypercubeᵀᴹ Ok, I've added that as well.

– Baz
Jun 8 '16 at 10:13




2




2





I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31







I'd try an index on (a, marker_id, accession_id) and on (a, accession_id, marker_id). And since that a is a varchar, use string literals, not numbers: where a = '1'. If the stored values are only numbers, then convert it from varchar to an appropriate number type (tinyiny, smallint).

– ypercubeᵀᴹ
Jun 8 '16 at 10:31












1 Answer
1






active

oldest

votes


















0














g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"





share|improve this answer


























  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f140702%2fcreate-index-for-multiple-where-exists-subqueries%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"





share|improve this answer


























  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01
















0














g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"





share|improve this answer


























  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01














0












0








0







g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"





share|improve this answer















g.dataset_id = 3 is killing the left

try this



FROM markers
JOIN genotypes g
ON g.marker_id = markers.id
AND g.dataset_id = 3


The where is killing the left joins in the exists so you can drop that

This may give you better performance



and exists 
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus1"
AND taxonomies.species = "species1"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus2"
AND taxonomies.species = "species2"
)
and exists
( SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies
ON taxonomies.id = accession.taxonomy_id
AND taxonomies.genus = "genus3"
AND taxonomies.species = "species3"
)


ypercube is correct you cannot combine all 3

This would mean that had to be equal on the same accession.taxonomy_id



SELECT 1
FROM genotypes
join accession
ON genotypes.accession_id = accession.id
AND genotypes.a = 1
AND genotypes.marker_id = markers.id
join taxonomies as t1
ON t1.id = accession.taxonomy_id
AND t1.genus = "genus1"
AND t1.species = "species1"
join taxonomies as t2
ON t2.id = accession.taxonomy_id
AND t2.genus = "genus2"
AND t2.species = "species2"
join taxonomies as t3
ON t3.id = accession.taxonomy_id
AND t3.genus = "genus3"
AND t3.species = "species3"






share|improve this answer














share|improve this answer



share|improve this answer








edited Jun 8 '16 at 12:27

























answered Jun 8 '16 at 10:38









paparazzopaparazzo

4,6141230




4,6141230













  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01



















  • @ypercubeᵀᴹ what is different?

    – paparazzo
    Jun 8 '16 at 10:41











  • @ypercubeᵀᴹ I will take another look at it

    – paparazzo
    Jun 8 '16 at 11:01

















@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41





@ypercubeᵀᴹ what is different?

– paparazzo
Jun 8 '16 at 10:41













@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01





@ypercubeᵀᴹ I will take another look at it

– paparazzo
Jun 8 '16 at 11:01


















draft saved

draft discarded




















































Thanks for contributing an answer to Database Administrators Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f140702%2fcreate-index-for-multiple-where-exists-subqueries%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

SQL Server 17 - Attemping to backup to remote NAS but Access is denied

Always On Availability groups resolving state after failover - Remote harden of transaction...

Restoring from pg_dump with foreign key constraints