Database performance improvements for current setup (MySQL / MariaDB)



























I've currently got a pretty hefty database (for me, anyway) with multiple tables exceeding 15 million rows. I'm trying to improve performance of the whole-table lookups. I implemented memcached to cache the already-parsed results, but the initial lookups are very slow: 20+ seconds on a big query.



SELECT `something`, ROUND(SUM((`amount` / `something_of_value`) * `column`)) FROM ... WHERE ... (the WHERE clause is huge, based on many conditionals)

Current setup:




A managed VPS server

InnoDB storage for the big tables

10.0.36-MariaDB

18 GB RAM

8 x 2.40 GHz CPUs.




We don't use query_cache because 25% of our queries are writes, and from what I've read, a write/update to a table invalidates its query cache entries. Is this the correct approach?
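For reference, a quick way to confirm how the query cache is configured on this MariaDB version and to switch it off explicitly (a sketch using the standard server variables; the settings could also go under [mysqld] in my.cnf):

-- Check the current query cache settings
SHOW VARIABLES LIKE 'query_cache%';

-- Turn the query cache off explicitly
SET GLOBAL query_cache_type = OFF;
SET GLOBAL query_cache_size = 0;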



What's the best way to improve the initial lookup without upgrading the hardware?
I probably left out a lot of important details, so if you need more info, just ask.
If you can steer me in a good general direction, that would also be appreciated. I know about query optimization, but we are trying to improve lookup time and MySQL performance in general.



EDIT:



full query:



SELECT  `metric`, SUM( `amount` )
FROM big_table
WHERE 1
AND (`metric` = '1' )
AND (`source` = 'some_unique_source'
OR `source` = 'some_other_unique_source'
OR `source` = 'yet_another_unique_source'
OR `source` = 'some_most_unique_source'
OR `source` = 'almost_last_unique_source'
OR `source` = 'the_last_unique_source'
)
AND (`platform` = '2'
OR `platform` = '1'
OR `platform` = '3'
)
AND (`account` = '1'
OR `account` = '2'
OR `account` = '3'
OR `account` = '4'
OR `account` = '5'
OR `account` = '6'
OR `account` = '7'
OR `account` = '8'
OR `account` = '9'
OR `account` = '10'
OR `account` = '11'
OR `account` = '12'
)
AND (`something_id` = 'some.unique.id' )
AND `target_date` >= '2018-08-27'
AND `target_date` <= '2018-08-27'
GROUP BY `metric`;


Create query:



CREATE TABLE `big_table` (
`stat_id` int(8) NOT NULL AUTO_INCREMENT,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`target_date` date DEFAULT NULL,
`country` varchar(2) DEFAULT NULL ,
`version` varchar(16) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
`tax` decimal(12,6) NOT NULL DEFAULT '0.000000' ,
`currency` varchar(3) DEFAULT NULL,
`currency_rate` decimal(12,6) DEFAULT '500.000000',
`rate_updated` int(11) NOT NULL DEFAULT '0',
`multiplier` decimal(16,6) NOT NULL DEFAULT '1.000000',
`unique_key` varchar(180) DEFAULT NULL ,
`caption` varchar(128) DEFAULT NULL,
`transaction_timestamp` timestamp NULL DEFAULT NULL ,
`finalised` tinyint(1) NOT NULL DEFAULT '0',
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`stat_id`),
UNIQUE KEY `unique_key` (`unique_key`) USING BTREE,
KEY `target_date` (`target_date`,`currency`),
KEY `index_country` (`country`),
KEY `currency_rate` (`currency_rate`,`multiplier`)
) ENGINE=InnoDB AUTO_INCREMENT=43453089 DEFAULT CHARSET=utf8




Edit:



The date is always a single day: the script runs a foreach over the dates in the user-supplied date range. The returned MySQL result is parsed into a multidimensional array and then written to a JSON file once the date range finishes.
Now that I think about it, a better approach may be a more intelligent query where the result is grouped by date, though I don't know how much of an improvement that would be speed-wise. There are 5 main tabs, each selecting a different [main] thing: grouping and selecting accounts, metrics, sources, platforms, countries or something_id; the WHERE clauses are also constructed from user input and may differ. This is a custom analytics dashboard, if that helps explain what we are using it for.
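As a rough illustration of that idea, the per-day foreach could become one query per date range (a sketch only; it assumes the same source/platform/account filters as the full query above and that the parsing code can handle one row per date):

SELECT `target_date`, `metric`, SUM(`amount`)
FROM big_table
WHERE `metric` = '1'
AND `something_id` = 'some.unique.id'
AND `target_date` >= '2018-08-01'
AND `target_date` <= '2018-08-30'
-- ... same source / platform / account conditions as in the full query above ...
GROUP BY `target_date`, `metric`;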



A lot of different selects can be chosen by the user, and a custom query is constructed from that input. I've reduced the query size by excluding the countries, because by default it loaded all 250+ countries as WHERE conditions, making the query length pretty ridiculous and embarrassing. For clarification: all countries are marked as selected by default. Pasting the query here made me realise it could be vastly improved. Removing the countries when all are selected reduced the load time (including parsing) from ~21 seconds to 8-10 seconds (a 30-day foreach loop with basically the SELECT you see, on ~14 million rows). However, if the user excludes even one country, the SQL is still constructed with ~250 country conditions. I'm thinking of adding logic so that, when only a few countries are unselected, the query uses country <> 'unselected' instead of listing all the selected ones, ultimately making the query smaller.
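For the country filter, that would look something like this (a sketch; the country codes are made-up placeholders):

-- If only a couple of countries are unselected, exclude them:
SELECT `metric`, SUM(`amount`)
FROM big_table
WHERE `metric` = '1'
AND `country` NOT IN ('XX', 'YY')
AND `target_date` = '2018-08-27'
GROUP BY `metric`;

-- ... instead of listing the ~248 still-selected countries:
-- AND `country` IN ('US', 'DE', 'FR', ...)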










  • Yes, not using query_cache for this case is a good idea. You could try other optimizer_switch settings. Overall you need to improve the query and the tables. Can you show the EXPLAIN {query} output and SHOW CREATE TABLE {tblname}? Do you think you have indexed the table(s) right? Have you used JOIN rather than LEFT JOIN wherever possible? For multiple conditions on a single table, have you considered a generated column with an index?

    – danblack
    Sep 22 '18 at 8:00











  • explain output -> id: 1, select_type: SIMPLE, table: table_name, type: range, possible_keys: target_date,index_country,metric, key: target_date, key_len: 4, ref: NULL, rows: 153856, Extra: Using index condition; Using where; Using temporary; Using filesort. I'm not using joins for this table at all; this query took 144 seconds in the slow_query log

    – PaaPs
    Sep 24 '18 at 10:01













  • This is rather incomprehensible. Please edit it into the question and format it neatly and include the SHOW CREATE TABLE {table} information for table used. A full query would also help a lot.

    – danblack
    Sep 25 '18 at 0:29











  • added the create and explain queries.

    – PaaPs
    Sep 26 '18 at 13:38











  • Great start. Is it most common for this query to be on a small date range with respect to the size of the data? Is metric=1 always used? Your list of sources, is this always the list that is used? Is the platform list always used? Is the account list always used? Do these sets of sources/platforms/accounts correspond to a logical grouping?

    – danblack
    Sep 26 '18 at 22:33
















3 Answers














As you are going to be grabbing 30 days anyway, you may as well use a GROUP BY target_date and put target_date in the result fields; that will save some query parsing overhead.

The query is always going to iterate over a range of target_dates, so that is likely the best column to start the index with. To save the query from going back to the table space to fetch each row, either to filter it further or to read the result value (i.e. the amount), the index should start with the date and include all fields of the WHERE clause plus the amount. As this ends up being a largish index, I recommend creating it separately from the existing target_date index.

To make this query faster when the user, say, selects one country, assuming that is a common case, an index on country, target_date, {other fields} will help that form of the query. Likewise if a single metric, something_id or other field is commonly used as a single-value selection (i.e. x=5, but not x=5 OR x=9).
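A sketch of that covering index written against the table from the question, using plain composite-index syntax rather than the (( col + col )) expression form that appears in the comments; the column order after target_date is an assumption:

ALTER TABLE big_table
ADD INDEX idx_date_covering (target_date, metric, source, platform, account, something_id, country, amount);

-- And, if a single-country filter is common, something like:
ALTER TABLE big_table
ADD INDEX idx_country_date (country, target_date, metric, amount);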






























  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • Looks right. Overkill? Well, that depends on the query benefit gained. Obviously more indexes eventually slow inserts. The alternative is getting a form of summary table going, grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased by 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51

































Option 2: based on the target_date data being generally static after creation, you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
KEY `index_target_date` (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index the fields that are likely to be narrowed down to only a few single values.

Populate it with data grouped by all the parameters, with the amount totalled:

INSERT INTO summary_of_big_table
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country;
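If rows for past dates never change, the summary table could then be topped up one day at a time instead of being rebuilt from scratch (a sketch; the nightly-job framing and the one-day lag are assumptions):

-- Hypothetical nightly job: add yesterday's totals to the summary table
INSERT INTO summary_of_big_table
(target_date, metric, source, platform, account, something_id, country, amount)
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount)
FROM big_table
WHERE target_date = CURDATE() - INTERVAL 1 DAY
GROUP BY target_date, metric, source,
platform, account, something_id, country;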





























  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go through the hassle of rebuilding the table and reimporting the data right now. But I will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55

































You don't have 1 day, you have 2. If you really wanted only one day and the column is a DATE, then simply say target_date = '2018-08-27'. This will help in designing the optimal index.



Start with the = columns (in any order):



INDEX(something_id, metric, target_date,
xxxx) -- then add one other column in the WHERE


If there are only 3 "platforms", then leave out the test for it. I assume you are dynamically building the query. So put a little more effort into the UI to optimize things as you go.



Unfortunately, you may need multiple indexes to handle multiple use cases. It is not reasonable to have more than, say, 10 indexes. So design indexes that are likely to be used and start with the = columns. Don't bother with indexes longer than, say, 4 columns.



Other issues:





  • int(2) -- INT always takes 4 bytes; see TINYINT (etc) for space-savings.


  • DEFAULT NULL -- use NOT NULL when appropriate.

  • Change the ORs to INs. This won't improve performance, but it will be cleaner (see the sketch after this list).

  • After moving to IN, you can say country NOT IN (short-list).

  • A Comment mentions JOIN vs LEFT JOIN, but I don't see either??
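For instance, the WHERE clause from the question rewritten with IN lists (same values, just tidier; a sketch):

SELECT `metric`, SUM(`amount`)
FROM big_table
WHERE `metric` = '1'
AND `source` IN ('some_unique_source', 'some_other_unique_source', 'yet_another_unique_source',
                 'some_most_unique_source', 'almost_last_unique_source', 'the_last_unique_source')
AND `platform` IN ('1', '2', '3')
AND `account` IN ('1','2','3','4','5','6','7','8','9','10','11','12')
AND `something_id` = 'some.unique.id'
AND `target_date` = '2018-08-27'
GROUP BY `metric`;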


If the data is never modified after it is inserted, then building and maintaining Summary Tables is the way to go. Carefully designed, about 5 such tables could handle maybe 50 different queries. And the queries against these would run perhaps 10 times as fast. For more discussion, provide a list of the likely queries.



(I added some Comments about summary tables onto danblack's Answers.)



Variants:





  • foo LIKE '%something' -- can't use foo part of index due to leading wildcard


  • foo <> 'blah' or foo NOT IN ('blah') -- probably treated like a "range", therefore, foo may be harmful (to performance) if too early in index; may be helpful if last.


  • country NOT IN ('Foostan') versus country IN (very-long-list) -- the very-long-list takes some time to parse, etc; so NOT IN will be slightly better.

  • Elimination of "all options checked" -- Do it. The Optimizer can't tell the difference between that and "some checked". So, yes, it is a potential optimization.


Once you have done things like those, the next challenge will be to pick a limited set of 'composite' INDEXes -- perhaps 5 indexes with 3 columns each. Keep track of the queries that users hit you with. The slowlog is one way to do this.
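A minimal sketch of turning the slow log on in MariaDB to capture those queries (the threshold and file path are only examples):

SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';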
































  • The JOIN comment was from before a query was posted. On space saving: are non-UTF8 varchar columns going to help where the dataset is only Latin?

    – danblack
    Oct 1 '18 at 3:28











  • @danblack - "abc" will take 4 bytes in a VARCHAR(20) column, regardless of the CHARACTER SET of that column. (Well, except for utf16 and a few others.) CHAR(..) may be a different matter. Also temp tables may matter. But nothing egregious in the current schema.

    – Rick James
    Oct 1 '18 at 4:53













  • I altered the table information and query information before posting; this is causing some confusion, sorry for that. In reality the script only creates one-day date ranges. Some of the query variations include a something_id LIKE '%something%', and variations of selected accounts, metrics, platforms and sources. I think a big help for default load times will be query optimization: excluding WHERE conditions if all options are selected, and generating a <> 'something' when fewer options are unselected than selected.

    – PaaPs
    Oct 1 '18 at 5:51











  • @PaaPs - I responded to your Comment.

    – Rick James
    Oct 1 '18 at 15:52











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f218265%2fdatabase-performance-improvements-for-current-setup-mysql-marriadb%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














As you are going to be grabbing 30 days you may as well use a GROUP BY target_date and put the target_date in the result fields, will save some query parsing overhead.



As the query is always going to iterate over the range of the target_dates, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount). An index should start with date, and include all fields of the where clause and the amount. As this ends up with a largish index recommend creating it separately to the existing target_date index.



To make this query is faster when the user say selects one country, assuming this is a common case, a index country, target_date, {other fields} will aid this form of query. Likewise if a single metric, something_id or other field is common as a single value selection (i.e x=5, but not x=5 or x=9).






share|improve this answer
























  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased but 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51
















0














As you are going to be grabbing 30 days you may as well use a GROUP BY target_date and put the target_date in the result fields, will save some query parsing overhead.



As the query is always going to iterate over the range of the target_dates, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount). An index should start with date, and include all fields of the where clause and the amount. As this ends up with a largish index recommend creating it separately to the existing target_date index.



To make this query is faster when the user say selects one country, assuming this is a common case, a index country, target_date, {other fields} will aid this form of query. Likewise if a single metric, something_id or other field is common as a single value selection (i.e x=5, but not x=5 or x=9).






share|improve this answer
























  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased but 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51














0












0








0







As you are going to be grabbing 30 days you may as well use a GROUP BY target_date and put the target_date in the result fields, will save some query parsing overhead.



As the query is always going to iterate over the range of the target_dates, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount). An index should start with date, and include all fields of the where clause and the amount. As this ends up with a largish index recommend creating it separately to the existing target_date index.



To make this query is faster when the user say selects one country, assuming this is a common case, a index country, target_date, {other fields} will aid this form of query. Likewise if a single metric, something_id or other field is common as a single value selection (i.e x=5, but not x=5 or x=9).






share|improve this answer













As you are going to be grabbing 30 days you may as well use a GROUP BY target_date and put the target_date in the result fields, will save some query parsing overhead.



As the query is always going to iterate over the range of the target_dates, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount). An index should start with date, and include all fields of the where clause and the amount. As this ends up with a largish index recommend creating it separately to the existing target_date index.



To make this query is faster when the user say selects one country, assuming this is a common case, a index country, target_date, {other fields} will aid this form of query. Likewise if a single metric, something_id or other field is common as a single value selection (i.e x=5, but not x=5 or x=9).







share|improve this answer












share|improve this answer



share|improve this answer










answered Sep 27 '18 at 8:26









danblackdanblack

2,1161214




2,1161214













  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased but 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51



















  • so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

    – PaaPs
    Sep 27 '18 at 8:43













  • looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

    – danblack
    Sep 27 '18 at 8:47











  • created the indexes, load time decreased but 1-2 seconds. Thank you

    – PaaPs
    Sep 27 '18 at 12:26











  • 1-2 seconds for 30 days worth?

    – danblack
    Sep 30 '18 at 2:00











  • yes. The time was reduced from 8 secs to around 6.

    – PaaPs
    Sep 30 '18 at 9:51

















so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

– PaaPs
Sep 27 '18 at 8:43







so: CREATE INDEX idx_date ON big_table (( target_date + metric + source + platform + account + something_id + country +amount )); Seems about right? Wouldn't this be overkill?

– PaaPs
Sep 27 '18 at 8:43















looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

– danblack
Sep 27 '18 at 8:47





looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.

– danblack
Sep 27 '18 at 8:47













created the indexes, load time decreased but 1-2 seconds. Thank you

– PaaPs
Sep 27 '18 at 12:26





created the indexes, load time decreased but 1-2 seconds. Thank you

– PaaPs
Sep 27 '18 at 12:26













1-2 seconds for 30 days worth?

– danblack
Sep 30 '18 at 2:00





1-2 seconds for 30 days worth?

– danblack
Sep 30 '18 at 2:00













yes. The time was reduced from 8 secs to around 6.

– PaaPs
Sep 30 '18 at 9:51





yes. The time was reduced from 8 secs to around 6.

– PaaPs
Sep 30 '18 at 9:51













0














Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index with fields there's likely to be few single values of.



Populated will data group by all the parameters and amount totalled:



INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country





share|improve this answer
























  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55
















0














Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index with fields there's likely to be few single values of.



Populated will data group by all the parameters and amount totalled:



INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country





share|improve this answer
























  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55














0












0








0







Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index with fields there's likely to be few single values of.



Populated will data group by all the parameters and amount totalled:



INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country





share|improve this answer













Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.



CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


Index with fields there's likely to be few single values of.



Populated will data group by all the parameters and amount totalled:



INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country






share|improve this answer












share|improve this answer



share|improve this answer










answered Sep 30 '18 at 2:14









danblackdanblack

2,1161214




2,1161214













  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55



















  • Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

    – Rick James
    Oct 1 '18 at 3:06











  • I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

    – PaaPs
    Oct 1 '18 at 5:42











  • @PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

    – Rick James
    Oct 1 '18 at 15:55

















Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

– Rick James
Oct 1 '18 at 3:06





Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.

– Rick James
Oct 1 '18 at 3:06













I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

– PaaPs
Oct 1 '18 at 5:42





I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you

– PaaPs
Oct 1 '18 at 5:42













@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

– Rick James
Oct 1 '18 at 15:55





@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.

– Rick James
Oct 1 '18 at 15:55











0














You don't have 1 day, you have 2. If you really wanted only one day and the column is a DATE, then simply say target_date = '2018-08-27'. This will help in designing the optimal index.



Start with the = columns (in any order):



INDEX(something_id, metric, target_date,
xxxx) -- then add one other column in the WHERE


If there are only 3 "platforms", then leave out the test for it. I assume you are dynamically building the query. So put a little more effort into the UI to optimize things as you go.



Unfortunately, you may need multiple indexes to handle multiple use cases. It is not reasonable to have more than, say, 10 indexes. So design indexes that are likely to be used and start with the = columns. Don't bother with indexes longer than, say, 4 columns.



Other issues:





  • int(2) -- INT always takes 4 bytes; see TINYINT (etc) for space-savings.


  • DEFAULT NULL -- use NOT NULL when appropriate.

  • Change the ORs to INs. This won't improve performance, but it will be cleaner (a rewritten example follows this list).

  • After moving to IN, you can say country NOT IN (short-list).

  • A Comment mentions JOIN vs LEFT JOIN, but I don't see either??
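
For illustration, the dashboard query with the OR chains collapsed into IN lists (a sketch only; the literal values are placeholders, not the real ones):

-- Same shape as the generated query, just with IN instead of chained ORs.
SELECT metric, SUM(amount)
FROM big_table
WHERE metric = 1
  AND source IN ('source_a', 'source_b', 'source_c')
  AND platform IN (1, 2, 3)
  AND account IN (1, 2, 3, 4, 5)
  AND something_id = 'some_id'
  AND target_date = '2018-08-27'
GROUP BY metric;

The optimizer handles IN and the equivalent OR chain the same way; the gain is readability and a shorter statement to build dynamically.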


If the data is never modified after it is inserted, then building and maintaining Summary Tables is the way to go. Carefully designed, about 5 such tables could handle maybe 50 different queries. And the queries against these would run perhaps 10 times as fast. For more discussion, provide a list of the likely queries.



(I added some Comments about summary tables onto danblack's Answers.)



Variants:





  • foo LIKE '%something' -- can't use foo part of index due to leading wildcard


  • foo <> 'blah' or foo NOT IN ('blah') -- probably treated like a "range", therefore, foo may be harmful (to performance) if too early in index; may be helpful if last.


  • country NOT IN ('Foostan') versus country IN (very-long-list) -- the very-long-list takes some time to parse, etc; so NOT IN will be slightly better.

  • Elimination of "all options checked" -- Do it. The Optimizer can't tell the difference between that and "some checked". So, yes, it is a potential optimization (see the sketch just below).
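
Putting the last two points together, a sketch of what the dynamically built query could collapse to when every platform is checked and only a couple of (placeholder) countries are excluded:

-- All platforms selected, so that filter is dropped entirely;
-- only two countries excluded, so NOT IN beats a long IN list.
SELECT metric, SUM(amount)
FROM big_table
WHERE metric = 1
  AND country NOT IN ('AA', 'BB')
  AND target_date = '2018-08-27'
GROUP BY metric;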


Once you have done things like those, the next challenge will be to pick a limited set of 'composite' INDEXes -- perhaps 5 indexes with 3 columns each. Keep track of the queries that users hit you with. The slowlog is one way to do this.
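
A minimal sketch of turning the slowlog on at runtime (thresholds are illustrative; these are standard MariaDB/MySQL settings):

SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;                   -- log statements slower than 2 seconds
SET GLOBAL log_queries_not_using_indexes = ON;    -- optional, and can be noisy

mysqldumpslow (or Percona's pt-query-digest) can then summarise the log by query shape.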






answered Oct 1 '18 at 2:52, edited Oct 1 '18 at 15:51 – Rick James


























  • The JOIN comments were from before the query was posted. On space saving: are non-UTF8 varchar columns going to help where the dataset is only Latin?

    – danblack
    Oct 1 '18 at 3:28











  • @danblack - "abc" will take 4 bytes in a VARCHAR(20) column, regardless of the CHARACTER SET of that column. (Well, except for utf16 and a few others.) CHAR(..) may be a different matter. Also temp tables may matter. But nothing egregious in the current schema.

    – Rick James
    Oct 1 '18 at 4:53













  • I altered the table and query information before posting, which is causing some confusion; sorry for that. In reality the script only creates one-day date ranges. Some of the query variations include something_id LIKE '%something%' and variations of selected accounts, metrics, platforms and sources. I think a big help for default load times will be query optimization: excluding WHERE conditions when all options are selected, and generating a <> 'something' when fewer options are unselected than selected.

    – PaaPs
    Oct 1 '18 at 5:51











  • @PaaPs - I responded to your Comment.

    – Rick James
    Oct 1 '18 at 15:52















