Database performance improvements for current setup (MySQL - MariaDB)
I've currently got a pretty hefty database (for me, anyway) with multiple tables exceeding 15 million rows. I'm trying to improve the performance of whole-table lookups. I implemented memcached to cache the already-parsed results, but the initial lookups are very slow: 20+ seconds on a big query.
SELECT `something`, ROUND(SUM((`amount` / `something_of_value`) * `column`)) FROM ... WHERE ... (the WHERE clause is huge, built from many conditionals)
Current setup:
A managed VPS server
InnoDB storage for the big tables
10.0.36-MariaDB
18 GB RAM
8 × 2.40 GHz CPUs
We don't use the query_cache because 25% of our queries are writes, and from my research, any write/update to a table invalidates that table's cached results. Is this a correct approach?
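(For reference, a minimal sketch of checking and switching the cache off at runtime; these are the standard MySQL/MariaDB variable names, shown only to make the setting explicit:)

SHOW VARIABLES LIKE 'query_cache%';
SET GLOBAL query_cache_type = OFF;
SET GLOBAL query_cache_size = 0;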
What's the best way to improve the initial lookup without upgrading the hardware?
I probably left out a lot of important details, so if you need more info just ask.
If you can steer me in a good general direction, that would also be appreciated. I know about query optimization, but we are trying to improve lookup time and general MySQL performance.
EDIT:
full query:
SELECT `metric`, SUM( `amount` )
FROM big_table
WHERE 1
AND (`metric` = '1' )
AND (`source` = 'some_unique_source'
OR `source` = 'some_other_unique_source'
OR `source` = 'yet_another_unique_source'
OR `source` = 'some_most_unique_source'
OR `source` = 'almost_last_unique_source'
OR `source` = 'the_last_unique_source'
)
AND (`platform` = '2'
OR `platform` = '1'
OR `platform` = '3'
)
AND (`account` = '1'
OR `account` = '2'
OR `account` = '3'
OR `account` = '4'
OR `account` = '5'
OR `account` = '6'
OR `account` = '7'
OR `account` = '8'
OR `account` = '9'
OR `account` = '10'
OR `account` = '11'
OR `account` = '12'
)
AND (`something_id` = 'some.unique.id' )
AND `target_date` >= '2018-08-27'
AND `target_date` <= '2018-08-27'
GROUP BY `metric`;
Create query:
CREATE TABLE `big_table` (
  `stat_id` int(8) NOT NULL AUTO_INCREMENT,
  `metric` tinyint(1) NOT NULL DEFAULT '0',
  `source` varchar(16) DEFAULT NULL,
  `platform` tinyint(1) NOT NULL DEFAULT '0',
  `account` int(2) DEFAULT NULL,
  `something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
  `target_date` date DEFAULT NULL,
  `country` varchar(2) DEFAULT NULL,
  `version` varchar(16) DEFAULT NULL,
  `amount` decimal(16,6) NOT NULL DEFAULT '0.000000',
  `tax` decimal(12,6) NOT NULL DEFAULT '0.000000',
  `currency` varchar(3) DEFAULT NULL,
  `currency_rate` decimal(12,6) DEFAULT '500.000000',
  `rate_updated` int(11) NOT NULL DEFAULT '0',
  `multiplier` decimal(16,6) NOT NULL DEFAULT '1.000000',
  `unique_key` varchar(180) DEFAULT NULL,
  `caption` varchar(128) DEFAULT NULL,
  `transaction_timestamp` timestamp NULL DEFAULT NULL,
  `finalised` tinyint(1) NOT NULL DEFAULT '0',
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`stat_id`),
  UNIQUE KEY `unique_key` (`unique_key`) USING BTREE,
  KEY `target_date` (`target_date`,`currency`),
  KEY `index_country` (`country`),
  KEY `currency_rate` (`currency_rate`,`multiplier`)
) ENGINE=InnoDB AUTO_INCREMENT=43453089 DEFAULT CHARSET=utf8
Edit:
The date range is always one day: the script runs a foreach over each date in the user-supplied range. Each MySQL result is parsed into a multidimensional array, and after the date range finishes, the result is written to a JSON file.
Now that I think about it, a better approach may be to make a more intelligent query where the result is grouped by date, though I don't know how much of a speed improvement that would bring. There are 5 main tabs, each selecting a different [main] thing - grouping and selecting accounts, metrics, sources, platforms, countries or something_id - and the WHERE clauses are also constructed from user input, so they may differ. This is a custom analytics dashboard, if that helps explain what we are using this for.
A lot of different selects can be chosen by the user, and a custom query is constructed from that input. I've reduced the SELECT size by excluding the countries, because by default it loaded all 250+ countries as WHERE clauses, making the query length pretty ridiculous and embarrassing. For clarification: all countries are marked as selected by default. Pasting the query here made me realise it could be vastly improved. Removing the countries when all are selected reduced the load time (with parsing) from ~21 seconds to 8-10 seconds (a 30-day foreach loop with basically the SELECT you see, on a 14-million-row table). However, if the user excludes even one country, the SQL is still constructed with ~250 country conditions. I'm thinking about an algorithm where, if only a few countries are unselected, the query uses country <> 'unselected' instead of listing all the selected ones, ultimately making the query smaller.
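A sketch of what that country logic could generate for each case (the country codes are placeholders):

-- all countries selected: omit the country condition entirely

-- only a few unselected: exclude them
AND `country` NOT IN ('AA', 'BB')

-- otherwise: keep the explicit list of selected countries
AND `country` IN ('US', 'DE', 'FR')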
innodb optimization mariadb
Yes, not using the query_cache for this case is a good idea. You could try other optimizer_switch settings. Overall you need to improve the query and the tables. Can you show the EXPLAIN {query} output and SHOW CREATE TABLE {tblname}? Do you think you have indexed the table(s) right? Have you used JOIN rather than LEFT JOIN wherever possible? For multiple conditions on a single table, have you considered a generated column with an index?
– danblack
Sep 22 '18 at 8:00
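(For readers unfamiliar with the generated-column idea in the comment above, a hypothetical sketch for MariaDB 10.0, where a computed column must be declared PERSISTENT to be indexable; the column name and expression are illustrative, not from the question:)

ALTER TABLE big_table
  ADD COLUMN metric_platform varchar(10)
      AS (CONCAT(metric, '-', platform)) PERSISTENT,
  ADD INDEX idx_metric_platform (metric_platform);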
explain output -> id: 1 | select_type: SIMPLE | table: table_name | type: range | possible_keys: target_date,index_country,metric | key: target_date | key_len: 4 | ref: NULL | rows: 153856 | Extra: Using index condition; Using where; Using temporary; Using filesort. I'm not using joins for this table at all; this query took 144 seconds in the slow_query log
– PaaPs
Sep 24 '18 at 10:01
This is rather incomprehensible. Please edit it into the question, format it neatly, and include the SHOW CREATE TABLE {table} information for the table used. A full query would also help a lot.
– danblack
Sep 25 '18 at 0:29
added the create and explain queries.
– PaaPs
Sep 26 '18 at 13:38
Great start. Is it most common for this query to be on a small date range with respect to the size of the data? Is metric=1 always used? Your list of sources, is this always the list that is used? Is the platform list always used? Is the account list always used? Do these sets of sources/platforms/accounts correspond to a logical grouping?
– danblack
Sep 26 '18 at 22:33
asked Sep 21 '18 at 14:39 by PaaPs · edited Oct 1 '18 at 5:52
3 Answers
As you are going to be grabbing 30 days, you may as well use a GROUP BY target_date and put target_date in the result fields; that will save some query parsing overhead.

The query is always going to iterate over the range of target_dates, so this is likely the best column for the start of the index. To save the query from going back to the table space to fetch each row (to further filter the result, or for a result field such as amount), the index should start with target_date and include all fields of the WHERE clause plus amount. As this ends up as a largish index, I recommend creating it separately from the existing target_date index.

To make this query faster when the user, say, selects one country (assuming this is a common case), an index on country, target_date, {other fields} will aid that form of query. Likewise if a single metric, something_id, or other field is commonly a single-value selection (i.e. x=5, but not x=5 OR x=9).
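A sketch of both suggestions, assuming the column list from the posted query (the index name is illustrative, and the filter values are the placeholders from the question):

ALTER TABLE big_table
  ADD INDEX idx_date_covering (target_date, metric, source, platform,
                               account, something_id, country, amount);

-- one query for the whole range instead of 30 one-day queries
SELECT target_date, metric, SUM(amount) AS amount
FROM big_table
WHERE metric = 1
  AND source IN ('some_unique_source', 'some_other_unique_source')
  AND platform IN (1, 2, 3)
  AND something_id = 'some.unique.id'
  AND target_date BETWEEN '2018-08-01' AND '2018-08-30'
GROUP BY target_date, metric;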
so: CREATE INDEX idx_date ON big_table ((target_date + metric + source + platform + account + something_id + country + amount)); Seems about right? Wouldn't this be overkill?
– PaaPs
Sep 27 '18 at 8:43
Looks right. Overkill? Well, that depends on the query benefit gained. Obviously more indexes eventually slow inserts. The alternative is getting a form of summary table going, grouped by target_date.
– danblack
Sep 27 '18 at 8:47
Created the indexes; load time decreased by 1-2 seconds. Thank you
– PaaPs
Sep 27 '18 at 12:26
1-2 seconds for 30 days worth?
– danblack
Sep 30 '18 at 2:00
yes. The time was reduced from 8 secs to around 6.
– PaaPs
Sep 30 '18 at 9:51
Option 2: based on the target_date data being generally static after creation, you could create a summary table based on date.
CREATE TABLE `summary_of_big_table` (
  `target_date` date NOT NULL,
  `metric` tinyint(1) NOT NULL DEFAULT '0',
  `source` varchar(16) DEFAULT NULL,
  `platform` tinyint(1) NOT NULL DEFAULT '0',
  `account` int(2) DEFAULT NULL,
  `something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
  `country` varchar(2) DEFAULT NULL,
  `amount` decimal(16,6) NOT NULL DEFAULT '0.000000',
  -- target_date alone cannot be the PRIMARY KEY: the grouped data
  -- has one row per (date, metric, source, ...) combination
  KEY `index_date` (`target_date`),
  KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Add indexes on the fields that are likely to be queried with only a few single values.

Populate it with the data grouped by all the parameters, with amount totalled:
INSERT INTO summary_of_big_table
SELECT target_date, metric, source,
    platform, account, something_id, country,
    SUM(amount) AS amount
FROM big_table
GROUP BY target_date, metric, source,
    platform, account, something_id, country;
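A possible shape for the incremental refresh suggested in the comment below, assuming a day's rows are final before it runs (the date arithmetic is illustrative):

INSERT INTO summary_of_big_table
SELECT target_date, metric, source,
    platform, account, something_id, country,
    SUM(amount) AS amount
FROM big_table
WHERE target_date = CURDATE() - INTERVAL 1 DAY
GROUP BY target_date, metric, source,
    platform, account, something_id, country;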
Again, add some indexes, starting with the columns likely to be tested with =. Incrementally populate the summary table(s); don't rebuild from scratch.
– Rick James
Oct 1 '18 at 3:06
I probably won't go through the hassle of rebuilding the table and re-importing the data right now, but I'll keep in mind that this is a possibility. Thank you
– PaaPs
Oct 1 '18 at 5:42
@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.
– Rick James
Oct 1 '18 at 15:55
You don't have 1 day, you have 2. If you really wanted only one day and the column is a DATE, then simply say target_date = '2018-08-27'. This will help in designing the optimal index.

Start with the = columns (in any order):

INDEX(something_id, metric, target_date,
      xxxx) -- then add one other column in the WHERE

If there are only 3 "platforms", then leave out the test for it. I assume you are dynamically building the query, so put a little more effort into the UI to optimize things as you go.

Unfortunately, you may need multiple indexes to handle multiple use cases. It is not reasonable to have more than, say, 10 indexes, so design indexes that are likely to be used, and start with the = columns. Don't bother with indexes longer than, say, 4 columns.
Other issues:

- int(2) -- INT always takes 4 bytes; see TINYINT (etc.) for space savings.
- DEFAULT NULL -- use NOT NULL when appropriate.
- Change the ORs to INs. This won't improve performance, but it will be cleaner.
- After moving to IN, you can say country NOT IN (short-list).
- A Comment mentions JOIN vs LEFT JOIN, but I don't see either??
If the data is never modified after it is inserted, then building and maintaining Summary Tables is the way to go. Carefully designed, about 5 such tables could handle maybe 50 different queries. And the queries against these would run perhaps 10 times as fast. For more discussion, provide a list of the likely queries.
(I added some Comments about summary tables onto danblack's Answers.)
Variants:

- foo LIKE '%something' -- can't use the foo part of an index, due to the leading wildcard.
- foo <> 'blah' or foo NOT IN ('blah') -- probably treated like a "range"; therefore foo may be harmful (to performance) if too early in the index, and may be helpful if last.
- country NOT IN ('Foostan') versus country IN (very-long-list) -- the very-long-list takes some time to parse, etc., so NOT IN will be slightly better.
- Elimination of "all options checked" -- Do it. The Optimizer can't tell the difference between that and "some checked". So, yes, it is a potential optimization.
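For illustration, the question's WHERE clause rewritten with IN lists as suggested above (same semantics, with the one-day range collapsed to an equality test):

SELECT metric, SUM(amount)
FROM big_table
WHERE metric = 1
  AND source IN ('some_unique_source', 'some_other_unique_source',
                 'yet_another_unique_source', 'some_most_unique_source',
                 'almost_last_unique_source', 'the_last_unique_source')
  AND platform IN (1, 2, 3)
  AND account IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  AND something_id = 'some.unique.id'
  AND target_date = '2018-08-27'
GROUP BY metric;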
Once you have done things like those, the next challenge will be to pick a limited set of 'composite' INDEXes -- perhaps 5 indexes with 3 columns each. Keep track of the queries that users hit you with; the slowlog is one way to do this.
The JOIN comment was from before a query was posted. On space saving: are non-UTF8 varchar columns going to help where the data set is only Latin?
– danblack
Oct 1 '18 at 3:28
@danblack - "abc" will take 4 bytes in aVARCHAR(20)
column, regardless of theCHARACTER SET
of that column. (Well, except for utf16 and a few others.)CHAR(..)
may be a different matter. Also temp tables may matter. But nothing egregious in the current schema.
– Rick James
Oct 1 '18 at 4:53
I altered the table and query information before posting, which is causing some confusion - sorry for that. In reality the script only creates one-day date ranges. Some of the query variations include a something_id LIKE '%something%', plus variations of selected accounts, metrics, platforms and sources. I think a big help for default load times will be query optimization: excluding WHERE selects if all options are selected, and generating a <> 'something' when fewer options are unselected than selected.
– PaaPs
Oct 1 '18 at 5:51
@PaaPs - I responded to your Comment.
– Rick James
Oct 1 '18 at 15:52
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f218265%2fdatabase-performance-improvements-for-current-setup-mysql-marriadb%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
As you are going to be grabbing 30 days you may as well use a GROUP BY target_date
and put the target_date
in the result fields, will save some query parsing overhead.
As the query is always going to iterate over the range of the target_dates
, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount
). An index should start with date
, and include all fields of the where clause and the amount
. As this ends up with a largish index recommend creating it separately to the existing target_date
index.
To make this query is faster when the user say selects one country, assuming this is a common case, a index country
, target_date
, {other fields} will aid this form of query. Likewise if a single metric
, something_id
or other field is common as a single value selection (i.e x=5
, but not x=5 or x=9
).
so: CREATE INDEX idx_date ON big_table ((target_date
+metric
+source
+platform
+account
+something_id
+country
+amount
)); Seems about right? Wouldn't this be overkill?
– PaaPs
Sep 27 '18 at 8:43
looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.
– danblack
Sep 27 '18 at 8:47
created the indexes, load time decreased but 1-2 seconds. Thank you
– PaaPs
Sep 27 '18 at 12:26
1-2 seconds for 30 days worth?
– danblack
Sep 30 '18 at 2:00
yes. The time was reduced from 8 secs to around 6.
– PaaPs
Sep 30 '18 at 9:51
|
show 9 more comments
As you are going to be grabbing 30 days you may as well use a GROUP BY target_date
and put the target_date
in the result fields, will save some query parsing overhead.
As the query is always going to iterate over the range of the target_dates
, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount
). An index should start with date
, and include all fields of the where clause and the amount
. As this ends up with a largish index recommend creating it separately to the existing target_date
index.
To make this query is faster when the user say selects one country, assuming this is a common case, a index country
, target_date
, {other fields} will aid this form of query. Likewise if a single metric
, something_id
or other field is common as a single value selection (i.e x=5
, but not x=5 or x=9
).
so: CREATE INDEX idx_date ON big_table ((target_date
+metric
+source
+platform
+account
+something_id
+country
+amount
)); Seems about right? Wouldn't this be overkill?
– PaaPs
Sep 27 '18 at 8:43
looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.
– danblack
Sep 27 '18 at 8:47
created the indexes, load time decreased but 1-2 seconds. Thank you
– PaaPs
Sep 27 '18 at 12:26
1-2 seconds for 30 days worth?
– danblack
Sep 30 '18 at 2:00
yes. The time was reduced from 8 secs to around 6.
– PaaPs
Sep 30 '18 at 9:51
|
show 9 more comments
As you are going to be grabbing 30 days you may as well use a GROUP BY target_date
and put the target_date
in the result fields, will save some query parsing overhead.
As the query is always going to iterate over the range of the target_dates
, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount
). An index should start with date
, and include all fields of the where clause and the amount
. As this ends up with a largish index recommend creating it separately to the existing target_date
index.
To make this query is faster when the user say selects one country, assuming this is a common case, a index country
, target_date
, {other fields} will aid this form of query. Likewise if a single metric
, something_id
or other field is common as a single value selection (i.e x=5
, but not x=5 or x=9
).
As you are going to be grabbing 30 days you may as well use a GROUP BY target_date
and put the target_date
in the result fields, will save some query parsing overhead.
As the query is always going to iterate over the range of the target_dates
, and this will likely be the best column for the start of the index. To save the query from going back to the table space to fetch each row to further filter the result, or as a result (i.e. the amount
). An index should start with date
, and include all fields of the where clause and the amount
. As this ends up with a largish index recommend creating it separately to the existing target_date
index.
To make this query is faster when the user say selects one country, assuming this is a common case, a index country
, target_date
, {other fields} will aid this form of query. Likewise if a single metric
, something_id
or other field is common as a single value selection (i.e x=5
, but not x=5 or x=9
).
answered Sep 27 '18 at 8:26
danblackdanblack
2,1161214
2,1161214
so: CREATE INDEX idx_date ON big_table ((target_date
+metric
+source
+platform
+account
+something_id
+country
+amount
)); Seems about right? Wouldn't this be overkill?
– PaaPs
Sep 27 '18 at 8:43
looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.
– danblack
Sep 27 '18 at 8:47
created the indexes, load time decreased but 1-2 seconds. Thank you
– PaaPs
Sep 27 '18 at 12:26
1-2 seconds for 30 days worth?
– danblack
Sep 30 '18 at 2:00
yes. The time was reduced from 8 secs to around 6.
– PaaPs
Sep 30 '18 at 9:51
|
show 9 more comments
so: CREATE INDEX idx_date ON big_table ((target_date
+metric
+source
+platform
+account
+something_id
+country
+amount
)); Seems about right? Wouldn't this be overkill?
– PaaPs
Sep 27 '18 at 8:43
looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.
– danblack
Sep 27 '18 at 8:47
created the indexes, load time decreased but 1-2 seconds. Thank you
– PaaPs
Sep 27 '18 at 12:26
1-2 seconds for 30 days worth?
– danblack
Sep 30 '18 at 2:00
yes. The time was reduced from 8 secs to around 6.
– PaaPs
Sep 30 '18 at 9:51
so: CREATE INDEX idx_date ON big_table ((
target_date
+ metric
+ source
+ platform
+ account
+ something_id
+ country
+amount
)); Seems about right? Wouldn't this be overkill?– PaaPs
Sep 27 '18 at 8:43
so: CREATE INDEX idx_date ON big_table ((
target_date
+ metric
+ source
+ platform
+ account
+ something_id
+ country
+amount
)); Seems about right? Wouldn't this be overkill?– PaaPs
Sep 27 '18 at 8:43
looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.
– danblack
Sep 27 '18 at 8:47
looks right. Overkill? well that depends on the query benefit gained. Obviously more indexes eventually slows inserts. The alternate is getting a form of summary table going grouped by target_date.
– danblack
Sep 27 '18 at 8:47
created the indexes, load time decreased but 1-2 seconds. Thank you
– PaaPs
Sep 27 '18 at 12:26
created the indexes, load time decreased but 1-2 seconds. Thank you
– PaaPs
Sep 27 '18 at 12:26
1-2 seconds for 30 days worth?
– danblack
Sep 30 '18 at 2:00
1-2 seconds for 30 days worth?
– danblack
Sep 30 '18 at 2:00
yes. The time was reduced from 8 secs to around 6.
– PaaPs
Sep 30 '18 at 9:51
yes. The time was reduced from 8 secs to around 6.
– PaaPs
Sep 30 '18 at 9:51
|
show 9 more comments
Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.
CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Index with fields there's likely to be few single values of.
Populated will data group by all the parameters and amount totalled:
INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country
Again, add some indexes, starting with the columns likely to be tested with=
. Incrementally populate the summary table(s); don't rebuild from scratch.
– Rick James
Oct 1 '18 at 3:06
I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you
– PaaPs
Oct 1 '18 at 5:42
@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.
– Rick James
Oct 1 '18 at 15:55
add a comment |
Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.
CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Index with fields there's likely to be few single values of.
Populated will data group by all the parameters and amount totalled:
INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country
Again, add some indexes, starting with the columns likely to be tested with=
. Incrementally populate the summary table(s); don't rebuild from scratch.
– Rick James
Oct 1 '18 at 3:06
I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you
– PaaPs
Oct 1 '18 at 5:42
@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.
– Rick James
Oct 1 '18 at 15:55
add a comment |
Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.
CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Index with fields there's likely to be few single values of.
Populated will data group by all the parameters and amount totalled:
INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country
Option 2, based on the target_date data being generally static after creation you could create a summary table based on date.
CREATE TABLE `summary_of_big_table` (
`target_date` date NOT NULL,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`source` varchar(16) DEFAULT NULL ,
`platform` tinyint(1) NOT NULL DEFAULT '0' ,
`account` int(2) DEFAULT NULL ,
`something_id` varchar(128) DEFAULT 'EMPTY SOMETHING_ID',
`country` varchar(2) DEFAULT NULL ,
`amount` decimal(16,6) NOT NULL DEFAULT '0.000000' ,
PRIMARY KEY (`target_date`),
KEY `index_country` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Index with fields there's likely to be few single values of.
Populated will data group by all the parameters and amount totalled:
INSERT INTO summary_of_big_table AS
SELECT target_date, metric, source,
platform, account, something_id, country,
SUM(amount) as amount
FROM big_table
GROUP BY target_date, metric, source,
platform, account, something_id, country
answered Sep 30 '18 at 2:14
danblackdanblack
2,1161214
2,1161214
Again, add some indexes, starting with the columns likely to be tested with=
. Incrementally populate the summary table(s); don't rebuild from scratch.
– Rick James
Oct 1 '18 at 3:06
I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you
– PaaPs
Oct 1 '18 at 5:42
@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.
– Rick James
Oct 1 '18 at 15:55
add a comment |
Again, add some indexes, starting with the columns likely to be tested with=
. Incrementally populate the summary table(s); don't rebuild from scratch.
– Rick James
Oct 1 '18 at 3:06
I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you
– PaaPs
Oct 1 '18 at 5:42
@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.
– Rick James
Oct 1 '18 at 15:55
Again, add some indexes, starting with the columns likely to be tested with
=
. Incrementally populate the summary table(s); don't rebuild from scratch.– Rick James
Oct 1 '18 at 3:06
Again, add some indexes, starting with the columns likely to be tested with
=
. Incrementally populate the summary table(s); don't rebuild from scratch.– Rick James
Oct 1 '18 at 3:06
I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you
– PaaPs
Oct 1 '18 at 5:42
I probably won't go trough the hassle of rebuilding the table and reimporting the data right now. But will keep in mind that this is a possibility. Thank you
– PaaPs
Oct 1 '18 at 5:42
@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.
– Rick James
Oct 1 '18 at 15:55
@PaaPs - A Summary Table is an extra table, not a rebuild of the main table. No "reimporting", however it does involve fetching all the data from the main table to initialize the summary table.
– Rick James
Oct 1 '18 at 15:55
add a comment |
You don't have 1 day, you have 2. If you really wanted only one day and the column is a DATE
, then simply say target_date = '2018-08-27'
. This will help in designing the optimal index.
Start with the =
columns (in any order):
INDEX(something_id, metric, target_date,
xxxx) -- then add one other column in the WHERE
If there are only 3 "platforms", then leave out the test for it. I assume you are dynamically building the query. So put a little more effort into the UI to optimize things as you go.
Unfortunately, you may need multiple indexes to handle multiple use cases. It is not reasonable to have more than, say, 10 indexes. So design indexes that are likely to be used and start with the =
columns. Don't bother with indexes longer than, say, 4 column.
Other issues:
int(2)
-- INT always takes 4 bytes; seeTINYINT
(etc) for space-savings.
DEFAULT NULL
-- useNOT NULL
when appropriate.- Change the
OR
s toIN
s. This won't improve performance, but it will be cleaner. - After moving to
IN
, you can saycountry NOT IN (short-list)
. - A Comment mentions
JOIN
vsLEFT JOIN
, but I don't see either??
If the data is never modified after it is inserted, then building and maintaining Summary Tables is the way to go. Carefully designed, about 5 such tables could handle maybe 50 different queries. And the queries against these would run perhaps 10 times as fast. For more discussion, provide a list of the likely queries.
(I added some Comments about summary tables onto danblack's Answers.)
Variants:
foo LIKE '%something'
-- can't usefoo
part of index due to leading wildcard
foo <> 'blah'
orfoo NOT IN ('blah')
-- probably treated like a "range", therefore,foo
may be harmful (to performance) if too early in index; may be helpful if last.
country NOT IN ('Foostan')
versuscountry IN (very-long-list)
-- the very-long-list takes some time to parse, etc; soNOT IN
will be slightly better.- Elimination of "all options checked" -- Do it. The Optimizer can't tell the difference between that and "some checked". So, yes, it is a potential optimization.
Once you have done things like those, the next challenge will be to pick a limited set of 'composite' INDEXes
-- perhaps 5 indexes with 3 columns each. Keep track of the queries that users hit you with. The slowlog is one way to do this.
JOIN
comments was before there was a query posted. On space saving is non-UTF8 varchar columns where the dataset is only latin going to help?
– danblack
Oct 1 '18 at 3:28
@danblack - "abc" will take 4 bytes in aVARCHAR(20)
column, regardless of theCHARACTER SET
of that column. (Well, except for utf16 and a few others.)CHAR(..)
may be a different matter. Also temp tables may matter. But nothing egregious in the current schema.
– Rick James
Oct 1 '18 at 4:53
I altered the table information and query information before posting, this is causing some confusion, sorry for that. In reality the script is only creating one day date ranges. Some of the query variations include asomething_id
IS LIKE '%something%'; and a variations of selected accounts, metrics, platforms and sources. I think a big help for default load times will be query optimization by excluding where selects if all options are selected, and generating a <> 'something' if less options are unselected than selected.
– PaaPs
Oct 1 '18 at 5:51
@PaaPs - I responded to your Comment.
– Rick James
Oct 1 '18 at 15:52
add a comment |
You don't have 1 day, you have 2. If you really wanted only one day and the column is a DATE
, then simply say target_date = '2018-08-27'
. This will help in designing the optimal index.
Start with the =
columns (in any order):
INDEX(something_id, metric, target_date,
xxxx) -- then add one other column in the WHERE
If there are only 3 "platforms", then leave out the test for it. I assume you are dynamically building the query. So put a little more effort into the UI to optimize things as you go.
Unfortunately, you may need multiple indexes to handle multiple use cases. It is not reasonable to have more than, say, 10 indexes. So design indexes that are likely to be used and start with the =
columns. Don't bother with indexes longer than, say, 4 column.
Other issues:
int(2)
-- INT always takes 4 bytes; seeTINYINT
(etc) for space-savings.
DEFAULT NULL
-- useNOT NULL
when appropriate.- Change the
OR
s toIN
s. This won't improve performance, but it will be cleaner. - After moving to
IN
, you can saycountry NOT IN (short-list)
. - A Comment mentions
JOIN
vsLEFT JOIN
, but I don't see either??
If the data is never modified after it is inserted, then building and maintaining Summary Tables is the way to go. Carefully designed, about 5 such tables could handle maybe 50 different queries. And the queries against these would run perhaps 10 times as fast. For more discussion, provide a list of the likely queries.
(I added some Comments about summary tables onto danblack's Answers.)
Variants:
- foo LIKE '%something' -- can't use foo as part of an index due to the leading wildcard.
- foo <> 'blah' or foo NOT IN ('blah') -- probably treated like a "range"; therefore foo may be harmful (to performance) if too early in the index; may be helpful if last.
- country NOT IN ('Foostan') versus country IN (very-long-list) -- the very-long-list takes some time to parse, etc., so NOT IN will be slightly better.
- Elimination of "all options checked" -- Do it. The Optimizer can't tell the difference between that and "some checked". So, yes, it is a potential optimization.
Once you have done things like those, the next challenge will be to pick a limited set of 'composite' INDEXes -- perhaps 5 indexes with 3 columns each. Keep track of the queries that users hit you with; the slow log is one way to do this.
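For the slow log, the standard MariaDB settings look like this (the threshold is just a starting point):

-- in my.cnf under [mysqld], or at runtime:
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;   -- seconds; lower it once the worst offenders are fixed
-- then summarize the log periodically, e.g. with mysqldumpslow or pt-query-digest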
edited Oct 1 '18 at 15:51
answered Oct 1 '18 at 2:52
– Rick James
The JOIN comments were from before there was a query posted. On space saving: are non-UTF8 varchar columns going to help where the dataset is only Latin?
– danblack
Oct 1 '18 at 3:28
@danblack - "abc" will take 4 bytes in aVARCHAR(20)
column, regardless of theCHARACTER SET
of that column. (Well, except for utf16 and a few others.)CHAR(..)
may be a different matter. Also temp tables may matter. But nothing egregious in the current schema.
– Rick James
Oct 1 '18 at 4:53
I altered the table and query information before posting; this is causing some confusion, sorry for that. In reality the script only creates one-day date ranges. Some of the query variations include a something_id LIKE '%something%', and variations of selected accounts, metrics, platforms and sources. I think a big help for default load times will be query optimization: excluding WHERE clauses when all options are selected, and generating a <> 'something' test when fewer options are unselected than selected.
– PaaPs
Oct 1 '18 at 5:51
@PaaPs - I responded to your Comment.
– Rick James
Oct 1 '18 at 15:52
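To make that comment concrete, the query builder would emit one of three shapes per filter; a sketch with placeholder values:

-- all options selected: omit the test entirely (nothing emitted)
-- most options selected: negate the short side
  AND platform NOT IN (3)
-- otherwise: the plain positive list
  AND platform IN (1, 2)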
Yes, not using query_cache for this case is a good idea. You could try other optimizer_switch settings. Overall you need to improve the query and the tables. Can you show the EXPLAIN {query} output and SHOW CREATE TABLE {tblname}? Do you think you have indexed the table(s) right? Have you used JOIN rather than LEFT JOIN wherever possible? For multiple conditions on a single table, have you considered a generated column with an index?
– danblack
Sep 22 '18 at 8:00
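That generated-column suggestion might look like the following on MariaDB (PERSISTENT so it can be indexed; the column name and the source list are purely illustrative):

ALTER TABLE big_table
  ADD COLUMN is_main_source TINYINT
    AS (source IN ('src_a', 'src_b', 'src_c')) PERSISTENT,  -- illustrative expression
  ADD INDEX idx_is_main_source (is_main_source);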
EXPLAIN output: id: 1 | select_type: SIMPLE | table: table_name | type: range | possible_keys: target_date, index_country, metric | key: target_date | key_len: 4 | ref: NULL | rows: 153856 | Extra: Using index condition; Using where; Using temporary; Using filesort. I'm not using joins for this table at all; this query took 144 seconds in the slow_query log.
– PaaPs
Sep 24 '18 at 10:01
This is rather incomprehensible. Please edit it into the question, format it neatly, and include the SHOW CREATE TABLE {table} information for the table used. A full query would also help a lot.
– danblack
Sep 25 '18 at 0:29
Added the create and explain queries.
– PaaPs
Sep 26 '18 at 13:38
Great start. Is it most common for this query to be on a small date range with respect to the size of the data? Is metric=1 always used? Your list of sources, is this always the list that is used? Is the platform list always used? Is the account list always used? Do these sets of sources/platforms/accounts correspond to a logical grouping?
– danblack
Sep 26 '18 at 22:33