How do partitions handle new data?
I have a database with nearly 100M rows of stock data. Pulling the data for the last 7 days to generate a chart takes a couple seconds.
My thought is to create partitions that represent the normal chart periods (1 day, 1 week, 1 month, 6 months, 1 year, for example).
How does MySQL handle new data for these partitions? I assume that they would be INSERTed into the 1 day partition, because they are "today's" numbers. But what happens tomorrow? Are they automatically moved to the 1 week partition because they no longer belong to "today"?
From reading the docs, it seems partitioning is most appropriate for fixed, "in the past" data with fixed ranges that can be identified by the key (year, for example), not the result of a function that gives a dynamic value (like distance into the past in days).
Edit to include requested information
Will your data be bigger than what can be cached in RAM?: Yes
There are actually two tables: symbol_history and symbol_current. symbol_current keeps 48 hours of intraday stats available for calculations in a proprietary algorithm.
At night, after the market closes, data older than 48 hours is rolled up into open, high, low, and close, and we save that information on a per-stock basis. For our purposes, OHLC is the only thing required once data is 2 days old.
I have added a scheduled EVENT that deletes information from symbol_current once it is older than that. This dumped 77M rows of data that was quite stale (not sure why this wasn't done before, but I digress).
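For reference, the purge event is roughly of this shape (a sketch only; the event name, schedule, and retention predicate are assumptions, not the production definition):

-- Hypothetical purge event; requires event_scheduler=ON.
-- Name, schedule, and WHERE clause are assumptions.
CREATE EVENT purge_symbol_current
  ON SCHEDULE EVERY 1 DAY
  STARTS CURRENT_TIMESTAMP
DO
  DELETE FROM symbol_current
  WHERE created_at < NOW() - INTERVAL 48 HOUR;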
Now, we still have 83M rows of data in symbol_history, which is used to generate candlesticks for historical perspective.
Generating charts from symbol_current is now blazing fast. But incorporating symbol_history has the same slow speed problem as before.
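The chart query against symbol_history is roughly of this shape (a hypothetical sketch; the actual SELECT was not posted, and the stock id, date range, and column list here are assumptions):

-- Hypothetical charting query; the real one was not posted.
SELECT created_at, open_price, high_price, low_price, close_price, total_volume
FROM symbol_history
WHERE stocks_id = 123                          -- some stock id
  AND created_at >= NOW() - INTERVAL 7 DAY
ORDER BY created_at;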
Any thoughts are appreciated.
CREATE TABLE `symbol_current` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`stocks_id` int(11) NOT NULL,
`last_trade` double DEFAULT NULL,
`open` double DEFAULT NULL,
`ask` double DEFAULT NULL COMMENT 'The current ask price',
`bid` double DEFAULT NULL COMMENT 'The current bid price',
`change_close` double DEFAULT NULL COMMENT 'The dollar change of the last price from the previous close',
`change_close_percentage` double DEFAULT NULL COMMENT 'The percentage change of the last price from the previous close',
`high` double DEFAULT NULL COMMENT 'The highest price at which a security has traded during the current day',
`low` double DEFAULT NULL COMMENT 'The lowest price at which a security has traded during the current day',
`total_volume` bigint(20) DEFAULT NULL COMMENT 'Total number of shares or contracts exchanging hands',
`ohcl_calculation` tinyint(1) DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`record_creation_time` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`python_time` text,
`delete_after` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `new_fk_symbol_current_stocks1_idx` (`stocks_id`),
KEY `new_ohcl_calculation` (`ohcl_calculation`),
KEY `new_symbol_current_create_at` (`created_at`),
KEY `last_trade` (`last_trade`),
KEY `idxDeleteAfter` (`delete_after`),
CONSTRAINT `new_fk_symbol_current_stocks1` FOREIGN KEY (`stocks_id`) REFERENCES `stocks` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=634299 DEFAULT CHARSET=utf8;
Here's symbol_history:
CREATE TABLE `symbol_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`stocks_id` int(11) NOT NULL,
`open_price` double DEFAULT NULL,
`close_price` double DEFAULT NULL,
`high_price` double DEFAULT NULL,
`low_price` double DEFAULT NULL,
`total_volume` bigint(20) DEFAULT NULL COMMENT 'Total number of shares or contracts exchanging hands',
`updated_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_symbol_current_stocks1_idx` (`stocks_id`),
KEY `symbol_history_created_at` (`created_at`),
CONSTRAINT `fk_symbol_history_stocks1` FOREIGN KEY (`stocks_id`) REFERENCES `stocks` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=98852145 DEFAULT CHARSET=utf8;
Tags: mysql, partitioning
Asked by DrDamnit, Aug 28 '17 at 14:03 (last edited Aug 28 '17 at 21:45)
Comments:
How many rows now in Current? Do you have innodb_file_per_table set? – Rick James, Aug 28 '17 at 22:10
symbol_current = 638k rows at the moment. innodb_file_per_table = ON – DrDamnit, Aug 28 '17 at 22:29
So, 200MB for the table; Data_free is not big; hence no need for OPTIMIZE TABLE. – Rick James, Aug 28 '17 at 22:33
1 Answer
Answer by Rick James (answered Aug 28 '17 at 18:56, edited Aug 28 '17 at 22:13):
This use of PARTITIONing is unlikely to achieve much. More in a minute...
When a row is INSERTed into a PARTITIONed table, MySQL looks at the partition definition to decide which partition to put the row in. Then it inserts the row into that one partition. A partition is essentially a table. (I like to call it a sub-table.) Each sub-table has its own index(es) referring only to the rows in that sub-table.
"Today" and "tomorrow" do not matter unless you, for example, PARTITION BY RANGE (TO_DAYS(datetime))
. Then, depending on the specific partitions, today and tomorrow might be separate partitions.
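A date-range layout for symbol_history might be declared roughly like this (a sketch only: the column list is trimmed, the partition names and boundaries are illustrative, and it presumes the primary-key change discussed further down, since MySQL requires the partitioning column to be part of every unique key and does not allow foreign keys on partitioned tables):

-- Sketch of a range-partitioned history table; names and boundaries are
-- illustrative. The FK to stocks must go: partitioned InnoDB tables
-- cannot have foreign keys.
CREATE TABLE symbol_history_part (
  stocks_id INT NOT NULL,
  trade_date DATE NOT NULL,
  open_price FLOAT,
  high_price FLOAT,
  low_price FLOAT,
  close_price FLOAT,
  total_volume DECIMAL(11,0),
  PRIMARY KEY (stocks_id, trade_date)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(trade_date)) (
  PARTITION p2017_08_27 VALUES LESS THAN (TO_DAYS('2017-08-28')),
  PARTITION p2017_08_28 VALUES LESS THAN (TO_DAYS('2017-08-29')),
  PARTITION pfuture     VALUES LESS THAN MAXVALUE
);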
There is no "automatic" way to partition "by week". You have to do it manually (or with some automated cron task). See http://mysql.rjweb.org/doc.php/partitionmaint for a discussion of this particular partitioning task.
You are unclear on the charts you want. For a 1-month chart, do you only want the 30 days ending now? Or any full month (1st thru 31st)? Or any arbitrary N days? That makes a big difference in optimization.
Next, when making a 1-day chart, do you want data points for each trade? For a 1-year chart, do you want candlesticks or spark lines?
And, for the 1-day chart, do you want the detail, even for one day last year? Or can the details be tossed? That is, new data has all trades; old data has only open-high-low-close for each day (or maybe only week/month).
Some of that is leading to suggestions of "summary tables"; see http://mysql.rjweb.org/doc.php/summarytables
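In this schema, a summary table could be a weekly (or monthly) OHLC rollup maintained once per period, so long-range charts never scan the full daily history. A sketch (the table name, week bucketing, and the GROUP_CONCAT trick for first/last values are all assumptions):

-- Hypothetical weekly rollup of the daily history.
CREATE TABLE symbol_history_week (
  stocks_id INT NOT NULL,
  week_start DATE NOT NULL,
  open_price FLOAT,
  close_price FLOAT,
  high_price FLOAT,
  low_price FLOAT,
  total_volume DECIMAL(11,0),
  PRIMARY KEY (stocks_id, week_start)
) ENGINE=InnoDB;

-- Rebuild (or append to) the rollup after the nightly OHLC load.
INSERT INTO symbol_history_week
SELECT stocks_id,
       DATE(created_at) - INTERVAL WEEKDAY(created_at) DAY AS week_start,
       SUBSTRING_INDEX(GROUP_CONCAT(open_price  ORDER BY created_at ASC),  ',', 1) + 0,  -- first open of the week
       SUBSTRING_INDEX(GROUP_CONCAT(close_price ORDER BY created_at DESC), ',', 1) + 0,  -- last close of the week
       MAX(high_price),
       MIN(low_price),
       SUM(total_volume)
FROM symbol_history
GROUP BY stocks_id, week_start;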
Back to partitioning... Partitioning won't help with speed. It will help if you need to delete old data. (See Case 1 in my partitioning blog.)
Another issue... Indexing. Will your data be bigger than can be cached in RAM? If so, we need to talk about the PRIMARY KEY and perhaps one secondary index, so as to optimize fetching of data for charting, possibly at the expense of INSERTing.
Please provide SHOW CREATE TABLE, how many tickers you will store, how much granularity of transactions, how much RAM, etc.
Critique of schema...
Size leads to cacheability, which leads to speed. So, much of the following critiques the speed.
With rare exceptions (BRK), the ~7 significant digits of FLOAT is sufficient for prices. FLOAT takes 4 bytes; DOUBLE takes 8. For volume, I suggest DECIMAL(11,0), which takes 5 bytes (versus 8 for BIGINT). I have not seen more than 100 billion shares traded at once, have you? (I have seen more than 2^32 in a single day, so I don't recommend INT UNSIGNED.)
I doubt if you have more than 16M stocks, so change stocks_id from INT (4 bytes) to MEDIUMINT UNSIGNED (3 bytes).
I see no DATE in the history, unless it is python_time. Consider using a 3-byte DATE. Then get rid of id and replace the PK by PRIMARY KEY(stocks_id, date). This change will significantly speed up any chart for a single stock over any date range. Meanwhile, get rid of the now redundant new_fk_symbol_current_stocks1_idx.
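Taken together, the history-table changes might look roughly like this (a sketch only: the trade_date column name is an assumption, it presumes created_at is populated and there is one history row per stock per day, and shrinking stocks_id to MEDIUMINT UNSIGNED is left out because InnoDB wants the same integer type on both sides of the foreign key to stocks):

-- Sketch of the symbol_history changes; test on a copy first, since
-- rebuilding ~83M rows takes time.
ALTER TABLE symbol_history
  ADD COLUMN trade_date DATE NULL AFTER stocks_id;

-- In production this UPDATE would be batched rather than run in one go.
UPDATE symbol_history SET trade_date = DATE(created_at);

ALTER TABLE symbol_history
  MODIFY trade_date DATE NOT NULL,
  MODIFY open_price FLOAT,
  MODIFY close_price FLOAT,
  MODIFY high_price FLOAT,
  MODIFY low_price FLOAT,
  MODIFY total_volume DECIMAL(11,0),
  DROP PRIMARY KEY,
  DROP COLUMN id,
  ADD PRIMARY KEY (stocks_id, trade_date);

-- The separate stocks_id index is now redundant: the new PK starts with
-- stocks_id and can back the foreign key.
ALTER TABLE symbol_history DROP INDEX fk_symbol_current_stocks1_idx;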
DATETIME is 5 bytes; consider using DATE for delete_after.
Rethink the rest of the indexes. They cost something during INSERT, and they cost disk space.
How big is a python time? If you really need it, can you make it a smallish VARCHAR, not TEXT? There may be issues with TEXT. Please provide the actual SELECT used against History.
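A corresponding sketch for symbol_current (again an illustration, not a prescription: the VARCHAR length assumes python_time values are short, MODIFY replaces the full column definition so the original COMMENTs are omitted here, and the dropped indexes should be kept if any query still uses them):

-- Sketch of the symbol_current changes; the DOUBLE -> FLOAT list is abbreviated.
ALTER TABLE symbol_current
  MODIFY `last_trade` FLOAT,
  MODIFY `open` FLOAT,
  MODIFY `ask` FLOAT,
  MODIFY `bid` FLOAT,
  MODIFY `high` FLOAT,
  MODIFY `low` FLOAT,
  MODIFY `total_volume` DECIMAL(11,0),
  MODIFY `delete_after` DATE DEFAULT NULL,
  MODIFY `python_time` VARCHAR(20),
  DROP INDEX `last_trade`,
  DROP INDEX `new_ohcl_calculation`;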
Comments:
That's what I figured. The way the partition function is described in the docs, it seemed that it would only work this way. I had thought that (perhaps) a partition was a separate file (.frm?) in the file system and perhaps it would migrate back and forth. But clearly not. Thank you for your answer, and I am adding your requested info now. – DrDamnit, Aug 28 '17 at 21:35
"Transportable tablespaces" lets you turn a Partition into a Table, and vice versa. – Rick James, Aug 28 '17 at 21:54
Added Critique of schema -- that should cut the disk space in half and significantly speed up the chart query. – Rick James, Aug 28 '17 at 22:08
python_time is a unix timestamp. Not sure why my dev chose text instead of INT. They will be going to SQL re-education camp soon. For the rest, I will make these changes. On this amount of data, I expect a change like this to take upwards of 40 minutes. (Is that slower than you'd expect?) We have plenty of time until the market opens... – DrDamnit, Aug 28 '17 at 22:33
Also, the change of datatypes should be lossless (float -> decimal), correct? I have backups, but they take quite a while to restore, and would rather not go through that exercise.... – DrDamnit, Aug 28 '17 at 22:34