How do partitions handle new data?

I have a database with nearly 100M rows of stock data. Pulling the data for the last 7 days to generate a chart takes a couple of seconds.



My thought is to create partitions that represent the normal chart periods (1 day, 1 week, 1 month, 6 months, 1 year, for example).



How does MySQL handle new data for these partitions? I assume that they would be INSERTed into the 1 day partition, because they are "today's" numbers. But what happens tomorrow? Are they automatically moved to the 1 week partition because they no longer belong to "today"?



From reading the docs, it seems partitioning is most appropriate for fixed, "in the past" data with fixed ranges that can be identified by the key (a year, for example), not the result of a function that yields a dynamic value (like distance into the past in days).



Edit to include requested information



Will your data be bigger than what can be cached in RAM?: Yes



There are actually two tables: symbol_history and symbol_current. The latter keeps 48 hours of intraday stats available for calculations in a proprietary algorithm.



At night, after the market closes, open, high, low, and close are computed for data older than 48 hours, and we save just that information on a per-stock basis. For our purposes, OHLC is the only thing required once data is 2 days old.
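That nightly rollup is roughly this shape (an untested sketch; the column mapping, the 48-hour cutoff, and the take-the-last-snapshot-per-day logic are assumptions based on the description above):

-- Roll the final intraday snapshot of each stock/day into history.
INSERT INTO symbol_history
    (stocks_id, open_price, high_price, low_price, close_price,
     total_volume, created_at)
SELECT sc.stocks_id, sc.open, sc.high, sc.low, sc.last_trade,
       sc.total_volume, sc.created_at
FROM symbol_current AS sc
JOIN (
    -- last snapshot per stock per day, for rows past the cutoff
    SELECT stocks_id, MAX(created_at) AS last_seen
    FROM symbol_current
    WHERE created_at < NOW() - INTERVAL 48 HOUR
    GROUP BY stocks_id, DATE(created_at)
) AS latest
  ON latest.stocks_id = sc.stocks_id
 AND latest.last_seen = sc.created_at;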



I have added a scheduled EVENT that deletes information from symbol_current if it is older than that. This dumped 77M rows of data that was quite stale (not sure why this wasn't done before, but I digress).
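The event is roughly this shape (a sketch; it assumes the event scheduler is enabled and that delete_after marks when a row expires):

-- Assumes SET GLOBAL event_scheduler = ON;
CREATE EVENT purge_symbol_current
ON SCHEDULE EVERY 1 HOUR
DO
  -- idxDeleteAfter makes this range scan cheap
  DELETE FROM symbol_current
  WHERE delete_after < NOW();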



Now, we still have 83M rows of data in symbol_history, which is used to generate candlesticks for historical perspective.



Generating charts from symbol_current is now blazing fast. But incorporating symbol_history has the same slow speed problem as before.



Any thoughts are appreciated.



CREATE TABLE `symbol_current` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stocks_id` int(11) NOT NULL,
  `last_trade` double DEFAULT NULL,
  `open` double DEFAULT NULL,
  `ask` double DEFAULT NULL COMMENT 'The current ask price',
  `bid` double DEFAULT NULL COMMENT 'The current bid price',
  `change_close` double DEFAULT NULL COMMENT 'The dollar change of the last price from the previous close',
  `change_close_percentage` double DEFAULT NULL COMMENT 'The percentage change of the last price from the previous close',
  `high` double DEFAULT NULL COMMENT 'The highest price at which a security has traded during the current day',
  `low` double DEFAULT NULL COMMENT 'The lowest price at which a security has traded during the current day',
  `total_volume` bigint(20) DEFAULT NULL COMMENT 'Total number of shares or contracts exchanging hands',
  `ohcl_calculation` tinyint(1) DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `record_creation_time` timestamp NULL DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `python_time` text,
  `delete_after` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `new_fk_symbol_current_stocks1_idx` (`stocks_id`),
  KEY `new_ohcl_calculation` (`ohcl_calculation`),
  KEY `new_symbol_current_create_at` (`created_at`),
  KEY `last_trade` (`last_trade`),
  KEY `idxDeleteAfter` (`delete_after`),
  CONSTRAINT `new_fk_symbol_current_stocks1` FOREIGN KEY (`stocks_id`) REFERENCES `stocks` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=634299 DEFAULT CHARSET=utf8;


Here's symbol_history:



CREATE TABLE `symbol_history` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stocks_id` int(11) NOT NULL,
  `open_price` double DEFAULT NULL,
  `close_price` double DEFAULT NULL,
  `high_price` double DEFAULT NULL,
  `low_price` double DEFAULT NULL,
  `total_volume` bigint(20) DEFAULT NULL COMMENT 'Total number of shares or contracts exchanging hands',
  `updated_at` timestamp NULL DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_symbol_current_stocks1_idx` (`stocks_id`),
  KEY `symbol_history_created_at` (`created_at`),
  CONSTRAINT `fk_symbol_history_stocks1` FOREIGN KEY (`stocks_id`) REFERENCES `stocks` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=98852145 DEFAULT CHARSET=utf8;









mysql partitioning

asked Aug 28 '17 at 14:03 by DrDamnit; edited Aug 28 '17 at 21:45
  • How many rows now in Current? Do you have innodb_file_per_table set? – Rick James, Aug 28 '17 at 22:10
  • symbol_current = 638k rows at the moment. innodb_file_per_table = ON – DrDamnit, Aug 28 '17 at 22:29

  • So, 200MB for the table; Data_free is not big; hence no need for OPTIMIZE TABLE. – Rick James, Aug 28 '17 at 22:33

1 Answer

This use of PARTITIONing is unlikely to achieve much. More in a minute...



When a row is INSERTed into a PARTITIONed table, it looks at the partition definition to decide which partition to put the row in. Then it inserts the row into that one partition. A partition is essentially a table. (I like to call it a sub-table.) Each sub-table has its own index(es) referring only to the rows in that sub-table.



"Today" and "tomorrow" do not matter unless you, for example, PARTITION BY RANGE (TO_DAYS(datetime)). Then, depending on the specific partitions, today and tomorrow might be separate partitions.



There is no "automatic" way to partition "by week". You have to do it manually (or with some automated cron task). See http://mysql.rjweb.org/doc.php/partitionmaint for a discussion of this particular partitioning task.



You are unclear on the charts you want. For a 1-month chart, do you only want the 30 days ending now? Or any full month (1st thru 31st)? Or any arbitrary N days? That makes a big difference in optimization.



Next, when making a 1-day chart, do you want data points for each trade? For a 1-year chart, do you want candlesticks or sparklines?



And, for the 1-day chart, do you want the detail, even for one day last year? Or can the details be tossed? That is, new data has all trades; old data has only open-high-low-close for each day (or maybe only week/month).



Some of that is leading to suggestions of "summary tables"; see http://mysql.rjweb.org/doc.php/summarytables



Back to partitioning... Partitioning won't help with speed. It will help if you need to delete old data. (See Case 1 in my partitioning blog.)
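For example, purging a day then becomes a near-instant metadata operation rather than a large DELETE (again using the hypothetical daily_example table):

-- Drops the whole sub-table for that day in one step.
ALTER TABLE daily_example DROP PARTITION p20170828;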



Another issue... Indexing. Will your data be bigger than can be cached in RAM? If so, we need to talk about the PRIMARY KEY and perhaps one secondary index, so as to optimize fetching of data for charting, possibly at the expense of INSERTing.



Please provide SHOW CREATE TABLE, how many tickers you will store, how much granularity of transactions, how much RAM, etc.



Critique of schema...



Size leads to cacheability, which leads to speed. So, much of the following critiques the speed.



With rare exceptions (BRK), the ~7 significant digits of FLOAT is sufficient for prices. FLOAT takes 4 bytes; DOUBLE takes 8. For volume, I suggest DECIMAL(11,0), which takes 5 bytes (versus 8 for BIGINT). I have not seen more than 100 billion shares traded at once, have you? (I have seen more than 2^32 in a single day, so I don't recommend INT UNSIGNED.)



I doubt if you have more than 16M stocks, so change stocks_id from INT (4 bytes) to MEDIUMINT UNSIGNED (3 bytes).
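Applied to symbol_history, those two suggestions would be something like this untested sketch:

ALTER TABLE symbol_history
  MODIFY open_price   FLOAT,
  MODIFY close_price  FLOAT,
  MODIFY high_price   FLOAT,
  MODIFY low_price    FLOAT,
  MODIFY total_volume DECIMAL(11,0);
-- Shrinking stocks_id to MEDIUMINT UNSIGNED is left out here:
-- the FOREIGN KEY to stocks(id) requires matching types, so
-- stocks.id would need the same change (or drop the FK first).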



I see no DATE in the history, unless it is python_time. Consider using a 3-byte DATE. Then get rid of id and replace the PK by PRIMARY KEY(stocks_id, date). This change will significantly speed up any chart for a single stock over any date range. Meanwhile, get rid of the now-redundant fk_symbol_current_stocks1_idx.
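A possible shape for that key swap, heavily hedged (the trade_date column is hypothetical and backfilled from created_at; it assumes one history row per stock per day, and should be tested on a copy before running against 83M rows):

ALTER TABLE symbol_history ADD COLUMN trade_date DATE NULL;
UPDATE symbol_history SET trade_date = DATE(created_at);
ALTER TABLE symbol_history
  MODIFY trade_date DATE NOT NULL,
  DROP PRIMARY KEY,
  DROP COLUMN id,
  ADD PRIMARY KEY (stocks_id, trade_date),
  DROP KEY fk_symbol_current_stocks1_idx;
-- The FK on stocks_id survives: the new PK's leftmost column
-- (stocks_id) can serve it once the old secondary index is gone.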



DATETIME is 5 bytes; consider using DATE for delete_after.



Rethink the rest of the indexes. They cost something during INSERT, and they cost disk space.



How big is a python time? If you really need it, can you make it a smallish VARCHAR, not TEXT? There may be issues with TEXT. Please provide the actual SELECT used against History.






answered Aug 28 '17 at 18:56 by Rick James; edited Aug 28 '17 at 22:13

  • That's what I figured. The way the partition function is described in the docs, it seemed that it would only work this way. I had thought that (perhaps) a partition was a separate file (.frm?) in the file system and perhaps it would migrate back and forth. But clearly not. Thank you for your answer, and I am adding your requested info now. – DrDamnit, Aug 28 '17 at 21:35

  • "Transportable tablespaces" lets you turn a Partition into a Table, and vice versa. – Rick James, Aug 28 '17 at 21:54

  • Added Critique of schema -- that should cut the disk space in half and significantly speed up the chart query. – Rick James, Aug 28 '17 at 22:08

  • python_time is a unix timestamp. Not sure why my dev chose text instead of INT. They will be going to SQL re-education camp soon. For the rest, I will make these changes. On this amount of data, I expect a change like this to take upwards of 40 minutes. (Is that slower than you'd expect?) We have plenty of time until the market opens... – DrDamnit, Aug 28 '17 at 22:33

  • Also, the change of datatypes should be lossless (float -> decimal), correct? I have backups, but they take quite a while to restore, and would rather not go through that exercise.... – DrDamnit, Aug 28 '17 at 22:34
