Hmm, I think this is best controlled via CH Proxy and extra infrastructure. I have raised this in the 2021 roadmap discussion, with a chart of the architecture (using a temporary ClickHouse cluster in k8s).
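The proxy-level split suggested above can be pictured with a small sketch. This is plain Python, not CH Proxy's actual configuration or API, and the node addresses are made-up assumptions; it only shows the routing rule: inserts go to write nodes, everything else to read replicas.

```python
# Sketch of read/write splitting at a proxy layer.
# Node addresses and the routing rule are illustrative assumptions,
# not CH Proxy's real configuration format.

WRITE_NODES = ["ch-write-1:9000", "ch-write-2:9000"]               # hypothetical
READ_NODES = ["ch-read-1:9000", "ch-read-2:9000", "ch-read-3:9000"]  # hypothetical

def route(query: str, seq: int) -> str:
    """Send INSERTs to write nodes and everything else to read replicas,
    round-robining within each pool by request sequence number."""
    stmt = query.lstrip().split(None, 1)[0].upper() if query.strip() else ""
    pool = WRITE_NODES if stmt == "INSERT" else READ_NODES
    return pool[seq % len(pool)]
```

With this kind of routing in front of the cluster, query nodes never see insert traffic, which is the stability property the proposal is after.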
As far as I know, some ClickHouse users leverage the MergeTree engine's library or a temporary ClickHouse cluster (in k8s) to generate merge-tree files and move them directly to the local table paths corresponding to the specific distributed table. For example, the QQ Music use case is a proven one (unfortunately, the article is written in Chinese). If users send a large number of records (belonging to many shards) to a single write node, the write node needs to reduce the number of small files and improve the merge speed.

Do you agree with my proposal, or do you have any suggestions?
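The "reduce the number of small files" step above can be sketched in plain Python (this is not ClickHouse's actual part format or merge algorithm, just the idea): a write node collects many small incoming batches and pre-merges them into one sorted part per shard, so each shard receives one large file instead of many tiny ones.

```python
# Illustrative sketch: pre-merge many small insert batches into one
# sorted "part" per shard before shipping. Sharding by key modulo is an
# assumption for the example, not ClickHouse's sharding function.

from collections import defaultdict

NUM_SHARDS = 4  # assumption for the example

def shard_of(key: int) -> int:
    return key % NUM_SHARDS

def premerge(batches):
    """batches: list of lists of (key, value) rows from many small inserts.
    Returns one sorted part per shard -- one merge instead of one tiny
    part per incoming batch."""
    parts = defaultdict(list)
    for batch in batches:
        for key, value in batch:
            parts[shard_of(key)].append((key, value))
    return {shard: sorted(rows) for shard, rows in parts.items()}
```

The design point is that merging is amortized on the write node, so read nodes only ever attach already-merged data.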
At present, when users insert a large amount of records into a MergeTree distributed table, the insertion takes up a large amount of computing resources, which leads to performance decline or even query failures. As a DBA, I want to ensure that ClickHouse's performance is good and stable.

As far as I know, some ClickHouse users/developers leverage the MergeTree engine's library and an external compute engine (such as Spark) to generate MergeTree files, then move them directly to the local table paths corresponding to the specific distributed table. However, this way depends on extra development to reinvent the wheel (and requires a good understanding of ClickHouse internals), and it is not user friendly.

If ClickHouse could support a built-in read-write separation function, all insert operations would be carried out on the write nodes, which would compact/merge the input records; the merged files would then be distributed to the corresponding read nodes by shard. This would reduce the impact of inserts on query nodes and ensure stable query performance.

From my understanding, using a dedicated replica to implement the separation/splitting of reads and writes does work, but this approach is not flexible and may lead to wasted resources. For example, if I create a ClickHouse cluster with 10 shards and 2 replicas per shard (on 20 physical nodes), I will have 10 ClickHouse nodes dedicated to writes. So my question is: does the community have a plan to support a built-in read-write separation function?
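The dedicated-replica layout described above (10 shards, 2 replicas per shard, 20 nodes, one replica per shard reserved for writes) can be sketched as follows. The host names and the "replica 0 writes" convention are assumptions for illustration, not a ClickHouse configuration.

```python
# Sketch of the topology from the question: 10 shards x 2 replicas on
# 20 nodes, with replica 0 of each shard reserved for inserts.
# Host names are illustrative assumptions.

NUM_SHARDS = 10
REPLICAS_PER_SHARD = 2

def build_cluster():
    """Map each shard to its list of replica hosts."""
    return {
        shard: [f"node-{shard * REPLICAS_PER_SHARD + r}"
                for r in range(REPLICAS_PER_SHARD)]
        for shard in range(NUM_SHARDS)
    }

def write_nodes(cluster):
    """With dedicated read-write splitting, one replica per shard
    (here replica 0) handles all inserts -- 10 of the 20 nodes."""
    return [replicas[0] for replicas in cluster.values()]
```

This makes the resource-waste concern concrete: half of the cluster's nodes are pinned to the write role regardless of the actual insert load.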