A ClickHouse High-Availability Cluster Deployment Solution
GitHub: https://github.com/mapcoding-cn/clickhouse
An out-of-the-box, containerized deployment of a ClickHouse high-availability cluster.

clickhouse-server
Supports multiple shards and multiple replicas, with ZooKeeper keeping the replicas consistent for a highly available cluster, and fixes several issues with Docker-based deployment. The example is a 6-node layout with 3 shards and 2 replicas; for a real deployment, change the Docker IPs and the ZooKeeper addresses to match your environment. The resulting topology can be sanity-checked as sketched below.
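Once the containers are up, the cluster layout that clickhouse-server actually resolved can be inspected from any node. A minimal sketch, assuming the cluster name map from the example config (both queries use standard system tables):
-- Shard/replica layout as seen by clickhouse-server
SELECT cluster, shard_num, replica_num, host_name, host_address, port
FROM system.clusters
WHERE cluster = 'map';
-- Each node's {shard} and {replica} macros, which the ReplicatedMergeTree paths below depend on
SELECT * FROM system.macros;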
zookeeper-server
A three-node ZooKeeper high-availability ensemble; ClickHouse uses it to register cluster configuration and to keep replicas consistent. Connectivity is easiest to verify from ClickHouse itself, as sketched below.
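A minimal check: the first query fails if the ZooKeeper connection is broken, and the second flags read-only or lagging replicas once replicated tables exist.
-- Browse ZooKeeper through ClickHouse; errors out if the connection is down
SELECT name FROM system.zookeeper WHERE path = '/';
-- Replica health of all ReplicatedMergeTree tables on this node
SELECT database, table, is_readonly, absolute_delay, zookeeper_path
FROM system.replicas;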
clickhouse-tabix
A SQL editor with syntax highlighting and code completion that can be deployed as a web front end for ClickHouse. This build includes a few optimizations; it still has plenty of beta-quality features, but it is good enough for everyday use.
--Create the database
CREATE DATABASE mapcoding ON CLUSTER map;
--Create the local replicated table
CREATE TABLE mapcoding.uk_price_paid ON CLUSTER map
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
is_new UInt8,
duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/map/mapcoding/{shard}/uk_price_paid', '{replica}')
ORDER BY (postcode1, postcode2, addr1, addr2);
--Create the distributed table
CREATE TABLE mapcoding.uk_price_paid_distributed ON CLUSTER map AS mapcoding.uk_price_paid
ENGINE = Distributed('map', 'mapcoding', 'uk_price_paid', rand());
--Insert test data
INSERT INTO mapcoding.uk_price_paid_distributed
WITH
splitByChar(' ', postcode) AS p
SELECT
toUInt32(price_string) AS price,
parseDateTimeBestEffortUS(time) AS date,
p[1] AS postcode1,
p[2] AS postcode2,
transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
b = 'Y' AS is_new,
transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
addr1,
addr2,
street,
locality,
town,
district,
county
FROM url(
'http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
'CSV',
'uuid_string String,
price_string String,
time String,
postcode String,
a String,
b String,
c String,
addr1 String,
addr2 String,
street String,
locality String,
town String,
district String,
county String,
d String,
e String'
) SETTINGS max_http_get_redirects=10;
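After the load completes, it is worth confirming that rows really landed on every shard instead of piling up on one node. A rough verification sketch (the cluster table function reads one replica per shard):
-- Total rows visible through the distributed table
SELECT count() FROM mapcoding.uk_price_paid_distributed;
-- Per-shard breakdown; with rand() sharding each host should hold roughly a third of the rows
SELECT hostName() AS host, count() AS rows
FROM cluster('map', mapcoding.uk_price_paid)
GROUP BY host;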
Recommended machine configuration
- Disk storage: the 3.1-billion-row GitHub Events dataset takes about 200 GB of disk in ClickHouse, while the same data occupied roughly 1 TB in MySQL, so ClickHouse compresses about 5x better than MySQL here. Extrapolating, a single node at the ten-billion-row scale should have at least 1 TB of disk, and replicas should be used so data is not lost.
- CPU and memory: strongly workload-dependent. Recommended minimum is 4 cores / 8 GB per node, with extra shards added to scale performance. ClickHouse adapts its resource usage to the machine, but on a 2-core / 4 GB node, queries over the 3.1-billion-row dataset hit memory-limit errors. The sketch after this list shows how to check the on-disk compression ratio and how to cap per-query memory.
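Both points are easy to check on a running cluster. A sketch using the uk_price_paid tables created above; the memory values are illustrative, not tuned recommendations:
-- Actual on-disk compression ratio per table
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS on_disk,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND database = 'mapcoding'
GROUP BY table;
-- On small machines, cap per-query memory and let large GROUP BYs spill to disk
-- instead of failing with a memory-limit error
SELECT town, round(avg(price)) AS avg_price
FROM mapcoding.uk_price_paid_distributed
GROUP BY town
SETTINGS max_memory_usage = 3000000000,
         max_bytes_before_external_group_by = 1500000000;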
Appendix: performance test report
Test script: clickhouse-benchmark -h 172.17.1.1 --user=map --password=clic -c 100 -i 1000 -r < log.txt
Query performance
SELECT count(*) from mapcoding.github_events_distributed where event_type='IssuesEvent';
--group
SELECT actor_login,count(),uniq(repo_name) AS repos,uniq(repo_name, number) AS prs, replaceRegexpAll(substringUTF8(anyHeavy(body), 1, 100), '[\r\n]', ' ') AS comment FROM mapcoding.github_events_distributed WHERE (event_type = 'PullRequestReviewCommentEvent') AND (action = 'created') GROUP BY actor_login ORDER BY count() DESC LIMIT 50
--join
select count(*) from (SELECT repo_name from mapcoding.github_events_distributed where event_type='IssuesEvent' limit 10000000) A left join (SELECT repo_name from mapcoding.github_events_distributed where event_type='IssueCommentEvent' limit 10000000) B ON A.repo_name=B.repo_name ;
--compound query
SELECT actor_login, COUNT(*) FROM mapcoding.github_events_distributed WHERE event_type='IssuesEvent' GROUP BY actor_login HAVING COUNT(*) > 10 ORDER BY count(*) DESC LIMIT 100
--primary-key query
select actor_login from mapcoding.github_events_distributed where repo_name like 'elastic%' limit 100
--secondary (data-skipping) index; the choice of primary key and index granularity has a critical impact on query performance
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test ADD INDEX actor_login_index actor_login TYPE set(0) GRANULARITY 2;
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test MATERIALIZE INDEX actor_login_index;
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test DROP INDEX actor_login_index;
SELECT count(*) FROM mapcoding.github_events_distributed WHERE actor_login='frank';
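Whether the primary key or the skipping index is actually pruning granules can be confirmed with EXPLAIN and index information enabled (supported in recent ClickHouse releases; run it against the local table, where the index lives):
-- Shows how many parts/granules the primary key and actor_login_index leave to scan
EXPLAIN indexes = 1
SELECT count(*) FROM mapcoding.github_events WHERE actor_login = 'frank';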
- With query concurrency limited to 100, queries that hit the primary key have a p90 latency of about 2 s.
- Analysis time on large data volumes depends heavily on the indexes; a full-table scan over the 3.1-billion-row dataset takes roughly 10-20 s. (A sketch for reproducing these figures from system.query_log follows this list.)
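The latency percentiles can be cross-checked against system.query_log on the node that served the benchmark (query logging is on by default); a rough sketch:
-- p50 / p90 / p99 latency of completed SELECTs, in milliseconds
SELECT quantiles(0.5, 0.9, 0.99)(query_duration_ms) AS latency_ms
FROM system.query_log
WHERE type = 'QueryFinish' AND query ILIKE 'SELECT%';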
Write performance
- Bulk-inserting the 3.1 billion test rows took about 3 hours, i.e. 200k-300k rows per second (see the sketch after this list).
- Single-row insert performance was not tested and is expected to be poor.
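The bulk-insert throughput can be read back from the same log, since system.query_log records written_rows for each INSERT; a sketch:
-- Rows written per minute by INSERT queries; divide by 60 for rows per second
SELECT
    toStartOfMinute(event_time) AS minute,
    sum(written_rows) AS rows,
    round(sum(written_rows) / 60) AS rows_per_second
FROM system.query_log
WHERE type = 'QueryFinish' AND query ILIKE 'INSERT%'
GROUP BY minute
ORDER BY minute;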
