A ClickHouse High-Availability Cluster Deployment Solution
GitHub: https://github.com/mapcoding-cn/clickhouse
An out-of-the-box, containerized deployment of a ClickHouse high-availability cluster.

clickhouse-server
Supports multiple shards and multiple replicas, with ZooKeeper keeping the replicas consistent for a highly available cluster, and fixes several issues with Docker-based deployment. The example is a 6-node layout with 3 shards and 2 replicas; for a real deployment, change the Docker IPs and the ZooKeeper addresses to match your environment. The resulting topology can be sanity-checked as sketched below.
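Once the containers are up, the cluster layout that clickhouse-server actually resolved can be inspected from any node. A minimal sketch, assuming the cluster name map from the example config (both queries use standard system tables):
-- Shard/replica layout as seen by clickhouse-server
SELECT cluster, shard_num, replica_num, host_name, host_address, port
FROM system.clusters
WHERE cluster = 'map';
-- Each node's {shard} and {replica} macros, which the ReplicatedMergeTree paths below depend on
SELECT * FROM system.macros;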
zookeeper-server
A three-node ZooKeeper high-availability ensemble; ClickHouse uses it to register cluster configuration and to keep replicas consistent. Connectivity is easiest to verify from ClickHouse itself, as sketched below.
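A minimal check: the first query fails if the ZooKeeper connection is broken, and the second flags read-only or lagging replicas once replicated tables exist.
-- Browse ZooKeeper through ClickHouse; errors out if the connection is down
SELECT name FROM system.zookeeper WHERE path = '/';
-- Replica health of all ReplicatedMergeTree tables on this node
SELECT database, table, is_readonly, absolute_delay, zookeeper_path
FROM system.replicas;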
clickhouse-tabix
A SQL editor with syntax highlighting and code completion that can be deployed as a web front end for ClickHouse. This build includes a few optimizations; it still has plenty of beta-quality features, but it is good enough for everyday use.
--Create the database
CREATE DATABASE mapcoding ON CLUSTER map;
--Create the local replicated table
CREATE TABLE mapcoding.uk_price_paid ON CLUSTER map
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
is_new UInt8,
duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/map/mapcoding/{shard}/uk_price_paid', '{replica}')
ORDER BY (postcode1, postcode2, addr1, addr2);
--Create the distributed table
CREATE TABLE mapcoding.uk_price_paid_distributed ON CLUSTER map AS mapcoding.uk_price_paid
ENGINE = Distributed('map', 'mapcoding', 'uk_price_paid', rand());
--Insert test data
INSERT INTO mapcoding.uk_price_paid_distributed
WITH
splitByChar(' ', postcode) AS p
SELECT
toUInt32(price_string) AS price,
parseDateTimeBestEffortUS(time) AS date,
p[1] AS postcode1,
p[2] AS postcode2,
transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
b = 'Y' AS is_new,
transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
addr1,
addr2,
street,
locality,
town,
district,
county
FROM url(
'http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv',
'CSV',
'uuid_string String,
price_string String,
time String,
postcode String,
a String,
b String,
c String,
addr1 String,
addr2 String,
street String,
locality String,
town String,
district String,
county String,
d String,
e String'
) SETTINGS max_http_get_redirects=10;
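After the load completes, it is worth confirming that rows really landed on every shard instead of piling up on one node. A rough verification sketch (the cluster table function reads one replica per shard):
-- Total rows visible through the distributed table
SELECT count() FROM mapcoding.uk_price_paid_distributed;
-- Per-shard breakdown; with rand() sharding each host should hold roughly a third of the rows
SELECT hostName() AS host, count() AS rows
FROM cluster('map', mapcoding.uk_price_paid)
GROUP BY host;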
Recommended machine configuration
- Disk storage: the 3.1-billion-row GitHub Events dataset takes about 200 GB of disk in ClickHouse, while the same data occupied roughly 1 TB in MySQL, so ClickHouse compresses about 5x better than MySQL here. Extrapolating, a single node at the ten-billion-row scale should have at least 1 TB of disk, and replicas should be used so data is not lost.
- CPU and memory: strongly workload-dependent. Recommended minimum is 4 cores / 8 GB per node, with extra shards added to scale performance. ClickHouse adapts its resource usage to the machine, but on a 2-core / 4 GB node, queries over the 3.1-billion-row dataset hit memory-limit errors. The sketch after this list shows how to check the on-disk compression ratio and how to cap per-query memory.
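Both points are easy to check on a running cluster. A sketch using the uk_price_paid tables created above; the memory values are illustrative, not tuned recommendations:
-- Actual on-disk compression ratio per table
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS on_disk,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND database = 'mapcoding'
GROUP BY table;
-- On small machines, cap per-query memory and let large GROUP BYs spill to disk
-- instead of failing with a memory-limit error
SELECT town, round(avg(price)) AS avg_price
FROM mapcoding.uk_price_paid_distributed
GROUP BY town
SETTINGS max_memory_usage = 3000000000,
         max_bytes_before_external_group_by = 1500000000;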
Appendix: performance test report
Test script: clickhouse-benchmark -h 172.17.1.1 --user=map --password=clic -c 100 -i 1000 -r < log.txt
Query performance
SELECT count(*) from mapcoding.github_events_distributed where event_type='IssuesEvent';
--group
SELECT actor_login,count(),uniq(repo_name) AS repos,uniq(repo_name, number) AS prs, replaceRegexpAll(substringUTF8(anyHeavy(body), 1, 100), '[\r\n]', ' ') AS comment FROM mapcoding.github_events_distributed WHERE (event_type = 'PullRequestReviewCommentEvent') AND (action = 'created') GROUP BY actor_login ORDER BY count() DESC LIMIT 50
--join
select count(*) from (SELECT repo_name from mapcoding.github_events_distributed where event_type='IssuesEvent' limit 10000000) A left join (SELECT repo_name from mapcoding.github_events_distributed where event_type='IssueCommentEvent' limit 10000000) B ON A.repo_name=B.repo_name ;
--compound query
SELECT actor_login, COUNT(*) FROM mapcoding.github_events_distributed WHERE event_type='IssuesEvent' GROUP BY actor_login HAVING COUNT(*) > 10 ORDER BY count(*) DESC LIMIT 100
--primary-key query
select actor_login from mapcoding.github_events_distributed where repo_name like 'elastic%' limit 100
--secondary (data-skipping) index; the choice of primary key and index granularity has a critical impact on query performance
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test ADD INDEX actor_login_index actor_login TYPE set(0) GRANULARITY 2;
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test MATERIALIZE INDEX actor_login_index;
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test DROP INDEX actor_login_index;
SELECT count(*) FROM mapcoding.github_events_distributed WHERE actor_login='frank';
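Whether the primary key or the skipping index is actually pruning granules can be confirmed with EXPLAIN and index information enabled (supported in recent ClickHouse releases; run it against the local table, where the index lives):
-- Shows how many parts/granules the primary key and actor_login_index leave to scan
EXPLAIN indexes = 1
SELECT count(*) FROM mapcoding.github_events WHERE actor_login = 'frank';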
- With query concurrency limited to 100, queries that hit the primary key have a p90 latency of about 2 s.
- Analysis time on large data volumes depends heavily on the indexes; a full-table scan over the 3.1-billion-row dataset takes roughly 10-20 s. (A sketch for reproducing these figures from system.query_log follows this list.)
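The latency percentiles can be cross-checked against system.query_log on the node that served the benchmark (query logging is on by default); a rough sketch:
-- p50 / p90 / p99 latency of completed SELECTs, in milliseconds
SELECT quantiles(0.5, 0.9, 0.99)(query_duration_ms) AS latency_ms
FROM system.query_log
WHERE type = 'QueryFinish' AND query ILIKE 'SELECT%';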
Write performance
- Bulk-inserting the 3.1 billion test rows took about 3 hours, i.e. 200k-300k rows per second (see the sketch after this list).
- Single-row insert performance was not tested and is expected to be poor.
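The bulk-insert throughput can be read back from the same log, since system.query_log records written_rows for each INSERT; a sketch:
-- Rows written per minute by INSERT queries; divide by 60 for rows per second
SELECT
    toStartOfMinute(event_time) AS minute,
    sum(written_rows) AS rows,
    round(sum(written_rows) / 60) AS rows_per_second
FROM system.query_log
WHERE type = 'QueryFinish' AND query ILIKE 'INSERT%'
GROUP BY minute
ORDER BY minute;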
