<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

          Hive SQL經(jīng)典優(yōu)化案例(建議收藏)

          共 4350字,需瀏覽 9分鐘

           ·

          2020-10-14 18:58

          HiveSQL經(jīng)典優(yōu)化案例一:

          1.1 將要執(zhí)行的查詢(執(zhí)行了 1個(gè)多小時(shí)才出結(jié)果):

          SELECT?dt?as?DATA_DATE,STRATEGY,AB_GROUP,SOURCE,    count(distinct case when lower(event) not like '%push%' and event!='corner_mark_show' then udid else null end) as DAU,    count(case when event='client_show' then 1 else null end) as TOTAL_VSHOW,    count(distinct case when event='client_show' then vid else null end) as TOTAL_VIDEO_VSHOW,    count(case when event='video_play' then 1 else null end) as TOTAL_VV_VP,    count(distinct case when event='video_play' then udid else null end) as TOTAL_USERS_VP,    count(case when event='effective_play' then 1 else null end) as TOTAL_VV_EP,    count(distinct case when event='effective_play' then udid else null end) as TOTAL_USERS_EP,    sum(case when event='video_over' then duration else 0 end) as TOTAL_DURATION,    count(case when event='video_over' then 1 else null end) as TOTAL_VOVER,    sum(case when event='video_over' then play_cnts else 0 end) as TOTAL_VOVER_PCNTS,    count(case when event='push_video_clk' then 1 else null end) as TOTAL_PUSH_VC,    count(distinct case when event='app_start' and body_source = 'push' then udid else null end) as TOTAL_PUSH_START,    count(case when event='post_comment' then 1 else null end) as TOTAL_REPLY,    count(distinct case when event='post_comment' then udid else null end) as TOTAL_USERS_REPLY    FROM dwb_v8sp_tmp.base_report_bystrategy_byab_source_column_zklgroup?by?dt,strategy,ab_group,source;

          1.2 查詢語句涉及到的表有 7.7億+ 數(shù)據(jù)。(查詢?nèi)缦?

          jdbc:hive2://ks-hdp-master-01.dns.rightpad (default)> select count(*) from dwb_v8sp_tmp.base_report_bystrategy_byab_source_column_zkl;

          1.3?優(yōu)化思路:既然將要執(zhí)行的查詢是按照 dt, strategy, ab_group, source 這4個(gè)字段分組, 那么在建表的時(shí)候,就按這四個(gè)字段中的N個(gè)(1 或 2 或 3 或4)個(gè)字段組合分區(qū),直接讓 count(distinct xx) 之類的查詢定位到“更少的數(shù)據(jù)子集”,其執(zhí)行效率就應(yīng)該更高了(不需要每個(gè)子任務(wù)均從 7.7億+ 的數(shù)據(jù)中(去重)統(tǒng)計(jì))。


          1.4 先看每個(gè)字段將會(huì)有多少分區(qū)(因?yàn)?Hive 表分區(qū)也不宜過多,一般一個(gè)查詢語句涉及到的 hive分區(qū) 應(yīng)該控制在2K內(nèi))

          jdbc:hive2://ks-hdp-master-01.dns.rightpad (default)> select?count(distinct?dt)?as?dis_dt,?count(distinct?strategy)?as?dis_strategy,?count(distinct?ab_group)?as?dis_ab_group,?count(distinct?source)?as?dis_sourcefrom dwb_v8sp_tmp.base_report_bystrategy_byab_source_column_zkl;


          [hue@ks-hdp-client-v02 10:55:08 /usr/local/hue]$ pythonPython 2.7.12 (default, Dec 4 2017, 14:50:18)[GCC 5.4.0 20160609] on linux2Type "help", "copyright", "credits" or "license" for more information.>>> 2*14*722016-- 2016 個(gè)分區(qū)還可以接受。

          1.5 根據(jù)原表,新建分區(qū)表,并將原表數(shù)據(jù)插入新表:

          show create table dwb_v8sp_tmp.base_report_bystrategy_byab_source_column_zkl;
          jdbc:hive2://ks-hdp-master-01.dns.rightpad (default)> show create table dwb_v8sp_tmp.base_report_bystrategy_byab_source_column_zkl;

          創(chuàng)建新表:按 dt,source,stragegy,ab_group 分區(qū)(注意先后順序,一般習(xí)慣分區(qū)數(shù)越少的越靠前,根據(jù)1.5的查詢可知:dt=1,source=2,strategy=14,ab_group=72)

          create external table `dwb_v8sp_tmp.base_report_bystrategy_byab_source_column_lym`(  event string,  udid string,  vid string,  duration string,  body_source string,  play_cnts string)PARTITIONED BY (  dt string,  source string,  strategy string,  ab_group string);

          將原表數(shù)據(jù)插入新表:

          insert into `dwb_v8sp_tmp.base_report_bystrategy_byab_source_column_lym` partition(dt,source,strategy,ab_group)select event,udid,vid,duration,body_source,play_cnts,dt,source,strategy,ab_groupfrom `dwb_v8sp_tmp.base_report_bystrategy_byab_source_column_zkl`;

          核對兩表的數(shù)據(jù)是否一致:

          1.6 基于新表執(zhí)行查詢(執(zhí)行5分鐘出結(jié)果):


          HiveSQL經(jīng)典優(yōu)化案例二:

          問題描述:一個(gè)復(fù)雜的SQL,查詢執(zhí)行一段時(shí)間后報(bào)錯(cuò):基本上是查不出來;?

          分析函數(shù)對于大表來說不是 hive的強(qiáng)項(xiàng),這個(gè)時(shí)候我們將其分解成很多子集,并且合理利用 hive 分區(qū)表的優(yōu)勢,然后去 join 。


          2.1 將要執(zhí)行的查詢

          create?table?bi_tmp.aloha_UserLoyalty_190301_190303?as?    select aid, imei, idfa, udid, event, duration, dt, time_local, hour, source,         first_value(time_local) over(partition by udid, event order by time_local) as first_time,        last_value(time_local) over(partition by udid, event order by time_local) as last_time,        count(time_local) over(partition by udid, event, dt) as event_count_per_day,        sum(duration) over(partition by udid, event, dt) as event_duration_each_day    from dwb_v8sp.event_column_info_new_hour    where event in ('app_start', 'app_exit', 'effective_play', 'share_succ', 'like', 'unlike', 'like_comment', 'unlike_comment',         'comment_success')????????and?dt?>=?'2019-03-01'?and?dt?<=?'2019-03-03';
          select count(*) from dwb_v8sp.event_column_info_new_hourwhere event in ('app_start', 'app_exit', 'effective_play', 'share_succ', 'like', 'unlike', 'like_comment', 'unlike_comment', 'comment_success')and dt >= '2019-03-01' and dt <= '2019-03-03';

          select count(distinct event) as dis_eventfrom dwb_v8sp.event_column_info_new_hourwhere event in ('app_start', 'app_exit', 'effective_play', 'share_succ', 'like', 'unlike', 'like_comment', 'unlike_comment', 'comment_success')and dt >= '2019-03-01' and dt <= '2019-03-03';

          分解成三個(gè)子集,并保存到三張表:? bi_tmp.zyt1, bi_tmp.zyt2, bi_tmp.zyt3

          -- drop table if exists bi_tmp.zyt1;create table bi_tmp.zyt1 partitioned by(event)asselect udid,        min(time_local) as first_time,       max(time_local) as last_time,       eventfrom dwb_v8sp.event_column_info_new_hourwhere event in ('app_start', 'app_exit', 'effective_play', 'share_succ', 'like', 'unlike', 'like_comment', 'unlike_comment', 'comment_success')and dt >= '2019-03-01' and dt <= '2019-03-03'group by udid, event;
          -- drop table if exists bi_tmp.zyt2 purge;create table bi_tmp.zyt2 partitioned by(dt,event)asselect udid, count(time_local) as event_count_per_day, sum(duration) as event_duration_each_day, dt, eventfrom dwb_v8sp.event_column_info_new_hourwhere event in ('app_start', 'app_exit', 'effective_play', 'share_succ', 'like', 'unlike', 'like_comment', 'unlike_comment', 'comment_success')and dt >= '2019-03-01' and dt <= '2019-03-03'group by udid, dt, event;
          create table bi_tmp.zyt3 partitioned by(dt,event)as select aid, imei, idfa, udid, duration, time_local, hour, source, dt, eventfrom dwb_v8sp.event_column_info_new_hour t3 where event in ('app_start', 'app_exit', 'effective_play', 'share_succ', 'like', 'unlike', 'like_comment', 'unlike_comment', 'comment_success') and dt >= '2019-03-01' and dt <= '2019-03-03';
          -- 插入目標(biāo)表:create table bi_tmp.aloha_UserLoyalty_190301_190303 as select t3.aid, t3.imei, t3.idfa, t3.udid, t3.event, t3.duration, t3.dt, t3.time_local, t3.hour, t3.source, t1.first_time, t1.last_time, t2.event_count_per_day, t2.event_duration_each_day from bi_tmp.zyt1 t1 join bi_tmp.zyt2 t2 on t1.event=t2.event and t1.udid=t2.udid join bi_tmp.zyt3 t3 on t2.dt=t3.dt and t2.event= t3.event and t2.udid=t3.udid;
          -- 驗(yàn)證數(shù)據(jù):(與上面的查詢記錄行數(shù)對的上)


          HiveSQL經(jīng)典優(yōu)化案例三:

          如下SQL,用到了 PERCENTILE_APPROX 函數(shù),問題描述:如下SQL,用到了 PERCENTILE_APPROX 函數(shù),個(gè)人初步分析認(rèn)為:由于用到該函數(shù)的次數(shù)太多,導(dǎo)致性能嚴(yán)重下降。?

          我仔細(xì)查了一下該函數(shù),發(fā)現(xiàn):它是支持“數(shù)組傳參”的,那么就不難找到優(yōu)化該SQL的方法了。


          3.1 原SQL性能測試:

          3.2 優(yōu)化后的SQL,性能測試:

          優(yōu)化后的SQL,性能提升了4倍多。


          --end--


          掃描下方二維碼
          添加好友,備注【交流
          可私聊交流,也可進(jìn)資源豐富學(xué)習(xí)群
          瀏覽 105
          點(diǎn)贊
          評論
          收藏
          分享

          手機(jī)掃一掃分享

          分享
          舉報(bào)
          評論
          圖片
          表情
          推薦
          點(diǎn)贊
          評論
          收藏
          分享

          手機(jī)掃一掃分享

          分享
          舉報(bào)
          <kbd id="afajh"><form id="afajh"></form></kbd>
          <strong id="afajh"><dl id="afajh"></dl></strong>
            <del id="afajh"><form id="afajh"></form></del>
                1. <th id="afajh"><progress id="afajh"></progress></th>
                  <b id="afajh"><abbr id="afajh"></abbr></b>
                  <th id="afajh"><progress id="afajh"></progress></th>
                  播播影院在线特别黄色视频看看你 | 黄色A片免费看 | 成人色情做爱操女人视频 | 青青操久操视频 | 欧美最新中文字幕 |