<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

          大廠數(shù)據(jù)分析師SQL試題合集

          共 4029字,需瀏覽 9分鐘

           ·

          2020-08-08 00:41


          作者:adorable_new

          來源:簡(jiǎn)書



          01


          提要




          筆者作為一名互聯(lián)網(wǎng)商業(yè)數(shù)據(jù)分析師,SQL是日常工作中最常用的數(shù)據(jù)提取&簡(jiǎn)單預(yù)處理語言。因?yàn)槠涫褂玫膹V泛性和易學(xué)程度也被其他崗位比如產(chǎn)品經(jīng)理、研發(fā)廣泛學(xué)習(xí)使用,本篇文章主要結(jié)合經(jīng)典面試題,給出通過數(shù)據(jù)分析師面試的SQL方法論。以下題目均來與筆者經(jīng)歷&網(wǎng)上分享的中高難度SQL題。

          02


          解題思路




          • 簡(jiǎn)單——會(huì)考察一些group by & limit之類的用法,或者平時(shí)用的不多的函數(shù)比如rand()類;會(huì)涉及到一些表之間的關(guān)聯(lián)

          • 中等——會(huì)考察一些窗口函數(shù)的基本用法;會(huì)有表之間的關(guān)聯(lián),相對(duì)tricky的地方在于會(huì)有一些自關(guān)聯(lián)的使用

          • 困難——會(huì)有中位數(shù)或者更加復(fù)雜的取數(shù)概念,可能要求按照某特定要求生成列;一般這種題建中間表會(huì)解得清晰些


          03


          SQL真題





          第一題



          • order訂單表,字段為:goods_id, amount ;

          • pv 瀏覽表,字段為:goods_id,uid;

          • goods按照總銷售金額排序,分成top10,top10~top20,其他三組


          求每組商品的瀏覽用戶數(shù)(同組內(nèi)同一用戶只能算一次)

          create table if not exists test.nil_goods_category as select goods_id,case when nn<= 10 then 'top10' when nn<= 20 then 'top10~top20' else 'other' end as goods_groupfrom( select goods_id ,row_number() over(partition by goods_id order by sale_sum desc) as nn from ( select goods_id,sum(amount) as sale_sum from order group by 1 ) aa) bb;select b.goods_group,count(distinct a.uid) as numfrom pv a left join test.nil_goods_category b on a.goods_id = b.goods_idgroup by 1;


          第二題



          商品活動(dòng)表 goods_event,g_id(有可能重復(fù)),t1(開始時(shí)間),t2(結(jié)束時(shí)間)

          給定時(shí)間段(t3,t4),求在時(shí)間段內(nèi)做活動(dòng)的商品數(shù)

          1.select count(distinct g_id) as event_goods_numfrom goods_eventwhere (t1<=t4 and t1>=t3) or (t2>=t3 and t2<=t4)
          2.select count(distinct g_id) as event_goods_numfrom goods_eventwhere (t1<=t4 and t1>=t3) union all


          第三題



          商品活動(dòng)流水表,表名為event,字段:goods_id, time;

          求參加活動(dòng)次數(shù)最多的商品的最近一次參加活動(dòng)的時(shí)間

          select a.goods_id,a.timefrom event a inner join( select goods_id,count(*) from event group by gooods_id order by count(*) desc limit 1) bon a.goods_id = b.goods_idorder by a.goods_id,a.time desc


          第四題



          用戶登錄的log數(shù)據(jù),劃定session,同一個(gè)用戶一個(gè)小時(shí)之內(nèi)的登錄算一個(gè)session;

          生成session列

          drop table if exists koo.nil_temp0222_a2;create table if not exists koo.nil_temp0222_a2 asselect * ,row_number() over(partition by userid order by inserttime) as nn1from ( select a.* ,b.inserttime as inserttime_aftr ,datediff(b.inserttime,a.inserttime) as session_diff from ( select userid,inserttime ,row_number() over(partition by userid order by inserttime asc) nn from koo.nil_temp0222 where userid = 1900000169 ) a left join ( select userid,inserttime ,row_number() over(partition by userid order by inserttime asc) nn from koo.nil_temp0222 where userid = 1900000169 ) b on a.userid = b.userid and a.nn = b.nn-1) aawhere session_diff >10 or nn = 1order by userid,inserttime;
          drop table if exists koo.nil_temp0222_a2_1;create table if not exists koo.nil_temp0222_a2_1 asselect a.*,case when b.nn is null then a.nn+3 else b.nn end as nn_endfrom koo.nil_temp0222_a2 a left join koo.nil_temp0222_a2 b on a.userid = b.userid and a.nn1 = b.nn1 - 1;
          select a.*,b.nn1 as session_idfrom( select userid,inserttime ,row_number() over(partition by userid order by inserttime asc) nn from koo.nil_temp0222 where userid = 1900000169) aleft join koo.nil_temp0222_a2_1 b on a.userid = b.useridand a.nn>=b.nnand a.nn


          第五題



          訂單表,字段有訂單編號(hào)和時(shí)間;

          取每月最后一天的最后三筆訂單

          select *from ( select * ,rank() over(partition by mm order by dd desc) as nn1 ,row_number() over(partition by mm,dd order by inserttime desc) as nn2 from (select cast(right(to_date(inserttime),2) as int) as dd,month(inserttime) as mm,userid,inserttime from koo.nil_temp0222) aa ) bb where nn1 = 1 and nn2<=3;


          第六題



          數(shù)據(jù)庫表Tourists,記錄了某個(gè)景點(diǎn)7月份每天來訪游客的數(shù)量如下:

          id date visits 1 2017-07-01 100 …… 非常巧,id字段剛好等于日期里面的幾號(hào)。

          現(xiàn)在請(qǐng)篩選出連續(xù)三天都有大于100天的日期。

          上面例子的輸出為:date 2017-07-01 ……

          select a.*,b.num as num2,c.num as num3from table a left join table bon a.userid = b.useridand a.dt = date_add(b.dt,-1)left join table con a.userid = c.useridand a.dt = date_add(c.dt,-2)where b.num>100and a.num>100and c.num>100


          第七題



          現(xiàn)有A表,有21個(gè)列,第一列id,剩余列為特征字段,列名從d1-d20,共10W條數(shù)據(jù)!

          另外一個(gè)表B稱為模式表,和A表結(jié)構(gòu)一樣,共5W條數(shù)據(jù)

          請(qǐng)找到A表中的特征符合B表中模式的數(shù)據(jù),并記錄下相對(duì)應(yīng)的id

          有兩種情況滿足要求:
          • 每個(gè)特征列都完全匹配的情況下

          • 最多有一個(gè)特征列不匹配,其他19個(gè)特征列都完全匹配,但哪個(gè)列不匹配未知


          1.select aa.*from ( select *,concat(d1,d2,d3……d20) as mmd from table) aa left join ( select id,concat(d1,d2,d3……d20) as mmd from table) bb on aa.id = bb.idand aa.mmd = bb.mmd
          2.select a.*,sum(d1_jp,d2_jp……,d20_jp) as same_judgefrom ( select a.* ,case when a.d1 = b.d1 then 1 else 0 end as d1_jp ,case when a.d2 = b.d2 then 1 else 0 end as d2_jp ,case when a.d3 = b.d3 then 1 else 0 end as d3_jp ,case when a.d4 = b.d4 then 1 else 0 end as d4_jp ,case when a.d5 = b.d5 then 1 else 0 end as d5_jp ,case when a.d6 = b.d6 then 1 else 0 end as d6_jp ,case when a.d7 = b.d7 then 1 else 0 end as d7_jp ,case when a.d8 = b.d8 then 1 else 0 end as d8_jp ,case when a.d9 = b.d9 then 1 else 0 end as d9_jp ,case when a.d10 = b.d10 then 1 else 0 end as d10_jp ,case when a.d20 = b.d20 then 1 else 0 end as d20_jp ,case when a.d11 = b.d11 then 1 else 0 end as d11_jp ,case when a.d12 = b.d12 then 1 else 0 end as d12_jp ,case when a.d13 = b.d13 then 1 else 0 end as d13_jp ,case when a.d14 = b.d14 then 1 else 0 end as d14_jp ,case when a.d15 = b.d15 then 1 else 0 end as d15_jp ,case when a.d16 = b.d16 then 1 else 0 end as d16_jp ,case when a.d17 = b.d17 then 1 else 0 end as d17_jp ,case when a.d18 = b.d18 then 1 else 0 end as d18_jp ,case when a.d19 = b.d19 then 1 else 0 end as d19_jp from table a left join table b on a.id = b.id ) aawhere sum(d1_jp,d2_jp……,d20_jp) = 19


          第八題



          我們把用戶對(duì)商品的評(píng)分用稀疏向量表示,保存在數(shù)據(jù)庫表t里面:
          • t的字段有:uid,goods_id,star。uid是用戶id

          • goodsid是商品id

          • star是用戶對(duì)該商品的評(píng)分,值為1-5


          現(xiàn)在我們想要計(jì)算向量?jī)蓛芍g的內(nèi)積,內(nèi)積在這里的語義為:

          對(duì)于兩個(gè)不同的用戶,如果他們都對(duì)同樣的一批商品打了分,那么對(duì)于這里面的每個(gè)人的分?jǐn)?shù)乘起來,并對(duì)這些乘積求和。

          例子,數(shù)據(jù)庫表里有以下的數(shù)據(jù):
          U0 g0 2
          U0 g1 4
          U1 g0 3
          U1 g1 1

          計(jì)算后的結(jié)果為:
          U0 U1 23+41=10 ……

          select aa.uid1,aa.uid2,sum(star_multi) as resultfrom ( select a.uid as uid1 ,b.uid as uid2 ,a.goods_id ,a.star * b.star as star_multi from t a left join t b on a.goods_id = b.goods_id and a.udi<>b.uid ) aa group by 1,2

          select uid1,uid2,sum(multiply) as resultfrom(select t.uid as uid1, t.uid as uid2, goods_id,a.star*star as multiplyfrom a left join b on a.goods_id = goods_idand a.uid<>uid) aagroup by goods


          第九題



          給出一堆數(shù)和頻數(shù)的表格,統(tǒng)計(jì)這一堆數(shù)中位數(shù)

          select a.*,b.s_mid_n,c.l_mid_n,avg(b.s_mid_n,c.l_mid_n)from ( select case when mod(count(*),2) = 0 then count(*)/2 else (count(*)+1)/2 end as s_mid ,case when mod(count(*),2) = 0 then count(*)/2+1 else (count(*)+1)/2 end as l_mid from table ) a left join ( select id,num,row_number() over(partition by id order by num asc) nn from table) b on a.s_mid = b.nnleft join ( select id,num,row_number() over(partition by id order by num asc) nn from table) c on a.l_mid = c.nn


          第十題



          表order有三個(gè)字段,店鋪ID,訂單時(shí)間,訂單金額

          查詢一個(gè)月內(nèi)每周都有銷量的店鋪

          select distinct credit_levelfrom ( select credit_level,count(distinct nn) as number from ( select userid,credit_level,inserttime,month(inserttime) as mm ,weekofyear(inserttime) as week ,dense_rank() over(partition by credit_level,month(inserttime) order by weekofyear(inserttime) asc) as nn from koo.nil_temp0222 where substring(inserttime,1,7) = '2019-12' order by credit_level ,inserttime ) aa group by 1 ) bbwhere number = (select count(distinct weekofyear(inserttime))from koo.nil_temp0222 where substring(inserttime,1,7) = '2019-12')
          瀏覽 41
          點(diǎn)贊
          評(píng)論
          收藏
          分享

          手機(jī)掃一掃分享

          分享
          舉報(bào)
          評(píng)論
          圖片
          表情
          推薦
          點(diǎn)贊
          評(píng)論
          收藏
          分享

          手機(jī)掃一掃分享

          分享
          舉報(bào)
          <kbd id="afajh"><form id="afajh"></form></kbd>
          <strong id="afajh"><dl id="afajh"></dl></strong>
            <del id="afajh"><form id="afajh"></form></del>
                1. <th id="afajh"><progress id="afajh"></progress></th>
                  <b id="afajh"><abbr id="afajh"></abbr></b>
                  <th id="afajh"><progress id="afajh"></progress></th>
                  IPX-811桃乃木かな无码破解 | 国产激情国产 | 大香蕉99热 | 午夜无码鲁丝片午夜精品一区二区 | 国产婷婷综合视频网站 |