<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

          數(shù)據(jù)分析師SQL試題合集

          共 4029字,需瀏覽 9分鐘

           ·

          2020-08-01 00:39

          點擊上方數(shù)據(jù)管道”,選擇“置頂星標”公眾號

          干貨福利,第一時間送達

          整理來源自:SQL數(shù)據(jù)庫開發(fā)

          原作者:adorable_new 來自簡書


          提要

          筆者作為一名互聯(lián)網(wǎng)商業(yè)數(shù)據(jù)分析師,SQL是日常工作中最常用的數(shù)據(jù)提取&簡單預處理語言。因為其使用的廣泛性和易學程度也被其他崗位比如產(chǎn)品經(jīng)理、研發(fā)廣泛學習使用,本篇文章主要結(jié)合經(jīng)典面試題,給出通過數(shù)據(jù)分析師面試的SQL方法論。以下題目均來與筆者經(jīng)歷&網(wǎng)上分享的中高難度SQL題。


          解題思路

          • 簡單——會考察一些group by & limit之類的用法,或者平時用的不多的函數(shù)比如rand()類;會涉及到一些表之間的關(guān)聯(lián)
          • 中等——會考察一些窗口函數(shù)的基本用法;會有表之間的關(guān)聯(lián),相對tricky的地方在于會有一些自關(guān)聯(lián)的使用
          • 困難——會有中位數(shù)或者更加復雜的取數(shù)概念,可能要求按照某特定要求生成列;一般這種題建中間表會解得清晰些


          SQL真題
          第一題
          • order訂單表,字段為:goods_id, amount ;
          • pv 瀏覽表,字段為:goods_id,uid;
          • goods按照總銷售金額排序,分成top10,top10~top20,其他三組

          求每組商品的瀏覽用戶數(shù)(同組內(nèi)同一用戶只能算一次)


          create?table?if?not?exists?test.nil_goods_category as?
          select?goods_id
          ,case?when?nn<= 10?then?'top10'
          when?nn<= 20?then?'top10~top20'
          else?'other'?end?as?goods_group
          from
          (
          select?goods_id
          ,row_number() over(partition?by?goods_id order?by?sale_sum desc) as?nn
          from
          ????(
          select?goods_id,sum(amount) as?sale_sum
          from?order?
          group?by?1
          ????) aa
          ) bb;
          select?b.goods_group,count(distinct?a.uid) as?num
          from?pv a
          left?join?test.nil_goods_category b
          on?a.goods_id = b.goods_id
          group?by?1;

          (提示:可以左右滑動代碼)


          第二題
          商品活動表 goods_event,g_id(有可能重復),t1(開始時間),t2(結(jié)束時間)
          給定時間段(t3,t4),求在時間段內(nèi)做活動的商品數(shù)


          1.
          select?count(distinct?g_id) as?event_goods_num
          from?goods_event
          where?(t1<=t4 and?t1>=t3)
          or?(t2>=t3 and?t2<=t4)
          2.
          select?count(distinct?g_id) as?event_goods_num
          from?goods_event
          where?(t1<=t4 and?t1>=t3)
          union?all



          第三題
          商品活動流水表,表名為event,字段:goods_id, time;
          求參加活動次數(shù)最多的商品的最近一次參加活動的時間

          select?a.goods_id,a.time
          from?event?a
          inner?join
          (
          select?goods_id,count(*)
          from?event
          group?by?gooods_id
          order?by?count(*) desc
          limit?1
          ) b
          on?a.goods_id = b.goods_id
          order?by?a.goods_id,a.time desc



          第四題
          用戶登錄的log數(shù)據(jù),劃定session,同一個用戶一個小時之內(nèi)的登錄算一個session;
          生成session列

          drop?table?if?exists?koo.nil_temp0222_a2;
          create?table?if?not?exists?koo.nil_temp0222_a2 as
          select?*
          ????,row_number() over(partition?by?userid order?by?inserttime) as?nn1
          from?
          (
          select?a.*
          ????,b.inserttime as?inserttime_aftr
          ????,datediff(b.inserttime,a.inserttime) as?session_diff
          from
          ??(
          select?userid,inserttime
          ??????,row_number() over(partition?by?userid order?by?inserttime asc) nn
          from?koo.nil_temp0222
          where?userid = 1900000169
          ??) a
          left?join?
          ??(
          select?userid,inserttime
          ??????,row_number() over(partition?by?userid order?by?inserttime asc) nn
          from?koo.nil_temp0222
          where?userid = 1900000169?
          ??) b
          on?a.userid = b.userid and?a.nn = b.nn-1
          ) aa
          where?session_diff >10?or?nn = 1
          order?by?userid,inserttime;
          drop?table?if?exists?koo.nil_temp0222_a2_1;
          create?table?if?not?exists?koo.nil_temp0222_a2_1 as
          select?a.*
          ,case?when?b.nn is?null?then?a.nn+3?else?b.nn end?as?nn_end
          from?koo.nil_temp0222_a2 a
          left?join?koo.nil_temp0222_a2 b
          on?a.userid = b.userid
          and?a.nn1 = b.nn1 - 1;
          select?a.*,b.nn1 as?session_id
          from
          (
          select?userid,inserttime
          ????,row_number() over(partition?by?userid order?by?inserttime asc) nn
          from?koo.nil_temp0222
          where?userid = 1900000169
          ) a
          left?join?koo.nil_temp0222_a2_1 b
          on?a.userid = b.userid
          and?a.nn>=b.nn
          and?a.nn



          第五題
          訂單表,字段有訂單編號和時間;
          取每月最后一天的最后三筆訂單

          select?*
          from?
          (
          select?*
          ??,rank() over(partition?by?mm order?by?dd desc) as?nn1
          ??,row_number() over(partition?by?mm,dd order?by?inserttime desc) as?nn2
          from
          ??(
          select?
          cast(right(to_date(inserttime),2) as?int) as?dd,
          month(inserttime) as?mm,userid,inserttime
          from?koo.nil_temp0222
          ) aa
          ) bb
          where?nn1 = 1?and?nn2<=3;



          第六題
          數(shù)據(jù)庫表Tourists,記錄了某個景點7月份每天來訪游客的數(shù)量如下:
          id date visits 1 2017-07-01 100 …… 非常巧,id字段剛好等于日期里面的幾號。
          現(xiàn)在請篩選出連續(xù)三天都有大于100天的日期。
          上面例子的輸出為:date 2017-07-01 ……

          select?a.*,b.num as?num2,c.num as?num3
          from?table??a
          left?join?table?b
          on?a.userid = b.userid
          and?a.dt = date_add(b.dt,-1)
          left?join?table?c
          on?a.userid = c.userid
          and?a.dt = date_add(c.dt,-2)
          where?b.num>100
          and?a.num>100
          and?c.num>100



          第七題
          現(xiàn)有A表,有21個列,第一列id,剩余列為特征字段,列名從d1-d20,共10W條數(shù)據(jù)!
          另外一個表B稱為模式表,和A表結(jié)構(gòu)一樣,共5W條數(shù)據(jù)
          請找到A表中的特征符合B表中模式的數(shù)據(jù),并記錄下相對應的id
          有兩種情況滿足要求:
          • 每個特征列都完全匹配的情況下
          • 最多有一個特征列不匹配,其他19個特征列都完全匹配,但哪個列不匹配未知

          1.
          select?aa.*
          from?
          (
          select?*,concat(d1,d2,d3……d20) as?mmd
          from?table
          ) aa
          left?join?
          (
          select?id,concat(d1,d2,d3……d20) as?mmd
          from?table
          ) bb
          on?aa.id = bb.id
          and?aa.mmd = bb.mmd
          2.
          select?a.*,sum(d1_jp,d2_jp……,d20_jp) as?same_judge
          from?
          (
          select?a.*
          ??,case?when?a.d1 = b.d1 then?1?else?0?end?as?d1_jp
          ??,case?when?a.d2 = b.d2 then?1?else?0?end?as?d2_jp
          ??,case?when?a.d3 = b.d3 then?1?else?0?end?as?d3_jp
          ??,case?when?a.d4 = b.d4 then?1?else?0?end?as?d4_jp
          ??,case?when?a.d5 = b.d5 then?1?else?0?end?as?d5_jp
          ??,case?when?a.d6 = b.d6 then?1?else?0?end?as?d6_jp
          ??,case?when?a.d7 = b.d7 then?1?else?0?end?as?d7_jp
          ??,case?when?a.d8 = b.d8 then?1?else?0?end?as?d8_jp
          ??,case?when?a.d9 = b.d9 then?1?else?0?end?as?d9_jp
          ??,case?when?a.d10 = b.d10 then?1?else?0?end?as?d10_jp
          ??,case?when?a.d20 = b.d20 then?1?else?0?end?as?d20_jp
          ??,case?when?a.d11 = b.d11 then?1?else?0?end?as?d11_jp
          ??,case?when?a.d12 = b.d12 then?1?else?0?end?as?d12_jp
          ??,case?when?a.d13 = b.d13 then?1?else?0?end?as?d13_jp
          ??,case?when?a.d14 = b.d14 then?1?else?0?end?as?d14_jp
          ??,case?when?a.d15 = b.d15 then?1?else?0?end?as?d15_jp
          ??,case?when?a.d16 = b.d16 then?1?else?0?end?as?d16_jp
          ??,case?when?a.d17 = b.d17 then?1?else?0?end?as?d17_jp
          ??,case?when?a.d18 = b.d18 then?1?else?0?end?as?d18_jp
          ??,case?when?a.d19 = b.d19 then?1?else?0?end?as?d19_jp
          from?table?a
          left?join?table?b
          on?a.id = b.id
          ) aa
          where?sum(d1_jp,d2_jp……,d20_jp) = 19



          第八題
          我們把用戶對商品的評分用稀疏向量表示,保存在數(shù)據(jù)庫表t里面:
          • t的字段有:uid,goods_id,star。uid是用戶id
          • goodsid是商品id
          • star是用戶對該商品的評分,值為1-5
          現(xiàn)在我們想要計算向量兩兩之間的內(nèi)積,內(nèi)積在這里的語義為:
          對于兩個不同的用戶,如果他們都對同樣的一批商品打了分,那么對于這里面的每個人的分數(shù)乘起來,并對這些乘積求和。
          例子,數(shù)據(jù)庫表里有以下的數(shù)據(jù):
          U0 g0 2
          U0 g1 4
          U1 g0 3
          U1 g1 1
          計算后的結(jié)果為:
          U0 U1 23+41=10 ……

          select?aa.uid1,aa.uid2
          ,sum(star_multi) as?result
          from?
          (
          select?a.uid as?uid1
          ??,b.uid as?uid2
          ??,a.goods_id
          ??,a.star * b.star as?star_multi
          from?t a
          left?join?t b
          on?a.goods_id = b.goods_id
          and?a.udi<>b.uid
          ) aa
          group?by?1,2


          select?uid1,uid2,sum(multiply) as?result
          from
          (select?t.uid as?uid1, t.uid as?uid2, goods_id,a.star*star as?multiply
          from?a left?join?b
          on?a.goods_id = goods_id
          and?a.uid<>uid) aa
          group?by?goods



          第九題
          給出一堆數(shù)和頻數(shù)的表格,統(tǒng)計這一堆數(shù)中位數(shù)

          select?a.*
          ,b.s_mid_n
          ,c.l_mid_n
          ,avg(b.s_mid_n,c.l_mid_n)
          from?
          (
          select?
          case?when?mod(count(*),2) = 0?then?count(*)/2?else?(count(*)+1)/2?end?as?s_mid
          ??,case?when?mod(count(*),2) = 0?then?count(*)/2+1?else?(count(*)+1)/2?end?as?l_mid
          from?table?
          ) a
          left?join?
          (
          select?id,num,row_number() over(partition?by?id?order?by?num?asc) nn
          from?table
          ) b
          on?a.s_mid = b.nn
          left?join?
          (
          select?id,num,row_number() over(partition?by?id?order?by?num?asc) nn
          from?table
          ) c
          on?a.l_mid = c.nn



          第十題
          表order有三個字段,店鋪ID,訂單時間,訂單金額
          查詢一個月內(nèi)每周都有銷量的店鋪

          select?distinct?credit_level
          from?
          (
          select?credit_level,count(distinct?nn) as?number
          from
          ??(
          select?userid,credit_level,inserttime,month(inserttime) as?mm
          ????,weekofyear(inserttime) as?week
          ????,dense_rank() over(partition?by?credit_level,month(inserttime) order?by?weekofyear(inserttime) asc) as?nn
          from?koo.nil_temp0222
          where?substring(inserttime,1,7) = '2019-12'
          order?by?credit_level ,inserttime
          ??) aa
          group?by?1??
          ) bb
          where?number?= (select?count(distinct?weekofyear(inserttime))
          from?koo.nil_temp0222
          where?substring(inserttime,1,7) = '2019-12')


          End.

          作者:adorable_new

          來源:簡書

          本文為轉(zhuǎn)載分享,如侵權(quán)請聯(lián)系后臺刪除

          瀏覽 44
          點贊
          評論
          收藏
          分享

          手機掃一掃分享

          分享
          舉報
          評論
          圖片
          表情
          推薦
          點贊
          評論
          收藏
          分享

          手機掃一掃分享

          分享
          舉報
          <kbd id="afajh"><form id="afajh"></form></kbd>
          <strong id="afajh"><dl id="afajh"></dl></strong>
            <del id="afajh"><form id="afajh"></form></del>
                1. <th id="afajh"><progress id="afajh"></progress></th>
                  <b id="afajh"><abbr id="afajh"></abbr></b>
                  <th id="afajh"><progress id="afajh"></progress></th>
                  免费三级网 | 国产成人精品白浆久久69 | 免费看操逼网站 | 欧美操逼视频 | 小黄片网站 |