First, here is the bucket-sampling query syntax and usage that everyone already knows:

select * from bucketed_table tablesample(bucket x out of y on bucket_column);

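Assume the bucketed table has z buckets in total.

x: the bucket to start sampling from (counted from 1)

y: z/y is the total number of buckets sampled

y must be a factor or a multiple of z!

How the sampling works: start at bucket x and, when y <= z, take one bucket every y buckets until z/y buckets have been taken. When y > z, z/y is less than 1, so only a fraction of a single bucket is taken.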

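Example 1:

select * from stu_buck2 tablesample(bucket 1 out of 2 on id);

stu_buck2 has 4 buckets. Start at the 1st bucket and take one bucket every 2 buckets, 2 buckets in total (4/2).

Bucket ordinal: x + y*(n-1) for n = 1, 2, which gives the 1st and 3rd buckets, i.e. bucket numbers 0 and 2 when numbering from 0.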

Example 2:

select * from stu_buck2 tablesample(bucket 1 out of 1 on id);

Start at the 1st bucket and take one bucket every 1 bucket, 4 buckets in total (4/1).

This takes bucket numbers 0, 1, 2 and 3.

Example 3:

select * from stu_buck2 tablesample(bucket 2 out of 8 on id);

Start at the 2nd bucket and take 0.5 buckets in total (4/8).

This takes half of bucket number 1.
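
The rule behind the three examples can be written out as a small calculation. The following is a minimal Python sketch, not Hive code: the helper name `expected_buckets` is made up for illustration, it assumes 1 <= x <= y, and it only encodes the rule as described above.

```python
from fractions import Fraction

def expected_buckets(x, y, z):
    """Hypothetical helper: buckets picked by the rule described above.

    Returns (bucket numbers counted from 0, fraction taken of each bucket).
    Assumes x is counted from 1 and 1 <= x <= y.
    """
    if x < 1 or x > y:
        raise ValueError("assumed here: 1 <= x <= y")
    if z % y != 0 and y % z != 0:
        raise ValueError("y must be a factor or a multiple of z")
    if y <= z:
        count = z // y  # number of whole buckets taken
        # ordinals x, x+y, x+2y, ... converted to bucket numbers from 0
        return [x + y * n - 1 for n in range(count)], Fraction(1)
    # y > z: only the fraction z/y of a single bucket is taken
    return [(x - 1) % z], Fraction(z, y)

print(expected_buckets(1, 2, 4))  # example 1: buckets 0 and 2, whole buckets
print(expected_buckets(1, 1, 4))  # example 2: buckets 0, 1, 2, 3
print(expected_buckets(2, 8, 4))  # example 3: half of bucket 1
```

This only states what the rule predicts; it says nothing about which rows Hive actually returns.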

However, when I ran the experiment myself, the actual result deviated from what I expected.

Table creation statements:

-- create the bucketed table
create table people (id int, name string)
clustered by (id)
sorted by (name desc) into 4 buckets
row format delimited fields terminated by '\t';

-- create a temporary table
create table tmp (id int, name string)
row format delimited fields terminated by '\t';

-- load the data
load data local inpath '/home/guigu/data.txt' into table tmp;

-- load the data into the bucketed table
insert overwrite table people select * from tmp;

Data: the resulting buckets are as follows.

[Figure: contents of each bucket of the people table]

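When I ran the query, however, things looked different: I had intended to get 4/8 of the data in the 2nd bucket, but the rows returned were far from what I expected.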

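In fact,

select * from bucketed_table tablesample(bucket x out of y on bucket_column);
Under the hood, this sampling query splits all rows into y partitions by the hash of the column % y (comparable to partitioning in Hadoop) and then returns the rows in partition x.
The result did not match my expectation because the test data set was too small: with only a handful of rows, the hash values are not spread evenly over the y partitions, so the share of rows returned can be far from z/y.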
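
To see how a small data set produces this effect, the hash rule can be mimicked outside Hive. The sketch below is Python, not Hive, and rests on stated assumptions: the hash of an int column is taken to be the value itself (Hive's real hash depends on column type and version), x is counted from 1, so a row is kept when id % y == x - 1, and the ids are invented sample values rather than the data from the experiment.

```python
def sample_ids(ids, x, y):
    """Keep ids whose assumed hash (the id itself) falls in partition x of y."""
    return [i for i in ids if i % y == x - 1]

def bucket_of(i, z):
    """Bucket number (from 0) an id lands in for a table with z buckets."""
    return i % z

ids = [1, 5, 13, 21, 2, 6, 3, 4]  # hypothetical ids, not the real data
z = 4

# Rows stored in the 2nd bucket (bucket number 1)
second_bucket = [i for i in ids if bucket_of(i, z) == 1]

# Rows returned by "bucket 2 out of 8": those with id % 8 == 1
sampled = sample_ids(ids, 2, 8)

print(second_bucket)  # four rows in the 2nd bucket
print(sampled)        # only one of them is sampled, not half

# With many evenly spread ids the share settles at one half of the bucket
many = range(1, 8001)
in_bucket = [i for i in many if bucket_of(i, z) == 1]
print(len(sample_ids(many, 2, 8)), len(in_bucket))
```

Under these assumptions every sampled row does come from the 2nd bucket, but whether it amounts to half of that bucket depends entirely on how the ids happen to fall modulo 8.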

Hive Tips: Bucket Sampling