Spark SQL Quick Start Series: Hive

Table of Contents
I. How Hive and Spark SQL can be integrated (a likely interview question)
II. Working with spark-shell and spark-sql
spark-shell
spark-sql
Using HiveServer2 + Beeline
III. Using spark-sql from scripts
IV. Reading and writing Hive data in IDEA
1. Reading data from Hive
2. Writing data to Hive
Writing with Hive's INSERT statement
Using df.write.saveAsTable("tableName") (most common)
Using df.write.insertInto("tableName")
3. How saveAsTable and insertInto work
V. Number of partitions after aggregation
I. How Hive and Spark SQL can be integrated (a likely interview question)

Hive on Spark (mind version compatibility): Hive keeps its SQL layer and metastore but uses Spark instead of MapReduce as the execution engine.
Official guide: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
Spark on Hive (mind version compatibility): Spark SQL parses and executes the queries itself and only uses Hive for the metastore, i.e. for table metadata.
Official guide: http://spark.apache.org/docs/2.1.1/sql-programming-guide.html#hive-tables
II. Working with spark-shell and spark-sql
spark-shell

If you use Tez on your cluster, you need to add the path of the LZO jar to spark-defaults.conf under spark/conf:

spark.jars=/export/servers/hadoop-2.7.7/share/hadoop/common/hadoop-lzo-0.4.20.jar

Then start the shell:

bin/spark-shell --master yarn

spark-sql
Launch it with bin/spark-sql; from there it works exactly like writing plain SQL.


Using HiveServer2 + Beeline
The output of the spark-sql CLI is not very friendly to read, so you can use HiveServer2 + Beeline instead.
1. Start the Thrift server (in the background):

sbin/start-thriftserver.sh

2. Connect with Beeline:

bin/beeline
# then enter: !connect jdbc:hive2://hadoop102:10000
# then enter the username and password as prompted

III. Using spark-sql from scripts

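Typically (assuming the standard spark-sql CLI options, which mirror the Hive CLI) a shell script runs an inline statement with bin/spark-sql -e "select ..." or executes a whole SQL file with bin/spark-sql -f /path/to/script.sql; the file path here is just a placeholder.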
IV. Reading and writing Hive data in IDEA
1. Reading data from Hive

Add the dependency:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
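Note: for enableHiveSupport() to reach the external Hive metastore, hive-site.xml typically also needs to be on the classpath (for example under src/main/resources); otherwise Spark falls back to a local embedded metastore.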
import org.apache.spark.sql.SparkSession

object HiveRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("HiveRead")
      // enable support for the external Hive
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases")
    spark.sql("use guli")
    spark.sql("select count(*) from gulivideo_orc").show()

    spark.close()
  }
}
結(jié)果
2. Writing data to Hive

Writing with Hive's INSERT statement
import org.apache.spark.sql.SparkSession

object HiveWrite {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("HiveWrite")
      // enable support for the external Hive
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "hdfs://hadoop102:9000/user/hive/warehouse")
      .getOrCreate()

    // first create a database
    spark.sql("create database spark1602")
    spark.sql("use spark1602")
    spark.sql("create table user1(id int, name string)")
    spark.sql("insert into user1 values(10,'lisi')").show()

    spark.close()
  }
}
Using df.write.saveAsTable("tableName") (most common)
import org.apache.spark.sql.SparkSession

object HiveWrite {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("HiveWrite")
      // enable support for the external Hive
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "hdfs://hadoop102:9000/user/hive/warehouse")
      .getOrCreate()

    val df = spark.read.json("D:\\idea\\spark-sql\\input\\user.json")
    spark.sql("use spark1602")
    // write the data straight into Hive; the table may or may not already exist
    df.write.saveAsTable("user2")
    // appending is also possible:
    // df.write.mode("append").saveAsTable("user2")

    spark.close()
  }
}
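If the table already exists, the behaviour is controlled by the save mode: error (the default, which throws), append, overwrite or ignore, e.g. df.write.mode("overwrite").saveAsTable("user2").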
Using df.write.insertInto("tableName")
import org.apache.spark.sql.SparkSession

object HiveWrite {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("HiveWrite")
      // enable support for the external Hive
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "hdfs://hadoop102:9000/user/hive/warehouse")
      .getOrCreate()

    val df = spark.read.json("D:\\idea\\spark-sql\\input\\user.json")
    spark.sql("use spark1602")
    // the target table must already exist
    df.write.insertInto("user2")

    spark.close()
  }
}
3. How saveAsTable and insertInto work
saveAsTable
Matches DataFrame columns to the table's columns by column name, so the column order in the DataFrame does not have to match the table.
insertInto
Matches columns strictly by position, one to one, so the DataFrame must have its columns in the same order as the table (and the table must already exist).
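A minimal sketch illustrating this behaviour (the table t, the sample rows and the object name are made up for illustration and are not from the original post; the spark1602 database created above is reused):

import org.apache.spark.sql.SparkSession

object SaveVsInsert {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SaveVsInsert")
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "hdfs://hadoop102:9000/user/hive/warehouse")
      .getOrCreate()
    import spark.implicits._

    spark.sql("use spark1602")

    // create a table t(id, name) from a DataFrame whose columns are (id, name)
    Seq(("1", "zhangsan")).toDF("id", "name").write.saveAsTable("t")

    // a second DataFrame with the columns deliberately in the opposite order: (name, id)
    val df = Seq(("lisi", "2")).toDF("name", "id")

    // saveAsTable resolves columns by NAME, so "2" still lands in id and "lisi" in name
    df.write.mode("append").saveAsTable("t")

    // insertInto resolves columns purely by POSITION, so "lisi" would land in id
    // and "2" in name -- silently wrong:
    // df.write.insertInto("t")

    spark.sql("select * from t").show()
    spark.close()
  }
}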
V. Number of partitions after aggregation
import org.apache.spark.sql.SparkSession

object HiveWrite {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("HiveWrite")
      // enable support for the external Hive
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "hdfs://hadoop102:9000/user/hive/warehouse")
      .getOrCreate()

    val df = spark.read.json("D:\\idea\\spark-sql\\input\\user.json")
    df.createOrReplaceTempView("a")
    spark.sql("use spark1602")

    val df1 = spark.sql("select * from a")
    val df2 = spark.sql("select sum(age) sum_age from a group by name")

    println(df1.rdd.getNumPartitions)
    println(df2.rdd.getNumPartitions)

    df1.write.saveAsTable("a3")
    df2.write.saveAsTable("a4")

    spark.close()
  }
}

df1 keeps the partitioning of the source data, but df2 goes through a shuffle for the group by, so it ends up with spark.sql.shuffle.partitions = 200 partitions by default. If the data volume is small there is no need for 200 partitions; that is simply wasteful. So instead of

df2.write.saveAsTable("a4")

reduce the number of partitions before writing:

df2.coalesce(1).write.saveAsTable("a4")

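Alternatively, the default shuffle parallelism can be lowered so the aggregation produces fewer partitions in the first place. A minimal sketch, assuming the same setup as above (the object name, the value 4 and the table name a5 are chosen only for illustration):

import org.apache.spark.sql.SparkSession

object HiveWriteFewerPartitions {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("HiveWrite")
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "hdfs://hadoop102:9000/user/hive/warehouse")
      // lower the default shuffle parallelism (200) so the group by produces 4 partitions
      .config("spark.sql.shuffle.partitions", "4")
      .getOrCreate()

    val df = spark.read.json("D:\\idea\\spark-sql\\input\\user.json")
    df.createOrReplaceTempView("a")
    spark.sql("use spark1602")

    val df2 = spark.sql("select sum(age) sum_age from a group by name")
    println(df2.rdd.getNumPartitions) // 4 instead of 200

    // a new table name to avoid clashing with a4 above
    df2.write.saveAsTable("a5")
    spark.close()
  }
}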
版權(quán)聲明:
文章不錯?點個【在看】吧!??




