
          Doris in Practice | Hands-On with the Apache Doris Spark Connector


          2022-06-26 22:15

          Download the connector source from https://github.com/apache/incubator-doris-spark-connector and compile it. It is recommended to build inside the official Doris build image:

          $ docker pull apache/incubator-doris:build-env-ldb-toolchain-latest

          The build output looks like this:

          [root@xxx spark-doris-connector]# pwd
          /data/incubator-doris-spark-connector/spark-doris-connector
          [root@xxx spark-doris-connector]# ls target/ -trhl
          total 7.7M
          drwxr-xr-x 3 root root 4.0K May 11 22:32 maven-shared-archive-resources
          -rw-r--r-- 1 root root    1 May 11 22:32 classes.520860527.timestamp
          drwxr-xr-x 5 root root 4.0K May 11 22:32 classes
          drwxr-xr-x 5 root root 4.0K May 11 22:32 generated-sources
          drwxr-xr-x 3 root root 4.0K May 11 22:32 maven-status
          drwxr-xr-x 3 root root 4.0K May 11 22:32 thrift-dependencies
          drwxr-xr-x 2 root root 4.0K May 11 22:32 maven-archiver
          -rw-r--r-- 1 root root 174K May 11 22:32 spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT-sources.jar
          -rw-r--r-- 1 root root    1 May 11 22:32 test-classes.104362910.timestamp
          drwxr-xr-x 4 root root 4.0K May 11 22:32 test-classes
          drwxr-xr-x 3 root root 4.0K May 11 22:32 generated-test-sources
          drwxr-xr-x 2 root root 4.0K May 11 22:32 surefire-reports
          -rw-r--r-- 1 root root 551K May 11 22:32 original-spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar
          -rw-r--r-- 1 root root 7.0M May 11 22:32 spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar
          [root@xx spark-doris-connector]#

          Download Spark from the official site. If the official download is slow, Tencent's mirror is very convenient: https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/

          Run the following commands to download the pre-built Spark package and extract it:

          # Download
          wget https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
          # Extract
          tar -xzvf spark-3.1.2-bin-hadoop3.2.tgz

          Configure the Spark environment:

          vim /etc/profile
          export SPARK_HOME=/your_path/spark-3.1.2-bin-hadoop3.2
          export PATH=$PATH:$SPARK_HOME/bin
          source /etc/profile

          Copy the compiled spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar into Spark's jars directory:

          cp /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar  $SPARK_HOME/jars
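          A quick way to sanity-check the copy step is to look for the connector jar under `$SPARK_HOME/jars`. The helper below is a small illustrative sketch of my own, not part of the connector or Spark:

```python
import glob
import os


def find_connector_jars(spark_home):
    """Return any Spark-Doris connector jars found under <spark_home>/jars."""
    pattern = os.path.join(spark_home, "jars", "spark-doris-connector-*.jar")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Check the Spark installation pointed to by SPARK_HOME, if set.
    spark_home = os.environ.get("SPARK_HOME", ".")
    print(find_connector_jars(spark_home))
```

If the list comes back empty, spark-shell will fail to resolve `org.apache.doris.spark` later on.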

          Scala Usage

          Run spark-shell to enter the Spark interactive shell, and confirm the Spark version.

          1. Run the following commands to query Doris data through the Spark-Doris-Connector.

            import org.apache.doris.spark._
            val dorisSparkRDD = sc.dorisRDD(
              tableIdentifier = Some("mongo_doris.data_sync_test"),
              cfg = Some(Map(
                "doris.fenodes" -> "127.0.0.1:8030",
                "doris.request.auth.user" -> "root",
                "doris.request.auth.password" -> ""
              ))
            )
            dorisSparkRDD.collect()
            1. mongo_doris is the database name.

            2. data_sync_test is the table name.

            3. doris.fenodes is the FE node's IP and http_port.

            4. doris.request.auth.user is the username.

            5. doris.request.auth.password is the password.

          2. When the command finishes, the data is printed to the console; if you see rows come back, the integration works. A complete session looks like this:

          scala> import org.apache.doris.spark._
          import org.apache.doris.spark._

          scala> val dorisSparkRDD = sc.dorisRDD(
              |   tableIdentifier = Some("mongo_doris.data_sync_test"),
              |   cfg = Some(Map(
              |     "doris.fenodes" -> "127.0.0.1:8030",
              |     "doris.request.auth.user" -> "root",
              |     "doris.request.auth.password" -> ""
              |   ))
              | )
          dorisSparkRDD: org.apache.spark.rdd.RDD[AnyRef] = ScalaDorisRDD[0] at RDD at AbstractDorisRDD.scala:32

          scala> dorisSparkRDD.collect()
          res0: Array[AnyRef] = Array([4, 1, alex, Document{{key1=1.0}}, 20.0, 3.14, 123456.0, 2022-05-10, false], [2, 1, alex, [1.0, 2.0, 3.0], 20.0, 3.14, 123456.0, 2022-05-09, false], [3, 1, alex, [Document{{key1=1.0}}], 20.0, 3.14, 123456.0, 2022-05-10, false])
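          The `tableIdentifier` / `cfg` pair passed to `sc.dorisRDD` above can be sketched as a small Python helper. The function name is my own, for illustration only; the keys, however, are the same connector settings used in the session:

```python
def doris_rdd_args(fenodes, database, table, user, password):
    """Assemble the tableIdentifier string and cfg map used by sc.dorisRDD."""
    table_identifier = "{}.{}".format(database, table)
    cfg = {
        "doris.fenodes": fenodes,  # FE node's ip:http_port
        "doris.request.auth.user": user,
        "doris.request.auth.password": password,
    }
    return table_identifier, cfg


# Reproduces the values from the spark-shell session.
tid, cfg = doris_rdd_args("127.0.0.1:8030", "mongo_doris", "data_sync_test", "root", "")
```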

          On a Spark cluster, dependency jars are usually uploaded to HDFS and added via spark.yarn.jars, so that Spark reads the jars from HDFS:

          spark.yarn.jars=local:/usr/lib/spark/jars/*,hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
          For details, see: https://github.com/apache/incubator-doris/discussions/9486
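          As a sketch of how that comma-separated property value is put together (the helper below is illustrative, not a Spark API):

```python
def yarn_jars_value(local_glob, hdfs_jars):
    """Build a spark.yarn.jars property value from local and HDFS entries."""
    entries = ["local:" + local_glob]
    entries += ["hdfs://" + path for path in hdfs_jars]
    return ",".join(entries)


# Reproduces the property line above.
value = yarn_jars_value(
    "/usr/lib/spark/jars/*",
    ["/spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar"],
)
```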

          pyspark Usage

          Run the following command to enter pyspark:

          [root@xxx ~]# pyspark
          Python 3.6.9 (default, Dec  8 2021, 21:08:43)
          [GCC 8.4.0] on linux
          Type "help", "copyright", "credits" or "license" for more information.
          22/05/12 10:29:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
          Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
          Setting default log level to "WARN".
          To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
          22/05/12 10:29:26 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
          Welcome to
                ____              __
               / __/__  ___ _____/ /__
              _\ \/ _ \/ _ `/ __/ '_/
             /__ / .__/\_,_/_/ /_/\_\   version 3.1.2
                /_/

          Using Python version 3.6.9 (default, Dec  8 2021 21:08:43)
          Spark context Web UI available at http://jiafeng:4041
          Spark context available as 'sc' (master = local[*], app id = local-1652322566766).
          SparkSession available as 'spark'.

          Read data from Doris via pyspark:

          dorisSparkDF = spark.read.format("doris").option("doris.table.identifier", "mongo_doris.data_sync_test").option("doris.fenodes", "127.0.0.1:8030").option("user", "root").option("password", "").load()

          # Show 5 rows of data

          dorisSparkDF.show(5)

          The complete run looks like this:

          >>> dorisSparkDF = spark.read.format("doris").option("doris.table.identifier", "mongo_doris.data_sync_test").option("doris.fenodes", "127.0.0.1:8030").option("user", "root").option("password", "").load()
          >>> dorisSparkDF.show(5)
          +---+---+---------+--------------------+----+------+------------+-----------+----------+
          |_id| id|user_name|         member_list| age|height|lucky_number|create_time|is_married|
          +---+---+---------+--------------------+----+------+------------+-----------+----------+
          |  3|  1|     alex|[Document{{key1=1...|20.0|  3.14|    123456.0| 2022-05-10|     false|
          |  4|  1|     alex|Document{{key1=1.0}}|20.0|  3.14|    123456.0| 2022-05-10|     false|
          |  2|  1|     alex|     [1.0, 2.0, 3.0]|20.0|  3.14|    123456.0| 2022-05-09|     false|
          +---+---+---------+--------------------+----+------+------------+-----------+----------+
