<kbd id="afajh"><form id="afajh"></form></kbd>
<strong id="afajh"><dl id="afajh"></dl></strong>
    <del id="afajh"><form id="afajh"></form></del>
        1. <th id="afajh"><progress id="afajh"></progress></th>
          <b id="afajh"><abbr id="afajh"></abbr></b>
          <th id="afajh"><progress id="afajh"></progress></th>

Compiling support for Spark to read and write OSS (CDH 5.x)


          2021-07-28 11:07

Author: 來世愿做友人_A

Source: SegmentFault 思否社區

Preface

Background: use Spark to read files from HDFS and write them to OSS.
hadoop: 2.6.0-cdh5.15.1
spark: 2.4.1
Notes and pitfalls are called out along the way.

Compiling hadoop-aliyun

Recent Hadoop versions support Aliyun OSS access out of the box, but this version does not, so the support has to be compiled in ourselves.

• Pull the Hadoop trunk branch and copy the hadoop-tools/hadoop-aliyun module into the corresponding module of the CDH source tree
• Edit hadoop-tools/pom.xml
  • add hadoop-aliyun as a sub-module: <module>hadoop-aliyun</module>
• Change the Java version in the root pom.xml to 1.8: hadoop-aliyun uses Java 8 lambda syntax (alternatively, rewrite the lambdas to keep the existing Java version)
• Edit the hadoop-aliyun pom.xml: fix the version and the related OSS and HTTP dependencies, and use the shade plugin to bundle those dependencies into the jar
• Code changes
  • change import org.apache.commons.lang3 to import org.apache.commons.lang
  • copy the BlockingThreadPoolExecutorService and SemaphoredDelegatingExecutor classes from the (CDH version) hadoop-aws module into org.apache.hadoop.util
• Build the hadoop-aliyun module (a consolidated command sketch follows this list)
  • mvn clean package -pl hadoop-tools/hadoop-aliyun
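A minimal sketch of the checkout-and-copy steps above, assuming the trunk clone and the CDH source tree sit side by side (all paths here are illustrative):

# borrow the hadoop-aliyun module from trunk (paths are illustrative)
git clone https://github.com/apache/hadoop.git hadoop-trunk
cp -r hadoop-trunk/hadoop-tools/hadoop-aliyun hadoop-cdh/hadoop-tools/hadoop-aliyun

# after applying the pom.xml and code edits described above:
cd hadoop-cdh
mvn clean package -pl hadoop-tools/hadoop-aliyun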
The final hadoop-aliyun pom.xml is as follows:
          <?xml version="1.0" encoding="UTF-8"?>
          <!--
            Licensed under the Apache License, Version 2.0 (the "License");
            you may not use this file except in compliance with the License.
            You may obtain a copy of the License at

              http://www.apache.org/licenses/LICENSE-2.0

            Unless required by applicable law or agreed to in writing, software
            distributed under the License is distributed on an "AS IS" BASIS,
            WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
            See the License for the specific language governing permissions and
            limitations under the License. See accompanying LICENSE file.
          -->
          <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/maven-v4_0_0.xsd">
            <modelVersion>4.0.0</modelVersion>
            <parent>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-project</artifactId>
              <version>2.6.0-cdh5.15.1</version>
              <relativePath>../../hadoop-project</relativePath>
            </parent>
            <artifactId>hadoop-aliyun</artifactId>
            <name>Apache Hadoop Aliyun OSS support</name>
            <packaging>jar</packaging>

            <properties>
              <file.encoding>UTF-8</file.encoding>
              <downloadSources>true</downloadSources>
            </properties>

            <profiles>
              <profile>
                <id>tests-off</id>
                <activation>
                  <file>
                    <missing>src/test/resources/auth-keys.xml</missing>
                  </file>
                </activation>
                <properties>
                  <maven.test.skip>true</maven.test.skip>
                </properties>
              </profile>
              <profile>
                <id>tests-on</id>
                <activation>
                  <file>
                    <exists>src/test/resources/auth-keys.xml</exists>
                  </file>
                </activation>
                <properties>
                  <maven.test.skip>false</maven.test.skip>
                </properties>
              </profile>
            </profiles>

            <build>
              <plugins>
                <plugin>
                  <groupId>org.codehaus.mojo</groupId>
                  <artifactId>findbugs-maven-plugin</artifactId>
                  <configuration>
                    <findbugsXmlOutput>true</findbugsXmlOutput>
                    <xmlOutput>true</xmlOutput>
                    <excludeFilterFile>${basedir}/dev-support/findbugs-exclude.xml
                    </excludeFilterFile>
                    <effort>Max</effort>
                  </configuration>
                </plugin>
                <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                  <artifactId>maven-surefire-plugin</artifactId>
                  <configuration>
                    <forkedProcessTimeoutInSeconds>3600</forkedProcessTimeoutInSeconds>
                  </configuration>
                </plugin>
                <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                  <artifactId>maven-dependency-plugin</artifactId>
                  <executions>
                    <execution>
                      <id>deplist</id>
                      <phase>compile</phase>
                      <goals>
                        <goal>list</goal>
                      </goals>
                      <configuration>
                        <!-- build a shellprofile -->
                        <outputFile>
                          ${project.basedir}/target/hadoop-tools-deps/${project.artifactId}.tools-optional.txt
                        </outputFile>
                      </configuration>
                    </execution>
                  </executions>
                </plugin>
                <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                  <artifactId>maven-shade-plugin</artifactId>
                  <version>3.1.0</version>
                  <executions>
                    <execution>
                      <id>shade-aliyun-sdk-oss</id>
                      <phase>package</phase>
                      <goals>
                        <goal>shade</goal>
                      </goals>
                      <configuration>
                        <shadedArtifactAttached>false</shadedArtifactAttached>
                        <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
                        <createDependencyReducedPom>true</createDependencyReducedPom>
                        <createSourcesJar>true</createSourcesJar>
                        <relocations>
                          <relocation>
                            <pattern>org.apache.http</pattern>
                            <shadedPattern>com.xxx.thirdparty.org.apache.http</shadedPattern>
                          </relocation>
                        </relocations>
                      </configuration>
                    </execution>
                  </executions>
                </plugin>
              </plugins>
            </build>

            <dependencies>
              <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <scope>test</scope>
              </dependency>

              <dependency>
                <groupId>com.aliyun.oss</groupId>
                <artifactId>aliyun-sdk-oss</artifactId>
                <version>3.4.1</version>
              </dependency>
              <dependency>
                <groupId>org.apache.httpcomponents</groupId>
                <artifactId>httpclient</artifactId>
                <version>4.4.1</version>
              </dependency>
              <dependency>
                <groupId>org.apache.httpcomponents</groupId>
                <artifactId>httpcore</artifactId>
                <version>4.4.1</version>
              </dependency>

              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <exclusions>
                  <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpclient</artifactId>
                  </exclusion>
                  <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpcore</artifactId>
                  </exclusion>
                </exclusions>
                <scope>provided</scope>
              </dependency>

              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <scope>test</scope>
                <type>test-jar</type>
              </dependency>
              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-distcp</artifactId>
                <scope>test</scope>
              </dependency>
              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-distcp</artifactId>
                <scope>test</scope>
                <type>test-jar</type>
              </dependency>
              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-yarn-server-tests</artifactId>
                <scope>test</scope>
                <type>test-jar</type>
              </dependency>
              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
                <scope>test</scope>
              </dependency>
              <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-examples</artifactId>
                <scope>test</scope>
                <type>jar</type>
              </dependency>
            </dependencies>

          </project>
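To sanity-check the build, you can list the shaded jar and confirm the relocated httpclient classes landed under the shade pattern configured above (the target path follows the standard Maven layout):

jar tf hadoop-tools/hadoop-aliyun/target/hadoop-aliyun-2.6.0-cdh5.15.1.jar \
  | grep com/xxx/thirdparty/org/apache/http | head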

Spark reading and writing OSS files

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val inputPath = "hdfs:///xxx"
val outputPath = "oss://bucket/OSS_FILES"

// OSS endpoint and credentials; the spark.hadoop. prefix passes
// these through to the underlying Hadoop configuration
val conf = new SparkConf()
conf.set("spark.hadoop.fs.oss.endpoint", "oss-cn-xxx")
conf.set("spark.hadoop.fs.oss.accessKeyId", "xxx")
conf.set("spark.hadoop.fs.oss.accessKeySecret", "xxx")
conf.set("spark.hadoop.fs.oss.impl", "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem")
conf.set("spark.hadoop.fs.oss.buffer.dir", "/tmp/oss")
conf.set("spark.hadoop.fs.oss.connection.secure.enabled", "false")
conf.set("spark.hadoop.fs.oss.connection.maximum", "2048")

val spark = SparkSession.builder().config(conf).getOrCreate()

// read from HDFS (the input format is assumed here) and write ORC to OSS
val df = spark.read.format("orc").load(inputPath)
df.write.format("orc").mode("overwrite").save(outputPath)
For other approaches, such as Spark SQL or reading OSS through the HDFS shell, see the third link at the end of this article.
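Before submitting a job, OSS connectivity can also be verified from the command line with the Hadoop shell. A sketch, assuming the freshly built hadoop-aliyun jar is on the Hadoop classpath (for example via HADOOP_CLASSPATH); the endpoint and keys are placeholders:

hadoop fs \
  -D fs.oss.endpoint=oss-cn-xxx \
  -D fs.oss.accessKeyId=xxx \
  -D fs.oss.accessKeySecret=xxx \
  -D fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem \
  -ls oss://bucket/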

spark-submit

          spark-submit \
          --class org.example.HdfsToOSS \
          --master yarn \
          --deploy-mode cluster \
          --num-executors 2 \
          --executor-cores 2 \
          --executor-memory 3G \
          --driver-cores 1  \
          --driver-memory 3G \
          --conf "spark.driver.extraClassPath=hadoop-common-2.6.0-cdh5.15.1.jar" \
          --conf "spark.executor.extraClassPath=hadoop-common-2.6.0-cdh5.15.1.jar" \
          --jars ./hadoop-aliyun-2.6.0-cdh5.15.1.jar,./hadoop-common-2.6.0-cdh5.15.1.jar \
          ./spark-2.4-worker-1.0-SNAPSHOT.jar
Note the extraClassPath settings: with no special configuration, Spark loads its own bundled hadoop-common jar by default, and a version mismatch there can cause ClassNotFound errors. Jars specified via extraClassPath are placed at the front of the classpath, so the matching CDH hadoop-common is loaded first.
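A quick way to check which hadoop-common Spark would otherwise pick up (assuming a standard Spark 2.x distribution layout under SPARK_HOME):

ls $SPARK_HOME/jars | grep hadoop-common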

My knowledge here is limited; if anything is wrong, corrections are welcome.

