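Background

In version 1.9.1 the distributed cache did not copy files stored on HDFS to the TaskManager (TM), and the job threw an exception at runtime.

After upgrading to 1.10.1 it works correctly. This is a good occasion to look at how Flink's distributed cache works.

Definition

The official documentation defines the distributed cache as follows: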

    Flink offers a distributed cache, similar to Apache Hadoop, to make files locally accessible to parallel instances of user functions. This functionality can be used to share files that contain static external data such as dictionaries or machine-learned regression models.

    The cache works as follows. A program registers a file or directory of a local or remote filesystem such as HDFS or S3 under a specific name in its ExecutionEnvironment as a cached file. When the program is executed, Flink automatically copies the file or directory to the local filesystem of all workers. A user function can look up the file or directory under the specified name and access it from the worker’s local filesystem.

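In other words, a Flink program registers a local or HDFS file under a name. When the program runs, Flink automatically copies that file to every TM, and each function can obtain it through the registered name.

Usage

The usage example from the official documentation: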

    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // register a file from HDFS
    env.registerCachedFile("hdfs:///path/to/your/file", "hdfsFile");

    // register a local executable file (script, executable, ...)
    env.registerCachedFile("file:///path/to/exec/file", "localExecFile", true);

    // define your program and execute
    ...
    DataStream<String> input = ...
    DataStream<Integer> result = input.map(new MyMapper());
    ...
    env.execute();

    // extend a RichFunction to have access to the RuntimeContext
    public final class MyMapper extends RichMapFunction<String, Integer> {

        @Override
        public void open(Configuration config) {
            // access cached file via RuntimeContext and DistributedCache
            File myFile = getRuntimeContext().getDistributedCache().getFile("hdfsFile");
            // read the file (or navigate the directory)
            ...
        }

        @Override
        public Integer map(String value) throws Exception {
            // use content of cached file
            ...
        }
    }
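A common use of the cached file is to load a small lookup table once in open() and consult it in map(). The following is a minimal sketch of that loading step in plain Java, without any Flink dependency. The class name DictionaryLoader and the tab-separated "key, value" file layout are assumptions made for illustration; they are not part of the official example.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: parses a cached dictionary file into an in-memory map. */
final class DictionaryLoader {

    private DictionaryLoader() {
    }

    /**
     * Reads "key<TAB>value" lines (an assumed format). Blank lines, lines without
     * a tab and lines with an empty key are skipped. A non-numeric value makes
     * Integer.parseInt throw NumberFormatException.
     */
    static Map<String, Integer> load(File file) throws IOException {
        Map<String, Integer> dictionary = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab <= 0) {
                    continue; // nothing usable on this line
                }
                String key = line.substring(0, tab);
                int value = Integer.parseInt(line.substring(tab + 1).trim());
                dictionary.put(key, value);
            }
        }
        return dictionary;
    }
}
```

Inside MyMapper.open(), the File returned by getDistributedCache().getFile("hdfsFile") would be handed to DictionaryLoader.load(...), and the resulting map kept in a field for use in map().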

Implementation flow

The walkthrough below follows the Flink 1.10.1 source code.

1. The file path and its registered name are added to the cacheFile list of StreamExecutionEnvironment.

    protected final List<Tuple2<String, DistributedCache.DistributedCacheEntry>> cacheFile = new ArrayList<>();

    public void registerCachedFile(String filePath, String name, boolean executable) {
        this.cacheFile.add(new Tuple2<>(name, new DistributedCache.DistributedCacheEntry(filePath, executable)));
    }

2. When the StreamGraph is generated, cacheFile is handed to the StreamGraph as its userArtifacts.

StreamGraphGenerator --> StreamGraph

org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#getStreamGraphGenerator

    private StreamGraphGenerator getStreamGraphGenerator() {
        if (transformations.size() <= 0) {
            throw new IllegalStateException("No operators defined in streaming topology. Cannot execute.");
        }
        return new StreamGraphGenerator(transformations, config, checkpointCfg)
            .setStateBackend(defaultStateBackend)
            .setChaining(isChainingEnabled)
            .setUserArtifacts(cacheFile)   // note: pass cacheFile along
            .setTimeCharacteristic(timeCharacteristic)
            .setDefaultBufferTimeout(bufferTimeout);
    }

org.apache.flink.streaming.api.graph.StreamGraphGenerator#generate

    public StreamGraph generate() {
        streamGraph = new StreamGraph(executionConfig, checkpointConfig, savepointRestoreSettings);
        streamGraph.setStateBackend(stateBackend);
        streamGraph.setChaining(chaining);
        streamGraph.setScheduleMode(scheduleMode);
        streamGraph.setUserArtifacts(userArtifacts); // note: pass userArtifacts along
        streamGraph.setTimeCharacteristic(timeCharacteristic);
        streamGraph.setJobName(jobName);
        streamGraph.setBlockingConnectionsBetweenChains(blockingConnectionsBetweenChains);

        alreadyTransformed = new HashMap<>();

        for (Transformation<?> transformation : transformations) {
            transform(transformation);
        }

        final StreamGraph builtStreamGraph = streamGraph;

        alreadyTransformed.clear();
        alreadyTransformed = null;
        streamGraph = null;

        return builtStreamGraph;
    }

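3. When the JobGraph is generated, the userArtifacts of the StreamGraph are passed to the userArtifacts of the JobGraph. If a cached file is a local directory, it is compressed into a .zip file in a temporary directory on the client, and the entry is rewritten to point at that new path.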

org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator#createJobGraph()

    private JobGraph createJobGraph() {
        ...
        JobGraphGenerator.addUserArtifactEntries(streamGraph.getUserArtifacts(), jobGraph);
        ...
        return jobGraph;
    }

org.apache.flink.optimizer.plantranslate.JobGraphGenerator#addUserArtifactEntries

    public static void addUserArtifactEntries(
            Collection<Tuple2<String, DistributedCache.DistributedCacheEntry>> userArtifacts,
            JobGraph jobGraph) {
        if (userArtifacts != null && !userArtifacts.isEmpty()) {
            try {
                java.nio.file.Path tmpDir = Files.createTempDirectory("flink-distributed-cache-" + jobGraph.getJobID());
                for (Tuple2<String, DistributedCache.DistributedCacheEntry> originalEntry : userArtifacts) {
                    Path filePath = new Path(originalEntry.f1.filePath);
                    boolean isLocalDir = false;
                    try {
                        FileSystem sourceFs = filePath.getFileSystem();
                        isLocalDir = !sourceFs.isDistributedFS() && sourceFs.getFileStatus(filePath).isDir();
                    } catch (IOException ioe) {
                        LOG.warn("Could not determine whether {} denotes a local path.", filePath, ioe);
                    }
                    // zip local directories because we only support file uploads
                    DistributedCache.DistributedCacheEntry entry;
                    if (isLocalDir) {
                        // note: compress the local directory and use the path of the zip file
                        Path zip = FileUtils.compressDirectory(filePath, new Path(tmpDir.toString(), filePath.getName() + ".zip"));
                        entry = new DistributedCache.DistributedCacheEntry(zip.toString(), originalEntry.f1.isExecutable, true);
                    } else {
                        entry = new DistributedCache.DistributedCacheEntry(filePath.toString(), originalEntry.f1.isExecutable, false);
                    }
                    jobGraph.addUserArtifact(originalEntry.f0, entry);
                }
            } catch (IOException ioe) {
                throw new FlinkRuntimeException("Could not compress distributed-cache artifacts.", ioe);
            }
        }
    }
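The directory-to-zip step can be illustrated with the JDK alone. The sketch below is not Flink's FileUtils.compressDirectory; it is a simplified stand-in built on java.util.zip, and the class name DirectoryZipper is hypothetical. It packs every regular file under a directory into one archive, using entry names relative to that directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical stand-in for the "zip a local directory" step. */
final class DirectoryZipper {

    private DirectoryZipper() {
    }

    /** Packs all regular files under sourceDir into targetZip and returns targetZip. */
    static Path zip(Path sourceDir, Path targetZip) throws IOException {
        // Collect the files first so the directory walk is closed before writing.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(targetZip);
             ZipOutputStream zipOut = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use '/' regardless of the platform separator.
                String entryName = sourceDir.relativize(file).toString().replace('\\', '/');
                zipOut.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zipOut);
                zipOut.closeEntry();
            }
        }
        return targetZip;
    }
}
```

As in the Flink code above, the target archive should live outside the source directory (Flink uses a fresh temporary directory), so the archive never ends up inside itself.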

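4. When the JobGraph is deployed in YARN per-job mode, a local file (including the zip produced in the previous step) is uploaded and the entry is updated with its resulting HDFS path. If the cached file already lives on HDFS, its path is written to the configuration unchanged.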

org.apache.flink.yarn.YarnClusterDescriptor#startAppMaster
    // only for per job mode
    if (jobGraph != null) {
        for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry : jobGraph.getUserArtifacts().entrySet()) {
            org.apache.flink.core.fs.Path path = new org.apache.flink.core.fs.Path(entry.getValue().filePath);
            // only upload local files
            // note: upload the local file and store the returned HDFS path in the JobGraph's userArtifacts
            if (!path.getFileSystem().isDistributedFS()) {
                Path localPath = new Path(path.getPath());
                Tuple2<Path, Long> remoteFileInfo =
                    Utils.uploadLocalFileToRemote(fs, appId.toString(), localPath, homeDir, entry.getKey());
                jobGraph.setUserArtifactRemotePath(entry.getKey(), remoteFileInfo.f0.toString());
            }
        }
        // write the distributed cache file information into the Configuration
        jobGraph.writeUserArtifactEntriesToConfiguration();
    }

DistributedCache#writeFileInfoToConfig

    public static void writeFileInfoToConfig(String name, DistributedCacheEntry e, Configuration conf) {
        int num = conf.getInteger(CACHE_FILE_NUM, 0) + 1;
        conf.setInteger(CACHE_FILE_NUM, num);
        conf.setString(CACHE_FILE_NAME + num, name);
        // note: the counter starts at 1, so the first file is stored under DISTRIBUTED_CACHE_FILE_PATH_1
        conf.setString(CACHE_FILE_PATH + num, e.filePath);
        conf.setBoolean(CACHE_FILE_EXE + num, e.isExecutable || new File(e.filePath).canExecute());
        conf.setBoolean(CACHE_FILE_DIR + num, e.isZipped || new File(e.filePath).isDirectory());
        if (e.blobKey != null) {
            conf.setBytes(CACHE_FILE_BLOB_KEY + num, e.blobKey);
        }
    }
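The numbered-key scheme is easier to see in isolation. Below is a minimal sketch that uses a plain Map<String, String> in place of Flink's Configuration and shows both the write side and a matching read side. The class name CacheConfigSketch is hypothetical, the key strings are assumed values for the constants named in the excerpt, and the executable, directory and blob-key flags are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the numbered configuration keys; not Flink's class. */
final class CacheConfigSketch {

    // Assumed key strings; the real constants are defined in Flink's DistributedCache.
    static final String CACHE_FILE_NUM = "DISTRIBUTED_CACHE_FILE_NUM";
    static final String CACHE_FILE_NAME = "DISTRIBUTED_CACHE_FILE_NAME_";
    static final String CACHE_FILE_PATH = "DISTRIBUTED_CACHE_FILE_PATH_";

    private CacheConfigSketch() {
    }

    /** Appends one entry: bumps the counter, then stores name and path under that index. */
    static void writeFileInfo(String name, String filePath, Map<String, String> conf) {
        int num = Integer.parseInt(conf.getOrDefault(CACHE_FILE_NUM, "0")) + 1;
        conf.put(CACHE_FILE_NUM, Integer.toString(num));
        conf.put(CACHE_FILE_NAME + num, name);
        conf.put(CACHE_FILE_PATH + num, filePath);
    }

    /** Reads all entries back as name -> path, in registration order (indices 1..num). */
    static Map<String, String> readFileInfo(Map<String, String> conf) {
        int num = Integer.parseInt(conf.getOrDefault(CACHE_FILE_NUM, "0"));
        Map<String, String> entries = new LinkedHashMap<>();
        for (int i = 1; i <= num; i++) {
            entries.put(conf.get(CACHE_FILE_NAME + i), conf.get(CACHE_FILE_PATH + i));
        }
        return entries;
    }
}
```

Because every entry is flattened into index-suffixed keys, the whole set of cached files travels inside the job configuration, and the receiving side only needs the counter to iterate over them.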

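5. When a Task runs, it first reads the cached-file entries from the configuration and passes them to the RuntimeEnvironment, which is what makes the lookup by registered name possible.

Read the cached-file paths from the job configuration.
Create a temporary directory, asynchronously copy each cached file from HDFS to the current TM, and keep the resulting local path in memory. The temporary directory has the form flink-dist-cache-uuid/jobId/.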

org.apache.flink.runtime.taskmanager.Task#doRun

    private void doRun() {
        ...
        // all resource acquisitions and registrations from here on
        // need to be undone in the end
        Map<String, Future<Path>> distributedCacheEntries = new HashMap<>();

        // next, kick off the background copying of files for the distributed cache
        try {
            for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
                    DistributedCache.readFileInfoFromConfig(jobConfiguration)) {
                LOG.info("Obtaining local cache file for '{}'.", entry.getKey());
                Future<Path> cp = fileCache.createTmpFile(entry.getKey(), entry.getValue(), jobId, executionId);
                distributedCacheEntries.put(entry.getKey(), cp);
            }
        }
        catch (Exception e) {
            throw new Exception(
                String.format("Exception while adding files to distributed cache of task %s (%s).", taskNameWithSubtask, executionId), e);
        }

        Environment env = new RuntimeEnvironment(
            jobId,
            vertexId,
            executionId,
            executionConfig,
            taskInfo,
            jobConfiguration,
            taskConfiguration,
            userCodeClassLoader,
            memoryManager,
            ioManager,
            broadcastVariableManager,
            taskStateManager,
            aggregateManager,
            accumulatorRegistry,
            kvStateRegistry,
            inputSplitProvider,
            distributedCacheEntries,  // note: the copy futures, keyed by registered name
            consumableNotifyingPartitionWriters,
            inputGates,
            taskEventDispatcher,
            checkpointResponder,
            taskManagerConfig,
            metrics,
            this);
    }

org.apache.flink.runtime.filecache.FileCache#createTmpFile

    public Future<Path> createTmpFile(String name, DistributedCacheEntry entry, JobID jobID, ExecutionAttemptID executionId) throws Exception {
        synchronized (lock) {
            Map<String, Future<Path>> jobEntries = entries.computeIfAbsent(jobID, k -> new HashMap<>());

            // register reference holder
            final Set<ExecutionAttemptID> refHolders = jobRefHolders.computeIfAbsent(jobID, id -> new HashSet<>());
            refHolders.add(executionId);

            Future<Path> fileEntry = jobEntries.get(name);
            if (fileEntry != null) {
                // file is already in the cache. return a future that
                // immediately returns the file
                return fileEntry;
            } else {
                // need to copy the file

                // create the target path
                File tempDirToUse = new File(storageDirectories[nextDirectory++], jobID.toString());
                if (nextDirectory >= storageDirectories.length) {
                    nextDirectory = 0;
                }

                // kick off the copying
                Callable<Path> cp;
                if (entry.blobKey != null) {
                    cp = new CopyFromBlobProcess(entry, jobID, blobService, new Path(tempDirToUse.getAbsolutePath()));
                } else {
                    // note: asynchronously copy from HDFS into a directory local to the TM
                    cp = new CopyFromDFSProcess(entry, new Path(tempDirToUse.getAbsolutePath()));
                }
                FutureTask<Path> copyTask = new FutureTask<>(cp);
                executorService.submit(copyTask);

                // store our entry
                jobEntries.put(name, copyTask);

                return copyTask;
            }
        }
    }
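The two ideas in createTmpFile, one shared copy per job and name plus round-robin selection of the storage directory, can be reproduced with the JDK alone. The sketch below is a simplified model, not Flink's FileCache: the class name FileCacheSketch is hypothetical, job IDs are plain strings, the source is a local java.nio.file.Path instead of an HDFS path, and reference counting and cleanup are omitted.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

/** Hypothetical, simplified model of a per-job file cache with background copying. */
final class FileCacheSketch {

    private final Object lock = new Object();
    // jobId -> (registered name -> future of the local copy)
    private final Map<String, Map<String, Future<Path>>> entries = new HashMap<>();
    private final Path[] storageDirectories;
    private final ExecutorService executorService;
    private int nextDirectory;

    FileCacheSketch(Path[] storageDirectories, ExecutorService executorService) {
        this.storageDirectories = storageDirectories.clone();
        this.executorService = executorService;
    }

    Future<Path> createTmpFile(String name, Path source, String jobId) {
        synchronized (lock) {
            Map<String, Future<Path>> jobEntries =
                entries.computeIfAbsent(jobId, k -> new HashMap<>());

            Future<Path> existing = jobEntries.get(name);
            if (existing != null) {
                // A copy was already started for this job; every task shares it.
                return existing;
            }

            // Pick the next storage directory in round-robin order.
            Path targetDir = storageDirectories[nextDirectory].resolve(jobId);
            nextDirectory = (nextDirectory + 1) % storageDirectories.length;

            // The copy runs on the executor; callers only receive the future.
            Callable<Path> cp = () -> {
                Files.createDirectories(targetDir);
                Path target = targetDir.resolve(source.getFileName());
                return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            };
            FutureTask<Path> copyTask = new FutureTask<>(cp);
            executorService.submit(copyTask);

            jobEntries.put(name, copyTask);
            return copyTask;
        }
    }
}
```

Storing the future rather than the finished path is what lets several tasks of the same job on one TM start concurrently without copying the file more than once.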

6. The operator reads the cached file in its open() method.

org.apache.flink.api.common.cache.DistributedCache#getFile
    public File getFile(String name) {
        // note: cacheCopyTasks is the Map<String, Future<Path>> distributedCacheEntries built in Task#doRun
        Future<Path> future = cacheCopyTasks.get(name);

        try {
            final Path path = future.get();
            URI tmp = path.makeQualified(path.getFileSystem()).toUri();
            return new File(tmp);
        }
        catch (ExecutionException e) {
            throw new RuntimeException("An error occurred while copying the file.", e.getCause());
        }
        catch (Exception e) {
            throw new RuntimeException("Error while getting the file registered under '" + name +
                    "' from the distributed cache", e);
        }
    }
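The lookup side can be modelled in a few lines to make its behaviour explicit: the call blocks until the background copy has finished, and a failed copy surfaces as a RuntimeException carrying the original cause. This is a sketch with a hypothetical class name (CacheLookupSketch) and java.nio.file.Path in place of Flink's Path. The explicit check for an unregistered name is an addition for clarity; in the excerpt above, an unknown name would instead fall into the generic catch block.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical model of resolving a registered name to a local file. */
final class CacheLookupSketch {

    private final Map<String, Future<Path>> cacheCopyTasks;

    CacheLookupSketch(Map<String, Future<Path>> cacheCopyTasks) {
        this.cacheCopyTasks = cacheCopyTasks;
    }

    File getFile(String name) {
        Future<Path> future = cacheCopyTasks.get(name);
        if (future == null) {
            // Added in this sketch: fail early with a clear message.
            throw new IllegalArgumentException(
                "No cached file registered under '" + name + "'.");
        }
        try {
            // Blocks until the background copy has completed.
            return future.get().toFile();
        } catch (ExecutionException e) {
            // Unwrap so callers see the real copy failure as the cause.
            throw new RuntimeException("An error occurred while copying the file.", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while waiting for '" + name + "'.", e);
        }
    }
}
```

Since open() is where the operator normally calls getFile, any waiting for the copy happens once during task initialisation rather than on the per-record path.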

Reposted from: https://www.jianshu.com/p/63eb0d8eb510
