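Flink Distributed Cache: How It Works and How to Use It
Background
In version 1.9.1, the distributed cache did not copy files stored on HDFS to the TaskManager (TM), and the job threw an exception at runtime.

After upgrading to 1.10.1 it works as expected. I took this as an opportunity to learn how Flink's distributed cache works.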
Definition
The official documentation defines the distributed cache as follows:
Flink offers a distributed cache, similar to Apache Hadoop, to make files locally accessible to parallel instances of user functions. This functionality can be used to share files that contain static external data such as dictionaries or machine-learned regression models.
The cache works as follows. A program registers a file or directory of a local or remote filesystem such as HDFS or S3 under a specific name in its ExecutionEnvironment as a cached file. When the program is executed, Flink automatically copies the file or directory to the local filesystem of all workers. A user function can look up the file or directory under the specified name and access it from the worker's local filesystem.
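In other words, a Flink program registers a local or HDFS file under a name. At runtime, Flink automatically copies that file to every TM, and each function can obtain the file through its registered name.
Usage
The usage example from the official documentation: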
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // register a file from HDFS
    env.registerCachedFile("hdfs:///path/to/your/file", "hdfsFile");

    // register a local executable file (script, executable, ...)
    env.registerCachedFile("file:///path/to/exec/file", "localExecFile", true);

    // define your program and execute
    ...
    DataStream<String> input = ...
    DataStream<Integer> result = input.map(new MyMapper());
    ...
    env.execute();

    ---------------------------------------------------------------

    // extend a RichFunction to have access to the RuntimeContext
    public final class MyMapper extends RichMapFunction<String, Integer> {

        @Override
        public void open(Configuration config) {
            // access cached file via RuntimeContext and DistributedCache
            File myFile = getRuntimeContext().getDistributedCache().getFile("hdfsFile");
            // read the file (or navigate the directory)
            ...
        }

        @Override
        public Integer map(String value) throws Exception {
            // use content of cached file
            ...
        }
    }
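The "read the file" step inside open is elided in the example above. As a minimal sketch of what it might look like, assume the cached file is a UTF-8 text file with one key=value pair per line. Both the file format and the helper class DictionaryLoader are assumptions made for illustration; neither comes from Flink. Only the JDK is used, so the File argument stands in for the one returned by getDistributedCache().getFile("hdfsFile").

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink.
final class DictionaryLoader {

    // Parses "key=value" lines into a map. Blank lines and lines without a key are skipped.
    // A non-numeric value makes Integer.parseInt throw NumberFormatException.
    static Map<String, Integer> load(File file) {
        List<String> lines;
        try {
            lines = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> dict = new HashMap<>();
        for (String line : lines) {
            int sep = line.indexOf('=');
            if (sep <= 0) {
                continue;
            }
            String key = line.substring(0, sep).trim();
            String value = line.substring(sep + 1).trim();
            dict.put(key, Integer.parseInt(value));
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file that the distributed cache would hand to open().
        File tmp = File.createTempFile("dict", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "apple=1\nbanana=2\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(tmp).get("banana"));
    }
}
```

In a RichMapFunction, the map returned by such a loader would typically be kept in a transient field that open fills once and map then reads.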
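Implementation flow
The walkthrough below follows the Flink 1.10.1 source code.
1. The file path and the registered name are written into the cacheFile list of StreamExecutionEnvironment.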
    protected final List<Tuple2<String, DistributedCache.DistributedCacheEntry>> cacheFile = new ArrayList<>();

    public void registerCachedFile(String filePath, String name, boolean executable) {
        this.cacheFile.add(new Tuple2<>(name, new DistributedCache.DistributedCacheEntry(filePath, executable)));
    }
2. When the StreamGraph is generated, cacheFile is passed to the StreamGraph's userArtifacts.
StreamGraphGenerator --> StreamGraph
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

    private StreamGraphGenerator getStreamGraphGenerator() {
        if (transformations.size() <= 0) {
            throw new IllegalStateException("No operators defined in streaming topology. Cannot execute.");
        }
        return new StreamGraphGenerator(transformations, config, checkpointCfg)
            .setStateBackend(defaultStateBackend)
            .setChaining(isChainingEnabled)
            .setUserArtifacts(cacheFile) // note: pass cacheFile along
            .setTimeCharacteristic(timeCharacteristic)
            .setDefaultBufferTimeout(bufferTimeout);
    }

org.apache.flink.streaming.api.graph.StreamGraphGenerator

    public StreamGraph generate() {
        streamGraph = new StreamGraph(executionConfig, checkpointConfig, savepointRestoreSettings);
        streamGraph.setStateBackend(stateBackend);
        streamGraph.setChaining(chaining);
        streamGraph.setScheduleMode(scheduleMode);
        streamGraph.setUserArtifacts(userArtifacts); // note: pass userArtifacts along
        streamGraph.setTimeCharacteristic(timeCharacteristic);
        streamGraph.setJobName(jobName);
        streamGraph.setBlockingConnectionsBetweenChains(blockingConnectionsBetweenChains);

        alreadyTransformed = new HashMap<>();

        for (Transformation<?> transformation : transformations) {
            transform(transformation);
        }

        final StreamGraph builtStreamGraph = streamGraph;

        alreadyTransformed.clear();
        alreadyTransformed = null;
        streamGraph = null;

        return builtStreamGraph;
    }
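3. When the JobGraph is generated, the StreamGraph's userArtifacts are passed to the JobGraph's userArtifacts. If a cached file is a local directory, it is compressed into a .zip file in a temporary directory on the client, and the entry is registered with that new path.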
org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator

    private JobGraph createJobGraph() {
        ...
        JobGraphGenerator.addUserArtifactEntries(streamGraph.getUserArtifacts(), jobGraph);
        ...
        return jobGraph;
    }

JobGraphGenerator

    public static void addUserArtifactEntries(
            Collection<Tuple2<String, DistributedCache.DistributedCacheEntry>> userArtifacts,
            JobGraph jobGraph) {
        if (userArtifacts != null && !userArtifacts.isEmpty()) {
            try {
                java.nio.file.Path tmpDir = Files.createTempDirectory("flink-distributed-cache-" + jobGraph.getJobID());
                for (Tuple2<String, DistributedCache.DistributedCacheEntry> originalEntry : userArtifacts) {
                    Path filePath = new Path(originalEntry.f1.filePath);
                    boolean isLocalDir = false;
                    try {
                        FileSystem sourceFs = filePath.getFileSystem();
                        isLocalDir = !sourceFs.isDistributedFS() && sourceFs.getFileStatus(filePath).isDir();
                    } catch (IOException ioe) {
                        LOG.warn("Could not determine whether {} denotes a local path.", filePath, ioe);
                    }
                    // zip local directories because we only support file uploads
                    DistributedCache.DistributedCacheEntry entry;
                    if (isLocalDir) {
                        // note: compress the local directory and use the path of the resulting zip file
                        Path zip = FileUtils.compressDirectory(filePath, new Path(tmpDir.toString(), filePath.getName() + ".zip"));
                        entry = new DistributedCache.DistributedCacheEntry(zip.toString(), originalEntry.f1.isExecutable, true);
                    } else {
                        entry = new DistributedCache.DistributedCacheEntry(filePath.toString(), originalEntry.f1.isExecutable, false);
                    }
                    jobGraph.addUserArtifact(originalEntry.f0, entry);
                }
            } catch (IOException ioe) {
                throw new FlinkRuntimeException("Could not compress distributed-cache artifacts.", ioe);
            }
        }
    }
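To make the "zip local directories" step concrete, here is a rough sketch of compressing a directory into a single archive with java.util.zip. It is not Flink's FileUtils.compressDirectory implementation; the class DirZipper and its method are hypothetical, and the sketch assumes the target archive lives outside the directory being compressed (as it does in step 3, where a separate temporary directory is used).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not Flink code.
final class DirZipper {

    // Writes every regular file under dir into the zip at target.
    // Entry names are paths relative to dir, with '/' as separator.
    static Path zip(Path dir, Path target) {
        try (OutputStream out = Files.newOutputStream(target);
             ZipOutputStream zos = new ZipOutputStream(out);
             Stream<Path> walk = Files.walk(dir)) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            for (Path file : files) {
                String entryName = dir.relativize(file).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zos);
                zos.closeEntry();
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cache-src");
        Files.write(dir.resolve("model.txt"), "weights".getBytes(StandardCharsets.UTF_8));
        // The archive goes into a different temporary directory than the source.
        Path tmpDir = Files.createTempDirectory("flink-distributed-cache-sketch");
        Path zip = zip(dir, tmpDir.resolve(dir.getFileName() + ".zip"));
        System.out.println("zipped to " + zip);
    }
}
```

Shipping one archive instead of a directory tree means the later upload and copy steps only ever deal with single files, which matches the "we only support file uploads" comment in the source.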
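4. When the JobGraph is deployed in YARN per-job mode, a local file (including the zip produced in step 3) is uploaded and its resulting HDFS path is recorded in the JobGraph. If the cached file is a path that already exists on HDFS, it is written to the configuration as is.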
org.apache.flink.yarn.YarnClusterDescriptor

    // only for per job mode
    if (jobGraph != null) {
        for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry : jobGraph.getUserArtifacts().entrySet()) {
            org.apache.flink.core.fs.Path path = new org.apache.flink.core.fs.Path(entry.getValue().filePath);
            // only upload local files
            // note: upload the local file; the returned HDFS path is stored in the JobGraph's userArtifacts
            if (!path.getFileSystem().isDistributedFS()) {
                Path localPath = new Path(path.getPath());
                Tuple2<Path, Long> remoteFileInfo =
                    Utils.uploadLocalFileToRemote(fs, appId.toString(), localPath, homeDir, entry.getKey());
                jobGraph.setUserArtifactRemotePath(entry.getKey(), remoteFileInfo.f0.toString());
            }
        }
        // write the distributed cache file information into the Configuration
        jobGraph.writeUserArtifactEntriesToConfiguration();
    }

DistributedCache

    public static void writeFileInfoToConfig(String name, DistributedCacheEntry e, Configuration conf) {
        int num = conf.getInteger(CACHE_FILE_NUM, 0) + 1;
        conf.setInteger(CACHE_FILE_NUM, num);
        conf.setString(CACHE_FILE_NAME + num, name);
        // note: the key is CACHE_FILE_PATH + num; numbering starts at 1, e.g. DISTRIBUTED_CACHE_FILE_PATH_1
        conf.setString(CACHE_FILE_PATH + num, e.filePath);
        conf.setBoolean(CACHE_FILE_EXE + num, e.isExecutable || new File(e.filePath).canExecute());
        conf.setBoolean(CACHE_FILE_DIR + num, e.isZipped || new File(e.filePath).isDirectory());
        if (e.blobKey != null) {
            conf.setBytes(CACHE_FILE_BLOB_KEY + num, e.blobKey);
        }
    }
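The numbered-key scheme used by writeFileInfoToConfig is easier to see in isolation. The sketch below models it with a plain Map<String, String> standing in for Flink's Configuration, and adds a matching reader in the spirit of readFileInfoFromConfig. The class CacheConfigSketch and the key strings are illustrative placeholders, not Flink's actual constant values, and only name and path are modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the numbered-key layout; a Map stands in for Flink's Configuration.
final class CacheConfigSketch {

    // Placeholder key names chosen for this sketch.
    static final String NUM = "cache.file.num";
    static final String NAME = "cache.file.name.";
    static final String PATH = "cache.file.path.";

    // Appends one entry: bump the counter, then store name and path under the new index.
    static void write(String name, String path, Map<String, String> conf) {
        int num = Integer.parseInt(conf.getOrDefault(NUM, "0")) + 1;
        conf.put(NUM, Integer.toString(num));
        conf.put(NAME + num, name);
        conf.put(PATH + num, path);
    }

    // Reads all entries back by walking the indices 1..num.
    static Map<String, String> read(Map<String, String> conf) {
        int num = Integer.parseInt(conf.getOrDefault(NUM, "0"));
        Map<String, String> entries = new LinkedHashMap<>();
        for (int i = 1; i <= num; i++) {
            entries.put(conf.get(NAME + i), conf.get(PATH + i));
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        write("hdfsFile", "hdfs:///path/to/your/file", conf);
        write("localExecFile", "file:///path/to/exec/file", conf);
        System.out.println(read(conf));
    }
}
```

Flattening the entries into indexed string keys lets them travel inside the job configuration, so the TM needs nothing beyond that configuration to know which files to fetch.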
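5. When a Task runs, it first reads the cached file information and passes it to the RuntimeEnvironment, so that files can later be obtained by their registered names.
   (a) Read the cached file paths from the job configuration.
   (b) Create a temporary directory, copy the cached file from HDFS to the current TM asynchronously, and keep the resulting local path in memory. The temporary directory has the form flink-dist-cache-uuid/jobId/.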
org.apache.flink.runtime.taskmanager.Task

    private void doRun() {
        .......
        // all resource acquisitions and registrations from here on
        // need to be undone in the end
        Map<String, Future<Path>> distributedCacheEntries = new HashMap<>();

        // next, kick off the background copying of files for the distributed cache
        try {
            for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
                    DistributedCache.readFileInfoFromConfig(jobConfiguration)) {
                LOG.info("Obtaining local cache file for '{}'.", entry.getKey());
                Future<Path> cp = fileCache.createTmpFile(entry.getKey(), entry.getValue(), jobId, executionId);
                distributedCacheEntries.put(entry.getKey(), cp);
            }
        }
        catch (Exception e) {
            throw new Exception(
                String.format("Exception while adding files to distributed cache of task %s (%s).", taskNameWithSubtask, executionId), e);
        }

        Environment env = new RuntimeEnvironment(
            jobId,
            vertexId,
            executionId,
            executionConfig,
            taskInfo,
            jobConfiguration,
            taskConfiguration,
            userCodeClassLoader,
            memoryManager,
            ioManager,
            broadcastVariableManager,
            taskStateManager,
            aggregateManager,
            accumulatorRegistry,
            kvStateRegistry,
            inputSplitProvider,
            distributedCacheEntries, // note: the cache entries are handed to the RuntimeEnvironment here
            consumableNotifyingPartitionWriters,
            inputGates,
            taskEventDispatcher,
            checkpointResponder,
            taskManagerConfig,
            metrics,
            this);
    }

org.apache.flink.runtime.filecache.FileCache

    public Future<Path> createTmpFile(String name, DistributedCacheEntry entry, JobID jobID, ExecutionAttemptID executionId) throws Exception {
        synchronized (lock) {
            Map<String, Future<Path>> jobEntries = entries.computeIfAbsent(jobID, k -> new HashMap<>());

            // register reference holder
            final Set<ExecutionAttemptID> refHolders = jobRefHolders.computeIfAbsent(jobID, id -> new HashSet<>());
            refHolders.add(executionId);

            Future<Path> fileEntry = jobEntries.get(name);
            if (fileEntry != null) {
                // file is already in the cache. return a future that
                // immediately returns the file
                return fileEntry;
            } else {
                // need to copy the file

                // create the target path
                File tempDirToUse = new File(storageDirectories[nextDirectory++], jobID.toString());
                if (nextDirectory >= storageDirectories.length) {
                    nextDirectory = 0;
                }

                // kick off the copying
                Callable<Path> cp;
                if (entry.blobKey != null) {
                    cp = new CopyFromBlobProcess(entry, jobID, blobService, new Path(tempDirToUse.getAbsolutePath()));
                } else {
                    cp = new CopyFromDFSProcess(entry, new Path(tempDirToUse.getAbsolutePath()));
                }
                FutureTask<Path> copyTask = new FutureTask<>(cp);
                executorService.submit(copyTask);

                // store our entry
                jobEntries.put(name, copyTask);

                return copyTask;
            }
        }
    }
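The core idea in createTmpFile is that the copy for a given job and name is started at most once, and every caller receives the same Future. The following JDK-only sketch isolates that pattern. CopyOnceCache is a hypothetical class, strings stand in for JobID and Path, and the "copy" is simulated by computing a local path rather than transferring any data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the copy-once pattern; not Flink's FileCache.
final class CopyOnceCache {

    private final Object lock = new Object();
    // jobId -> (registered name -> future holding the local path)
    private final Map<String, Map<String, Future<String>>> entries = new HashMap<>();
    // Daemon thread so the sketch never keeps the JVM alive.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "copy-sketch");
        t.setDaemon(true);
        return t;
    });
    // Counts how many simulated copies actually ran.
    final AtomicInteger copies = new AtomicInteger();

    Future<String> createTmpFile(String jobId, String name, String remotePath) {
        synchronized (lock) {
            Map<String, Future<String>> jobEntries = entries.computeIfAbsent(jobId, k -> new HashMap<>());
            Future<String> existing = jobEntries.get(name);
            if (existing != null) {
                // A copy is already running or finished: share its future.
                return existing;
            }
            Callable<String> copy = () -> {
                copies.incrementAndGet();
                String fileName = remotePath.substring(remotePath.lastIndexOf('/') + 1);
                // Simulated result: the local path the file would be copied to.
                return "flink-dist-cache-sketch/" + jobId + "/" + fileName;
            };
            FutureTask<String> task = new FutureTask<>(copy);
            executor.submit(task);
            jobEntries.put(name, task);
            return task;
        }
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        CopyOnceCache cache = new CopyOnceCache();
        Future<String> first = cache.createTmpFile("job1", "hdfsFile", "hdfs:///path/to/dict.txt");
        Future<String> second = cache.createTmpFile("job1", "hdfsFile", "hdfs:///path/to/dict.txt");
        System.out.println(first.get() + " shared=" + (first == second) + " copies=" + cache.copies.get());
        cache.shutdown();
    }
}
```

Because the map stores the future rather than the finished path, a second task on the same TM can attach to a copy that is still in progress instead of starting its own.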
6. The operator reads the cached file in its open function.
org.apache.flink.api.common.cache.DistributedCache

    public File getFile(String name) {
        // note: cacheCopyTasks is the Map<String, Future<Path>> distributedCacheEntries built in Task
        Future<Path> future = cacheCopyTasks.get(name);
        try {
            final Path path = future.get();
            URI tmp = path.makeQualified(path.getFileSystem()).toUri();
            return new File(tmp);
        }
        catch (ExecutionException e) {
            throw new RuntimeException("An error occurred while copying the file.", e.getCause());
        }
        catch (Exception e) {
            throw new RuntimeException("Error while getting the file registered under '" + name +
                "' from the distributed cache", e);
        }
    }
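Since getFile blocks on future.get(), a failed background copy only becomes visible when the operator asks for the file. The sketch below shows that behaviour with plain JDK futures. CacheLookupSketch is hypothetical, strings stand in for Path and File, and the explicit check for an unknown name is an addition of this sketch rather than something shown in the excerpt above.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical model of a lookup by registered name; not Flink code.
final class CacheLookupSketch {

    // Blocks until the copy for the given name has finished, then returns the local path.
    static String getFile(Map<String, Future<String>> copyTasks, String name) {
        Future<String> future = copyTasks.get(name);
        if (future == null) {
            // Assumption of this sketch: fail fast for names that were never registered.
            throw new IllegalArgumentException("No file registered under '" + name + "'.");
        }
        try {
            return future.get();
        } catch (ExecutionException e) {
            // The copy itself failed: surface the original cause.
            throw new RuntimeException("An error occurred while copying the file.", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while waiting for '" + name + "'.", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Future<String>> tasks = new HashMap<>();
        tasks.put("hdfsFile", CompletableFuture.completedFuture("/local/copy/of/file"));

        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IOException("copy failed"));
        tasks.put("brokenFile", failed);

        System.out.println(getFile(tasks, "hdfsFile"));
        try {
            getFile(tasks, "brokenFile");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage() + " cause=" + e.getCause().getMessage());
        }
    }
}
```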
