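I have been auditing Java code for a while now and have picked up an assortment of small tricks along the way, but the knowledge never felt systematic. So it seemed worth pausing, stepping back from the business logic, and thinking more carefully about the root causes behind these vulnerabilities.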

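Preface

When it comes to URL parsing, anyone who follows web security has probably read Orange's A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages^[1], which analyzes the URL parsers of different languages in considerable detail. That talk focuses on how different parsers handle the host part of a URL, in order to bypass SSRF protections and enable post-exploitation.

This article has a different focus: the path part of the URL. The path usually drives the routing of resources and services, and the authorization checks attached to them. Most of us have collected some authorization bypass "tricks" during vulnerability research and penetration testing, often without understanding why they work, which leaves a nagging sense that something is missing after every test. Do these bypass payloads cover every scenario? Are there other possible variants? It is hard to say for certain.

Hence this article. It records and organizes the authorization bypass techniques I have come across, and it also tries to analyze the principles behind them. I hope it proves useful.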

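Threat Model

This article mainly uses Java EE web frameworks as examples, because Java has a large user base and web ecosystem in China. I believe the same reasoning carries over to other web stacks such as PHP and Node.js.

Recalling the request flow of a Java web application described in the earlier article Java 安全研究初探 (A First Look at Java Security Research)^[2], the flow can be abstracted as the following chain: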

Client -> Filter_1 -> Filter_2 -> ... -> Filter_n -> Servlet

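That is, a client request passes through one or more Filters before reaching the Servlet that actually handles it. Filters usually run as a cascade called a FilterChain, and for the sake of loose coupling developers tend to put important authorization logic in a Filter. If authentication fails, the request can be terminated early, which is transparent to the Servlet.

So an authorization bypass is, in many cases, a logic error in the authorization Filter. Authorization in a Filter is also mostly done at URL granularity, since a site always has front-end pages that need no authentication (such as the login page) and back-end services that do (such as an admin console).

In code, Filters mostly use HttpServletRequest.getRequestURI()^[3] to decide whether a path requires authorization. If it does, the Filter checks the session and authenticates; requests that fail get a 403 response or a 302 redirect to the login page.

Bypassing the URL check in the Filter is only the first step. The more important step is making the malformed URL still resolve to the correct Servlet so that the request is handled properly. This article therefore explores URL parsing from both of these angles.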

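Servlet Containers

First, be clear that however fancy a crafted URL is, it is pointless if the path does not resolve to the correct Servlet: all you get is a 404. So we start with Servlet lookup and try to mutate URLs in ways that still resolve normally.

URL routing is implemented differently in each Servlet container; only two containers I have been reading recently are analyzed here.

Tomcat

First is Tomcat, the most common container. Many articles already analyze the Tomcat source, so we only need to focus on how a URL is mapped to application code. The simplest approach is to set a breakpoint in a Servlet with a debugger and trace backwards through the relevant source. The application code is: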

@WebServlet(name = "flagServlet", value = "/api/flag")
public class FlagServlet extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
        PrintWriter out = response.getWriter();
        out.println(String.format("YOU WIN: [%s]", request.getRequestURI()));
    }
}

On a successful request the Servlet returns the value of RequestURI. With a breakpoint in doGet, requesting /api/flag produces the following call stack:

doGet:17, FlagServlet (com.example.tc)
service:529, HttpServlet (javax.servlet.http)
service:623, HttpServlet (javax.servlet.http)
internalDoFilter:209, ApplicationFilterChain (org.apache.catalina.core)
doFilter:153, ApplicationFilterChain (org.apache.catalina.core)
doFilter:51, WsFilter (org.apache.tomcat.websocket.server)
internalDoFilter:178, ApplicationFilterChain (org.apache.catalina.core)
doFilter:153, ApplicationFilterChain (org.apache.catalina.core)
doFilter:20, AdminFilter (com.example.tc)
doFilter:53, HttpFilter (javax.servlet.http)
internalDoFilter:178, ApplicationFilterChain (org.apache.catalina.core)
doFilter:153, ApplicationFilterChain (org.apache.catalina.core)
invoke:167, StandardWrapperValve (org.apache.catalina.core)
invoke:90, StandardContextValve (org.apache.catalina.core)
invoke:481, AuthenticatorBase (org.apache.catalina.authenticator)
invoke:130, StandardHostValve (org.apache.catalina.core)
invoke:93, ErrorReportValve (org.apache.catalina.valves)
invoke:673, AbstractAccessLogValve (org.apache.catalina.valves)
invoke:74, StandardEngineValve (org.apache.catalina.core)
service:343, CoyoteAdapter (org.apache.catalina.connector)
service:390, Http11Processor (org.apache.coyote.http11)
process:63, AbstractProcessorLight (org.apache.coyote)
process:926, AbstractProtocol$ConnectionHandler (org.apache.coyote)
doRun:1791, NioEndpoint$SocketProcessor (org.apache.tomcat.util.net)
run:52, SocketProcessorBase (org.apache.tomcat.util.net)
runWorker:1191, ThreadPoolExecutor (org.apache.tomcat.util.threads)
run:659, ThreadPoolExecutor$Worker (org.apache.tomcat.util.threads)
run:61, TaskThread$WrappingRunnable (org.apache.tomcat.util.threads)
run:829, Thread (java.lang)

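Note: the Tomcat version used here is 9.0.78, the latest Tomcat 9 release at the time of writing. Tomcat 9 was chosen over Tomcat 10 because it reportedly still has more deployed applications.

CoyoteAdapter

AdminFilter is a Filter I wrote myself and can be ignored for now. Tomcat has already determined the routing target before entering the first Filter. Note CoyoteAdapter in the call stack above; the actual method is org.apache.catalina.connector.CoyoteAdapter#service: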

@Override
public void service(org.apache.coyote.Request req, org.apache.coyote.Response res) throws Exception {
    // ...
    postParseSuccess = postParseRequest(req, request, res, response);
    if (postParseSuccess) {
        // Calling the container
        connector.getService().getContainer().getPipeline().getFirst().invoke(request, response);
    }
}

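postParseRequest is called after the HTTP request headers have been received and before the body is read, and this is where the URL mapping is handled. In Tomcat the route for a request is stored in request.mappingData. For the request above, the following are called in order:

• postParseRequest
• org.apache.catalina.mapper.Mapper#map
• org.apache.catalina.mapper.Mapper#internalMap

The key part of internalMap is: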

private void internalMap(CharChunk host, CharChunk uri, String version, MappingData mappingData) throws IOException {
    if (uri.isNull()) {
        // Can't map context or wrapper without a uri
        return;
    }
    uri.setLimit(-1);

    // Context mapping
    ContextList contextList = mappedHost.contextList;
    MappedContext[] contexts = contextList.contexts;
    int pos = find(contexts, uri);
    if (pos == -1) {
        return;
    }
    // Wrapper mapping
    if (!contextVersion.isPaused()) {
        internalMapWrapper(contextVersion, uri, mappingData);
    }

}

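The contexts array contains a series of org.apache.catalina.mapper.Mapper.MappedContext#MappedContext entries, each representing a URL routing context, or put simply a URL prefix. One Context prefix usually represents one Java EE application. Tomcat also registers several default applications, whose prefixes (ContextRoot) are:

• /docs
• /examples
• /manager
• /host-manager

The URI in our request is /api/flag and the application is deployed with an empty ContextRoot, so our application is the one matched. A complete URL looks like this: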

http://host:port/context-root[/url-pattern]

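With a ContextRoot of /demo, the example Servlet would be mapped to /demo/api/flag.

Mapper

Once the context is located, the next step is to find the specific route within it. This process is called "Wrapper mapping"; a Wrapper can be thought of as a wrapper around a Servlet. The method called is org.apache.catalina.mapper.Mapper#internalMapWrapper.

The key code is: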

private void internalMapWrapper(ContextVersion contextVersion, CharChunk path, MappingData mappingData) throws IOException {
    // Rule 1 -- Exact Match
    MappedWrapper[] exactWrappers = contextVersion.exactWrappers;
    internalMapExactWrapper(exactWrappers, path, mappingData);
    // Rule 2 -- Prefix Match
    MappedWrapper[] wildcardWrappers = contextVersion.wildcardWrappers;
    if (mappingData.wrapper == null) {
        internalMapWildcardWrapper(wildcardWrappers, contextVersion.nesting, path, mappingData);
    }
    // Rule 3 -- Extension Match
    MappedWrapper[] extensionWrappers = contextVersion.extensionWrappers;
    if (mappingData.wrapper == null && !checkJspWelcomeFiles) {
        internalMapExtensionWrapper(extensionWrappers, path, mappingData, true);
    }
    // Rule 4 -- Welcome resources processing for servlets
        // Rule 4a -- Welcome resources processing for exact macth
        // Rule 4b -- Welcome resources processing for prefix match
        // Rule 4c -- Welcome resources processing

    /*
    * welcome file processing - take 2 Now that we have looked for welcome files with a physical backing, now look
    * for an extension mapping listed but may not have a physical backing to it. This is for the case of index.jsf,
    * index.do, etc. A watered down version of rule 4
    */

    // Rule 7 -- Default servlet
}

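Before going deeper into the code, a short detour into the Java EE Servlet specification.

Servlet Specification

As briefly mentioned in the earlier article, Servlet mapping lookup is implemented according to chapter 12, Mapping Requests to Servlets, of JSR 340: Java Servlet 3.1 Specification^[4]. I summarize the key lookup logic as follows:

1. First try to find an exact match for the Servlet route;
2. Recursively look for the longest prefix match, using / as the separator for each level of the directory tree, so the unit of prefix matching is a directory;
3. If the last segment of the URL path contains an extension, the container tries to match a Servlet by extension, for example JspServlet for the .jsp extension;
4. If none of the rules above match, the container tries to serve the resource that corresponds to the requested URL, which usually means a default Servlet shipped with the container.

The specification also mentions several points worth noting:

• The ContextRoot is also matched by longest prefix;
• All URL matching is case-sensitive;

For the <url-pattern> in a mapping, the following rules apply:

• A value that starts with / and ends with /* is used for prefix mapping;
• A value that starts with *. is used for extension mapping;
• The empty string is a special value that maps to the context root;
• A value consisting only of / denotes the default Servlet of the application;
• All other values are treated as exact matches;

So for the different mappings defined in an application, the best match can be found by following the lookup logic in order.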
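The lookup order summarized above can be sketched in a few lines of Java. This is a simplified model rather than any container's code: the class name ServletMappingSketch and its methods are made up for illustration, the path is assumed to be context-relative and already normalized, and the special empty-string pattern is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the Servlet 3.1 mapping rules; not container code. */
class ServletMappingSketch {
    // Each entry is {urlPattern, servletName}.
    private final List<String[]> mappings = new ArrayList<>();

    void add(String urlPattern, String servletName) {
        mappings.add(new String[] {urlPattern, servletName});
    }

    private static boolean isPrefix(String p) {
        return p.startsWith("/") && p.endsWith("/*");
    }

    private static boolean isExtension(String p) {
        return p.startsWith("*.");
    }

    /** Resolves a context-relative, already normalized path to a servlet name. */
    String resolve(String path) {
        // Rule 1: exact match (case-sensitive).
        for (String[] m : mappings) {
            boolean exact = !isPrefix(m[0]) && !isExtension(m[0]) && !m[0].equals("/");
            if (exact && m[0].equals(path)) {
                return m[1];
            }
        }
        // Rule 2: longest prefix match, one directory level at a time.
        String best = null;
        int bestLength = -1;
        for (String[] m : mappings) {
            if (!isPrefix(m[0])) {
                continue;
            }
            String prefix = m[0].substring(0, m[0].length() - 2); // drop "/*"
            boolean hit = path.equals(prefix) || path.startsWith(prefix + "/");
            if (hit && prefix.length() > bestLength) {
                best = m[1];
                bestLength = prefix.length();
            }
        }
        if (best != null) {
            return best;
        }
        // Rule 3: extension match on the last path segment.
        int dot = path.lastIndexOf('.');
        if (dot > path.lastIndexOf('/')) {
            String pattern = "*" + path.substring(dot);
            for (String[] m : mappings) {
                if (m[0].equals(pattern)) {
                    return m[1];
                }
            }
        }
        // Rule 4: the default servlet, if one is mapped to "/".
        for (String[] m : mappings) {
            if (m[0].equals("/")) {
                return m[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ServletMappingSketch mapper = new ServletMappingSketch();
        mapper.add("/api/flag", "flagServlet");
        mapper.add("/admin/*", "adminServlet");
        mapper.add("*.jsp", "jsp");
        mapper.add("/", "default");
        for (String path : new String[] {"/api/flag", "/admin/a.jsp", "/index.jsp", "/API/flag"}) {
            System.out.println(path + " -> " + mapper.resolve(path));
        }
    }
}
```

Note that /admin/a.jsp resolves to the prefix mapping rather than the extension mapping, because rule 2 is tried before rule 3.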

Wrappers

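Detour over; back to the internalMapWrapper code above. In lookup order, Tomcat matches against the following Wrappers:

1. exactWrappers: Wrappers for exact matching, such as FlagServlet with the value /api/flag;
2. wildcardWrappers: wildcard Wrappers for (longest) prefix matching, such as a Servlet with the value /admin/*;
3. extensionWrappers: Wrappers for extension matching, such as a Servlet with the value *.txt. Tomcat has two by default, jsp and jspx, both mapped to org.apache.jasper.servlet.JspServlet;
4. welcomeResources: if the last character of the URL is /, Tomcat tries to match a welcome page, by default index.html, index.htm and index.jsp; welcome pages without a physical file, such as index.do or index.jsf, are looked up among built-in resources by extension matching;
5. defaultWrapper: if nothing above matched, Tomcat's default Servlet handles the request. The class is org.apache.catalina.servlets.DefaultServlet, which serves files from disk or from inside jar packages.

Broadly, each Servlet corresponds to one Wrapper instance, and looking up a Wrapper by URL is the same as looking up the Servlet route.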

decodeURI

Note that the uri used during Wrapper lookup has already been processed, again in CoyoteAdapter#postParseRequest:

protected boolean postParseRequest(){
    MessageBytes decodedURI = req.decodedURI();
    // Parse (and strip out) the path parameters
    parsePathParameters(req, request);
    // URI decoding
    // %xx decoding of the URL
    req.getURLDecoder().convert(decodedURI.getByteChunk(),
            connector.getEncodedSolidusHandlingInternal());
    // Normalization
    if (normalize(req.decodedURI())) {
        // Character decoding
        convertURI(decodedURI, request);
    }
}

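decodedURI is just a reference to req.decodedUriMB. Later processing sometimes passes req instead of decodedURI directly, but the effect is always to modify decodedURI.

Step one, parsePathParameters, parses path parameters, which are covered in more detail in the discussion of the URL standard later. Path parameters apply to each level of the URL path, as in /api;a=b/flag;c=d: a semicolon ; introduces them and an equals sign = separates key and value. After parsing, they are added to the request via Request.addPathParameter and removed from decodedURI.

Step two, URL decoding: ordinary percent-decoding.

Step three, normalization, mainly handles URLs that contain "\", "//", "/./" and "/../". It works as follows:

1. Replace \ with /;
2. Replace // with /;
3. For a URI that ends with /. or /.., first append an extra / at the end;
4. Recursively resolve /./ in the URI, replacing it with /;
5. Recursively resolve /../ in the URI, moving up one directory each time;

While resolving /../, going above the root directory returns false directly. Some other errors also return false, for example when CoyoteAdapter#ALLOW_BACKSLASH is false but the path contains a backslash, or when the path does not begin with / or \.

Step four, character decoding, uses convertURI to decode the path into characters. Up to this point the path is stored as a ByteChunk; this step converts it to a CharChunk.

Only after all of these steps is the final URI used for Servlet route lookup.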

Bypass Tricks

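From this we get a series of "bypass tricks", that is, different ways to route to the same Servlet. With a target route of /api/flag, all of the following requests reach it:

• /api;a=b/flag: mutation via path parameters;
• /api/%66%6C%61%67: mutation via URL encoding;
• \api\flag: mutation via normalization step 1; this requires Tomcat to be configured with ALLOW_BACKSLASH set to true;
• //api/flag: mutation via normalization step 2;
• /api/./flag: mutation via normalization step 4;
• /foo/../api/flag: mutation via normalization step 5;

These mutations can be combined with each other, and combining them with DefaultServlet's routing of disk files and resources yields still more URIs.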
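To check variants like the ones above without a running container, the processing order (strip path parameters, percent-decode, normalize) can be modeled in plain Java. This is only a sketch under stated assumptions: TomcatPathSketch and its method names are invented for illustration, the raw path is assumed to be ASCII, a rejected path is represented by null, and details of the real implementation such as encoded-slash handling are not modeled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model of the decode/normalize order described above; not Tomcat code. */
class TomcatPathSketch {

    /** Returns the path used for routing, or null if the request would be rejected. */
    static String canonicalize(String rawPath, boolean allowBackslash) {
        String path = stripPathParameters(rawPath);
        path = percentDecode(path);
        return normalize(path, allowBackslash);
    }

    /** Drops everything from ';' up to the next '/' in every segment. */
    static String stripPathParameters(String path) {
        StringBuilder sb = new StringBuilder();
        boolean inParameter = false;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == ';') {
                inParameter = true;
            } else if (c == '/') {
                inParameter = false;
            }
            if (!inParameter) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Plain %xx decoding; the input is assumed to be ASCII. */
    static String percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '%') {
                out.write(c);
                continue;
            }
            if (i + 2 >= s.length()) {
                throw new IllegalArgumentException("truncated escape");
            }
            int hi = Character.digit(s.charAt(i + 1), 16);
            int lo = Character.digit(s.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("bad escape");
            }
            out.write((hi << 4) | lo);
            i += 2;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Collapses "\", "//", "/./" and "/../"; null means "reject". */
    static String normalize(String path, boolean allowBackslash) {
        if (path.indexOf('\\') >= 0) {
            if (!allowBackslash) {
                return null;
            }
            path = path.replace('\\', '/');
        }
        if (!path.startsWith("/")) {
            return null;
        }
        boolean directory = path.endsWith("/") || path.endsWith("/.") || path.endsWith("/..");
        Deque<String> segments = new ArrayDeque<>();
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue;
            }
            if (segment.equals("..")) {
                if (segments.isEmpty()) {
                    return null; // would climb above the root
                }
                segments.removeLast();
            } else {
                segments.addLast(segment);
            }
        }
        String result = "/" + String.join("/", segments);
        return directory && !segments.isEmpty() ? result + "/" : result;
    }

    public static void main(String[] args) {
        String[] variants = {"/api;a=b/flag", "/api/%66%6C%61%67", "//api/flag",
                             "/api/./flag", "/foo/../api/flag"};
        for (String v : variants) {
            System.out.println(v + " -> " + canonicalize(v, false));
        }
    }
}
```

Because path parameters are stripped before decoding in this model, an encoded semicolon (%3b) is not treated as a parameter delimiter; the order of the steps matters as much as the steps themselves.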

Resin

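Tomcat is relatively common, but many other web containers are used in practice. To get a second perspective, Resin is chosen here for comparison, to see how the two differ in URL handling and routing. Resin was picked simply because I happened to be reading it recently.

The Resin version used here is 4.0.58.

Same approach as before: set a breakpoint in the Servlet and observe Resin's call stack: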

doGet:17, FlagServlet (com.example.resin)
service:120, HttpServlet (javax.servlet.http)
service:97, HttpServlet (javax.servlet.http)
doFilter:109, ServletFilterChain (com.caucho.server.dispatch)
doFilter:21, AdminFilter (com.example.resin)
doFilter:127, HttpFilter (javax.servlet.http)
doFilter:89, FilterFilterChain (com.caucho.server.dispatch)
doFilter:156, WebAppFilterChain (com.caucho.server.webapp)
doFilter:95, AccessLogFilterChain (com.caucho.server.webapp)
service:304, ServletInvocation (com.caucho.server.dispatch)
handleRequest:840, HttpRequest (com.caucho.server.http)
dispatchRequest:1367, TcpSocketLink (com.caucho.network.listen)
handleRequest:1323, TcpSocketLink (com.caucho.network.listen)
handleRequestsImpl:1307, TcpSocketLink (com.caucho.network.listen)
handleRequests:1215, TcpSocketLink (com.caucho.network.listen)
handleAcceptTaskImpl:1011, TcpSocketLink (com.caucho.network.listen)
runThread:117, ConnectionTask (com.caucho.network.listen)
run:93, ConnectionTask (com.caucho.network.listen)
handleTasks:175, SocketLinkThreadLauncher (com.caucho.network.listen)
run:61, TcpSocketAcceptThread (com.caucho.network.listen)
runTasks:173, ResinThread2 (com.caucho.env.thread2)
run:118, ResinThread2 (com.caucho.env.thread2)

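The upper frames are FilterChain calls, which invoke each registered Filter in turn. The call just before the first Filter is the point at which the routing we care about has been resolved. The method is com.caucho.server.http.HttpRequest#handleRequest, and the calling code is roughly: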

public boolean handleRequest() throws IOException {
  startRequest();
  if (! parseRequest()) {
      return false;
  }
  CharSequence host = getInvocationHost();
  Invocation invocation = getInvocation(host, _uri, _uriLength);

  startInvocation();
  invocation.service(requestFacade, getResponseFacade());
}

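Here, startRequest reads attributes such as _uri, _headerKeys and _headerValues; parseRequest parses the request method, URI, HTTP protocol and the subsequent HTTP headers from the buffer line by line and stores them.

The URI is read in readRequest, that is, while reading the first line of the HTTP request. The relevant code fragment is: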

// skip 'http:'
if (ch != '/') {
}
// read URI
while (! isHttpWhitespace[ch]) {
  uriBuffer[uriLength++] = (byte) ch;
  if (readTail <= readOffset) {
    uriBuffer = _uri;
    uriLength = _uriLength;
  }
  ch = readBuffer[readOffset++] & 0xff;
}

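If the first character is not /, the code skips ahead to the URL path and reads from there.

Invocation

The data structure in Resin that corresponds to Tomcat's Wrapper is called Invocation; obtaining the Invocation means obtaining the route mapping of the Servlet. So we focus on the implementation of getInvocation, whose call sequence is: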

• com.caucho.server.http.AbstractHttpRequest#getInvocation
  • com.caucho.server.dispatch.InvocationServer#getInvocation
    • com.caucho.util.LruCache#get
  • com.caucho.server.http.AbstractHttpRequest#buildInvocation

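In short, AbstractHttpRequest#getInvocation first tries to fetch the Invocation object from the InvocationServer._invocationCache cache; if it is absent, buildInvocation creates one and puts it in the cache. _invocationCache is an LRU cache keyed by com.caucho.server.http.InvocationKey, which contains the (host, port, uri) triple plus an isSecure flag. The benefit is reduced route lookup time: large projects often have hundreds or thousands of route mappings, and looking them up on every request would be slow.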
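The caching idea can be illustrated with the standard library. The sketch below is not Resin's LruCache or InvocationKey; LruCacheSketch and its key helper are made-up names, and the key is flattened into a string purely for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache built on LinkedHashMap's access order; not Resin's LruCache. */
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, oldest first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }

    /** Hypothetical cache key combining the same fields the article lists for InvocationKey. */
    static String key(String host, int port, String rawUri, boolean secure) {
        return (secure ? "https" : "http") + "://" + host + ":" + port + rawUri;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put(key("example.test", 80, "/api/flag", false), "flagServlet");
        cache.put(key("example.test", 80, "/api/./flag", false), "flagServlet");
        System.out.println(cache.keySet());
    }
}
```

If the key is built from the raw URI, as the getInvocation(host, _uri, _uriLength) call suggests, then every mutated spelling of the same route occupies its own cache entry even though all of them resolve to the same Servlet.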

• com.caucho.server.http.AbstractHttpRequest#buildInvocation
  • com.caucho.server.dispatch.InvocationDecoder#splitQueryAndUnescape

splitQueryAndUnescape processes and assigns the URI of the Invocation. The code matters, so it is reproduced here in full:

public void splitQueryAndUnescape(Invocation invocation,
                                  byte []rawURI, int uriLength)
  throws IOException
{
  for (int i = 0; i < uriLength; i++) {
    if (rawURI[i] == '?') {
      i++;

      // XXX: should be the host encoding?
      String queryString = byteToChar(rawURI, i, uriLength - i,
                                      "ISO-8859-1");
      invocation.setQueryString(queryString);

      uriLength = i - 1;
      break;
    }
  }

  String rawURIString = byteToChar(rawURI, 0, uriLength, "ISO-8859-1");
  invocation.setRawURI(rawURIString);

  String decodedURI = normalizeUriEscape(rawURI, 0, uriLength, _encoding);

  if (_sessionSuffix != null) {
    int p = decodedURI.indexOf(_sessionSuffix);

    if (p >= 0) {
      int suffixLength = _sessionSuffix.length();
      int tail = decodedURI.indexOf(';', p + suffixLength);
      String sessionId;

      if (tail > 0)
        sessionId = decodedURI.substring(p + suffixLength, tail);
      else
        sessionId = decodedURI.substring(p + suffixLength);

      decodedURI = decodedURI.substring(0, p);

      invocation.setSessionId(sessionId);

      p = rawURIString.indexOf(_sessionSuffix);
      if (p > 0) {
        rawURIString = rawURIString.substring(0, p);
        invocation.setRawURI(rawURIString);
      }
    }
  }
  else if (_sessionPrefix != null) {
    if (decodedURI.startsWith(_sessionPrefix)) {
      int prefixLength = _sessionPrefix.length();

      int tail = decodedURI.indexOf('/', prefixLength);
      String sessionId;

      if (tail > 0) {
        sessionId = decodedURI.substring(prefixLength, tail);
        decodedURI = decodedURI.substring(tail);
        invocation.setRawURI(rawURIString.substring(tail));
      }
      else {
        sessionId = decodedURI.substring(prefixLength);
        decodedURI = "/";
        invocation.setRawURI("/");
      }

      invocation.setSessionId(sessionId);
    }
  }

  String uri = normalizeUri(decodedURI);

  invocation.setURI(uri);
  invocation.setContextURI(uri);
}

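Distilling the key points, the URI is processed as follows:

1. Find the first ? and set everything after it as the queryString;
2. normalizeUriEscape: URL-decode the path;
3. sessionSuffix extraction: if the URI contains the session suffix as a path parameter, the SessionID is extracted from it and set. The default suffix is ;jsessionid. Note that the jsessionid parameter and everything after it are removed from rawURI at this point;
4. sessionPrefix extraction: empty by default, so nothing is done;
5. normalizeUri: normalizes the path, detailed below;

Besides the traditional %dd decoding, the URL decoding in normalizeUriEscape also supports a Resin-specific URL encoding, implemented as follows: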

private static int scanUriEscape(ByteToChar converter,
                                 byte []rawUri, int i, int len)
  throws IOException
{
  int ch1 = i < len ? (rawUri[i++] & 0xff) : -1;

  if (ch1 == 'u') {
    ch1 = i < len ? (rawUri[i++] & 0xff) : -1;
    int ch2 = i < len ? (rawUri[i++] & 0xff) : -1;
    int ch3 = i < len ? (rawUri[i++] & 0xff) : -1;
    int ch4 = i < len ? (rawUri[i++] & 0xff) : -1;

    converter.addChar((char) ((toHex(ch1) << 12) +
                              (toHex(ch2) << 8) + 
                              (toHex(ch3) << 4) + 
                              (toHex(ch4))));
  }
  else {
    int ch2 = i < len ? (rawUri[i++] & 0xff) : -1;

    int b = (toHex(ch1) << 4) + toHex(ch2);;

    converter.addByte(b);
  }

  return i;
}

That is, decoding of the %u0067 form.
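The effect of the two escape forms can be reproduced with a small stand-alone decoder. This is a sketch, not Resin's code: ResinEscapeSketch is a made-up name, and for simplicity each %xx byte is treated as a single character, which is only accurate for ASCII values.

```java
/** Sketch of a decoder that accepts both %xx and %uXXXX escapes; not Resin code. */
class ResinEscapeSketch {

    static String decode(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c != '%') {
                out.append(c);
                i++;
            } else if (i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                // %uXXXX: four hex digits form one UTF-16 code unit.
                out.append((char) hex(raw, i + 2, 4));
                i += 6;
            } else {
                // %xx: two hex digits; treated as one character (ASCII only).
                out.append((char) hex(raw, i + 1, 2));
                i += 3;
            }
        }
        return out.toString();
    }

    private static int hex(String s, int start, int count) {
        if (start + count > s.length()) {
            throw new IllegalArgumentException("truncated escape");
        }
        int value = 0;
        for (int i = start; i < start + count; i++) {
            int digit = Character.digit(s.charAt(i), 16);
            if (digit < 0) {
                throw new IllegalArgumentException("bad hex digit");
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decode("/api/fla%67"));
        System.out.println(decode("/api/fla%u0067"));
    }
}
```

A filter that only understands %xx will leave %u0067 untouched, so the path it inspects differs from the one the container finally routes.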

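Note that apart from the removed jsessionid, the URI processed so far still carries any other path parameters; those are handled later. Before that, we look at the implementation of normalizeUri, following the order of execution.

normalizeUri

Judging from its parameters, InvocationDecoder.normalizeUri performs different normalization depending on the operating system the JVM is running on. Since this is central to our URI mutations, the full method is reproduced here: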

public String normalizeUri(String uri, boolean isWindows)
  throws IOException
{
  CharBuffer cb = new CharBuffer();

  int len = uri.length();

  if (_maxURILength < len)
    throw new BadRequestException(L.l("The request contains an illegal URL because it is too long."));

  char ch;
  if (len == 0 || (ch = uri.charAt(0)) != '/' && ch != '\\')
    cb.append('/');

  for (int i = 0; i < len; i++) {
    ch = uri.charAt(i);

    if (ch == '/' || ch == '\\') {
    dots:
      while (i + 1 < len) {
        ch = uri.charAt(i + 1);

        if (ch == '/' || ch == '\\')
          i++;
        else if (ch != '.')
          break dots;
        else if (len <= i + 2
                 || (ch = uri.charAt(i + 2)) == '/' || ch == '\\') {
          i += 2;
        }
        else if (ch != '.')
          break dots;
        else if (len <= i + 3
                 || (ch = uri.charAt(i + 3)) == '/' || ch == '\\') {
          int j;

          for (j = cb.length() - 1; j >= 0; j--) {
            if ((ch = cb.charAt(j)) == '/' || ch == '\\')
              break;
          }
          if (j > 0)
            cb.setLength(j);
          else
            cb.setLength(0);
          i += 3;
        } else {
          throw new BadRequestException(L.l("The request contains an illegal URL."));
        }
      }

      while (isWindows && cb.getLength() > 0
             && ((ch = cb.getLastChar()) == '.' || ch == ' ')) {
        cb.setLength(cb.getLength() - 1);

        if (cb.getLength() > 0
            && (ch = cb.getLastChar()) == '/' || ch == '\\') {
          cb.setLength(cb.getLength() - 1);
          // server/003n
          continue;
        }
      }

      cb.append('/');
    }
    else if (ch == 0)
      throw new BadRequestException(L.l("The request contains an illegal URL."));
    else
      cb.append(ch);
  }

  while (isWindows && cb.getLength() > 0
         && ((ch = cb.getLastChar()) == '.' || ch == ' ')) {
    cb.setLength(cb.getLength() - 1);
  }

  return cb.toString();
}

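Unless stated otherwise, a separator means / or \. The normalization algorithm creates a new CharBuffer and parses the URI character by character. Ordinary characters are appended to cb directly, while separators require handling some special cases.

1. If the first character of the URI is not a separator, a / is appended to cb first;
2. When a separator is encountered:
   1. If the next character is also a separator, it is skipped (advance 1 character);
   2. If the next 2 characters are . followed by a separator, advance 2 characters;
   3. If the next 3 characters are .. followed by a separator, advance 3 characters and move cb back one directory level; if that goes above the root, cb is emptied. If the next 2 characters are .. but the third is not a separator, an exception is thrown;
   4. Otherwise a separator / is appended to cb. Before appending, on Windows, a trailing . or space in cb is removed, along with a trailing separator;
3. Any other character is appended to cb directly;
4. After the loop, on Windows, trailing . and space characters are removed from cb;

This normalize implementation is rather unusual, and it yields some special mutations that are covered later.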

ServletMapper

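The URI in the Invocation obtained above has gone through URL decoding and path normalization. Next, InvocationServer#buildInvocation is called to obtain the Invocation instance for the corresponding Server, eventually reaching mapServlet in ServletMapper for the Servlet mapping lookup. Along the way it passes through the Host, Container and WebApp lookups; the call stack is: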

mapServlet:234, ServletMapper (com.caucho.server.dispatch)
buildInvocation:4154, WebApp (com.caucho.server.webapp)
buildInvocation:798, WebAppContainer (com.caucho.server.webapp)
buildInvocation:753, Host (com.caucho.server.host)
buildInvocation:319, HostContainer (com.caucho.server.host)
buildInvocation:1064, ServletService (com.caucho.server.cluster)
buildInvocation:250, InvocationServer (com.caucho.server.dispatch)
buildInvocation:223, InvocationServer (com.caucho.server.dispatch)
buildInvocation:1610, AbstractHttpRequest (com.caucho.server.http)
getInvocation:1583, AbstractHttpRequest (com.caucho.server.http)
handleRequest:822, HttpRequest (com.caucho.server.http)

mapServlet is the actual implementation of Servlet lookup; the code is roughly:

public FilterChain mapServlet(ServletInvocation invocation) throws ServletException {
    String servletName = null;
    String contextURI = invocation.getContextURI();
    if (_servletMap != null) {
        cleanUri = Invocation.stripPathParameters(contextURI);
        ServletMapping servletMap = _servletMap.map(cleanUri, vars);

        if (servletMap != null) {
            servletName = servletMap.getServletName();
        }
    }

    if (servletName == null) {
        InputStream is;
        is = _webApp.getResourceAsStream(contextURI);
        if (is != null) {
            is.close();
            servletName = _defaultServlet;
        }
    }
    MatchResult matchResult = null;
    if (matchResult == null && contextURI.endsWith("j_security_check")) {
      servletName = "j_security_check";
    }

    if (servletName == null) {
      servletName = _defaultServlet;
      vars.clear();
      if (matchResult != null)
        vars.add(matchResult.getContextUri());
      else
        vars.add(contextURI);
      addWelcomeFileDependency(invocation);
    }

    if (servletName == null) {
      log.fine(L.l("'{0}' has no default servlet defined", contextURI));
      return new ErrorFilterChain(404);
    }

    String servletPath = vars.get(0);
    invocation.setServletPath(servletPath);
    invocation.setPathInfo(/*...*/);

    if (servletName.equals("invoker"))
      servletName = handleInvoker(invocation);

    invocation.setServletName(servletName);
    ServletConfigImpl newConfig = _servletManager.getServlet(servletName);
    if (newConfig != null) config = newConfig;

    FilterChain chain = _servletManager.createServletChain(servletName, config, invocation);

    if (chain instanceof PageFilterChain) {
      PageFilterChain pageChain = (PageFilterChain) chain;
      chain = PrecompilePageFilterChain.create(invocation, pageChain);
    }

    return chain;
}

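In execution order, a brief summary:

1. First, ServletInvocation#stripPathParameters removes the remaining path parameters mentioned earlier, that is, everything between ; and the next / or the end of the URI. The resulting cleanUri is the URI finally used for route lookup;
2. _servletMap.map looks up the corresponding Servlet;
3. If nothing is found, _webApp.getResourceAsStream checks whether a matching disk file or resource file exists; if so, _defaultServlet, which is resin-file, is used;
4. If contextURI ends with j_security_check, the request is routed to the Servlet named j_security_check;
5. Finally the welcome files are looked up, by default index.html, index.jsp and index.php;

_servletMap is of type UrlMap<ServletMapping> and holds an array of RegexEntry containing all Servlet mappings and routes registered for the current application.

In Resin every route is matched with a regular expression. If several Servlets match the same URI, the best result is chosen by match precision, which keeps the behavior consistent with the route mapping rules of the Java EE Servlet specification.

• For an exact match such as /hello, the regex is ^/hello$;
• For a path match such as /admin/*, the regex is ^/admin(?=/)|^/admin\z;
• For an extension match such as *.jsp, the regex is ^.*\.jsp(?=/)|^.*\.jsp\z;

The exact-match regex ends with $, while the path and extension regexes end with \z. What is the difference? According to the Java regex documentation, \z matches only at the very end of the input, whereas $ (outside MULTILINE mode) also matches just before a final line terminator. So a URI with a trailing newline still matches a pattern that ends with $, but not one that ends with \z.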
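The difference between the two anchors is easy to confirm with java.util.regex. The sketch below assumes the patterns are applied with find(), which is an assumption about how the route regexes are evaluated, not something shown in the Resin source above; RegexAnchorSketch is a made-up name.

```java
import java.util.regex.Pattern;

/** Demonstrates how "$" and "\z" treat a trailing newline in Java regexes. */
class RegexAnchorSketch {
    // Exact-match style pattern, as listed above for an exact route.
    static final Pattern EXACT_DOLLAR = Pattern.compile("^/api/flag$");
    // The same route anchored with \z instead (hypothetical stricter variant).
    static final Pattern EXACT_STRICT = Pattern.compile("^/api/flag\\z");
    // Path-match style pattern, as listed above for /admin/*.
    static final Pattern PREFIX = Pattern.compile("^/admin(?=/)|^/admin\\z");

    static boolean found(Pattern pattern, String uri) {
        return pattern.matcher(uri).find();
    }

    public static void main(String[] args) {
        String withNewline = "/api/flag\n"; // what /api/flag%0a decodes to
        System.out.println(found(EXACT_DOLLAR, withNewline));
        System.out.println(found(EXACT_STRICT, withNewline));
        System.out.println(found(PREFIX, "/admin/users"));
        System.out.println(found(PREFIX, "/adminx"));
    }
}
```

The first check succeeds and the second fails, which is why a trailing %0a only helps against routes whose regex ends with $.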

Continuing on, getResourceAsStream, which looks up disk or resource files, is implemented as follows:

public InputStream getResourceAsStream(String uripath)
{
  Path rootDirectory = getRootDirectory();
  Path path = rootDirectory.lookupNative(getRealPath(uripath));

    if (path.canRead())
      return path.openRead();
    else {
      String resource = "META-INF/resources" + uripath;
      return getClassLoader().getResourceAsStream(resource);
    }
}

This is about what one would expect. After that, Servlets are identified by string name, such as resin-file or j_security_check, and the actual lookup goes through _servletManager.getServlet(servletName) to obtain the Servlet instance. From this we learn that the ServletManager#_servlets map holds all registered Servlets:

• j_security_check -> com.caucho.server.security.FormLoginServlet
• resin-xtp -> com.caucho.jsp.XtpServlet
• resin-jsp -> com.caucho.jsp.JspServlet
• resin-jspx -> com.caucho.jsp.JspServlet
• resin-file -> com.caucho.servlets.FileServlet
• resin-xtp -> com.caucho.jsp.XtpServlet -> com.caucho.quercus.servlet.QuercusServlet
• Hello -> com.example.resin.HelloServlet
• txtServlet -> com.example.resin.TxtServlet
• flagServlet -> com.example.resin.FlagServlet
• adminServlet -> com.example.resin.AdminServlet

The last four are Servlets I wrote myself; the rest are added by Resin by default. The default Servlet resin-file maps to FileServlet, which is the Servlet responsible for reading static files and resources. That completes Servlet lookup and mapping; all that remains is to build the FilterChain and call the Filters one by one to reach the application code.

Bypass Tricks

As with Tomcat, the analysis of the Resin source yields some bypass tricks: mutations of the original URI /api/flag that still route to the same Servlet:

• hack/api/flag: based on the check in readRequest for a first character that is not /;
• /api/fla%67: based on normalizeUriEscape decoding;
• /api/fla%u0067: based on the special URL encoding in scanUriEscape;
• /api//flag: based on normalizeUri 2.1;
• /api/./flag: based on normalizeUri 2.2;
• ../api/flag: based on normalizeUri 2.3;
• /api\flag: based on normalizeUri 2.4, where every separator is converted to /;
• /api/flag%20 (space), /api/flag.: based on normalizeUri 2.4, effective only on Windows;
• /api/flag;a=b: targeting stripPathParameters;
• /api/flag%0a: based on $ in the regex also matching just before a trailing newline;

These mutations can be combined to produce an even richer set of URIs.

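Summary

Analyzing the URI parsing flow of Tomcat and Resin reveals a lot in common: both perform URL decoding, both normalize the path, and both remove path parameters. It is also worth noting that the implementations differ in places. Tomcat reports an error when path normalization goes above the root directory, while Resin silently clamps to the root; the two also route differently, with Tomcat following the Servlet specification strictly and Resin reducing it to regular expressions. So when analyzing a different web container, besides trying the common URI mutations, it pays to study the container's internals specifically in order to find problems hidden deeper.

File System

The analysis of Tomcat and Resin shows one thing in common: when lookup fails for every Servlet, a default Servlet handles the request, and its job is to read files from the local disk or from resources.

So, purely from the file-reading angle, can we exploit the way the file system resolves files to mutate file names? The answer is yes. By convention one should read the implementation of the file system in question, but that involves kernel source analysis, which is a little beyond this Java boy's abilities, so I simply wrote a small fuzzer to mutate file names: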

private void checkName(int i, String filename) {
    File f = new File(filename);
    if (f.exists())
        System.out.println(String.format("%06x pass: [%s]", i, filename));
}

public void fuzzFilename(String originFile) {
    File file = new File(originFile);
    if ( !file.exists() ) {
        return;
    }
    System.out.println(originFile + " exists.");
    Path filePath = Paths.get(originFile);
    String name = filePath.getFileName().toString();
    Path parentPath = filePath.getParent();
    String dirName = parentPath != null ? parentPath.toString() : "";
    System.out.println(String.format("dirName: %s, name: %s", dirName, name));

    for (int i = 0; i <= 0x10FFFF; i++) {
        // if (!Character.isValidCodePoint(i)) continue;
        String fuzzChar = new String(Character.toChars(i));
        checkName(i, originFile + fuzzChar);
        checkName(i, fuzzChar + originFile);
        checkName(i, dirName + fuzzChar + name);
        checkName(i, dirName + fuzzChar + File.separator + name);
        checkName(i, dirName + File.separator + fuzzChar + name);
    }
}

On Linux/macOS the output is:

dirName: /tmp, name: 1.txt
00002f pass: [/tmp/1.txt/]
00002f pass: [//tmp/1.txt]
00002f pass: [/tmp//1.txt]

Only the first one is noteworthy: appending / to the file name still resolves to the file.

On Windows the results are much richer:

test\1.txt exists.
dirName: test, name: 1.txt
000020 pass: [test\1.txt ]
00002e pass: [test.\1.txt]
00002e pass: [test\1.txt.]
00002f pass: [test/1.txt]
00002f pass: [test/\1.txt]
00002f pass: [test\/1.txt]
00002f pass: [test\1.txt/]
00005c pass: [test\1.txt\]
00005c pass: [test\1.txt]
00005c pass: [test\\1.txt]

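As shown, on Windows, appending a space, ., / or \ to the file name does not affect file resolution, and even a directory name can take a trailing dot. This greatly widens the room for request mutation. Suppose the URL originally requests /ppp/secret.txt; on Windows the request can be mutated into:

• /ppp/SECRET.TXT (case-insensitive)
• /ppp/secret.txt%20
• /ppp.\secret.txt
• /ppp/secret.txt%2e
• /ppp/secret.txt/
• ...

It is worth mentioning that NTFS, the file system commonly used on Windows, has a special file representation called Alternate Data Streams, or ADS. Every file in NTFS has at least one data stream, the main data stream. The full notation for a data stream is:

<filename>:<stream name>:<stream type>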

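The default data stream name is empty, so for the file foo.txt the full name of its default data stream is foo.txt::$DATA, with an empty stream name and the type $DATA. A directory has no default data stream but does have a default directory stream. The directory stream type is $INDEX_ALLOCATION and the default stream name is $I30, so the following notations for a directory are all equivalent:

• C:\Users
• C:\Users:$I30:$INDEX_ALLOCATION
• C:\Users::$INDEX_ALLOCATION

This feature allows more obscure mutations when addressing files and directories on Windows.

References:

• 2.1.4 Alternate Data Streams^[5]
• 5.1 NTFS Streams^[6]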

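URI Standard

Since everything above is about URLs, the "official" explanation cannot be avoided. Servlet containers and HTTP servers alike follow the same standard:

• RFC3986 - Uniform Resource Identifier (URI): Generic Syntax^[7]

Others have covered this standard to varying degrees, each with a different focus. For this article the focus is the path. Section 3.3 states that the path is terminated by the first question mark ? or number sign #, or by the end of the URI. The components are illustrated below: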

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
   |   _____________________|__
  / \ /                        \
  urn:example:animal:ferret:nose

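So routing generally excludes the parameters after ?, but implementations vary on the rest: Tomcat reports an error when the path contains #, while Resin treats it as a legitimate part of the path during route lookup.

The ABNF definition of the path is: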

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>

segment       = *pchar
segment-nz    = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
            ; non-zero-length segment without any colon ":"

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

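A path consists of segments, separated from each other by the slash /.

. and .. are called dot-segments and must be removed before the path is routed. Section 5.2, Relative Resolution, describes in detail how to resolve paths that contain dot-segments and even gives a pseudocode implementation.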
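For a quick experiment, the JDK already ships a dot-segment remover in java.net.URI#normalize. The helper below is a thin illustrative wrapper (DotSegmentSketch is a made-up name). Keep in mind that java.net.URI follows the older RFC 2396, so edge cases may differ from RFC 3986 and from what a given container does.

```java
import java.net.URI;

/** Removes "." and ".." segments using the JDK's URI normalization. */
class DotSegmentSketch {

    static String removeDotSegments(String path) {
        // getRawPath() keeps percent-escapes as they are instead of decoding them.
        return URI.create(path).normalize().getRawPath();
    }

    public static void main(String[] args) {
        System.out.println(removeDotSegments("/a/b/../c/./d"));
        System.out.println(removeDotSegments("/foo/../api/flag"));
    }
}
```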

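In addition, a path segment can usually contain reserved characters that delimit subcomponents, such as the semicolon ; and equals sign = mentioned earlier, which provide parameters for each segment. The standard notes that the comma , often plays a similar role, but our source analysis showed that neither Tomcat nor Resin treats commas specially.

Authorization Case Studies

As mentioned earlier, to reduce coupling, authorization is usually implemented in a Filter rather than in the Servlet. An ordinary developer can easily write an authorization Filter such as: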

@WebFilter(filterName = "AdminFilter", urlPatterns = "/*")
public class AdminFilter extends HttpFilter {
    @Override
    public void doFilter(HttpServletRequest req, HttpServletResponse res, FilterChain chain) throws IOException, ServletException {
        String uri = req.getRequestURI();
        if (uri.startsWith("/api/flag")) {
            res.setStatus(403);
            res.getWriter().write("YOU SHALL NOT PASS: " + uri);
        } else {
            chain.doFilter(req, res);
        }
    }
}

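Having seen the bypass tricks above, we know how fragile authorization based only on startsWith is, yet surprisingly a great deal of such code exists in the real world. Not just startsWith and endsWith: some code even uses indexOf to compare URIs for authorization.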
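The weakness can be shown without a Servlet container by isolating the string check. NaiveFilterSketch is a made-up name, and the method mirrors only the condition used in AdminFilter; the input stands for the value of getRequestURI(), which the container does not decode.

```java
/** Isolates the startsWith check from AdminFilter to show which inputs slip past it. */
class NaiveFilterSketch {

    /** Returns true when the filter would answer 403. */
    static boolean blocked(String requestUri) {
        return requestUri.startsWith("/api/flag");
    }

    public static void main(String[] args) {
        String[] variants = {"/api/flag", "/api;a=b/flag", "/api/%66lag",
                             "//api/flag", "/api/./flag"};
        for (String uri : variants) {
            System.out.println(uri + " blocked=" + blocked(uri));
        }
    }
}
```

Only the first spelling is blocked; every other variant passes the filter while still being routable to the same Servlet by a container that decodes and normalizes the path.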

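An OA Product

Of course, after repeated beatings by red teams, applications of this kind have gradually matured and learned that a URI must be cleaned and filtered before authorization. Below is the URI preprocessing code from the authorization part of a well-known OA application: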

public String path(String path) {
   path = this.uriDecode(path);
   if (path != null && path.indexOf("\\") != -1) {
      path = StringUtil.replace(path, "\\", "/", false);
   }

   if (path != null && path.indexOf("..") != -1) {
      path = StringUtil.replace(path, "\\.{2,}", "");
   }

   if (path != null && path.indexOf("./") != -1) {
      path = StringUtil.replace(path, "./", "", false);
   }

   if (path != null && path.indexOf(";") != -1) {
      path = StringUtil.replace(path, ";.*?/", "/");
   }

   if (path != null && path.indexOf(";") != -1) {
      path = StringUtil.replace(path, ";.*", "");
   }

   if (path != null && path.indexOf("//") != -1) {
      path = StringUtil.replace(path, "/{2,}", "/");
   }

   if (path != null) {
      path = StringUtil.replace(path, "\\s", "");
   }

   if (!path.equals("") && !path.startsWith("/")) {
      path = "/" + path;
   }

   return path;
}
private String uriDecode(String path) {
  try {
      if (path != null && path.indexOf("%") != -1) {
        return URLDecoder.decode(path);
      }
  } catch (Exception var3) {
      this.writeLog("uri decode error:" + path, true);
      this.writeError(var3);
  }

  return path;
}

首先 URI 解碼，然后替換反斜杠，也別管 URI 的標準了，.. 直接刪掉，路徑參數也直接刪掉，還有空格、// 之類的都進行了處理，看起來是不是無懈可擊？但是其實仔細看一下會發(fā)現還是有一些問題，比如:

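• The decoding in uriDecode can fail (the exception is caught), in which case the original, still-encoded path is returned;
• After URI decoding the path may contain uppercase letters; on Windows, a request for a JSP or similar file is still routed correctly;
• Path parameters are removed after ./ is removed, so .;xxx/ still yields ./ after processing;
• Whitespace (\s) is removed after // is normalized, so /%20/ slips past the replacement and comes back as //;
• ...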
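
The ordering problems listed above can be replayed with a small sketch. It is an approximation, not the vendor's code: StringUtil.replace is not shown in the article, so the sketch assumes the two-argument form is a regex replacement and the form with a trailing false is a literal replacement, assumes the slash-collapsing step is guarded by a check for //, and skips the null checks. The class name OaPathReplay is hypothetical.

```java
import java.net.URLDecoder;

final class OaPathReplay {
    // Approximate replay of the preprocessing steps, in the same order as the original.
    static String path(String path) {
        try {
            if (path.indexOf('%') != -1) {
                path = URLDecoder.decode(path, "UTF-8");
            }
        } catch (Exception e) {
            // Like the original: on a decoding error the raw path is kept as-is.
        }
        path = path.replace("\\", "/");                                     // literal replace
        if (path.contains("..")) path = path.replaceAll("\\.{2,}", "");     // regex
        if (path.contains("./")) path = path.replace("./", "");             // literal replace
        if (path.contains(";"))  path = path.replaceAll(";.*?/", "/");      // regex
        if (path.contains(";"))  path = path.replaceAll(";.*", "");         // regex
        if (path.contains("//")) path = path.replaceAll("/{2,}", "/");      // regex
        path = path.replaceAll("\\s", "");                                  // regex
        if (!path.isEmpty() && !path.startsWith("/")) path = "/" + path;
        return path;
    }

    public static void main(String[] args) {
        System.out.println(path("/api/%20/flag")); // whitespace is removed after // was collapsed
        System.out.println(path("/api/.;x/flag")); // path parameter is removed after ./ was stripped
        System.out.println(path("/%61pi/flag%"));  // trailing % makes decoding fail
    }
}
```

Under these assumptions the three inputs come back as /api//flag, /api/./flag and /%61pi/flag% respectively, i.e. exactly the sequences the sanitizer was meant to eliminate. Whether a container accepts the last, malformed URI is a separate question.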

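Evidently, even when developers know which characters to filter, their fingers do not always obey their brains, and the resulting code is still riddled with holes. A sounder approach is therefore to use a well-known, battle-tested authorization framework rather than trying to handle this yourself.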

Shiro

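Apache Shiro^[8] is an easy-to-use authentication and authorization framework. Although it supports many scenarios, it is most commonly used for identity authentication and path-based authorization in web applications.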

The web.xml configuration is fairly simple: just declare ShiroFilter and map it to all URLs. Note that its filter-mapping should generally come before those of other Filters:

<listener>
    <listener-class>org.apache.shiro.web.env.EnvironmentLoaderListener</listener-class>
</listener>

<filter>
    <filter-name>ShiroFilter</filter-name>
    <filter-class>org.apache.shiro.web.servlet.ShiroFilter</filter-class>
</filter>

<filter-mapping>
    <filter-name>ShiroFilter</filter-name>
    <url-pattern>/*</url-pattern>
    <dispatcher>REQUEST</dispatcher>
    <dispatcher>FORWARD</dispatcher>
    <dispatcher>INCLUDE</dispatcher>
    <dispatcher>ERROR</dispatcher>
    <dispatcher>ASYNC</dispatcher>
</filter-mapping>

The actual rules are defined in a configuration file, by default WEB-INF/shiro.ini. An example:

[main]
authc.loginUrl = /login.jsp
filterChainResolver.globalFilters = null

[urls]
/login/* = anon
/admin/* = authc
/api/flag = authc

[users]
evilpan=111111

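The urls section uses first-match rather than best-match semantics: the left-hand side specifies a URL pattern and the right-hand side the name of the corresponding Filter. Shiro supports the following Filters: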

• anon: org.apache.shiro.web.filter.authc.AnonymousFilter
• authc: org.apache.shiro.web.filter.authc.FormAuthenticationFilter
• authcBasic: org.apache.shiro.web.filter.authc.BasicHttpAuthenticationFilter
• authcBearer: org.apache.shiro.web.filter.authc.BearerHttpAuthenticationFilter
• invalidRequest: org.apache.shiro.web.filter.InvalidRequestFilter
• logout: org.apache.shiro.web.filter.authc.LogoutFilter
• noSessionCreation: org.apache.shiro.web.filter.session.NoSessionCreationFilter
• perms: org.apache.shiro.web.filter.authz.PermissionsAuthorizationFilter
• port: org.apache.shiro.web.filter.authz.PortFilter
• rest: org.apache.shiro.web.filter.authz.HttpMethodPermissionFilter
• roles: org.apache.shiro.web.filter.authz.RolesAuthorizationFilter
• ssl: org.apache.shiro.web.filter.authz.SslFilter
• user: org.apache.shiro.web.filter.authc.UserFilter

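The most commonly used are authc, the Filter for session-based authentication, and anon, the Filter for anonymous access, i.e. no authentication required. For the other configuration fields and their meanings, refer to the official documentation; they are not covered in detail here.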
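
The first-match behaviour of the urls section means rule order matters. The following sketch illustrates this with a deliberately simplified matcher: it only understands exact patterns and a trailing /* covering one path segment, which is an assumption made for brevity and not Shiro's real Ant-style matcher. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FirstMatchDemo {
    // Simplified stand-in for pattern matching: exact match, or a trailing "/*" for one segment.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/*")) {
            String prefix = pattern.substring(0, pattern.length() - 1); // keeps the trailing '/'
            return path.startsWith(prefix) && path.indexOf('/', prefix.length()) == -1;
        }
        return pattern.equals(path);
    }

    // Walks the rules in declaration order and returns the filter of the first matching rule.
    static String resolve(Map<String, String> urls, String path) {
        for (Map.Entry<String, String> rule : urls.entrySet()) {
            if (matches(rule.getKey(), path)) {
                return rule.getValue();
            }
        }
        return null; // no rule matched
    }

    public static void main(String[] args) {
        Map<String, String> broadFirst = new LinkedHashMap<>();
        broadFirst.put("/admin/*", "anon");
        broadFirst.put("/admin/secret", "authc");

        Map<String, String> specificFirst = new LinkedHashMap<>();
        specificFirst.put("/admin/secret", "authc");
        specificFirst.put("/admin/*", "anon");

        System.out.println(resolve(broadFirst, "/admin/secret"));
        System.out.println(resolve(specificFirst, "/admin/secret"));
    }
}
```

With the broad rule first, the more specific authc rule is never reached, so more specific patterns should be listed before general ones.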

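What we mainly care about here is how Shiro processes the URL path before authorization. For brevity, here is the conclusion directly: the processing is done in org.apache.shiro.web.util.WebUtils#getPathWithinApplication, and the call chain traced back is as follows: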

getPathWithinApplication:114, WebUtils (org.apache.shiro.web.util)
getPathWithinApplication:105, PathMatchingFilter (org.apache.shiro.web.filter)
pathsMatch:124, PathMatchingFilter (org.apache.shiro.web.filter)
preHandle:195, PathMatchingFilter (org.apache.shiro.web.filter)
doFilterInternal:131, AdviceFilter (org.apache.shiro.web.servlet)
doFilter:154, OncePerRequestFilter (org.apache.shiro.web.servlet)
doFilter:66, ProxiedFilterChain (org.apache.shiro.web.servlet)
executeChain:458, AbstractShiroFilter (org.apache.shiro.web.servlet)
call:373, AbstractShiroFilter$1 (org.apache.shiro.web.servlet)
doCall:90, SubjectCallable (org.apache.shiro.subject.support)
call:83, SubjectCallable (org.apache.shiro.subject.support)
execute:387, DelegatingSubject (org.apache.shiro.subject.support)
doFilterInternal:370, AbstractShiroFilter (org.apache.shiro.web.servlet)
doFilter:154, OncePerRequestFilter (org.apache.shiro.web.servlet)
internalDoFilter:178, ApplicationFilterChain (org.apache.catalina.core)

The code of getPathWithinApplication is:

public static String getPathWithinApplication(HttpServletRequest request) {
    return normalize(removeSemicolon(getServletPath(request) + getPathInfo(request)));
}

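It consists of two main steps: removeSemicolon strips semicolons, and normalize canonicalizes the path. The removeSemicolon implementation simply truncates the string at the first semicolon, so it only handles path parameters cleanly when they appear at the end of the path: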

private static String removeSemicolon(String uri) {
    int semicolonIndex = uri.indexOf(';');
    return (semicolonIndex != -1 ? uri.substring(0, semicolonIndex) : uri);
}
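
A quick sketch of what this truncation does to different inputs. The method body is copied from the snippet above; the wrapper class SemicolonDemo and the sample paths are illustrative assumptions.

```java
final class SemicolonDemo {
    // Same logic as the removeSemicolon shown above: cut everything from the first ';' onward.
    static String removeSemicolon(String uri) {
        int semicolonIndex = uri.indexOf(';');
        return (semicolonIndex != -1 ? uri.substring(0, semicolonIndex) : uri);
    }

    public static void main(String[] args) {
        System.out.println(removeSemicolon("/api/flag;jsessionid=abc")); // parameter at the end
        System.out.println(removeSemicolon("/api;x=1/flag"));            // parameter in the middle
        System.out.println(removeSemicolon("/api/flag"));                // no parameter
    }
}
```

A trailing parameter is stripped as expected (/api/flag), but a parameter in the middle discards the rest of the path as well (/api). On its own this would be lossy, which is why it matters that the input is the container-resolved servletPath and pathInfo rather than the raw URI.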

The normalize implementation, on the other hand, is fairly thorough; the comment in the source notes that it was "borrowed" from Tomcat:

Normalize operations were was happily taken from org.apache.catalina.util.RequestUtil in Tomcat trunk, r939305

Notably, what is passed in here is not the raw URI but getServletPath + getPathInfo, which are respectively:

• HttpServletRequest.getServletPath(), which in Tomcat corresponds to mappingData.wrapperPath.toString();
• HttpServletRequest.getPathInfo(), which in Tomcat corresponds to mappingData.pathInfo.toString().

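As mentioned in the earlier discussion of route mapping in Tomcat, the wrapper in mappingData is determined after routing. In other words, at this point wrapperPath is already the URI as processed by the web container: if the request can be routed to flagServlet, its value must be /api/flag, which elegantly resolves the TOCTOU inconsistency.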

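Of course, this does not make it airtight, otherwise Shiro would not add another layer of removeSemicolon and normalize. Web containers are diverse, and the fragmentation of their routing strategies can likewise lead to Shiro bypasses. Moreover, getPathWithinApplication did not always look like this; it was changed to its current form only after CVE-2020-13933 appeared. For details, see: ANNOUNCE CVE-2020-13933 Apache Shiro 1.6.0 released^[9].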

After the path has gone through this filtering step, it is matched using pathMatches:

// org.apache.shiro.web.filter.mgt.PathMatchingFilterChainResolver#getChain
final String requestURI = getPathWithinApplication(request);
final String requestURINoTrailingSlash = removeTrailingSlash(requestURI);
for (String pathPattern : filterChainManager.getChainNames()) {
  if (pathMatches(pathPattern, requestURI)) {
    return filterChainManager.proxy(originalChain, pathPattern);
  } else {
    // Special handling for Spring Web
    pathPattern = removeTrailingSlash(pathPattern);
    if (pathMatches(pathPattern, requestURINoTrailingSlash)) {
      return filterChainManager.proxy(originalChain, pathPattern);
    }
  }
}

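Note the special handling for Spring applications here: in Spring, /resource/data and /resource/data/ both route to the same resource, but Shiro's /resource/data pattern cannot match the latter, so the trailing / is stripped from both the path and the pathPattern for a second match attempt. This is presumably the patch for CVE-2021-41303, which shows that even a mature authorization framework can still fall into the traps of URL-based authorization.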
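
The two-pass idea can be sketched as follows. This is a simplified illustration, not Shiro's source: removeTrailingSlash is re-implemented here under the assumption that it leaves the root path / alone, and plain string equality stands in for pathMatches. The class and method names are hypothetical.

```java
final class TrailingSlashMatch {
    // Assumed behaviour: drop one trailing '/', but never reduce "/" to an empty string.
    static String removeTrailingSlash(String path) {
        if (path != null && !"/".equals(path) && path.endsWith("/")) {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }

    // First pass compares as-is; second pass compares with trailing slashes removed on both sides.
    static boolean protectedBy(String pattern, String requestPath) {
        if (pattern.equals(requestPath)) {
            return true;
        }
        return removeTrailingSlash(pattern).equals(removeTrailingSlash(requestPath));
    }

    public static void main(String[] args) {
        System.out.println(protectedBy("/resource/data", "/resource/data"));
        System.out.println(protectedBy("/resource/data", "/resource/data/"));
        System.out.println(protectedBy("/resource/data", "/resource/other"));
    }
}
```

Without the second pass, /resource/data/ would miss the /resource/data rule while a router that tolerates the trailing slash would still dispatch it to the same handler, which is exactly the mismatch the paragraph above describes.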

Afterword

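This article presented a threat model for URL-based authorization. Using two standards-compliant Servlet containers, Tomcat and Resin, as examples, it described how each looks up routes and, based on that lookup process, proposed a series of possible URL mutations. It then analyzed several real-world authorization cases, including hand-rolled authorization code from a typical application and the mature authorization solution Shiro; bypasses exist or have existed in all of them. From these cases we can deepen our understanding of URL-based authorization and write more robust and secure code.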

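In addition, this article only covered the routing behavior of standard Servlet applications, whereas modern Java web applications are more often built on the Spring ecosystem: Spring MVC takes over all of the web container's routing and uses its own dispatch implementation, and together with Spring Security it forms its own URL authorization scheme. For reasons of length this is not covered here; if the opportunity arises, a separate article will follow.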

Easter egg: the analysis of Resin's routing actually exposed a "0day" involving an ordering error at the core of path processing. Did you spot it?

LINKS

• JSR 340: Java Servlet 3.1 Specification^[10]
• Semantic Attacks - What's in a URL? (2001)^[11]
• Semantic URL attack - wikipedia^[12]
• A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages - Orange^[13]
• Uniform Resource Identifier (URI): Generic Syntax^[14]
• Spring 審計常見 Tricks - panda^[15]

References

[1] A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages: https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf
[2] Java 安全研究初探: https://evilpan.com/2023/04/01/java-ee/
[3] HttpServletRequest.getRequestURI(): https://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html#getRequestURI()
[4] JSR 340: Java Servlet 3.1 Specification: https://jcp.org/en/jsr/detail?id=340
[5] 2.1.4 Alternate Data Streams: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/e2b19412-a925-4360-b009-86e3b8a020c8
[6] 5.1 NTFS Streams: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c54dec26-1551-4d3a-a0ea-4fa40f848eb3?source=recommendations
[7] RFC3986 - Uniform Resource Identifier (URI): Generic Syntax: https://datatracker.ietf.org/doc/html/rfc3986
[8] Apache Shiro: https://shiro.apache.org/web.html
[9] ANNOUNCE CVE-2020-13933 Apache Shiro 1.6.0 released: https://lists.apache.org/thread/4f9m7274ynttzpl3z5lsl4y461v9k0k9
[10] JSR 340: Java Servlet 3.1 Specification: https://jcp.org/en/jsr/detail?id=340
[11] Semantic Attacks - What's in a URL? (2001): https://www.giac.org/paper/gsec/650/semantic-attacks-url/101497
[12] Semantic URL attack - wikipedia: https://en.wikipedia.org/wiki/Semantic_URL_attack
[13] A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages - Orange: https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf
[14] Uniform Resource Identifier (URI): Generic Syntax: https://datatracker.ietf.org/doc/html/rfc3986
[15] Spring 審計常見 Tricks - panda: https://articles.zsxq.com/id_6zww0wx2a0r5.html
