簡單sql字段解析器實現(xiàn)參考
用例:有一段sql語句,我們需要從中截取出所有字段部分,以便進行后續(xù)的類型推斷或者別名字段抽取定義,請給出此解析方法。
想來很簡單吧,因為 sql 中的字段列表,使用方式有限,比如 a as b, a, a b...
1. 解題思路
如果不想做復雜處理,最容易想到的,就是直接用某個特征做分割即可。比如,先截取出 字段列表部分,然后再用逗號',' 分割,就可以得到一個個的字段了。然后再要細分,其實只需要用 as 進行分割就可以了。
看起來好像可行,但是存在許多漏洞,首先,這里面有太多的假設(shè):各種截取部分要求必須符合要求,必須沒有多余的逗號,必須要有as 等等。這明顯不符合要求了。
其二,我們可以換一種轉(zhuǎn)換方式。比如先截取到field部分,然后先以 as 分割,再以逗號分割,然后取最后一個詞作為field。
看起來好像更差了,截取到哪里已經(jīng)完全不知道了。即原文已經(jīng)被破壞殆盡,而且同樣要求要有 as 轉(zhuǎn)換標簽,而且對于函數(shù)覬覦有 as 的場景,就完全錯誤了。
其三,最好還是自行一個個單詞地解析,field 字段無外乎幾種情況,1. 普通字段如 select a; 2. 帶as的普通字段如 select a as b; 3. 帶函數(shù)的字段如 select coalesce(a, b); 4. 帶函數(shù)且?guī)s的字段如 select coalesce(a, b) ab; 5. 函數(shù)內(nèi)帶as的字段如 select cast(a as string) b; ... 我們只需依次枚舉對應的情況,就可以將字段解析出來了。
看起來是個不錯的想法。但是具體實現(xiàn)如何?
2. 具體解析實現(xiàn)
主要分兩個部分,1. 需要定義一個解析后的結(jié)果數(shù)據(jù)結(jié)構(gòu),以便清晰描述字段信息;2. 分詞解析sql并以結(jié)構(gòu)體返回;
我們先來看看整個算法核心:
/*** 功能描述: 簡單sql字段解析器** 樣例如1:* select COALESCE(t1.xno, t2.xno, t3.xno) as xno,* case when t1.no is not null then 1 else null end as xxk001,* case when t2.no is not null then 1 else null end as xxk200,* case when t3.xno is not null then 1 else null end as xx3200* from xxk001 t1* full join xxkj100 t2 on t1.xno = t2.xno* full join xxkj200 t3 on t1.xno = t3.xno;** 樣例如2:* select cast(a as string) as b from ccc;** 樣例如3:* with a as(select cus,x1 from b1), b as (select cus,x2 from b2)* select a.cus as a_cus from a join b on a.cus=b.cus where xxx;** 樣例如4:* select a.xno,b.xx from a_tb as a join b_tb as b on a.id = b.id** 樣例如5:* select cast \t(a as string) a_str, cc (a as double) a_double from x**/public class SimpleSqlFieldParser {/*** 解析一段次標簽sql 中的字段列表** @param sql 原始sql, 需如 select xx from xxx join ... 格式* @return 字段列表*/public static List<SelectFieldClauseDescriptor> parse(String sql) {String columnPart = adaptFieldPartSql(sql);int deep = 0;List<StringBuilder> fieldTokenSwap = new ArrayList<>();StringBuilder currentTokenBuilder = new StringBuilder();List<SelectFieldClauseDescriptor> fieldList = new ArrayList<>();fieldTokenSwap.add(currentTokenBuilder);int len = columnPart.length();char[] columnPartChars = columnPart.toCharArray();for(int i = 0; i < len; i++) {// 空格忽略,換行忽略,tab忽略// 字符串相接// 左(號入棧,++deep;// 右)號出棧,--deep;// deep>0 忽略所有其他直接拼接// as 則取下一個值為fieldName// case 則直接取到end為止;//,號則重置token,構(gòu)建結(jié)果集char currentChar = columnPartChars[i];switch (currentChar) {case '(':++deep;currentTokenBuilder.append(currentChar);break;case ')':--deep;currentTokenBuilder.append(currentChar);break;case ',':if(deep == 0) {addNewField(fieldList, fieldTokenSwap, true);fieldTokenSwap = new ArrayList<>();currentTokenBuilder = new StringBuilder();fieldTokenSwap.add(currentTokenBuilder);break;}currentTokenBuilder.append(currentChar);break;case ' ':case '\t':case '\r':case '\n':if(deep > 0) {currentTokenBuilder.append(currentChar);continue;}if(currentTokenBuilder.length() == 0) {continue;}// original_name as --> aliasif(i + 1 < len) {int j = i + 1;// 收集連續(xù)的空格StringBuilder spaceHolder = new StringBuilder();boolean isNextLeftBracket = false;do {char nextChar = columnPart.charAt(j++);if(nextChar == ' ' || nextChar == '\t'|| nextChar == '\r' || nextChar == '\n') {spaceHolder.append(nextChar);continue;}if(nextChar == '(') {isNextLeftBracket = true;}break;} while (j < len);if(isNextLeftBracket) {currentTokenBuilder.append(currentChar);}if(spaceHolder.length() > 0) {currentTokenBuilder.append(spaceHolder);i += spaceHolder.length();}if(isNextLeftBracket) {// continue next for, function begincontinue;}}if(fieldTokenSwap.size() == 1) {if(fieldTokenSwap.get(0).toString().equalsIgnoreCase("case")) {String caseWhenPart = CommonUtil.readSplitWord(columnPartChars, i, " ", "end");currentTokenBuilder.append(caseWhenPart);if(caseWhenPart.length() <= 0) {throw new BizException("語法錯誤,未找到case..when的結(jié)束符");}i += caseWhenPart.length();}}addNewField(fieldList, fieldTokenSwap, false);currentTokenBuilder = new StringBuilder();fieldTokenSwap.add(currentTokenBuilder);break;// 空格忽略default:currentTokenBuilder.append(currentChar);break;}}// 處理剩余尚未存儲的字段信息addNewField(fieldList, fieldTokenSwap, true);return fieldList;}/*** 新增一個字段描述** @param fieldList 字段容器* @param fieldTokenSwap 候選詞*/private static void addNewField(List<SelectFieldClauseDescriptor> fieldList,List<StringBuilder> fieldTokenSwap,boolean forceAdd) {int ts = fieldTokenSwap.size();if(ts == 1 && forceAdd) {// db.original_name,String fieldName = fieldTokenSwap.get(0).toString();String alias = fieldName;if(fieldName.contains(".")) {alias = fieldName.substring(fieldName.lastIndexOf('.') + 1);}fieldList.add(new SelectFieldClauseDescriptor(fieldName, alias));return;}if(ts < 2) {return;}if(ts == 2) {// original_name alias,if(fieldTokenSwap.get(1).toString().equalsIgnoreCase("as")) {return;}fieldList.add(new SelectFieldClauseDescriptor(fieldTokenSwap.get(0).toString(),fieldTokenSwap.get(1).toString()));}else if(ts == 3) {// original_name as alias,fieldList.add(new SelectFieldClauseDescriptor(fieldTokenSwap.get(0).toString(),fieldTokenSwap.get(2).toString()));}else {throw new BizException("字段語法解析錯誤,超過3個以字段描述信息:" + ts);}}// 截取適配 field 字段信息部分private static String adaptFieldPartSql(String fullSql) {int start = fullSql.lastIndexOf("select ");int end = fullSql.lastIndexOf(" from");String columnPart = fullSql.substring(start + "select ".length(), end);return columnPart.trim();}}
應該說是比較簡單的,一個for, 一個 switch ,就搞定了。其他的,更多的是邏輯判定。
下面我們來看看字段描述類的寫法,其實就是兩個字段,源字段和別名。
/*** 功能描述: sql字段描述 select 字段描述類**/public class SelectFieldClauseDescriptor {private String fieldName;private String alias;public SelectFieldClauseDescriptor(String fieldName, String alias) {this.fieldName = fieldName;this.alias = alias;}public String getFieldName() {return fieldName;}public String getAlias() {return alias;}@Overridepublic boolean equals(Object o) {if (this == o) return true;if (o == null || getClass() != o.getClass()) return false;SelectFieldClauseDescriptor that = (SelectFieldClauseDescriptor) o;return Objects.equals(fieldName, that.fieldName) &&Objects.equals(alias, that.alias);}@Overridepublic int hashCode() {return Objects.hash(fieldName, alias);}@Overridepublic String toString() {return "SelectFieldClauseDescriptor{" +"fieldName='" + fieldName + '\'' +", alias='" + alias + '\'' +'}';}}
它存在的意義,僅僅是為了使用方更方便取值,以為更進一步的解析提供了依據(jù)。
3. 單元測試
其實像寫這種工具類,單元測試最是方便簡單。因為最初的結(jié)果,我們早已預料,以測試驅(qū)動開發(fā)最合適不過了。而且,基本上一出現(xiàn)不符合預期的值時,很快速就定位問題了。
/*** 功能描述: sql字段解析器測試**/public class SimpleSqlFieldParserTest {@Testpublic void testParse() {String sql;List<SelectFieldClauseDescriptor> parsedFieldList;sql = "select COALESCE(t1.xno, t2.xno, t3.xno) as xno,\n" +" case when t1.xno is not null then 1 else null end as xxk001,\n" +" case when t2.xno is not null then 1 else null end as xxk200,\n" +" case when t3.xno is not null then 1 else null end as xx3200\n" +" from xxk001 t1\n" +" full join xxkj100 t2 on t1.xno = t2.xno\n" +" full join xxkj200 t3 on t1.xno = t3.xno;";parsedFieldList = SimpleSqlFieldParser.parse(sql);System.out.println("result:");parsedFieldList.forEach(System.out::println);Assert.assertEquals("字段個數(shù)解析不正確",4, parsedFieldList.size());Assert.assertEquals("字段別名解析不正確","xno", parsedFieldList.get(0).getAlias());Assert.assertEquals("字段別名解析不正確","xx3200", parsedFieldList.get(3).getAlias());sql = "select cast(a as string) as b from ccc;";parsedFieldList = SimpleSqlFieldParser.parse(sql);System.out.println("result:");parsedFieldList.forEach(System.out::println);Assert.assertEquals("字段個數(shù)解析不正確",1, parsedFieldList.size());Assert.assertEquals("字段別名解析不正確","b", parsedFieldList.get(0).getAlias());sql = "with a as(select cus,x1 from b1), b as (select cus,x2 from b2)\n" +" select a.cus as a_cus, cast(a \nas string) as a_cus2, " +"b.x2 b2 from a join b on a.cus=b.cus where xxx;";parsedFieldList = SimpleSqlFieldParser.parse(sql);System.out.println("result:");parsedFieldList.forEach(System.out::println);Assert.assertEquals("字段個數(shù)解析不正確",3, parsedFieldList.size());Assert.assertEquals("字段別名解析不正確","a_cus", parsedFieldList.get(0).getAlias());Assert.assertEquals("字段別名解析不正確","b2", parsedFieldList.get(2).getAlias());sql = "select a.xno,b.xx,qqq from a_tb as a join b_tb as b on a.id = b.id";parsedFieldList = SimpleSqlFieldParser.parse(sql);System.out.println("result:");parsedFieldList.forEach(System.out::println);Assert.assertEquals("字段個數(shù)解析不正確",3, parsedFieldList.size());Assert.assertEquals("字段別名解析不正確","xno", parsedFieldList.get(0).getAlias());Assert.assertEquals("字段別名解析不正確","qqq", parsedFieldList.get(2).getAlias());sql = "select cast (a.a_int as string) a_str, b.xx, coalesce \n( a, b, c) qqq from a_tb as a join b_tb as b on a.id = b.id";parsedFieldList = SimpleSqlFieldParser.parse(sql);System.out.println("result:");parsedFieldList.forEach(System.out::println);Assert.assertEquals("字段個數(shù)解析不正確",3, parsedFieldList.size());Assert.assertEquals("字段別名解析不正確","a_str", parsedFieldList.get(0).getAlias());Assert.assertEquals("字段原始名解析不正確","cast (a.a_int as string)", parsedFieldList.get(0).getFieldName());Assert.assertEquals("字段別名解析不正確","qqq", parsedFieldList.get(2).getAlias());Assert.assertEquals("字段原始名解析不正確","coalesce \n( a, b, c)", parsedFieldList.get(2).getFieldName());}}
至此,一個簡單的字段解析器完成。小工具,供參考!

騰訊、阿里、滴滴后臺面試題匯總總結(jié) — (含答案)
面試:史上最全多線程面試題 !
最新阿里內(nèi)推Java后端面試題
JVM難學?那是因為你沒認真看完這篇文章

關(guān)注作者微信公眾號 —《JAVA爛豬皮》
了解更多java后端架構(gòu)知識以及最新面試寶典


看完本文記得給作者點贊+在看哦~~~大家的支持,是作者源源不斷出文的動力
作者:等你歸去來
出處:https://www.cnblogs.com/yougewe/p/14911443.html
