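My first Python mini project, with the complete code released
Hello, I'm zhenguo.
This is the first Python mini project I published, on April 29: a KWIC display of text sentences based on a keyword. If you haven't seen it yet, read the introduction below; if you already know it, skip straight to the solution analysis and code section of this article.
Applying what you have learned to real problems is how you truly deepen your understanding of it; real knowledge comes from practice. Starting from that basic point, I designed a small project that is quite fun and also has some practical value.
I will sync this mini project to the GitHub repository python-small-examples, which currently has nearly 6,100 stars. Pull requests are welcome, and you have a chance to become the repository's 13th contributor.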
https://github.com/jackzhenguo/python-small-examples
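Python mini project in progress
Key Word In Context (KWIC) is the most common format for displaying concordance lines.
Project description: the input is a series of sentences plus a given word, and every sentence contains the given word at least once. The target output shows the given word in KWIC form. The basic requirements of a KWIC display are: the query word is centered, the preceding (pre) sequence is right-aligned, the following (post) sequence is left-aligned, and the text before and after the query word has equal length. If an input sentence cannot meet this requirement, pad it with spaces.
Input parameters: the input sentences sentences, the query word selword, and the sliding window length window_len
For example, given six input sentences and the word secure, the output is the following string: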
           pre  keyword  post
 welfare , and  secure   the blessings of
 nations , and  secured  immortal glory with
   , and shall  secure   to you the
cherished . To  secure   us against these
 defense as to  secure   our cities and
      I can to  secure   economy and fidelity
Please complete the following function:
def kwic(sentences: List[str], selword: str, window_len: int) -> str:
    """
    :param sentences: input sentences
    :param selword: selected word
    :param window_len: window length
    """
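Here is one way the stub might be filled in. It is a minimal sketch, not the archived solution, and it rests on assumptions of my own: tokens are separated by whitespace, a token matches when it contains selword (case-insensitive, so secured matches secure), only the first match per sentence is shown, and window_len is read as the number of tokens kept on each side of the keyword.

```python
from typing import List


def kwic(sentences: List[str], selword: str, window_len: int) -> str:
    """Return KWIC lines: pre right-aligned, keyword, post left-aligned.

    Assumption: window_len is the number of tokens shown on each side.
    """
    rows = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if selword.lower() in token.lower():
                pre = " ".join(tokens[max(0, i - window_len):i])
                post = " ".join(tokens[i + 1:i + 1 + window_len])
                rows.append((pre, token, post))
                break  # keep only the first occurrence in each sentence
    if not rows:
        return ""
    # Pad every pre part to the widest one so the keywords line up.
    pre_width = max(len(pre) for pre, _, _ in rows)
    key_width = max(len(key) for _, key, _ in rows)
    lines = [
        f"{pre.rjust(pre_width)}  {key.ljust(key_width)}  {post}".rstrip()
        for pre, key, post in rows
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        "the welfare , and secure the blessings of liberty",
        "I can to secure economy and fidelity here",
    ]
    print(kwic(demo, "secure", 3))
```

Right-aligning pre with str.rjust is what supplies the space padding when a sentence has too few words before the keyword; trailing spaces are stripped because they are invisible anyway.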
For more on KWIC displays, see:
http://dep.chs.nihon-u.ac.jp/english_lang/tukamoto/kwic_e.html
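The complete code and analysis for this project are published on the Python Chinese site I created, http://zglg.work. Click "Read the original" at the bottom of the article to go straight to the page.
All of the code below has been tested and runs as is.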
# encoding: utf-8
"""
@file: kwic_service.py
@desc: providing functions about KWIC presentation
@author: group3
@time: 5/9/2021
"""
import re
from typing import List
Get the window around the keyword sel_word; the default window length is 5
def get_keyword_window(sel_word: str, words_of_sentence: List, length=5) -> List[str]:
    """
    find the index of sel_word at sentence, then decide words of @length size
    by backward and forward of it.
    For example: I am very happy to this course of psd if sel_word is happy, then
    returning: [am, very, happy, to, this]
    if length is even, then returning [very, happy, to, this]
    remember: sel_word being word root
    """
    if length <= 0 or len(words_of_sentence) <= length:
        return words_of_sentence
    index = -1
    for iw, word in enumerate(words_of_sentence):
        word = word.lower()
        if len(re.findall(sel_word.lower(), word)) > 0:
            index = iw
            break
    if index == -1:
        # log.warning("warning: cannot find %s in sentence: %s" % (sel_word, words_of_sentence))
        return words_of_sentence
    # backward is not enough
    if index < length // 2:
        back_slice = words_of_sentence[:index]
        # forward is also not enough,
        # showing the sentence is too short compared to length parameter
        if (length - index) >= len(words_of_sentence):
            return words_of_sentence
        else:
            return back_slice + words_of_sentence[index: index + length - len(back_slice)]
    # forward is not enough
    if (index + length // 2) >= len(words_of_sentence):
        forward_slice = words_of_sentence[index:len(words_of_sentence)]
        # backward is also not enough,
        # showing the sentence is too short compared to length parameter
        if index - length <= 0:
            return words_of_sentence
        else:
            return words_of_sentence[index - (length - len(forward_slice)):index] + forward_slice
    return words_of_sentence[index - length // 2: index + length // 2 + 1] if length % 2 \
        else words_of_sentence[index - length // 2 + 1: index + length // 2 + 1]
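The edge handling in the function above comes down to one idea: when the window would run past either end of the sentence, slide it inward. The sketch below shows that idea in a compact form, assuming the keyword's index is already known. clamped_window is a hypothetical helper name of my own, not part of the archived code, and it is a simplification: it does not reproduce the original's fallback of returning the whole sentence in some edge cases.

```python
from typing import List


def clamped_window(words: List[str], index: int, length: int = 5) -> List[str]:
    """Return up to `length` words around words[index], shifting the window
    inward when it would run past either end of the sentence."""
    if length <= 0 or len(words) <= length:
        return list(words)
    # Odd length: the keyword sits in the middle. Even length: the extra
    # word goes after the keyword, as in the docstring example above.
    start = index - (length - 1) // 2
    # Clamp the start so the slice stays inside the sentence.
    start = max(0, min(start, len(words) - length))
    return words[start:start + length]


if __name__ == "__main__":
    words = "I am very happy to this course of psd".split()
    print(clamped_window(words, words.index("happy"), 5))
    print(clamped_window(words, words.index("happy"), 4))
    print(clamped_window(words, 0, 5))
```

Clamping a single start index replaces the separate "backward is not enough" and "forward is not enough" branches, at the cost of always returning exactly length words when the sentence is long enough.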
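The KWIC display logic lives in a separate method. It is long enough that including it here would make the article too lengthy, so the complete code is archived here: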
http://www.zglg.work/Python-20-topics/python-project1-kwic/
Test code
# encoding: utf-8
"""
@file: test_kwic_show.py
@desc:
@author: group3
@time: 5/3/2021
"""
from src.feature.kwic import kwic_show
if __name__ == '__main__':
    words = ['I', 'am', 'very', 'happy', 'to', 'this', 'course', 'of', 'psd']
    print(kwic_show('English', words, 'I', window_size=1)[0])
    print(kwic_show('English', words, 'I', window_size=5)[0])
    print(kwic_show('English', words, 'very', token_space_param=5)[0])
    print(kwic_show('English', words, 'very', window_size=6, token_space_param=5)[0])
    print(kwic_show('English', words, 'very', window_size=1, token_space_param=5)[0])
    # test boundary
    print(kwic_show('English', words, 'stem', align_param=20)[0])
    print(kwic_show('English', words, 'stem', align_param=100)[0])
    print(kwic_show('English', words, 'II', window_size=1)[0])
    print(kwic_show('English', words, 'related', window_size=10000)[0])
Printed output
I
I am very happy to
I am very happy to this course of psd
I am very happy to this
very
None
None
None
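None
I am building a web tool for KWIC display. It is still in self-testing, so here is a preview of how it looks; once deployment is finished, I will open it up for everyone to try: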

Click "Read the original" below to see all of the complete code
