Recognizing Captchas with TensorFlow 2
「@Author:Runsen」
A captcha is an image generated from random characters, with noise pixels added on top. The user has to type the characters in by hand, which keeps bots from bulk-registering accounts, flooding forums, posting spam ads, and so on.
Dataset: https://www.kaggle.com/fournierp/captcha-version-2-images
Each image contains a 5-character string that may include digits. Noise (blur and a line) has been applied, and every image is a 200 x 50 PNG. Our task is to build an optical character recognition (OCR) model for these captchas.

The label of each captcha PNG in the dataset is simply its file name.
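For example, here is a minimal sketch (assuming the dataset has been extracted to the samples/samples directory used throughout this post; the file name shown in the comment is only a hypothetical example) that reads one PNG and recovers its label from the file name:
import os
import cv2
import matplotlib.pyplot as plt

DIR = '../input/captcha-version-2-images/samples/samples'  # assumed dataset location
sample_name = os.listdir(DIR)[0]                           # e.g. '2b827.png'
label = sample_name.split('.')[0]                          # label = file name without the extension
img = cv2.imread(os.path.join(DIR, sample_name), cv2.IMREAD_GRAYSCALE)

print(label, img.shape)                                    # expected shape: (50, 200)
plt.imshow(img, cmap='gray')
plt.title(label)
plt.show()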
import os
import numpy as np
import pandas as pd
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
# imgaug: image data augmentation
import imgaug.augmenters as iaa
import tensorflow as tf
# Conv2D MaxPooling2D Dropout Flatten Dense BN GAP
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Layer, BatchNormalization, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import Model, Input
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
# image preprocessing utilities
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo
pyo.init_notebook_mode()
Start with a quick analysis of the data: count which symbols actually appear in the captchas.
# data path
DIR = '../input/captcha-version-2-images/samples/samples'
# store the captcha file names
captcha_list = []
characters = {}
for captcha in os.listdir(DIR):
    captcha_list.append(captcha)
    # the captcha code is the file name without the extension
    captcha_code = captcha.split(".")[0]
    for i in captcha_code:
        # count every character that appears in the captcha codes
        characters[i] = characters.get(i, 0) + 1
symbols = list(characters.keys())
len_symbols = len(symbols)
print(f'Only {len_symbols} distinct symbols are used in the captchas')
plt.bar(*zip(*characters.items()))
plt.title('Frequency of symbols')
plt.show()

How do we extract the image data and build X and y?
# Build X and y from the images.
# X has shape 1070 * 50 * 200 * 1
# y has shape 5 * 1070 * 19
channels = 1
img_shape = (50, 200, channels)
len_captcha = 5
X = np.zeros((len(captcha_list), 50, 200, channels))
y = np.zeros((len_captcha, len(captcha_list), len_symbols))

for i, captcha in enumerate(captcha_list):
    captcha_code = captcha.split('.')[0]
    # cv2.IMREAD_GRAYSCALE: read as a grayscale image
    captcha_cv2 = cv2.imread(os.path.join(DIR, captcha), cv2.IMREAD_GRAYSCALE)
    # scale pixel values to [0, 1]
    captcha_cv2 = captcha_cv2 / 255.0
    # print(captcha_cv2.shape) -> (50, 200)
    # reshape captcha_cv2 from (50, 200) to (50, 200, 1)
    captcha_cv2 = np.reshape(captcha_cv2, img_shape)
    # one-hot targets of shape (5, 19)
    targs = np.zeros((len_captcha, len_symbols))

    for a, b in enumerate(captcha_code):
        targs[a, symbols.index(b)] = 1
    X[i] = captcha_cv2
    y[:, i] = targs
print("shape of X:", X.shape)
print("shape of y:", y.shape)
The output:
shape of X: (1070, 50, 200, 1)
shape of y: (5, 1070, 19)
Use NumPy's random generator to pick indices at random and split the data into training, validation, and test sets.
# reproducible random number generator
from numpy.random import default_rng
rng = default_rng(seed=1)
test_numbers = rng.choice(1070, size=int(1070*0.3), replace=False)
X_test = X[test_numbers]
X_full = np.delete(X, test_numbers, 0)
y_test = y[:, test_numbers]
y_full = np.delete(y, test_numbers, 1)
val_numbers = rng.choice(int(1070*0.7), size=int(1070*0.3), replace=False)
X_val = X_full[val_numbers]
X_train = np.delete(X_full, val_numbers, 0)
y_val = y_full[:, val_numbers]
y_train = np.delete(y_full, val_numbers, 1)
This captcha data overfits easily. You might think of collecting more data or adding regularization, but here we use data augmentation instead; for computer-vision tasks in particular, augmentation is one of the most effective tools.
A convenient choice is the imgaug library, a Python package that provides a wide range of image augmentation operations: https://github.com/aleju/imgaug.
imgaug covers almost all of the mainstream augmentation operations; see the GitHub page for the full list.
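As a small illustration of that variety (these operators are not used in the pipeline below; the parameter values are only illustrative and assume the images are floats in [0, 1]), a sketch combining a blur with additive pixel noise might look like this:
# hypothetical extra augmenters, shown only to illustrate other imgaug operations
extra_aug = iaa.Sequential([
    iaa.GaussianBlur(sigma=(0, 1.0)),            # random blur strength
    iaa.AdditiveGaussianNoise(scale=(0, 0.05)),  # small noise, since pixels are in [0, 1]
])
# X_noisy = extra_aug(images=X_train)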
# iaa.Sequential(children, random_order): applies the list of child augmenters to each image;
# the second argument controls whether the order is shuffled per batch.
# iaa.CropAndPad: crops (negative values) or pads (positive values) each side of the image.
# px: pixels to crop/pad, given as (top, right, bottom, left).
# pad_mode='edge' repeats the border pixels; pad_cval is only used when pad_mode='constant'.
# iaa.Rotate(rotate=(-8, 8)): random rotation between -8 and 8 degrees.
aug = iaa.Sequential([iaa.CropAndPad(
    px=((0, 10), (0, 35), (0, 10), (0, 35)),
    pad_mode=['edge'],
    pad_cval=1
), iaa.Rotate(rotate=(-8, 8))])
# run the augmenter 40 times, so the augmented training set is 40x the original;
# y_aug_train starts as one copy of y_train to stay aligned with the first batch of X_aug
X_aug_train = None
y_aug_train = y_train
for i in range(40):
    X_aug = aug(images=X_train)
    if X_aug_train is not None:
        X_aug_train = np.concatenate([X_aug_train, X_aug], axis=0)
        y_aug_train = np.concatenate([y_aug_train, y_train], axis=1)
    else:
        X_aug_train = X_aug
Let's look at some of the augmented training images.
fig, ax = plt.subplots(nrows=2, ncols=5, figsize=(16, 16))
for i in range(10):
    index = np.random.randint(X_aug_train.shape[0])
    # drop the channel dimension so imshow receives a (50, 200) array
    ax[i//5][i%5].imshow(X_aug_train[index][:, :, 0], cmap='gray')


This time we build the model with the functional API. The functional API is an alternative way of creating models that offers more flexibility, including multi-output architectures like the one below.
We only need to define the inputs and the outputs.
# build the model with the functional API
captcha = Input(shape=(50, 200, channels))
x = Conv2D(32, (5, 5), padding='valid', activation='relu')(captcha)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
maxpool = MaxPooling2D((2, 2), padding='same')(x)
outputs = []
# one softmax head per character position
for i in range(5):
    x = Conv2D(256, (3, 3), padding='same', activation='relu')(maxpool)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Flatten()(x)
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)
    x = Dense(len_symbols, activation='softmax', name=f'char_{i+1}')(x)
    outputs.append(x)

model = Model(inputs=captcha, outputs=outputs)
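Optionally, you can inspect the five-output architecture at this point with the standard Keras summary (not a required step):
model.summary()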
# ReduceLROnPlateau halves the learning rate when the monitored metric stops improving
reduce_lr = ReduceLROnPlateau(patience=3, factor=0.5, verbose=1)
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.0005), metrics=["accuracy"])
# EarlyStopping is a callback that stops training early,
# here once the validation loss has not improved for 10 epochs
earlystopping = EarlyStopping(monitor="val_loss",
                              mode="min", patience=10,
                              min_delta=1e-4,
                              restore_best_weights=True)
history = model.fit(X_train, [y_train[i] for i in range(5)],
                    batch_size=32, epochs=30, verbose=1,
                    validation_data=(X_val, [y_val[i] for i in range(5)]),
                    callbacks=[earlystopping, reduce_lr])



Now let's evaluate the model on the test set.
score = model.evaluate(X_test, [y_test[0], y_test[1], y_test[2], y_test[3], y_test[4]], verbose=1)
metrics = ['loss', 'char_1_loss', 'char_2_loss', 'char_3_loss', 'char_4_loss', 'char_5_loss', 'char_1_acc', 'char_2_acc', 'char_3_acc', 'char_4_acc', 'char_5_acc']
for i, j in zip(metrics, score):
    print(f'{i}: {j}')
The output is as follows:
11/11 [==============================] - 0s 11ms/step - loss: 0.7246 - char_1_loss: 0.0682 - char_2_loss: 0.1066 - char_3_loss: 0.2730 - char_4_loss: 0.2636 - char_5_loss: 0.0132 - char_1_accuracy: 0.9844 - char_2_accuracy: 0.9657 - char_3_accuracy: 0.9408 - char_4_accuracy: 0.9626 - char_5_accuracy: 0.9938
loss: 0.7246273756027222
char_1_loss: 0.06818050146102905
char_2_loss: 0.10664034634828568
char_3_loss: 0.27299806475639343
char_4_loss: 0.26359987258911133
char_5_loss: 0.013208594173192978
char_1_acc: 0.9844236969947815
char_2_acc: 0.9657320976257324
char_3_acc: 0.940809965133667
char_4_acc: 0.9626168012619019
char_5_acc: 0.9937694668769836
The test accuracy for every one of the five character positions is above 0.94.
Next, plot the losses and scores.
metrics_df = pd.DataFrame(history.history)
columns = [col for col in metrics_df.columns if 'loss' in col and len(col) > 8]
fig = px.line(metrics_df, y=columns)
fig.show()

plt.figure(figsize=(15, 8))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right', prop={'size': 10})
plt.show()

# predict a single captcha
def predict(captcha):
    captcha = np.reshape(captcha, (1, 50, 200, channels))
    result = model.predict(captcha)
    result = np.reshape(result, (5, len_symbols))
    # take the symbol with the highest probability for each position
    label = ''.join([symbols[np.argmax(i)] for i in result])
    return label

predict(X_test[2])
# '25277'
Now predict the whole test set.
actual_pred = []
for i in range(X_test.shape[0]):
    actual = ''.join([symbols[idx] for idx in np.argmax(y_test[:, i], axis=1)])
    pred = predict(X_test[i])
    actual_pred.append((actual, pred))
print(actual_pred[:10])
The output:
[('n4b4m', 'n4b4m'), ('42nxy', '42nxy'), ('25257', '25277'), ('cewnm', 'cewnm'), ('w46ep', 'w46ep'), ('cdcb3', 'edcb3'), ('8gf7n', '8gf7n'), ('nny5e', 'nny5e'), ('gm2c2', 'gm2c2'), ('g7fmc', 'g7fmc')]
sameCount = 0
diffCount = 0
letterDiff = {i: 0 for i in range(5)}
incorrectness = {i: 0 for i in range(1, 6)}
for real, pred in actual_pred:
    # prediction matches the actual captcha
    if real == pred:
        sameCount += 1
    else:
        # misprediction
        diffCount += 1
        # count which positions were wrong and how many errors per captcha
        incorrectnessPoint = 0
        for i in range(5):
            if real[i] != pred[i]:
                letterDiff[i] += 1
                incorrectnessPoint += 1
        incorrectness[incorrectnessPoint] += 1
x = ['True predicted', 'False predicted']
y = [sameCount, diffCount]
fig = go.Figure(data=[go.Bar(x=x, y=y)])
fig.show()
In the predictions on the test set, 287 captchas were predicted completely correctly.
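Using the counters computed above, you can also report the exact-match rate directly; a quick sketch:
# overall exact-match accuracy on the test set
total = sameCount + diffCount
print(f'exact-match accuracy: {sameCount / total:.3f} ({sameCount}/{total})')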
Here we can see exactly at which character index the errors occur.
x1 = ["Character " + str(x) for x in range(1, 6)]

fig = go.Figure(data=[go.Bar(x=x1, y=list(letterDiff.values()))])
fig.show()

To see how many characters are wrong per misclassified captcha, plot the corresponding bar chart.
x2 = [str(x) + " incorrect" for x in incorrectness.keys()]
y2 = list(incorrectness.values())
fig = go.Figure(data=[go.Bar(x=x2, y=y2)])
fig.show()

Finally, plot the misclassified captcha images and label each one with the predicted text next to the actual text.
fig, ax = plt.subplots(nrows=8, ncols=4, figsize=(16, 20))
count = 0
for i, (actual, pred) in enumerate(actual_pred):
    if actual != pred:
        img = X_test[i]
        try:
            # drop the channel dimension for imshow; the title shows "predicted - actual"
            ax[count//4][count%4].imshow(img[:, :, 0], cmap='gray')
            ax[count//4][count%4].title.set_text(pred + ' - ' + actual)
            count += 1
        except IndexError:
            # more than 32 misclassified captchas would not fit into the 8x4 grid
            pass



