計算機視覺與機器學(xué)習(xí)等領(lǐng)域不平衡數(shù)據(jù)處理綜述
點擊上方“AI算法與圖像處理”,選擇加"星標(biāo)"或“置頂”
重磅干貨,第一時間送達
目錄
機器學(xué)習(xí)——不平衡數(shù)據(jù)(上采樣和下采樣) 計算機視覺——不平衡數(shù)據(jù)(圖像數(shù)據(jù)增強) NLP——不平衡數(shù)據(jù)(Google交易和分類權(quán)重)
1. 機器學(xué)習(xí)——不平衡數(shù)據(jù)
https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/
https://github.com/NandhiniN85/Class-Imbalancing
#import imblearn library
from imblearn.over_sampling import SMOTENC
oversample = SMOTENC(categorical_features=[0,1,2,3,4,9,10], random_state = 100)
X, y = oversample.fit_resample(X, y)
from sklearn.utils import resample
maxcount = 332
train_nonnull_resampled = train_nonnull[0:0]
for grp in train_nonnull['Loan_Status'].unique():
GrpDF = train_nonnull[train_nonnull['Loan_Status'] == grp]
resampled = resample(GrpDF, replace=True, n_samples=int(maxcount), random_state=123)
train_nonnull_resampled = train_nonnull_resampled.append(resampled)


from imblearn.under_sampling import TomekLinks
undersample = TomekLinks()
X, y = undersample.fit_resample(X, y)
2. 計算機視覺——不平衡數(shù)據(jù)
https://datahack.analyticsvidhya.com/contest/janatahack-computer-vision-hackathon/#ProblemStatement

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
datagen = ImageDataGenerator(
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest')
img = load_img('images/0.jpg')
x = img_to_array(img)
x = x.reshape((1,) + x.shape)
print(x.shape)
# the .flow() command below generates batches of randomly transformed images
# and saves the results to the `preview/` directory
i = 0
for batch in datagen.flow(x, batch_size=1,
save_to_dir='preview', save_prefix='vehichle', save_format='jpeg'):
i += 1
if i > 19:
break # otherwise the generator would loop indefinitely
https://github.com/NandhiniN85/Class-Imbalancing/blob/main/Computer%20Vision%20-%20Data%20Imbalanced.ipynb
https://keras.io/api/preprocessing/image/
3. NLP——不平衡數(shù)據(jù)
https://github.com/NandhiniN85/Class-Imbalancing/blob/main/NLP%20-%20Class%20Imbalanced.ipynb
from googletrans import Translatortranslator = Translator()
def German_translation(x): print(x) german_translation = translator.translate(x, dest='de') return german_translation.text
def English_translation(x): print(x)
english_translation = translator.translate(x, dest='en') return english_translation.text
x = German_translation("warning for using windows disk space")
English_translation(x)
import numpy as np
from tensorflow import keras
from sklearn.utils.class_weight import compute_class_weight
y_integers = np.argmax(raw_y_train, axis=1)
class_weights = compute_class_weight('balanced', np.unique(y_integers), y_integers)
d_class_weights = dict(enumerate(class_weights))
history = model.fit(input_final, raw_y_train, batch_size=32, class_weight = d_class_weights, epochs=8,callbacks=[checkpoint,reduceLoss],validation_data =(val_final, raw_y_val), verbose=1)
# fit the training dataset on the classifier
SVM = svm.SVC(C=1.0, kernel='linear', degree=3, gamma='auto', class_weight='balanced', random_state=100)
https://github.com/NandhiniN85/Class-Imbalancing/blob/main/English_raw200_model_execution.ipynb
結(jié)論
個人微信(如果沒有備注不拉群!) 請注明:地區(qū)+學(xué)校/企業(yè)+研究方向+昵稱
下載1:何愷明頂會分享
在「AI算法與圖像處理」公眾號后臺回復(fù):何愷明,即可下載。總共有6份PDF,涉及 ResNet、Mask RCNN等經(jīng)典工作的總結(jié)分析
下載2:終身受益的編程指南:Google編程風(fēng)格指南
在「AI算法與圖像處理」公眾號后臺回復(fù):c++,即可下載。歷經(jīng)十年考驗,最權(quán)威的編程規(guī)范!
下載3 CVPR2021 在「AI算法與圖像處理」公眾號后臺回復(fù):CVPR,即可下載1467篇CVPR 2020論文 和 CVPR 2021 最新論文
點亮
,告訴大家你也在看
評論
圖片
表情
