Preface

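Hello, friends!
Thank you for reading Haihong's article. If you spot any mistakes, please point them out.
About me
Nickname: Haihong (海轟)
Tags: programmer | C++ enthusiast | student
Bio: I got into programming through C, later switched to a computer science major, and was lucky enough to win several national and provincial awards. I have already been recommended for graduate admission. Currently learning C++/Linux/Python.
Study tips: solid fundamentals + plenty of notes + plenty of coding + plenty of thinking + good English! Still at the beginner stage in Python.
These articles are only my own study notes, used to build up and review my knowledge.
It is not about doing many problems; it is about understanding each one you do.
Know not only what works, but why it works!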

1 Multiple Regression

Note: I could not find the dataset used here.
Quoted from: https://blog.csdn.net/HHTNAN/article/details/78843722?utm_source=blogxgwz7
The code below has not been verified.

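1.1 Selecting the data

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl  # used to display Chinese characters in plots

# Use a font that has Chinese glyphs and keep the minus sign rendering correctly
font = {
    "family": "Microsoft YaHei"
}
mpl.rc("font", **font)
mpl.rcParams['axes.unicode_minus'] = False

# Load the data at module level so that the later snippets can reuse pd_data
pd_data = pd.read_excel('../profile/test.xlsx')
print('pd_data.head(10)=\n{}'.format(pd_data.head(10)))

# One regression scatter plot per feature against the target column
# (recent seaborn versions use `height`; the old `size` argument was removed)
sns.pairplot(pd_data, x_vars=['中證500', '瀘深300', '上證50', '上證180'], y_vars='上證指數', kind="reg", height=5, aspect=0.7)
plt.show()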
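
Because the original spreadsheet could not be located, the snippets in this section have nothing to run against. A minimal sketch, assuming a purely synthetic stand-in: the helper name `make_fake_index_data`, its coefficients, value ranges, and noise level are all invented for illustration and carry no financial meaning. Only the column names follow the code in this section.

```python
import numpy as np
import pandas as pd


def make_fake_index_data(n_rows=50, seed=0):
    """Build a synthetic DataFrame with the column layout used in this section.

    Hypothetical helper: every number below is made up.
    """
    rng = np.random.default_rng(seed)
    features = ['中證500', '瀘深300', '上證50', '上證180']
    # Arbitrary positive "index levels"
    X = rng.uniform(2000, 6000, size=(n_rows, len(features)))
    # Invented linear relationship plus Gaussian noise
    true_coef = np.array([0.1, 0.3, 0.2, 0.4])
    y = 100.0 + X @ true_coef + rng.normal(0, 5, size=n_rows)
    df = pd.DataFrame(X, columns=features)
    df['上證指數'] = y
    return df


pd_data = make_fake_index_data()
print(pd_data.head(10))
```

Assigning the result to `pd_data` in place of the `pd.read_excel` call lets the plotting and regression code run end to end; the fitted numbers then describe only the invented relationship, not the real indices.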

1.2 Building the training and test sets and fitting the model

from sklearn.model_selection import train_test_split  # splits the data into training and test sets
from sklearn.linear_model import LinearRegression  # linear regression
from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

# Continues from the code above and reuses pd_data.
# Drop the date column first if the data has one (skip this if it does not);
# the columns are selected as in http://blog.csdn.net/chixujohnny/article/details/51095817
X = pd_data.loc[:, ['中證500', '瀘深300', '上證50', '上證180']]  # a list, not a tuple, selects several columns
y = pd_data.loc[:, '上證指數']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)
print('X_train.shape={}\ny_train.shape={}\nX_test.shape={}\ny_test.shape={}'.format(X_train.shape, y_train.shape, X_test.shape, y_test.shape))
linreg = LinearRegression()
model = linreg.fit(X_train, y_train)
print(model)
# Intercept of the fitted model
print(linreg.intercept_)
# Weights of the fitted model (one per feature)
print(linreg.coef_)

1.3 Model prediction

# Prediction
y_pred = linreg.predict(X_test)
print(y_pred)  # one predicted value per test-set sample
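
To make the roles of `intercept_` and `coef_` concrete, the sketch below checks that `predict` is simply the intercept plus the weighted sum of the features. It is a self-contained illustration under stated assumptions: the data is synthetic, and the weights in `true_coef` and the intercept 4.0 are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))                 # synthetic features (assumption)
true_coef = np.array([1.5, -2.0, 0.5, 3.0])   # invented weights
y = 4.0 + X @ true_coef + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=100)
linreg = LinearRegression().fit(X_train, y_train)

# Prediction through the estimator...
y_pred = linreg.predict(X_test)
# ...and the same thing written out: y = b0 + b1*x1 + ... + b4*x4
manual = linreg.intercept_ + X_test @ linreg.coef_

print(np.allclose(y_pred, manual))
print(linreg.intercept_, linreg.coef_)
```

Because the noise is small, the fitted intercept and weights should land close to the invented ones, which is a quick sanity check that the fit behaves as expected.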

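1.4 Model evaluation

# Evaluation
# For classification problems the usual metric is accuracy, but it does not apply to regression.
# Regression uses evaluation metrics designed for continuous values.
# Three common metrics for linear regression:
# (1) Mean Absolute Error (MAE)
# (2) Mean Squared Error (MSE)
# (3) Root Mean Squared Error (RMSE)
# RMSE is used here.
sum_mean = 0
for i in range(len(y_pred)):
    sum_mean += (y_pred[i] - y_test.values[i]) ** 2
sum_erro = np.sqrt(sum_mean / len(y_pred))  # divide by the number of test samples
# calculate RMSE by hand
print("RMSE by hand:", sum_erro)
# The same value via sklearn.metrics
print("RMSE via sklearn:", np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
# Line plot comparing predicted and actual values on the test set (this is not a ROC curve)
plt.figure()
plt.plot(range(len(y_pred)), y_pred, 'b', label="predict")
plt.plot(range(len(y_pred)), y_test.values, 'r', label="test")
plt.legend(loc="upper right")  # show the legend
plt.xlabel("test sample index")
plt.ylabel("index value")
plt.show()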
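
The three metrics listed above can be compared side by side on a tiny example. This is a minimal sketch, assuming four made-up true and predicted values (they are not taken from any dataset); it computes MAE and MSE with `sklearn.metrics`, takes the square root for RMSE, and repeats the RMSE by hand with NumPy.

```python
import numpy as np
from sklearn import metrics

# Made-up values, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

mae = metrics.mean_absolute_error(y_true, y_hat)   # mean of |error|
mse = metrics.mean_squared_error(y_true, y_hat)    # mean of error**2
rmse = np.sqrt(mse)                                # square root of MSE

# The hand-written version: sqrt(mean((prediction - truth)**2))
rmse_by_hand = np.sqrt(np.mean((y_hat - y_true) ** 2))

print("MAE :", mae)    # 0.5
print("MSE :", mse)    # 0.375
print("RMSE:", rmse, rmse_by_hand)
```

RMSE is in the same units as the target, which is why it is often easier to interpret than MSE; it also penalizes the single large error (1.0) more heavily than MAE does.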

2 Logistic Regression

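2.1 The iris dataset

The dataset covers three kinds of iris: Iris setosa, Iris versicolor, and Iris virginica.

It contains 4 feature variables and 1 class variable, with 150 samples in total. The features are the length and width of the sepals and of the petals, and each sample belongs to one of the three classes.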
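
The description above can be checked directly against the copy of the dataset bundled with scikit-learn. A short sketch using `load_iris`; nothing here is assumed beyond scikit-learn being installed.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)            # (number of samples, number of features)
print(iris.feature_names)         # sepal/petal length and width, in cm
print(iris.target_names)          # the three class names
print(np.bincount(iris.target))   # number of samples per class
```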

2.2 Drawing a scatter plot

Demo code

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
iris = load_iris()
# Take the first two columns of the data (sepal length and sepal width)
DD = iris.data
X = [x[0] for x in DD]
Y = [x[1] for x in DD]
# The samples are stored grouped by class, 50 per class
plt.scatter(X[:50], Y[:50], color='red', marker='o', label='setosa')
plt.scatter(X[50:100], Y[50:100], color='blue', marker='x', label='versicolor')
plt.scatter(X[100:], Y[100:], color='green', marker='+', label='Virginica')
plt.legend(loc=2)  # upper left corner
plt.show()

Result: a scatter plot of sepal length against sepal width for the three species.
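
The slices `[:50]`, `[50:100]`, and `[100:]` in the demo only pick out one species each because the rows happen to be stored grouped by class. The sketch below verifies that ordering and shows an alternative that does not depend on it, selecting rows with a boolean mask on `iris.target`. It is an illustrative variation, not the code from the original course.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Check that the rows really are grouped by class, 50 per class
ordered = bool((y[:50] == 0).all() and (y[50:100] == 1).all() and (y[100:] == 2).all())
print("rows grouped by class:", ordered)

# Mask-based selection works regardless of row order
for label, name in enumerate(iris.target_names):
    subset = X[y == label]
    # Mean sepal length and sepal width for this species
    print(name, subset.shape, subset[:, :2].mean(axis=0))
```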

2.3 Logistic regression analysis

Demo code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
iris = load_iris()
X = iris.data[:, :2]   # first two columns: sepal length and sepal width
Y = iris.target
lr = LogisticRegression(C=1e5)  # large C means very weak regularization
lr.fit(X, Y)
# meshgrid builds two coordinate matrices covering the plotting area
h = .02  # grid step
x_min, x_max = X[:, 0].min()-.5, X[:, 0].max()+.5
y_min, y_max = X[:, 1].min()-.5, X[:, 1].max()+.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Predict the class at every grid point, then reshape back to the grid
Z = lr.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(8,6))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)  # color each region by predicted class
plt.scatter(X[:50,0], X[:50,1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100,0], X[50:100,1], color='blue', marker='x', label='versicolor')
plt.scatter(X[100:,0], X[100:,1], color='green', marker='s', label='Virginica')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.legend(loc=2)
plt.show()

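Result: the predicted class regions over sepal length and sepal width, with the samples of the three species overlaid.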
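
The demo fits and plots on all 150 samples, so it does not show how well the classifier generalizes. A minimal sketch of one way to check that, assuming a 70/30 stratified split and the default regularization strength (both are my choices for illustration and differ from the demo's `C=1e5`): it reports test accuracy and the predicted class probabilities for a few test samples.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, :2]   # sepal length and sepal width, as in the demo
y = iris.target

# Hold out 30% of the samples, keeping the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000)  # default C; an assumption, not the demo's setting
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)          # fraction of test samples classified correctly
proba = clf.predict_proba(X_test[:3])    # one row per sample, one column per class

print("test accuracy:", acc)
print(proba)
```

With only the two sepal features, setosa separates cleanly but versicolor and virginica overlap, so the accuracy is noticeably below what all four features would give.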

Conclusion

Source: a Bilibili course and its lecture slides; I reproduced the code from them.

https://www.bilibili.com/video/BV12h411d7Dm
Reference: https://blog.csdn.net/HHTNAN/article/details/78843722?utm_source=blogxgwz7

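"This article is only a set of study notes, recording the journey from 0 to 1."

I hope it helps. If you find any mistakes, corrections are welcome.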

Python Mathematical Modeling Series (9): Regression
