Linear Regression with OLS (Ordinary Least Squares)

Load the module:
from sklearn import linear_model

Create the initial model:
<model> = linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

Train the model:
<model>.fit(input, output)

Check how well the model fits the data (the closer to 1, the better):
<model>.score(input, output)

Predict output from input:
<model>.predict(input)

Inspect the fitted linear model AX+b:
<model>.coef_        A, estimated coefficients for the linear regression problem
<model>.intercept_   b, independent term in the linear model

refer
LinearRegression (Ordinary Least Squares)
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
LinearRegression
http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py

.......................................................
Example with a simple dataset

#vi datatrain.txt
traffic,packet
10,100
20,200
40,400
50,500

#vi datatest.txt
traffic,packet
30,300
60,600

#vi lm.py
from sklearn import linear_model
import numpy as np
import sys

### load the training data and train
filepath = sys.argv[1]
f = open(filepath)
dataset = np.loadtxt(f, delimiter=',', skiprows=1)
target = dataset[:, 0]
data_train = dataset[:, 1:]
f.seek(0)
listhead = f.readlines()[0].strip().split(',')[1:]
regr = linear_model.LinearRegression()
regr.fit(data_train, target)

### load the test data and predict
filepath = sys.argv[2]
f = open(filepath)
dataset2 = np.loadtxt(f, delimiter=',', skiprows=1)
data_test = dataset2[:, 1:]
result = regr.predict(data_test)  # when packet is 300 and 600, predict the value of traffic
print(result)

#python lm.py datatrain.txt datatest.txt
[ 30.  60.]

When packet=300, the predicted traffic is 30; when packet=600, the predicted traffic is 60.
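Because the toy data satisfy traffic = 0.1 * packet exactly, the fitted line can also be inspected in memory without the intermediate files. A minimal sketch (the in-memory arrays here replace the file loading in lm.py and are illustrative):

```python
import numpy as np
from sklearn import linear_model

# Same toy data as datatrain.txt: traffic is the target, packet is the feature.
packet = np.array([[100.0], [200.0], [400.0], [500.0]])
traffic = np.array([10.0, 20.0, 40.0, 50.0])

regr = linear_model.LinearRegression()
regr.fit(packet, traffic)

# The relation is exactly traffic = 0.1 * packet, so A should be close
# to 0.1 and b close to 0.
print(regr.coef_)        # close to [0.1]
print(regr.intercept_)   # close to 0.0
print(regr.predict(np.array([[300.0], [600.0]])))  # close to [30. 60.]
```

This makes concrete what .coef_ and .intercept_ mean: the single coefficient is the slope A of the line AX+b, and the intercept is b.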
...........................................
Example with the diabetes dataset

Enter the interactive Python shell:
#python

Load the data:
>>> from sklearn import datasets
>>> diabetes = datasets.load_diabetes()
>>> diabetes_train_data = diabetes.data
>>> diabetes_train_target = diabetes.target

Take the last 10 samples for validation (note that the model is trained on the full dataset, so these 10 samples are also part of the training data; this checks the fit on known data rather than generalization):
>>> diabetes_test_data = diabetes.data[-10:]
>>> diabetes_test_target = diabetes.target[-10:]

Run the training:
>>> from sklearn import linear_model
>>> regr = linear_model.LinearRegression()
>>> regr.fit(diabetes_train_data, diabetes_train_target)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
Predict on the known data:
>>> regr.predict(diabetes_test_data)
array([ 218.17749233,   60.94590955,  131.09513588,  119.48417359,
         52.60848094,  193.01802803,  101.05169913,  121.22505534,
        211.8588945 ,   53.44819015])
Compare the predictions with the actual values:
>>> import numpy as np
>>> for target_predict, target in zip(regr.predict(diabetes_test_data), diabetes_test_target):
...     print(target_predict, target, target_predict - target)
...
218.177492333 173.0 45.1774923325
60.9459095521 72.0 -11.0540904479
131.095135883 49.0 82.0951358828
119.484173585 64.0 55.4841735855
52.6084809435 48.0 4.60848094354
193.018028027 178.0 15.018028027
101.051699131 104.0 -2.94830086911
121.225055336 132.0 -10.7749446642
211.858894501 220.0 -8.14110549902
53.4481901497 57.0 -3.5518098503

Estimate the overall error of this regression line.
The mean squared error (MSE) summarizes the overall error of the line:
>>> np.mean((regr.predict(diabetes_test_data) - diabetes_test_target)**2)
2859.6903987680648

Estimate the error with MAPE (mean absolute percentage error):
>>> np.mean(np.abs((diabetes_test_target - regr.predict(diabetes_test_data)) / diabetes_test_target))

Inspect the line (AX+b) produced by Linear Regression.
AX+b minimizes the total squared error of the target values over the training set;
the following shows the A (.coef_) and b (.intercept_) obtained after training:
>>> print(regr.coef_)
[ -10.01219782 -239.81908937  519.83978679  324.39042769 -792.18416163
  476.74583782  101.04457032  177.06417623  751.27932109   67.62538639]
>>> print(regr.intercept_)
152.133484163
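The MSE and MAPE above are computed on samples the model has already seen during training. A hedged sketch of a proper held-out evaluation, which trains on all but the last 10 samples and also uses .score (R², closer to 1 is better) as mentioned at the top; variable names here are illustrative:

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()

# Hold out the last 10 samples so they are NOT used for training.
train_data, train_target = diabetes.data[:-10], diabetes.target[:-10]
test_data, test_target = diabetes.data[-10:], diabetes.target[-10:]

regr = linear_model.LinearRegression()
regr.fit(train_data, train_target)

# R^2 on the held-out samples, plus the same MSE and MAPE formulas
# used in the transcript above.
r2 = regr.score(test_data, test_target)
mse = np.mean((regr.predict(test_data) - test_target) ** 2)
mape = np.mean(np.abs((test_target - regr.predict(test_data)) / test_target))
print(r2, mse, mape)
```

Because the held-out samples never influence the fit, these numbers estimate how the line generalizes, not merely how well it memorized the data.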
refer
http://beancoder.com/linear-regression-stock-prediction/