SKLearn Linear Regression

Linear Regression using OLS (Ordinary Least Squares)

Load the module
from sklearn import linear_model

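Create the initial model
<model> = linear_model.LinearRegression(fit_intercept=True, copy_X=True, n_jobs=1)
(Older scikit-learn releases also accepted normalize=False here; that parameter was removed in version 1.2.)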

Train the model
<model>.fit(input, output)

Check how well the model fits
<model>.score(input, output)
Returns the coefficient of determination R^2; the closer to 1, the better
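
As a rough illustration of what score reports, the sketch below recomputes R^2 by hand as 1 - SS_res/SS_tot and compares it with .score(). The five data points are made up for this illustration and do not come from any dataset in this post.

```python
import numpy as np
from sklearn import linear_model

# Made-up toy data (an assumption for illustration): y is roughly equal to x
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

model = linear_model.LinearRegression()
model.fit(X, y)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
pred = model.predict(X)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, model.score(X, y))  # the two values should agree
```

A model that always predicts the mean of the targets would score 0, and a perfect fit would score 1.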

Predict output from input
<model>.predict(input)

Show the fitted linear model AX+b
<model>.coef_
A: estimated coefficients for the linear regression problem
<model>.intercept_
b: independent term (intercept) in the linear model
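
To make the roles of coef_ and intercept_ concrete, here is a minimal sketch on made-up data that follows y = 2x + 1 exactly. The data and the names X_new, by_hand and by_model are assumptions introduced only for this example.

```python
import numpy as np
from sklearn import linear_model

# Made-up toy data (an assumption for illustration): exactly y = 2*x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = linear_model.LinearRegression().fit(X, y)

A = model.coef_        # one coefficient per input column
b = model.intercept_   # scalar intercept

# predict() gives the same result as applying A X + b by hand
X_new = np.array([[5.0], [10.0]])
by_hand = X_new @ A + b
by_model = model.predict(X_new)

print(A, b)
print(by_hand, by_model)
```

Here the fit should recover a slope close to 2 and an intercept close to 1, up to floating-point error.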

References
LinearRegression (Ordinary Least Squares)
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
Linear Regression Example (plot_ols)
http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py

……………………………………………….

Example with a simple dataset

#vi datatrain.txt
traffic,packet
10,100
20,200
40,400
50,500

#vi datatest.txt
traffic,packet
30,300
60,600

#vi lm.py
from sklearn import linear_model
import numpy as np
import sys

### load the training data and fit the model
# column 0 (traffic) is the target; the remaining columns (packet) are the features
train_path = sys.argv[1]
dataset = np.loadtxt(train_path, delimiter=',', skiprows=1)
target = dataset[:, 0]
data_train = dataset[:, 1:]

# feature names from the header row (not used below)
with open(train_path) as f:
    listhead = f.readline().strip().split(',')[1:]

regr = linear_model.LinearRegression()
regr.fit(data_train, target)


### load the test data and predict
test_path = sys.argv[2]
dataset2 = np.loadtxt(test_path, delimiter=',', skiprows=1)
data_test = dataset2[:, 1:]

# packet is 300 and 600 in the test file; predict the corresponding traffic
result = regr.predict(data_test)
print(result)


#python lm.py datatrain.txt datatest.txt
[ 30. 60.]
When packet=300, the predicted traffic is 30
When packet=600, the predicted traffic is 60
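
The same experiment can be tried without creating any files. The sketch below feeds the contents of datatrain.txt and datatest.txt to np.loadtxt through io.StringIO; the variable names train_csv, test_csv, predicted and actual are introduced here for illustration only.

```python
import io
import numpy as np
from sklearn import linear_model

# Same contents as datatrain.txt and datatest.txt above
train_csv = "traffic,packet\n10,100\n20,200\n40,400\n50,500\n"
test_csv = "traffic,packet\n30,300\n60,600\n"

train = np.loadtxt(io.StringIO(train_csv), delimiter=",", skiprows=1)
test = np.loadtxt(io.StringIO(test_csv), delimiter=",", skiprows=1)

# Column 0 is traffic (target); column 1 is packet (feature)
regr = linear_model.LinearRegression()
regr.fit(train[:, 1:], train[:, 0])

predicted = regr.predict(test[:, 1:])
actual = test[:, 0]
print(predicted)           # predicted traffic for packet = 300 and 600
print(predicted - actual)  # differences from the traffic values in the test file
```

Because the training data lies exactly on the line traffic = 0.1 * packet, the differences should be zero up to floating-point error.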

…………………………………….

Example with the diabetes dataset

Start the Python interactive interpreter
#python

Load the data
>>> from sklearn import datasets
>>> diabetes = datasets.load_diabetes()
>>> diabetes_train_data = diabetes.data
>>> diabetes_train_target = diabetes.target
Use the last 10 samples for validation (they remain part of the training data above)
>>> diabetes_test_data = diabetes.data[-10:]
>>> diabetes_test_target = diabetes.target[-10:]


Run the training
>>> from sklearn import linear_model
>>> regr = linear_model.LinearRegression()
>>> regr.fit(diabetes_train_data,diabetes_train_target)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)


Predict on known data
>>> regr.predict(diabetes_test_data)
array([ 218.17749233, 60.94590955, 131.09513588, 119.48417359,
52.60848094, 193.01802803, 101.05169913, 121.22505534,
211.8588945 , 53.44819015])

Compare the predictions with the actual values
>>> for target_predict, target in zip(regr.predict(diabetes_test_data), diabetes_test_target):
...     print(target_predict, target, target_predict - target)
...
218.177492333 173.0 45.1774923325
60.9459095521 72.0 -11.0540904479
131.095135883 49.0 82.0951358828
119.484173585 64.0 55.4841735855
52.6084809435 48.0 4.60848094354
193.018028027 178.0 15.018028027
101.051699131 104.0 -2.94830086911
121.225055336 132.0 -10.7749446642
211.858894501 220.0 -8.14110549902
53.4481901497 57.0 -3.5518098503
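
Because the session above fits on diabetes.data in full, the ten samples it predicts were also seen during training. A hold-out variant is sketched below, assuming a simple tail split is acceptable: it fits on everything except the last ten rows. The names X_train, y_train, X_test and y_test are introduced here for illustration, and the numbers it prints will differ from the listing above.

```python
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()

# Hold out the last 10 samples; fit only on the rest
X_train, y_train = diabetes.data[:-10], diabetes.target[:-10]
X_test, y_test = diabetes.data[-10:], diabetes.target[-10:]

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

# Compare predictions with the actual values on rows the model has not seen
for target_predict, target in zip(regr.predict(X_test), y_test):
    print(target_predict, target, target_predict - target)
```

Errors measured this way give a fairer picture of how the model behaves on new data.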

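Estimate the overall error of the regression line
Use the mean squared error (MSE) to gauge the overall error of this regression line
>>> import numpy as np
>>> np.mean((regr.predict(diabetes_test_data)-diabetes_test_target)**2)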
2859.6903987680648
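
The mean-squared-error formula can be checked directly against the ten rows printed earlier. The sketch below copies the predicted and actual values from that listing (rounded to four decimals) and also calls sklearn.metrics.mean_squared_error for comparison. On these ten rows the result comes out near 1243 rather than 2859.69, so the figure quoted above may have been computed over a different set of samples; that is only a guess, since the source does not say.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Predicted and actual values copied from the ten-row listing (rounded)
predicted = np.array([218.1775, 60.9459, 131.0951, 119.4842, 52.6085,
                      193.0180, 101.0517, 121.2251, 211.8589, 53.4482])
actual = np.array([173.0, 72.0, 49.0, 64.0, 48.0,
                   178.0, 104.0, 132.0, 220.0, 57.0])

mse_manual = np.mean((predicted - actual) ** 2)   # same formula as in the session
mse_sklearn = mean_squared_error(actual, predicted)
rmse = np.sqrt(mse_manual)                        # back in the units of the target

print(mse_manual, mse_sklearn, rmse)
```

Taking the square root (RMSE) is often easier to read, because it is expressed in the same units as the target.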

Evaluate the result with MAPE (mean absolute percentage error)
>>> np.mean(np.abs((diabetes_test_target-regr.predict(diabetes_test_data))/diabetes_test_target))
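
The MAPE expression can be wrapped in a small helper. The function name mape is hypothetical (it is not part of the session above), and the input values are copied, rounded, from the ten-row listing printed earlier. MAPE is undefined when any actual value is zero, which does not happen in these ten rows.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual))

# Predicted and actual values copied from the ten-row listing (rounded)
predicted = [218.1775, 60.9459, 131.0951, 119.4842, 52.6085,
             193.0180, 101.0517, 121.2251, 211.8589, 53.4482]
actual = [173.0, 72.0, 49.0, 64.0, 48.0,
          178.0, 104.0, 132.0, 220.0, 57.0]

value = mape(actual, predicted)
print(value)
```

Small actual values inflate the percentage error: the third row (actual 49.0) contributes far more than any other row.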


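Look at the line (AX+b) produced by Linear Regression
OLS chooses A and b so that the sum of squared errors over the target values of the training set is minimized.
The following shows the A (.coef_) and b (.intercept_) obtained after training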
>>> print(regr.coef_)
[ -10.01219782 -239.81908937 519.83978679 324.39042769 -792.18416163
476.74583782 101.04457032 177.06417623 751.27932109 67.62538639]
>>> print(regr.intercept_)
152.133484163
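
One way to see that these values are the least-squares solution is to solve the same problem with np.linalg.lstsq and compare. This is a sketch only; the names X1, solution, A_lstsq and b_lstsq are introduced here for illustration, and the exact digits printed for coef_ may vary between scikit-learn versions.

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

regr = linear_model.LinearRegression().fit(X, y)

# Solve the same least-squares problem directly: append a column of ones
# so that the last solved coefficient plays the role of b
X1 = np.hstack([X, np.ones((X.shape[0], 1))])
solution, *_ = np.linalg.lstsq(X1, y, rcond=None)
A_lstsq, b_lstsq = solution[:-1], solution[-1]

# predict() should equal X A + b
by_hand = X @ regr.coef_ + regr.intercept_

print(np.allclose(A_lstsq, regr.coef_), np.isclose(b_lstsq, regr.intercept_))
print(np.allclose(by_hand, regr.predict(X)))
```

Both checks should print True when the two solvers agree within floating-point tolerance.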

Reference
http://beancoder.com/linear-regression-stock-prediction/