{"id":459,"date":"2012-11-29T20:05:00","date_gmt":"2012-11-29T12:05:00","guid":{"rendered":"http:\/\/note.systw.net\/note\/?p=459"},"modified":"2023-11-02T20:06:31","modified_gmt":"2023-11-02T12:06:31","slug":"classification-and-predication","status":"publish","type":"post","link":"https:\/\/systw.net\/note\/archives\/459","title":{"rendered":"Classification and Prediction"},"content":{"rendered":"\n<p>Classification and prediction<\/p>\n\n\n\n<p>There are two main steps:<br><strong>1. Model construction:<\/strong><br>Derive rules from data whose outcomes are already known (training data).<br>Speed matters relatively little at this stage.<br><strong>2. Using the model for prediction:<\/strong><br>Apply those rules to predict the outcome of unseen data.<br>Speed matters a great deal, because results must be produced from the model in real time.<\/p>\n\n\n\n<p><strong>Common algorithms<\/strong><br>Decision tree: easy to interpret.<br>Bayesian classification: simple, but its accuracy is often lower.<br>Backpropagation (neural networks): high accuracy, but the results are hard to interpret.<br>SVM (support vector machines): often the most accurate of these, but natively handles only two-class (YES or NO) outcomes.<\/p>\n\n\n\n<p><strong>Common techniques<\/strong><br>Ensemble learning: several classification algorithms each cast a vote, and the votes are combined into the final decision.<\/p>\n\n\n\n<p><strong>Common errors<\/strong>:<br>Underfitting: the model is too simple to describe complex data, so classification quality is poor.<br>Overfitting: the model fits the training 
data too closely, so it classifies new data poorly.<\/p>\n\n\n\n<p>ps:<br>Regularization: one method for reducing overfitting.<\/p>\n\n\n\n<p><strong>Metrics for evaluating classification quality<\/strong><br><br>P (Positive) and N (Negative) denote the result the system predicts.<br>T (True) and F (False) denote whether that prediction is correct.<br><br>Combining the two gives four cases:<br>TP = the system predicts P and is correct (how many it caught correctly)<br>FP = the system predicts P but is wrong (how many it flagged wrongly; a misjudgment or false alarm)<br>TN = the system predicts N and is correct<br>FN = the system predicts N but is wrong (how many it missed)<\/p>\n\n\n\n<p>The following formulas are defined on these counts:<br>P = TP+FN = the number of records that are actually P<br>N = FP+TN = the number of records that are actually N<br>accuracy = (TP+TN)\/(P+N) = the percentage of all predictions (P and N) that are correct<br>precision = TP\/(TP+FP) = of the records the system predicts as P, the percentage that are correct<br>recall = TP\/P = 
the percentage of all actual positives that the system catches (overall, what percentage of the viruses it detects)<br>F1-measure = 2*TP\/(2*TP+FP+FN) = the harmonic mean of precision and recall<br>TPR (true positive rate, hit rate) = recall<br>FPR (false positive rate, false alarm rate) = FP\/N<br>ROC = the curve traced by the (FPR, TPR) pairs obtained at different decision thresholds<br>AUC = the area under the ROC curve; it lies between 0 and 1, a larger value means the system's decisions are better, and 0.5 means no predictive value<\/p>\n\n\n\n<p>A network has 30 devices that have been compromised and 70 that have not.<br>Suppose a detection mechanism flags 40 devices as compromised, but only 20 of them really are.<br>TP=20 (caught correctly)<br>FP=20 (flagged wrongly)<br>TN=50<br>FN=10 (missed)<br>accuracy=(20+50)\/100=70%<br>precision=20\/(20+20)=50%<br>recall=20\/(20+10)=66.7%<br>F1=(2*20)\/(2*20+20+10)=57.1%<\/p>\n\n\n\n<p>A company has 20 infected devices and 80 clean ones.<br>Suppose a system predicted in advance that 5 devices were infected, and all 5 really were.<br>TP=5<br>FP=0<br>TN=80<br>FN=15<br>accuracy=(5+80)\/100=85%<br>precision=5\/(5+0)=100%<br>recall=5\/(5+15)=25%<br>F1=(5*2)\/(5*2+0+15)=40%<\/p>\n\n\n\n<p><strong>Metrics for evaluating numeric prediction quality<\/strong><br>MAD (mean absolute deviation)<br>MSE (mean squared error)<br>MAPE (mean absolute percentage error)<\/p>\n\n\n\n<p>##################################################################<\/p>\n\n\n\n<p><strong>Bayesian 
classification (naive Bayes)<\/strong><\/p>\n\n\n\n<p><strong>1<br>Compute P(Ci)<\/strong><br>P(Ci) is the probability that class Ci occurs in the data set (the prior).<br><strong>2<br>Compute P(X | Ci) = P(X1|Ci)*P(X2|Ci)*...*P(Xn|Ci)<\/strong><br>P(Xk|Ci): the probability of attribute value Xk given that the class is Ci. Multiplying these terms assumes the attributes are conditionally independent given the class (the naive Bayes assumption).<br>None of these probabilities may be 0; otherwise the whole product P(X | Ci) becomes 0.<br><strong>3<\/strong><br><strong>Find the maximum<\/strong><br>P(Ci | X) = P(X | Ci)*P(Ci)\/P(X); since P(X) is the same for every class, it is enough to compare P(X | Ci)*P(Ci)<br>Pick the class Ci with the largest P(Ci | X)<\/p>\n\n\n\n<p>ps:<br><strong>One way to avoid zero probabilities<\/strong><br>P(Xk|Ci) = count(Xk, Ci)\/count(Ci) =&gt; (count(Xk, Ci)+e)\/(count(Ci)+distinct(X)*e)<br>distinct(X) = the number of distinct values attribute X can take<br>e = a small user-chosen constant, e.g. 0.1 or 0.01 (e=1 is the classic Laplace correction)<\/p>\n\n\n\n<p>&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;<\/p>\n\n\n\n<p>ex:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>rid<\/td><td>flow<\/td><td>packet<\/td><td>student<\/td><td>over1gb<\/td><td>class: is 
infected?<\/td><\/tr><tr><td>1<\/td><td>low<\/td><td>high<\/td><td>no<\/td><td>no<\/td><td>no<\/td><\/tr><tr><td>2<\/td><td>low<\/td><td>high<\/td><td>no<\/td><td>yes<\/td><td>no<\/td><\/tr><tr><td>3<\/td><td>medium<\/td><td>high<\/td><td>no<\/td><td>no<\/td><td>yes<\/td><\/tr><tr><td>4<\/td><td>high<\/td><td>medium<\/td><td>no<\/td><td>no<\/td><td>yes<\/td><\/tr><tr><td>5<\/td><td>high<\/td><td>low<\/td><td>yes<\/td><td>no<\/td><td>yes<\/td><\/tr><tr><td>6<\/td><td>high<\/td><td>low<\/td><td>yes<\/td><td>yes<\/td><td>no<\/td><\/tr><tr><td>7<\/td><td>medium<\/td><td>low<\/td><td>yes<\/td><td>yes<\/td><td>yes<\/td><\/tr><tr><td>8<\/td><td>low<\/td><td>medium<\/td><td>no<\/td><td>no<\/td><td>no<\/td><\/tr><tr><td>9<\/td><td>low<\/td><td>low<\/td><td>yes<\/td><td>no<\/td><td>yes<\/td><\/tr><tr><td>10<\/td><td>high<\/td><td>medium<\/td><td>yes<\/td><td>no<\/td><td>yes<\/td><\/tr><tr><td>11<\/td><td>low<\/td><td>medium<\/td><td>yes<\/td><td>yes<\/td><td>yes<\/td><\/tr><tr><td>12<\/td><td>medium<\/td><td>medium<\/td><td>no<\/td><td>yes<\/td><td>yes<\/td><\/tr><tr><td>13<\/td><td>medium<\/td><td>high<\/td><td>yes<\/td><td>no<\/td><td>yes<\/td><\/tr><tr><td>14<\/td><td>high<\/td><td>medium<\/td><td>no<\/td><td>yes<\/td><td>no<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>ps: over1gb means the traffic exceeded 1 GB.<\/p>\n\n\n\n<p><strong>1<br>Compute P(Ci)<\/strong><br>P(C1)=P(infected=yes), probability=9\/14=0.643<br>P(C2)=P(infected=no), probability=5\/14=0.357<br><strong>2<br
>Compute P(X | Ci)<br><\/strong>Let X=(flow=low, packet=medium, student=yes, over1gb=no)<br>P(X1 | C1) = P(flow=low | infected=yes) = 2\/9 = 0.222<br>P(X2 | C1) = P(packet=medium | infected=yes) = 4\/9 = 0.444<br>P(X3 | C1) = P(student=yes | infected=yes) = 6\/9 = 0.667<br>P(X4 | C1) = P(over1gb=no | infected=yes) = 6\/9 = 0.667<br>P(X | C1) = P(X1|C1)*P(X2|C1)*P(X3|C1)*P(X4|C1)<br>= P(X | infected=yes) = 0.222*0.444*0.667*0.667 = 0.044<br>P(X1 | C2) = P(flow=low | infected=no) = 3\/5 = 0.6<br>P(X2 | C2) = P(packet=medium | infected=no) = 2\/5 = 0.4<br>P(X3 | C2) = P(student=yes | infected=no) = 1\/5 = 0.2<br>P(X4 | C2) = P(over1gb=no | infected=no) = 2\/5 = 0.4<br>P(X | C2) = P(X1|C2)*P(X2|C2)*P(X3|C2)*P(X4|C2)<br>= P(X | infected=no) = 0.6*0.4*0.2*0.4 = 0.019<br><strong>3<br>Find the maximum<\/strong><br>P(C1 | X) is proportional to P(X | C1)*P(C1) = 0.044*0.643 = 0.028<br>P(C2 | X) is proportional to P(X | C2)*P(C2) = 0.019*0.357 = 0.007<br>Since P(C1 | X) is the larger of the two, X is assigned to C1,<br>i.e. (flow=low, packet=medium, student=yes, over1gb=no) is classified as (infected=yes).<br>In other words,<br>when flow is low, packet is medium, the user is a student, and the traffic does not exceed 1 GB,<br>the PC is predicted to be infected.<\/p>\n\n\n\n<p><br>ps:<br>Applying the zero-avoidance adjustment<br>Assume e=0.1 and X=(flow=low, packet=medium, student=yes, over1gb=no)<br>flow and packet each take 3 distinct values, so their denominator becomes 9+3*0.1=9.3; student and over1gb take 2, giving 9+2*0.1=9.2<br>Compute P(X | infected=yes)<br>P(X1 | C1) = P(flow=low | infected=yes) = 2\/9 =&gt; 2.1\/9.3 = 0.226<br>P(X2 | C1) = P(packet=medium | infected=yes) = 4\/9 =&gt; 4.1\/9.3 = 0.441<br>P(X3 | C1) = P(student=yes | infected=yes) = 6\/9 =&gt; 6.1\/9.2 = 0.663<br>P(X4 | C1) = P(over1gb=no | infected=yes) = 6\/9 =&gt; 6.1\/9.2 = 0.663<br>P(X | C1) = P(X1|C1)*P(X2|C1)*P(X3|C1)*P(X4|C1)<br>= P(X | infected=yes) = 0.226*0.441*0.663*0.663 = 0.044<\/p>\n","protected":false},"excerpt":{"rendered":"<p>classification and prediction 
&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[13],"tags":[],"class_list":["post-459","post","type-post","status-publish","format-standard","hentry","category-dataanalysis"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/comments?post=459"}],"version-history":[{"count":0,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/459\/revisions"}],"wp:attachment":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/media?parent=459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/categories?post=459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/tags?post=459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}