{"id":486,"date":"2014-09-11T20:42:00","date_gmt":"2014-09-11T12:42:00","guid":{"rendered":"http:\/\/note.systw.net\/note\/?p=486"},"modified":"2023-11-02T20:44:07","modified_gmt":"2023-11-02T12:44:07","slug":"mahout-logistic","status":"publish","type":"post","link":"https:\/\/systw.net\/note\/archives\/486","title":{"rendered":"Mahout Logistic"},"content":{"rendered":"\n<p><strong>mahout logistic regression<\/strong><br>Mahout&#8217;s logistic regression is implemented with SGD (stochastic gradient descent).<\/p>\n\n\n\n<p>Training on a data set<br><strong># mahout trainlogistic<\/strong><br>Commonly used parameters:<br>--input &lt;file-or-resource&gt; Uses the specified file or resource as input.<br>--output &lt;file-for-model&gt; Puts the model into the specified file.<br>--target &lt;variable&gt; Uses the specified variable as the target.<br>--categories &lt;n&gt; Specifies how many categories the target variable has.<br>--predictors &lt;v1&gt; &#8230; &lt;vn&gt; Specifies the names of the predictor variables.<br>--features<br>Sets the size of the internal feature vector to use in building the model. A larger value here can be helpful, especially with text-like input data.<br>--rate<br>Sets the initial learning rate. This can be large if you have lots of data or use lots of passes because it&#8217;s decreased progressively as data is examined.<br>--passes<br>Specifies the number of times the input data should be reexamined during training. Small input files may need to be examined dozens of times. 
Very large input files probably don&#8217;t even need to be completely examined.<\/p>\n\n\n\n<p>Testing on a data set<br><strong># mahout runlogistic<\/strong><br>Commonly used parameters:<br>--auc Prints AUC score for model versus input data after reading data.<br>--scores Prints target variable value and scores for each input example.<br>--confusion Prints confusion matrix for a particular threshold (see --threshold).<br>--input &lt;input&gt; Reads data records from specified file or resource.<br>--model &lt;model&gt; Reads model from specified file.<br>The output looks roughly like this:<br>AUC = 0.57<br>confusion: [[27.0, 13.0], [0.0, 0.0]]<br>Explanation<br>AUC ranges from 0 to 1. It is the area under the ROC curve: 1 means the model ranks every positive example above every negative one, and 0.5 is no better than random guessing.<br>The confusion matrix is laid out as [[TP, FP], [FN, TN]].<\/p>\n\n\n\n<p>&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;.<\/p>\n\n\n\n<p><br><strong>demo<\/strong><\/p>\n\n\n\n<p>Walkthrough<br>The data set file is named testclass.<br>Its contents are shown below: 3 attributes and 1 tag.<br><strong># vi testclass<\/strong><br>a,b,c,tag<br>1,1,1,1<br>2,2,2,1<br>3,3,3,1<br>4,4,4,1<br>5,5,5,1<br>6,6,6,2<br>7,7,7,2<br>8,8,8,2<br>9,9,9,2<br>10,10,10,2<\/p>\n\n\n\n<p>Run the following command to build a model (this is the training phase of the classification process).<br><strong># mahout trainlogistic --input testclass --output model --target tag --categories 2 --predictors a b c --types numeric --features 3<\/strong><br>It produces the following output:<br>Running on hadoop, using \/usr\/lib\/hadoop\/bin\/hadoop and HADOOP_CONF_DIR=<br>MAHOUT-JOB: \/usr\/lib\/mahout\/mahout-examples-0.7.0.1.3.3.0-58-job.jar<br>3<br>tag ~ -91.295*Intercept Term + 6.690*a + 6.690*b + 3.321*c<br>Intercept 
Term -91.29475<br>a 6.68964<br>b 6.68964<br>c 3.32069<br>6.689641057 -91.294753096 3.320686879<br>14\/02\/20 22:18:45 INFO driver.MahoutDriver: Program took 399 ms (Minutes: 0.00665)<br>Note: the Intercept Term defaults to 1.<br><br><strong>The model is<br>-91.295 + 6.690*a + 6.690*b + 3.321*c<\/strong><\/p>\n\n\n\n<p>This means that<br>plugging the values of attributes a, b, and c into the formula -91.295 + 6.690*a + 6.690*b + 3.321*c<br>gives the results below.<br>a,b,c,tag =&gt; result of the formula<br>1,1,1,1 =&gt; -74.594<br>2,2,2,1 =&gt; -57.893<br>3,3,3,1 =&gt; -41.192<br>4,4,4,1 =&gt; -24.491<br>5,5,5,1 =&gt; -7.79<br>6,6,6,2 =&gt; 8.911<br>7,7,7,2 =&gt; 25.612<br>8,8,8,2 =&gt; 42.313<br>9,9,9,2 =&gt; 59.014<br>10,10,10,2 =&gt; 75.715<br>Decision boundary: (-74.594 + 75.715) \/ 2 = 0.5605 (any cutoff between -7.79 and 8.911 separates the two classes equally well)<br>Results below the boundary are class 1; results above it are class 2.<\/p>\n\n\n\n<p><br>Test the trained model (under normal circumstances, AUC is 1)<br><strong># mahout runlogistic --input testclass --model model --auc --confusion<\/strong><br>Running on hadoop, using \/usr\/lib\/hadoop\/bin\/hadoop and HADOOP_CONF_DIR=<br>MAHOUT-JOB: \/usr\/lib\/mahout\/mahout-examples-0.7.0.1.3.3.0-58-job.jar<br>AUC = 1.00<br>confusion: [[5.0, 0.0], [0.0, 5.0]]<br>entropy: [[-0.0, NaN], [-33.1, -0.0]]<br>14\/03\/15 00:42:05 INFO driver.MahoutDriver: Program took 139 ms (Minutes: 0.0023166666666666665)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>mahout logistic regressionmaho 
&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[13],"tags":[],"class_list":["post-486","post","type-post","status-publish","format-standard","hentry","category-dataanalysis"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/486","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/comments?post=486"}],"version-history":[{"count":0,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/486\/revisions"}],"wp:attachment":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/media?parent=486"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/categories?post=486"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/tags?post=486"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}