{"id":488,"date":"2014-09-18T20:44:00","date_gmt":"2014-09-18T12:44:00","guid":{"rendered":"http:\/\/note.systw.net\/note\/?p=488"},"modified":"2023-11-02T20:45:07","modified_gmt":"2023-11-02T12:45:07","slug":"mahout-fpgrowth","status":"publish","type":"post","link":"https:\/\/systw.net\/note\/archives\/488","title":{"rendered":"Mahout FPgrowth"},"content":{"rendered":"\n<p>For how FP-growth works, see<br>frequent pattern analysis ( https:\/\/systw.net\/note\/af\/sblog\/more.php?id=265 )<\/p>\n\n\n\n<p><br><strong>mahout fpg<\/strong><br>Common parameters:<br>-k: keep only the top k patterns, default is 50<br>-regex: the regular expression used to split each line into items<br>-method: run in sequential or mapreduce mode<br>-s &lt;minSupport&gt;: the minimum number of transactions in which a pattern must appear to count as frequent<\/p>\n\n\n\n<p>&#8230;<\/p>\n\n\n\n<p>Demo<\/p>\n\n\n\n<p><strong>#wget http:\/\/fimi.ua.ac.be\/data\/retail.dat<br>#hadoop fs -mkdir retail<br>#hadoop fs -put retail.dat retail\/retail.dat<br>#hadoop fs -ls retail<\/strong><br>-rw-r--r-- 3 root hdfs 101891 2014-02-05 11:44 retail\/retail.dat<\/p>\n\n\n\n<p><strong>#mahout fpg -i retail\/retail.dat -o retail\/patterns -method mapreduce -regex [' '] -s 2<\/strong><br>&#8230;omit&#8230;<br>14\/02\/05 22:35:15 INFO driver.MahoutDriver: Program took 415 ms (Minutes: 0.0069166666666666664)<\/p>\n\n\n\n<p><strong># hadoop fs -ls retail\/patterns<\/strong><br>Found 4 items<br>-rw-r--r-- 3 root hdfs 101891 2014-02-05 11:44 retail\/patterns\/fList<br>drwxr-xr-x - root hdfs 0 2014-02-05 11:44 retail\/patterns\/fpgrowth<br>drwxr-xr-x - root hdfs 0 2014-02-05 11:45 retail\/patterns\/frequentpatterns<br>drwxr-xr-x - root hdfs 0 2014-02-05 11:44 retail\/patterns\/parallelcounting<br>Note:<br>fList: sequence files that record how often each item occurs in the<br>transaction 
database<\/p>\n\n\n\n<p>Display the results<br><strong>#mahout seqdumper -i retail\/patterns\/fpgrowth -o patterns.txt<br>#cat patterns.txt<\/strong><br>&#8230;omit&#8230;<br>Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])<br>Key: 954: Value: ([39, 954],2)<br>Key: 953: Value: ([39, 953],2)<br>Key: 933: Value: ([933],2)<br>Count: 4849<\/p>\n\n\n\n<p>Explanation<br>Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])<br>This line lists the frequent patterns that contain item 0, each followed by the number of transactions in the database in which it appears.<br>([0],26) means that item 0 appears in 26 transactions.<br>([39, 0],14) means that items 39 and 0 appear together in 14 transactions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For how FP-growth works, see frequent patter &#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[13],"tags":[],"class_list":["post-488","post","type-post","status-publish","format-standard","hentry","category-dataanalysis"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/488","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json
\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/comments?post=488"}],"version-history":[{"count":0,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/488\/revisions"}],"wp:attachment":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/media?parent=488"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/categories?post=488"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/tags?post=488"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}