For how FP-growth works, see frequent pattern analysis ( http://systw.net/note/af/sblog/more.php?id=265 ).

Common parameters of mahout fpg:
-k: keep the top k patterns (default: 50)
-regex: the regular expression used to split each input line into items
-method: run as sequential or mapreduce
-s <minSupport>: the minimum support, i.e. the minimum number of times a pattern must occur in the transactions for it to be kept
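To make the roles of -regex and -s concrete, here is a minimal Python sketch. It is a brute-force counter, not FP-growth and not Mahout's implementation. The function name, the max_size limit, and the four toy transactions are all made up for illustration. Each line is split on whitespace (similar to -regex [' ']), and only itemsets that occur in at least min_support transactions are kept.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(lines, min_support, max_size=2):
    """Count itemsets of up to max_size items by brute force and keep
    those that occur in at least min_support transactions.

    This only illustrates what "minimum support" means; it is not the
    FP-growth algorithm.
    """
    counts = Counter()
    for line in lines:
        # One transaction per line, items separated by whitespace.
        items = sorted({int(tok) for tok in line.split()})
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy transactions (not taken from retail.dat).
toy_transactions = ["39 48 41", "39 48", "39 41", "32"]
result = frequent_itemsets(toy_transactions, min_support=2)
for itemset, support in sorted(result.items()):
    print(list(itemset), support)
```

With min_support=2, item 32 and the pair (41, 48) are dropped because each occurs in only one transaction.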
...

Demo

# wget http://fimi.ua.ac.be/data/retail.dat
# hadoop fs -mkdir retail
# hadoop fs -put retail.dat retail/retail.dat
# hadoop fs -ls retail
-rw-r--r-- 3 root hdfs 101891 2014-02-05 11:44 retail/retail.dat
# mahout fpg -i retail/retail.dat -o retail/patterns -method mapreduce -regex [' '] -s 2
...omit...
14/02/05 22:35:15 INFO driver.MahoutDriver: Program took 415 ms (Minutes: 0.0069166666666666664)
# hadoop fs -ls retail/patterns
Found 4 items
-rw-r--r-- 3 root hdfs 101891 2014-02-05 11:44 retail/patterns/fList
drwxr-xr-x - root hdfs 0 2014-02-05 11:44 retail/patterns/fpgrowth
drwxr-xr-x - root hdfs 0 2014-02-05 11:45 retail/patterns/frequentpatterns
drwxr-xr-x - root hdfs 0 2014-02-05 11:44 retail/patterns/parallelcounting

Note: fList holds sequence files that record, for every item in the transaction database, how many times that item occurs.

Show the results

# mahout seqdumper -i retail/patterns/fpgrowth -o patterns.txt
# cat patterns.txt
...omit...
Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])
Key: 954: Value: ([39, 954],2)
Key: 953: Value: ([39, 953],2)
Key: 933: Value: ([933],2)
Count: 4849

Explanation

Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])

This entry lists the frequent patterns that contain item 0, each followed by the number of transactions in the whole database in which that pattern occurs.
([0],26) means that item 0 appears in 26 transactions.
([39, 0],14) means that items 39 and 0 appear together in 14 transactions.
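The "Key: ... Value: ..." lines can be turned into (itemset, support) pairs with a short script. The following is a minimal Python sketch that assumes the text layout shown in this note's patterns.txt sample. The function name and the regular expressions are illustrative and are not part of Mahout. Entries that do not have the full ([items],support) shape, such as the truncated last entry in the sample, are skipped.

```python
import re

# One pattern entry looks like "([39, 0],14)": an item list, then a support count.
ENTRY_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")
# One dump line looks like "Key: 0: Value: <entries>".
LINE_RE = re.compile(r"^Key:\s*(\S+?):\s*Value:\s*(.*)$")


def parse_pattern_line(line):
    """Return (key, [(items, support), ...]) for a 'Key: ... Value: ...' line,
    or None for any other line (for example 'Count: 4849')."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    key, value = match.group(1), match.group(2)
    patterns = []
    for items_text, support_text in ENTRY_RE.findall(value):
        # Split the item list on commas and ignore empty pieces.
        items = [int(tok) for tok in items_text.split(",") if tok.strip()]
        patterns.append((items, int(support_text)))
    return key, patterns


# Sample line copied from the output in this note.
sample = ("Key: 0: Value: ([0],26), ([39, 0],14), "
          "([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])")
key, patterns = parse_pattern_line(sample)
for items, support in patterns:
    print(key, items, support)
```

Reading patterns.txt line by line and passing each line to parse_pattern_line should give one list of patterns per key, which can then be sorted or filtered by support.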