Mahout FPgrowth

For how FP-growth works, see
frequent pattern analysis ( https://systw.net/note/af/sblog/more.php?id=265 )


mahout fpg
Common parameters:
-k: keep the top k patterns for each item; default is 50
-regex: regular expression used to split each input line into items
-method: execution mode, either sequential or mapreduce
-s <minSupport>: minimum number of transactions in which a pattern must appear for it to be kept
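
To make -regex and -s more concrete, here is a minimal Python sketch of what the two options mean. It is not Mahout's implementation: the function names and the four sample lines are made up for illustration. Each line is split into items with the regular expression, and an itemset counts as frequent only if it occurs in at least minSupport transactions.

```python
import re

def split_transactions(lines, regex="[ ]"):
    """Split each input line into a set of items, in the spirit of -regex."""
    splitter = re.compile(regex)
    return [
        {token for token in splitter.split(line.strip()) if token}
        for line in lines
    ]

def support(itemset, transactions):
    """Count the transactions that contain every item of itemset."""
    wanted = set(itemset)
    return sum(1 for items in transactions if wanted <= items)

def is_frequent(itemset, transactions, min_support=2):
    """True if the itemset reaches the minimum support, in the spirit of -s."""
    return support(itemset, transactions) >= min_support

# Hypothetical sample data, not taken from retail.dat
lines = ["0 39 48", "39 0", "39 41", "48"]
transactions = split_transactions(lines)

print(support(["39", "0"], transactions))   # 2
print(is_frequent(["41"], transactions))    # False
```

With -s 2, as in the demo below, the pair (39, 0) would be kept in this toy data set and item 41 would be dropped.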

demo

#wget http://fimi.ua.ac.be/data/retail.dat
#hadoop fs -mkdir retail
#hadoop fs -put retail.dat retail/retail.dat
#hadoop fs -ls retail

-rw-r--r-- 3 root hdfs 101891 2014-02-05 11:44 retail/retail.dat

#mahout fpg -i retail/retail.dat -o retail/patterns -method mapreduce -regex '[ ]' -s 2
…omit…
14/02/05 22:35:15 INFO driver.MahoutDriver: Program took 415 ms (Minutes: 0.0069166666666666664)

# hadoop fs -ls retail/patterns
Found 4 items
-rw-r--r-- 3 root hdfs 101891 2014-02-05 11:44 retail/patterns/fList
drwxr-xr-x - root hdfs 0 2014-02-05 11:44 retail/patterns/fpgrowth
drwxr-xr-x - root hdfs 0 2014-02-05 11:45 retail/patterns/frequentpatterns
drwxr-xr-x - root hdfs 0 2014-02-05 11:44 retail/patterns/parallelcounting
Note:
fList: a sequence file that holds the occurrence count of every item in the
transaction database
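
As a rough illustration of what such a frequency list contains, the Python sketch below counts in how many transactions each item occurs and keeps the items that reach the minimum support. This is an assumption-laden sketch, not Mahout's code: the name build_flist, the sample data, and the ordering (most frequent first, ties by item name) are chosen here for illustration only.

```python
from collections import Counter

def build_flist(transactions, min_support=2):
    """Return (item, count) pairs for items seen in >= min_support transactions."""
    counts = Counter()
    for items in transactions:
        # set() so an item repeated inside one transaction is counted once
        counts.update(set(items))
    frequent = [(item, n) for item, n in counts.items() if n >= min_support]
    # Assumed ordering for this sketch: most frequent first, ties by item name
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical sample data, not taken from retail.dat
sample = [["0", "39", "48"], ["39", "0"], ["39", "41"], ["48"]]
print(build_flist(sample))  # [('39', 3), ('0', 2), ('48', 2)]
```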

Viewing the results
#mahout seqdumper -i retail/patterns/fpgrowth -o patterns.txt
#cat patterns.txt

…omit…
Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])
Key: 954: Value: ([39, 954],2)
Key: 953: Value: ([39, 953],2)
Key: 933: Value: ([933],2)
Count: 4849

Explanation
Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])
This entry lists the frequent patterns that involve item 0 across the whole transaction database, each followed by its support count.
([0],26) means that item 0 appears in 26 transactions.
([39, 0],14) means that items 39 and 0 appear together in 14 transactions.
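
To work with these lines programmatically, a small parser helps. The Python sketch below assumes the text layout shown above ("Key: <item>: Value: ([items],count), ..."). The function name parse_patterns is hypothetical, and entries without a count, such as the cut-off last one in the sample line, are simply skipped.

```python
import re

# One "([item, item, ...],count)" entry
ENTRY = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")

def parse_patterns(line):
    """Parse one dumped line into (key, [(items, support), ...])."""
    key_part, _, value_part = line.partition(": Value: ")
    key = key_part.replace("Key:", "", 1).strip()
    patterns = []
    for items, count in ENTRY.findall(value_part):
        item_list = [i.strip() for i in items.split(",") if i.strip()]
        patterns.append((item_list, int(count)))
    return key, patterns

line = "Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])"
key, patterns = parse_patterns(line)
for items, count in patterns:
    print(key, items, count)
```

For the sample line this yields three patterns for key 0, with supports 26, 14 and 2.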