mahout fpgrowth


For how FP-Growth works, see
frequent pattern analysis ( http://systw.net/note/af/sblog/more.php?id=265 )
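The linked article explains the algorithm. As a compact illustration only, here is a minimal single-machine FP-growth sketch in Python. It is not Mahout's implementation (Mahout's fpg is written in Java, and in mapreduce mode it runs a parallel variant). The names Node, build_tree and fp_growth, and the toy data, are hypothetical.

```python
from collections import defaultdict


class Node:
    """An FP-tree node: an item, its count, its parent, and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to the tree
    nodes that hold it; frequent maps each frequent item to its support.
    """
    counts = defaultdict(int)
    for items, n in transactions:
        for item in set(items):  # count an item once per transaction
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, n in transactions:
        # Keep only frequent items, ordered by descending support (ties by item).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every itemset with support >= min_support."""
    header, frequent = build_tree(transactions, min_support)
    for item, support in frequent.items():
        pattern = (item,) + suffix
        yield pattern, support
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional base recursively, growing the suffix.
        yield from fp_growth(base, min_support, pattern)


if __name__ == "__main__":
    # Hypothetical toy data: five transactions over items a..e.
    data = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
            ["a", "d", "e"], ["a", "b", "c"]]
    result = {frozenset(p): s for p, s in fp_growth([(t, 1) for t in data], 2)}
    print(result[frozenset("ade")])  # → 2
```

Each transaction is passed as (items, 1) so the same function can recurse on conditional pattern bases, where each prefix path carries the count of the node it came from.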


mahout fpg
Common parameters:
-k: find the top k patterns for each item (default is 50)
-regex: the regular expression used to split each input line into items
-method: run either sequential or mapreduce
-s <minSupport>: the minimum support, i.e. the minimum number of transactions a pattern must appear in to be reported

...
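To make the roles of -regex and -s concrete, here is a small Python sketch, assuming each input line is one transaction (as in retail.dat, where items are separated by spaces). The helpers parse_transactions and support and the toy lines are hypothetical; they only mimic what the options control and do not call Mahout.

```python
import re


def parse_transactions(lines, regex=r"[ ]"):
    """Split each input line into a set of items.

    `regex` plays the role of fpg's -regex option: it is the separator
    pattern passed to re.split. Empty tokens and empty lines are discarded.
    """
    transactions = []
    for line in lines:
        items = {token for token in re.split(regex, line.strip()) if token}
        if items:
            transactions.append(items)
    return transactions


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


if __name__ == "__main__":
    # Hypothetical toy input in the same shape as retail.dat.
    lines = ["39 48 0", "39 0 41", "41 32"]
    transactions = parse_transactions(lines)
    min_support = 2  # analogous to -s 2
    for itemset in [{"39", "0"}, {"41", "32"}]:
        s = support(itemset, transactions)
        print(sorted(itemset), s, "kept" if s >= min_support else "dropped")
```

With a threshold of 2, the pair {39, 0} (2 transactions) would be reported and {41, 32} (1 transaction) would not.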

demo

#wget http://fimi.ua.ac.be/data/retail.dat
#hadoop fs -mkdir retail
#hadoop fs -put retail.dat retail/retail.dat
#hadoop fs -ls retail

-rw-r--r-- 3 root hdfs 101891 2014-02-05 11:44 retail/retail.dat

#mahout fpg -i retail/retail.dat -o retail/patterns -method mapreduce -regex [' '] -s 2
...omitted...
14/02/05 22:35:15 INFO driver.MahoutDriver: Program took 415 ms (Minutes: 0.0069166666666666664)

# hadoop fs -ls retail/patterns
Found 4 items
-rw-r--r-- 3 root hdfs 101891 2014-02-05 11:44 retail/patterns/fList
drwxr-xr-x - root hdfs 0 2014-02-05 11:44 retail/patterns/fpgrowth
drwxr-xr-x - root hdfs 0 2014-02-05 11:45 retail/patterns/frequentpatterns
drwxr-xr-x - root hdfs 0 2014-02-05 11:44 retail/patterns/parallelcounting
Note:
fList: a sequence file holding the occurrence count of each item in the transaction database.
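A minimal Python sketch of the kind of item list fList holds, assuming items are counted once per transaction, filtered by the minimum support, and sorted by descending count (the usual FP-growth ordering). The function name build_flist and the toy transactions are hypothetical, and the sketch does not read Mahout's sequence file format.

```python
from collections import Counter


def build_flist(transactions, min_support):
    """Return (item, count) pairs for items meeting min_support.

    Sorted by descending count, ties broken by item. This illustrates the
    idea only; it is not Mahout's code.
    """
    counts = Counter()
    for items in transactions:
        counts.update(set(items))  # count each item once per transaction
    frequent = [(item, n) for item, n in counts.items() if n >= min_support]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical toy transactions, already split into items.
    transactions = [["39", "48", "0"], ["39", "0", "41"], ["41", "32"], ["39"]]
    print(build_flist(transactions, min_support=2))
    # → [('39', 3), ('0', 2), ('41', 2)]
```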

Display the results
#mahout seqdumper -i retail/patterns/fpgrowth -o patterns.txt
#cat patterns.txt

...omitted...
Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])
Key: 954: Value: ([39, 954],2)
Key: 953: Value: ([39, 953],2)
Key: 933: Value: ([933],2)
Count: 4849

Explanation
Key: 0: Value: ([0],26), ([39, 0],14), ([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])
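This line lists the top frequent patterns (up to -k of them) that contain item 0, each paired with its support: the number of transactions in the whole database that contain every item of the pattern.
([0],26) means that item 0 appears in 26 transactions.
([39, 0],14) means that items 39 and 0 appear together in 14 transactions.
([39, 48, 41, 32, 616, 0, 1314],2) means that these seven items co-occur in 2 transactions, which just meets the -s 2 threshold.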
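To post-process the dumped text, a small Python parser can turn each Key/Value line into (items, support) pairs. This is a sketch that assumes the exact line shape shown in the seqdumper output above; parse_line, LINE_RE and PAIR_RE are hypothetical names. A tuple with no trailing count, such as the truncated ([39, 41, 0,]), is simply skipped.

```python
import re

# Line shapes assumed from the seqdumper output shown above.
LINE_RE = re.compile(r"^Key: (\S+): Value: (.*)$")
PAIR_RE = re.compile(r"\(\[([^\]]*)\],(\d+)\)")


def parse_line(line):
    """Parse one 'Key: ...: Value: ...' line into (key, [(items, support), ...]).

    Returns None for lines that are not Key/Value records (e.g. 'Count: 4849').
    Tuples without a support count do not match PAIR_RE and are skipped.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    key, value = match.groups()
    patterns = []
    for items_text, count in PAIR_RE.findall(value):
        items = [tok.strip() for tok in items_text.split(",") if tok.strip()]
        patterns.append((items, int(count)))
    return key, patterns


if __name__ == "__main__":
    sample = ("Key: 0: Value: ([0],26), ([39, 0],14), "
              "([39, 48, 41, 32, 616, 0, 1314],2), ([39, 41, 0,])")
    key, patterns = parse_line(sample)
    for items, support in patterns:
        print(f"key {key}: {items} support {support}")
```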

Posted 2014-09-08 22:42:46
