Mahout Item-based Collaborative Filtering
#mahout recommenditembased
Usage:
-i <input>
-o <output>
-n <number of recommendations> number of recommendations per user
-b no rating column needed; the input only needs the user and item columns
-s <similarityClassname> commonly used values:
SIMILARITY_COOCCURRENCE,
SIMILARITY_LOGLIKELIHOOD,
SIMILARITY_TANIMOTO_COEFFICIENT,
SIMILARITY_CITY_BLOCK,
SIMILARITY_COSINE,
SIMILARITY_PEARSON_CORRELATION,
SIMILARITY_EUCLIDEAN_DISTANCE
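To give a feel for what a few of these names measure, here is a minimal sketch using the textbook definitions on two item columns (book 1 and book 3 from the demo below). The function names are made up for illustration, and Mahout's own implementations may differ in details such as normalization, so treat the numbers as illustrative only.

```python
import math

# Two item columns keyed by user ID; a user who did not rate the item is absent.
book1 = {1: 5, 2: 4, 3: 5, 4: 1, 5: 2}
book3 = {1: 5, 2: 4, 5: 1}

def cooccurrence(a, b):
    """Number of users who rated both items."""
    return len(a.keys() & b.keys())

def tanimoto(a, b):
    """Users who rated both items divided by users who rated either one."""
    return len(a.keys() & b.keys()) / len(a.keys() | b.keys())

def cosine(a, b):
    """Cosine of the angle between the rating vectors (missing rating = 0)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

print(cooccurrence(book1, book3))  # 3
print(tanimoto(book1, book3))      # 0.6
print(cosine(book1, book3))
```

Co-occurrence and Tanimoto ignore the rating values entirely, which is why they pair naturally with the -b option; cosine uses the values themselves.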
………………………..
DEMO
user x book rating table (5 is the highest rating, 1 is the lowest)
       | book 1 | book 2 | book 3 |
user 1 | 5      | 4      | 5      |
user 2 | 4      | 5      | 4      |
user 3 | 5      | 4      |        |
user 4 | 1      | 2      |        |
user 5 | 2      | 1      | 1      |
#vi recom.data
1,1,5
1,2,4
1,3,5
2,1,4
2,2,5
2,3,4
3,1,5
3,2,4
4,1,1
4,2,2
5,1,2
5,2,1
5,3,1
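Each line of recom.data is userID,itemID,rating, with one line per filled cell of the rating table. As a small sketch (the `ratings` dict and the `to_lines` helper are hypothetical names for this example), the same lines can be generated from the table instead of typed by hand:

```python
# Rating table as {user: {book: rating}}; empty cells are simply left out.
ratings = {
    1: {1: 5, 2: 4, 3: 5},
    2: {1: 4, 2: 5, 3: 4},
    3: {1: 5, 2: 4},
    4: {1: 1, 2: 2},
    5: {1: 2, 2: 1, 3: 1},
}

def to_lines(table, boolean=False):
    """Return 'user,item,rating' lines, or 'user,item' lines when boolean=True."""
    lines = []
    for user in sorted(table):
        for item in sorted(table[user]):
            if boolean:
                lines.append(f"{user},{item}")
            else:
                lines.append(f"{user},{item},{table[user][item]}")
    return lines

# Print the lines; redirect to a file to get recom.data.
for line in to_lines(ratings):
    print(line)
```

Passing boolean=True produces the two-column user,item form that the -b option expects.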
# hadoop fs -mkdir testdata
# hadoop fs -put recom.data testdata
# hadoop fs -ls -R testdata
-rw-r--r-- 3 root hdfs 288374 2014-02-05 21:53 testdata/recom.data
#mahout recommenditembased -i testdata -o output -s SIMILARITY_EUCLIDEAN_DISTANCE
…omitted…
{--booleanData=[false], --endPhase=[2147483647], --input=[tasteinput], --maxPrefsPerUser=[10], --maxPrefsPerUserInItemSimilarity=[1000], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --output=[tasteoutput], --similarityClassname=[SIMILARITY_EUCLIDEAN_DISTANCE], --startPhase=[0], --tempDir=[temp]}
14/03/02 09:52:37 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[tasteinput], --maxPrefsPerUser=[1000], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
…omitted…
File Input Format Counters
Bytes Read=287
File Output Format Counters
Bytes Written=32
14/09/04 05:46:56 INFO driver.MahoutDriver: Program took 434965 ms (Minutes: 7.249416666666667)
#hadoop fs -cat output/part-r-00000
3 [3:4.4787264]
4 [3:1.5212735]
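Each output line is a user ID followed by [itemID:predicted rating]. Here user 3 gets book 3 with a predicted rating of about 4.48 and user 4 gets book 3 with about 1.52, since book 3 is the only book they have not rated. The sketch below reproduces these two numbers under two assumptions that are not stated in the log: item similarity is 1 / (1 + Euclidean distance) with a missing rating counted as 0, and the prediction is the similarity-weighted average of the user's own ratings. The function names are hypothetical.

```python
import math

# Ratings from recom.data as {user: {item: rating}}.
ratings = {
    1: {1: 5, 2: 4, 3: 5},
    2: {1: 4, 2: 5, 3: 4},
    3: {1: 5, 2: 4},
    4: {1: 1, 2: 2},
    5: {1: 2, 2: 1, 3: 1},
}

def item_vector(item):
    """Ratings for one item keyed by user; users who did not rate it are absent."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def euclidean_similarity(a, b):
    """1 / (1 + distance), treating a missing rating as 0 (assumption)."""
    users = a.keys() | b.keys()
    dist = math.sqrt(sum((a.get(u, 0) - b.get(u, 0)) ** 2 for u in users))
    return 1.0 / (1.0 + dist)

def predict(user, item):
    """Similarity-weighted average of the ratings the user already gave."""
    target = item_vector(item)
    num = den = 0.0
    for other, rating in ratings[user].items():
        sim = euclidean_similarity(target, item_vector(other))
        num += sim * rating
        den += sim
    return num / den

print(predict(3, 3))  # about 4.4787
print(predict(4, 3))  # about 1.5213
```

Under these assumptions book 3 has similarity of roughly 0.161 to book 1 and 0.176 to book 2, so user 3 (ratings 5 and 4) lands near 4.48 and user 4 (ratings 1 and 2) near 1.52, in line with the job output.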