Mahout Recommend

Mahout Item-Based Collaborative Filtering

#mahout recommenditembased
Usage:
-i <input>
-o <output>
-n <number of recommendations> number of recommendations per user
-b no rating column is needed; the input only needs the two fields user,item
-s <similarityClassname> commonly used values:
 SIMILARITY_COOCCURRENCE,
 SIMILARITY_LOGLIKELIHOOD,
 SIMILARITY_TANIMOTO_COEFFICIENT,
 SIMILARITY_CITY_BLOCK,
 SIMILARITY_COSINE,
 SIMILARITY_PEARSON_CORRELATION,
 SIMILARITY_EUCLIDEAN_DISTANCE
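
As a rough illustration of what a few of these similarity names refer to, the Python sketch below computes the textbook co-occurrence count, Tanimoto coefficient, and cosine similarity between two item columns. The function names and sample vectors are made up for this example, and Mahout's own implementations may differ in details such as normalization and the handling of missing ratings.

```python
import math

# Two hypothetical item columns: one rating per user, 0 means "not rated".
item_a = [5, 4, 5, 1, 2]
item_b = [5, 4, 0, 0, 1]

def cooccurrence(a, b):
    """Number of users who rated both items."""
    return sum(1 for x, y in zip(a, b) if x and y)

def tanimoto(a, b):
    """Users who rated both items divided by users who rated either item."""
    both = cooccurrence(a, b)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def cosine(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print("co-occurrence:", cooccurrence(item_a, item_b))
print("tanimoto:", round(tanimoto(item_a, item_b), 3))
print("cosine:", round(cosine(item_a, item_b), 3))
```

Co-occurrence and Tanimoto only look at whether a rating exists, so they are the kind of measure that fits input without a rating column (the -b case), while cosine uses the rating values themselves.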


………………………..

DEMO

user x book rating table (5 is the highest rating, 1 is the lowest)

        book 1  book 2  book 3
user 1    5       4       5
user 2    4       5       4
user 3    5       4       -
user 4    1       2       -
user 5    2       1       1


#vi recom.data
1,1,5
1,2,4
1,3,5
2,1,4
2,2,5
2,3,4
3,1,5
3,2,4
4,1,1
4,2,2
5,1,2
5,2,1
5,3,1
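
Each line of recom.data is userID,itemID,rating. The Python sketch below is only an illustration (the helper names load_ratings and unrated_pairs are hypothetical, not part of Mahout): it loads the same data and lists the (user, item) pairs that have no rating yet, which are the only pairs a recommender could fill in.

```python
import csv
import io

# Contents of recom.data from the demo: userID,itemID,rating
RECOM_DATA = """\
1,1,5
1,2,4
1,3,5
2,1,4
2,2,5
2,3,4
3,1,5
3,2,4
4,1,1
4,2,2
5,1,2
5,2,1
5,3,1
"""

def load_ratings(text):
    """Parse CSV text into {user: {item: rating}}."""
    ratings = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        user, item, value = row
        ratings.setdefault(int(user), {})[int(item)] = float(value)
    return ratings

def unrated_pairs(ratings):
    """Return every (user, item) pair for which no rating exists."""
    items = sorted({item for prefs in ratings.values() for item in prefs})
    return [(user, item)
            for user in sorted(ratings)
            for item in items
            if item not in ratings[user]]

ratings = load_ratings(RECOM_DATA)
print(unrated_pairs(ratings))
```

For this data set the only unrated pairs are users 3 and 4 with book 3.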

# hadoop fs -mkdir testdata
# hadoop fs -put recom.data testdata
# hadoop fs -ls -R testdata

-rw-r--r-- 3 root hdfs 288374 2014-02-05 21:53 testdata/recom.data

#mahout recommenditembased -i testdata -o output -s SIMILARITY_EUCLIDEAN_DISTANCE
…omit…
{--booleanData=[false], --endPhase=[2147483647], --input=[tasteinput], --maxPrefsPerUser=[10], --maxPrefsPerUserInItemSimilarity=[1000], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --output=[tasteoutput], --similarityClassname=[SIMILARITY_EUCLIDEAN_DISTANCE], --startPhase=[0], --tempDir=[temp]}
14/03/02 09:52:37 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[tasteinput], --maxPrefsPerUser=[1000], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
…omit…
File Input Format Counters
Bytes Read=287
File Output Format Counters
Bytes Written=32
14/09/04 05:46:56 INFO driver.MahoutDriver: Program took 434965 ms (Minutes: 7.249416666666667)


#hadoop fs -cat output/part-r-00000
3 [3:4.4787264]
4 [3:1.5212735]
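
Each output line reads as userID [itemID:predicted rating], so users 3 and 4 are each recommended book 3, the only book they have not rated. The Python sketch below tries to reproduce these two numbers by hand. It rests on two assumptions that the demo itself does not state: the item similarity is 1 / (1 + Euclidean distance) with a missing rating counted as 0, and the prediction is the similarity-weighted average of the user's own ratings. The function names are made up for this example.

```python
import math

# Ratings from recom.data as {user: {item: rating}}.
ratings = {
    1: {1: 5, 2: 4, 3: 5},
    2: {1: 4, 2: 5, 3: 4},
    3: {1: 5, 2: 4},
    4: {1: 1, 2: 2},
    5: {1: 2, 2: 1, 3: 1},
}

def item_vector(item):
    # Assumption: a missing rating counts as 0 in the distance.
    return [ratings[user].get(item, 0) for user in sorted(ratings)]

def euclidean_similarity(a, b):
    # Assumption: similarity = 1 / (1 + Euclidean distance).
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)

def predict(user, item):
    # Similarity-weighted average of the ratings the user already gave.
    weighted_sum = 0.0
    similarity_sum = 0.0
    for rated_item, rating in ratings[user].items():
        sim = euclidean_similarity(item_vector(item), item_vector(rated_item))
        weighted_sum += sim * rating
        similarity_sum += sim
    return weighted_sum / similarity_sum

for user in (3, 4):
    print(user, predict(user, 3))
```

Under these assumptions the sketch gives values that agree with 4.4787264 and 1.5212735 to about six decimal places. User 3 rated books 1 and 2 highly, so the predicted rating for book 3 is high; user 4 rated them low, so the prediction is low.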