Recommender Systems in Practice ----- User-Based Recommendation

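    Recently I have been planning to implement several recommender-system methods, one by one, using the code from the book Programming Collective Intelligence (《集体智慧编程》). Partly this is to become more fluent in Python, and partly to understand the concepts behind recommender systems better. Today is the first one: user-based recommendation. How should we understand it? It means making recommendations by exploiting the relationships between users. Still unclear? Then let me borrow the example from the book 《推荐系统实践》 (Recommender System Practice). At the start of every term, the junior students who have just joined the lab ask the senior students which books and which papers they should read. As seniors, we of course tell them what to read. First, the juniors trust us, so we have to recommend responsibly. Second, they are in the same lab as we are, so their research direction and what they need to learn are basically the same as ours, which means we share interests. By now you have surely worked out what user-based recommendation is!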
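As a minimal sketch of the idea, the preferences can be stored as a nested dictionary that maps each user to the items they rated. The user names, book titles, and ratings here are hypothetical and only illustrate the lab analogy: the items both users rated are what similarity is measured on, and the items only the senior rated are the candidates to recommend to the junior.

```python
# Hypothetical toy data: each user maps to a dict of {item: rating}.
prefs = {
    'senior': {'Book A': 5.0, 'Book B': 3.0, 'Book C': 4.0},
    'junior': {'Book A': 4.5, 'Book B': 3.5},
}

# Items both users have rated; similarity is computed over these only.
shared = sorted(item for item in prefs['senior'] if item in prefs['junior'])

# Items the senior rated that the junior has not seen yet:
# these are the candidates for recommendation.
candidates = sorted(item for item in prefs['senior']
                    if item not in prefs['junior'])

print(shared)      # ['Book A', 'Book B']
print(candidates)  # ['Book C']
```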

  OK, that covers the principle, so on to the code!

# This is a user-based collaborative filtering recommendation algorithm.
# I hope it can help us. I learned it from the book
# 'Programming Collective Intelligence'.
# If you have any questions, please let me know.


#my email:wbglearn@gmail.com
#my qq is 354475072
#my blog is http://blog.csdn.net/wbgxx333


from math import sqrt

#compute the distance-based similarity between person1 and person2
def sim_distance(prefs,person1,person2):
    #get the list of shared_items
    si={}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item]=1

    # if they have no ratings in common, return 0
    if len(si)==0: return 0

    # Add up the squares of all the differences
    sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2)
                        for item in si])

    return 1.0/(1+sum_of_squares)

#compute the pearson correlation coefficient for p1 and p2
def sim_pearson(prefs,p1,p2):
    #get the list of mutually rated items
    si={}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item]=1

    #if there are no ratings in common, return 0
    if len(si)==0:
        return 0

    #sum calculations
    n=len(si)

    #sums of all the preferences
    sum1=sum([prefs[p1][it] for it in si])
    sum2=sum([prefs[p2][it] for it in si])

    #sums of the squares
    sum1sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2sq=sum([pow(prefs[p2][it],2) for it in si])

    #sum of the products
    psum=sum([prefs[p1][it]*prefs[p2][it] for it in si])

    #calculate r(pearson score)
    num=psum-(sum1*sum2/n)
    den=sqrt((sum1sq-pow(sum1,2)/n)*(sum2sq-pow(sum2,2)/n))
    if den==0:
        return 0

    r=num/den


    return r


#return the best matches for person from the prefs dictionary.
#number of results and similarity function are optional params.
def topmathes(prefs,person,n=5,similarity=sim_pearson):
    scores=[(similarity(prefs,person,other),other)
            for other in prefs if other!=person]
    scores.sort()
    scores.reverse()
    return scores[0:n]

#get recommendations for a person by using a weighted average of
#every other user's ranking
def getrecommendations(prefs,person,similarity=sim_pearson):
    totals={}
    simsums={}
    for other in prefs:
        #do not compare me to myself
        if other==person: continue
        sim=similarity(prefs,person,other)

        #ignore scores of zero or lower
        if sim<=0:continue
        for item in prefs[other]:

            #only score movies I have not seen yet
            if item not in prefs[person] or prefs[person][item]==0:
                #similarity * score
                totals.setdefault(item,0)
                totals[item]+=prefs[other][item]*sim
                #sum of similarities
                simsums.setdefault(item,0)
                simsums[item]+=sim

    #create the normalized list
    rankings=[(total/simsums[item],item) for item,total in totals.items()]

    #return the sorted list
    rankings.sort()
    rankings.reverse()
    return rankings

def loadmovielens(path='D:/Python27/data'):
    #get movie titles
    movies={}
    for line in open(path+'/u.item'):
        (id,title)=line.split('|')[0:2]
        movies[id]=title

    #load data
    prefs={}
    for line in open(path+'/u.data'):
        (user,movieid,rating,ts)=line.split('\t')
        prefs.setdefault(user,{})
        prefs[user][movies[movieid]]=float(rating)
    return prefs

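 The code is above. For a detailed explanation, see the book Programming Collective Intelligence (《集体智慧编程》). Of course, you can also ask me, heh.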
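As a minimal sketch of the weighted average that getrecommendations computes for a single item, the helper below takes the ratings other users gave that item together with each user's similarity to the target user. The helper name `weighted_score` and the numbers are hypothetical and chosen only for illustration; users with zero or negative similarity are skipped, as in the code above.

```python
def weighted_score(ratings_and_sims):
    """Return the similarity-weighted average rating for one item.

    ratings_and_sims: list of (rating, similarity) pairs, one pair per
    other user who rated the item. Hypothetical helper for illustration.
    """
    total = 0.0
    sim_sum = 0.0
    for rating, sim in ratings_and_sims:
        if sim <= 0:
            continue  # ignore users who are not positively similar
        total += rating * sim
        sim_sum += sim
    if sim_sum == 0:
        return None  # no similar user rated this item
    return total / sim_sum

# Two positively similar users rated the item 5.0 and 2.0;
# the third user has negative similarity and is ignored.
example = [(5.0, 0.5), (2.0, 0.25), (1.0, -0.5)]
score = weighted_score(example)
print(score)  # 4.0
```

Note that under this formula, when only one similar user has rated an item, the score is simply that user's rating, however weak the similarity is.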

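 Also note that this is only a simple implementation of the feature. When you work with different data, you need to look at the data format yourself and load the data accordingly.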
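For example, if the ratings came as comma-separated rows of user, item, rating, a loader could look roughly like the sketch below. The column order, the function name `load_csv_prefs`, and the sample rows are assumptions for illustration, and the snippet assumes Python 3; the point is only that the result has the same nested-dictionary shape that the functions above expect.

```python
import csv
import io

def load_csv_prefs(lines):
    """Build {user: {item: rating}} from rows of 'user,item,rating'.

    `lines` can be an open file or any iterable of strings.
    The column layout is an assumption, not a fixed format.
    """
    prefs = {}
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        user, item, rating = row
        prefs.setdefault(user, {})[item] = float(rating)
    return prefs

# Hypothetical in-memory sample standing in for a real file.
sample = io.StringIO("alice,Book A,5\nalice,Book B,3\nbob,Book A,4\n")
prefs = load_csv_prefs(sample)
print(prefs['alice'])  # {'Book A': 5.0, 'Book B': 3.0}
```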

Here are the experimental results:

>>> prefs=loadmovielens()
>>> prefs['87']
{'Birdcage, The (1996)': 4.0, 'E.T. the Extra-Terrestrial (1982)': 3.0, 'Bananas (1971)': 5.0, 'Sting, The (1973)': 5.0, 'Bad Boys (1995)': 4.0, 'In the Line of Fire (1993)': 5.0, 'Star Trek: The Wrath of Khan (1982)': 5.0, 'Speechless (1994)': 4.0, 'Mission: Impossible (1996)': 4.0, 'Return of the Pink Panther, The (1974)': 4.0, 'Under Siege (1992)': 4.0, 'I.Q. (1994)': 5.0, 'Evil Dead II (1987)': 2.0, 'Heat (1995)': 3.0, 'Naked Gun 33 1/3: The Final Insult (1994)': 4.0, 'Star Trek III: The Search for Spock (1984)': 4.0, 'Executive Decision (1996)': 3.0, 'Endless Summer 2, The (1994)': 3.0, 'Serial Mom (1994)': 1.0, 'Butch Cassidy and the Sundance Kid (1969)': 5.0, 'GoldenEye (1995)': 4.0, 'Private Benjamin (1980)': 4.0, 'Boot, Das (1981)': 4.0, "City Slickers II: The Legend of Curly's Gold (1994)": 3.0, 'Heathers (1989)': 3.0, 'That Old Feeling (1997)': 4.0, 'Brady Bunch Movie, The (1995)': 2.0, 'Good, The Bad and The Ugly, The (1966)': 5.0, 'Down Periscope (1996)': 4.0, "Ulee's Gold (1997)": 3.0, 'Jeffrey (1995)': 3.0, 'Strange Days (1995)': 3.0, 'Dave (1993)': 4.0, 'Demolition Man (1993)': 3.0, 'Reality Bites (1994)': 3.0, 'Big Green, The (1995)': 3.0, 'Get Shorty (1995)': 5.0, 'Manchurian Candidate, The (1962)': 4.0, 'Batman & Robin (1997)': 4.0, 'Stargate (1994)': 5.0, 'Dead Man Walking (1995)': 4.0, 'Clear and Present Danger (1994)': 5.0, 'Net, The (1995)': 5.0, 'Ed Wood (1994)': 3.0, 'Fugitive, The (1993)': 5.0, 'Clockwork Orange, A (1971)': 4.0, 'Victor/Victoria (1982)': 4.0, "Joe's Apartment (1996)": 2.0, 'Magnificent Seven, The (1954)': 5.0, 'Star Wars (1977)': 5.0, 'To Die For (1995)': 3.0, 'Bridge on the River Kwai, The (1957)': 5.0, 'Maverick (1994)': 3.0, 'Full Metal Jacket (1987)': 4.0, 'Vegas Vacation (1997)': 4.0, 'Pulp Fiction (1994)': 4.0, 'Strictly Ballroom (1992)': 3.0, 'Days of Thunder (1990)': 5.0, 'Something to Talk About (1995)': 2.0, 'Son in Law (1993)': 4.0, 'That Thing You Do! 
(1996)': 4.0, "Schindler's List (1993)": 4.0, 'Tommy Boy (1995)': 4.0, 'Jimmy Hollywood (1994)': 3.0, 'Clueless (1995)': 4.0, 'Wizard of Oz, The (1939)': 5.0, 'Dances with Wolves (1990)': 5.0, 'Multiplicity (1996)': 3.0, 'Young Frankenstein (1974)': 5.0, 'Jack (1996)': 3.0, 'Big Squeeze, The (1996)': 2.0, 'Godfather, The (1972)': 4.0, 'Barcelona (1994)': 3.0, 'Milk Money (1994)': 4.0, 'Mrs. Doubtfire (1993)': 4.0, 'Cops and Robbersons (1994)': 3.0, 'So I Married an Axe Murderer (1993)': 2.0, 'Groundhog Day (1993)': 5.0, 'Four Weddings and a Funeral (1994)': 5.0, 'Home Alone (1990)': 4.0, 'Terminator 2: Judgment Day (1991)': 5.0, 'Boomerang (1992)': 3.0, 'Ace Ventura: Pet Detective (1994)': 4.0, 'Great White Hype, The (1996)': 3.0, 'Die Hard: With a Vengeance (1995)': 4.0, 'Fargo (1996)': 5.0, 'Fish Called Wanda, A (1988)': 5.0, 'Prefontaine (1997)': 5.0, 'Young Guns (1988)': 3.0, 'Empire Strikes Back, The (1980)': 5.0, 'Citizen Kane (1941)': 4.0, 'Dumb & Dumber (1994)': 4.0, 'Crow, The (1994)': 3.0, 'Swimming with Sharks (1995)': 3.0, '2001: A Space Odyssey (1968)': 5.0, 'Matilda (1996)': 3.0, 'Man of the House (1995)': 3.0, 'Star Trek: The Motion Picture (1979)': 3.0, 'Return of the Jedi (1983)': 5.0, 'Grumpier Old Men (1995)': 4.0, 'Jurassic Park (1993)': 5.0, 'Treasure of the Sierra Madre, The (1948)': 4.0, 'Renaissance Man (1994)': 5.0, 'Program, The (1993)': 3.0, "Monty Python's Life of Brian (1979)": 4.0, 'Sneakers (1992)': 4.0, 'Twister (1996)': 4.0, 'GoodFellas (1990)': 4.0, "Dante's Peak (1997)": 3.0, 'Adventures of Priscilla, Queen of the Desert, The (1994)': 3.0, 'Switchblade Sisters (1975)': 2.0, 'Dragonheart (1996)': 4.0, 'Lightning Jack (1994)': 3.0, 'River Wild, The (1994)': 4.0, 'Raiders of the Lost Ark (1981)': 5.0, 'Air Up There, The (1994)': 3.0, "Pyromaniac's Love Story, A (1995)": 3.0, 'Young Guns II (1990)': 2.0, 'Die Hard (1988)': 4.0, 'Top Gun (1986)': 5.0, 'Truth About Cats & Dogs, The (1996)': 4.0, 'While You Were Sleeping (1995)': 5.0, 
'Braveheart (1995)': 4.0, 'Raising Arizona (1987)': 3.0, 'Batman (1989)': 3.0, 'To Kill a Mockingbird (1962)': 4.0, 'Mother (1996)': 2.0, 'Kingpin (1996)': 4.0, 'Supercop (1992)': 3.0, 'Dunston Checks In (1996)': 1.0, 'Deer Hunter, The (1978)': 3.0, 'Up in Smoke (1978)': 3.0, 'Cool Hand Luke (1967)': 5.0, 'Wyatt Earp (1994)': 3.0, 'Annie Hall (1977)': 4.0, 'Blues Brothers, The (1980)': 5.0, 'True Lies (1994)': 5.0, 'Independence Day (ID4) (1996)': 5.0, 'Professional, The (1994)': 4.0, "It's a Wonderful Life (1946)": 5.0, 'Blade Runner (1982)': 4.0, 'Low Down Dirty Shame, A (1994)': 3.0, 'Baby-Sitters Club, The (1995)': 2.0, 'Sabrina (1995)': 4.0, 'I Love Trouble (1994)': 3.0, 'Mask, The (1994)': 3.0, 'Indiana Jones and the Last Crusade (1989)': 5.0, 'Nine Months (1995)': 4.0, 'French Kiss (1995)': 5.0, 'Shawshank Redemption, The (1994)': 5.0, 'Batman Returns (1992)': 3.0, 'Addams Family Values (1993)': 2.0, 'Junior (1994)': 4.0, 'Adventures of Robin Hood, The (1938)': 5.0, 'Mars Attacks! (1996)': 3.0, 'Waterworld (1995)': 4.0, 'Major Payne (1994)': 3.0, 'Con Air (1997)': 4.0, 'Sleepers (1996)': 4.0, 'Air Force One (1997)': 3.0, 'Alien (1979)': 4.0, 'Nutty Professor, The (1996)': 4.0, 'Coneheads (1993)': 4.0, 'Raging Bull (1980)': 3.0, "Singin' in the Rain (1952)": 4.0, 'In the Army Now (1994)': 4.0, 'Glory (1989)': 4.0, 'Star Trek IV: The Voyage Home (1986)': 5.0, 'Forget Paris (1995)': 4.0, 'M*A*S*H (1970)': 5.0, 'Platoon (1986)': 3.0, 'House Arrest (1996)': 3.0, 'Speed 2: Cruise Control (1997)': 3.0, 'Terminator, The (1984)': 5.0, 'To Wong Foo, Thanks for Everything! 
Julie Newmar (1995)': 3.0, 'Cliffhanger (1993)': 3.0, 'Speed (1994)': 5.0, 'Desperado (1995)': 3.0, 'Michael (1996)': 4.0, 'Conan the Barbarian (1981)': 3.0, 'Hoop Dreams (1994)': 4.0, 'Mighty Aphrodite (1995)': 3.0, 'Twelve Monkeys (1995)': 4.0, 'Sleepless in Seattle (1993)': 5.0, 'My Favorite Year (1982)': 3.0, 'Sleeper (1973)': 4.0, 'Searching for Bobby Fischer (1993)': 4.0, 'Apocalypse Now (1979)': 4.0, 'Addicted to Love (1997)': 4.0, 'Hot Shots! Part Deux (1993)': 4.0, 'Quiet Man, The (1952)': 5.0, 'Babe (1995)': 5.0, 'When Harry Met Sally... (1989)': 5.0, 'Star Trek: First Contact (1996)': 4.0, 'American President, The (1995)': 5.0, 'Shadow, The (1994)': 3.0, 'Muppet Treasure Island (1996)': 3.0, 'Santa Clause, The (1994)': 4.0, 'Dead Poets Society (1989)': 5.0, 'First Wives Club, The (1996)': 2.0, 'Lost World: Jurassic Park, The (1997)': 3.0, 'Inkwell, The (1994)': 3.0, 'Broken Arrow (1996)': 3.0, 'Hard Target (1993)': 4.0, 'Grease (1978)': 4.0, 'This Is Spinal Tap (1984)': 5.0, 'Back to the Future (1985)': 5.0, "Weekend at Bernie's (1989)": 3.0, 'Cowboy Way, The (1994)': 3.0, 'Striptease (1996)': 2.0}
>>> getrecommendations(prefs,'87')[0:30]
[(5.0, 'They Made Me a Criminal (1939)'), (5.0, 'Star Kid (1997)'), (5.0, 'Santa with Muscles (1996)'), (5.0, 'Saint of Fort Washington, The (1993)'), (5.0, 'Marlene Dietrich: Shadow and Light (1996) '), (5.0, 'Great Day in Harlem, A (1994)'), (5.0, 'Entertaining Angels: The Dorothy Day Story (1996)'), (5.0, 'Boys, Les (1997)'), (4.89884443128923, 'Legal Deceit (1997)'), (4.815019082242709, 'Letter From Death Row, A (1998)'), (4.7321082983941425, 'Hearts and Minds (1996)'), (4.696244466490867, 'Pather Panchali (1955)'), (4.652397061026758, 'Lamerica (1994)'), (4.538723693474813, 'Leading Man, The (1996)'), (4.535081339106103, 'Mrs. Dalloway (1997)'), (4.532337612572981, 'Innocents, The (1961)'), (4.527998574747079, 'Casablanca (1942)'), (4.510270149719864, 'Everest (1998)'), (4.493967755428439, 'Dangerous Beauty (1998)'), (4.485151301801342, 'Wallace & Gromit: The Best of Aardman Animation (1996)'), (4.463287461290222, 'Wrong Trousers, The (1993)'), (4.450979436941035, 'Kaspar Hauser (1993)'), (4.431079071179518, 'Usual Suspects, The (1995)'), (4.427520682864959, 'Maya Lin: A Strong Clear Vision (1994)'), (4.414870784592075, 'Wedding Gift, The (1994)'), (4.377445252656464, 'Affair to Remember, An (1957)'), (4.376071110447771, 'Good Will Hunting (1997)'), (4.376011099001396, 'As Good As It Gets (1997)'), (4.374146179500976, 'Anna (1996)'), (4.367437266504598, 'Close Shave, A (1995)')]

 That's all. Corrections are welcome.
