Recall in Practice with the DeepMatch Framework

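This article walks through using the DeepMatch framework, which comes out of Alibaba, in the recall stage of a recommender system. It covers data preprocessing, installation, and a worked example that combines DeepMatch with deepctr on a MovieLens sample, and it touches on using faiss to speed up similarity search.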

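What is DeepMatch?

A recommender system pipeline typically runs recall (match) -> coarse ranking (pre-rank) -> fine ranking (rank) -> re-ranking (rerank). 浅梦 of Alibaba developed one framework for each of two stages: deepmatch for recall and deepctr for ranking.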

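How do I install deepctr and deepmatch?

As of 2021-10-02, deepmatch only supports TensorFlow 1.x; TensorFlow 2.0.0 and later are not yet supported. deepmatch also depends on deepctr 0.8.2. Install deepctr with either pip install deepctr[cpu] or pip install deepctr[gpu], install TensorFlow with pip install tensorflow==1.14.0, and then install deepmatch with pip install -U deepmatch.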
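These version constraints are easy to break in a fresh environment, so a small pre-flight check can be handy. The sketch below is only an illustration: `parse_version` and `check_versions` are hypothetical helpers, and the rule they encode (TensorFlow major version 1, deepctr exactly 0.8.2) is simply the constraint stated in this post as of 2021-10-02, not something read from deepmatch itself.

```python
import re


def parse_version(version):
    """Turn a version string such as '1.14.0' into a tuple of ints (1, 14, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def check_versions(tf_version, deepctr_version):
    """Return a list of human-readable problems; an empty list means OK.

    Assumption (from this post, not from deepmatch): TensorFlow must be 1.x
    and deepctr must be exactly 0.8.2.
    """
    problems = []
    if parse_version(tf_version)[:1] != (1,):
        problems.append("TensorFlow %s is not a 1.x release" % tf_version)
    if parse_version(deepctr_version) != (0, 8, 2):
        problems.append("deepctr %s is not 0.8.2" % deepctr_version)
    return problems


print(check_versions("1.14.0", "0.8.2"))
print(check_versions("2.4.1", "0.8.2"))
```

In a real environment you would pass in `tensorflow.__version__` and `deepctr.__version__`, assuming both packages are importable and expose that attribute.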

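What is faiss?

faiss is a library for efficient similarity search and clustering of dense vectors, developed by Facebook AI Research. Its main features are:

1. It offers several search methods.
2. It is fast.
3. Indexes can be kept in memory or stored on disk.
4. It is implemented in C++ and comes with Python bindings.
5. Most of its algorithms have GPU implementations.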
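To make "similarity search over dense vectors" concrete, here is a brute-force sketch in plain Python of what an exact inner-product search computes: score every stored vector against the query and keep the k best. It does not use faiss at all, and `top_k_inner_product` is a hypothetical name chosen for this illustration; faiss exists precisely because this naive loop does not scale.

```python
import heapq


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_inner_product(query, vectors, k):
    """Return (scores, indices) of the k vectors with the largest inner product."""
    scored = ((inner_product(query, vec), idx) for idx, vec in enumerate(vectors))
    best = heapq.nlargest(k, scored)  # sorted from highest to lowest score
    scores = [score for score, _ in best]
    indices = [idx for _, idx in best]
    return scores, indices


# Four toy "item embeddings" and one "user embedding".
items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
scores, indices = top_k_inner_product([1.0, 0.2], items, k=2)
print(indices)
```

The returned pair plays the same role as the `D, I` result in the faiss section of main.py later in this post: one list of scores and one list of positions into the item array.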

Recall in practice

Next, we use deepmatch together with deepctr to build a recall example.
movielens_sample.txt

user_id,movie_id,rating,timestamp,title,genres,gender,age,occupation,zip
1,1193,5,978300760,One Flew Over the Cuckoo's Nest (1975),Drama,F,1,10,48067
1,661,3,978302109,James and the Giant Peach (1996),Animation|Children's|Musical,F,1,10,48067
1,914,3,978301968,My Fair Lady (1964),Musical|Romance,F,1,10,48067
1,3408,4,978300275,Erin Brockovich (2000),Drama,F,1,10,48067
1,2355,5,978824291,"Bug's Life, A (1998)",Animation|Children's|Comedy,F,1,10,48067
1,1197,3,978302268,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,F,1,10,48067
1,1287,5,978302039,Ben-Hur (1959),Action|Adventure|Drama,F,1,10,48067
1,2804,5,978300719,"Christmas Story, A (1983)",Comedy|Drama,F,1,10,48067
1,594,4,978302268,Snow White and the Seven Dwarfs (1937),Animation|Children's|Musical,F,1,10,48067
1,919,4,978301368,"Wizard of Oz, The (1939)",Adventure|Children's|Drama|Musical,F,1,10,48067
1,595,5,978824268,Beauty and the Beast (1991),Animation|Children's|Musical,F,1,10,48067
1,938,4,978301752,Gigi (1958),Musical,F,1,10,48067
1,2398,4,978302281,Miracle on 34th Street (1947),Drama,F,1,10,48067
1,2918,4,978302124,Ferris Bueller's Day Off (1986),Comedy,F,1,10,48067
1,1035,5,978301753,"Sound of Music, The (1965)",Musical,F,1,10,48067
1,2791,4,978302188,Airplane! (1980),Comedy,F,1,10,48067
1,2687,3,978824268,Tarzan (1999),Animation|Children's,F,1,10,48067
1,2018,4,978301777,Bambi (1942),Animation|Children's,F,1,10,48067
1,3105,5,978301713,Awakenings (1990),Drama,F,1,10,48067
1,2797,4,978302039,Big (1988),Comedy|Fantasy,F,1,10,48067
1,2321,3,978302205,Pleasantville (1998),Comedy,F,1,10,48067
1,720,3,978300760,Wallace & Gromit: The Best of Aardman Animation (1996),Animation,F,1,10,48067
1,1270,5,978300055,Back to the Future (1985),Comedy|Sci-Fi,F,1,10,48067
1,527,5,978824195,Schindler's List (1993),Drama|War,F,1,10,48067
1,2340,3,978300103,Meet Joe Black (1998),Romance,F,1,10,48067
1,48,5,978824351,Pocahontas (1995),Animation|Children's|Musical|Romance,F,1,10,48067
1,1097,4,978301953,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi,F,1,10,48067
1,1721,4,978300055,Titanic (1997),Drama|Romance,F,1,10,48067
1,1545,4,978824139,Ponette (1996),Drama,F,1,10,48067
1,745,3,978824268,"Close Shave, A (1995)",Animation|Comedy|Thriller,F,1,10,48067
1,2294,4,978824291,Antz (1998),Animation|Children's,F,1,10,48067
1,3186,4,978300019,"Girl, Interrupted (1999)",Drama,F,1,10,48067
1,1566,4,978824330,Hercules (1997),Adventure|Animation|Children's|Comedy|Musical,F,1,10,48067
1,588,4,978824268,Aladdin (1992),Animation|Children's|Comedy|Musical,F,1,10,48067
1,1907,4,978824330,Mulan (1998),Animation|Children's,F,1,10,48067
1,783,4,978824291,"Hunchback of Notre Dame, The (1996)",Animation|Children's|Musical,F,1,10,48067
1,1836,5,978300172,"Last Days of Disco, The (1998)",Drama,F,1,10,48067
1,1022,5,978300055,Cinderella (1950),Animation|Children's|Musical,F,1,10,48067
1,2762,4,978302091,"Sixth Sense, The (1999)",Thriller,F,1,10,48067
1,150,5,978301777,Apollo 13 (1995),Drama,F,1,10,48067
1,1,5,978824268,Toy Story (1995),Animation|Children's|Comedy,F,1,10,48067
1,1961,5,978301590,Rain Man (1988),Drama,F,1,10,48067
1,1962,4,978301753,Driving Miss Daisy (1989),Drama,F,1,10,48067
1,2692,4,978301570,Run Lola Run (Lola rennt) (1998),Action|Crime|Romance,F,1,10,48067
1,260,4,978300760,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,F,1,10,48067
1,1028,5,978301777,Mary Poppins (1964),Children's|Comedy|Musical,F,1,10,48067
1,1029,5,978302205,Dumbo (1941),Animation|Children's|Musical,F,1,10,48067
1,1207,4,978300719,To Kill a Mockingbird (1962),Drama,F,1,10,48067
1,2028,5,978301619,Saving Private Ryan (1998),Action|Drama|War,F,1,10,48067
1,531,4,978302149,"Secret Garden, The (1993)",Children's|Drama,F,1,10,48067
1,3114,4,978302174,Toy Story 2 (1999),Animation|Children's|Comedy,F,1,10,48067
1,608,4,978301398,Fargo (1996),Crime|Drama|Thriller,F,1,10,48067
1,1246,4,978302091,Dead Poets Society (1989),Drama,F,1,10,48067
2,1193,5,978298413,One Flew Over the Cuckoo's Nest (1975),Drama,M,56,16,70072
2,3105,4,978298673,Awakenings (1990),Drama,M,56,16,70072
2,2321,3,978299666,Pleasantville (1998),Comedy,M,56,16,70072
2,1962,5,978298813,Driving Miss Daisy (1989),Drama,M,56,16,70072
2,1207,4,978298478,To Kill a Mockingbird (1962),Drama,M,56,16,70072
2,2028,4,978299773,Saving Private Ryan (1998),Action|Drama|War,M,56,16,70072
2,1246,5,978299418,Dead Poets Society (1989),Drama,M,56,16,70072
2,1357,5,978298709,Shine (1996),Drama|Romance,M,56,16,70072
2,3068,4,978299000,"Verdict, The (1982)",Drama,M,56,16,70072
2,1537,4,978299620,Shall We Dance? (Shall We Dansu?) (1996),Comedy,M,56,16,70072
2,647,3,978299351,Courage Under Fire (1996),Drama|War,M,56,16,70072
2,2194,4,978299297,"Untouchables, The (1987)",Action|Crime|Drama,M,56,16,70072
2,648,4,978299913,Mission: Impossible (1996),Action|Adventure|Mystery,M,56,16,70072
2,2268,5,978299297,"Few Good Men, A (1992)",Crime|Drama,M,56,16,70072
2,2628,3,978300051,Star Wars: Episode I - The Phantom Menace (1999),Action|Adventure|Fantasy|Sci-Fi,M,56,16,70072
2,1103,3,978298905,Rebel Without a Cause (1955),Drama,M,56,16,70072
2,2916,3,978299809,Total Recall (1990),Action|Adventure|Sci-Fi|Thriller,M,56,16,70072
2,3468,5,978298542,"Hustler, The (1961)",Drama,M,56,16,70072
2,1210,4,978298151,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War,M,56,16,70072
2,1792,3,978299941,U.S. Marshalls (1998),Action|Thriller,M,56,16,70072
2,1687,3,978300174,"Jackal, The (1997)",Action|Thriller,M,56,16,70072
2,1213,2,978298458,GoodFellas (1990),Crime|Drama,M,56,16,70072
2,3578,5,978298958,Gladiator (2000),Action|Drama,M,56,16,70072
2,2881,3,978300002,Double Jeopardy (1999),Action|Thriller,M,56,16,70072
2,3030,4,978298434,Yojimbo (1961),Comedy|Drama|Western,M,56,16,70072
2,1217,3,978298151,Ran (1985),Drama|War,M,56,16,70072
2,434,2,978300174,Cliffhanger (1993),Action|Adventure|Crime,M,56,16,70072
2,2126,3,978300123,Snake Eyes (1998),Action|Crime|Mystery|Thriller,M,56,16,70072
2,3107,2,978300002,Backdraft (1991),Action|Drama,M,56,16,70072
2,3108,3,978299712,"Fisher King, The (1991)",Comedy|Drama|Romance,M,56,16,70072
2,3035,4,978298625,Mister Roberts (1955),Comedy|Drama|War,M,56,16,70072
2,1253,3,978299120,"Day the Earth Stood Still, The (1951)",Drama|Sci-Fi,M,56,16,70072
2,1610,5,978299809,"Hunt for Red October, The (1990)",Action|Thriller,M,56,16,70072
2,292,3,978300123,Outbreak (1995),Action|Drama|Thriller,M,56,16,70072
2,2236,5,978299220,Simon Birch (1998),Drama,M,56,16,70072
2,3071,4,978299120,Stand and Deliver (1987),Drama,M,56,16,70072
2,902,2,978298905,Breakfast at Tiffany's (1961),Drama|Romance,M,56,16,70072
2,368,4,978300002,Maverick (1994),Action|Comedy|Western,M,56,16,70072
2,1259,5,978298841,Stand by Me (1986),Adventure|Comedy|Drama,M,56,16,70072
2,3147,5,978298652,"Green Mile, The (1999)",Drama|Thriller,M,56,16,70072
2,1544,4,978300174,"Lost World: Jurassic Park, The (1997)",Action|Adventure|Sci-Fi|Thriller,M,56,16,70072
2,1293,5,978298261,Gandhi (1982),Drama,M,56,16,70072
2,1188,4,978299620,Strictly Ballroom (1992),Comedy|Romance,M,56,16,70072
2,3255,4,978299321,"League of Their Own, A (1992)",Comedy|Drama,M,56,16,70072
2,3256,2,978299839,Patriot Games (1992),Action|Thriller,M,56,16,70072
2,3257,3,978300073,"Bodyguard, The (1992)",Action|Drama|Romance|Thriller,M,56,16,70072
2,110,5,978298625,Braveheart (1995),Action|Drama|War,M,56,16,70072
2,2278,3,978299889,Ronin (1998),Action|Crime|Thriller,M,56,16,70072
2,2490,3,978299966,Payback (1999),Action|Thriller,M,56,16,70072
2,1834,4,978298813,"Spanish Prisoner, The (1997)",Drama|Thriller,M,56,16,70072
2,3471,5,978298814,Close Encounters of the Third Kind (1977),Drama|Sci-Fi,M,56,16,70072
2,589,4,978299773,Terminator 2: Judgment Day (1991),Action|Sci-Fi|Thriller,M,56,16,70072
2,1690,3,978300051,Alien: Resurrection (1997),Action|Horror|Sci-Fi,M,56,16,70072
2,3654,3,978298814,"Guns of Navarone, The (1961)",Action|Drama|War,M,56,16,70072
2,2852,3,978298958,"Soldier's Story, A (1984)",Drama,M,56,16,70072
2,1945,5,978298458,On the Waterfront (1954),Crime|Drama,M,56,16,70072
2,982,4,978299269,Picnic (1955),Drama,M,56,16,70072
2,1873,4,978298542,"Misérables, Les (1998)",Drama,M,56,16,70072
2,2858,4,978298434,American Beauty (1999),Comedy|Drama,M,56,16,70072
2,1225,5,978298391,Amadeus (1984),Drama,M,56,16,70072
2,515,5,978298542,"Remains of the Day, The (1993)",Drama,M,56,16,70072
2,442,3,978300025,Demolition Man (1993),Action|Sci-Fi,M,56,16,70072
2,2312,3,978299046,Children of a Lesser God (1986),Drama,M,56,16,70072
2,265,4,978299026,Like Water for Chocolate (Como agua para chocolate) (1992),Drama|Romance,M,56,16,70072
2,1408,3,978299839,"Last of the Mohicans, The (1992)",Action|Romance|War,M,56,16,70072
2,1084,3,978298813,Bonnie and Clyde (1967),Crime|Drama,M,56,16,70072
2,3699,2,978299173,Starman (1984),Adventure|Drama|Romance|Sci-Fi,M,56,16,70072
2,480,5,978299809,Jurassic Park (1993),Action|Adventure|Sci-Fi,M,56,16,70072
2,1442,4,978299297,Prefontaine (1997),Drama,M,56,16,70072
2,2067,5,978298625,Doctor Zhivago (1965),Drama|Romance|War,M,56,16,70072
2,1265,3,978299712,Groundhog Day (1993),Comedy|Romance,M,56,16,70072
2,1370,5,978299889,Die Hard 2 (1990),Action|Thriller,M,56,16,70072
2,1801,3,978300002,"Man in the Iron Mask, The (1998)",Action|Drama|Romance,M,56,16,70072
2,1372,3,978299941,Star Trek VI: The Undiscovered Country (1991),Action|Adventure|Sci-Fi,M,56,16,70072
2,2353,4,978299861,Enemy of the State (1998),Action|Thriller,M,56,16,70072
2,3334,4,978298958,Key Largo (1948),Crime|Drama|Film-Noir|Thriller,M,56,16,70072
2,2427,2,978299913,"Thin Red Line, The (1998)",Action|Drama|War,M,56,16,70072
2,590,5,978299083,Dances with Wolves (1990),Adventure|Drama|Western,M,56,16,70072
2,1196,5,978298730,Star Wars: Episode V - The Empire Strikes Back (1980),Action|Adventure|Drama|Sci-Fi|War,M,56,16,70072
2,1552,3,978299941,Con Air (1997),Action|Adventure|Thriller,M,56,16,70072
2,736,4,978300100,Twister (1996),Action|Adventure|Romance|Thriller,M,56,16,70072
2,1198,4,978298124,Raiders of the Lost Ark (1981),Action|Adventure,M,56,16,70072
2,593,5,978298517,"Silence of the Lambs, The (1991)",Drama|Thriller,M,56,16,70072
2,2359,3,978299666,Waking Ned Devine (1998),Comedy,M,56,16,70072
2,95,2,978300143,Broken Arrow (1996),Action|Thriller,M,56,16,70072
2,2717,3,978298196,Ghostbusters II (1989),Comedy|Horror,M,56,16,70072
2,2571,4,978299773,"Matrix, The (1999)",Action|Sci-Fi|Thriller,M,56,16,70072
2,1917,3,978300174,Armageddon (1998),Action|Adventure|Sci-Fi|Thriller,M,56,16,70072
2,2396,4,978299641,Shakespeare in Love (1998),Comedy|Romance,M,56,16,70072
2,3735,3,978298814,Serpico (1973),Crime|Drama,M,56,16,70072
2,1953,4,978298775,"French Connection, The (1971)",Action|Crime|Drama|Thriller,M,56,16,70072
2,1597,3,978300025,Conspiracy Theory (1997),Action|Mystery|Romance|Thriller,M,56,16,70072
2,3809,3,978299712,What About Bob? (1991),Comedy,M,56,16,70072
2,1954,5,978298841,Rocky (1976),Action|Drama,M,56,16,70072
2,1955,4,978299200,Kramer Vs. Kramer (1979),Drama,M,56,16,70072
2,235,3,978299351,Ed Wood (1994),Comedy|Drama,M,56,16,70072
2,1124,5,978299418,On Golden Pond (1981),Drama,M,56,16,70072
2,1957,5,978298750,Chariots of Fire (1981),Drama,M,56,16,70072
2,163,4,978299809,Desperado (1995),Action|Romance|Thriller,M,56,16,70072
2,21,1,978299839,Get Shorty (1995),Action|Comedy|Drama,M,56,16,70072
2,165,3,978300002,Die Hard: With a Vengeance (1995),Action|Thriller,M,56,16,70072
2,1090,2,978298580,Platoon (1986),Drama|War,M,56,16,70072
2,380,5,978299809,True Lies (1994),Action|Adventure|Comedy|Romance,M,56,16,70072
2,2501,5,978298600,October Sky (1999),Drama,M,56,16,70072
2,349,4,978299839,Clear and Present Danger (1994),Action|Adventure|Thriller,M,56,16,70072
2,457,4,978299773,"Fugitive, The (1993)",Action|Thriller,M,56,16,70072
2,1096,4,978299386,Sophie's Choice (1982),Drama,M,56,16,70072
2,920,5,978298775,Gone with the Wind (1939),Drama|Romance|War,M,56,16,70072
2,459,3,978300002,"Getaway, The (1994)",Action,M,56,16,70072
2,1527,4,978299839,"Fifth Element, The (1997)",Action|Sci-Fi,M,56,16,70072
2,3418,4,978299809,Thelma & Louise (1991),Action|Drama,M,56,16,70072
2,1385,3,978299966,Under Siege (1992),Action,M,56,16,70072
2,3451,4,978298924,Guess Who's Coming to Dinner (1967),Comedy|Drama,M,56,16,70072
2,3095,4,978298517,"Grapes of Wrath, The (1940)",Drama,M,56,16,70072
2,780,3,978299966,Independence Day (ID4) (1996),Action|Sci-Fi|War,M,56,16,70072
2,498,3,978299418,Mr. Jones (1993),Drama|Romance,M,56,16,70072
2,2728,3,978298881,Spartacus (1960),Drama,M,56,16,70072
2,2002,5,978300100,Lethal Weapon 3 (1992),Action|Comedy|Crime|Drama,M,56,16,70072
2,1784,5,978298841,As Good As It Gets (1997),Comedy|Drama,M,56,16,70072
2,2943,4,978298372,Indochine (1992),Drama|Romance,M,56,16,70072
2,2006,3,978299861,"Mask of Zorro, The (1998)",Action|Adventure|Romance,M,56,16,70072
2,318,5,978298413,"Shawshank Redemption, The (1994)",Drama,M,56,16,70072
2,1968,2,978298881,"Breakfast Club, The (1985)",Comedy|Drama,M,56,16,70072
2,3678,3,978299250,"Man with the Golden Arm, The (1955)",Drama,M,56,16,70072
2,1244,3,978299143,Manhattan (1979),Comedy|Drama|Romance,M,56,16,70072
2,356,5,978299686,Forrest Gump (1994),Comedy|Romance|War,M,56,16,70072
2,1245,2,978299200,Miller's Crossing (1990),Drama,M,56,16,70072
2,3893,1,978299535,Nurse Betty (2000),Comedy|Thriller,M,56,16,70072
2,1247,5,978298652,"Graduate, The (1967)",Drama|Romance,M,56,16,70072
3,2355,5,978298430,"Bug's Life, A (1998)",Animation|Children's|Comedy,M,25,15,55117
3,1197,5,978297570,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,M,25,15,55117
3,1270,3,978298231,Back to the Future (1985),Comedy|Sci-Fi,M,25,15,55117
3,1961,4,978297095,Rain Man (1988),Drama,M,25,15,55117
3,260,5,978297512,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,M,25,15,55117
3,3114,3,978298103,Toy Story 2 (1999),Animation|Children's|Comedy,M,25,15,55117
3,648,3,978297867,Mission: Impossible (1996),Action|Adventure|Mystery,M,25,15,55117
3,1210,4,978297600,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War,M,25,15,55117
3,1259,5,978298296,Stand by Me (1986),Adventure|Comedy|Drama,M,25,15,55117
3,2858,4,978297039,American Beauty (1999),Comedy|Drama,M,25,15,55117
3,480,4,978297690,Jurassic Park (1993),Action|Adventure|Sci-Fi,M,25,15,55117
3,1265,2,978298316,Groundhog Day (1993),Comedy|Romance,M,25,15,55117
3,590,4,978297439,Dances with Wolves (1990),Adventure|Drama|Western,M,25,15,55117
3,1196,4,978297539,Star Wars: Episode V - The Empire Strikes Back (1980),Action|Adventure|Drama|Sci-Fi|War,M,25,15,55117
3,1198,5,978297570,Raiders of the Lost Ark (1981),Action|Adventure,M,25,15,55117
3,593,3,978297018,"Silence of the Lambs, The (1991)",Drama|Thriller,M,25,15,55117
3,2006,4,978297757,"Mask of Zorro, The (1998)",Action|Adventure|Romance,M,25,15,55117
3,1968,4,978297068,"Breakfast Club, The (1985)",Comedy|Drama,M,25,15,55117
3,3421,4,978298147,Animal House (1978),Comedy,M,25,15,55117
3,1641,2,978298430,"Full Monty, The (1997)",Comedy,M,25,15,55117
3,1394,4,978298147,Raising Arizona (1987),Comedy,M,25,15,55117
3,3534,3,978297068,28 Days (2000),Comedy,M,25,15,55117
3,104,4,978298486,Happy Gilmore (1996),Comedy,M,25,15,55117
3,2735,4,978297867,"Golden Child, The (1986)",Action|Adventure|Comedy,M,25,15,55117
3,1431,3,978297095,Beverly Hills Ninja (1997),Action|Comedy,M,25,15,55117
3,3868,3,978298486,"Naked Gun: From the Files of Police Squad!, The (1988)",Comedy,M,25,15,55117
3,1079,5,978298296,"Fish Called Wanda, A (1988)",Comedy,M,25,15,55117
3,2997,3,978298147,Being John Malkovich (1999),Comedy,M,25,15,55117
3,1615,5,978297710,"Edge, The (1997)",Adventure|Thriller,M,25,15,55117
3,1291,4,978297600,Indiana Jones and the Last Crusade (1989),Action|Adventure,M,25,15,55117
3,653,4,978297757,Dragonheart (1996),Action|Adventure|Fantasy,M,25,15,55117
3,2167,5,978297600,Blade (1998),Action|Adventure|Horror,M,25,15,55117
3,1580,3,978297663,Men in Black (1997),Action|Adventure|Comedy|Sci-Fi,M,25,15,55117
3,3619,2,978298201,"Hollywood Knights, The (1980)",Comedy,M,25,15,55117
3,1049,4,978297805,"Ghost and the Darkness, The (1996)",Action|Adventure,M,25,15,55117
3,1261,1,978297663,Evil Dead II (Dead By Dawn) (1987),Action|Adventure|Comedy|Horror,M,25,15,55117
3,552,4,978297837,"Three Musketeers, The (1993)",Action|Adventure|Comedy,M,25,15,55117
3,1266,5,978297396,Unforgiven (1992),Western,M,25,15,55117
3,733,5,978297757,"Rock, The (1996)",Action|Adventure|Thriller,M,25,15,55117
3,1378,5,978297419,Young Guns (1988),Action|Comedy|Western,M,25,15,55117
3,1379,4,978297419,Young Guns II (1990),Action|Comedy|Western,M,25,15,55117
3,3552,5,978298459,Caddyshack (1980),Comedy,M,25,15,55117
3,1304,5,978298166,Butch Cassidy and the Sundance Kid (1969),Action|Comedy|Western,M,25,15,55117
3,2470,4,978297777,Crocodile Dundee (1986),Adventure|Comedy,M,25,15,55117
3,3168,4,978297570,Easy Rider (1969),Adventure|Drama,M,25,15,55117
3,2617,2,978297837,"Mummy, The (1999)",Action|Adventure|Horror|Thriller,M,25,15,55117
3,3671,5,978297419,Blazing Saddles (1974),Comedy|Western,M,25,15,55117
3,2871,4,978297539,Deliverance (1972),Adventure|Thriller,M,25,15,55117
3,2115,4,978297777,Indiana Jones and the Temple of Doom (1984),Action|Adventure,M,25,15,55117
3,1136,5,978298079,Monty Python and the Holy Grail (1974),Comedy,M,25,15,55117
3,2081,4,978298504,"Little Mermaid, The (1989)",Animation|Children's|Comedy|Musical|Romance,M,25,15,55117
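The sample file is plain CSV, but two details are easy to miss: titles that contain a comma are quoted, and `genres` packs several values into one field separated by `|`. Here is a minimal sketch using the standard `csv` module on three rows copied from the listing above; the helper name `parse_rows` is just an illustrative choice.

```python
import csv
import io

SAMPLE = """user_id,movie_id,rating,timestamp,title,genres,gender,age,occupation,zip
1,1193,5,978300760,One Flew Over the Cuckoo's Nest (1975),Drama,F,1,10,48067
1,2355,5,978824291,"Bug's Life, A (1998)",Animation|Children's|Comedy,F,1,10,48067
2,1357,5,978298709,Shine (1996),Drama|Romance,M,56,16,70072
"""


def parse_rows(text):
    """Parse the CSV text into dicts, converting numeric fields and splitting genres."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record["rating"] = int(record["rating"])
        record["timestamp"] = int(record["timestamp"])
        record["genres"] = record["genres"].split("|")
        rows.append(record)
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row["user_id"], row["title"], row["genres"])
```

`zip` is deliberately left as a string here, since ZIP codes are identifiers rather than quantities. `pd.read_csv`, which main.py uses, handles the quoted titles the same way by default.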

main.py

import pandas as pd
from deepctr.feature_column import SparseFeat, VarLenSparseFeat
from preprocess import gen_data_set, gen_model_input
from sklearn.preprocessing import LabelEncoder
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.models import Model

from deepmatch.models import *
from deepmatch.utils import sampledsoftmaxloss

# Use a small sample of the MovieLens data to demonstrate the workflow

data = pd.read_csv("./movielens_sample.txt")
sparse_features = ["movie_id", "user_id",
                   "gender", "age", "occupation", "zip", ]
SEQ_LEN = 50
negsample = 0

# 1. Label-encode the features as integer IDs, then use `gen_data_set` and `gen_model_input` to build feature data that includes each user's behavior history

features = ['user_id', 'movie_id', 'gender', 'age', 'occupation', 'zip']
feature_max_idx = {}
for feature in features:
    lbe = LabelEncoder()
    data[feature] = lbe.fit_transform(data[feature]) + 1
    feature_max_idx[feature] = data[feature].max() + 1

user_profile = data[["user_id", "gender", "age", "occupation", "zip"]].drop_duplicates('user_id')

item_profile = data[["movie_id"]].drop_duplicates('movie_id')

user_profile.set_index("user_id", inplace=True)

user_item_list = data.groupby("user_id")['movie_id'].apply(list)

train_set, test_set = gen_data_set(data, negsample)

train_model_input, train_label = gen_model_input(train_set, user_profile, SEQ_LEN)
test_model_input, test_label = gen_model_input(test_set, user_profile, SEQ_LEN)

# 2. Configure the feature columns the model needs, mainly the feature names and embedding vocabulary sizes

embedding_dim = 16

user_feature_columns = [SparseFeat('user_id', feature_max_idx['user_id'], embedding_dim),
                        SparseFeat("gender", feature_max_idx['gender'], embedding_dim),
                        SparseFeat("age", feature_max_idx['age'], embedding_dim),
                        SparseFeat("occupation", feature_max_idx['occupation'], embedding_dim),
                        SparseFeat("zip", feature_max_idx['zip'], embedding_dim),
                        VarLenSparseFeat(SparseFeat('hist_movie_id', feature_max_idx['movie_id'], embedding_dim,
                                                    embedding_name="movie_id"), SEQ_LEN, 'mean', 'hist_len'),
                        ]

item_feature_columns = [SparseFeat('movie_id', feature_max_idx['movie_id'], embedding_dim)]

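# 3. Define a YoutubeDNN model, passing in the user-side feature columns `user_feature_columns` and the item-side feature columns `item_feature_columns`. Then configure the optimizer and loss function and start training.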

K.set_learning_phase(True)

model = YoutubeDNN(user_feature_columns, item_feature_columns, num_sampled=5, user_dnn_hidden_units=(64, 16))
# model = MIND(user_feature_columns,item_feature_columns,dynamic_k=True,p=1,k_max=2,num_sampled=5,user_dnn_hidden_units=(64,16),init_std=0.001)

model.compile(optimizer="adagrad", loss=sampledsoftmaxloss)  # "binary_crossentropy")

history = model.fit(train_model_input, train_label,  # train_label,
                    batch_size=256, epochs=1, verbose=1, validation_split=0.0, )

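# 4. After training: in production we would generate user vectors in real time from the current user features and build an index over the item vectors for approximate nearest neighbor search. Since this is an offline simulation, we simply export the vectors of all test users and all items.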

test_user_model_input = test_model_input
all_item_model_input = {"movie_id": item_profile['movie_id'].values, "movie_idx": item_profile['movie_id'].values}

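# The next two lines are the usual deepmatch pattern for getting the user-vector model and the item-vector model
user_embedding_model = Model(inputs=model.user_input, outputs=model.user_embedding)
item_embedding_model = Model(inputs=model.item_input, outputs=model.item_embedding)
# Feed in the corresponding inputs to get the corresponding vectors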
user_embs = user_embedding_model.predict(test_user_model_input, batch_size=2 ** 12)
# user_embs = user_embs[:, i, :]  i in [0,k_max) if MIND
item_embs = item_embedding_model.predict(all_item_model_input, batch_size=2 ** 12)

print(user_embs.shape)
print(item_embs.shape)

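# 5. [Optional] If faiss is installed, build an index over the item vectors exported in the previous step, then search it with the user vectors and evaluate the results (IndexFlatIP performs an exact inner-product search)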

test_true_label = {line[0]: [line[2]] for line in test_set}
import numpy as np
import faiss
from tqdm import tqdm
from deepmatch.utils import recall_N

index = faiss.IndexFlatIP(embedding_dim)
# faiss.normalize_L2(item_embs)
index.add(item_embs)
# faiss.normalize_L2(user_embs)
D, I = index.search(user_embs, 50)
s = []
hit = 0
for i, uid in tqdm(enumerate(test_user_model_input['user_id'])):
    try:
        pred = [item_profile['movie_id'].values[x] for x in I[i]]
        recall_score = recall_N(test_true_label[uid], pred, N=50)
        s.append(recall_score)
        # test_true_label[uid] is a one-element list, so test its item rather than the list itself
        if test_true_label[uid][0] in pred:
            hit += 1
    except Exception:
        print(i)
print("recall", np.mean(s))
print("hr", hit / len(test_user_model_input['user_id']))
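The two numbers printed at the end of main.py are recall@N and hit rate. Below is a plain-Python sketch of both metrics on toy data. It is written independently for this post, so `recall_at_n` and `evaluate` are hypothetical helpers that may differ in detail from `deepmatch.utils.recall_N`.

```python
def recall_at_n(true_items, predicted_items, n=50):
    """Fraction of the true items that appear in the first n predictions."""
    if not true_items:
        return 0.0
    hits = set(predicted_items[:n]) & set(true_items)
    return len(hits) / len(set(true_items))


def evaluate(true_labels, predictions, n=50):
    """Return (mean recall@n, hit rate) over all users in true_labels."""
    if not true_labels:
        return 0.0, 0.0
    recalls = []
    hit_count = 0
    for user, true_items in true_labels.items():
        pred = predictions.get(user, [])
        recalls.append(recall_at_n(true_items, pred, n))
        # A "hit" means at least one true item was retrieved in the top n.
        if any(item in pred[:n] for item in true_items):
            hit_count += 1
    users = len(true_labels)
    return sum(recalls) / users, hit_count / users


true_labels = {1: [10], 2: [20], 3: [30]}
predictions = {1: [10, 11], 2: [21, 22], 3: [31, 30]}
print(evaluate(true_labels, predictions, n=2))
```

With a single held-out item per user, as in this walkthrough, each user's recall is either 0 or 1, so the mean recall and the hit rate coincide.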

preprocess.py

import random
import numpy as np
from tqdm import tqdm
from tensorflow.python.keras.preprocessing.sequence import pad_sequences

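def gen_data_set(data, negsample=0):
    """Build leave-one-out samples per user: the last interaction goes to the test set, earlier ones to the training set.

    Each positive sample is (user_id, history with the most recent item first, target movie_id,
    label, history length, rating); negative samples omit the rating.
    """
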
    data.sort_values("timestamp", inplace=True)
    item_ids = data['movie_id'].unique()

    train_set = []
    test_set = []
    for reviewerID, hist in tqdm(data.groupby('user_id')):
        pos_list = hist['movie_id'].tolist()
        rating_list = hist['rating'].tolist()

        if negsample > 0:
            candidate_set = list(set(item_ids) - set(pos_list))
            neg_list = np.random.choice(candidate_set,size=len(pos_list)*negsample,replace=True)
        for i in range(1, len(pos_list)):
            hist = pos_list[:i]
            if i != len(pos_list) - 1:
                train_set.append((reviewerID, hist[::-1], pos_list[i], 1, len(hist[::-1]),rating_list[i]))
                for negi in range(negsample):
                    train_set.append((reviewerID, hist[::-1], neg_list[i*negsample+negi], 0,len(hist[::-1])))
            else:
                test_set.append((reviewerID, hist[::-1], pos_list[i],1,len(hist[::-1]),rating_list[i]))

    random.shuffle(train_set)
    random.shuffle(test_set)

    print(len(train_set[0]),len(test_set[0]))

    return train_set,test_set

def gen_data_set_sdm(data, seq_short_len=5, seq_prefer_len=50):

    data.sort_values("timestamp", inplace=True)
    train_set = []
    test_set = []
    for reviewerID, hist in tqdm(data.groupby('user_id')):
        pos_list = hist['movie_id'].tolist()
        genres_list = hist['genres'].tolist()
        rating_list = hist['rating'].tolist()
        for i in range(1, len(pos_list)):
            hist = pos_list[:i]
            genres_hist = genres_list[:i]
            if i <= seq_short_len and i != len(pos_list) - 1:
                train_set.append((reviewerID, hist[::-1], [0]*seq_prefer_len, pos_list[i], 1, len(hist[::-1]), 0,
                                  rating_list[i], genres_hist[::-1], [0]*seq_prefer_len))
            elif i != len(pos_list) - 1:
                train_set.append((reviewerID, hist[::-1][:seq_short_len], hist[::-1][seq_short_len:], pos_list[i], 1, seq_short_len,
                len(hist[::-1])-seq_short_len, rating_list[i], genres_hist[::-1][:seq_short_len], genres_hist[::-1][seq_short_len:]))
            elif i <= seq_short_len and i == len(pos_list) - 1:
                test_set.append((reviewerID, hist[::-1], [0] * seq_prefer_len, pos_list[i], 1, len(hist[::-1]), 0,
                                  rating_list[i], genres_hist[::-1], [0]*seq_prefer_len))
            else:
                test_set.append((reviewerID, hist[::-1][:seq_short_len], hist[::-1][seq_short_len:], pos_list[i], 1, seq_short_len,
                len(hist[::-1])-seq_short_len, rating_list[i], genres_hist[::-1][:seq_short_len], genres_hist[::-1][seq_short_len:]))

    random.shuffle(train_set)
    random.shuffle(test_set)

    print(len(train_set[0]), len(test_set[0]))

    return train_set, test_set

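def gen_model_input(train_set,user_profile,seq_max_len):
    """Turn samples from gen_data_set into the dict of arrays the model expects.

    Histories are padded or truncated to seq_max_len with 0, and user profile features are joined in by user_id.
    """
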
    train_uid = np.array([line[0] for line in train_set])
    train_seq = [line[1] for line in train_set]
    train_iid = np.array([line[2] for line in train_set])
    train_label = np.array([line[3] for line in train_set])
    train_hist_len = np.array([line[4] for line in train_set])

    train_seq_pad = pad_sequences(train_seq, maxlen=seq_max_len, padding='post', truncating='post', value=0)
    train_model_input = {"user_id": train_uid, "movie_id": train_iid, "hist_movie_id": train_seq_pad,
                         "hist_len": train_hist_len}

    for key in ["gender", "age", "occupation", "zip"]:
        train_model_input[key] = user_profile.loc[train_model_input['user_id']][key].values

    return train_model_input, train_label

def gen_model_input_sdm(train_set, user_profile, seq_short_len, seq_prefer_len):

    train_uid = np.array([line[0] for line in train_set])
    short_train_seq = [line[1] for line in train_set]
    prefer_train_seq = [line[2] for line in train_set]
    train_iid = np.array([line[3] for line in train_set])
    train_label = np.array([line[4] for line in train_set])
    train_short_len = np.array([line[5] for line in train_set])
    train_prefer_len = np.array([line[6] for line in train_set])
    short_train_seq_genres = np.array([line[8] for line in train_set])
    prefer_train_seq_genres = np.array([line[9] for line in train_set])

    train_short_item_pad = pad_sequences(short_train_seq, maxlen=seq_short_len, padding='post', truncating='post',
                                        value=0)
    train_prefer_item_pad = pad_sequences(prefer_train_seq, maxlen=seq_prefer_len, padding='post', truncating='post',
                                         value=0)
    train_short_genres_pad = pad_sequences(short_train_seq_genres, maxlen=seq_short_len, padding='post', truncating='post',
                                        value=0)
    train_prefer_genres_pad = pad_sequences(prefer_train_seq_genres, maxlen=seq_prefer_len, padding='post', truncating='post',
                                        value=0)

    train_model_input = {"user_id": train_uid, "movie_id": train_iid, "short_movie_id": train_short_item_pad,
        "prefer_movie_id": train_prefer_item_pad, "prefer_sess_length": train_prefer_len, "short_sess_length":
        train_short_len, 'short_genres': train_short_genres_pad, 'prefer_genres': train_prefer_genres_pad}

    for key in ["gender", "age", "occupation", "zip"]:
        train_model_input[key] = user_profile.loc[train_model_input['user_id']][key].values

    return train_model_input, train_label
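The core of `gen_data_set` and `gen_model_input` is easier to see on a toy history. The sketch below uses plain Python only; `leave_one_out` and `pad_post` are hypothetical helpers written for this illustration, not part of preprocess.py. They mirror the prefix-based sample construction (most recent item first) and the post-padding with 0.

```python
def leave_one_out(items):
    """items: one user's item ids in chronological order.

    Every prefix predicts the next item; the final step is held out for testing.
    """
    train, test = [], []
    for i in range(1, len(items)):
        history = items[:i][::-1]  # most recent item first
        sample = (history, items[i])
        if i != len(items) - 1:
            train.append(sample)
        else:
            test.append(sample)
    return train, test


def pad_post(seq, max_len, value=0):
    """Keep the first max_len entries and pad at the end with value."""
    trimmed = seq[:max_len]
    return trimmed + [value] * (max_len - len(trimmed))


train, test = leave_one_out([5, 8, 2, 9])
print(train)
print(test)
print([pad_post(history, 4) for history, _ in train])
```

Because 0 is used as the padding value, real item ids must never be 0, which is presumably why main.py adds 1 to every label-encoded id.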
