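After this long in grad school I have worked on quite a lot of recommendation-algorithm material, but the focus was always on researching and implementing the algorithms. So I have long wanted to build a complete recommendation website with real user interaction. Unfortunately, the material I could find before was all Java Web, which is demanding on the environment and uses so many technologies that it gets quite tedious, so I never got it working. Recently I finally came across a great book online, Practical Recommender Systems (Kim Falk, January 2019). It covers not only the algorithms but also the front-end and back-end implementation. So it's decided: I'll write a movie recommender system that uses Django as the web framework (Python 3 + Django).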
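I. Overall project structure
1. Main Page
- An area that shows movies as cards
- A detailed introduction for each movie
- Personalized recommendations
- Lists of movies by genre
2. Detail page
- Movie poster
- Movie description
- Movie rating
3. Genre page
- Same structure as the main page
- Recommendations specific to the genre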
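Concretely, MovieGEEKS is the front end that users interact with. Analytics ties things together and monitors the other parts; more on it later. The collector gathers user behavior data (collecting private data, haha) and stores it in the evidence database. Recs is the core and serves recommendations to the website. The Recommendation builder precomputes recommendations (roughly, offline recommendations). The home page looks roughly like this:
Good! That should give a rough feel for the overall architecture.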
II. Create the project
I first create a Django project in PyCharm and name it MovieAngelSite.
III. Create the app
Create the app, movieangel.
Enter in the terminal:
python manage.py startapp movieangel
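IV. Define the model classes
- Each database table has one corresponding model class
- Open models.py and define the model classes
- Import the package: from django.db import models
- Model classes inherit from models.Model
- Note: if you do not define a primary key column, one is added automatically when the table is generated, with an auto-incrementing value
- When an object is printed, its __str__ method is called
- The generated table name is the app name and the lowercased model name joined by an underscore (_)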
from django.db import models


class Genre(models.Model):
    name = models.CharField(max_length=64)

    def __str__(self):
        return self.name


class Movie(models.Model):
    # The movie ID from the dataset is used as the primary key,
    # so no auto-increment id column is added for this model.
    movie_id = models.CharField(max_length=16, unique=True, primary_key=True)
    title = models.CharField(max_length=128)
    year = models.IntegerField(null=True)
    # Many-to-many link to Genre, stored in the movie_genre table.
    genres = models.ManyToManyField(Genre, related_name='movies', db_table='movie_genre')

    def __str__(self):
        return self.title
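To make the table-naming rule concrete, here is a small plain-Python sketch (no Django needed) of how the default table name is put together from the app name and the model name. The helper default_table_name is hypothetical and only mimics the convention; in a real project Django computes the name itself, and the many-to-many table is named explicitly through db_table='movie_genre'.

```python
def default_table_name(app_label: str, model_name: str) -> str:
    """Mimic the default convention: <app label>_<lowercased model name>."""
    return f"{app_label}_{model_name.lower()}"


# Table names expected for the two models in this chapter.
tables = {name: default_table_name("movieangel", name) for name in ("Genre", "Movie")}
print(tables)
```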
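V. Generate the database tables
- Activate the models: edit settings.py and add the movieangel app to INSTALLED_APPS
- Set USE_TZ = False
- Generate the migration files: Django derives the schema changes from the model classes
python manage.py makemigrations
- Apply the migrations: this runs the SQL that creates the tables
python manage.py migrate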
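The two settings.py changes in this step might look like the fragment below. This is a sketch: the django.contrib.* entries are assumed to be the defaults that a freshly generated project contains and may differ slightly in your Django version; only the last entry and USE_TZ come from this tutorial.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'movieangel',  # the app created with startapp
]

USE_TZ = False  # turn off timezone-aware datetimes, as this tutorial does
```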
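VI. Views
- In Django, a view responds to a web request
- A view takes a request object as its first argument, which carries the request information
- A view is just a Python function defined in views.py
First, sign up at https://www.themoviedb.org/account/signup and get your own api_key. Then, in the project root directory, create a file named .prs and put this in it:
{ "themoviedb_apikey": "YOUR_API_KEY"}
Note that the double quotes " " are required.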
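The reason the double quotes matter is that the file is parsed as JSON. The sketch below shows this with a throwaway file rather than the real .prs; load_api_key is a hypothetical helper name, and the key value is a placeholder.

```python
import json
import os
import tempfile


def load_api_key(path):
    """Read the credentials file and return the themoviedb key."""
    with open(path, encoding="utf-8") as f:
        cred = json.load(f)
    return cred["themoviedb_apikey"]


# Demonstrate with a temporary file instead of the real .prs.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".prs")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{ "themoviedb_apikey": "YOUR_API_KEY" }')
    demo_key = load_api_key(demo_path)

# Single quotes are not valid JSON, so this variant is rejected.
try:
    json.loads("{ 'themoviedb_apikey': 'YOUR_API_KEY' }")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

print(demo_key, single_quotes_ok)
```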
Next, add the following to movieangel's views.py:
import json
import random
import uuid

from django.shortcuts import render
from django.views.decorators.csrf import ensure_csrf_cookie
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger

from movieangel.models import Movie, Genre


# Default home page
@ensure_csrf_cookie
def index(request):
    genre_selected = request.GET.get('genre')
    api_key = get_api_key()

    if genre_selected:
        selected = Genre.objects.filter(name=genre_selected)[0]
        movies = selected.movies.order_by('-year', 'movie_id')
    else:
        movies = Movie.objects.order_by('-year', 'movie_id')

    genres = get_genres()
    page_number = request.GET.get("page", 1)
    page, page_end, page_start = handle_pagination(movies, page_number)

    context_dict = {'movies': page,
                    'genres': genres,
                    'api_key': api_key,
                    'session_id': session_id(request),
                    'user_id': user_id(request),
                    'pages': range(page_start, page_end),
                    }
    return render(request, 'movieangel/index.html', context_dict)


# Pagination
def handle_pagination(movies, page_number):
    paginate_by = 18
    paginator = Paginator(movies, paginate_by)

    try:
        page = paginator.page(page_number)
    except PageNotAnInteger:
        # Not a number: fall back to the first page.
        page_number = 1
        page = paginator.page(page_number)
    except EmptyPage:
        # Out of range: fall back to the last page and keep the pager in sync.
        page_number = paginator.num_pages
        page = paginator.page(page_number)

    page_number = int(page_number)
    page_start = 1 if page_number < 5 else page_number - 3
    page_end = 6 if page_number < 5 else page_number + 2
    return page, page_end, page_start


# Get the API key
def get_api_key():
    # Load credentials from the .prs file in the working directory
    with open(".prs") as f:
        cred = json.load(f)
    return cred['themoviedb_apikey']


# Get the names of all genres
def get_genres():
    return Genre.objects.all().values('name').distinct()


# Set the session_id
def session_id(request):
    if "session_id" not in request.session:
        request.session["session_id"] = str(uuid.uuid1())
    return request.session["session_id"]


# Set the user_id; generate a random one if there is none
def user_id(request):
    user_id = request.GET.get("user_id")
    if user_id:
        request.session['user_id'] = user_id
    if "user_id" not in request.session:
        request.session['user_id'] = random.randint(10000000000, 90000000000)
        print("ensured id: ", request.session['user_id'])
    return request.session['user_id']
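The page_start / page_end arithmetic in handle_pagination is easier to follow with a few concrete values. The sketch below isolates just that calculation in plain Python; page_window is a hypothetical name used only for this illustration.

```python
def page_window(page_number: int) -> range:
    """Page numbers shown in the pager, using the same rule as handle_pagination."""
    page_start = 1 if page_number < 5 else page_number - 3
    page_end = 6 if page_number < 5 else page_number + 2
    return range(page_start, page_end)


for current in (1, 4, 5, 10):
    print(current, list(page_window(current)))
```

The first four pages always show 1 to 5; from page 5 on, the window runs from three pages before the current one to one page after it. Note that the window is not clamped to the last page, so near the end of the list it can include page numbers that do not exist.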
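Configure the URLconf
- In Django, a URLconf entry consists of two parts: a regular expression and a view
- Django matches the requested URL against the regular expressions; on the first match it calls the corresponding view
- Note: only the path is matched, i.e. the string left after removing the domain and the query parameters
- Change MovieAngelSite/urls.py to: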
from django.conf.urls import url, include
from django.contrib import admin

from movieangel import views

urlpatterns = [
    url(r'^$', views.index, name='index'),
    url(r'^movies/', include('movieangel.urls')),
    url(r'^admin/', admin.site.urls),
]
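To see what "only the path is matched" means in practice, here is a simplified imitation of the matching step using the standard re module. It is not Django's resolver (which also handles includes, captured groups and more); match_route and the string labels are hypothetical and stand in for the real views.

```python
import re
from urllib.parse import urlsplit

# (pattern, label) pairs mirroring the urlpatterns in this section.
PATTERNS = [
    (r'^$', 'views.index'),
    (r'^movies/', "include('movieangel.urls')"),
    (r'^admin/', 'admin.site.urls'),
]


def match_route(full_url):
    """Return the label of the first pattern matching the URL's path, or None."""
    # Keep only the path: drop scheme, domain and query string.
    path = urlsplit(full_url).path
    # The patterns are written without the leading slash.
    if path.startswith('/'):
        path = path[1:]
    for pattern, label in PATTERNS:
        if re.search(pattern, path):
            return label
    return None


print(match_route('http://127.0.0.1:8000/'))
print(match_route('http://127.0.0.1:8000/movies/genre/Drama?page=2'))
print(match_route('http://127.0.0.1:8000/unknown/'))
```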
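Add templates
- A template is an HTML page whose values are filled in from the data passed by the view
- Create the templates directory and the static directory
- Edit settings.py
Add:
TEMPLATE_DIR = os.path.join(BASE_DIR, 'templates')
STATIC_DIR = os.path.join(BASE_DIR, 'static')
Change (inside the TEMPLATES setting):
'DIRS': [TEMPLATE_DIR, ],
Add:
STATICFILES_DIRS = [STATIC_DIR, ]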
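Put together, the template and static settings might sit in settings.py roughly as follows. This is a sketch under assumptions: BASE_DIR is a placeholder path here (the generated settings.py defines the real one), and the BACKEND, APP_DIRS, OPTIONS and STATIC_URL values are assumed to be the usual generated defaults rather than anything specific to this project.

```python
import os

# Placeholder; the generated settings.py computes the real project path.
BASE_DIR = os.path.join(os.sep, 'path', 'to', 'MovieAngelSite')

TEMPLATE_DIR = os.path.join(BASE_DIR, 'templates')
STATIC_DIR = os.path.join(BASE_DIR, 'static')

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [TEMPLATE_DIR, ],  # project-level templates directory
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

STATIC_URL = '/static/'
STATICFILES_DIRS = [STATIC_DIR, ]
```

With this layout, the view's 'movieangel/index.html' resolves to templates/movieangel/index.html, and the template tag {% static 'js/collector.js' %} resolves to static/js/collector.js.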
Edit base.html
<!DOCTYPE html>
<html lang="en">
<head>
{% load static %}
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>MovieGEEKs</title>
<!-- Bootstrap -->
<link href="http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css" rel="stylesheet">
<link href="http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
<script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.2/jquery.min.js"></script>
<script src="http://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
<script src="{% static 'js/collector.js' %}"></script>
<script>
function get_url(movieid) {
return 'https://api.themoviedb.org/3/find/tt' + movieid + '?external_source=imdb_id&api_key={{ api_key }}'
}
</script>
{% block head %}{% endblock head %}
</head>
<body>
<div class="container">
<nav class="navbar navbar-inverse no-padding">
<div class="navbar-header">
<button type="button" class="navbar-toggle"
data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">Movie GEEKs</a>
<a href="/analytics/user/{{user_id}}/">User: {{ user_id }} </a>
</div>
<!-- Search -->
<div class="nav nav-pills pull-right">
<form class="navbar-form" action="/movies/search/">
<div class="form-group" style="display:inline;">
<input type="text" class="form-control" placeholder="Search" name="q" style="bg-color:'gray'" maxlength="40">
<button class="input-group-addon" style="width: 40px"><i class="glyphicon glyphicon-search" title="Search"></i></button>
</div>
</form>
</div>
</nav>
<!-- end of top -->
<div class="row">
<div class="col-sm-2 sidebar collapse navbar-collapse">
<div class="well">
<strong>Genres</strong><br/>
<ul class="nav nav-sidebar">
{% if genres %}
{% for genre in genres %}
<li> <a href="/movies/genre/{{genre.name}}"
onclick="add_impression({{ user_id }}, 'genre:{{genre.name}}', 0
, '{{ session_id }}',
'{{ csrf_token }}')">{{ genre.name }}</a> </li>
{% endfor %}
{% endif %}
</ul>
</div>
</div>
{% block content %}{% endblock content %}
</div>
</div>
<script>
</script>
</body>
</html>
Add detail.html
{% extends "movieangel/base.html" %}
{% block content %}
<script>
function get_cb_recs(movieid) {
url = '/rec/cb/item/' + movieid + '/'
$.getJSON(url,
function(result) {
result.data.forEach(function(element, index, array) {
$.getJSON(get_url(element.target), function(result) {
image_url = 'http://image.tmdb.org/t/p/w500/'
+ result.movie_results[0].poster_path
rec_div = document.createElement('div')
rec_div.setAttribute('class', "col-sm-2 img-responsive")
a = document.createElement('a')
a.setAttribute('href', '/movies/movie/' + element.target)
img = document.createElement('img')
img.setAttribute('src', image_url)
img.setAttribute('class',"img-responsive")
a.appendChild(img)
rec_div.appendChild(a)
recs = document.getElementById('content_based_recs')
recs.appendChild(rec_div)
recs.style.visibility = 'visible'
})
})
});
}
function get_seededrecs(movieid) {
url = '/rec/association_rule/' + movieid + '/'
$.getJSON(url,
function(result) {
result.data.forEach(function(element, index, array) {
$.getJSON(get_url(element.target), function(result) {
image_url = 'http://image.tmdb.org/t/p/w500/'
+ result.movie_results[0].poster_path
rec_div = document.createElement('div')
rec_div.setAttribute('class', "col-sm-2 img-responsive")
a = document.createElement('a')
a.setAttribute('href', '/movies/movie/' + element.target)
img = document.createElement('img')
img.setAttribute('src', image_url)
img.setAttribute('class',"img-responsive")
a.appendChild(img)
rec_div.appendChild(a)
recs = document.getElementById('recs')
recs.appendChild(rec_div)
recs.style.visibility = 'visible'
})
})
});
}
function getinfo(movieid) {
url = 'https://api.themoviedb.org/3/find/tt' + movieid + '?external_source=imdb_id&api_key={{ api_key }}'
$.getJSON(url,
function(result) {
title_div = document.getElementById('title')
title_div.innerHTML = '<strong>' + result.movie_results[0].title + '</strong>'
image_url = 'http://image.tmdb.org/t/p/w500/'
+ result.movie_results[0].poster_path
img_tag = document.getElementById('poster').setAttribute('src', image_url)
document.getElementById('overview').innerHTML = result.movie_results[0].overview
document.getElementById('release_date').innerHTML = result.movie_results[0].release_date
document.getElementById('lan').innerHTML = result.movie_results[0].original_language
document.getElementById('avg_rating').innerHTML = result.movie_results[0].vote_average
} )}
function item_bought()
{
add_impression({{user_id}}, 'buy', '{{ movie_id }}', '{{ session_id }}','{{ csrf_token }}')
}
$(document).ready(function() {
getinfo('{{ movie_id }}')
})
</script>
<div class="container">
<div class="row">
<div class="col-sm-8">
<div class="row">
<div class="col-sm-12"><h2 id="title"></h2></div></div>
<div class="row">
<div id="img" class="col-sm-6">
<img id="poster" class="img-responsive" src="" alt="movie poster"/>
</div>
<div class="col-sm-6">
<div><strong>Released:</strong><p id="release_date"></p></div>
<div><strong>Description:</strong><p id="overview"></p></div>
<div><strong>Language</strong><p id="lan"></p></div>
<div><strong>Average rating</strong><p id="avg_rating"></p></div>
<div><strong>Genres</strong><p>
{% if movie_genres %}
| {% for genre in movie_genres %}
{{ genre.name}} |
{% endfor %}
{% endif %}
</p></div>
<div>
<a href="#" type="button" onclick="item_bought()" class="btn btn-danger btn-lg">Buy</a>
</div>
</div>
</div>
<div id="recs" class="row" style="visibility: hidden">
<h2>Frequently bought with these</h2>
</div>
<div id="content_based_recs" class="row" style="visibility: hidden">
<h2>Similar content</h2>
</div>
</div>
</div>
<script>
get_seededrecs('{{ movie_id }}')
get_cb_recs('{{ movie_id }}')
</script>
</div>
{% endblock content %}
Add index.html
{% extends "movieangel/base.html" %}
{% block content %}
<script>
function get_cb_recs(movieid) {
url = '/rec/cb/item/' + movieid + '/'
$.getJSON(url,
function(result) {
result.data.forEach(function(element, index, array) {
$.getJSON(get_url(element.target), function(result) {
image_url = 'http://image.tmdb.org/t/p/w500/'
+ result.movie_results[0].poster_path
rec_div = document.createElement('div')
rec_div.setAttribute('class', "col-sm-2 img-responsive")
a = document.createElement('a')
a.setAttribute('href', '/movies/movie/' + element.target)
img = document.createElement('img')
img.setAttribute('src', image_url)
img.setAttribute('class',"img-responsive")
a.appendChild(img)
rec_div.appendChild(a)
recs = document.getElementById('content_based_recs')
recs.appendChild(rec_div)
recs.style.visibility = 'visible'
})
})
});
}
function get_seededrecs(movieid) {
url = '/rec/association_rule/' + movieid + '/'
$.getJSON(url,
function(result) {
result.data.forEach(function(element, index, array) {
$.getJSON(get_url(element.target), function(result) {
image_url = 'http://image.tmdb.org/t/p/w500/'
+ result.movie_results[0].poster_path
rec_div = document.createElement('div')
rec_div.setAttribute('class', "col-sm-2 img-responsive")
a = document.createElement('a')
a.setAttribute('href', '/movies/movie/' + element.target)
img = document.createElement('img')
img.setAttribute('src', image_url)
img.setAttribute('class',"img-responsive")
a.appendChild(img)
rec_div.appendChild(a)
recs = document.getElementById('recs')
recs.appendChild(rec_div)
recs.style.visibility = 'visible'
})
})
});
}
function getinfo(movieid) {
url = 'https://api.themoviedb.org/3/find/tt' + movieid + '?external_source=imdb_id&api_key={{ api_key }}'
$.getJSON(url,
function(result) {
title_div = document.getElementById('title')
title_div.innerHTML = '<strong>' + result.movie_results[0].title + '</strong>'
image_url = 'http://image.tmdb.org/t/p/w500/'
+ result.movie_results[0].poster_path
img_tag = document.getElementById('poster').setAttribute('src', image_url)
document.getElementById('overview').innerHTML = result.movie_results[0].overview
document.getElementById('release_date').innerHTML = result.movie_results[0].release_date
document.getElementById('lan').innerHTML = result.movie_results[0].original_language
document.getElementById('avg_rating').innerHTML = result.movie_results[0].vote_average
} )}
function item_bought()
{
add_impression({{user_id}}, 'buy', '{{ movie_id }}', '{{ session_id }}','{{ csrf_token }}')
}
$(document).ready(function() {
getinfo('{{ movie_id }}')
})
</script>
<div class="container">
<div class="row">
<div class="col-sm-8">
<div class="row">
<div class="col-sm-12"><h2 id="title"></h2></div></div>
<div class="row">
<div id="img" class="col-sm-6">
<img id="poster" class="img-responsive" src="" alt="movie poster"/>
</div>
<div class="col-sm-6">
<div><strong>Released:</strong><p id="release_date"></p></div>
<div><strong>Description:</strong><p id="overview"></p></div>
<div><strong>Language</strong><p id="lan"></p></div>
<div><strong>Average rating</strong><p id="avg_rating"></p></div>
<div><strong>Genres</strong><p>
{% if movie_genres %}
| {% for genre in movie_genres %}
{{ genre.name}} |
{% endfor %}
{% endif %}
</p></div>
<div>
<a href="#" type="button" onclick="item_bought()" class="btn btn-danger btn-lg">Buy</a>
</div>
</div>
</div>
<div id="recs" class="row" style="visibility: hidden">
<h2>Frequently bought with these</h2>
</div>
<div id="content_based_recs" class="row" style="visibility: hidden">
<h2>Similar content</h2>
</div>
</div>
</div>
<script>
get_seededrecs('{{ movie_id }}')
get_cb_recs('{{ movie_id }}')
</script>
</div>
{% endblock content %}
Add search_result.html
{% extends "movieangel/base.html" %}
{% block head %}
<script>
$(document).ready(function(){
$('[data-toggle="popover"]').popover();
$('.movie').on('show.bs.popover', function () {
var contentid;
contentid = $(this).attr("id");
});
});
function getinfo(movie_id, title) {
url = 'https://api.themoviedb.org/3/find/tt' + movie_id + '?external_source=imdb_id&api_key={{ api_key }}'
$.getJSON(url,
function(result) {
if (result.movie_results != null)
{
img_tag = document.getElementById('src_' + movie_id)
image_url = 'http://image.tmdb.org/t/p/w500/'
+ result.movie_results[0].poster_path
a = document.createElement("a");
a.setAttribute('href', "/movies/movie/" + movie_id);
a.setAttribute('onclick', "add_impression({{user_id}}, 'more_details', " +
movie_id + ", '{{ session_id }}','{{ csrf_token }}')")
a.innerHTML= 'more details'
save_for_later_a = document.createElement("a");
save_for_later_a.setAttribute('onclick',
"add_impression({{user_id}}, 'save_for_later', " + movie_id +
", '{{ session_id }}','{{ csrf_token }}')")
save_for_later_a.innerHTML = 'save for later';
popover_div = '<div style="width: 200px;">' +
'<strong>released:</strong> ' + result.movie_results[0].release_date + '<br />' +
'<strong>language:</strong> ' + result.movie_results[0].original_language + '<br />' +
'<strong>avg tweet rating</strong>: '+ result.movie_results[0].vote_average + '<br />' +
save_for_later_a.outerHTML + '<br />' +
a.outerHTML +
'</div>'
popover_content = 'amazing film <br/>' + a.outerHTML
img = document.createElement("img");
img.setAttribute('id', movie_id);
img.setAttribute('class', 'movie img-rounded img-responsive');
img.setAttribute('src', image_url);
img.setAttribute('style','padding: 0px 0px 0px 0px;height: 150px')
img.setAttribute('title', title);
div = document.createElement("div");
div.setAttribute('class', 'col-xs-2');
div.appendChild(img)
document.createElement("div");
div.setAttribute('onclick', "add_impression({{user_id}}, 'details', "
+ movie_id
+ ", '{{ session_id }}','{{ csrf_token }}')")
$('#movies').append(div)
$('#' + movie_id).popover({
html: true,
content: popover_div,
trigger: 'click'
})
}
}
)
}
{% if movies %}
{% for movie in movies %}
{% if movie.movie_id %}getinfo('{{movie.movie_id}}', '{{movie.title}}');{% endif %}
{% endfor %}
{% endif %}
function getTopContent() {
$.getJSON('/rec/chart', function(result) {
var ul = document.getElementById("top_content");
result.forEach(function(element, index, array) {
var li = document.createElement("li")
li.innerHTML = '<a ' +
'onclick=\'PostRecClicked(\"'
+ element.content_id + '\", \"rec:chart\")\''
+ "href='/movies/movie/"+ element.content_id + "'>"
+ (index + 1) + ". "
+ element.title + "</a>";
ul.appendChild(li)
});
})
};
</script>
{% endblock head %}
{% block content %}
<div class="col-xs-8 main max-size">
{% if movies|length > 0 %}
<div id="movies" class="row"></div>
{% else %}
No movies found
{% endif%}
<div class="row">
{% if movies.has_other_pages %}
<ul class="pagination">
{% if movies.has_previous %}
<li><a href="?page={{movies.previous_page_number }}">«</a></li>
{% else %}
<li class="disabled"><span>«</span></li>
{% endif %}
{% for i in pages %}
{% if i == movies.number %}
<li class="active">
<span>{{ i }} <span class="sr-only">(current)</span></span>
</li>
{% else %}
<li>
<a href="?page={{i}}">{{ i}}</a>
</li>
{% endif %}
{% endfor %}
{% if movies.has_next %}
<li><a href="?page={{movies.next_page_number}}">»</a></li>
{% else %}
<li class="disabled"><span>»</span></li>
{% endif %}
</ul>
{% endif %}
</div>
</div>
<div id="right" class="col-xs-2">
<div class="well">
<ol id="top_content" class="nav nav-sidebar"></ol>
</div>
</div>
<script type="text/javascript">
getTopContent();
</script>
{% endblock content %}
Add collector.js; what it does is covered in a later chapter.
function add_impression(user_id, event_type, content_id, session_id, csrf_token) {
$.ajax({
type: 'POST',
url: '/collect/log/',
data: {
"csrfmiddlewaretoken": csrf_token,
"event_type": event_type,
"user_id": user_id,
"content_id": content_id,
"session_id": session_id
},
error: function(){
console.log('log failed(' + event_type + ')')
}
})};
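For readers more at home in Python than jQuery, the sketch below builds the same form-encoded body that add_impression posts to /collect/log/ (jQuery sends a plain data object as a URL-encoded form by default). The helper name build_log_payload and all sample values are hypothetical; the server side that receives this is not part of this chapter.

```python
from urllib.parse import urlencode, parse_qs


def build_log_payload(user_id, event_type, content_id, session_id, csrf_token):
    """Encode the five fields the same way a form POST would carry them."""
    return urlencode({
        "csrfmiddlewaretoken": csrf_token,
        "event_type": event_type,
        "user_id": user_id,
        "content_id": content_id,
        "session_id": session_id,
    })


# Made-up sample values, only to show the shape of the request body.
body = build_log_payload(12345678901, "buy", "0000001", "demo-session", "demo-token")
print(body)
print(parse_qs(body))
```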
Finally, add these two files to the project directory:
import os
import urllib.request

from tqdm import tqdm

# Django must be configured before the models can be imported.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'MovieAngelSite.settings')

import django

django.setup()

from movieangel.models import Movie, Genre


def create_movie(movie_id, title, genres):
    movie = Movie.objects.get_or_create(movie_id=movie_id)[0]

    # The title field looks like "Title (year)": split off the year.
    title_and_year = title.split(sep="(")
    movie.title = title_and_year[0]
    movie.year = title_and_year[1][:-1]

    if genres:
        for genre in genres.split(sep="|"):
            g = Genre.objects.get_or_create(name=genre)[0]
            movie.genres.add(g)
            g.save()

    movie.save()
    return movie


def download_movies():
    URL = 'https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/movies.dat'
    response = urllib.request.urlopen(URL)
    data = response.read()
    return data.decode('utf-8')


def delete_db():
    print('truncate db')
    Movie.objects.all().delete()
    Genre.objects.all().delete()
    print('finished truncate db')


def populate():
    movies = download_movies()
    print('movie data downloaded')

    for movie in tqdm(movies.split(sep="\n")):
        # Each line is id::title::genres; skip anything else.
        m = movie.split(sep="::")
        if len(m) == 3:
            create_movie(m[0], m[1], m[2])


if __name__ == '__main__':
    print("Starting MovieGeeks Population script...")
    delete_db()
    populate()
Only the code for populate_movieangel.py is posted here; it is the listing above.
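To see what the population script does with each line, here is a Django-free sketch of the same splitting steps. parse_movie_line is a hypothetical helper and the sample line is made up; the id::Title (year)::Genre|Genre layout is inferred from how the script splits its input.

```python
def parse_movie_line(line):
    """Split one line into (movie_id, title, year, genres) using the script's rules."""
    fields = line.split("::")
    if len(fields) != 3:
        return None  # the script skips such lines as well
    movie_id, title_field, genre_field = fields
    title_and_year = title_field.split("(")
    title = title_and_year[0]
    year = title_and_year[1][:-1]  # drop the closing parenthesis
    genres = genre_field.split("|") if genre_field else []
    return movie_id, title, year, genres


sample = "0000001::Example Movie (1999)::Drama|Comedy"  # made-up line in the same layout
print(parse_movie_line(sample))
```

Two details carry over from the script: the title keeps the space that precedes the parenthesis, and the year is still a string at this point. A title that itself contains a parenthesis would also be split in the wrong place, so stricter parsing may be worth adding if that turns out to matter.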
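With that, the initial code is mostly in place. Run populate_movieangel first so that all the movie data is loaded, then type python manage.py runserver in the terminal and you can see the site.
That's it for the opening chapter. Later chapters won't go into the code in this much detail; if you're interested, take a look at the GitHub repo.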