Original title: Listing Embeddings for Similar Listing Recommendations and Real-time Personalization in Search Ranking
By Mihajlo Grbovic, Haibin Cheng, Qing Zhang, Lynn Yang, Phillippe Siclait and Matt Jones
https://medium.com/airbnb-engineering/listing-embeddings-for-similar-listing-recommendations-and-real-time-personalization-in-search-601172f7603e
Summary:
In this blog post we describe a Listing Embedding technique we developed and deployed at Airbnb for the purpose of improving Similar Listing Recommendations and Real-Time Personalization in Search Ranking. The embeddings are vector representations of Airbnb homes learned from search sessions that allow us to measure similarities between listings. They effectively encode many listing features, such as location, price, listing type, architecture and listing style, all using only 32 float numbers. We believe that the embedding approach for personalization and recommendation is very powerful and useful for any type of online marketplace on the Web.
* What a "listing" is: an Airbnb home, of a listing type such as Entire Home, Private Room or Shared Room.
- Inspired by word embeddings in NLP (the networks are trained by directly taking into account the word order and their co-occurrence, based on the assumption that words frequently appearing together in the sentences also share more statistical dependence). The idea has since been applied in many non-NLP settings, e.g. items that were clicked or purchased, or queries and ads that were clicked.
- Embedding dimensionality: d = 32.
- Trained with negative sampling, the method used to train word vectors in word2vec: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
- Cold-start embeddings (mean of the 3 nearest neighbors): To create embeddings for a new listing we find the 3 geographically closest listings that do have embeddings and are of the same listing type and price range as the new listing, and calculate their mean vector.
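The training and cold-start notes above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Airbnb's implementation: each click session is treated as a "sentence" of listing IDs, skip-gram pairs come from a fixed window, and negatives are drawn uniformly from all listings (word2vec itself samples negatives from a smoothed unigram distribution). The function names, the hypothetical metadata fields `type`, `price_bucket` and `latlng`, and the use of plain Euclidean distance on coordinates instead of a real geographic distance are all illustrative simplifications.

```python
import math
import random

DIM = 32  # embedding dimensionality d = 32, as in the post


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))


def skipgram_pairs(session, window=2):
    """Yield (center, context) pairs from one click session, treated like a sentence."""
    for i, center in enumerate(session):
        for j in range(max(0, i - window), min(len(session), i + window + 1)):
            if j != i:
                yield center, session[j]


def train_sgns(sessions, dim=DIM, window=2, negatives=5, lr=0.025, epochs=50, seed=0):
    """Skip-gram with negative sampling over click sessions (uniform negatives: a simplification)."""
    rng = random.Random(seed)
    vocab = sorted({lid for s in sessions for lid in s})
    emb = {lid: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for lid in vocab}  # input vectors
    ctx = {lid: [0.0] * dim for lid in vocab}  # output (context) vectors
    for _ in range(epochs):
        for session in sessions:
            for center, context in skipgram_pairs(session, window):
                # one positive pair plus up to `negatives` random listings as negatives
                negs = [w for w in (rng.choice(vocab) for _ in range(negatives)) if w != context]
                samples = [(context, 1.0)] + [(w, 0.0) for w in negs]
                v = emb[center]
                grad_v = [0.0] * dim
                for word, label in samples:
                    u = ctx[word]
                    g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                    for k in range(dim):
                        grad_v[k] += g * u[k]  # accumulate the center-vector update
                        u[k] += g * v[k]       # update the context vector in place
                for k in range(dim):
                    v[k] += grad_v[k]
    return emb


def cold_start_embedding(new, listings, emb, k=3):
    """Mean vector of the k closest embedded listings with the same type and price bucket."""
    candidates = [
        lid for lid, meta in listings.items()
        if lid in emb and meta["type"] == new["type"] and meta["price_bucket"] == new["price_bucket"]
    ]
    # Euclidean distance on (lat, lng) is a simplification; a real system would use geographic distance
    candidates.sort(key=lambda lid: math.dist(listings[lid]["latlng"], new["latlng"]))
    nearest = candidates[:k]
    if not nearest:
        return None
    return [sum(emb[lid][i] for lid in nearest) / len(nearest) for i in range(len(emb[nearest[0]]))]
```

For example, `train_sgns([["L1", "L2", "L3"], ["L2", "L3", "L4"]])` returns a dict mapping each listing ID to a 32-dimensional vector, and `cold_start_embedding` averages the vectors of up to three matching neighbors (returning `None` if no listing matches).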
- Evaluating the embeddings:
- Use k-means to check whether the embeddings encode geography. First, to evaluate if geographical similarity is encoded we performed k-means clustering on learned embeddings.
- Use cosine similarity to check that listings of the same type and price range really do have similar embeddings. Confirmed that cosine similarities between listings of the same type and price range are much higher compared to similarities between listings of different type and price range.
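Both sanity checks can be sketched in a few lines of plain Python, assuming embeddings are stored as a dict from listing ID to a list of floats and each listing's metadata is a hypothetical `(type, price_bucket)` tuple. The small k-means below is a plain Lloyd's-algorithm sketch standing in for whatever clustering tool was actually used; the note above uses the clusters to check whether geography is encoded.

```python
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def kmeans(vectors, k, iters=20, seed=0):
    """Lloyd's algorithm: returns a cluster label per vector and the final centroids."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(vectors, k)]  # k distinct starting points
    labels = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: each vector goes to its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # update step: move each non-empty cluster's centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def mean_similarity(emb, meta, same):
    """Average cosine over listing pairs whose (type, price_bucket) match (same=True) or differ."""
    ids = sorted(emb)
    sims = [
        cosine(emb[a], emb[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if (meta[a] == meta[b]) == same
    ]
    return sum(sims) / len(sims) if sims else float("nan")
```

If the embeddings behave as the note describes, `mean_similarity(emb, meta, True)` should come out clearly higher than `mean_similarity(emb, meta, False)`.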
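- Offline evaluation of listing embeddings:
Compare the top-ranked results with what the user actually booked. One way of evaluating trained embeddings is to test how good they are in recommending listings that the user would book, based on their most recent click: rank the candidate listings by cosine similarity to the most recently clicked listing and observe the rank position of the listing the user eventually booked.
- A/B test: The A/B test showed that the embedding-based solution led to a 21% increase in Similar Listing carousel CTR and 4.9% more guests discovering the listing they ended up booking in the Similar Listing carousel.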
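The offline check can be sketched as follows, assuming the same dict-of-lists embedding format as above. Each evaluation event is a hypothetical `(clicked, candidates, booked)` tuple and the function names are illustrative; a lower average rank means the embeddings put the eventually booked listing closer to the top.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def booked_listing_rank(clicked, candidates, booked, emb):
    """1-based rank of the booked listing after sorting candidates by similarity to the last click."""
    ranked = sorted(candidates, key=lambda lid: cosine(emb[clicked], emb[lid]), reverse=True)
    return ranked.index(booked) + 1


def average_booked_rank(events, emb):
    """Average rank over (clicked, candidates, booked) events; lower is better."""
    ranks = [booked_listing_rank(c, cands, b, emb) for c, cands, b in events]
    return sum(ranks) / len(ranks)


if __name__ == "__main__":
    emb = {"click": [1.0, 0.0], "x": [0.9, 0.1], "y": [0.0, 1.0], "z": [0.5, 0.5]}
    events = [("click", ["y", "z", "x"], "x"), ("click", ["y", "z", "x"], "y")]
    print(average_booked_rank(events, emb))  # 2.0 (ranks 1 and 3)
```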
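Thoughts:
- Listing embeddings are used to calculate similarities between listings, which can then be applied to recommendation applications.
- I haven't studied word embeddings carefully yet, so I didn't get much out of this read. I understand the general idea; I just don't know whether this kind of application is common.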