CS209-Course-Notes

Lecture 1:

Ad types

Search Ads:

  • logic: match the ad’s keywords to the user’s query
  • ad format: text, image
  • ad position: mainline, sidebar, top, or bottom of the search results

Native Ads:

  • logic: match the ad’s keywords to the context of the web page or app
  • ad format: text, image; the style should also match the context of the web page or app
  • ad position: embedded in the original content of the web page or app

Display Ads:

  • logic: match the user’s demographics and interests to the ad’s category; interests are collected from user behavior such as page dwell time, clicks, and video engagement time
  • ad format: image, animation (GIF), video, audio
  • ad position: sidebar, top, or bottom of a page or app

Ads data structure

What is a campaign?

  • A campaign focuses on a theme or a group of products
  • set a budget
  • choose your audience
  • write your ads, including keywords and ad content

Ad:

  • AdID
  • CampaignID
  • Keywords
  • Bid
  • Description
  • LandingPage

Campaign:

  • CampaignID
  • Budget
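The two record types above can be sketched as simple data classes. The field names follow the lists; the sample values are made up:

```python
from dataclasses import dataclass, field

@dataclass
class Ad:
    ad_id: int
    campaign_id: int
    keywords: list
    bid: float
    description: str
    landing_page: str

@dataclass
class Campaign:
    campaign_id: int
    budget: float
    ads: list = field(default_factory=list)  # ads linked via campaign_id

# Illustrative values only
camp = Campaign(campaign_id=1, budget=500.0)
ad = Ad(ad_id=101, campaign_id=1, keywords=["beach", "chair"],
        bid=0.75, description="Folding beach chair",
        landing_page="https://example.com/chair")
camp.ads.append(ad)
```

Each ad carries its campaign’s ID, so a campaign groups many ads under one budget.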

Search Ads Workflow


Lecture 2:

Information Retrieval (IR)

Finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (stored on computers), e.g. web search, e-mail search, etc.

Inverted index

For each term t, we must store a list of all documents that contain t; each document is identified by a docID.

Inverted index construction

  • Tokenization
    Cut the character sequence into word tokens
  • Normalization
    Map text and query terms to the same form: lowercase, U.S.A. -> USA
  • Stemming
    We may wish different forms of a word to match: am, are, is -> be; cars, car’s, cars’ -> car
  • Stop words
    Omit very common words such as prepositions: of, on
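The construction steps above can be sketched as follows; the sample documents, stop-word list, and regex tokenizer are illustrative:

```python
import re

DOCS = {
    1: "New home sales top forecasts",
    2: "Home sales rise in July",
    3: "Increase in home sales in July",
}
STOP_WORDS = {"in", "of", "on", "the"}

def tokenize(text):
    # Tokenization + normalization: split on non-letters, lowercase
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def build_inverted_index(docs):
    index = {}  # term -> set of docIDs
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term in STOP_WORDS:
                continue  # stop-word removal
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_inverted_index(DOCS)
# AND query: intersect the posting lists of both terms
result = index["home"] & index["sales"]
```

Posting lists here are Python sets, so the AND query is a set intersection; real systems keep them as sorted docID lists and merge them in linear time.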

How to process a Boolean query like A AND B, e.g. “Alice and Bruce”?
- Locate each term’s postings list (stored as term : docs) and intersect (merge) the lists.
How to process a phrase query like “A B”, e.g. “star wars”?
- Use a positional index of the form (term : num of docs; doc1: pos1, pos2, …; doc2: pos1, pos2, …;) and check that the terms appear at adjacent positions in a shared document.
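A minimal sketch of a positional index and the adjacency check for a two-word phrase query; the sample documents are made up:

```python
def build_positional_index(docs):
    # term -> {doc_id: [positions]}
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index

def phrase_query(index, w1, w2):
    # Keep docs where w2 occurs exactly one position after w1
    hits = set()
    p1, p2 = index.get(w1, {}), index.get(w2, {})
    for doc_id in p1.keys() & p2.keys():  # shared documents only
        pos2 = set(p2[doc_id])
        if any(p + 1 in pos2 for p in p1[doc_id]):
            hits.add(doc_id)
    return hits

docs = {1: "star wars trailer", 2: "wars of the star systems"}
idx = build_positional_index(docs)
```

Document 2 contains both words but not adjacently, so only document 1 matches “star wars”.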

Application of IR in search ads

  • build an inverted index for ads: key -> keyword term, value -> list(AdID)
  • build a forward index for detailed ad info
  • process the query
  • rank the ad candidates

How to rank ad candidates?

  • Relevance score = number of matched query words / total number of the ad’s keywords
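A sketch of that scoring rule, assuming hypothetical ad records with a keywords field:

```python
def relevance_score(query_terms, ad_keywords):
    # Fraction of the ad's keywords that appear in the query
    matched = sum(1 for kw in ad_keywords if kw in query_terms)
    return matched / len(ad_keywords)

def rank_ads(query, ads):
    terms = set(query.lower().split())
    scored = [(relevance_score(terms, ad["keywords"]), ad["ad_id"])
              for ad in ads]
    return sorted(scored, reverse=True)  # highest score first

# Illustrative ad candidates
ads = [
    {"ad_id": 1, "keywords": ["beach", "chair", "sale"]},
    {"ad_id": 2, "keywords": ["beach", "umbrella"]},
]
ranking = rank_ads("beach chair", ads)
```

Ad 1 matches 2 of its 3 keywords (score 2/3) and outranks ad 2, which matches 1 of 2 (score 1/2).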

Web service

Web services are client and server applications that communicate over the WWW via the Hypertext Transfer Protocol (HTTP).

Component:

  • Client: PC, mobile phone, tablet
  • Protocol: HTTP
  • Web Server: Tomcat, nginx, IIS, Jetty
  • Data Layer: SQL database, NoSQL, document store

HTTP is an application-level protocol for distributed, collaborative, hypermedia information systems. It is used to deliver data on the WWW.

  • Connectionless
    The HTTP client, i.e. a browser, initiates an HTTP request; after the request is made, the client disconnects from the server and waits for a response. The server processes the request and re-establishes the connection with the client to send the response back.
  • Media independent
    Any type of data can be sent over HTTP as long as both the client and the server know how to handle the data content.
  • Stateless
    HTTP being connectionless is a direct result of HTTP being a stateless protocol. The server and client are aware of each other only during the current request.

How does a web server handle an HTTP request?

  • AuthTrans
    Verify any authorization info sent in the request
  • NameTrans
    Translate the logical URL into a local file system path
  • PathCheck
    Check the local file system path for validity and check that the requester has access privileges to the requested resource on the file system
  • ObjectType
    Determine the MIME type (Multipurpose Internet Mail Extensions) of the requested resource
  • ParseParams
    Process incoming request data read by the Service step
  • Service (generate response)
    Generate and return the response to the client
  • Error
    If an error happens, the server logs the error message and aborts processing
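The steps above can be sketched as a pipeline of handler functions. The function names mirror the stages, but the bodies are simplified stand-ins, not a real server’s logic:

```python
# Each stage either enriches the request context or raises an error.
def auth_trans(ctx):
    ctx["user"] = ctx["headers"].get("Authorization", "anonymous")

def name_trans(ctx):
    ctx["path"] = "/var/www" + ctx["url"]  # logical URL -> file path

def path_check(ctx):
    if ".." in ctx["path"]:
        raise PermissionError("invalid path")

def object_type(ctx):
    ctx["mime"] = ("text/html" if ctx["path"].endswith(".html")
                   else "application/octet-stream")

def service(ctx):
    ctx["response"] = (200, ctx["mime"], "<h1>ok</h1>")

PIPELINE = [auth_trans, name_trans, path_check, object_type, service]

def handle(request):
    ctx = dict(request)
    try:
        for step in PIPELINE:
            step(ctx)
    except Exception as exc:
        # Error stage: log the message and abort with an error response
        ctx["response"] = (500, "text/plain", str(exc))
    return ctx["response"]

resp = handle({"url": "/index.html", "headers": {}})
```

A failing stage (here, a `..` path traversal caught by `path_check`) short-circuits the rest of the pipeline and routes into the Error stage.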

MapReduce

  • Map
    Divides the input into ranges and creates a map task to process each partition
    input: any string
    output: (key, value) pairs
  • Shuffle
    Distributes the partitions to different machines by key
  • Reduce
    Collects the various results and combines them to answer the larger problem that the master node needs to solve
    input: key, list(value)
    output: (key, combined value)
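The classic word-count example, with the three phases as plain functions (shuffle runs locally here; a real cluster routes each key to a machine):

```python
from collections import defaultdict

def map_fn(line):
    # Map: emit (word, 1) for every token in the input split
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    # Shuffle: group values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    # Reduce: combine the list of values for one key
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for line in lines for pair in map_fn(line)]
counts = dict(reduce_fn(k, v) for k, v in shuffle(mapped).items())
```

Each map task is independent, so the map phase parallelizes trivially; only the shuffle requires moving data between machines.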

Lecture 3:

Query Rewrite

  • Goal
    Find queries related to the issued one, which would allow us to retrieve relevant ads that were not matched by the original query
  • Approach
    Find the K nearest neighbors of the original query, i.e. semantically similar queries
  • Intuition
    If we can find a vector representation of each query, then we can calculate similarity as the cosine of the two vectors

Normally, a customer issues a query like “an outdoor beach furniture”. We first find its K nearest neighbors (similar queries) and compare their similarity. To generate a vector, the input one-hot vector is multiplied by the hidden layer to produce the word vector, and the word vector is multiplied by the output layer to produce the output scores. For example, the word vector of “ant” times the output-layer weights for the word “car” gives a value that is fed into a softmax layer to calculate a probability.
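A minimal cosine-similarity sketch over toy 3-dimensional vectors; real word2vec embeddings have hundreds of dimensions, and these values are made up:

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product over the product of norms
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u)) *
            math.sqrt(sum(b * b for b in v)))
    return dot / norm

# Toy query "embeddings" (illustrative values)
vectors = {
    "beach furniture": [0.9, 0.1, 0.2],
    "outdoor chairs":  [0.8, 0.2, 0.3],
    "car insurance":   [0.1, 0.9, 0.1],
}

def nearest(query, k=1):
    # K nearest neighbors of a query by cosine similarity
    q = vectors[query]
    others = [(cosine(q, v), term)
              for term, v in vectors.items() if term != query]
    return sorted(others, reverse=True)[:k]
```

With these vectors, “outdoor chairs” is the nearest neighbor of “beach furniture”, while “car insurance” points in a different direction.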

word2vec

  • skip-gram model
    • for a given word in a sentence, what is the probability of each and every other word in our vocabulary appearing anywhere within a small window around the input word?
    • for example, given the word “trump”, the trained model will say that words like “president”, “elect”, and “donald” have a high probability of appearing nearby, while unrelated words like “cook” and “movie” have a low probability
  • skip-gram model training
    • training data: a vocabulary of V unique words
    • input word representation: a one-hot vector for each word; this vector has V components (one for every word in the vocabulary), with a “1” in the position corresponding to the word and 0s in all other positions
    • output: a single vector containing, for every word in our vocabulary, the probability that the word appears near the input word

How to calculate query rewrite with word2vec?

  • term level: replace a query term with similar terms
  • phrase level: extract a phrase from the query, embed it, and replace it with similar phrases

Query Intent Extraction

  • Goal
    Generate sub-queries which best preserve the intent of the original query and allow us to retrieve more relevant ads
  • Approach
    A logistic regression classifier is used to determine the goodness of each sub-query
  • Intuition
    A historically good sub-query has more clicks on relevant ads which contain the terms in the sub-query

How to generate sub-queries?

  • remove stop words
  • generate n-grams as sub-queries (2 <= n <= N - 1, where N is the number of query tokens)
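The two steps can be sketched as follows (the stop-word list and sample query are illustrative):

```python
STOP_WORDS = {"a", "an", "the", "of", "for"}

def sub_queries(query, min_n=2):
    # 1. Remove stop words; 2. emit every n-gram with 2 <= n <= N - 1
    tokens = [t for t in query.lower().split() if t not in STOP_WORDS]
    N = len(tokens)
    subs = []
    for n in range(min_n, N):          # n runs up to N - 1
        for i in range(N - n + 1):     # sliding window of width n
            subs.append(" ".join(tokens[i:i + n]))
    return subs

subs = sub_queries("an outdoor beach furniture sale")
```

After stop-word removal the query has N = 4 tokens, yielding three bigrams and two trigrams; the full query itself (n = N) is deliberately excluded.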

How to quantify a good sub-query?
- Mutual Click Intent (MCI)

MCI Features

  • Click Intent Rank (CIR)
    CIR quantifies the contribution of each token to the query intent and indicates how important token v is in the query
    intuition: important tokens generate good sub-queries
    example query: stella artois beer prices

How to compute CIR?

Apply the PageRank algorithm over a graph of the query’s tokens; each token’s rank gives its CIR feature.
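A minimal power-iteration sketch of PageRank over a made-up token graph for the example query; the edges (and their reading as token co-occurrence) are assumptions for illustration, not the lecture’s exact construction:

```python
def pagerank(graph, damping=0.85, iters=50):
    # graph: node -> list of out-neighbors (equal edge weights)
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Teleport mass, shared uniformly
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, out in graph.items():
            if not out:
                continue
            share = damping * rank[n] / len(out)
            for m in out:
                new[m] += share  # distribute rank along out-edges
        rank = new
    return rank

# Hypothetical token graph for "stella artois beer prices"
graph = {
    "stella": ["artois", "beer"],
    "artois": ["stella", "beer"],
    "beer":   ["stella", "artois", "prices"],
    "prices": ["beer"],
}
cir = pagerank(graph)
```

With this graph, “beer” receives links from every other token and ends up with the highest rank, matching the intuition that the most important token should score highest.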
