Lecture 1:
Ad Types
Search Ads:
- logic: match the ad's keywords to the user's query
- ad format: text, image
- ad position: mainline, sidebar, top or bottom of the search results page
Native Ads:
- logic: match the ad's keywords to the context of the web page or app
- ad format: text, image; the style should also match the context of the web page or app
- ad position: embedded in the original content of the web page or app
Display Ads:
- logic: match the user's demographics and interests to the ad's category; interests are collected from user behavior: page dwell time, clicks, video engagement time
- ad format: image, animation (GIF), video, audio
- ad position: sidebar, top or bottom of the page or app
Ads data structure
What is a campaign:
- A campaign focuses on a theme or a group of products
- set a budget
- choose your audience
- write your ad, including keywords and ad content
Ad:
- AdID
- CampaignID
- Keywords
- Bid
- Description
- LandingPage
Campaign
- CampaignID
- Budget*
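The two records above can be sketched as simple data classes; the field names follow the lists above, while the types (and the example values in the usage note) are assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Ad:
    ad_id: int
    campaign_id: int          # links the ad back to its campaign
    keywords: List[str]       # terms this ad bids on
    bid: float                # price offered per click (assumed unit)
    description: str
    landing_page: str

@dataclass
class Campaign:
    campaign_id: int
    budget: float             # total spend allowed for the campaign
```

Usage: `Ad(1, 10, ["beach", "chair"], 0.5, "Beach chairs on sale", "http://example.com")` belongs to `Campaign(10, 1000.0)` via the shared `campaign_id`.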
Search Ads Workflow
Lecture 2:
Information Retrieval(IR)
Finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (stored on computers), e.g. web search, e-mail search, etc.
Inverted index
For each term t, we must store a list of all documents that contain t.
- Tokenization
Cut the character sequence into word tokens
- Normalization
Map text and query terms to the same form: lower case, U.S.A -> USA
- Stemming
We may wish different forms of a word to match: am, are, is -> be; cars, car's, cars' -> car
- Stop words
Omit very common words such as prepositions: of, on
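The preprocessing steps above can be sketched as a small pipeline feeding an inverted index; the token regex and the tiny stop-word list are illustrative assumptions:

```python
import re

STOP_WORDS = {"of", "on", "the", "a", "an"}  # assumed minimal list

def tokenize(text):
    # Tokenization + normalization: split into word tokens, lower-case them
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Stop-word removal: omit very common words
    return [t for t in tokens if t not in STOP_WORDS]

def build_inverted_index(docs):
    # docs: {doc_id: text}; returns {term: sorted list of doc_ids containing it}
    index = {}
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            index.setdefault(term, []).append(doc_id)
    for postings in index.values():
        postings.sort()
    return index
```

Sorting each postings list keeps the later merge step a linear scan.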
How to process a Boolean query like "A AND B"?
- Locate each term's postings list and merge (intersect) them; the index form is (term : docs)
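The locate-and-merge step for a two-term AND query is the classic intersection of two sorted postings lists; a minimal sketch:

```python
def intersect(postings_a, postings_b):
    # Merge two sorted postings lists in O(len(a) + len(b)):
    # advance the pointer with the smaller doc ID, keep matches.
    i = j = 0
    result = []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            result.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1
        else:
            j += 1
    return result
```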
How to process a phrase query like "A B"?
- Use the new form (term, num of docs; doc1: pos1, pos2, …; doc2: pos1, pos2, …;) and calculate term distances within shared documents.
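A positional index lets us check that the two terms appear at distance 1 in a shared document; a sketch, assuming a (term -> doc -> positions) layout like the form above:

```python
def phrase_match(pos_index, term_a, term_b):
    # pos_index: {term: {doc_id: [positions]}}.
    # Return doc IDs where term_b occurs immediately after term_a,
    # i.e. the two terms are at distance 1 in a shared document.
    hits = []
    docs_a = pos_index.get(term_a, {})
    docs_b = pos_index.get(term_b, {})
    for doc_id in docs_a.keys() & docs_b.keys():   # shared documents only
        positions_b = set(docs_b[doc_id])
        if any(p + 1 in positions_b for p in docs_a[doc_id]):
            hits.append(doc_id)
    return sorted(hits)
```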
Application of IR in search ads
- build an inverted index for ads: key -> keyword term, value -> list(AdID)
- build a forward index for ad detail info
- process query
- rank ads candidates
How to rank ads candidates?
- Relevance score = number of matched words in the query / total number of keywords
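The relevance formula above can be sketched directly; ads are assumed to be a mapping from ad ID to its keyword list:

```python
def relevance_score(query_terms, ad_keywords):
    # Relevance = matched keyword count / total number of ad keywords
    matched = len(set(query_terms) & set(ad_keywords))
    return matched / len(ad_keywords) if ad_keywords else 0.0

def rank_ads(query_terms, ads):
    # ads: {ad_id: keyword list}; return ad IDs sorted best-first
    return sorted(ads,
                  key=lambda ad_id: relevance_score(query_terms, ads[ad_id]),
                  reverse=True)
```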
Web service
Web services are client and server applications that communicate over the WWW via the Hypertext Transfer Protocol (HTTP)
Component:
- Client: PC, mobile phone, tablet
- Protocol: HTTP
- Web Server: Tomcat, nginx, IIS, Jetty
- Data Layer: SQL database, NoSQL, document store
HTTP is an application-level protocol for distributed, collaborative, hypermedia information systems. It is used to deliver data on the WWW.
- Connectionless
The HTTP client, i.e. a browser, initiates an HTTP request; after the request is made, the client disconnects from the server and waits for a response. The server processes the request and re-establishes the connection with the client to send the response back.
- Media independent
Any type of data can be sent by HTTP as long as both the client and the server know how to handle the data content.
- Stateless
HTTP being connectionless is a direct result of HTTP being a stateless protocol. The server and client are aware of each other only during the current request.
How does a web server handle an HTTP request?
- AuthTrans
Verify any authorization info sent in the request
- NameTrans
Translate the logical URL into a local file system path
- PathCheck
Check the local file system path for validity and check that the requester has access privileges to the requested resource on the file system
- ObjectType
Determine the Multipurpose Internet Mail Extensions (MIME) type of the requested resource
- ParseParams
Process incoming request data read by the service step
- Service (generate response)
Generate and return the response to the client
- Error
If an error happens, the server logs the error message and aborts the process
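The NameTrans and PathCheck steps can be illustrated as pure functions; the document root and the access list here are hypothetical stand-ins for real server configuration:

```python
import os.path

DOC_ROOT = "/var/www/html"                    # assumed document root
ALLOWED = {"/index.html", "/ads/list.html"}   # hypothetical access-control list

def name_trans(url):
    # NameTrans: translate the logical URL into a local file-system path
    path = url.split("?", 1)[0]               # drop the query string
    return os.path.normpath(DOC_ROOT + path)

def path_check(url):
    # PathCheck: path validity (no escaping the document root via "..")
    # plus access privileges for the requested resource
    path = name_trans(url)
    inside_root = path.startswith(DOC_ROOT)
    has_access = url.split("?", 1)[0] in ALLOWED
    return inside_root and has_access
```

A real server such as nginx or Tomcat does far more at each step; this only shows where the two checks sit in the pipeline.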
Map Reduce
- Map
Divides the input into ranges and creates a map task for each partition
input: any string
output: (key, value)
- Shuffle
Distributes partitions to different machines by key
- Reduce
Collects the various results and combines them to answer the larger problem that the master node needs to solve
input: (key, list(value))
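The three steps can be sketched with the classic word-count example; this runs on a single machine, so the shuffle's key-to-machine hashing is replaced by in-memory grouping:

```python
from collections import defaultdict

def map_fn(text):
    # Map: emit a (key, value) pair for every word in the input string
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    # Shuffle: group values by key (a cluster would hash keys to machines)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    # Reduce: combine the list of values for one key into a final result
    return key, sum(values)

def word_count(docs):
    pairs = [pair for doc in docs for pair in map_fn(doc)]
    return dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())
```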
Lecture 3:
Query Rewrite
- Goal
Find queries related to the issued one, which would allow us to retrieve relevant ads that were not matched by the original
- Approach
Find the K nearest neighbors of the original query, i.e. semantically similar queries
- Intuition
If we can find a vector representation of a query, then we can calculate similarity as the cosine of two vectors
Normally, a customer would issue a query like "an outdoor beach furniture"; we would first find its K nearest neighbors (similar queries) and compare their similarity. To generate a vector, the input one-hot vector is used to look up the word vector, and the word vector is multiplied by the output layer to produce output values, e.g. the word vector of "ant" times the output-layer weights for the word "car" gives a value that is fed into a softmax layer to calculate a probability.
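The cosine-similarity K-nearest-neighbor step can be sketched as follows; the candidate queries and their vectors are assumed to come from a trained word2vec model:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def k_nearest_queries(query_vec, candidates, k):
    # candidates: {query string: vector}; return the k most similar queries
    ranked = sorted(candidates,
                    key=lambda q: cosine(query_vec, candidates[q]),
                    reverse=True)
    return ranked[:k]
```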
word2vec
- skip gram model
- for a given word in a sentence, what is the probability of each and every other word in our vocabulary appearing anywhere within a small window around the input word
- for example, given the word "trump", a trained model is going to say that words like "president", "elect" and "donald" have a high probability of appearing nearby, and unrelated words like "cook" and "movie" have a low probability
- skip gram model training
- training data: vocabulary of V unique words
- input word representation: one-hot vector for each word, this vector will have V components (one for every word in our vocabulary) and we’ll place a “1” in the position corresponding to the word, and 0s in all of the other positions
- output : a single vector containing, for every word in our vocabulary, the probability that each word would appear near the input word.
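The training setup above can be sketched by generating (input word, context word) pairs within a window, plus the one-hot input encoding; the window size and vocabulary are illustrative:

```python
def skip_gram_pairs(tokens, window):
    # For each word, pair it with every other word within a small
    # window around it: these are the skip-gram training examples.
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def one_hot(word, vocab):
    # One-hot input: V components, a 1 at the word's index, 0s elsewhere
    return [1 if w == word else 0 for w in vocab]
```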
How to calculate query rewrite with word2vec?
- term level: replace a query term with similar terms
- phrase level: extract a phrase from the query, embed it, and replace it with similar phrases
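Term-level rewriting can be sketched as follows, assuming a precomputed table of similar terms (e.g. nearest neighbors from a word2vec model):

```python
def term_level_rewrites(query_terms, similar):
    # similar: {term: [similar terms]}, assumed precomputed from word2vec.
    # Produce one rewritten query per (position, replacement) combination.
    rewrites = []
    for i, term in enumerate(query_terms):
        for alt in similar.get(term, []):
            rewrites.append(query_terms[:i] + [alt] + query_terms[i + 1:])
    return rewrites
```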
Query Intent Extraction
- Goal
Generate sub-queries which best preserve the intent of the original query and allow us to retrieve more relevant ads
- Approach
A logistic regression classifier is used to determine the goodness of each sub-query
- Intuition
A historically good sub-query has more clicks on relevant ads which contain the terms in the sub-query
How to generate sub-queries?
- remove stop words
- generate n-grams as sub-queries (2 <= n <= N - 1)
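The two steps above can be sketched directly; the stop-word set is an assumption passed in by the caller:

```python
def sub_queries(tokens, stop_words):
    # Step 1: remove stop words.
    kept = [t for t in tokens if t not in stop_words]
    n_tokens = len(kept)
    # Step 2: emit every n-gram with 2 <= n <= N - 1 as a sub-query.
    grams = []
    for n in range(2, n_tokens):
        for i in range(n_tokens - n + 1):
            grams.append(" ".join(kept[i:i + n]))
    return grams
```

For the 4-token query "stella artois beer prices" this yields three 2-grams and two 3-grams, but never the full query itself.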
How to quantify good sub-query?
Feature
- Click Intent Rank (CIR)
CIR quantifies the contribution of each token to the query intent and indicates how important a token v is in the query
intuition: important tokens can generate good sub-queries
example query: stella artois beer prices
Apply the PageRank algorithm
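A minimal power-iteration PageRank can be sketched as follows; how the token-to-token graph is built (e.g. edges weighted by click logs) is not specified here, so the input graph is an assumption:

```python
def pagerank(graph, damping=0.85, iterations=50):
    # graph: {node: [out-neighbours]}; returns a PageRank score per node.
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}          # uniform initial scores
    for _ in range(iterations):
        new_rank = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = rank[v] / len(out)      # split rank over out-edges
                for u in out:
                    new_rank[u] += damping * share
        rank = new_rank
    return rank
```

Tokens with higher scores would be treated as more important to the query intent.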