ABSTRACT
•Challenge
–many possible attribute combinations need to be forecast
•Approach
–only a subset of attribute combinations is explicitly forecast and stored
–the remaining combinations are forecast on the fly using high-dimensional attribute correlation models
INTRODUCTION
•Problem
–How do we forecast arbitrary attribute combinations without excessive computational and space requirements, while still maintaining real-time response?
•Advantages
–Handles newly added attributes
–Does not suffer from the sparsity problem
–Adapts to the needs of guaranteed contracts
Data and Query Model
•A user visit, e.g. Gender = Male, Age = 30, Location = California, Interested In Sports = True, Interested In Finance = False, Planning Vacation = True, Page-Category = Sports, ..., Time = 31 October 2009 11:00pm
•A query, e.g. (Page-Category = Sports ∧ (Gender = Male ∨ Age ∈ [25, 35]) ∧ Time ∈ [1 Aug 2009, 31 Oct 2009])
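The data and query model above can be sketched in Python as follows. The dictionary encoding, the `matches` helper, and the choice to treat the end date as covering the whole day are illustrative assumptions, not details taken from the paper.

```python
from datetime import datetime

# A user visit is one point in the attribute space (hypothetical encoding).
visit = {
    "Gender": "Male",
    "Age": 30,
    "Location": "California",
    "Interested In Sports": True,
    "Interested In Finance": False,
    "Planning Vacation": True,
    "Page-Category": "Sports",
    "Time": datetime(2009, 10, 31, 23, 0),
}


def matches(v):
    """Page-Category = Sports AND (Gender = Male OR Age in [25, 35])
    AND Time in [1 Aug 2009, 31 Oct 2009]."""
    in_category = v["Page-Category"] == "Sports"
    in_audience = v["Gender"] == "Male" or 25 <= v["Age"] <= 35
    # Assumption: the end date includes all of 31 Oct 2009.
    in_window = datetime(2009, 8, 1) <= v["Time"] < datetime(2009, 11, 1)
    return in_category and in_audience and in_window
```

A query is thus a predicate that selects a region of the attribute space, and the example visit falls inside that region.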
•Gender = Male, Age= 30, Location = California, Interested In Sports= True, Interested In Finance= False, Planning Vaction = True, Page-Category = Sports, ..., Time = 31 October 2009 11:00pm
•a query as (Page Category = Sports ^(Gender =Male V Age[25, 35]) ^ Time [1 Aug 2009 — 31 Oct 2009])
Forecasting Problem Statement
•count forecast problem
–forecast the number of points in the query region
•sample forecast problem
–forecast a sample of points in the query region
–the sample is used to compute how many user visits in the query region have already been assigned to previous guaranteed contracts
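To illustrate how the two problems relate, here is a minimal Python sketch. It assumes that each sampled point carries a weight (the number of real visits it stands for) and a flag marking whether it is already assigned to an earlier guaranteed contract. The `SampledVisit` type and both fields are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class SampledVisit:
    weight: float  # assumed: number of real visits this sample point represents
    booked: bool   # assumed: already assigned to a previous guaranteed contract


def count_forecast(sample):
    """Estimated number of points in the query region."""
    return sum(s.weight for s in sample)


def booked_count(sample):
    """Estimated number of those points already assigned to earlier contracts."""
    return sum(s.weight for s in sample if s.booked)


# Made-up sample of forecast visits for one query region.
sample = [
    SampledVisit(weight=100.0, booked=True),
    SampledVisit(weight=100.0, booked=False),
    SampledVisit(weight=50.0, booked=False),
]
```

Under these assumptions, a count alone cannot say how much of the region is still unassigned, whereas a sample lets both quantities be estimated from the same points.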
Solution Overview
•When a query arrives
–we first map a subset of the query attributes to an attribute combination that has time-series forecasts, which gives the future trend information
–we then multiply this trend count by the correlation ratios (obtained from the correlation model) for the other query attributes to obtain the forecast count for the query
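The two steps above can be sketched in Python for simple conjunctive equality queries. The store `trend_forecasts`, the rule of picking the most specific stored combination, the `ratio_of` callback, and all numbers are assumptions made for illustration. The paper's actual mapping and correlation model may differ.

```python
from math import prod

# Hypothetical store of time-series forecasts, keyed by attribute combination.
trend_forecasts = {
    frozenset({("Page-Category", "Sports")}): 1_000_000,
    frozenset({("Page-Category", "Finance")}): 400_000,
}


def split_query(query, forecasts):
    """Return (stored combination, remaining attributes) for a conjunctive query."""
    items = frozenset(query.items())
    candidates = [combo for combo in forecasts if combo <= items]
    if not candidates:
        raise KeyError("no stored trend forecast covers any subset of the query")
    best = max(candidates, key=len)  # assumed rule: most specific combination
    return best, dict(items - best)


def forecast_count(query, forecasts, ratio_of):
    """Trend count multiplied by the correlation ratio of each remaining attribute."""
    combo, rest = split_query(query, forecasts)
    ratios = [ratio_of(attr, value, combo) for attr, value in rest.items()]
    return forecasts[combo] * prod(ratios)


# Made-up ratios standing in for the correlation model.
toy_ratios = {("Gender", "Male"): 0.5, ("Location", "California"): 0.25}

query = {"Page-Category": "Sports", "Gender": "Male", "Location": "California"}
estimate = forecast_count(
    query, trend_forecasts, lambda attr, value, combo: toy_ratios[(attr, value)]
)
```

With these toy numbers the trend count of 1,000,000 is scaled by 0.5 and 0.25. Multiplying one ratio per attribute is itself a simplification; a richer correlation model could return a single joint ratio for all remaining attributes.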
•Which attribute combinations do we forecast trends for?
–beyond the scope of this paper
•How do we effectively represent correlations in a high-dimensional space?
–a naive Bayesian model
–a partially independent model
–a fully correlated model
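The difference between the two extremes of this list can be shown with a small Python sketch that estimates a correlation ratio from a sample of historical visits. It assumes the model names carry their usual meaning: the naive Bayesian model treats attributes as independent given the base combination, and the fully correlated model uses their joint frequency. The function names and the toy sample are hypothetical.

```python
def ratio_fully_correlated(sample, base, extra):
    """Fraction of visits matching `base` that also match all of `extra` jointly."""
    in_base = [v for v in sample if all(v[k] == x for k, x in base.items())]
    if not in_base:
        return 0.0
    hits = sum(all(v[k] == x for k, x in extra.items()) for v in in_base)
    return hits / len(in_base)


def ratio_naive_bayes(sample, base, extra):
    """Product of per-attribute fractions (independence given `base` is assumed)."""
    ratio = 1.0
    for k, x in extra.items():
        ratio *= ratio_fully_correlated(sample, base, {k: x})
    return ratio


# Made-up history in which Gender and Location are perfectly correlated.
history = [
    {"Page-Category": "Sports", "Gender": "Male", "Location": "California"},
    {"Page-Category": "Sports", "Gender": "Male", "Location": "California"},
    {"Page-Category": "Sports", "Gender": "Female", "Location": "New York"},
    {"Page-Category": "Sports", "Gender": "Female", "Location": "New York"},
]

base = {"Page-Category": "Sports"}
extra = {"Gender": "Male", "Location": "California"}
joint = ratio_fully_correlated(history, base, extra)
naive = ratio_naive_bayes(history, base, extra)
```

On this toy history the joint estimate is twice the naive one, because the independence assumption ignores that the two attributes always occur together. A partially independent model would presumably sit between the two, keeping joint statistics only within groups of related attributes.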
System Architecture