The future of computing lies in the management of data. The term "data"
refers not only to structured data, but in a broader sense to semi-structured
and unstructured data. Over the past three decades, research in information
retrieval and data management has already transformed the world.
Extracting value from semi-structured and unstructured data, however, remains
a largely open problem with huge potential to shape a new computing landscape.
There are two parts to this challenge that I am equally passionate about:
finding new ways to extract value from data, and building efficient systems
that make this possible.
My interest in building systems started in high school, when I co-founded a
business offering bulletin board solutions to large-scale websites. I immersed
myself in techniques for improving system performance and was thrilled when
our product outperformed competing solutions. Eager to learn more
about the design and implementation of highly scalable systems, I joined a team
at IBM building a new shared-data distributed database product, DB2 pureScale.
My work focused on the efficient use of buffer pools and on algorithms to
optimize them in a multi-tiered, distributed context. This experience of
building large systems for real clients made me reflect on the differences
between idealized academic settings and reality.
Interested in further pursuing the field of data management, I undertook a
full-year thesis under Professor XXXX's supervision, working on a framework for
data integration [1]. I implemented LinQL, a new SQL-like language for
specifying links between data silos. The framework translates user-specified
LinQL queries into SQL, taking into account both the syntactic and the semantic
meaning of the data. This research experience confirmed my view of the status
quo: a large amount of value is buried in all kinds of data, and we have yet to
find good ways to make use of the overwhelming flow of data.
Take email as an example. Studies show that over 45% of business-critical
information resides in email messages. Collectively, emails reflect the
behaviors and intrinsic social patterns of the people involved. Nonetheless,
with the exception of spam filtering, the information in emails is absorbed and
consumed only by the people who actually read them.
With this belief in mind, I spent some time after graduation researching email.
I prototyped ideas to facilitate email search and the exploration of email
social networks. What I realized is that no single field can achieve this goal
alone: the area is deeply interdisciplinary.