Sublinear Algorithms for Distributions and Random Order Streams

Sudipto Guha

University of Pennsylvania

Monday, February 27 at 1:15 in LOM 201

ABSTRACT
In the recent past several models of computation that restrict random access to data have gained in currency and use. The two most notable among them are the Data Streams model and the Property Testing model. In the data streams model an algorithm proceeds in passes and within a pass accesses the data sequentially; any item not explicitly stored in memory is rendered inaccessible in that pass. In the property testing model, the (source of the) data is given as a black box to the algorithm. The algorithm proceeds by observing samples from the black box, and depending on the model, is also allowed to query the box for the value of a specific item. Both these paradigms are very useful and practical abstractions of large datasets and a natural question in this regard has been: what is the comparison of these two models? To date, data streams have been studied in the context of worst case ordering of the input. However, in many scenarios the evolving data stream is not adversarial in nature. A natural question that arises in this regard is: do randomizing over the order of data streams allow us to design better algorithms? In this talk we will investigate the answers to the two questions above. We will review the scenarios where a random order stream is the natural model. We will show that random order streams allow exponentially more power compared to adversarial data streams. Furthermore we will show that random order streams serve as an excellent model for distributions, in the sense that any property testing algorithm for symmetric functions on distributions can be emulated by a one pass random order streams with appropiate correspondence between their respective complexity measures. Return to DMTCS home page.

ABSTRACT

In the recent past several models of computation that restrict random
access to data have gained in currency and use. The two most notable among
them are the Data Streams model and the Property Testing model. In the
data streams model an algorithm proceeds in passes and within a pass
accesses the data sequentially; any item not explicitly stored in memory
is rendered inaccessible in that pass. In the property testing model, the
(source of the) data is given as a black box to the algorithm. The
algorithm proceeds by observing samples from the black box, and depending
on the model, is also allowed to query the box for the value of a specific
item. Both these paradigms are very useful and practical abstractions of
large datasets and a natural question in this regard has been: what is the
comparison of these two models?

To date, data streams have been studied in the context of worst case
ordering of the input. However, in many scenarios the evolving data stream
is not adversarial in nature. A natural question that arises in this
regard is: do randomizing over the order of data streams allow us to
design better algorithms?

In this talk we will investigate the answers to the two questions
above. We will review the scenarios where a random order stream is the
natural model. We will show that random order streams allow exponentially
more power compared to adversarial data streams. Furthermore we will show
that random order streams serve as an excellent model for distributions,
in the sense that any property testing algorithm for symmetric functions
on distributions can be emulated by a one pass random order streams with
appropiate correspondence between their respective complexity measures.

Return to DMTCS home page.