Report on the PORTIA Workshop on Sensitive Data in Medical, Financial,
and Content-Distribution Systems
This workshop took place at the Stanford University Frances C. Arrillaga
Alumni Center on July 8 and 9, 2004. Approximately 40 people participated,
many supported by the PORTIA grant; they included graduate students, junior
and senior faculty, national-laboratory researchers, and representatives of
the software industry and relevant user communities. Sixteen talks, both
invited and contributed, were presented; abstracts and slides are available
for all of them.
In addition to listening to the formal talks, participants were invited
to take part in breakout sessions and to formulate research problems. The
results of these sessions are reported below. Some have since been investigated
by PORTIA researchers and others, and we hope to present the fruits of these
investigations at future PORTIA-related events.
- Is "Trusted Computing" a solution to any of the problems created
by the proliferation of sensitive data?
The trusted-computing initiative from Microsoft, Intel, and other
major IT players will provide a tamper-resistant hardware platform for
cryptographic processing and remote attestation of software. Professor
Feigenbaum's talk pointed out that platforms of this sort do not
straightforwardly solve "the data-privacy problem." Nonetheless, it is
worth asking how and in what contexts such a platform might facilitate
the handling of sensitive data.
- Certification and compliance checking of sensitive data policies.
Privacy legislation will impose legal obligations on processors of
sensitive data: medical information systems, financial institutions,
even simple Web shopfronts. Is it possible to check that a particular
software program satisfies the requirements? Who will translate
vague, nondeterministic laws into implementable specifications and
computer-enforceable policies? What is the right certification
mechanism? What is an acceptable proof of "due diligence" as far as
computer handling of sensitive data is concerned? How do we know that
statistical-database inference is not leaking protected information? (One
simple version of such a check is sketched after this item.)
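One concrete version of the last question is the classic small-cell
problem in statistical databases: an aggregate released for a group
that is too small effectively identifies its members. The following
Python sketch shows a minimum-group-size check; the field names, sample
data, and threshold are illustrative assumptions, not features of any
system discussed at the workshop.

    MIN_GROUP_SIZE = 5  # suppress aggregates over fewer than 5 individuals

    def safe_average(records, group_key, value_key):
        """Return per-group averages, suppressing groups that are too small."""
        groups = {}
        for record in records:
            groups.setdefault(record[group_key], []).append(record[value_key])
        results = {}
        for key, values in groups.items():
            if len(values) < MIN_GROUP_SIZE:
                results[key] = None  # suppressed: release could identify someone
            else:
                results[key] = sum(values) / len(values)
        return results

    patients = [{"zip": "94305", "charge": 1200},
                {"zip": "94305", "charge": 800},
                {"zip": "94110", "charge": 450}]
    print(safe_average(patients, "zip", "charge"))

A per-query threshold of this kind does not by itself stop differencing
attacks across overlapping queries, which is one reason the inference
question remains hard.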
- Privacy (especially legislatively mandated privacy) versus usability.
Tradeoffs arise between privacy and legitimate information-access needs in
medical research, finance, and the dissemination of digital information.
Professor Masys pointed out that medical practitioners, including those
who have enthusiastically embraced IT systems, are convinced that the
potential unavailability of essential medical information is a greater
IT-related threat than unauthorized access to sensitive medical information.
In the next talk of that session, Anna Slomovic pointed out that fragmented
medical records (clearly a potential cause of the unavailability that
Professor Masys drew attention to) are actually more likely to be under
patient control and thus to protect patient privacy. Is there actually an
inherent tradeoff between patient control and availability, or is lack of
physician access much more likely to be the result of medical practitioners'
underinvestment in IT than of patients' privacy concerns?
Is HIPAA stifling medical and epidemiological research? Are concerns
about privacy preventing useful profiling (e.g., information providers
cannot improve users' web experiences)? Moving beyond medical data,
is misguided concern about privacy leading to inaccurate conclusions, e.g.,
misprofiling based on partial consumer-preference data ("Amazon thinks I'm
pregnant, and TiVo thinks I'm gay")?
- Notions of privacy and degrees of sensitivity.
What are the right notions of privacy for different applications?
Explicit-permission model? Auditability and verifiable access trail
(the list of people who can look at the data is not restricted a priori,
but every access must be logged and accompanied by justification)?
Weak privacy mechanisms that prevent mass harvesting (easy to learn one
email address, difficult to learn 20 million)? "Promise to forget"
privacy? Technological support for protection levels that change over
time (person dying, juvenile delinquency and prison records, etc.)?
How is this different from conventional access control? (One distinction:
access control yields a binary access/no-access decision, whereas sensitive
data introduce fine gradations of "access for what purpose?") A minimal
sketch of the auditable-access model follows this item.
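To make the auditable-access model concrete: no one is denied access a
priori, but every read must carry a justification that is written to an
append-only log. The following Python sketch illustrates the idea; the
user names, record identifiers, and log format are our own illustrative
assumptions.

    import datetime

    AUDIT_LOG = []  # in practice: append-only, tamper-evident storage

    def audited_read(user, record_id, justification, fetch):
        """Grant access only when a justification is stated and logged."""
        if not justification.strip():
            raise ValueError("access requires a stated justification")
        AUDIT_LOG.append({
            "user": user,
            "record": record_id,
            "purpose": justification,
            "time": datetime.datetime.utcnow().isoformat(),
        })
        return fetch(record_id)

    # Hypothetical usage: a nurse reads a chart; the trail records why.
    chart = audited_read("nurse17", "patient-0042",
                         "pre-operative medication review",
                         fetch=lambda rid: {"id": rid, "allergies": ["penicillin"]})

Deterrence here comes from after-the-fact review of the trail rather
than from up-front denial, which is precisely what distinguishes this
model from conventional access control.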
- Is there a "right to privacy"?
Many people believe that we should all have some fairly general form of
"right to privacy," in the sense that we should not have to disclose
personal information simply because it would be convenient for some
organization or other to have that information. Do we also have
some fairly general right to give up some privacy? Declan McCullagh
recently made the following observation: "It is true that there are
potential costs of using Gmail for email storage [...] The question
is whether consumers should have the right to make that choice and
balance the tradeoffs, or whether it will be preemptively denied to
them by privacy fundamentalists out to deny consumers that choice."
The same point applies more generally. Is there a "nanny state"
segment of the privacy-advocacy community, and, if so, how can we
prevent its having undue influence on technology development?
- Privacy and individual responsibility.
Are the privacy-advocacy and security-technology communities putting too
much emphasis on the legal right and the technological ability to withhold
sensitive information? Might the quest to conceal from an organization
some fact that is mission-critical to it be inherently futile in the
"information age"? In his interactions with a powerful organization,
such as a government, an employer, the health-care system, or the
law-enforcement system, shouldn't an individual be guaranteed certain
basic rights regardless of whether the organization is in possession
of the individual's sensitive data? In fighting for the rights of
individuals to conceal sensitive data, are we setting ourselves up for
a "blame the victim" legal regime, in which everyone can be penalized
for every bit of sensitive data that he or she "leaks"?
- Attacks on sensitive data: taxonomy and issues.
Who are we hiding information from? Rational vs. malicious attackers.
Insiders and curious (passive) attackers (IRS employees looking
at neighbors' tax returns, hospital nurses looking at celebrities'
medical records). Abuse for commercial gain (harvesting of emails and
census data for spam, identity theft, semi-legitimate mass marketing)
vs. targeted attacks on specific individuals.
- Automated consistency verification of sensitive data policies.
If the same data are governed by multiple policies, how can we check
whether the policies are consistent? (A toy version of such a check is
sketched after this item.)
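As a toy illustration, represent each policy as a map from
(principal, action) pairs to an allow/deny effect; two policies are
inconsistent wherever they disagree. The representation and the sample
policies below are simplifying assumptions; real policy languages, with
conditions, obligations, and defaults, are far richer.

    def find_conflicts(policy_a, policy_b):
        """Return the (principal, action) pairs on which two policies disagree."""
        return {key for key in policy_a.keys() & policy_b.keys()
                if policy_a[key] != policy_b[key]}

    hospital_policy = {("researcher", "read-diagnosis"): "allow",
                       ("researcher", "read-identity"): "deny"}
    state_policy = {("researcher", "read-diagnosis"): "deny"}

    print(find_conflicts(hospital_policy, state_policy))
    # {('researcher', 'read-diagnosis')} -- the two policies conflict here

Even this crude formulation shows that consistency checking presupposes
a shared vocabulary of principals and actions, which independently
written policies rarely have.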
- Lifetime management of sensitive information.
How does the technology support the evolution of sensitive data over
long time periods: decreasing levels of protection, expiring privacy
policies, changing legislation, etc.? There is a need for translation
and archiving as storage media become obsolete. Long-term storage
enables attacks that require significant computational power
(brute-force attacks on old cryptographic keys become possible as
computational power increases).
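The last point invites a back-of-the-envelope estimate, sketched below.
The assumption that attacker capacity doubles every 18 months, and the
baseline of what is feasible today, are purely illustrative.

    DOUBLING_PERIOD_YEARS = 1.5  # assumed: attacker capacity doubles every 18 months

    def years_until_feasible(key_bits, feasible_bits_today=64):
        """Years until a key_bits-bit search costs what a feasible one costs now."""
        return max(0.0, (key_bits - feasible_bits_today) * DOUBLING_PERIOD_YEARS)

    for bits in (56, 64, 80, 128):
        print(f"{bits}-bit key: within reach in ~{years_until_feasible(bits):.0f} years")

Under these crude assumptions, an 80-bit key protecting data archived
today is within reach in roughly 24 years, well inside the retention
period of many medical and financial archives; long-lived archives
therefore need re-encryption plans.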
- Recognizing protected categories in medical records.
Natural language processing? Explicit annotation? (A crude keyword-matching
baseline is sketched below.)
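The crudest baseline for recognizing such categories is keyword matching
against a lexicon, as in the Python sketch below; the lexicon and the
sample note are illustrative only, not clinically validated.

    import re

    PROTECTED_TERMS = {
        "mental-health": ["depression", "schizophrenia"],
        "hiv-status": ["HIV", "AIDS"],
        "substance-use": ["alcohol abuse", "opioid dependence"],
    }

    def annotate(text):
        """Return (category, term) pairs whose terms appear in a note."""
        hits = []
        for category, terms in PROTECTED_TERMS.items():
            for term in terms:
                if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
                    hits.append((category, term))
        return hits

    note = "Patient reports worsening depression; HIV test negative."
    print(annotate(note))  # [('mental-health', 'depression'), ('hiv-status', 'HIV')]

The example also shows why the natural-language-processing question is
open: the note is flagged for HIV even though the test was negative,
because keyword matching cannot handle negation or paraphrase.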
- Technological support for user management of sensitive information.
If individuals are supposed to "own" their sensitive information
(credit reports, medical records, etc.), how is this to be implemented?
Does access to the relevant databases require authentication mechanisms
and public-key infrastructures? (One minimal public-key approach is
sketched below.)
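One minimal public-key approach, sketched below, is for the database to
store each individual's public key at enrollment and to authenticate
record requests with a signed challenge. The sketch uses the third-party
Python cryptography package; key distribution, revocation, and recovery,
the hard parts of a real public-key infrastructure, are deliberately out
of scope, and all names are illustrative.

    import os
    from cryptography.hazmat.primitives.asymmetric import ed25519

    # Enrollment: the individual generates a key pair; the database
    # stores only the public key alongside the record.
    owner_key = ed25519.Ed25519PrivateKey.generate()
    registered_public_key = owner_key.public_key()

    # Authentication: the database issues a fresh random challenge ...
    challenge = os.urandom(32)
    # ... the owner signs it with the private key ...
    signature = owner_key.sign(challenge)
    # ... and the database verifies before releasing the record.
    registered_public_key.verify(signature, challenge)  # raises InvalidSignature on failure
    print("authenticated: release the record to its owner")

Whether ordinary users can manage long-lived private keys safely is, of
course, exactly the open question this item raises.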