Privacy and Health Research

From:
William W. Lowrance, Ph.D.

2. Health Data and Data Holders

Several current changes in the context within which health data are collected and used must be recognized. First, the boundaries between classical medical care and "public health" are becoming ever less distinct. Over the past decades the rubric, "health," has been broadened to include many matters—from hyperactivity in children, to teenagers' nose shape, to memory loss associated with aging—that earlier were not viewed as matters of health, much less of medicine. At the same time, medical science has come to accord much more importance to such ordinary life factors as diet and stress as determinants of health, and therefore addresses them in medical care.(29)

This Report covers the whole range, and where distinctions are not sharp it refers to "health," which, after all, is the end of medicine. "Health data" includes all data collected under physicians' supervision, but also a wide range of other data that relate to health.

An expansive definition such as Lawrence Gostin's is necessary: (30)

The term "health data" is broadly defined as all records that contain information that describes a person's prior, current, or future health status, including [cause of disease], diagnosis, prognosis, or treatment, or methods of reimbursement for health services.

To quote an example from a statute, the newly enacted "U.S. Health Insurance Portability and Accountability Act" reaches broadly, as it must (§1171(4)):

The term "health information" means any information, whether oral or recorded in any form or medium, that—(A) is created or received by a health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse; and (B) relates to the past, present, or future physical or mental health or condition of an individual, the provision of health care to an individual, or the past, present, or future payment for the provision of health care to an individual.

Second, of course, health care has been evolving into systems, and systems of systems. As a consequence, the traditional clinician's notes, scribbled down or dictated and later transcribed, and then locked up in filing cabinets, increasingly are being recorded along with other files in electronic media, usually networked. (31)

And third, the trend clearly is toward not only recording health information in computerized form, but indeed basing health care around the "lifetime linked-data dossier" on the person. Many advantages are evident for assembling health data from disparate sources, understanding the person's life-and-health trajectory, providing health-promotion input and health care, transmitting orders and analyses, and networking and consulting at distances. Many advantages are evident for billing and paying, administrative review, and research. And as well, there can be many advantages for the patient's own awareness and documentation of his health "story."

Public-health records are being computerized just as quickly. In the future envisioned by seers, many aspects of public-health surveillance (such as scanning for infectious disease outbreaks), compilation of statistics (use of hospital outpatient services...), development of registries (vaccination...), and other analytic collections (effects of pharmaceuticals...) will simply be derived, whenever and in whatever form needed, from the networked lifetime dossiers.

These visionary technical developments, which are very exciting but not without negative aspects, are being explored diligently by many institutions.(32), (33) A potential vulnerability, even the "Achilles heel," of this movement is whether it will be able to deal adequately with privacy, confidentiality, and security.

"DATA" VOCABULARY

Although definitions need not be belabored here, a few concepts and items of vocabulary are necessary.

Data is taken to mean discrete bits of information. As one dictionary has it: "Data are facts or figures from which conclusions may be inferred." For most research now, data are converted into numerical form for processing by computers.

Data-subjects are the people about whom data are collected.

Databases are collections of data, recorded in standardized fashion, ordered for reference or research purposes.

Database research, then, is research that analyzes data in such collections.

Information is data set within a context of meaning. Raw data (such as lists of numbers that stand for blood-enzyme concentrations, or units on a mental-depression scale) make no "sense" as facts unless the measurement method and descriptive scale are known. And before any scientific meaning can be inferred, the data must be tied with data on other characteristics of the data-subjects and the circumstances.

Personally identifiable data are data that are associated with real persons, or that can be associated with real persons by deduction from descriptors such as birthdate, physical characteristics, occupation, residential location, social identification number, or history. Synonyms are "personal data" and "individually identifiable data." Often for brevity the descriptors, such as the person's name, that associate the data with a real person are referred to just as "identifiers."
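The idea of association "by deduction from descriptors" can be illustrated with a small sketch. It is not drawn from the Report; the function name, field names, and sample records are all hypothetical. It simply counts how many records become unique, and hence potentially attributable to one real person, as more descriptors are combined.

```python
from collections import Counter

def count_unique_on(records, descriptors):
    """Count records whose combination of descriptor values occurs only once."""
    keys = [tuple(r[d] for d in descriptors) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1)

# Hypothetical records carrying no names, only descriptors.
records = [
    {"birthdate": "1931-04-02", "occupation": "teacher", "location": "Regina"},
    {"birthdate": "1931-04-02", "occupation": "teacher", "location": "Regina"},
    {"birthdate": "1945-11-17", "occupation": "farmer", "location": "Regina"},
    {"birthdate": "1945-11-17", "occupation": "nurse", "location": "Saskatoon"},
]

unique_by_birthdate = count_unique_on(records, ["birthdate"])
unique_by_all = count_unique_on(records, ["birthdate", "occupation", "location"])
```

In this made-up sample no record is unique on birthdate alone, but two of the four become unique once occupation and location are added, which is why descriptors that seem harmless singly may act together as identifiers.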

Processing or handling of data, in an ethical or legal sense, may refer to recording, storing, retrieving, duplicating, transferring, destroying—in effect, any action through which someone may become cognizant of, or move, or alter, data. (34) Verb lists of this kind are unavoidable; privacy of the data-subject can be affected by any such operations.

THE UNIVERSE OF HEALTH DATA

So many kinds of health data are collected that it would be distracting and soporific to do more here than take note of the major categories. But it is essential to recognize: (a) that great research power resides in a diversity of health data, and (b) that privacy issues surround many kinds of data beyond those in primary medical records.

Health data include:

All kinds of data may reveal intimate information. Prescription data, for instance, often indicate the disease, or at least the kind of disease, being treated. Blood-type holds implications about parentage. Just the very fact that a person has entered into a relationship with a psychotherapist, or a drug-abuse treatment center—as revealed, say, by billing records or clinic appointment logs—can be held against the person by employers or others.

Further, besides carrying technical observations relating to the main purpose of an encounter between a person and a healthcare or research system, records may contain subjective remarks on general health or lifestyle ("coughs a lot, probably heavy smoker"), incidental observations ("child has numerous bruises and small burn scars on back" or "spouse opposed to surgery"), or speculations ("taking anabolic steroids?" or "bulimic?").

ESPECIALLY-SENSITIVE DATA

Obviously some kinds of data are felt by data-subjects or the public in general to be especially sensitive. A commonly cited example is that HIV–AIDS data are much more sensitive than, say, data about wrist fracture. Whether sensitivity is somehow justified will always be debatable within the context. But for purposes of ethical practices, policy, and law, widely held public concerns must be recognized and respected appropriately.

Among the categories often taken to be highly sensitive are data about:

But although these are among the more obviously delicate kinds of data, a person may just as well have anxieties about employers or others becoming aware of data regarding asthma, for instance, or epilepsy, cirrhosis of the liver, or a weak back.

Sensitivity may have to do with revelation of a past that a person has moved beyond and does not wish others to know about, or be reminded of himself. It may imply improper or socially marginal behavior. It may stem from resentment at ill fortune in the lottery of life, or from imputation of careless behavior, or implication of dysfunctionality. And of course it may stem from fear of negative discrimination.

This raises serious questions for policy. Should distinctions be made among kinds of health data with respect to how they are protected? Should special sensitivities be recognized? Should protections be scaled relative to the potential for social or physical harm, or emotional offense, to data-subjects?

A U.S. Task Force on the Privacy of Private-Sector Health Records expressed this view (with which the author agrees):(35)

The Task Force believes that any file containing health information should be considered a candidate for protection since it is the information itself, and not the form in which it is maintained, which could result in an invasion of privacy if released. ... Although the Task Force agrees that it is appealing to classify information according to sensitivity, it questions whether this is the most effective approach to protecting data that may potentially cause harm to an individual. Disease-specific segregation of records necessitates complicated administrative arrangements.... In addition, the definition of what constitutes a sensitive medical record may differ from decade to decade and from individual to individual. ... Protecting all health records adequately is the issue that must be addressed.

THE DIVERSITY OF DATA HOLDERS

Just as varied as the types of health data, of course, are the types of individuals and organizations who hold or process the data. Data are processed by:

Thus health data are held by a greater variety of organizations than ever before. Data flow, often at very high volume, within and among many of these organizations.

Although physicians, and staff nominally under their supervision, still collect much of the most intimate data, they are not necessarily any longer in position to control the movements, uses, or fate of the data. Data from a routine patient encounter with the healthcare system quickly are transmitted among care-providers and their local institutions, various technical support services, the paying institutions, and a variety of supervisors, inspectors, auditors, and researchers—many far removed from the data-subject, many not medically certified, and possibly many not sworn to confidentiality. Eventually the encounter may be examined in practice review, filed into statistical tabulations, recorded into ongoing registries, or scrutinized in research.

DATABASES USEFUL FOR RESEARCH

Among the most important resources for research are databases and registries of health experience. Some are highly specialized but not very large; some are broad and enormous. Some are maintained only for research; some are primarily maintained for administrative or other purposes but are available for research. They may be organized by illness (leprosy...), by exposure (oral contraceptives...), by mode of intervention (kidney transplant...), by general healthcare experience (nursing home stay...), or by population (residents of Saskatchewan).

Perhaps the largest collection of health databases in the world is the set of U.S. "Medicare" database systems, which every year processes the records of over 600 million reimbursement claims. (Medicare is the Federal health insurance program for people age 65 and over, people with serious disabilities, and people suffering from serious kidney disease.) The Medicare databases, which are managed by the Health Care Financing Administration (HCFA), contain enrollment and eligibility data, claims for payment, data on the ways healthcare services are used, and many specialized data (such as on end-stage renal disease). (36)

Much very useful research is performed on HCFA data, which as collected are personally identifiable. Public-use files are made available in which, HCFA certifies, "all identifiers have been encrypted, ranged, or blanked." For research projects which meet the criteria for release of identifiable data, HCFA supplies data under Release Agreements pursuant to "routine uses" announced under the Privacy Act. The protections are strict. (See page 59 regarding the Privacy Act, and page 68 regarding conditions on use of Medicare data.)
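What "encrypted, ranged, or blanked" might look like in practice can be suggested by a minimal sketch. This is not HCFA's procedure, which the Report does not describe; the field names, the sample record, and the key are hypothetical, and a keyed hash (HMAC-SHA256) is used here as a simple stand-in for encryption.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; real key management is far stricter.
SECRET_KEY = b"example-key-not-for-real-use"

def encrypt_identifier(value: str) -> str:
    """Replace an identifier with a keyed hash (a stand-in for 'encrypted')."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def range_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a band such as '70-79' ('ranged')."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def make_public_use_record(record: dict) -> dict:
    """Return a copy with identifiers encrypted, ranged, or blanked."""
    return {
        "beneficiary_id": encrypt_identifier(record["beneficiary_id"]),
        "name": "",                        # blanked
        "age": range_age(record["age"]),   # ranged
        "diagnosis": record["diagnosis"],  # analytic content left as-is
    }

# Hypothetical identifiable record.
record = {
    "beneficiary_id": "A123456789",
    "name": "Jane Doe",
    "age": 73,
    "diagnosis": "end-stage renal disease",
}
public = make_public_use_record(record)
```

The same input identifier always yields the same hashed value, so a person's records can still be linked within the file without the original number being shown. Whether such a file is adequately protected depends also on the remaining descriptors, which this sketch does not address.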

"Medicaid" databases also are important research resources. (Medicaid programs are regimes under which the States pay for basic health care for low-income, blind, or otherwise disadvantaged people, using joint Federal–State funds.) Like Medicare data, Medicaid data are administrative and billing records.

Although the data may not be of highest quality and are not fully standardized nationally, they nonetheless provide large amounts of diverse information about the health and health care of millions of patients "in the real world." Sophisticated computer programs allow searching for data on patient age and sex, diagnoses, use of medicines and medical procedures, costs, and other factors. Researchers are allowed access to the data under restrictive conditions. (37)

Health databases useful for research are maintained in many places. (38) In Europe, just to mention a few examples to suggest their variety, they include the 30 Regional Centers of the French Pharmacovigilance System, the Danish Psychiatric Central Register, the Crohn's Disease Register for the Brussels region, and the Prescription Event Monitoring System run by the Drug Safety Research Unit in Southampton. All of these hold personally identifiable data, as they must.

THE INTERNATIONAL FLOW OF DATA

Health data are zipped around the world all day every day, by government research agencies, pharmaceutical firms, academic researchers, and many others. Data on Americans are transferred, American institutions do much data-transferring, and data are transferred for important American purposes.

A great many health data are imported into the U.S., and many are exported. Such U.S. agencies as the Centers for Disease Control and Prevention, working cooperatively in and with many other countries, import personally identifiable data, under safeguards. The National Heart, Lung, and Blood Institute, in joint programs with Canada and European countries, exchanges data internationally, under safeguards. So does the National Cancer Institute.

Huge volumes of clinical-trial data collected in medical centers are transferred all the time, on behalf of companies that develop and manufacture pharmaceuticals, diagnostics, and medical devices, and the National Institutes of Health, and the World Health Organization, and many others working to improve medical "tools." So are drug, device, and vaccine adverse-effect reports, which provide essential feedback.

Thus personally identifiable health-research data are exchanged internationally, for very good reasons, all the time, and inevitably this international data flow will increase. The importance of pressing for uniform international standards for protecting privacy, confidentiality, and security is evident.



Last updated 7/23/97.