The Database Paradox:
Unlimited Information and the False Blessing of 'Objectivity'
For a conference at my college I was asked to think about how my teaching —not my research— would be affected by rapid, cheap, and simple access by computer to all the published literature of the human race. Forget what impediments stand in the way of this hypothetical future and imagine that your campus has the means for you and your students to locate, search, sort, copy, and store anything in digital form that has ever been in print. How would you answer?

I am a computer enthusiast. However, while I find this hypothetical future terribly exciting for research, I do not find it unambiguously good or exciting for teaching.

First, it is well to admit some of the large advantages.

  1. When more full texts are on-line, we could assign what we wanted without regard to whether it was in print or in paperback. Without resort to Kinko's or its competitors, we could assemble a reading list of just the right fragments. Teachers would find new flexibility in designing their syllabi; rich and poor students would find themselves on equal footing for at least one important resource.

    Even if the texts we wanted to assign were not on-line already, we could put them on-line with a scanner for the cost of a student assistant and royalties. Stevens Institute of Technology did this recently with 2700 pages of texts by and about Galileo. There will be copyright problems, but they definitely don't belong under advantages.

  2. Students already know that they can pursue any topics that interest them by using the library. But some barrier of intimidation prevents many from doing so, even when their curiosity is intense. We may hope that some of the advances in information technology will tear down this barrier and open the doors of knowledge to those who have in effect been waiting outside with longing. I suspect that the evolution of this technology is in the direction of ever greater user-friendliness. However, I can imagine serious ergonomic mistakes that would make the barrier even higher than it is now.

  3. Cheap electronic access to information may have a levelling effect on colleges and universities. It cannot make a bad college good, but it can neutralize some of the advantages of a large library or large city. Even if this happens only in a very small degree, small colleges in the country should begin to see more students who would previously have gone to top schools with huge libraries. Mobile faculty will feel more free to live where the quality of life suits their taste and remain fully plugged into the world of their discipline; immobile faculty will have one less grievance.

  4. Similarly, there should be less isolation of campuses. Students and faculty should enter a larger intellectual community than just their own institution, and participate in electronic exchanges, dialogues, bulletin boards, and conferences with like-minded peers from all over the world. In the classroom, this will hurt bad teachers who will be more easily exposed; it should help teachers who are on top of things such that no news from the front will discomfit them.

This revolution will deprive the classroom teacher of the license to bluff with impunity, to fall too far behind new developments, and to control students' information and perspective on the subject. It will reward at least one kind of good teacher: those who sincerely desire that students should become intellectually autonomous.

Some worries

I have as many fears as hopes, most of which could be shared by teachers in any discipline. My discipline is philosophy, and because I teach it in a way that is heavily text-based, I have one fear that could be called provincial, local to disciplines whose work with undergraduates revolves much less around research than around a body of difficult and important texts. Like many teachers of philosophy, literature, classics, and religious studies, I assign work that uses the library chiefly to enhance a student's experience of a primary text. So I would use the hypothesized access to information much less and much differently from colleagues in any of the sciences, who require research papers or literature surveys, and even from colleagues in philosophy who focus less on primary texts and more on battles in the journals.

While on-line texts will permit a new kind of close reading, with multiple windows and fast searches, I fear that an older kind of close reading will fade away. For this purpose, "close reading" is not code for a kind of analysis that some scholars think is methodologically flawed. It is a consequence of being able to hold a well-produced book in one's hands. Something about the fixity, visual clarity, tactile reality, and even smell, of good books puts students (and me) in the attitude of scholarship. Printed text can be beautiful; it can be a work of craft and art that can be appreciated apart from its content. Books have margins for writing in, personalizing, and appropriating. Printed text can be taken under a tree where there is no electricity, and lingered over long after batteries would expire. Not for a long time will computers be as convenient as a book for random access; turning back and forth between two pages for comparison (once one has found the two pages) is as fast as one cares to have it, and no command structure or limited windowing capability stands in the way. And it's silent.

I worry that plagiarism would become easier to perform and more difficult to detect. First, there would be more literature more easily accessible from which to steal. Second, there might well be a data base specially devoted to term papers. Even now there are commercial cheating services that distribute term papers in hard-copy. Third, the chances would increase that the teacher would not know the source from which the student stole. It is possible, but only after considerable lag-time, that text-search algorithms would be so sophisticated and speedy that plagiarism would become easier to detect by the same technology that made it easier to commit. But the suspicious professor, unless she had a vague idea of the source, would have to search across all the relevant data bases, and across all the texts in each.

Perhaps it is just as well that we lack speedy, intelligent text search algorithms. For if we could search all the literature on Aristotle or Kant in useful time to see whether a student plagiarized a phrase, sentence, or paragraph, then we would be plagued with false positives. In any huge data base on a limited topic, there will be real coincidence of thought and language that we could not attribute to plagiarism.

I worry about copyright problems. Copyright problems will either prevent the information access we are hypothesizing, or they will cramp and distort it. Publishers will not be willing to create such easy access that they cannot collect royalties. But this is not an argument for dispensing with copyright. Both publishers and authors need copyright protection; and if they do, then readers certainly do, and hence, so do scholarship and inquiry. I would not want to participate in an information system that created such disincentives for publishers that they published less high-risk and small-market literature. But if information access evolves in a way that respects intellectual property, then access will never be perfectly free and easy.

(I hope a solution is found that is analogous to that once proposed and partially implemented for videotape. Rather than limit the technology of free copying and wide distribution, a surcharge is put on blank tapes, which is divided among the copyright holders. I would gladly pay a surcharge on blank disks or telephone service if the result were a truly free and decentralized world of information.)

I worry that students will begin to think, "If it's not on-line, then it doesn't exist". Or "...then it isn't worth hunting down." On-line searching is already much easier and faster than print searching. When there is more full-text for which to search on-line than there is in any single print library, then the temptation to begin and end every search at the terminal will be irresistible. This will not matter in the hypothetical future in which all text ever printed is on-line. But in the more plausible futures, it will matter very much. It will make entire categories of literature invisible to scholarship, at least student scholarship: old works other than frequently reprinted classics, works in ideographic languages, works in non-European languages, heavily illustrated works, ephemeral works other than newspapers, works with little or no appeal to the businesses whose subscription fees keep on-line data bases alive, and works whose content, format, or orientation makes them unappealing to the purveyors of standard disciplinary data bases. No one can pretend that this list only includes "unworthy" literature.

I am lucky that I specialize in works written in European languages that contain very little mathematical notation. They are easily digitized. But I am unlucky that neither government nor business has much use for an on-line data base in philosophy. The funding that digitized the philosophical literature already on-line was chancy, and we cannot be sure that our work will continue to be digitized in the future. Chemists, lawyers, doctors, and petroleum engineers will never face this problem. As it is, the English language philosophy journals written before 1940 have not been digitized and may never be. But by the nature of the discipline, we cannot say that essays on Spinoza written in 1930, or 1830, are less worth reading than essays written today. Students who limit their searches to what is on-line will truncate the field. After a few generations of this, in which scholars who truncated their field as students become teachers and lead the next generation to do the same, the field may be permanently stultified. Contrary to the claims of a few contemporary philosophers, ancient philosophy is not obsolete and modern philosophy is not rootless; but technology and a certain lazy faith in its sufficiency may change all that.

Confusing information with education

I worry about how students will cope with the enormous number of citations, abstracts, and full-text articles that will be at their finger-tips. Every scholar needs filters, guides, and criteria for sorting and selecting from among that ocean of literature the pieces most likely to be relevant to the topic or task at hand. What filters will students use?

Teachers can function as filters, although they are not the only filters, and are not perhaps even the best filters. Some of the worst filters are authorities who are not teachers, who simply tell their followers how to carve up the expanse of literature and how to interpret it. To use that kind of filter is the negation of liberal learning. What students need are filters that are authoritative but not authoritarian.

Imagine that you want to know whether human beings are products of a separate divine creation or evolution by natural selection. You are diligent, so you go to the libraries and the data bases. You go in good faith. You are an 18 year old college freshman.

You encounter two enormous bodies of literature. You can't read it all, but it's obvious that each side seems to have a complete explanation of the phenomenon of human beings. Each objects profoundly to the other position and seems to have completely satisfactory answers to its objections. Each even has guides or criteria for wending one's way through the enormous literature on the question. You need an authority to tell you which of these enormous bodies of literature is more trustworthy than the other. In the absence of a proper authority, you will use whatever authorities already have influence over you.

What filters do we scholars use in our own research?[Note 1]

In part, we use publishers. This includes not only book and journal publishers, but also data base publishers. We pay them to be discriminating, even if we resent their decisions when we are up for tenure. How do they discriminate? What are their filters?

Most use peer review. But who are peers? The dismal answer is: peers are people like ourselves. The circle has closed quite quickly. Peers are other people with strong opinions about what is good literature and what is not, people with whom we might agree or disagree. Seeking an authoritative filter is not the same as seeking objectivity. We cannot escape our own role in peer review. The closest we can come to objectivity is convergent subjectivity.

This shows at least that there are journals that have an orientation, a standpoint, an ideology, a methodological affiliation. There are movements in each discipline that have their own journals, or even 50 journals. We don't recommend some of them to our students because we know what movement they represent and we don't belong to that movement. We may be wise and discriminating or ignorant and bigoted in these judgments, but we are acting as filters when we choose in this way what to recommend to our students.

Sometimes we use citation analysis as a filter. But we know that a good article may be cited rarely or never, and that bad articles may be cited continually. If we are already masters in our discipline, we know why this happens. Some of it is chance. Some of it is the popularity of the author's topic at the time of publication. If not especially hot at the time it appears, it might be buried forever even if it is true and important. Or it might be buried for 20 years until the world comes round to recognizing why it is true and important. Some of it is the reputation of the author, which may rise and fall capriciously. Some of it is the prestige of the author's school or movement, subject to the same vagaries. Some of it is the size of the journal's circulation. Some of it is the cost of the journal and whether libraries can afford it. Finally, part of the explanation lies in the fads and research interests of grant agencies. These are not always a function of what is most worth knowing. Sometimes they are a function of the superstitions of billionaires, quirks of tax law, or what generals think would be useful in warfare.

If we are disciplinary masters, we can evaluate all this. We know the forces at work in determining what gets published, and what gets prominent among what gets published. We use this knowledge to make intelligent filtering judgments about what we should read in our own research and about what we should recommend to students.

If I have a controversial claim to make it is this: undergraduate students do not possess this knowledge. They lack the mastery of a discipline. Hence they lack the one filter that is authoritative without being authoritarian. The only authentic filters or maps to the vastness of available information are the disciplines, not anything accessible to the fledgling undergraduate such as serendipity, the 'neutral' methods of librarianship, the imprimatur of authorities, or the handbooks of cults.

Undergraduate college is where students first learn the mastery of a discipline. For my purposes here, a 'discipline' includes interdisciplinary and multidisciplinary programs of study. Students lack this mastery before they get to college, possess it after college, deepen it in graduate school, but first acquire it as undergraduates.

Disciplines are the only authoritative filters because they are based on knowledge, not authority. Peer review is the best we can do, despite its fallibility and subjectivity, because it is based as much as possible on disciplinary mastery. But since there are ideological divisions among disciplinary experts, peer review harbors the risk of suppressing heterodox and revolutionary new work. Disciplines are most likely to survive this risk in a free society with a wide and robust field of publications in which new work needn't appeal to the filters of the old regime.

It follows that work validated by peer review will be both ideologically divided and voluminous. Hence, we will need another filter before we decide what to read for our own research and what to recommend to students. Again, we must use our disciplinary mastery for this because we have nothing else that is not simply authoritarian. If we take our filter from another source, we are not navigating freely or reliably through the wilderness of information. We will be controlled by that authority.

In Plato's Meno, Meno challenged Socrates with this dilemma. Why should we look for the truth? For if we know it already, then the search is superfluous; and if we don't know it already, then we wouldn't recognize it if we should ever find it. Our reflections show us how to revise Meno's dilemma for the information age. If disciplines are the only authoritative filters and undergraduates are just acquiring a discipline, then how can a titanic data base serve undergraduate education?

The answer is that for those of us who are already disciplinary experts in something, it is too late. We may find the titanic data base essential for our research, but it is too late to give us that first mastery of a discipline. But if we do not already possess that mastery, then it is too early. We will not have the one authoritative filter which allows us to make intelligent judgments. (This may explain, incidentally, why Dialog is in great demand at graduate schools but has had trouble, despite its deep attractions and utility, persuading undergraduate colleges to make fuller use of it.)

Meno's paradox was summed up very well by a student of mine who wrote in a paper, "If we don't understand other people, then how will we ever understand them?" If we don't know how to judge what we find in a titanic data base because we lack disciplinary mastery, then how will we ever acquire disciplinary mastery from the data base?

Socrates called Meno's dilemma a "trick argument"; I don't. The modern version applies just as much to libraries and textbooks as it does to data bases. Students adrift in a large library know they need help, but not because they fear they might run into interest-driven and methodologically-riven scholarship. But students adrift in a large data base will probably feel even less need for help. The small screen and game-like search commands prevent the sense of being lost; the aura of the computer deflects the fear of being duped. The risk of dupery is worrisome precisely for those students who don't worry about it.

My largest worry is that students will come to confuse information with education. It is my largest worry not because it is the most likely; I think good teaching can overcome it. It is the largest because it is the most grave. The universe of easily accessible information will put a new burden on the classroom teacher to make the distinction between information and education vivid for students. Today there is barely time in bibliographic instruction to get beyond search strategies to the fine points that permit self-moving scholarship. In the hypothetical future practically all our bibliographic instruction time will have to be spent on disciplinary methods for assessing information and bibliographic leads.

It is possible that my worry is groundless, and that the advent of very wide, very cheap access to huge data bases would put information in perspective and lead students to look for the wisdom to appraise and digest it. However, I suspect that this is only a long-term development, at best, and that in the short term the niftiness of it will overwhelm the comparative triviality of it, for many disciplines, and lead almost everyone to overestimate the net gain.

The ultimate question in education, certainly in my field of philosophy, has never been access to information; it has always been wisdom or the capacity to judge information and to construct knowledge and derive action from those judgments. Access is crucial, however, for almost all the sub-ultimate goals of education, including the important political goals of wide and roughly equal distribution of resources. When we get real access, and get the mother lode, we are likely, temporarily, to make the secondary goals of education primary and forget that the primary goal is not served by our new darling.

I worry that there will be a proliferation of junk data bases, a difficult category to define, since one person's propaganda is another's writ, and one person's scholarship is another person's ideology. It will be in the interest of each cult and movement to have its own data base, just as it will be in the interest of every valid micro-specialization of research. How will students distinguish a large data base of literature on creationism from one equally large on super-conducting ceramics? It is very easy for students to think that, if only one were interested in these esoteric subjects, good information is at hand, and that every pile of citations is as reassuring as every other. If students know that creationism is unscientific, or controversial, they might be armed with useful doubts; if they don't, they are liable to serious deception. How will students evaluate a data base on European history, with no give-away title, that happens to be thick with articles denying that the Holocaust occurred and remarkably thin on the contrary position?

Students (and, I believe, some librarians) who are not already educated in a field cannot distinguish the wheat from the chaff in the data bases in that field, but frequently think that they can or that it is not necessary. Students already assume too hastily that access to information is the passport to objectivity, not realizing that ideological divisions require the information to be judged before it can be made useful. They will have the same problem with the diversity of data bases that they have today with the diversity of journals. And although on-line guides or filters will offer assistance, they will suffer from the same diversity of perspective and quality as the information they purport to organize.

I worry that many students will not be prepared to browse in specialty data bases. By lacking the mastery of the discipline they are trying to acquire, they will lack the background knowledge and doubts necessary to distinguish ideology from scholarship. But I am also worried about the opposite, that some students will enter specialty data bases with unjustified doubts. For green undergraduates, arcane science and arcane superstition are often on a par. Sometimes this equation lends false legitimacy to superstition, and sometimes it sheds false doubt on science.

Here we face the paradox again. For the only way to escape this situation is to become educated. The equipollence of science and superstition is a barrier to education that is only removable by education. Those already educated in a field can make responsible (not infallible) judgments about what is ideology and what is scholarship in that field. This is part of what it means to be educated. Those who are still in process and not yet masters of the discipline they are acquiring cannot make such judgments. This makes the acquisition of education paradoxical, but the paradox does not make education impossible. For the student caught in the pincers of this paradox, help does not lie in more thorough indices, better keywords, or search skills.

Nor does it lie in general librarians, as opposed to those who are disciplinary specialists. General librarians are as lacking in disciplinary expertise as the undergraduate student. They can help find anything, but they can help judge very little.

One solution, tempting at first glance, is to avoid specialty data bases and stick to the biggest, "standard" data bases for a given field. However, this overlooks the presence of good material in small data bases, and of bad material in the major data bases. Students would have the impression that the "standardness" of the major data bases somehow vouches for the literature they contain. This is false and misleading. Standard journals publish articles of varying quality and varying ideological perspectives; standard data bases stir the pot further by including journals of varying quality and varying ideological perspectives.

There are two problems here. First, there are honest disagreements among scholars, usually traceable to prior methodological differences. Each 'school' or 'paradigm' has its own journals now, and may well have its own data bases soon. Second, there are the junk data bases. For present purposes it does not matter whether a firm distinction between these categories can be sustained. Nor need we impugn the good faith of writers in the second category. For in a large country there are undoubtedly honest astrologers, anti-semites, and Ph.D. researchers who think that smoking is harmless or that women are congenitally less able than men in mathematics. But when they promote their views in the "marketplace of ideas," and when the marketplace moves onto every desktop, then they set a task for the classroom teacher who wants to empower students to recognize and reject pseudo-science and pseudo-objectivity.

I certainly prefer that all literature should be available than that it should be controlled, even by scholars whom I personally get to nominate. In one sense, it is all available today, and controlled only by the difficulty of using a library. But this difficulty is considerable. It is almost impossible to walk into a print library and lay your hands on 500 citations to work on a topic you know nothing about. When most literature is on-line, this control will lift, and the fake blessing of objectivity associated with computers will touch all of it. So we will experience the problems of naive and indiscriminate citation more then than now. That is not a ground for resisting the advent of information access; it is a call to teachers to watch the scene and anticipate a problem.

One might think that the aura of objectivity associated with computers will undoubtedly decay as the next generation grows up. But I am not so sure. Today's students were raised with print, but that has not demystified print information for them. They do not distinguish among professional journals until they are taught to do so, and are still naive in their deference to judgments that find their way into print.

I worry that bibliographic instruction will overlook the potential for confusing information and education, and will one-sidedly lead students to glory in the riches available to them. The glory will be justified; its one-sidedness will not. The problem will be ignored, and even aggravated, if library use, bibliographic instruction, information access —whatever one wishes to call it— is conceived as technical training in computer use, data base dialing, search commands, and the mysteries of keywords. It will also be ignored if it proceeds by universal rules, independent of disciplinary methods and differences. If non-disciplinary techniques sufficed beyond the most elementary level, then good librarians, simply using good librarianship, would be able to judge all work in all disciplines.

In one sense the task for teachers will not change, since today we teach judgment more than facts, and the assessment of claims more than access to them. The qualities of mind taught in my discipline, and in the other disciplines traditionally grouped among the liberal arts, including the physical sciences, will be needed more than ever to preserve a clear sense of the primary objectives of education, with all their terrible human difficulties and complexity, in the face of the increasingly spectacular secondary services. If we have not put the distinction between assessment and access in the foreground of our teaching, the reason is that the competition with education from mere information has not yet called us to do so. We should prepare to be called.

In short, nothing will change except that the temptations to resist will become nearly irresistible. There will be many more painless ways to teach badly, more excuses for those who want them, more dazzling distractions to the mission of liberating students from ignorance and cultivating their competency to judge for themselves.

It is possible that the universe of available information will become so large that some teachers might actually control their students' command of it the way the mapmaker controls the explorer in the wilderness. We must resist this temptation: our interest should not be to limit the alternatives that students see, but to make sure that they are able to make responsible judgments. We must educate students for intellectual autonomy, not discipleship, so they can navigate for themselves in the wilderness of information. We must present our considered views, of course, but for an audience that more and more will judge them in light of their alternatives. Enhanced access to information will make us comic figures if we present our own views as if our critics were silent. Educating students for autonomy is not as easy as typing keywords at a terminal and catching a cascade of citations in a basket; it requires real teaching.


