Boolean and Free-text Searching for Legal Research

Richard Delgado and Jean Stefancic, in “Why Do We Tell the Same Stories?: Law Reform, Critical Librarianship, and the Triple Helix Dilemma” (42 Stan. L. Rev. 207, 214 (1989)), wrote that “Some scholars note that the inability of lawyers to follow the development of the law either nationally or locally threatened stare decisis because of the ‘enormous’ and ‘unrestrained quantity’ of competing reporters, which ‘discouraged research and inevitably led to a conflict among authorities.’”

Boolean and free-text searching are tools used in computer-based searching. “Full-text searching enables a researcher to search for every occurrence in the database of any word or combination of words without a pre-existing index.”[1] “Boolean logic is a syntactical calculus used for the comparison of data items (words and numbers) and combinations of data items. . . . The power of the Boolean search is the ability to match items that have a specific relationship within a document.[2]

In a full-text system, such as Lexis or WESTLAW, the use of these conjunctions allows the researcher to create a context – to specify a relationship between the terms for which the researcher is searching.”[3] Although these systems have been praised because they do not rely on a pre-coordinated index, they have also been criticized because they do not provide the novice researcher (a researcher unfamiliar with the conventions of database searching or unfamiliar with the subject he is seeking to research) with the tools to obtain all of the information he may need on a particular area of law.[4] It is important to consider the mechanics of computer-based research in order to understand why it is not well suited to retrieving legal concepts. “Information in legal databases is organized by words [which are] … placed in a massive alphabetized list, and [their] location … noted; this is called the concordance … the computer essentially compares the words in our request to the concordance, and notes the documents that have the word combinations we have requested … There is no discernible framework … There is no overriding organization of concepts and rules. Searching for concepts and rules is something that computers are notoriously poor at doing.”[5]
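The concordance mechanism described above can be illustrated with a minimal sketch. It is not how Lexis or WESTLAW are actually implemented; the sample documents and the function names `build_concordance`, `search_and`, and `search_or` are hypothetical and chosen only for illustration. The sketch builds a word-to-document list and answers Boolean AND and OR requests purely by comparing words.

```python
import re
from collections import defaultdict


def build_concordance(documents):
    """Map every word to the set of document ids in which it occurs."""
    concordance = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            concordance[word].add(doc_id)
    return concordance


def search_and(concordance, *terms):
    """Documents that contain every one of the requested words."""
    sets = [concordance.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_or(concordance, *terms):
    """Documents that contain at least one of the requested words."""
    return set().union(*(concordance.get(term.lower(), set()) for term in terms))


# Hypothetical sample texts; documents 1 and 2 describe the same legal
# situation in different words.
documents = {
    1: "The buyer rejected the goods for lack of conformity.",
    2: "The purchaser refused delivery because the merchandise was defective.",
    3: "The seller delivered the goods late.",
}

concordance = build_concordance(documents)
and_hits = search_and(concordance, "buyer", "goods")
or_hits = search_or(concordance, "buyer", "seller")
print(sorted(and_hits))
print(sorted(or_hits))
```

In this sample, the request for "buyer" AND "goods" finds document 1 but misses document 2, which expresses the same concept with "purchaser" and "merchandise". The program matches words, not concepts, which is the limitation the quoted passage points to.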

Whether one is a supporter or a critic of Boolean or free-text searching, neither approach should be considered the final and most effective tool for creating a uniform information retrieval methodology for law. Free-text searching assumes a certain level of knowledge of the terminology that must be used in the search. In most applications it has not been made to handle synonyms or to take account of the legal background of the user (who may be using domestic terminology familiar to him or her).[6] These search mechanisms are useful in a national law context, because the framework for the law is already understood, and terms can be used with a level of confidence[7] and security that they will produce complete and relevant research results. In the context of international sales law, a uniform terminology that represents legal concepts for the purposes of searching must still be created.[8]

For legal information retrieval, at least in some areas of the law (especially international law), an index (based on the terms in the Thesaurus) should be incorporated into search interfaces to allow the user to see and use the framework that has been created for the law.[9] Law librarians have recommended combining Boolean searching with editorial features such as indexing.[10] Possibly, a “mark-up language,” e.g., legal XML,[11] could be used to incorporate the relationships established in the Thesaurus to ensure high recall[12] of relevant[13] documents. Whichever alternative is adopted, computer-assisted legal research in its present form does not justify the abandonment of the pre-coordinated index.
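One way a thesaurus could be combined with word-based searching is query expansion: each search term is widened to include its related terms before documents are compared. The sketch below is illustrative only. The `THESAURUS` entries, the sample documents, and the helper names `expand` and `matches` are assumptions made for this example and are not taken from the International Sales Law Thesaurus or from any commercial system.

```python
import re

# Hypothetical thesaurus: a preferred term mapped to its related terms.
THESAURUS = {
    "buyer": {"purchaser"},
    "goods": {"merchandise"},
}


def expand(term, thesaurus):
    """Return the term together with every related term in the thesaurus."""
    term = term.lower()
    related = {term} | thesaurus.get(term, set())
    # Also look the term up among the variants, so that searching for
    # "purchaser" finds "buyer" as well.
    for preferred, variants in thesaurus.items():
        if term in variants:
            related |= {preferred} | variants
    return related


def matches(text, query_terms, thesaurus):
    """True if, for every query term, the text has the term or a related term."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(words & expand(term, thesaurus) for term in query_terms)


documents = {
    1: "The buyer rejected the goods for lack of conformity.",
    2: "The purchaser refused delivery because the merchandise was defective.",
}

query = ["buyer", "goods"]
plain_hits = [d for d, text in documents.items() if matches(text, query, {})]
expanded_hits = [d for d, text in documents.items() if matches(text, query, THESAURUS)]
print(plain_hits)
print(expanded_hits)
```

With an empty thesaurus only the document using the searcher's own words is retrieved. With the thesaurus applied, the document that uses different terminology for the same concept is retrieved as well, which is the gain in recall that an editorially built vocabulary is meant to provide.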

Notes

1. Robert C. Berring, Full-Text Databases and Legal Research: Backing into the Future, 1 High Tech. L. J. 27, 28 (1986).

2. Boolean combinations of descriptors can also exist. Free-text searching can independently function without Boolean operators.

3. Daniel D. Dabney, The Curse of Thamus: An Analysis of Full-Text Legal Document Retrieval, 78 Law Libr. J. 5, 17 (1986) (“In full-text document retrieval, there is no human subject indexing”).

4. Dabney reports as follows on the current Lexis and Westlaw approach: “Both LEXIS and WESTLAW rely almost exclusively on the ability of the systems to recognize words supplied by the user. The difficulty with this approach is that there is an imperfect correspondence between words and ideas.” Op. cit. at 17. Because many judges and practitioners are not likely to use exactly the same words to describe concepts or ideas, West Publishing has tried to compensate by creating a “Full-Text Plus” system. “This system refers to the fact that the WESTLAW database contains the full text of cases plus the same text of headnotes and Digest summaries printed in the National Reporter System. West posits that this addition introduces ‘normalized’ language because the trained editor has again entered the picture. The uniform language in the headnote and syllabus are supposed to compensate for the imprecision of the judicial author. Thus, the searcher can formulate a search strategy knowing that his search phrase will be matched up both with the text of the judicial opinion and with the ‘normalized’ language introduced by West editors in the headnotes and case synopsis.” Id. at fn 68.

5. Barbara Bintliff, From Creativity to Computerese: Thinking Like a Lawyer in the Computer Age, 88 Law Libr. J. 338, 346 (1996).
“LEXIS and WESTLAW have begun to develop concept-based systems and have introduced ‘natural language’ search interfaces as a step in this direction. We now have Freestyle and WIN, respectively. Natural language moves towards a conceptual search system, with a list of thousands of commonly used legal phrases indexed in addition to words. But natural language requires a complex search interface, which substitutes a series of mechanical judgments for our decision-making process. The computer program ‘identifies’ the ‘concepts,’ which are basically nouns or legal phrases, in the search request, and matches them against its inventory of words and legal phrases. The program identifies other documents with the same concepts and ranks its findings by statistical relevance – primarily by the number of times the concept occurs and how close to the beginning of the document it first occurs.

Like other computer searches, sometimes the results of natural-language searches are extraordinary, and sometimes they are worthless; usually they are somewhere in between. In any event, your ability to think in computerese and the underlying logic of the computer program determines the outcome of your research. This isn’t the bias-free, untouched-by-human-hands results we expect of a computer, for many decisions are made for you by the computer program. Furthermore, many programmers are convinced that a better search, even for conceptual information, can be crafted using the Boolean techniques. One developer of CD-ROM-based legal materials stated that natural-language searching compared to Boolean searching is like using an automatic transmission versus a stick shift. ‘You don’t need to know anything about transmissions to drive an automatic, but all the race cars have stick shifts.’” Russ Armstrong, CD-ROM v. Law Books, Law-Lib Discussion List (Jan. 8, 1996).

6. WESTLAW now gives its users the option of checking a thesaurus of “Related Terms” when conducting a search. It therefore permits its users to search with broader terminology, increasing the chances of retrieving relevant information. Although WESTLAW does not currently account specifically for the domain, i.e., the terminology, of international sales law, it is the sort of technology into which the International Sales Law Thesaurus could easily be incorporated.

7. This confidence is probably unjustified. “Several extensive studies have clearly documented a false sense of security on the part of computer researchers. One study commented that users felt that ‘because the source is ‘technological,’ they are finding everything or, at the very least, finding the best materials. … We have suspended our sense of disbelief when it comes to computers.” Bintliff, supra note 5, at 349, quoting F.W. Lancaster et al., Searching Databases on CD-ROM: Comparison of the Results of End-User Searching with Results from Two Modes of Searching by Skilled Intermediaries, 33 RQ 370, 382 (1994).

8. As Professor Germain puts the problem and a solution: “. . . Search engines are essentially of two kinds, human-mediated ‘intellectual’ indexes and ‘robot’ or automated indexes. In the intellectual indexes, individual web sites are classified by hand according to various classification schemes . . . ‘Robot’ or automated indexes use programs that download every page … so that every word on every page can be indexed by a … search engine … An April 1998 study by the journal Science concludes that search engines are not thorough in finding relevant documents, because they each only index a fraction of the total documents available … The lesson is not to rely on just one engine . . .” Claire M. Germain, Content and Quality of Legal Information and Data on the Internet with a Special Focus on the United States, 27 Int’l J. of Legal Info. 296 (1999) [citations omitted]. For more on difficulties associated with “intellectual” indexes and “robot” or automated indexes, see Section 6 of Graham Greenleaf et al., Moving Access to Law into the 21st Century <https://www2.austlii.edu.au/~graham/AALS/Restatement-A.html>.

9. For example, and recognizing that the analysis of any CISG Article should combine the actual CISG Articles, case law, legislative history and scholarly commentary, the Pace database provides “Annotated Text Pages” that seek to integrate all of this information for each CISG Article at one source.

10. Dabney, supra note 3 at 34 (“The addition of good human indexing to CALR data bases is a promising approach to the problem of improving retrieval performance in such systems . . .”).

11. See Legal XML Standards Development Project at <https://www.legalxml.org>.

12. Recall is the percentage of the total number of relevant documents in a database that are retrieved by the search being studied. See Dabney, supra note 3, at 15.

13. Relevance is the relationship between a question and a document that makes the document important to the person researching the question. Id. Dabney points out that as recall goes up, relevance goes down, and vice versa. This is a problem inherent in most CALR systems. Id. at 16.
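The trade-off described in notes 12 and 13 can be made concrete with a small calculation. The sketch below uses invented document sets and assumes that the "relevance" of a result set is measured as precision, that is, the share of retrieved documents that are relevant. That reading is the standard information-retrieval measure paired with recall and is an assumption here, not a definition taken from Dabney.

```python
def recall(retrieved, relevant):
    """Share of all relevant documents that the search retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Share of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical database in which documents 1 to 4 are relevant to the question.
relevant = {1, 2, 3, 4}

narrow_search = {1, 2}             # few results, all of them on point
broad_search = {1, 2, 3, 5, 6, 7}  # more relevant hits, but also more noise

for name, retrieved in [("narrow", narrow_search), ("broad", broad_search)]:
    print(name, recall(retrieved, relevant), precision(retrieved, relevant))
```

In this invented example the narrow search retrieves half of the relevant documents with no irrelevant ones, while the broad search retrieves three quarters of the relevant documents but half of what it returns is irrelevant.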

About the Author/s and Reviewer/s

Author: international

Mentioned in these Entries

Free internet US legal research, How to search legal journal indexes?, Law books, Law library, Legal Research, Legal Trac, Legal concepts, Legal research: Files: Law and Media Technology, Legal research: resources for libraries, Lexis, National Reporter System, Thesaurus.

