Presentation of Competency E

Competency E: design, query, and evaluate information retrieval systems.

Introduction

“The goal of an information retrieval system is to help the user find good documents” (Wolfe & Zhang, 2009).

Manual information retrieval (IR) systems existed at least as far back as 3000 BC, when the Sumerians developed classification systems that helped to identify tablets and their contents. Being able to retrieve relevant information remained a crucial part of managing collections over the following centuries, as scrolls and the codex were invented and improved (Singhal, 2001). Manual IR, or classification, is discussed in Competency G.

With the advent of computers, automated IR became a real possibility. The idea was first described by Vannevar Bush in 1945, in the article “As We May Think,” and has been modified, expanded, and developed by others over the years (Singhal, 2001). Current manifestations of automated IR include global search engines such as Google and Yahoo, as well as local engines for desktops, databases, websites, social media, and OPACs.

In addition to the goal in the quote above, an automated IR system should also find and sort information in a way that prevents information overload. Too much information is just as problematic for the searcher as too little, and poorly sorted results only compound the problem.

To make searching easier for the user, an IR system should ideally have an easy-to-use interface, be easy to learn, provide a query system that finds relevant information and offers spelling corrections, sort the search results sensibly, and display enough metadata or other information in the results for users to judge whether any particular result meets their information needs.

Design
IR systems vary greatly in their design. The more complicated systems may offer more powerful searching capabilities and more refined results, but they are harder to learn. LexisNexis is an example of a complicated but useful tool that can search specific types of laws or statutes, bring up related legal papers and cases, and help researchers determine whether there is precedent for a particular legal argument. Learning to use LexisNexis is initially overwhelming, however, and can scare some people away, so it requires considerably more user investment and library training classes. At the other end of the spectrum, simpler IR systems like Google are easier to learn and teach, even though the results might not be as refined.

Querying
Boolean searching is perhaps the most basic form of searching in IR systems, using connectors such as AND, NOT, OR, and NEAR to include or exclude certain kinds of information from the results. Many systems provide drop-down menus that help the searcher phrase a query; this is more user-friendly than requiring the searcher to type the Boolean keywords themselves, though in either case it assumes the searcher understands how Boolean searches work. More modern and easier-to-use systems can interpret natural language queries by dropping articles, choosing key words, and retrieving results according to what they guess the user needs. It is particularly useful if the system can detect misspellings in queries and offer helpful suggestions.
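To illustrate how these connectors behave behind the scenes, the following is a minimal sketch in Python, using made-up sample documents, of how a system could translate AND, OR, and NOT into set operations over an inverted index. A real system such as LexisNexis or Google does far more than this, including phrase and NEAR handling, but the basic idea is the same.

    # A minimal sketch of Boolean retrieval over a toy inverted index.
    # The sample documents here are hypothetical.

    documents = {
        1: "library cataloging and classification systems",
        2: "search engine query processing",
        3: "library search interfaces and query design",
    }

    # Build an inverted index: term -> set of document IDs containing it.
    index = {}
    for doc_id, text in documents.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)

    def postings(term):
        """Return the set of documents containing a term (empty if unseen)."""
        return index.get(term, set())

    # Boolean connectors map directly onto set operations:
    #   AND -> intersection, OR -> union, NOT -> difference.
    print(postings("library") & postings("search"))   # AND: {3}
    print(postings("library") | postings("query"))    # OR:  {1, 2, 3}
    print(postings("search") - postings("library"))   # NOT: {2}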

Evaluation
IR systems are evaluated based on their accuracy of searching, their sorting capabilities, and their organization of results. Google has a proprietary algorithm that ranks and sorts results by their popularity, apparent authority, and user browsing and searching habits. For some searchers, this is useful; however, the algorithm has been exploited, leading to inaccurate, misleading, or spammy results. Other systems rank results by the number of times a word or combination of words has been used, which has proven to be an inefficient and inaccurate way of searching and ranking. Some academic IR systems rank results according to the number of times they have been cited in other articles or papers, which is helpful for scholars who want to see which articles are the most “authoritative” in the community.
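As a rough illustration of the word-count approach mentioned above, the following Python sketch, again with hypothetical documents, ranks results by how many times the query terms appear in each document. It also hints at why raw counts alone can be a poor ranking signal: they reward repetition rather than relevance.

    # A minimal sketch of ranking by raw term frequency. Real engines
    # combine many more signals (popularity, citations, user behavior).
    from collections import Counter

    documents = {
        "doc_a": "rare books rare manuscripts special collections",
        "doc_b": "special collections reading room hours",
        "doc_c": "rare books catalog",
    }

    def term_frequency_score(query, text):
        """Count how many times the query terms occur in the document text."""
        counts = Counter(text.split())
        return sum(counts[term] for term in query.split())

    query = "rare books"
    ranked = sorted(documents.items(),
                    key=lambda item: term_frequency_score(query, item[1]),
                    reverse=True)

    for name, text in ranked:
        print(name, term_frequency_score(query, text))
    # doc_a scores highest simply because "rare" repeats, illustrating how
    # raw counts can favor repetition over actual relevance.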

Competency Development

Most of my experience with Competency E has been from the user side, either searching for information myself or helping someone else learn how to query various IR systems. Years of being a student and working in various libraries have given me experience searching the OPACs of a variety of libraries, Google and Google Scholar, EBSCO, LexisNexis, and other database and journal search systems. They all have their strengths and weaknesses. For example, it was difficult to learn how to search for legal documents in LexisNexis, so I learned how to find material in Google Scholar and then plugged the specific document titles into LexisNexis to bring up the appropriate documents. Similarly, looking for material in Horizon, an ILS, was much more difficult than looking things up through an AquaBrowser or similar OPAC interface and then plugging the exact terms into Horizon. I frequently used a combination of resources to make the most of the knowledge contained in the various databases.

Evidences

While I have not had much experience creating automatic IR systems, I did gain an appreciation for how much thought and effort goes into creating a good database and search query form when I created a very simple database for cat food. I explained the learning process of organizing information and making it findable in the “Work Log and Evaluation” section.

I evaluated components of the Penn State Special Collections’ two IR systems to determine whether they helped people find good information and whether they were easy to use. Unfortunately, both IR systems were difficult to use, so I made recommendations for improving the user search experience.

Finally, I looked at Cambridge University’s and the University of Chicago’s special collections departments and how easy their IR systems were to use. Under the headings “Cambridge University Library General and Rare Books Organization” and “University of Chicago Library General and Rare Books Organization,” I compared and contrasted the usefulness of both libraries’ IR systems.

Concluding remarks

Information retrieval systems make it possible to store vast amounts of information and still find the important documents, files, or posts. Many online databases, media sites, search engines, and even social media tools have built-in IR systems to make it easier for searchers to sort through the large amount of information published every day. Proper categorization, organization, and a well-designed user interface make the task easier. It is therefore important to evaluate IR systems periodically to determine whether there are areas that could be improved.

Attachments

Cat food database

Evaluation of Penn State Special Collections’ information retrieval system

Comparison between University of Chicago and Cambridge University’s information retrieval systems

References

Bush, V. (1945, July). As we may think. Atlantic Monthly, 176, 101-108. Retrieved from http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/

Singhal, A. (2001). Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin, 24, 35-43. Retrieved from ilps.science.uva.nl/Teaching/0405/AR/part2/ir_overview.pdf

Wolfe, S., & Zhang, Y. (2009). User-centric multi-criteria information retrieval. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. doi:10.1145/1571941.1572144. Retrieved from ti.arc.nasa.gov/m/profile/shawn/sigirpp1088-wolfe1.pdf

Last edited on October 15, 2012