1. What are some of the reasons that might warrant the need to use a search system on a website?
A search system is warranted when a site has a large amount of content, or content so diverse that browsing alone becomes impractical. Search also helps users cope with fragmented sites, where related content is spread across separate sections, because it makes it much easier to find content relating to a specific topic.
2. Why is an Information Architect interested in search systems?
Information Architects are interested in search systems because they understand how search systems function and can configure them (for example, deciding what content gets indexed and how results are displayed) so that the system produces the most accurate results possible.
3. Describe the core components of a search engine.
The search interface is where users enter their query. The search engine is the component that actually searches through the content. The content is the collection of all the information the search engine must search through. The results are the items the search engine returns to the user in response to the query.
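A minimal sketch of how the four components fit together, assuming a tiny in-memory collection and a simple word index. The sample documents and the names `CONTENT`, `build_index`, `search` and `search_interface` are hypothetical; real search engines are far more elaborate.

```python
from collections import defaultdict

# Content: the collection of documents the engine must search (hypothetical sample data).
CONTENT = {
    1: "search systems help users find content",
    2: "browsing and navigation systems",
    3: "precision and recall measure search results",
}


def build_index(content):
    """Search engine, step 1: map each word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in content.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Search engine, step 2: return IDs of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


INDEX = build_index(CONTENT)


def search_interface(query):
    """Search interface: take the user's query and turn matching IDs into results."""
    return [CONTENT[doc_id] for doc_id in search(INDEX, query)]


print(search_interface("search results"))
# ['precision and recall measure search results']
```

The index is built once, ahead of time, so each query only has to look words up rather than scan every document.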
4. What is a search zone? What are the approaches for creating search zones?
Search zones are subsets of a site's content that are indexed separately from the rest, so that a search can be limited to one zone and exclude irrelevant content. Search zones can be created by segregating documents based on their content or by assigning documents logical tags and then searching only the documents that carry a given tag.
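A minimal sketch of the tagging approach, assuming each document carries a single zone tag. The documents, zone names and function names are hypothetical.

```python
# Hypothetical documents, each tagged with the zone it belongs to.
DOCUMENTS = [
    {"title": "Return policy", "zone": "support", "text": "how to return a product"},
    {"title": "Product catalog", "zone": "products", "text": "product list and prices"},
    {"title": "Press release", "zone": "news", "text": "new product launched today"},
]


def build_zones(documents):
    """Group documents by their zone tag so each zone can be searched on its own."""
    zones = {}
    for doc in documents:
        zones.setdefault(doc["zone"], []).append(doc)
    return zones


def search_zone(zones, zone, term):
    """Return titles of documents in one zone whose text contains the term."""
    term = term.lower()
    return [
        doc["title"]
        for doc in zones.get(zone, [])
        if term in doc["text"].lower().split()
    ]


ZONES = build_zones(DOCUMENTS)
print(search_zone(ZONES, "news", "product"))  # ['Press release']
```

All three documents mention "product", but restricting the search to the "news" zone returns only the press release.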
5. Explain the difference between recall and precision in terms of search results.
A search system's recall is the number of relevant documents it returns compared to the total number of relevant documents in the collection. Precision is the number of relevant documents it returns compared to the total number of documents it returns.
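Written as formulas, where "relevant" is the set of relevant documents in the collection and "retrieved" is the set of documents the system returns:

```latex
\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}
\qquad
\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}
```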
6. Consider the following search engines:
a. Search engine A retrieves 600 documents out of a total of 8,200 documents. Out of the 600 documents retrieved, only 500 are relevant out of a total of 923 relevant documents. Calculate the recall and precision rates for the query.
Recall = 500/923 : 54% : mid-range recall
Precision = 500/600 : 83% : high precision
b. Search engine B retrieves 131 documents out of a total of 8,200 documents. Out of the 131 documents retrieved, all 131 are relevant out of a total of 923 relevant documents. Calculate the recall and precision rates for the query.
Recall = 131/923 : 14% : low recall
Precision = 131/131 : 100% : perfect precision
c. Search engine C retrieves 700 documents out of a total of 8,200 documents. Out of the 700 documents retrieved, 0 are relevant out of a total of 923 relevant documents. Calculate the recall and precision rates for the query.
Recall = 0/923 : 0% : no recall
Precision = 0/700 : 0% : no precision
d. Search engine D retrieves 5,000 documents out of a total of 8,200 documents. Out of the 5,000 documents retrieved, 923 are relevant out of a total of 923 relevant documents. Calculate the recall and precision rates for the query.
Recall = 923/923 : 100% : total recall
Precision = 923/5000: 18% : low precision
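The four calculations above can be checked with a short script. The counts come straight from the questions; the function names are illustrative.

```python
def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Share of all relevant documents in the collection that were retrieved."""
    return relevant_retrieved / total_relevant


def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of the retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved


TOTAL_RELEVANT = 923  # relevant documents in the 8,200-document collection

# (documents retrieved, relevant documents among them) for engines A to D
ENGINES = {"A": (600, 500), "B": (131, 131), "C": (700, 0), "D": (5000, 923)}

for name, (retrieved, hits) in ENGINES.items():
    r = recall(hits, TOTAL_RELEVANT)
    p = precision(hits, retrieved)
    print(f"Engine {name}: recall {r:.0%}, precision {p:.0%}")

# Engine A: recall 54%, precision 83%
# Engine B: recall 14%, precision 100%
# Engine C: recall 0%, precision 0%
# Engine D: recall 100%, precision 18%
```

Engines B and D show the usual trade-off: returning fewer documents raises precision at the cost of recall, and returning more does the opposite.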
7. What is the purpose of a stemming tool? Explain the difference between strong and weak stemming. Provide examples of strong and weak stemming.
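A stemming tool expands a search term to include other variations of the term that share the same root, so a query also matches documents that use a different form of the word; this tends to improve recall. E.g. computer/computers/computation/computing. That example is a case of strong stemming, which expands a term to many forms sharing its root, whereas weak stemming only goes as far as expanding a term to its plural. E.g. speaker/speakers.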
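A toy sketch contrasting the two, assuming hand-written suffix rules. The suffix list and function names are hypothetical and only meant to illustrate the idea; production engines use established algorithms such as the Porter or Snowball stemmers.

```python
# Toy suffix rules, listed longest first; illustrative only, not a real stemming algorithm.
STRONG_SUFFIXES = ("ation", "ing", "ers", "er", "es", "s", "e")


def weak_stem(word):
    """Weak stemming: strip only a plural 's' (but keep words ending in 'ss')."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def strong_stem(word):
    """Strong stemming: strip the first matching suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in STRONG_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand(term, vocabulary, stemmer):
    """Return every vocabulary word that shares the query term's stem."""
    target = stemmer(term)
    return sorted(w for w in vocabulary if stemmer(w) == target)


VOCABULARY = ["computer", "computers", "computation", "computing", "speaker", "speakers"]
print(expand("computer", VOCABULARY, weak_stem))
# ['computer', 'computers']
print(expand("computer", VOCABULARY, strong_stem))
# ['computation', 'computer', 'computers', 'computing']
```

The weak stemmer only adds the plural, while the strong stemmer reduces all four forms to the root "comput" and so matches them all.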
8. What are two main issues to consider when displaying the results of a search?
Which content components to display for each result returned, and how to list or group the results.
9. How many documents should you display in a search result?
The number of documents that should be displayed depends on how much information is shown for each result. The more information displayed per result, the fewer results should be displayed at once.
10. Describe some approaches for sorting and ranking search results for display.
Sorting alphabetically, sorting chronologically, ranking by relevance, ranking by popularity, ranking by users’/experts’ ratings and ranking with pay-for-placement.
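A minimal sketch of four of these approaches applied to the same result set, assuming each result already carries a title, a publication date, a relevance score and a click count. All field names and values are hypothetical; ratings-based ranking would follow the same pattern with a ratings field.

```python
from datetime import date

# Hypothetical results: "score" stands in for the engine's relevance value,
# "clicks" for a popularity measure.
RESULTS = [
    {"title": "Search zones", "published": date(2023, 7, 9), "score": 0.67, "clicks": 120},
    {"title": "Best bets", "published": date(2021, 3, 1), "score": 0.42, "clicks": 300},
    {"title": "Recall and precision", "published": date(2022, 1, 15), "score": 0.91, "clicks": 45},
]

# method name -> (sort key, whether to sort in descending order)
SORT_KEYS = {
    "alphabetical": (lambda r: r["title"].lower(), False),
    "chronological": (lambda r: r["published"], True),  # newest first
    "relevance": (lambda r: r["score"], True),           # highest score first
    "popularity": (lambda r: r["clicks"], True),         # most clicks first
}


def sort_results(results, method):
    """Return result titles ordered by the chosen method (KeyError if unknown)."""
    key, descending = SORT_KEYS[method]
    return [r["title"] for r in sorted(results, key=key, reverse=descending)]


for method in SORT_KEYS:
    print(method, sort_results(RESULTS, method))
```

The same three results come out in a different order under each method, which is why the choice of sort or ranking should match what users are trying to do.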
11. When sorting search results alphabetically, why is it a good idea to omit articles such as “a” and “the”?
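It is best to omit “a” and “the” because users are more likely to look for something like “The Guide To Building” under “G” than under “T”. Articles say nothing about the subject, so sorting on them would also pile many unrelated titles under “A” and “T”.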
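A minimal sketch of an alphabetical sort key that ignores a leading article, assuming English titles. The article list and sample titles are illustrative.

```python
# Assumes English titles; the article list is illustrative.
LEADING_ARTICLES = ("a ", "an ", "the ")


def sort_key(title):
    """Lower-case the title and drop a leading article so it files under its first real word."""
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


titles = ["The Guide To Building", "A Field Manual", "Information Architecture", "An Index Primer"]
print(sorted(titles, key=sort_key))
# ['A Field Manual', 'The Guide To Building', 'An Index Primer', 'Information Architecture']
```

The trailing space in each article keeps words such as "Theory" or "Another" from being trimmed by mistake.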
12. How does “best bets” ranking operate?
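“Best bets” are recommended documents selected manually for common queries, either on the basis of their popularity or by an expert after their own analysis. When a user enters one of those queries, the best bets are displayed at the top of the results, ahead of the results ranked automatically by the search engine.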
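A minimal sketch, assuming best bets are stored as a simple mapping from a normalized query to hand-picked document titles and are shown ahead of the engine's own ranking. The table contents and function name are hypothetical.

```python
# Hypothetical best bets table: a popular query mapped to hand-picked documents.
BEST_BETS = {
    "vacation": ["Vacation policy", "Holiday calendar"],
}


def results_with_best_bets(query, ranked_results):
    """Place the hand-picked documents first, then the ranked results without duplicates."""
    picks = BEST_BETS.get(query.strip().lower(), [])
    rest = [doc for doc in ranked_results if doc not in picks]
    return picks + rest


print(results_with_best_bets("Vacation", ["Holiday calendar", "Travel form", "Vacation policy"]))
# ['Vacation policy', 'Holiday calendar', 'Travel form']
```

Queries without an entry in the table fall through unchanged to the normal ranking.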
13. What are four key factors to consider when designing a search system interface?
The users' level of searching expertise and motivation, the type of information they need, the type of information being searched, and the amount of information being searched through.
14. What are some of the ways search system designers can help a user when no results are returned for a query?
Offer the user a way of revising their search, provide tips and other advice for improving it, provide a means of browsing the content, and, if all else fails, provide a way to contact a person for help.
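A minimal sketch of a "no results" response that covers all four strategies, assuming the site keeps a list of terms known to appear in its index. It uses Python's standard `difflib.get_close_matches` for spelling suggestions; the term list, tips and URL paths are hypothetical.

```python
import difflib

# Hypothetical list of terms known to occur in the site's index.
KNOWN_TERMS = ["architecture", "navigation", "taxonomy", "metadata"]


def no_results_help(query):
    """Build the help shown when a query returns nothing."""
    suggestions = difflib.get_close_matches(query.lower(), KNOWN_TERMS, n=3, cutoff=0.6)
    return {
        "did_you_mean": suggestions,  # a way to revise the search
        "tips": ["Check the spelling", "Try fewer or more general words"],
        "browse_url": "/browse",      # hypothetical path to browsable content
        "contact_url": "/contact",    # hypothetical path to a human contact
    }


print(no_results_help("taxonmy")["did_you_mean"])  # ['taxonomy']
```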