INFORMATION RETRIEVAL


Goals and Issues of IR

  • Given a query, find documents that help answer it
  • IR is not question answering
  • Helps summarize documents
  • Links related documents
  • Multi- and cross-lingual capabilities desirable
  • Representation of knowledge (text or media?)
  • Queries (type of query language)
  • Evaluation methods (e.g. TREC SDR, the Spoken Document Retrieval track)


Components of an IR System

  • Tokenization into words
  • Removal of function words
  • Phrase identification (noun phrases, names, etc.)
  • Feature weighting to indicate importance in text
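
The four components above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function-word list is a tiny hand-picked sample, phrase identification is reduced to spotting runs of capitalized words as candidate names, and TF-IDF is used as one common weighting scheme (the slides do not name a specific one). All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative function-word list (assumption; real lists are much longer).
FUNCTION_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Tokenization: split text into alphanumeric word tokens, keeping case."""
    return re.findall(r"[A-Za-z0-9]+", text)


def find_names(tokens):
    """Crude phrase identification: runs of two or more capitalized
    tokens that are not function words, e.g. 'New York'."""
    phrases, run = [], []
    for tok in tokens + [""]:  # empty sentinel flushes the final run
        if tok[:1].isupper() and tok.lower() not in FUNCTION_WORDS:
            run.append(tok)
        else:
            if len(run) >= 2:
                phrases.append(" ".join(run).lower())
            run = []
    return phrases


def extract_features(text):
    """Lowercased content words (function words removed) plus name phrases."""
    tokens = tokenize(text)
    words = [t.lower() for t in tokens if t.lower() not in FUNCTION_WORDS]
    return words + find_names(tokens)


def tf_idf(docs):
    """Feature weighting: term frequency times log(N / document frequency)."""
    counts = [Counter(extract_features(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(f for c in counts for f in c)
    return [
        {f: tf * math.log(n_docs / doc_freq[f]) for f, tf in c.items()}
        for c in counts
    ]


docs = [
    "The New York office is in the city",
    "The office is closed",
    "A city of the north",
]
weights = tf_idf(docs)
```

Features that occur in few documents (here the phrase "new york") receive a higher weight than features shared across documents (such as "office"), which is the sense in which weighting indicates importance in the text.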