Fact Based Search Engine: News Fact Finder Utilizing Naive Bayes Classification
Document Type
Book Chapter
Description
There are a number of quality news sources available on the Internet. Searching through all these sources for facts related to a certain subject would be exhausting for a user. We developed a niche sentence-level search engine called News Fact Finder in order to provide users with factual information relevant to the query. Sentence-level search is based on the intuition that if all the query words are within the same sentence, that result is more relevant than a result containing the query words in remote parts of the text. We therefore index our database with suffix arrays, which excel at exact substring matching. Our framework uses a Naive Bayes classifier to classify sentences as facts or opinions. Ranking is performed at the document level, so that a document with many related facts is ranked higher. News Fact Finder performs competitively on a large collection of news documents in providing relevant fact-based results to users. This is a novel approach to quality-based searching, ranking, indexing and categorization of news information.
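The pipeline described above (suffix-array index, sentence-level matching, Naive Bayes fact/opinion classification, document-level ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy documents, the four training sentences and the "count of matching fact sentences" ranking score are all assumptions made for illustration, and the suffix array is built naively rather than with an efficient construction algorithm.

```python
import bisect
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_suffix_array(text):
    # Naive construction by sorting suffixes; acceptable for a toy corpus only.
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    # Two binary searches bound the block of suffixes starting with `pattern`.
    m, lo, hi = len(pattern), 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

def build_index(docs):
    # Concatenate all sentences, remembering where each one starts.
    sentences, starts, parts, pos = [], [], [], 0
    for doc_id, sents in docs.items():
        for s in sents:
            lowered = s.lower()
            sentences.append((doc_id, s))
            starts.append(pos)
            parts.append(lowered)
            pos += len(lowered) + 1  # +1 for the newline separator
    text = "\n".join(parts)
    return text, build_suffix_array(text), starts, sentences

def search(index, query):
    # Keep only sentences that contain every query word (as a substring).
    text, sa, starts, _ = index
    hits = None
    for word in tokenize(query):
        ids = {bisect.bisect_right(starts, p) - 1
               for p in find_occurrences(text, sa, word)}
        hits = ids if hits is None else hits & ids
    return sorted(hits or [])

class NaiveBayes:
    # Multinomial Naive Bayes with add-one (Laplace) smoothing.
    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()

    def train(self, sentence, label):
        words = tokenize(sentence)
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for w in tokenize(sentence):
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

def rank_documents(index, classifier, query):
    # Assumed scoring: number of matching sentences classified as facts.
    fact_counts = Counter()
    for i in search(index, query):
        doc_id, sentence = index[3][i]
        if classifier.predict(sentence) == "fact":
            fact_counts[doc_id] += 1
    return fact_counts.most_common()

# Hypothetical toy data, not taken from the chapter.
DOCS = {
    "a": ["The bank raised rates on Monday.", "I think rates are terrible."],
    "b": ["The bank reported revenue of ten million dollars.",
          "The bank raised rates by two percent.",
          "This is probably the best bank ever."],
}
clf = NaiveBayes()
clf.train("The bank raised rates by two percent on Monday", "fact")
clf.train("The company reported revenue of ten million dollars", "fact")
clf.train("I think the decision is terrible", "opinion")
clf.train("This is probably the best policy ever", "opinion")

index = build_index(DOCS)
print(rank_documents(index, clf, "bank"))  # [('b', 2), ('a', 1)]
```

Because the lookup is exact substring matching, a query word such as "bank" would also match "banker"; a production system would need to decide whether that behaviour is desirable.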
ISBN
978-3-642-37688-7
Publication Date
2013
Publisher
Springer
Keywords
search engine, word list, news article, inverted index, sentence level, news fact finder
Disciplines
Computer Engineering | Engineering
Faculty
Faculty of Applied Science & Technology (FAST)
Copyright
© Springer-Verlag Berlin Heidelberg 2013
SOURCE Citation
Salmon, Ricardo; Ribeiro, Cristina; and Amarala, Swathi, "Fact Based Search Engine: News Fact Finder Utilizing Naive Bayes Classification" (2013). Books and Websites. 7.
https://source.sheridancollege.ca/fast_books/7
Original Citation
Salmon R., Ribeiro C., Amarala S. (2013) Fact Based Search Engine: News Fact Finder Utilizing Naive Bayes Classification. In: Pasi G., Bordogna G., Jain L. (Eds) Quality Issues in the Management of Web Information. Intelligent Systems Reference Library, vol 50. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37688-7_6