CiteSeer: An Academic Search Engine

2012/1/10   Views: 2801

[Author] 魏瑞斌 (Wei Ruibin)

[Affiliation] inforworld

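[Abstract] The rise of CiteSeerX is directly tied to open access, e-Research, and citation indexing systems. This article analyzes the search principles and key features of CiteSeerX, compares it with Google Scholar, another academic search engine, identifies its strengths, points out its shortcomings, and offers suggestions for improvement.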

[Keywords] search engine; open access; database



A Survey and Review of the Citation Search Engine CiteSeerX

刘莎 (Liu Sha) / School of Information Management, Wuhan University

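Abstract: The rise of CiteSeerX is directly tied to open access, e-Research, and citation indexing systems. This article analyzes the search principles and key features of CiteSeerX, compares it with Google Scholar, another academic search engine, identifies its strengths, points out its shortcomings, and offers suggestions for improvement.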

Keywords: citation search engine, open access, literature retrieval, database, knowledge repository

Source: Digital Library Forum (数字图书馆论坛), 2011, No. 12

CiteSeer

http://citeseer.ist.psu.edu/index;jsessionid=5DEC6D0DA123B7BAE8A45A0AB5490EA0

From Wikipedia, the free encyclopedia

CiteSeer was a public search engine and digital library for scientific and academic papers. It is often considered to be the first automated citation indexing system and was considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search. It was replaced by CiteSeerx, and all queries to CiteSeer are redirected to it. It was created by researchers Steve Lawrence, Kurt Bollacker and Lee Giles while they were at the NEC Research Institute (now NEC Labs), Princeton, New Jersey, USA. CiteSeer's goal was to actively crawl and harvest academic and scientific documents on the web and use autonomous citation indexing to permit querying by citation or by document, ranking them by citation impact. After NEC, it was hosted as CiteSeer.IST on the World Wide Web at the College of Information Sciences and Technology, The Pennsylvania State University, and had over 700,000 documents, primarily in the fields of computer and information science and engineering.
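To make the idea of a citation index more concrete, here is a minimal Python sketch. It is not CiteSeer's actual algorithm: it assumes the reference lists have already been extracted from the documents, and it uses the raw citation count as a simple stand-in for "citation impact". The function names and the toy corpus are hypothetical, for illustration only.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {citing_id: [cited_id, ...]} into {cited_id: {citing_id, ...}}."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            if cited != citing:  # ignore self-references in this toy model
                cited_by[cited].add(citing)
    return cited_by


def rank_by_citation_count(references):
    """Return all document ids, most-cited first (ties broken by id)."""
    cited_by = build_citation_index(references)
    docs = set(references) | set(cited_by)
    return sorted(docs, key=lambda d: (-len(cited_by.get(d, ())), d))


# Hypothetical toy corpus: each paper maps to the papers it cites.
toy_references = {
    "paper_a": ["paper_b", "paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_b"],
}

ranking = rank_by_citation_count(toy_references)
print(ranking)
```

The inverted map answers "who cites this document?", which is the query a citation index adds on top of ordinary full-text search.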

CiteSeer freely provided Open Archives Initiative metadata for all indexed documents and, when possible, linked indexed documents to other sources of metadata such as DBLP and the ACM Portal.
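Open Archives Initiative metadata is normally harvested through OAI-PMH, an HTTP protocol in which a request is a base URL plus a `verb` and a few arguments. The sketch below only builds such a request URL; the base URL is a placeholder, not CiteSeer's real endpoint, and the helper name `build_oai_request` is hypothetical.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb="ListRecords",
                      metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH request URL (no network access is performed)."""
    params = {"verb": verb, "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective-harvesting argument
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint (assumption): replace with the repository's real base URL.
EXAMPLE_BASE_URL = "https://example.org/oai2"

url = build_oai_request(EXAMPLE_BASE_URL)
print(url)
```

`oai_dc` (unqualified Dublin Core) is the metadata format that every OAI-PMH repository is required to support, which is why it is used as the default here.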

CiteSeer's goal was to improve the dissemination of, and access to, academic and scientific literature. As a non-profit service that could be freely used by anyone, it has been considered part of the open access movement, which is attempting to change academic and scientific publishing to allow greater access to scientific literature.

The name can be read in at least two ways. As a pun, a 'sightseer' is a tourist who looks at the sights, so a 'cite seer' would be a researcher who looks at cited papers. Alternatively, a 'seer' is a prophet, so a 'cite seer' would be a prophet of citations.

CiteSeer had not been comprehensively updated since 2005 due to limitations in its architecture design. It had a representative sampling of research documents in computer and information science but was limited in coverage because it only had access to papers that were publicly available, usually at an author's homepage, or that were submitted by an author. To overcome these limitations, a modular and open source architecture for CiteSeer was designed.

The new version and design of CiteSeer can be found at the website of the Next Generation CiteSeer, CiteSeerx. CiteSeer-like engines and archives usually only harvest documents from publicly available websites and do not crawl publisher websites. As such, authors whose documents are freely available are more likely to be represented in the index.

http://en.wikipedia.org/wiki/CiteSeerX#Next_Generation_CiteSeer_.28CiteSeerx.29

Original link: http://blog.sciencenet.cn/home.php?mod=space&uid=113146&do=blog&id=527841