Open Access News

News from the open access movement


Sunday, July 08, 2007

New OA database and search engine for chemistry

ChemXSeer is a new (March 2007) OA database and search engine for chemical literature, formulae, tables, and data.  One of the co-developers is C. Lee Giles, who was also one of the co-developers of CiteSeer.  From the site:

Research in environmental chemistry is becoming increasingly collaborative and multidisciplinary in scope and approach. For example, within the Penn State Center for Environmental Kinetics Analysis (CEKA), researchers are taking a multidisciplinary approach to linking kinetic information in environmental chemistry across spatial and temporal scales. A main goal of such research is to integrate experimental, analytical, and simulation results obtained on systems from molecular to field scales in order to better approximate the complex physical, chemical, and biological interactions that control the fate and transport of contaminants. New scientific questions can be generated when users have access to a broad spectrum of related results. As connections are made among field observations, experimental kinetics, spectroscopic analyses, and model predictions, gaps in the information web will become apparent. The collaborative team can then work out approaches to filling these gaps. An easily queried, intelligent database will provide access to critically relevant data for a diverse community of users, enabling these users to achieve higher-order scientific goals. In short, data collection and synthesis will lead to better science and improved education of scientists.

ChemXSeer is an integrated digital library and database allowing for intelligent search of documents in the chemistry domain and data obtained from chemical kinetics. Currently, we have designed and implemented the following:

  1. Chemical Entity Search: This tool identifies chemical formulae and chemical names, disambiguates them from general terms, and tags them. Novel similarity scores, ranking functions, and search methods enable searching for chemical entities.

  2. TableSeer: This tool automatically identifies tables in digital documents and extracts the contents of their cells. The contents are stored in a queryable table in a database. TableSeer extracts table metadata and uses a novel ranking function to find tables relevant to user queries.

  3. Databases: Our data repository contains experimental data obtained from various sources. Our tools can process, store, and link data in multiple formats, e.g., Excel, XML, Gaussian, and Charmm. A metadata add-on can help annotate the data and link multiple datasets.

The metadata is then used to link the data to published articles, allowing end users to search for relevant data.
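To make the Chemical Entity Search item a bit more concrete, here is a minimal Python sketch of rule-based formula tagging. It is not ChemXSeer's method (the site describes novel similarity scores and ranking functions, not a regular expression). The functions `is_formula` and `tag_formulas` are hypothetical, the element list is a small illustrative subset, and the disambiguation rule (a bare symbol such as "He" or "I" only counts as a formula if it has a count or a second element) is an assumption.

```python
import re

# Illustrative subset of element symbols (assumption); a real tagger
# would use all 118.
ELEMENTS = {
    "H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne", "Na", "Mg",
    "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca", "Fe", "Cu", "Zn", "Br",
    "Ag", "I", "Au", "Hg", "Pb",
}

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional atom count.
_PART = re.compile(r"([A-Z][a-z]?)(\d*)")


def is_formula(token: str) -> bool:
    """Hypothetical heuristic: does `token` look like a chemical formula?"""
    parts = _PART.findall(token)
    # The symbol/count pieces must reproduce the whole token exactly,
    # which rejects ordinary words such as "Hello".
    if "".join(sym + cnt for sym, cnt in parts) != token:
        return False
    if not parts or any(sym not in ELEMENTS for sym, _ in parts):
        return False
    # Disambiguation rule (assumption): a bare single symbol like "He"
    # or "I" is too likely to be a word, so require a digit or 2+ symbols.
    return len(parts) > 1 or parts[0][1] != ""


def tag_formulas(text: str) -> list[str]:
    """Return the tokens in `text` that pass the formula heuristic."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t for t in tokens if is_formula(t)]
```

For example, `tag_formulas("He dissolved NaCl in H2O and measured CO2 release.")` returns `["NaCl", "H2O", "CO2"]`, skipping the pronoun "He". A rule like this is only a baseline; recognizing chemical names and resolving harder ambiguities is where learned models and ranking functions come in.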

Built on a novel architecture, our digital library will also use novel focused-crawling techniques to make the most of the limited resources at hand, and query-expansion and query-rewriting techniques to enhance the quality of search results.
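Query expansion can be illustrated with a simple dictionary-based approach: when a query mentions a known substance name, its formula or other names are added as extra search terms. This is a hypothetical sketch, not the technique ChemXSeer uses (the site does not describe it); the `SYNONYMS` table and `expand_query` function are invented for illustration.

```python
import re

# Hypothetical synonym table mapping lowercase phrases to alternative
# search terms (formulae or other names). Not ChemXSeer's data.
SYNONYMS: dict[str, list[str]] = {
    "water": ["H2O"],
    "carbon dioxide": ["CO2"],
    "table salt": ["NaCl", "sodium chloride"],
}


def expand_query(query: str) -> list[str]:
    """Return the original query followed by synonyms of known phrases in it."""
    lowered = query.lower()
    terms = [query]
    for phrase, alternatives in SYNONYMS.items():
        # Whole-word match, so "water" does not fire inside "wastewater".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            terms.extend(alternatives)
    return terms
```

For example, `expand_query("Carbon dioxide uptake in water")` returns `["Carbon dioxide uptake in water", "H2O", "CO2"]`; a search engine could then OR these terms together so that documents using only the formulae are still found.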