
Vol. 28 No. 3
May-June 2006

The Project Place | Information about new, current, and complete IUPAC projects and related initiatives

Defining a Data Standard for Near-Infrared Spectroscopy and Chemometrics

Successful long-term storage and retrieval of analytical data and the more-advanced techniques of data mining and knowledge generation are made possible through the deployment of well-documented, internationally recognized standard data formats [1]. At the end of 2001, a group of scientists with a history of international collaboration met to discuss problems they were encountering in exchanging near-infrared (NIR) spectroscopic data. A more serious problem was the inability to move chemometric data, including raw data and calibration models, among software programs from different vendors and those arising out of various research and development efforts. Also, although the 1988 Joint Committee on Atomic and Molecular Physical Data - Data eXchange (JCAMP-DX) standard [2] had been adopted piecemeal by near-infrared instrument manufacturers, there was no data dictionary targeted specifically at the technological needs of the NIR community.

In response to these issues, a task group was formed and began work in 2002 on gathering information about the broader needs of the community. Several members of the task group had worked together on a European food-research project called Quality Established through Spectroscopic Techniques (QUEST). The QUEST team had sought to tackle the problem of a lack of standardization in the fields of NIR and chemometrics by developing their own project standards, providing a good knowledge base on which future efforts could build. This IUPAC project would use this knowledge base as a starting point, but the solutions that it created would need to be of broader scope, covering a wider range of instrumentation types than that deployed in the food and beverage arena [3,4].

The NIR and Chemometrics Data Exchange Standards group meeting was attended by (from left to right) Rasmus Bro (KVL, Denmark), Mohamed Hanafi (ENITIAA/INRA, France), Douglas Rutledge (INA P-G, France), Tony Davies (Waters Informatics, Germany), Gerard Downey (Teagasc, Ireland), Jeremy Shaver (Eigenvector Research, U.S.) and Ian Cowe (FOSS NIR Systems, Sweden).

Most of the initial IUPAC work was conducted electronically and progressed slowly. Because the members of the group are all very active in industry, academia, and government laboratories, it was eventually concluded that a face-to-face meeting would be beneficial. Several open issues needed clarification. In addition, although the NIR work and the chemometrics work had separate objectives and distinct timelines, having the entire task group work in both areas simultaneously had proved problematic because the two efforts require different knowledge. A meeting was therefore called in January 2006 in Dublin, Ireland, with the aim of addressing these issues, getting the project back on track, and exploring the possibility of restructuring the task group into two parallel action groups corresponding to the two separate objectives.

It was particularly important to get the group moving again because the two recommendations it would generate were required as the standard data dictionaries for Phase 2 and Phase 3 of the work on the new XML-based Analytical Information Markup Language (AnIML) data standards, which is being conducted jointly with the American Society for Testing and Materials (ASTM) International Subcommittee E13.15. The meeting in Dublin brought together a good mix of instrument vendors, end users, third-party software providers, and academics.

Bones of Contention
The meeting began by bringing the participants up to speed on the work being conducted, including a review of the efforts of the ASTM E13.15 Subcommittee, which had not been taken into account when the initial project proposal was drawn up. Extensive constructive debate concerning exactly what information should be addressed by the chemometrics standard cleared up a number of issues that had been slowing progress. One specific issue discussed was the proposed capability of vendor software to export calibration models within the exchange format.

There are major business issues associated with this capability, particularly in the food and agriculture analysis field. The generation and distribution of just these types of calibration models is a major source of revenue for instrument vendors, with thousands of copies of such software sold each year. If the capability to freely distribute these models were built into the instrument software, allowing the models to be exported in an open-standard format, it could undermine, if not eliminate, the essential and profitable development work conducted to produce such models. However, academics developing new chemometric methods wish for exactly this functionality in order to document their activities and compare and contrast them with those of colleagues and the wider scientific audience.

One proposed response to these issues draws on the solution to similar problems faced by vendors of reference spectroscopic databases. In this case, users often want to enhance the software by including and exporting their own reference data. The software packages have this capability and can differentiate between copyrighted vendor databases—which cannot be exported—and user-generated databases—which can. Adopting this solution would mean that vendor-supplied, commercially sensitive chemometrics models would receive the same type of protection, while users would be free to export calibration models that they generated themselves in the new IUPAC/JCAMP-DX chemometrics data file format.
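
The rule proposed above can be illustrated with a short Python sketch. It is only an illustration of the idea, not a description of any vendor's software: the `CalibrationModel` record, the `"vendor"` and `"user"` origin labels, and the `export_model` function are all hypothetical names, and the returned string is a placeholder, since the chemometrics file format itself was still being drafted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationModel:
    """Hypothetical record describing a chemometric calibration model."""
    name: str
    origin: str  # "vendor" or "user"; illustrative labels, not from any standard


class ExportNotPermitted(Exception):
    """Raised when a protected (non-user) model is passed to the exporter."""


def export_model(model: CalibrationModel) -> str:
    """Return a placeholder export string, but only for user-generated models."""
    if model.origin != "user":
        # Anything not explicitly marked as user-generated stays protected.
        raise ExportNotPermitted(f"{model.name!r} is protected and cannot be exported")
    # A real exporter would serialize the model here; this sketch does not.
    return f"EXPORTED:{model.name}"
```

Treating every model that is not explicitly marked as user-generated as protected is a deliberately conservative choice in this sketch: a missing or unexpected origin label blocks the export rather than allowing it.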

Process Analytical Technology
The need to document chemometrics data in a long-term, stable, vendor-neutral format will steadily increase in the future. This is particularly true in light of the wider adoption of process analytical technologies in regulated industries, as highlighted by the U.S. Food and Drug Administration’s efforts to actively promote such technologies within the pharmaceutical sector. This risk-based approach to pharmaceutical batch release essentially envisages the software packages using data obtained from the manufacturing plant to make the majority of decisions concerning the release of a particular batch to market.


This is a major departure from the current practice, under which a quality assurance chemist must sign a release certificate following a series of lab tests. It is therefore essential that the models on which the software bases its decisions are available for scrutiny at all times and well into the future, long after a particular product, software package, or installation has been decommissioned. Essentially these models, and the data fed into them to generate a decision, will fall under the same Good Manufacturing Practice predicate rules and 21 CFR Part 11 electronic-records and electronic-signature rules as do the current analytical results and documentation within the quality assurance environment.

Education and e-Learning
In recent years major steps have been taken to integrate e-learning tools into normal curricula. In the chemometrics field, teachers and trainers have been developing courses with content that includes example calibration data files and the resulting models. The current state of the art is such that e-learning material often needs to be re-worked for each of the various third-party software solutions and instrument-vendor packages. When a standard format finally becomes available, it will greatly ease this burden, allowing trainers to post e-learning materials in the standard format for the trainees to download and install on their own systems, regardless of which chemometrics product they have standardized on [5].

NIR File Format Standardization
A number of NIR data files in IUPAC/JCAMP-DX format were examined for compliance with the existing standards and found to require only relatively minor changes for compatibility. Participants also discussed what additions need to be made to the data dictionaries already available from prior JCAMP-DX standards. A draft recommendation is now being prepared. A second draft on chemometrics will also include a comprehensive list of the various pre- and post-processing algorithms commonly used in the field. A Web site has also been created to help broaden the discussion.
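
A first-pass version of such a compliance check can be sketched in a few lines of Python. The sketch rests on two assumptions that should be verified against the published JCAMP-DX specification: that header records take the form `##LABEL=value`, with labels compared after removing spaces, hyphens, slashes, and underscores and ignoring case, and that the labels in `REQUIRED_LABELS` are a reasonable illustrative subset of the mandatory ones. The function names and the sample file are made up for this example and do not come from the task group's work.

```python
import re

# Assumed, illustrative subset of mandatory header labels; the JCAMP-DX
# specification is the authority for the full list.
REQUIRED_LABELS = ("TITLE", "JCAMP-DX", "DATA TYPE", "XUNITS", "YUNITS", "NPOINTS", "END")

# Made-up file content, used only to exercise the functions below.
SAMPLE = """##TITLE=Wheat flour NIR scan (made-up example)
##JCAMP-DX=4.24
##DATA TYPE=NEAR INFRARED SPECTRUM
##XUNITS=NANOMETERS
##YUNITS=ABSORBANCE
##NPOINTS=3
##XYDATA=(X++(Y..Y))
1100 0.21 0.22 0.23
##END=
"""


def normalize(label):
    """Strip spaces, hyphens, slashes, and underscores, then upper-case."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def read_labels(text):
    """Map each normalized header label to its value (inline $$ comments removed)."""
    labels = {}
    for line in text.splitlines():
        match = re.match(r"\s*##([^=]*)=(.*)", line)
        if match:
            value = match.group(2).split("$$", 1)[0].strip()
            labels[normalize(match.group(1))] = value
    return labels


def missing_labels(text):
    """Return the required labels that do not appear in the file header."""
    present = read_labels(text)
    return [label for label in REQUIRED_LABELS if normalize(label) not in present]
```

Calling `missing_labels(SAMPLE)` returns an empty list for the sample above. A check of this kind only confirms that the expected header records are present; it says nothing about whether the values or the data table are valid.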

An Appeal
As with all such standards-development processes, the task group relies heavily on input from the scientific community and would very much encourage readers to follow the work as it progresses and to contact them with constructive ideas to improve and perhaps speed up the development of these two important recommendations.

1. R.J. Lancashire and A.N. Davies, The Quest for a Universal Spectroscopic Data Format. Chemistry International, 28(1), pp. 10–12, 2006.
2. R.S. McDonald and P.A. Wilks Jr., JCAMP-DX: A Standard Form for the Exchange of Infrared Spectra in Computer Readable Form, Appl. Spectrosc., 42(1), pp. 151–162, 1988.
3. A.N. Davies, A Pilot Study of the QUEST Spectral Database, in Food Spectroscopy Progress in Spectral Transfer and Database Development, ISBN: 0 9523455 4 4, p. 26, 1994.
4. A.N. Davies, The QUEST for Food Quality Control, Spectroscopy Europe, 4(3), pp. 27–28, 1992.
5. P. Lampen and A.N. Davies, JCAMP-DX to ORIGIN Utility Tools for Making Spectra Available to Chemometricians, Spectroscopy Europe, 16(5), pp. 28–30, 2004.

This article was authored by Gerard Downey (TEAGASC, Dublin, Ireland), Douglas Rutledge (Institut National de la Recherche Agronomique, Paris-Grignon, France), Peter Lampen (Institute for Analytical Science, Dortmund, Germany), and Tony Davies (Waters Informatics, Frechen, Germany). For more information, contact Tony Davies <[email protected]>, chairman of the Subcommittee on Electronic Data Standards.

Page last modified 25 April 2007.
Copyright © 2003-2007 International Union of Pure and Applied Chemistry.