Vol. 30 No. 3
Providing brief overviews of helpful chemistry resources on the Web.
The Periodic Table: Database or XML?
by Daniel Tofan
In the July-August 2004 issue of Chemistry International, I proposed an XML specification for exchanging scholarly data among course management systems and, more generally, among applications used for teaching and learning general chemistry. The project, under the proposed name Chemical Education Markup Language (ChEdML), is a major undertaking, and its success depends heavily on the willingness of software developers to implement a new standard. While building such a consensus may not be entirely feasible, smaller projects that demonstrate the usefulness of structured data in chemical education are easier to implement and publish.
The main goal of the Periodic Table Database/XML project is to provide an open source of data about the elements in various formats. The project took shape over the past few months as a group project in a graduate-level course I taught, titled “Computers in Chemical Education.” The idea was inspired by the myriad periodic tables now available on the Internet. As part of the course requirements, students needed a way to extract data about the elements and put it into a user-friendly electronic format. They found no structured way to extract all of this data at once without navigating multiple web pages and filtering out ads. We then searched the Web for a database or an XML specification that would provide the properties of the chemical elements in a structured, computer-readable form. Very few websites proved significantly helpful, and attempts to organize the periodic table in XML were scarce. The ones we were able to identify were rather limited: only a few properties were included for each element. We did not find a complete representation of the periodic table in XML.
While searching for databases, we noticed that many people use the word “database” for a collection of web pages that display information about a subject, in this case the periodic table. What we were looking for was an actual database that can be queried to extract meaningful information. The only serious product we found that uses a database was the Periodic Table of Data, a project of the Royal Society of Chemistry.1 A close inspection of this Access 2003 database file (available as a free download from the RSC) revealed much redundancy in the construction of its tables. It was not a true relational database but merely a collection of tables with the same field structure, which apparently had been populated in different ways. A database must be well designed from the start in order to eliminate redundancy, minimize storage space, and maximize search capabilities. We therefore decided to implement our own comprehensive version of the periodic table data.
The main decision was whether to store the periodic table in a database or in XML. Since the work was meant to build on the ChEdML project, XML was our first choice, and the students taking the course were given the task of creating the XML structure and populating it with the most important data about the elements. WebElements2 was chosen as the main source of information. Once a template for the XML structure was agreed upon, students worked in small groups to populate the skeleton with data about the elements. XmlShell3 was used to edit, duplicate, and move XML fragments in order to extend the common template to the entire table. The goal was to merge all individual XML files (each representing a group of elements) into one master document that would then be validated. Unfortunately, though perhaps predictably when several students inexperienced in XML work on a common project, the result had many inconsistencies. A very basic DTD (document type definition) was created to check the final product, and validation failed. Clearly, we had taken the wrong approach to creating a consistent, accurate representation of the periodic table in XML.
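The DTD and the XML template are not reproduced here, so the following is only an illustration of the kind of inconsistency that makes validation fail. It assumes hypothetical tag names (`periodicTable`, `element`, `atomicNumber`, `symbol`, `name`) and, because Python's standard library does not perform DTD validation, substitutes a simple check that every `<element>` has the same children in the same order, roughly what a content model such as `<!ELEMENT element (atomicNumber, symbol, name)>` would demand.

```python
import xml.etree.ElementTree as ET

# Hypothetical merged document: each <element> came from a different
# student group, and the groups did not agree on tag names or content.
MERGED = """
<periodicTable>
  <element><atomicNumber>1</atomicNumber><symbol>H</symbol><name>hydrogen</name></element>
  <element><atomicNumber>2</atomicNumber><Symbol>He</Symbol><name>helium</name></element>
  <element><atomicNumber>3</atomicNumber><symbol>Li</symbol></element>
</periodicTable>
"""

# Children every <element> must have, in this order (a stand-in for a DTD sequence).
REQUIRED = ["atomicNumber", "symbol", "name"]


def find_inconsistencies(xml_text: str) -> list[str]:
    """Return one message per <element> whose children differ from REQUIRED."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    for index, elem in enumerate(root.findall("element"), start=1):
        children = [child.tag for child in elem]
        if children != REQUIRED:
            # Identify the record by its atomic number if present, else by position.
            label = elem.findtext("atomicNumber", default=f"#{index}")
            problems.append(f"element {label}: found {children}, expected {REQUIRED}")
    return problems


if __name__ == "__main__":
    for message in find_inconsistencies(MERGED):
        print(message)
```

Here helium fails because one group wrote `Symbol` instead of `symbol`, and lithium fails because its `name` is missing; hand-merged fragments accumulate exactly this kind of drift.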
We started the project over and faced the same dilemma: XML or a database. XML has the advantage of being plain text, readable by humans as well as computers. However, building and editing XML files, even with a dedicated editor, proved very tedious and error-prone. The main reason to use XML is to export data in a structured, open format that other applications can read. For creating the structure and entering the data, we soon realized that a database is by far the better approach. A relational database offers many advantages over hand-edited XML: it is fast and compact, can enforce data integrity, can be queried in complex ways, offers user-friendly forms for data input, and has extensive export capabilities. Careful design of the tables and of the relationships between them provides advanced querying capabilities and the ability to group data into meaningful categories. More importantly, the data can be exported from the database to XML programmatically, which guarantees that the resulting XML is valid.
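To illustrate the querying and grouping mentioned above, here is a minimal sketch that uses SQLite (through Python's built-in `sqlite3` module) as a stand-in for Access. The table layout, the column names, and the small subset of elements are assumptions made for the example; states are at room temperature and groups follow the IUPAC 1-18 numbering.

```python
import sqlite3

# Hypothetical subset: (atomic number, symbol, name, group, period, state at room temperature)
ROWS = [
    (1, "H", "hydrogen", 1, 1, "g"),
    (2, "He", "helium", 18, 1, "g"),
    (3, "Li", "lithium", 1, 2, "s"),
    (6, "C", "carbon", 14, 2, "s"),
    (8, "O", "oxygen", 16, 2, "g"),
    (11, "Na", "sodium", 1, 3, "s"),
    (26, "Fe", "iron", 8, 4, "s"),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory Elements table (column 'grp' because GROUP is an SQL keyword)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Elements (atomic_number INTEGER PRIMARY KEY, symbol TEXT, "
        "name TEXT, grp INTEGER, period INTEGER, state TEXT)"
    )
    conn.executemany("INSERT INTO Elements VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def count_by_state(conn: sqlite3.Connection) -> dict[str, int]:
    """Group the elements by physical state and count each category."""
    rows = conn.execute(
        "SELECT state, COUNT(*) FROM Elements GROUP BY state ORDER BY state"
    )
    return dict(rows.fetchall())


def symbols_in_group(conn: sqlite3.Connection, group: int) -> list[str]:
    """Return the symbols in one periodic-table group, ordered by atomic number."""
    rows = conn.execute(
        "SELECT symbol FROM Elements WHERE grp = ? ORDER BY atomic_number", (group,)
    )
    return [symbol for (symbol,) in rows]


if __name__ == "__main__":
    conn = build_db()
    print(count_by_state(conn))       # {'g': 3, 's': 4}
    print(symbols_in_group(conn, 1))  # ['H', 'Li', 'Na']
```

Each question is a one-line query against the same table; answering it from a set of hand-written XML files would mean writing a new traversal every time.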
We chose Microsoft Access 2007 as our database product. Access has a very user-friendly interface, so creating tables, relationships, and forms was straightforward. The main table, called “Elements,” contains the most basic information about each element (nomenclature, position in the table, description, and other factual information). With the exception of atomic number and atomic mass, all numerical data about the elements were stored in separate tables grouped by category (bulk properties, thermodynamic properties, electronic properties, etc.). Relationships were built between the main table and the additional tables by using the atomic number (the primary key of the “Elements” table) as a foreign key in every table except one, which stores the units of measurement for the reported properties. Referential integrity was enforced, ensuring that each record is linked to a valid element. Duplicates were not allowed for inherently unique information such as atomic number, symbol, or name. Field sizes were restricted to meaningful values in order to save space; for example, “state” can take only one of the values “s”, “l”, or “g”, so its field size was set to 1 character instead of 255, the default for a “Text” field. Where one subset of the data required multiple records per element but the rest of the table did not (for example, isotopes, which share most elemental properties except nuclear ones), a separate table was created with a one-to-many relationship to the main table. This strategy eliminates data redundancy and follows modern database design principles.
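The design rules described above can be sketched in SQL. The example again uses SQLite in place of Access 2007; the table and column names (`Elements`, `BulkProperties`, `Isotopes`, `Units`) are hypothetical, and because SQLite has no field sizes for text, a `CHECK` constraint stands in for the one-character “state” field. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Hypothetical schema mirroring the design rules; SQLite stands in for Access 2007.
SCHEMA = """
CREATE TABLE Elements (
    atomic_number INTEGER PRIMARY KEY,
    symbol        TEXT NOT NULL UNIQUE,
    name          TEXT NOT NULL UNIQUE,
    atomic_mass   REAL,
    state         TEXT CHECK (state IN ('s', 'l', 'g'))  -- one character: s, l, or g
);
-- One category of numerical data per table, one row per element.
CREATE TABLE BulkProperties (
    atomic_number INTEGER PRIMARY KEY REFERENCES Elements (atomic_number),
    density       REAL
);
-- One-to-many: several isotopes per element.
CREATE TABLE Isotopes (
    atomic_number INTEGER NOT NULL REFERENCES Elements (atomic_number),
    mass_number   INTEGER NOT NULL,
    PRIMARY KEY (atomic_number, mass_number)
);
-- Units of measurement: the one table with no link to Elements.
CREATE TABLE Units (
    property TEXT PRIMARY KEY,
    unit     TEXT NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # must run before any transaction starts
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Elements VALUES (?, ?, ?, ?, ?)",
        [(1, "H", "hydrogen", 1.008, "g"), (26, "Fe", "iron", 55.845, "s")],
    )
    conn.execute("INSERT INTO BulkProperties VALUES (26, 7.874)")  # density of iron
    conn.executemany("INSERT INTO Isotopes VALUES (1, ?)", [(1,), (2,), (3,)])
    conn.execute("INSERT INTO Units VALUES ('density', 'g/cm3')")
    conn.commit()
    return conn


def rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    conn = build_db()
    # Each statement breaks one rule and is refused with sqlite3.IntegrityError.
    print(rejected(conn, "INSERT INTO Elements VALUES (2, 'H', 'helium', 4.0026, 'g')"))   # duplicate symbol
    print(rejected(conn, "INSERT INTO Elements VALUES (2, 'He', 'helium', 4.0026, 'x')"))  # invalid state
    print(rejected(conn, "INSERT INTO Isotopes VALUES (99, 252)"))  # no element 99 in Elements
```

All three calls print `True`: the database itself, not the person entering data, guarantees unique symbols, legal states, and isotopes that point at an existing element.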
The database is populated with nearly all available properties of the elements. We used the most recent atomic masses published by IUPAC and compiled everything else from data provided by the WebElements site. Using Java and XML libraries such as JDOM,4 we can generate the entire table in XML with a single command. The only requirement is that the data in the database be accurate. The generated XML is guaranteed to be well formed and valid, a tremendous advantage over writing the XML by hand. In addition, applications can be built that display data on demand and use the querying capabilities of the database to show only the information of interest to the user. We are also investigating the XML export capabilities of Access 2007. Exporting to other formats, such as HTML or PDF, is possible as well.
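The export described above uses Java and JDOM; the sketch below swaps in Python's standard `sqlite3` and `xml.etree.ElementTree` modules so it runs without extra libraries, and its table, column, and tag names are assumptions. It shows why generating XML from the database is safer than writing it by hand: every `<element>` is built by the same loop, so all records share one structure, and ElementTree escapes special characters, so the output is well formed. Checking the result against a DTD would remain a separate step.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db() -> sqlite3.Connection:
    """Tiny hypothetical database: two elements and their isotopes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Elements (atomic_number INTEGER PRIMARY KEY, symbol TEXT,
                               name TEXT, atomic_mass REAL);
        CREATE TABLE Isotopes (atomic_number INTEGER, mass_number INTEGER);
        INSERT INTO Elements VALUES (1, 'H', 'hydrogen', 1.008), (6, 'C', 'carbon', 12.011);
        INSERT INTO Isotopes VALUES (1, 1), (1, 2), (1, 3), (6, 12), (6, 13), (6, 14);
    """)
    return conn


def export_xml(conn: sqlite3.Connection) -> str:
    """Serialize every element, with its one-to-many isotopes, as one XML document."""
    root = ET.Element("periodicTable")
    rows = conn.execute(
        "SELECT atomic_number, symbol, name, atomic_mass FROM Elements ORDER BY atomic_number"
    ).fetchall()
    for number, symbol, name, mass in rows:
        elem = ET.SubElement(root, "element", atomicNumber=str(number))
        ET.SubElement(elem, "symbol").text = symbol
        ET.SubElement(elem, "name").text = name
        ET.SubElement(elem, "atomicMass").text = str(mass)
        isotopes = ET.SubElement(elem, "isotopes")
        for (mass_number,) in conn.execute(
            "SELECT mass_number FROM Isotopes WHERE atomic_number = ? ORDER BY mass_number",
            (number,),
        ):
            ET.SubElement(isotopes, "isotope", massNumber=str(mass_number))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(export_xml(build_db()))
```

Regenerating the whole document after a correction in the database is a single call, which is the workflow the project relies on.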
The project currently maintains a website5 that displays a periodic table with a link for each element. Each link opens a PDF file for one element, containing a picture of a sample of the element (if available), basic data about the element, crystal structure representations in Jmol, and a list of facts and trivia about the element. All element sheets were combined into one poster of the periodic table, which is displayed in our school. Our next step is a Java application that will display the periodic table in various ways depending on the information requested and will generate the data in XML, PDF, or other formats on demand.
We believe that a project blending database, XML, HTML, and PDF formats in one place is unique and represents a useful source of information about the elements, readily available to the chemistry community.
Many people contributed to this project. Jennifer Imel, a senior B.A. Chemistry student at Eastern Kentucky University, was responsible for the bulk of the data entry and for the Periodic Table poster. The original XML fragments were populated by nine undergraduate and graduate students who took my course. Most of the element slides now available on our website were completed by General Chemistry I students as part of our initial project, titled “Learn About the Elements.” Images displayed on the slides were provided by Fred Bayer.6 I am grateful to everyone who participated in and continues to work on this project.
Daniel Tofan is an assistant professor in the Department of Chemistry at Eastern Kentucky University in Richmond, Kentucky, USA.
last modified 5 June 2008.
Copyright © 2003-2008 International Union of Pure and