Murray-Rust, P., Rzepa, H. S., Tyrrell, S. M., & Zhang, Y. (2004). Representation and use of Chemistry in the Global Electronic Age. Org. Biomol. Chem. Retrieved September 9, 2006, from

Start Date: September 9, 2006
Finished Date: September 10, 2006

Notes: Direct quotations in italics; my commentary follows in bold.


  1. I read the article straight through.
  2. I read the article a second time and copy and paste the quotations I think are important to my understanding of the work. Some of these quotations will end up in the article, but many will not. This is similar to the "notebook" approach I used before I had this crazy idea.
  3. I think about the quotations, and I add my comments in bold. Sometimes there is lag time between when I place the quotation here and when I get a chance to go back and write about it.
  4. Since this is a "notebook," I don't concern myself with grammar and mechanics as I will when I write the proposal and article. I clean up as I go.
  5. I encourage the authors (if possible) and any other experts to look over my comments to make sure I am fairly and accurately representing their work.

Quotations for Explorations:

"Many communities (media, finance, music, government) are making rapid advances in conveying instant services or information. One coherent vision of this new environment is epitomised by Berners-Lee's "Semantic Web" (SW)1 where knowledge is instantly available and computers as well as humans can reason from it to make decisions."

This is where tech communicators fit into the bigger picture; as more information becomes available, there will be a need for people to sort through the technical information. In fact, as the general public becomes more aware of and interested in "science" made popular by TV shows like CSI, there will be a need for translators. Chemists don't have time to translate for the general internet surfer. More importantly, they don't have time to sit down with technical writers and review philosophy; in going to a tech writer, they trust that the writer has an understanding of the field. Most tech writers (like most everyone) use Google to search for information. If chemistry moves toward an openly accessible format, this information will be available to anyone interested. Without translation, the information can be easily corrupted.

" "[we need to] get the best information in the minimum quantity in the shortest time, from the people who are producing the information to the people who want it, whether they know they want it or not" (our emphasis)."

They place emphasis on the line "whether they know they want it or not," and this is an important distinction. Technical communicators need to know that while technical information may not be needed now, it will become crucial in the future. The fact that nothing about OSS (in any of the sciences) is published in TCR literature is proof that we aren't prepared to deal with the writing demands we will face as these folks need grants, manuals, or policy documents written.

"An impressive chemical use of the Internet and high-throughput computing was demonstrated by Richards and co-workers3. To quote from their site

'Anyone, anywhere with access to a personal computer, could help find a cure for cancer by giving "screensaver time" from their computers to the world's largest ever computational project, which will screen 3.5 billion molecules for cancer-fighting potential [...] over 2.6 million computers have joined the project with over 320,000 years of CPU power used ... Through a process called "virtual screening", special analysis software will identify molecules that interact with these proteins ... The process is similar to finding the right key to open a special lock - by looking at millions upon millions of molecular keys.'"

Imagine the potential if all diseases received the same treatment!

"This particular project illustrates the concept of the Grid, a linking of vast computational power for immediate use. In science this is anticipated by the construction of the Global Grid (and nationally in the UK, the eScience project) where instant access to trusted information and services is possible. The combination of the Grid and the Semantic Web is seen as culminating in a Semantic Grid, in which vast power and knowledge are combined."

As more information becomes available, the greater the need will be, on many levels, to disseminate it.

Historically, it took months to extract data through human "harvesting":
"With the technology and access to data available at the time, it could typically take several months to extract sufficient information for a single system."

"The chemical community needs to be able to operate on a wide range of problems without having to engineer each of them separately; in effect there is a need to incorporate semantics and ontologies into a generic set of tools for this purpose. Here we suggest that the Semantic Web can provide such an infrastructure."

If such a system is created, post-grant documentation will require working knowledge of how the system works.

    • "It will shortly be possible to request a machine to discover existing knowledge or services and make appropriate transactions to obtain these, including security, trust, and metadata in a robust and efficient fashion. Its adoption will depend on "what there is to discover" and how valuable it is. We have variously argued10 that chemistry is an almost ideal discipline for transition to such a next generation of informatics infrastructure; a Chemical Semantic Web. This in turn would be supported by domain-specific de facto standards such as the CML (Chemical Markup Language) family."

I was interested in fully understanding this concept (without a working understanding of chemistry itself). I am most interested in the claim that chemistry is "an almost ideal discipline for transition to such a next generation of informatics infrastructure; a Chemical Semantic Web."

"In this article we argue that primary publications in this and similar journals should form a major substrate for such a chemical semantic web."

"The Semantic Web and the emergence of Grid computing involve a qualitative change not only in the way that we manage information but in the way that science is carried out. We see computers becoming an integral part of the scientific process in many ways..."

This is an interesting concept. A simple Google search yields much information now; imagine the possibilities!!!

"Scientific information can be corrupted during the publication process"

I liked this statement and believe it is true in many disciplines.

"Using the OSCAR publication checker14 (which analyses data for self-consistency and acceptable ranges) it has been shown that a very high proportion of articles in synthetic organic chemistry contain at least one dubious data value."

This is interesting.
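
To make the idea of checking data "for self-consistency and acceptable ranges" concrete for myself, here is a minimal Python sketch. It is not OSCAR's implementation; the field names, the plausible melting point window, and the tolerance are all assumptions chosen purely for illustration.

```python
def check_record(record):
    """Return a list of warnings for one reported compound (hypothetical fields)."""
    warnings = []

    # Range check: a melting point outside an assumed plausible window is flagged.
    mp = record.get("melting_point_c")
    if mp is not None and not (-200.0 <= mp <= 500.0):
        warnings.append(f"melting point {mp} C outside assumed range")

    # Self-consistency check: when every element is reported,
    # the percentages should sum to roughly 100.
    composition = record.get("elemental_percent", {})
    if composition:
        total = sum(composition.values())
        if abs(total - 100.0) > 1.0:  # assumed tolerance of one percentage point
            warnings.append(f"elemental percentages sum to {total:.1f}, not 100")

    return warnings


# Two made-up records: one plausible, one with a likely typo in each field.
good = {"melting_point_c": 122.0,
        "elemental_percent": {"C": 68.8, "H": 5.0, "O": 26.2}}
bad = {"melting_point_c": 1220.0,
       "elemental_percent": {"C": 68.8, "H": 5.0, "O": 16.2}}

for label, rec in (("good", good), ("bad", bad)):
    print(label, check_record(rec))
```

A real checker would presumably apply many more rules, but even two simple ones show how a "dubious data value" could be caught by a machine before publication.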

    • "Much chemical information is imprecisely defined. Scientific units are often omitted (e.g. in computational chemistry log files) and information can be interpreted differently by different readers. XML provides a basis for validating information."

Again, this was interesting and helped me understand the overall concept.
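
As a rough illustration of how XML can provide "a basis for validating information," the sketch below scans a small fragment for values reported without units. The fragment is only loosely modeled on chemical markup; the element and attribute names (`property`, `scalar`, `units`, `title`) are assumptions for this example, not a claim about any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the second value was written without units.
SAMPLE = """
<molecule id="m1">
  <property title="melting point"><scalar units="celsius">122</scalar></property>
  <property title="energy"><scalar>-420.1</scalar></property>
</molecule>
"""


def scalars_missing_units(xml_text):
    """Return the titles of properties whose <scalar> has no units attribute."""
    root = ET.fromstring(xml_text)
    missing = []
    for prop in root.iter("property"):
        for scalar in prop.iter("scalar"):
            if not scalar.get("units"):
                missing.append(prop.get("title"))
    return missing


print(scalars_missing_units(SAMPLE))
```

The point for a non-chemist like me: once the data is in markup, an omitted unit stops being something a reader has to guess at and becomes something a program can flag.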

"A very high proportion of chemistry is potentially re-usable for scientific discovery, in several ways"

"Most chemical data is never satisfactorily published due to the impedance of conventional processes. We estimate that less than a fifth of primary data ever has the possibility of being re-used."

These two statements are interesting. The idea of sharing information so that failures aren't repeated is cost effective and time saving.

"The programs used to compute molecular properties almost invariably have manuals of several hundred pages, with an almost infinite combination of possible program options. When developing our automatic protocol (below) one of us (PM-R) made "elementary" mistakes (e.g. not requesting the program to use RAM instead of disk, and using an unnecessarily expensive and outdated method). We therefore asked the priesthood of computational chemistry (e.g. HSR) to devise a protocol which was automatic and more believable by the community. This process of developing and formalising the protocols allows for re-use by subsequent novices, hence providing for more (self)consistent data where outliers and trends can be more easily (and automatically) identified."

Often, these manuals are written by tech writers in conjunction with scientists.

"In this article we encourage chemists to develop a shared vision whereby information is communal and accessible. It is important to realise that all information is potentially valuable and that the producers may not realise at the time what their descendants will require. We argue that the technologies and protocols presented here can be implemented at marginal cost within the publication process, if the community desires."

This is an excellent call to action.

"In principle a "chemical Google" could be extremely effective and completely change the basis of chemical information management; in practice there are substantial cultural and technical barriers. We explore these in this article and urge all chemists (authors, editors, readers, examiners, funders, businesses and agencies) to consider how a change in practice could lead to much greater use and re-use of chemical information."

This statement is crucial to my research, as tech writers are often the ones writing funding proposals with chemists. There are many "cultural and technical barriers" to navigate, but, if the data is accurate, the real work will be developing outstanding funding and policy proposals.

"At present the primary chemical literature is not openly accessible on the Internet. There are currently 33 chemistry journals cited by the Directory of Open Access Journals20 as Open Access, and none of them are currently major publishers (e.g. from G8 nations). In a dissenting opinion, the American Chemical Society has argued21 that:

'The open-access movement's demand that an entirely new and unproven model for STM publishing be adopted is not in the best interests of science.'"

Clearly, the ACS does not support the opening of scientific data because they see a loss in funding. But, it is interesting to note that this is the position the ACS takes on OSS.

"Many chemistry publishers also currently prohibit the public self-archiving22 of "fulltext", preprints or postprints."

That stinks. OA is an excellent opportunity to share work.

"In this article we restrict ourselves to a plea that all primary chemical data be made openly available at time of publication. We emphasize "data" since "facts" are not copyrightable under the Berne convention, and primary publishers have little incentive or success in publishing the complete data associated with an article. In fact the current publication process is a dis-incentive to publishing experimental data. It is also notable that most supplemental data is not in re-usable form (often being found as Word or PDF files or as scanned images). In the case of crystallographic data it is often only available from the (non-open) Cambridge Crystallographic Data Centre. This results in further restrictions on access and re-use."

"We therefore argue for publication by the author of data under Open Access protocols to a public or institutional repository. We appreciate that this change will take time, and involves investment in technology."

Both of these statements address the copyright concerns many face. Archiving in repositories, blogging, etc. can all be easily tracked for authenticity.

"We ask that publishers confirm that no copyright is violated in the extraction and reuse of factual information by robotic methods where the user has legitimate access to the information."

"We believe that the motivation behind the deposition of supplemental data is to make it available to the community for re-use, but that many authors do not realise the concerns of copyright. We ask that publishers confirm that their supplemental data, whether held by them or by a third party is freely reusable by humans and robots."

These two statements encourage publishers to support OA.
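
One existing web convention through which a site can state which robots may fetch which pages is a robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to show what a policy that admits one known scientific robot might look like. The robot name `chem-harvester`, the site `publisher.example`, and the policy text are all hypothetical; the article does not prescribe this mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named robot may read supplemental data,
# while all other robots are kept out of that directory.
ROBOTS_TXT = """\
User-agent: chem-harvester
Allow: /supplemental/

User-agent: *
Disallow: /supplemental/
"""


def may_fetch(agent, url, robots_txt=ROBOTS_TXT):
    """Report whether the given robot is permitted to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "https://publisher.example/supplemental/data.cml"
print(may_fetch("chem-harvester", URL))
print(may_fetch("other-bot", URL))
```

A file like this only expresses technical permission to crawl. It says nothing about copyright, which is why the authors also ask publishers for an explicit statement on re-use.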

"In some cases it is not clear whether the supplemental data provided (e.g. by a publisher or data aggregator) is the original author's or has been creatively enhanced (e.g. by editing). We ask that publishers make it clear whether the changes have taken place, what their nature is, and if so to provide a copy of the author's original data for re-use."

"We also suggest that authors add a declaration in their manuscript and/or supplemental data that the data is freely readable and re-usable by humans or robots. We expect that The Creative Commons Science Project24 is likely to provide useful protocols."

I am interested in the Creative Commons Science Project, as I believe it will ease concerns about copyright.

"We ask that publishers have a policy to allow known robots from the scientific community to access, index and extract publicly available facts from their sites."

"While much bioscience is published with the knowledge that machines will be expected to understand at least part of it, almost all chemistry is published purely for humans to read. This is compounded by the current business model in chemical information where authors do not deliberately publish information to be machine-understandable. With Chemical Abstracts and Beilstein, the traditional sequence of author⇒primary publisher⇒secondary publisher and the resale of data leads to an expectation that chemists will pay others to curate and collate their information"

"We reassert that chemists must now move towards publishing their collective knowledge in a systematic and easily accessible form for re-use and innovation"

This is a great statement, and it will likely be the one I use to discuss the work of OSS.

"We urge that authors, funders, editors, publishers and readers move further towards the following protocol:

All information should be ultimately machine-understandable in XML. Openly documented and reviewed XML data-centric languages include XHTML9 (for running text), CML11,12 (for molecular identity, including INChI, 2D structure and properties and 3D structures included when available), AniML35 spectral and analytical data, STMML36 for scientific datatypes and units and CatML37 for managing catalyst information. In addition ThermoML38 can be used for physicochemical data.
Machine understandable information for a compound should include a connection table, the IUPAC unique identifier (INChI) which guarantees that the connection table can be checked and regenerated, and a name (although in principle this can be generated from the connection table, it helps to check consistency and trivial names may also be used). Where available, information about physical nature of the compound, scalar analytical quantities (melting point, refractive index, optical rotation), full real-domain spectra (i.e. "continuous" data) for appropriate nuclei, and vibrational spectroscopy, high resolution mass spectrometric data and elemental composition and aggregate formula should also be included.
Rights metadata. An explicit statement in the data that its re-use is consistent with the Budapest Open Access initiative and a requirement that this statement be preserved when the data is re-used."

I like this manifesto.
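
To check my own understanding of what the protocol asks for, here is a minimal Python sketch of a compound record and a completeness check. The class, its field names, the placeholder identifier, and the wording of the rights statement are my assumptions; the article lists what should be included but does not define a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """Hypothetical container for the items the protocol lists."""
    name: str = ""
    identifier: str = ""  # stands in for the IUPAC unique identifier
    connection_table: list = field(default_factory=list)  # (atom, atom, bond order)
    rights_statement: str = ""  # explicit re-use statement carried with the data


def missing_fields(record):
    """Return the names of required items that are empty."""
    required = ("name", "identifier", "connection_table", "rights_statement")
    return [item for item in required if not getattr(record, item)]


complete = CompoundRecord(
    name="ethanol",
    identifier="placeholder-identifier",  # not a real identifier string
    connection_table=[("C1", "C2", 1), ("C2", "O1", 1)],
    rights_statement="Re-use permitted; this statement must be preserved.",
)
incomplete = CompoundRecord(name="ethanol")

print(missing_fields(complete))
print(missing_fields(incomplete))
```

Optional items such as melting point or spectra are left out here, since the protocol asks for them only "where available."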