PMRblognotes

=Peter Murray-Rust's "A Scientist and the Web": Blog Notes=


Overall Notes: This is a relatively new blog (September 1, 2006), but it is crammed with excellent information about the Open Data movement, and it is quite active. For some reason, the spacing is all sorts of crazy here, but I am not going to worry about it (in the spirit of note taking).

[|September 1, 2006] He establishes the purpose of this blog: "This blog will cover a wide range of topics that are mushrooming on today’s web and which will change the practice of science." I am most impressed by this motive. The notion that the web "will change the practice of science" is an important issue for technical communicators to explore. As such, it impacts how we serve scientists as policy advocates.

Topics he expects to cover include:

"The relationship between human readable material (“full text”) and scientific data."

"Web inspired technologies should revolutionize scientific communication." I am struck by this statement because I see the connection between technology and rhetoric. As presented in some rhetorical texts (van Kammen, for example), USERS are often left out of the loop for advocacy, as special-interest groups often dictate policy through advocacy. Technology, as it applies to chemistry, allows the USERS to dictate the language used to communicate the process. This will allow the users to generate their own ability to advocate.

"A particular interest is the development of the “robotic amanuensis” for scientists - personal software which can help individuals read and publish information effectively." "Open data, open source, open access, open knowledge." The "robotic amanuensis" statement is engaging because it does not rely on the assumption that humans, alone, will generate information. Technical communicators focus on the discourse exchanged by human agents, and science is moving in the direction of mechanical computation (computers will engage in the "discourse"). This is an important realization because it shifts the rhetoric of chemistry from the data (itself) to other topics. Automation will free up some "space" for chemists to communicate about other things.

"Unless we have free access to the primary outputs of science we are denied the opportunity to develop new ideas in informatics-driven science." Again, this is an interesting point. "Free access" does not simply mean free of charge but free to access. Prohibiting access means that data cannot be shared in a timely way and, thus, "new ideas" cannot be formed.

"I have argued publicly that primary scientific data belong to the scientific commons and that they must be free." I like the expression of the "scientific commons" here, as I believe that this term indicates the collaborative nature of the work involved.

"A corollary is that the output of funded science is not just full-text but the complete supporting information environment of the experiments." This is a key idea for technical communicators to understand. The funding process, at least in the US, is not just about the final product any more; it often requires OA output, as well.

"“Programming for scientists”." "Modern scientists are enhanced by “information prosthesis” - the ability to receive and repurpose information. If they are able to “program”, they have greater expressive power." I like this statement because it deals with the practice of chemistry. To have "greater expressive power," scientists need to be able to use and modify information.

"Many of the future skills will not be with conventional programming languages but the tools emerging from the explosion of social and technical operations in today’s web." This statement, again, indicates the power of web technology as it applies to the communication of scientific data.

"Markup languages in (physical) science." "Currently there are a few main approaches for content: MathML, GML (geography), Scalable Vector Graphics, Chemical Markup Language, AnIML (analytical chemistry), ThermoML (thermochemistry). There are many obvious gaps and I’ll suggest guidelines for any person or group interested in building a language." This statement helped me understand the larger picture of markup languages. CML is the one used by the chemists I have talked to in working on this project.

"creation and management of virtual communities." "I’ve been involved with creating and nurturing communities for the last 15 years including BioMOO, the Virtual School of Natural Sciences, XML-DEV, and now the Blue Obelisk. I also believe strongly in Wikipedia and related efforts. I’ll review the features of successful communities and the guidelines for growth." The idea of "virtual community" is one that technical communicators will need to understand in order to be useful to this community.

The post also discusses the terms of sharing within the blog: "We honour copyright, but ask that posters make their contributions available under Creative Commons. This allows the posters to retain their moral rights, but allows us to re-use the blog (including their contributions) for other purposes if required (e.g. it might be revised for supporting information, tutorials, etc.) We will always attribute posters." I think this is an important statement, and it allows everyone involved to understand the nature and scope of Peter's work.

[|Saturday, September 2, 2006]

This post is titled "Is Openness 'Ethically Flawed'?" This post is fascinating, as it cuts straight to the discourse most relevant to scholarship in technical communication. Peter generally begins his blog with a comprehensive position statement (which is great from my perspective, as I know what the post will be about).

"I have been interested in Openness for many years, and believe that knowledge and science can now only flourish in an Open environment. I believe that closed commercial interests (publishers, aggregators, software developers and industrial customers such as the pharmaceutical industry) stifle innovation in information-driven science." I am most interested in the terminology used here to frame this position. The concept of "openness" is interesting, and, as further posts explore, is not a universally defined term. The philosophy of sharing, or the spirit of sharing, seems to be consistent in the "open" community, but the practice of sharing varies by participant. This is indicated by the statements "science can now only flourish" and "closed commercial interests...stifle innovation." It appears that most members of the "open" community agree with these statements, and the discussions center on how to share.

"IMO that is why biosciences, with an Open ethic are about 10 years ahead of the chemical sciences in their use of information." This is an important distinction and comparison to other fields of science. I am focusing on the chemistry community because I see, through simple web searches, that the chemistry community lags behind the biological sciences.

"OSCAR can read a complete chemical paper in a few seconds and analyze the data for errors. It picks up those that have been missed by the author, reviewer and technical editor and there are almost always some in every paper!
FWIW OSCAR (Open Source Chemical Analysis and Retrieval) was written by 4 undergraduates and if you are interested (and have access to chemistry article - which are almost all closed)" Peter (like Jean-Claude) has an excellent way of communicating complicated chemical information to a non-chemist population (like me).

What is interesting about this post is that undergraduates created OSCAR. This demonstrates that the world of data-driven chemistry isn't about the "careful" lines between graduate and undergraduate levels of chemists - all can participate and contribute in the process. The same is true in Jean-Claude's lab, where graduate chemist Khalid works in collaboration with undergraduate James.

"But as I am pushing for radically new ways of doing things, my stance will sometimes be strong, as in the current post where I take issue with Peter Gregory’s comments on Open Access publishing in chemistry." Peter's honesty about his position is great. What I like most about this community is the way in which they engage conversation openly and honestly. I believe that Peter sets the stage for this type of communication because he reiterates, a few times in several blog posts, that this community needs to be open and honest. This is different, I think, from general communication in chemistry where posters may be afraid to be honest because of tenure, promotion, etc. issues. Peter also makes a point, several times in various blog posts, to make it "OK" for people to change their positions about topics through exploration. He allows the flow of communication to retain fluidity, and does not pretend or foster a "gospel" approach. I see the same position in Jean-Claude's work. He, too, approaches communication openly and solicits responses to his work. This is refreshing in academic communication, as it cuts out "competition" in favor of collaboration.
"Gregory believes that the open access author-pays model is ‘ethically flawed’, because it raises the risk that substandard science could be widely circulated without being subjected to more rigorous peer review. This could be particularly problematic in chemistry, where rapid, open access publication could be used to establish priority ahead of more time-consuming patent applications from rival groups, he added." This is a post about Peter Suber's discussion of the Royal Society of Chemistry’s position on OA chemistry (OA movement). Peter MR will discuss it, as well, in this post, but I think it is interesting that chemistry is the target here and not the bio sciences where OA has been successful and "substandard science" does not exist.

"My campaign is for Openness in:"

Access: "I am least vocal on this, leaving it to established champions such as PeterS, SPARC, Stevan Harnad, Steve Heller and many others. However I support the formation of Open Access in chemistry and would endeavour to publish there if appropriate journals exist. (Before Chemistry Central there were no Open journals that supported chemoinformatics)." Peter makes it clear in other blog entries that he does not wish to confuse OA with what he does (OD), so he comments least about this movement and allows the experts in those fields (Peter Suber, for example) to take charge of it. He calls for more "appropriate journals" to exist, and I think this is an important point. "Nature" exists in the bio sciences, and a similar journal would be great for cheminformatics. (Beyond Chemistry Central, note that there are Open Access journals in other chemistry areas, like organic chemistry ([|Beilstein Journal of Organic Chemistry]), and Nature does not exclude chemistry, but note that Nature is not an Open journal.)

Source: "in informatics the lack of Openness is a serious problem." This statement indicates the "soul" of Peter's work.
He works, via presentations at ACS, to encourage the opening of informatics.

Data: "I believe that scientific data belongs to the commons, not to publishers or secondary aggregators which is why I supported the continuation of PubChem last year in its struggle against Chemical Abstracts." I like the term "commons" and I see that it is used in other places, as well. The idea of ownership, or the philosophy of ownership, is engaging. Who "owns" knowledge? Do we all own it? Do agencies that fund it own it? These are questions that have been asked for centuries, but now, with the advent of technology, we are able to share information quickly, easily, and fairly cheaply.

Standards: "Science is bedeviled by lack of interoperability, often promoted by software companies and instrument manufacturers to create lock-in and closed markets. That is why Henry Rzepa and I have developed Chemical Markup Language as a core technology for interoperability and why we are members of the Blue Obelisk movement." I am interested in the idea of "open standards," as I think it is the least defined term among the open chemists I have spoken to in this scholarship endeavor. Generally, it appears that "open standards" simply refers to sharing information via technology through methods that are accessible to the largest group of people.

"It is a major challenge to get these ideas accepted in any community (especially chemistry) and I’m happy to take this on. I’m prepared to be called foolish, unrealistic, and encounter prophesies of failure; to be ignored by the mainstream of the discipline. But I don’t like being called unethical." "I have also been publicly criticized on two occasions as being immoral in publishing Open Source programs in chemistry. The argument of the critics is that Open-ness undercuts responsible developers and destroys their market leading to loss of support for science and poor quality code. This may or may not be true, but I do not see it as immoral." I like Peter's tone here, as I think it is important in communicating his desire to share, openly, with a community that isn't wholly accepting or supportive of the philosophy of sharing. I find the same is true in education, where I have been called "stupid" for sharing all of my course materials for free via the web. Just because opponents don't agree with the philosophy of sharing does not mean that the practice is "stupid" or, in Peter's case, "foolish" and "immoral." That sort of rhetoric, the demeaning type meant to degrade a philosophical position, is the same type which this research hopes to prevent.

As we have seen in drug development, attention is paid to "issues" because of advocacy (Britt, Kent, Zavestoski, et al.) - not because any one issue is more pressing than others. In the case studied by Zavestoski et al., Gulf War veterans were largely ignored until self-advocacy rattled the media; prior to their advocacy, the issues were disregarded as "stress" related, and the rhetoric was "cut down" as a result. By positioning open chemists as being "unethical," the open chemists are cut from the conversation, just as the Gulf War veterans were, and, thus, their struggle to define themselves is made more difficult. This is why it is so important to do this project; good information about this movement is timely.
"So I contend that Open Access, Data, Source and Standards are not unethical. There will have to be new - and untested - business models for scientific information. Some won’t work. But the whole impetus of the current web with mashups and REST will inevitably change the face of science, so we should start preparing. There is nothing intrinsically laudable in publishing scientific material that looks visually the same as it did 120 years ago." Again, this call to action solidifies the appropriate nature of the discussion of opening data in chemistry. The idea that "we should start preparing" is not unethical. That Peter recognizes that "the face of science" is changing is not unethical. "I have even been known to change my point of view in response to careful argument supported by facts." This comment illustrates what I mentioned above about the fluidity of Peter's approach.

Comments: Since this blog allows readers to comment, it is important to examine their comments in conjunction with the posts.

Heather Morrison (OA advocate): "There is plenty of room for more advocates, in my opinion, particularly in chemistry where awareness about open access seems to be less than in other disciplines. If this blog post inspires others to speak up about open access - new OA voices are most welcome!" It is interesting that Heather (a non-chemist) also sees, as I do (a non-chemist), that OA is less accepted in chemistry.

Jean-Claude Bradley (OSS/ONS): "I am not sure how effective Chemistry Central will be to promoting truly Open Access chemistry with an author fee of $1350/article. I think that true Open Access only works well when the barrier to publication is as low as the barrier to reading. There are some models for this in chemistry right now, like the Beilstein Journal of Organic Chemistry. Otherwise self-archiving (on blogs and wikis for example) is probably the only truly zero-cost general full Open Access model out there now. That’s why we’re using it to publish all of our lab data." Jean-Claude's post is important because he discusses the availability of OA chem journals. He also recapitulates his position on the use of the wiki-blog model for reporting chem information.

[Peter's response to Jean-Claude] "I also agree about archiving data in public and applaud your efforts. We have done this for 250,000 molecular calculations (see http://www.dspace.cam.ac.uk/handle/1810/724) and are now developing a data archiving system for chemistry (SPECTRa - http://www.lib.cam.ac.uk/spectra) which is supported by the UK’s JISC. We’d be very happy to share experiences - so far our questionnaires have revealed that 90% of the problems are social, not technical!" I am fascinated by the exchange of practice-related information, but also by the idea that the problems are social and not the result of bad technology.
Jean-Claude, Peter, and others exchange practice-related information as a result of this thread - demonstrating the ability of the technology to foster fast communication (discourse/rhetoric).

[|Sunday, September 3, 2006]

While this post mainly communicates technical information, I think there are a few helpful things here that help me to understand how the process works. Mainly, this section is useful to me (to preserve the spacing, I am italicizing it):

(Note: I have included institutions to emphasize the quality of the correspondents. Non-chemists need to know that:
 * Coenzyme A is a fundamental biochemical in almost all organisms and will form part of any biochemistry degree. It is therefore not a rare or contentious substance.
 * PubChem is the NIH’s Open collection of chemical and biological information related to their Molecular Libraries initiative. It contains information (not samples) of about 5 million compounds. The information is not peer-reviewed and PubChem gratefully accepts contributions of information from many sources including suppliers, publishers, and researchers.
 * SciFinder is a tool/service created by Chemical Abstracts Service. I do not regularly use it but my colleagues do, after debate as to whether they could afford it (I do not know prices but it costs a lot). I believe it contains about 25 million compounds though many of those are biological sequences.
 * Beilstein is a commercial supplier of chemical information and has, I believe, about 6 million compounds and associated properties. Again, since I don’t use it, I can’t give figures.
 * The CAS-RN is a unique ID for each chemical substance created by Chemical Abstracts on which they claim copyright. It is very widely used as a universal identifier and many sites (but not PubChem) will list the CAS number. Whether this has been agreed with CAS in individual cases is not normally known.
 * PubChem and CAS were in dispute last year, with CAS lobbying the US congress to limit the activities of PubChem.
Note also that the answer is not immediately clear (this is not unusual in chemistry as there are some subtle qualifiers). PubChem is free. [|CAS charges] $6.00 to non-subscribers for the information above. Beilstein will also charge.)

These are helpful notes for non-chemists.

[|Wednesday, September 6, 2006]

This is one of Peter's shorter posts, but it has some good information. "Very pleasant visit from Alma Swan - guru and expert in Open Access. We actually talked about Open Data - how data in scientific publications can be marked up semantically, published, archived and reused. We are doing a lot of this at present - see reply to Jean-Claude Bradley." I think he is referencing the previous exchange between him and Jean-Claude. I like the concise terminology used here: "scientific publications can be marked up semantically, published, archived and reused."

"I’m talking on Sunday at the American Chemical Society on “eChemistry”. eScience - the Grid - cyberinfrastructure - has a lot of interest and support in almost all disciplines - physics, bioscience, medical, geoscience, astronomy. But not chemistry. Why not? I’ll be exploring these ideas in future posts." I have found that chemistry does, as Peter suggests, lag behind the other science disciplines. I wonder why this is true. Jean-Claude has said that chemists tend to be more "conservative" than other disciplines, but I wonder if any of this has to do with perceived competitive strategies. For example, are chemists afraid of losing funding if they share? Clearly, no one is losing funding in other disciplines. In her theoretical work, Harding suggests that these strategies are holdovers from Cold War thinking (sharing via socialist structure is bad). I wonder if that mindset plays any role, as most of the tenured PIs (senior faculty) are of the Cold War generation. I will pose this idea to the BO group and see what they think.

[|Thursday, September 7, 2006]

The name of this post is "Open Data, Open Science. Closed Data, ..." Peter was preparing to speak at the ACS conference, and this post nicely summarizes his position on "eChemistry" - a term I like very much. "By eChemistry I mean more than simply compiling in-house data and running programs - I mean semantically enriched chemistry that machines can help to process." The goal for many open chemists is to share data via technology.

Although he doesn’t directly link eChemistry and eScience, I assume, from the way he uses them, that they are different principles, but what is true about one is true about both: "The single fundamental requirement in eScience is that there is shared data. Ideally this should be semantic, and that’s a challenge, but at least it should be there and shared." The idea of "shared science" seems reasonable, and yet many resist.

He then speaks of chemistry directly: "In chemistry there is virtually none [sharing of data]. What there is has almost all come from bioscience (e.g. NCI and PubChem) and some of the US government agencies. However mainstream chemistry is totally uninterested in sharing chemical data and when it needs it expects to have to pay private sector providers. As a result innovation in eChemistry and chemoinformatics is stifled" This is strong but appropriate language. Peter's work, Jean-Claude's work, IS stifled if only a handful of chemists deny innovative approaches to sharing data.

"What we have done - and what I shall be reporting is to spider all the published crystal structures from journals that allow this. We haven’t spidered the ACS because they stamp copyright on the factual data deposited as supplemental and no-one except me and Henry has challenged this. Personally I regard this as illegal and certainly unacceptable but while communities like CHMINF-L accept this there is not a lot that 2 individuals can do other than make a fuss." This portion is exactly the reason I selected this topic.
Alone, Peter and Henry have been the advocates for Open Data and Open Standards. Jean-Claude has been the advocate for Open Notebook Science. Alone, these three are limited in their ability to advocate. As we see with other advocacy studies (Britt, Zavestoski, et al.), advocacy has to take place on several levels in order to establish legitimacy.

"So IF we had the structures from synthetic papers the problem would be solved." The question of "if" is important here.

"This is because chemists write in unnecessarily convoluted language: “To a solution of X was added 3 g of Y”. which is equivalent to “To my dog was donated a bone by me” (instead of “I gave my dog a bone” which is the sensible way). If we wrote “I added 3 g of Y to X” current grammars could parse it but this absurd mandating of the passive makes it a lot harder and we have to write a passive chemical grammar. But when we have cracked it, then we should be able to extract reactions from full-text." This is an interesting example. The discourse in chemistry (concerning experimentation) is fragmented, and cheminformatics standardizes the language.

"It’s a perfectly sensible question and very exciting, but be prepared for disinterest and opposition from most of the community. We’ve been collaborating with Indiana on the use of a distributed OSCAR system and there are lots of areas where other people could help as long as they don’t mind working with Open Source." I think it is interesting that Peter prepares this reader for the "disinterest and opposition from most of the community." This speaks to a need for advocacy.
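To help me picture what a "passive chemical grammar" has to do, here is a rough sketch in Python of recognizing just the one sentence shape Peter quotes ("To a solution of X was added 3 g of Y") and restating it in the active voice. This is only my illustration, not how OSCAR or Peter's group does it; the pattern, the short unit list, and the function name `to_active` are all my own assumptions, and real experimental prose would need far more than one regular expression.

```python
import re

# Hypothetical pattern (my assumption, not OSCAR's grammar) for one passive
# construction: "To a solution of X was added 3 g of Y"
PASSIVE_ADDITION = re.compile(
    r"^To (?:a solution of )?(?P<target>.+?) was added "
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|g|kg|mL|L) of (?P<reagent>.+?)\.?$"
)


def to_active(sentence):
    """Return an active-voice restatement, or None if the pattern does not match."""
    match = PASSIVE_ADDITION.match(sentence.strip())
    if match is None:
        return None
    # Reorder the captured pieces into "I added <amount> <unit> of <reagent> to <target>".
    return "I added {amount} {unit} of {reagent} to {target}".format(**match.groupdict())


print(to_active("To a solution of X was added 3 g of Y"))
```

Anything that does not fit the pattern (including "I gave my dog a bone") simply returns None, which hints at why a full grammar, rather than a single pattern, is needed.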

[|Thursday, September 7, 2006] (2)

This is an excellent post titled "Open Source, Open Data, and the science commons." I like Peter's premise: "In this post I contend that the chemical information cycle is broken - to the detriment of the chemical and general commons." This is a nice summary of what he discusses.

"Robert Terry, Wellcome Trust, is widely known for his advocacy of Open Access. As many of you know from next month if you are funded by Wellcome you MUST make your publications Openly accessible. If your publisher doesn’t allow this, that’s your problem." This is an important reminder, but it also indicates the movement of funding sources and their desire to open research to the public commons.

"Essentially his argument is that funders support scientists to do research. The results of this work are then given (i.e. copyright assigned) to publishers who get peer-review donated by the scientific community and then restrict the dissemination to readers who are able and prepared to pay. The wealth flow (which includes both money, informatics goods, and services) is a net drain FROM the funders TO the shareholders of the publishers." Yes.

"The diagram has changed with the green arrow showing the flow of goods back to the commons. The cycle is complete: funders support science; science is published into the commons; the commons can be seen by the funders who can demonstrate the value of their contribution; and the new goods inspire the next generation of science." Exactly.

"Can we apply the same sort of logic to software and data in science? Again we need a cycle or the producers end up subsidizing other parts of the chain. In bioscience this can work. Although there is a considerable problem in any science in supporting data and technology there is direct funding for databases and software. I have drawn 2 cycles - one for software, the other for data. The funders support science with a partial provision for the development of tools to support it.
They require that the tools and the data are made available to the community. In this way the cycles are closed and there is a flow of goods back to the commons. Because of the central role of data in modern science, funders may also directly support databases. This is not easy, and it’s expensive but it still seems to happen. In any case the data are Open." Interesting.

"My key contention is that these communal resources give rise to innovation in both the science and the technology. For example there is exciting research into the semantic web in life sciences because there are data on which to experiment and develop methods." Yes.

"In contrast the flow in chemistry is broken. I have omitted the funders from the diagram but there are very few projects where major software or data has been mandated as Open by the funders. I’d be delighted to have examples. In practice almost all software is commercial and unresponsive to the needs of the science commons. The major market for both software and data is the pharmaceutical industry which pays billions to major information suppliers. This biases the flow so that only crumbs return to the commons. It’s actually worse than zero because if a commercial offering exists there is no motivation to build one in the Commons. So innovation is stifled." This is an excellent thought, and it speaks directly to how things are done in general chemistry.

"Rich Apodaca in his Blue Obelisk post had a nice quote from the editor of J. Chem. Inf. Comp. Sci. in 1984 urging that chemoinformatics be a reproducible scientific discipline. Unfortunately this is impossible with the software and data models we now have." Rich is an advocate, and I will be looking at his blog next.

Comments: From Peter to Richard Akerman: "the great thing about a blog is that you get feedback in a way that never happens in most paper journals." This is consistently true and is something Jean-Claude talks about, as well.

"I think Open Data is much less well understood than Open Access. In some communities this isn’t an issue - data get published automatically and authors accept this and even welcome it. In some others possession of data, by authors, is seen as power and is not released willingly. In others - such as chemistry - data is aggregated by commercial interests and resold to the community that created it." Peter is correct in that there is confusion between OD and OA and other O initiatives.

[|Thursday, September 7, 2006 (3)] This is an interesting post about "mash ups" - a term I had never heard before I read this post. It is the combination of two disparate web sources to create a combined source of information. "Unfortunately we have to have access to data sources before we can create mashups. In chemistry there were virtually no data sources before PubChem. Now that PubChem has survived the troubles of last year we can start to create mashups - and we’ll be showing some quite soon - based on InChIs. As we liberate more data sources in chemistry, expect some really exciting things to happen." I like the language of "liberating" data sources.

[|Friday, September 8, 2006]

This post talks about the community of sharing chemists in preparation for the ACS conference. The point of Peter's presentation was: "At the end it should be clear that there is enough technology from the Open Source community to take chemistry into the 21st Century." I like this forward-thinking ideology.

[|Friday, September 8, 2006] (2)

This is an excellent post detailing Peter's predictions of open data in chemistry. "Chemical informatics and information is broken. It’s expensive, lossy, out of data and restrictive. There is virtually no innovation and no obvious understanding of how the web is changing. I don’t think the future Web (”Web 2.0″ or whatever current acronym can co-exist with the closed, inward-looking chemoinformatics community which supports the closed world of pharmaceutical research." Peter uses the word "broken" a lot in his posts. The cycle was "broken" in a previous post; the information is "broken" here. The question, of course, is whether it has always been broken, or whether the technology that can be used now (and couldn't before) weakens the cycle or chains.

"Unless current providers of information and software, and purchasers of these services (pharma) change rapidly there will be a split." I am interested in learning more about this split.

"The new informatics will be characterized by:"
 * "biosciences and some sciences adjacent to chemistry (perhaps geosciences)" We see this in OA.
 * "funders who aggressively promote Open Access and require their grantees to make their output universally available" In the States, we see this with the NIH, and, I suspect, the NSF will soon follow.
 * "data providers who wish to build mashups - especially multidisciplinary, combined services, and autonomous processes."
 * "the young-at-heart generation who espouse Wikipedia, folksonomies, and social computing. Expect to see a lot of semi-formal semi-voluntary reviewing of information resources such as PubChem and Wikipedia" This is the generation of undergrads and grads that I am most concerned about.
 * "a growing Open Source community based on the Blue Obelisk mantra of Open Source, Open Data and Open Standards" Yes, I believe this is true, as others outside of chemistry (Heather Morrison, me) work toward promoting OA and like-minded concepts (OD).
 * "publishers with the foresight to see the new opportunities and the value of new products and services"

These are predictions Peter makes for the next five years:
 * "Wikipedia Chemistry will be more accessed than the Merck Handbook or general chemical textbooks" I know that this is true in other disciplines. In English, Wiktionary is now used more than the standard Oxford English.
 * "Students will bring PDAs into lectures (if they even bother to go) and point out when the lecturer makes mistakes" This happens now :-)
 * "machines will be able to answer some first year chemistry exam questions" Interesting.
 * "machines will roam the Open chemical semantic web mashing data against bio- and geo-sciences." Very interesting.
 * "PubChem will be more accessed that Chemical Abstracts. Universities will cancel their subscriptions to the latter, which will be increasingly oriented to serve the pharma industry" This is an interesting prediction. I suspect a fight will happen as a result. As academic libraries cancel subscriptions, the journals are going to be thrown into a frenzy.
 * "chemical linguistic robots will read Open Chemical papers on behalf of the community and extract data, give guidance on what papers are worth reading, build personal chemical memexes, etc." Sheesh. I wish they could do this in English.
 * "mashups of Open crystallographic data will become universal and except for historical data searches replace the crystallographic databases."

Here are some questions Peter raises about the future:
 * "will the pharma industry continue in its closed approach to information? If it is to be information-driven it has to develop and open supply chain for multidisciplinary information and services"
 * "will the major publishers react positively?"
 * "will Google enter chemistry? I’ve been invited to Mountain View next week - very exciting. I expect to get a very different type of audience from the ACS - probably no chemists but many excited young web hackers. Google and the new technology could dramatically change chemical informatics."

[|Friday, September 8, 2006] (3)

This post talks about the Blue Obelisk "movement" (likened to the "Transcendental Club" really). Overall, I like Peter's language because it is direct: "Chemoinformatics and much chemical computation is seriously broken. The formats are 30 years old, the producers compete against each other, and there are no validated data resources, programs and no communal agreed knowledge. Each producer sees themselves at the centre of the universe and caters only for their own requirements, leading to a forest of “stovepipes” in the antipattern jargon. There is no sign of positive reaction to the developments on the web. Neighbouring disciplines such as bioinformatics sigh meaningfully and then go ahead and create the Open chemical resources they need." Again, Peter uses the word "broken."

This post indicates the level at which "old information" is not progressing because of competition. It is interesting that Peter points to other disciplines that have to create what they need (chemically) in order to function.

"Chemical software used to be free. It wasn’t interoperable, but that is because machines weren’t." This is interesting, and is something I did not know.

"That’s all changed. First the computational chemistry codes (quantum mechanical), then the chemoinformatics and molecular graphics ones were bought up by warring software companies in the 1908s. I was on the custom side, in pharma, and I’ll write more later. But everything became closed. One company threatened to sue customers if they revealed its file format…" I assume Peter means the 1980s - which is VERY interesting. In looking at Harding's work, these dates are consistent with the Cold War shutdown of sharing knowledge for fear that other global scientists would be the first to go to the moon, send a satellite into space, etc.

"This mess persists. But about 10 years ago a number of small initiatives took place to create Open alternatives - a real labour of love because they were generally not innovating, but playing catch-up. They weren’t taken seriously. For the most part they still aren’t. But it’s changing. There is now a critical mass of developers in mainstream chemoinformatics - not enormous, but sufficient to create a usable, useful system. That is growing rapidly. I guess there are over 1 million lines of Java code, and the same in C++. Yes, we have to duplicate codes for platform reasons, but it’s a good things to have a few alternatives." Again, the timeline is interesting because 10 years ago more or less marks the end of the Cold War. However, the prevailing ideologies of seniors in the field, those entrenched in McCarthy-era or Cold War ethics, prevented the new technology from being "taken seriously," as Peter indicates. I also like that Peter supports redundancy, as this is something that Jean-Claude also supports. The more consistent and repeated the information, the more likely it will "stick."

"We discover each other by cyber-methods - mailing lists, IRCs, etc." This is certainly true, as this is how I got connected with this group. As a non-chemist, I would have had no way to connect to this body of knowledge before technology.

[|Sunday, September 10, 2006]

This post was about my first contact with Peter about this project. It is interesting to see how the scope of my project has changed and narrowed as a result of our communication. "I replied to Beth saying that there had been very little scholarship done on OS in chemistry as most people in chemistry think it’s a bunch of idealists and “student hackers”. “If it’s free it can’t be any good” and they ritually pay kilobucks/year for mediocre software." This is interesting, as advocacy in the academy has, historically, happened in journals (either open or closed). This idea that "free is bad" is interesting, and it surfaces in other places like OA. So my question, of course, is: who, besides Peter and Jean-Claude, champions the idea that "free is good"?

[|Sunday, September 10, 2006] (2)

This post is interesting because it talks about how PDF files are terrible for the sharing of information: "PDF is one of the greatest disasters in scientific publishing". He discusses the politics of the PDF format, which was interesting to me because I had never thought of the politics of a formatting style. But, in reality, the PDF format prevents access. The same is true of PPT.

He then discusses the practical issues associated with not being able to extract data. This is interesting because it distinguishes philosophy (politics) from practice.

[|Monday, September 11, 2006]

This is a shorter post about Peter's presentation at the ACS. Here is the portion I liked: "Some slight movement in some areas but on the whole I sense a major split between (a) pharma + software companies + commercial data bases (CAS, CCDC, etc.) and (b) the next generation of technology and social computing. I was pleased to see that - in a show of hands - ca 35-40% said they had read this blog - even though it’s only 10 days old." While Peter wouldn't claim to be the "head" of this movement, it is certain that he is. In other communications I have had, people say, "you need to ask Peter," and they mean either this Peter or Peter Suber. In another post, on the blog [|Open Dot Dot Dot], the writer says, "Peter Murray-Rust has an interesting [|post] on the concept of open data, its (short) history and its present status, with some good links." If linking to blogs is an indication of leadership in a movement, then both he and Jean-Claude are among its leaders.

[|Monday, September 11, 2006] (2)

This is an interesting post about OA publishing and the new language used in communicating knowledge in chemistry: "There has been a major shift in how (some) Scientific Publishers see the purpose and practice of scholarly communication. Listening to the words used, “database” has replaced “journal” and “users” has replaced “readers”. I suspect the latter word conflates “purchasing officers” with “readers” into an unhappy anonymous entity." This is an excellent observation. The shift in language is important, as I believe it de-personalizes academic knowledge.

"Moreover there is a tension between the publisher and the users - significant content is illegally downloaded and an important role of the publisher is acting as “policeman” making sure that content is not stolen." Again, this is an interesting observation, and it ties to the idea of who owns knowledge and who people THINK should own knowledge.

"Now, I have never advocated breaking or abolishing copyright, but it is clear that this is creating a tension in the publisher/reader community. I’ve been involved in setting or being on the board of scientific journals and I see their major purpose as enhancing scholarly communication. I’m worried that we are losing sight of this, where journals in non-profit organizations are seen as a way of subsiding other activities of the society. If the publishers see “users” as a group who have a major motive to steal content, I suspect things will get worse." True.

"At some stage we seem to have flipped from a community where publishers interpreted the wishes of the community and served them - for a reasonable fee - to a world where publishers make the rules and police their non-compliance." This tension is worse with the advent of OA, I believe.

"Did anyone in the reader community:"
 * "actually ask for journals to be transformed to databases?" Good question.
 * "actually ask for content to be limited in time to the duration of a subscription (we used to have physical journals we could take home and even hand down to our descendants or give to needy institutions)" Hmmm. That is interesting. Payment doesn't mean ownership; it only means access.

"It worries me that this has happened almost silently. I remember in ca. 1970 (when I was too inexperienced to notice) that authors were asked to transfer copyright to publishers. These requests came from trusted societies - national societies and international unions (At that stage there were essentially no commercial publishers - Pergamon was a few years later). I didn’t think twice about it - but it was one of the biggest mistakes of my scientific life. Are we sleepwalking into something just as serious?" Again, this timeline fits with the idea of the Cold War shutdown.

"Objectively I have some sympathy with publishers whose content is illegally downloaded - I do believe in copyright. But pragmatically is the way forward to be increasingly draconian with readers (sorry, users)?" This is an interesting perspective!

[|Tuesday, September 12, 2006]

This post is titled "Open Data - the time has come." It is a useful post for an outsider like me. "The term “Open Data” is now becoming commonly used and we (Blue Obelisk) are trying to define it (our mantra being ODOSOS. Open Data, Open Source, Open Standards)." I think it is important that this community tries to define this term.

"It was not commonly used two years ago although the concept is general enough to have been important. In the last 12-15 months there has been a lot of use, particularly in the techie web logs and meetings. The idea is potentially very much broader and looks set to become very important." This demonstrates the change in discourse.

"The earliest references I can find are:
 * [|Jim Kent on the human genome.]
 * An [|Open Data Consortium] was founded in ca. 2003 seemingly concerned with geospatial data.
 * Simon St. Laurent gave a [|presentation without date] but it looks a few years back. It has a strong XML flavour.
 * I became concerned about Open data in ca. 2003-2004 and Henry and I published a [|Manifesto for Open Chemistry] in 2004. I followed these up in 2005 with several mails [|(example)] presentations to JISC, OAI, STM Publishers, etc. where I used the term “Open Data”."

"Late in 2005 SPARC set up an [|Open Data list] with me as moderator. Science started in [|Dec 2004.]" "In 2005 the term started to emerge, possibly independently, in the XML/tech area as in: [|XTech 2005]." "It is now a [|hot topic among the Tims Bray and O’Reilly]" These are great historical references.

"There seem to be several related threads:
 * scientific data deemed to belong to the commons (e.g. the human genome)
 * infrastructural data essential for scientific Endeavour (e.g. GIS)
 * data published in scientific articles which are factual and therefore not copyrightable
 * data as opposed to software and therefore not covered by OS licenses and potentially capable of being misappropriated. (this is a very general idea)"
This is a great part of the overall definition.

"I think the current usages are sufficiently close that we should try to bring them together. Comments here would be useful. Maybe a Wikipedia article would help?"

Comments:
 * "Peter, you missed my favorite link: [|http://www.opendatafoundation.org] We hope this will mark a new beginning for collaborative efforts towards open standards and open tools." (Pascal)
 * "Although the term open data is rather new, the concept is rather old. The International Geophysical Year of 1957-8 caused the setting up of several world data centres and - more importantly - set standards for descriptive metadata to be used for data exchange and utilization." (Keith G. Jeffery)
 * "And finally a plea; please make open data metadata formal; that is - unlike Dublin Core - it should be machine-understandable as well as machine-readable; then it will scale (automated processes can be used rather than requiring human browsing)." (Keith G. Jeffery)

[|Thursday, September 14, 2006]

"The Blue Obelisk Open Source group has now achieved a critical mass of high quality software, especially in chemoinformatics, chemical text analysis, editing and infrastructure such as markup languages (CML). We are beginning to be taken seriously and more collaborators are joining. The success is built on years of work by a few individuals. Those of you who think Open Source is now “obvious” may not realize that in domains - such as chemistry - it is normally regarded as suboptimal, carried out in “undergraduate projects” (a slur, anyway, as undergraduates have created some of our best materials)." I think it is interesting that this group, which appears "new," is not new. Also, ranking isn't built on degrees, but on the ability to contribute to the project.

"This reputation is blown away by the splendid molecular visualize Jmol. Indeed Nature Publishing Group has chose “First Glance in Jmol” (FGiJ) as the tool with which to display proteins in their articles. Quality and Open Source are thereby recognized." This is a great notion.

[|Thursday, September 14, 2006] (2)

"People are taking us seriously. It is still extremely hard to get support for Open Source in domains, especially chemistry (though some of us can thank funding bodies for our existence). The market is slewed to the pharma industry that has little effective interest in encouraging Open Source, even though they know that the current products are broken and do not interoperate (see earlier blogs). It is an enormous labour of love to create tools which appear to duplicate existing commercial offerings and be ignored." People are, indeed, taking this movement seriously.

[|September 14, 2006 (3)]

"A major problem in chemistry is that there is a plethora of file formats and it continues to get worse. Each manufacturer thinks they are the centre of the world and everyone else will use their approach. So they make up some ad hoc format and the number of different file types multiplies. Synaptic, semantic and ontological incompatibility is rife. One speaker from the pharma industry at ACS opined that this was a fact of life we had to put up with. We don’t - and that is what the BO is about. In some sunny future we shall use XML/CML-based files in which modern tools can store and convert ontological information. But for now we have to convert between different types. This process is necessarily lossy and would normally require n(n-1)/2 programs for n file types."

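The arithmetic in the quote above can be made concrete. With n file types, a dedicated converter for every pair of types gives n(n-1)/2 programs, as Peter says. If instead every type converts to and from one shared format (the quote points toward XML/CML; treating it as a single hub is my assumption for illustration), only one converter per type is needed. A minimal Python sketch, with hypothetical function names:

```python
def pairwise_converters(n: int) -> int:
    """Converters needed when every pair of n file types gets its own
    two-way converter: n(n-1)/2, the figure quoted in the post."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def hub_converters(n: int) -> int:
    """Converters needed when each of n file types converts only to and
    from one shared hub format (assumed here, e.g. CML): one per type."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n


if __name__ == "__main__":
    # Compare the two approaches for a few illustrative format counts.
    for n in (5, 10, 50):
        print(n, pairwise_converters(n), hub_converters(n))
```

For 10 file types this gives 45 pairwise converters against 10 hub converters, which is the gap between "what is being done" and "a better system."
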
I like this idea because it identifies a specific problem between what is being done and what would be a better system.

[|September 16, 2006]

This is a post about Jean-Claude's work. I like the connections made in this post. "The Useful chemistry blog has a remarkable and valuable feature - J-C publishes chemistry as it is being done." This is a nice statement about the nature of Jean-Claude's work.

"This raises a fundamental problem in publishing - is it “science”?. To me it is obviously science - a formal description of the hypothesis and its testing by doing an experiment. The careful measurement of the results and the critical analysis. Did it work? - J-C and collaborators are prepared to admit “failure” - although failure should be a positive idea in science. By publishing he establishes the date of the experiment (and therefore priority) and invites critiques from the rest of the community. Since he is working in antimalarials he also gives the world community a chance to pick up potentially exciting compounds. But it isn’t part of the mainstream of scientific publishing. By putting his work on the web he has automatically forfeited the opportunity to submit the work to a mainstream journal in chemistry. Many mainstream chemistry journals require that the work has not been previously published and that includes putting it on the Web." I think this is an honest recapitulation of Jean-Claude's work. I believe it mentions some of the problems Jean-Claude faces.

"So, in this way, the scientific publishing process can actually inhibit useful critiquing before publication. (Many other disciplines - such as physics and computer science - encourage the posting of preprints for community critique - and it’s sad we can’t do this for mainstream chemistry)." I think this is an important concept, as well.

"Why do we publish? Unfortunately the single most important reason for many authors is “to be cited” in a high-impact journal. (Hilaire Belloc opined ‘When I am dead, I hope it may be said: “His sins were scarlet, but his books were read.”‘. Scientists may change that to “his papers were cited”). I’ll post more later on the citation economy… Jean-Claude is (possibly) forfeiting the opportunity to be cited in a high-impact journal" This is an interesting point, and one that doesn't concern Jean-Claude.

"But most other reasons for publishing are fulfilled by the blog:

 * priority communication of the work
 * ability to be critiqued and to gain feedback.
 * record of the work, re-usable by others
We can usefully argue whether these are done better or worse by blog than traditional methods. IMO the blog has many advantages and I’ll be developing the following themes in later posts:

 * the blog can experiment with semantic publishing. (So can publishers, but the investment is larger). J-C and I can start adding active CML to his blogs almost immediately. This means the blogs and wikis can act as active semantic documents (cows) not dead paper (hamburgers)
 * the community can review the blog. This is anathema to traditionalists - unless a paper has been formally peer-reviewed it’s worthless. In some disciplines (e.g. clinical trials) I would agree. But in chemistry is the formal peer-reviewing process so wonderful? I and OSCAR (the robot) have found technical errors in almost every paper on synthesis I have looked at. Reactions that don’t “balance”, formulae that don’t square with the compound being discussed, mistyped chemical names and compound references, etc. I am sympathetic to the reviewers - an in-depth peer-review of a chemical synthesis can easily take a day. I found one where the supporting information (more later) ran to 200 pages - most of it PDF hamburger. I am not advocating the abandonment of peer-review but in some cases there are complementary approaches
 * the blog is immediate, formal publication can take months
 * The blog can link to other resources and unlike formal publications can be updated, preserving its revision history (in a Wikipedia-like manner)"
These are excellent points about the usefulness of blogs.

[|September 17, 2006]

"In a communal Open Source project every developer and every tester (or user when the code is released) can contribute bugs to a buglist. There is both the incentive to post bugs and the technology to manage them. (How many of you send off bug reports after a Blue Screen Of Death on Windows?) The bugs are found, listed, prioritized and - as developers are available - are fixed. Large sites such as Apache have huge lists - many thousands - the Blue Obelisk projects have less but it is still the way we try to work. The key thing is that bugs are welcomed - of course we hate hearing about a new bug at 1 a.m. - but we’d rather know now than six months down the line."

The notion that bugs are welcome is refreshing and illustrates the non-competitive nature of this group of research chemists.

"Can we have communal peer-review? Is peer-review not something that has to be done by the great and the good? No - just as all bugs are not equal, so peer-review can be extended over the community. This is being [|explored by Nature] - typical examples are: [|Scientific publishers should let their online readers become reviewers]. and Peer Here I want to explore a special case of peer review - data review. In many sciences the data are of prime importance - they almost are the publication. Where this happens some sciences implement impressive systems for data review - a good example is in crystallography where all papers are reviewed by machines as well as humans." I liked this statement because it illustrates a concept that humanities scholars sometimes don't understand. In chemistry, the scholarship is the DATA and not always the article. In the humanities, we rely on other texts (novel, poetry, whatever) and write about that in our scholarship.

[|September 20, 2006] This post is called "the cost of decaying scientific data." I like the title, and I think the content accurately captures some of the issues. "Why is it important to archive the data? Isn’t normal academic publication (including theses) sufficient? Isn’t it very costly and a waste of money that could be spent on proper research? Well, the crystallographic community has archived its data for many years and research on this data alone has given rise to hundreds or even thousands of papers data mining this resource. Without this chemistry would be very much poorer as we would have little in the way of molecular or crystal structure systematics." I think this is an important issue. So much chemistry is shared via open chemistry, and the results are important to the community.

"So what is the cost of the unpublished data? To carry out the structures at [|commercial rates] would be about USD 1500-5000 for the size of structures currently published. Let’s assume a laboratory does 500 structures a year and if we assume that full economic costs are half the commercial (this is just a guess) - we are looking at half a million dollars per year to do crystal structures in a chemistry department. (I suspect the numbers are on the low side - I’d be interested in comments). Allowing that there has been some publication of some of the material as comments in chemical papers I suspect that the information from quite a high proportion of the structures is never published in any form. How easy is it to find information in current theses, especially if you don’t know it’s there?" Again, this is an important theme.

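The estimate in the quote above can be checked with a few lines of arithmetic. This sketch uses only the figures Peter gives (USD 1500-5000 per structure at commercial rates, 500 structures a year, and his guess that full economic cost is half the commercial rate); the function and parameter names are hypothetical.

```python
def annual_structure_cost(structures_per_year: int,
                          commercial_rate_usd: float,
                          economic_fraction: float = 0.5) -> float:
    """Yearly cost of a lab's crystal structures, taking the full economic
    cost as a fraction of the commercial rate (0.5 is the post's guess)."""
    return structures_per_year * commercial_rate_usd * economic_fraction


# Low and high ends of the commercial price range quoted in the post.
low = annual_structure_cost(500, 1500)
high = annual_structure_cost(500, 5000)

if __name__ == "__main__":
    print(f"USD {low:,.0f} to {high:,.0f} per year")
```

Under these assumptions the range is USD 375,000 to 1,250,000 per year, so "half a million dollars per year" sits toward the lower end, consistent with Peter's remark that his numbers may be on the low side.
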
"I think I would be safe in saying that worldwide hundreds of millions of dollars’ worth of crystallographic data is lost each year. For spectra and synthetic chemistry it will be at least 10 times greater. Many synthetic chemists say they are interested in failed reactions - and these are almost never published! If funders are aware of this they should be concerned about the loss. Funders are increasingly being proactive in requiring funded research to be Openly accessible." This reaches the heart of technical communicators serving as advocates.

[|September 21, 2006] This post has some interesting connections.

"So I feel a considerable feeling of sadness. I am sure that if synthetic chemists had embraced computers in the same way as chess players we would be significantly better off. This is, of course, an act of faith but it’s borne out by the knowledge revolution taking place in many disciplines. The bioscientists are eagerly exploring the S/semantic W/web with formal ontologies and reasoning ..."

"I am sure that our biggest problem is the lack of an immediate Open global knowledge base in chemistry. It’s all there on paper, but to get it into a machine is a mighty task. It will need new methods of computing - including social computing..."

"So I am pleased to see the quality of the chemical blogs, even if [|Tenderbutton] is retiring. With lightweight mashup-like approaches we may be able to use the new approaches to informatics that are being developed in social computing. Biology has control of its knowledgebase - it had to fight to keep it in the genome information wars- but it’s vibrant and innovative. Chemistry has surrendered its knowledgebase to commercial and quasi-commercial interests who point in the direction of pharma rather than the information revolution. I will show in a week or two how we might be able to start regaining some of it."

[|September 23, 2006] This is an interesting post that discusses the use of PDFs in sharing chemical information. Peter argues that chemists should stop using PDFs because they can not be read, clearly or functionally, in open chemistry. Some key points:

"I am of course suggesting gently that the process of publishing organic chemical experiments is seriously and universally broken"

"The authors obviously spent a lot of time preparing this SI. The publisher probably calls it a “creative work” - you can claim copyright on creative works. I’d call it a destructive work. It doesn’t actually have a copyright notice, although the ACS has a meta-copyright where they assert copyright over all SI (except one from Henry Rzepa and me)."

"The message is simple: STOP USING PDF FOR SCIENTIFIC INFORMATION DO NOT USE PDF FOR DIGITAL CURATION"

[|September 26, 2006] This post discusses this adventure in scholarship. Some points:

"My current position - and it has changed as a result of the discussion - is that the term “Open” both unites us and causes potential confusion. “Open” has connotations of trust, collaboration, innovation, etc. but because someone espouses “Open X” that doesn’t mean they espouse “Open Y”."

"Our discussions on [|Blue Obelisk mailing list] revolved around the term “Open Source”. We use this in Blue Obelisk to mean “Open Source software” as defined by the Open Source Initiative. [The BO mantra is ODOSOS (Open data, Open Source, Open Standards). ]Naively I assumed that this was the only use of the term “Open Source”. However Jean Claude uses the term “Open Source Science” and Beth had assumed that this means that the philosophy behind Open Source software and Open Source science were identical. In fact I (and I suspect most other BO members) have not heard of Open Source Science ([|example]). So I looked this up and found it has been used about 2 years ago to mean an approach to science which relies of collaboration and openness at an early stage in the process. Here is [|Jean Claude on patents.]"

"It seems reasonable to extend “Open Source” philosophy to other initiatives that share some of the general principles of Open Source computing. However we cannot assume that the actual practice is compatible. Having looked at Wikipedia I find that “Open Source” is so widespread it needs a disambiguation page which lists an amazing number of “Open Source Foo”"

"This means that any use of “Open” is likely to be fuzzy and confusing. The “Open Access” movement is broad and supports several major points of view which, though overlapping, have significant differences either in pragmatics or philosophy. Moreover “Open Foo” does not imply “Open Bar”. Thus “[|Open Access]” publications will not by themselves ensure “Open Data”." On this page, Peter has kindly listed the various definitions and tools, as well.**

[|October 1, 2006] This was an extremely useful blog post. Peter discusses the conference he attended in Washington, DC on the Science Commons. It linked to several useful articles that I will be using in my paper.

Some other notes he makes:

"Unfortunately for me most of this debate is centered on biosciences and geosciences. I don’t find many chemists who are concerned about their commons - witness the near-global chemical silence over PubChem."

[|October 1, 2006 (2)] This, too, is an interesting post and connects almost directly with my subsequent Habermas research. Peter talks about the "tragedy of lurkers." I find this to be a fascinating concept, and one that I truly appreciate and understand. As a person who shares information freely, I find it annoying that "share alike" is not observed.

Some other points:

"Open Source developers have a very lonely road. It can be years before anything takes off. So the most important thing is the community. We invest our resources in the expectation of the community developing. There is a natural hope that the users of the goods will, in some way, contribute. There is, of course, no legal requirement but I think there is a moral one. Contributions need not be code or financial (though these are appreciated) but can be bug reporting, use cases, documentation, and simple moral support. If by a lack of such contribution users make the future development of the good less easy than it would have been there is a tragedy." "I’ve worked in pharma - it can be very secretive. But I suspect there are many people in pharma who not only use Open Source but have developed material to contribute. Perhaps it is fear we have to overcome…"

[|October 4, 2006] I like this post because it details something that I believe is fundamentally different between traditional chemistry and open chemistry. In the open movement, anyone can contribute code or data or process; there are no limitations bound to degree or institution. I like this idea and think that it allows a free flow of communication.

Points: "No one takes contributions to an Open Source project and regards them as “substandard”. They are simply contributions of varying quality and use. They may be useful and buggy or thoroughly tested and irrelevant (apparently) or even possibly both or neither."

"Contributors are always honoured in an OS project, often in alphabetical order. This survives even if (or often when…) their code is refactored or removed so that not a word of the contribution remains lexically. But the contribution has still been made."

[|October 7, 2006] This post discusses the recent recipient of the Nobel Prize in chemistry. While I am not sure what the big deal is, I am also not a chemist. Peter's commentary is interesting as it relates to a post on another blog. That post asserts that, "**Departmental anything, not just chemistry, may be dying**, [my emphasis] but the real challenge is to find a way to give chemists both the special skills and insights that comprise chemistry at its best with the breadth and depth of knowledge of the complementary fields needed to understand where chemistry can make a contribution. The darwinian process of competitive evolution applied to science and academic recognition may not be the best way to either recognise or understand the major problems the science is now capable of solving. Do you get a Ph.D for finally synthesising something or for coming up with a question that is worth answering?"

Peter's points:

"I’m grateful to the chemical community for many things and proud to be part of it. But I hope I am more than a “chemist”. As I have posted earlier we need multidisciplinary scientists and technologists who go beyond labels. The activities of merging the language and practice of chemistry with the Internet revolution is valued outside chemistry but not yet within it."

"[|PubChem] is probably the most prominent and valuable example of the knowledge revolution in chemical science, but it is largely unknown within mainstream chemistry. It has over 5 million molecules but the driving and funding force is biology, not chemistry."

"Similarly I lamented that when I and others presented at a session on last month “Cyberchemistry” at the ACS there was virtually no effective use or interest in the Grid/cyberinfrastructure for chemistry beyond the “usual suspect” 3-4 groups. And in Open Access matters the publishers of chemistry have been among the last to explore this (and most haven’t)."

[|October 7, 2006] This is an interesting post because it inadvertently highlights some nice connections between Habermas and the struggle between traditional and open chemistry.

Points:

" Distinguish “ownership” from re-use. I can continue to own data while allowing others to use it. Legal constraints (formal) vs. community practices (informal). Data (private until publication) vs. knowledge (public, universal). Technology and policy must address all of these. He [Andrew Lawrence] showed a knowledge chain:
 * 1) raw data (directly from instrument)
 * 2) calibrated data (skymaps, catalogues)
 * 3) physical properties (particular knowledge)
 * 4) understanding (properties in general)"

"And universities continue to urge IPR protection which is rapidly creating the anticommons."

"And chemistry… … mainly publish and throw away… …most data is lost. Our[| SPECTRa] project is looking at why this is so - so far our findings show it is social factors (”ownership”) that are the main factor. And re-use? We publish hamburgers so there aren’t many cows."

October 11, 2006 I like this post because it outlines the potential impact of the blogosphere on the sharing of chemistry information.

Points:

"I’m quite sure that this blogosphere will develop to become a key part of informatics (publishing, retrieval, etc.) in this area. Obviously the blogs are openly accessible, and several (like mine) use **[|**Creative Commons**]** or **[|**Science Commons**]** licenses."

"Indeed Google becomes the UberDatabase."

"If the chemistry in the blogosphere is published as InChIs then Google acts as an UberChemicalDatabse."

"The main challenge is to get InChI - which is only a year old - adopted as the main way of indexing molecules. That is where the blogosphere comes in. So we are starting to talk with the main chembloggers to see what tools are required and what type of social computing will work in this area."

"here is tangible synergy between multiple efforts - they diversify and mutate and give each other support. I can see a future where enough chemists are excited by this that most things of note end up in the blogosphere. That’s where we need tools like InChI and others to help us - we are developing some exciting tools and there will be more posts on this subject quite soon."

[|October 20, 2006] This post offers a useful warning that I think fits the Habermas/Foucault view of this problem:

"However Stirling was where I made the biggest mistake of my scientific life - I first signed a form transferring the copyright of my work to a publisher (I think Acta Crystallographica). Why, in the early 1970’s did no-one in the academic sector foresee the problems. A simple refusal by universities not to hand over copyright would have forestalled the commercial publishinig industry with its ownership, and worse, its power to direct scholarship. Why were librarians, senior editors and principals silent? Can we be sure that our continued inability to control our own scholarship is not leading us into an even worse future?"

[|October 20, 2006 (2)] This post is about a talk Peter gave about open scholarship. I think it has some tidy examples that are useful in this study.

I have added the entire post here because I think it is extremely useful: " Data as well as text is now ESSENTIAL - we should stop using “full-text” as it is dangerously destructive in science. “PDF” is an extremely effective way of doing this. We need compound documents (Henry Rzepa and I have coined the term [|datument]). Need automated, instant, access to and re-use of millions of published digital objects. The Harnad model of self-archiving on individual web pages with copyright retained by publishers is useless for modern robotic science. Much scientific progress is made from the experiments of others by making connections, simulations, re-interpretation. We need semantic authoring. Librarians must support the complete publication process.

Problems:
 * apathy and lack of vision - scientists (especially chemists) need demonstrators before people take us seriously
 * restrictive or FUDdy IPR. Enormously destructive of time and effort
 * emphasis on visual rendering rather than semantic content. Insidiously dangerous
 * broken economic model (anticommons)

Successes:
 * [|Blue Obelisk] - communal [|Open Source] ([|home], [|license]) and Open Data group [|Jmol]. JFDI
 * [|**PubChem** - (Wikipedia,] [|home page)] and PubMed (UKPMC should be exciting)
 * [|**SPECTRa** : **JISC**]. Collaborative project to reposit chemistry at source.

Other initiatives:
 * Chemical blogosphere
 * SPARC - Open Data mailing list

What must be done:
 * 1) DEVELOP TOOLS FOR AUTHORING, VERSIONING AND DISSEMINATING DATUMENTS. THESE MUST BE IN XML.
 * 2) INSIST THAT ALL AUTHORS’ WORKS ARE THEIR COPYRIGHT AND RE-USABLE UNDER COMMONS-LIKE LICENSE (from menu)
 * 3) INTRODUCE NEW APPROACHES TO PEER-REVIEW OF COMPLETE WORKS (WITH/WITHOUT “TEXT”). INCLUDE YOUNG PEOPLE AND SOCIAL COMPUTING
 * 4) DEVELOP AND USE LOOSELY-CONTROLLED DOMAIN-SPECIFIC VOCABULARIES (cf. microformats).
 * 5) PAY PUBLISHERS FOR WHAT ADDED VALUE THEY PROVIDE, NOT WHAT VALUE THEY CONTROL. CREATE A MARKET WHERE PUBLISHERS HAVE TO COMPETE WITH OTHER WAYS OF SOLVING THE PROBLEM (Google, folksonomies, etc.)"

[|October 26, 2006] This post discusses Rich A.'s work (and I will be going to his blog next). I have included a link here, though I am going to refer to the primary source in my paper. Thanks for the heads-up!