Sage – What is Merck’s freebie really all about?
I was kind of hoping I would be able to get a little vacation away from Open Access, but it seems to be chasing me around. There is a lot of hype in the Interwebs about a Merck spin-off non-profit organization called Sage. For those of you who haven’t heard about it yet, please see here, here, and for a more skeptical approach here and here. Open Access supporters are loudly cheering the pharmaceutical giant for its goodwill in supposedly donating a huge amount of data to the project. The founders, Stephen Friend and Eric Schadt, liken the project to a new “Science Facebook” and happily paint the future of drug discovery in bright colors, as a network of scientists all interacting together for the greater good of society. However, there is virtually nothing on what the system is going to look like and what it is going to contain. After some more googling, I found this interview with Schadt, which sheds a bit more light on the whole deal.
Either Eric Schadt doesn’t know what he is really doing, or he is not saying everything, but the impression I got was that of vagueness. There are some encouraging signs. Both the interview and the lists of publications on the websites of Rosetta Inpharmatics (the subsidiary of Merck most strongly tied to the new project) and Sage (almost empty) suggest that the data behind Sage will be three-fold: genetic (SNP, copy number variations), RNA expression (mRNA, miRNA, other non-coding RNA), and clinical. Schadt claims that the amount of data donated initially by Merck is comparable to NCBI GEO, which is almost too good to be true. In addition, the data is more coherent and was collected in a more consistent way than the data on GEO, which is a big advantage. The way I understood it, this seed data is supposed to jump-start a wider initiative where investigators will be attracted by the data already available and will want to expand the platform using their own data. In order to expand the database in an orderly fashion, Sage is supposed to enter an incubation phase of a few years, during which only a select few institutes will be able to contribute to and make use of the database, so that potential problems can be identified and tools can be adjusted to be most useful to the wider audience. Apparently, Sage’s founders want to avoid the situation Novartis created a while back, where they just dumped raw data onto the web and let everybody use it as they saw fit. Sage is aiming for something bigger – they actually want to create a social network of scientists all putting their heads together to tackle important disease-related questions based on widely available public domain data and tools. I kind of see where they are going with it, but I can envisage a few problems with their approach:
- Difference in culture between academia and industry. The collaborative, team-oriented efforts characteristic of the pharmaceutical industry may not work out in academia. Academics are very individualistic beings, and a very wide collaboration may not appeal to them as much as to industry-bred scientists.
- Problems with keeping the data coherent. They will have to develop ways to accommodate various experimental paradigms under the same umbrella. It is very unlikely that investigators in academia will want to use the exact same protocols that were used at Merck; on the other hand, it will be difficult to keep new data consistent with the existing database if they allow too many variations in experimental methods. In other words, the more scalable the system is, the less coherent it will be.
- Investigators’ unwillingness to work on improving the system. I’m guessing that Friend and Schadt are hoping that members of the scientific community will keep improving the system and they are planning to build tools that will facilitate it. My concern is that PIs will simply not have the time on their hands to do that kind of a job, which will carry no clear immediate benefits to themselves. Schadt actually acknowledges that as one of the problems they will have to tackle, so I am hoping they will manage to develop some kind of a reward system where people will actually be incentivized to help out.
- The cost. Schadt seems to think that the costs will only be substantial in the beginning, and after that, when all the tools are created, and the community pretty much takes over with very little involvement from the company itself, the enterprise will be pretty cheap to run. I am not so sure. If they intend to keep the system more-or-less coherent, more than a little supervision will be required, and that is never cheap, especially on a scale that they are hoping to achieve.
- Merck involvement. I’m wondering how far Sage will be able to cut its ties with Merck. Schadt actually admits that they will “maintain some strong collaborations with Merck”. Will that mean that Merck will be eavesdropping on how the community uses the data?
That last point brings up another question: Why would a pharma giant like Merck donate data probably worth tens to hundreds of millions of dollars to the public domain? Here are a few ideas, some of them recycled from other people’s blogs and comments:
- They desperately need new ideas from basic research to fill in their drug discovery pipeline and so they are trying to facilitate basic research. They cannot, within their own company, get through the wealth of data they have created and cannot follow every lead, so they want other people to do it for them, and then use the results for their more applied research.
- They may count on attracting new talent to Merck from the users of the database (think of the contact list they will be able to create).
- They are trying to earn some goodwill from the scientific community and the powers that be, in the hope of future gains.
That is pretty much all that can be said at this point about the initiative. Until anyone outside of Merck/Sage actually gets access to the data, and until plans for the new platform are crystallized and the big grey light bulb actually starts putting on some colors, there is really no point in blogging about it. Oh, wait… I just did, didn’t I? Crap!
Update: There is a nice story in this month’s The Scientist about predictive models in systems biology, a field that could benefit most from a project like Sage. Mind you, the article is about some exotic bacteria, but scientists have been relatively good in the past at scaling things up.