Archives: Computational biology

Bibbreviate: a Python package for BibTeX journal abbreviations

Journal abbreviations and BibTeX drive me nuts.  Some journals demand that you submit manuscripts with abbreviated journal titles in the bibliography, while others want fully spelled-out names.  The only solution that I’m aware of is to keep two .bib files, one with abbreviated names and one without.  However, this means keeping two .bib files up to date, and I don’t know how people do that without tearing their hair out.  This got so annoying that I finally wrote a Python script that uses the JabRef abbreviation lists to flip back and forth as needed.  That way, I can keep one master .bib file and abbreviate it when I need to.

The results aren’t always perfect, but they do cut out 90+% of the find-and-replace work needed to handle a journal that asks for abbreviated titles.  If you need that too, you can download the package from PyPI, or grab it from my GitHub repo.  Fixes and feature additions are welcome, so feel free to fork it and send a pull request!
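For the curious, the core idea (swap each journal field using a lookup table built from an abbreviation list) can be sketched in a few lines of Python. This is not Bibbreviate's actual code or API: the names `load_abbreviations` and `convert_journals` are hypothetical, the sample entry is just example data, and it assumes a plain-text list with one `Full Name = Abbreviation` entry per line, as in older JabRef lists (other versions of the lists may use a different layout).

```python
import re

# Matches a BibTeX journal field such as  journal = {Journal of Theoretical Biology}
# or  journal = "Journal of Theoretical Biology".  Simplification: nested braces
# or embedded quotes inside the title are not handled.
JOURNAL_FIELD = re.compile(r'(\bjournal\s*=\s*)([{"])(.*?)([}"])', re.IGNORECASE)


def load_abbreviations(lines):
    """Parse 'Full Journal Name = Abbreviation' lines into a dict.

    Assumes the plain-text 'Full = Abbrev' layout; blank lines and
    lines starting with '#' are skipped.
    """
    full_to_abbrev = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        full, abbrev = (part.strip() for part in line.split("=", 1))
        full_to_abbrev[full] = abbrev
    return full_to_abbrev


def convert_journals(bib_text, mapping):
    """Rewrite every journal field whose title is a key of `mapping`.

    Lookup is case-insensitive.  Journals missing from `mapping` are left
    untouched, so the output still needs a manual check.
    """
    lookup = {key.lower(): value for key, value in mapping.items()}

    def swap(match):
        prefix, open_delim, title, close_delim = match.groups()
        replacement = lookup.get(title.strip().lower())
        if replacement is None:
            return match.group(0)  # unknown journal: keep the original text
        return f"{prefix}{open_delim}{replacement}{close_delim}"

    return JOURNAL_FIELD.sub(swap, bib_text)


if __name__ == "__main__":
    abbrevs = load_abbreviations(["Journal of Theoretical Biology = J. Theor. Biol."])
    bib = "@article{x, journal = {Journal of Theoretical Biology}, year = 2010}"
    short = convert_journals(bib, abbrevs)
    print(short)
    # Expanding is the same operation with the mapping inverted.
    print(convert_journals(short, {v: k for k, v in abbrevs.items()}))
```

Inverting the dictionary is what makes the "one master .bib file" workflow possible: the same function abbreviates or expands depending on which way the mapping points.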

You must post your source code in science.

An image of the Ctrl-Alt-Delete source code.

Show me the source!

I’ve made it pretty clear by now that I do a lot of computational work in my research, so you can imagine that my metaphorical ears perked up when I came across this article in Nature by Nick Barnes about releasing scientific source code when you publish research papers.  I liked the article he wrote, but since this is my blog, I’m allowed to go even further than he did.

I’ve written a lot of software as a hobbyist over the years, ever since I began programming in elementary school.  Much of it I simply tinkered with and forgot, but I’ve released source into the wild before.  That said, I’m not an open source / free software zealot in general;  for example, I think that the FSF is, frankly, a little out to lunch on these issues.

However, when it comes to science I am militant:  if you publish a scientific paper based on the results of code that you have written, then that code is a part of your methodology and must be made available to others in the scientific community so that they can examine it and replicate your results.  Nobody is allowed to get away with saying “well, we did this DNA sequencing, but I’m not going to tell you how we did it, what materials or equipment we used, or what our procedures were – you just have to trust us that our results are right”.  That wouldn’t fly in any reputable journal, and it shouldn’t be allowed when it comes to source code.  The Barnes article implies that some people are too ashamed of their code to release it – but if you’re too ashamed to let other people see it, why are you publishing results based on it?

The other excuses he mentions in the article are equally rubbish, but one that he didn’t mention (which I’ve actually come across) is “oh, well, I can’t give you the source code because you’ll use it to do further work and publish papers that I want to do”.  This infuriated me when I heard it.  How the author in question managed to justify this to themselves mystifies me;  where in science do you get to claim that you can’t release your methodology because other people might replicate or extend your work?  That’s the whole point of science.  You don’t get to write a paper which proposes a great new method and then tell people that they can’t use it until you’re done publishing on it!

The only possible exception that I can see to this is in cases where the paper describes a finished product that is being made available to the scientific community (either free or for pay), but I don’t think too much about these cases because they strike me as being more of an advertisement than anything else.  There’s nothing inherently wrong with that, but such cases are fairly rare.  I would also distinguish between specific products (like, say, a GIS tool) and new methods (like a new statistical analysis).  Papers describing the latter should release the code, no exceptions, because otherwise we can’t validate the method for ourselves.  An example of doing it right comes from the Laland lab at St. Andrews, who published a new method for measuring the spread of information across a network (network-based diffusion analysis; NBDA).  Along with the paper, they released the source code and a package for R to help users implement and use the method themselves.

In the end, science thrives on the free exchange of information for the advancement of our collective knowledge. Anyone who feels that their source code is not a part of that exchange is not only wrong, they’re doing bad science.

Image credit: ptufts

Computational behavioural ecology…?

An image of a man wearing a t-shirt that reads "Deep down inside, we all love math".

I know I do.

I’ve written about the methods that we use in behavioural ecology, and the method that I use the most is definitely modeling.  To be more precise, I do a lot of computer simulation work on the evolution of behaviour (my focus is on evolutionary algorithms and individual-based models).  I do some formal mathematical modeling as well, primarily in game theory, but the bulk of my research is computational.  I admit it:  I’m a computer geek, and I always have been.  I love writing software, I love tinkering with code and hardware, and my natural approach to biological questions has always been to throw processor cycles at them.

Which leaves me wondering: what do we call computational studies of behavioural ecology?

The obvious answer, computational biology, is – I think – wrong.  At least, as it is currently defined, computational biology seems to be heavily focused on questions at the level of the cell or below.  If you look at the Wikipedia entry on computational biology, you’ll see that the examples given are all about cells, molecular biology, genomics, and so on.  Bioinformatics, computational genomics, “computational biomodeling” (not sure what that is, to be honest), systems biology, etc. are all examples of labels under the heading of computational biology, and none of them apply to the kind of work I do.  It’s natural that a lot of attention would be focused on this level of inquiry – people doing exciting work in genomics, cell biology, and proteomics are drowning in data and need computers to help them climb out of the well.  But I spend my time at the level of the individual and the evolution of individual behaviour, which doesn’t give me a lot to talk about with the computational biology people.

At the other end of the scale is the relatively new field of computational ecology.  If you forced me to choose right now, I would probably throw in with this camp, but it’s still a bit of an uneasy fit.  Computational ecology focuses on global population-level questions, and big ecosystems with many layers of complexity.  This is a fascinating area of work, but just as behavioural ecology differs from classical ethology / ecology by focusing on the individual, my work differs from computational ecology by focusing on the evolution of mechanisms and behaviour at the level of the individual.  A typical question that I’m working on right now is the evolution of learning mechanisms for social foraging – how do animals learn to use the best strategy when foraging in a group, and what is the form of the mechanisms which allow them to do that?

And in the end, I’m left wondering where I fit.  There are others like me, of course;  for example, I’ve always admired the work of Dr. Graeme Ruxton, as well as the Laland group, both of whom have done work in the same vein (this is by no means an exhaustive list of either the people whose work I admire or those who work in the same area).  With the increasing specialization of scientists into subfields of subfields of major fields, I’m hesitant to invent a new term for myself and others like me, but maybe it’s time.

So:  computational behavioural ecology, anyone?

(Photo credit: Network Osaka)