Home - maccergit/Rosalind GitHub Wiki

This is the wiki for the Rosalind Python project. I started working on the Rosalind problems to get a deeper understanding of Python - learning bioinformatics was an interesting side effect. While I don't have a particularly strong biology background, I have a chemistry and physics background in addition to my software engineering, so I don't find the science side of the problems to be too complex. One of the reasons I chose Rosalind for learning Python is that bioinformatics tends to involve a lot of string processing. The other project I use to learn Python is Project Euler, which is a mathematics site. By working on both sets of problems, I cover string processing and numerical processing, which together account for many of the processing problems typically encountered. Both sites also have problems that require algorithms that scale well - the examples are small, but the actual problem often cannot be solved using "brute force" techniques. Rosalind puts an emphasis on this by imposing a time limit for submitting answers.

If you are looking for a similar set of Python projects with an emphasis on numerical computing, check out my Project Euler repository.

If you are looking to get a start in programming in Python, or even a start in programming at all, this is the repo for you - it includes a short introduction to Python, along with an introduction to standard programming approaches to solving problems. Also, the Rosalind problems tend to include more background and assume less prior knowledge than the Euler problems. Rosalind is more of a teaching site, while Project Euler is more of an exploration site.

I also note that the Rosalind team asks that solutions to the problems not be simply copied - if you wish to learn the topics presented, you should make an honest effort to solve the problems yourself - you may surprise yourself. Your solution doesn't always have to be elegant or efficient - modern computers are so fast that sometimes a naive, brute force approach is enough to process large amounts of data. However, don't be surprised when this is not always the case - the field of Computer Science is full of problems that scale exponentially, where a brute force approach would take even the fastest machines millions of years to compute the answer, while a more efficient approach lets a simple home machine get the answer in a few seconds.
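As a rough illustration of that gap, here is a minimal sketch using the textbook Fibonacci sequence. The function names `fib_naive` and `fib_iter` are made up for this example and are not taken from the repo. The naive version recomputes the same subproblems over and over, so its running time grows exponentially with `n`, while the iterative version needs only `n` additions.

```python
def fib_naive(n: int) -> int:
    """Brute force: each call spawns two more calls, so work grows exponentially."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


def fib_iter(n: int) -> int:
    """Efficient: carry the last two values forward, one addition per step."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    # Both agree on small inputs...
    print(fib_naive(20), fib_iter(20))
    # ...but only the iterative version is practical for large ones.
    print(fib_iter(200))
```

Both functions give the same answers, but trying `fib_naive(200)` would not finish in any reasonable amount of time, while `fib_iter(200)` returns immediately.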

This wiki is also meant to address some of the annoying things about solving the Rosalind problems. For example, the site does not talk about the actual process of getting the data, processing it, and submitting the solution - although the first of the Python Village problems gives you a taste of what to expect, you still don't need to download or upload any files. However, most of the later problems require downloading a data file - and the clock starts running at that moment. You then need to process that file into another file (because the solution is too large to paste into a web form) and upload the result to the web site to be checked - and all of that needs to be done within 5 minutes. If your solution code is not set up to read its input from a file and to write the answer to a file, you may run out of time. Likewise, some of the problems are picky about the format of the results file - sometimes in ways that are not obvious. While working through these issues can simulate the real world, where not everything is well documented, it can also be frustrating - hopefully, documenting these issues will help others. Note that I also have a teaching background, so don't expect to just be handed the answers up front - sometimes, a few leading questions are enough to put a person on the right track.
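One way to be ready before the clock starts is to keep a small file-in, file-out skeleton around. The sketch below is only an assumption about how such a script might look: the names `solve` and `run`, the script name `solve.py`, and the file names in the usage comment are all hypothetical, and `solve` is a placeholder that just reports the length of each input line rather than answering any actual Rosalind problem.

```python
import sys
from pathlib import Path


def solve(data: str) -> str:
    """Placeholder logic (hypothetical): replace with the real solution.

    Here it simply returns the length of each input line, one per line.
    """
    return "\n".join(str(len(line)) for line in data.splitlines())


def run(in_path: str, out_path: str) -> None:
    """Read the downloaded dataset, solve it, and write the answer file."""
    # Strip surrounding whitespace so a trailing newline in the
    # downloaded file does not leak into the parsing.
    data = Path(in_path).read_text().strip()
    answer = solve(data)
    # Write exactly one trailing newline and no other stray whitespace.
    Path(out_path).write_text(answer + "\n")


if __name__ == "__main__":
    # Hypothetical usage: python solve.py downloaded_dataset.txt answer.txt
    if len(sys.argv) == 3:
        run(sys.argv[1], sys.argv[2])
    else:
        print("usage: python solve.py INPUT_FILE OUTPUT_FILE")
```

With a skeleton like this tested against the sample data ahead of time, the timed part comes down to downloading the dataset, running one command, and uploading the output file.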

This is a "work in progress", and I expect it will be under development for quite some time. Also, this repo only gets sporadic updates - it is a hobby of mine, and only one out of several hobbies.

Project Pages

Python Village

Algorithmic Heights