Is Your Research Software Correct?
I'm Mike Smith! But this is an abridged talk written by Mike Croucher.
www.walkingrandomly.com
Imagine...
Your results are amazing!
but wrong
This 2003 trial, done in Kenya, found that deworming whole schools improved children’s health, school performance, and school attendance.
In 2013, the data was reanalysed independently using new computer programs
Many mistakes found.
We have a problem!
Croucher's law
I can be an idiot and WILL make mistakes.
You are no different!
Your Analysis?
What you did
Open package foo. Click, Click, drag, Click, Click, Click, Right-Click, Save, 'results.csv'.
Load into Excel. Click, drag, generate graph, right click, save, 'pretty-graph.png'
Your Analysis?
What you said
I analysed my data in foo using the bar analysis. Here's a graph of the results.
How reproducible is a mouse click?
Automate
aka 'learn to program'
You are already ahead of most researchers!
The Ideal
Results = TheAnalysis(MyData)
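A minimal sketch of what that one line could look like in Python. The function name the_analysis, the sample numbers and the choice of summary statistics are placeholders of mine, not taken from any real study. The only point is that the whole path from data to results is one call that anyone can rerun.

```python
import statistics


def the_analysis(my_data):
    """Hypothetical analysis: summarise a list of measurements."""
    return {
        "n": len(my_data),
        "mean": statistics.mean(my_data),
        "stdev": statistics.stdev(my_data),
    }


# Made-up measurements, for illustration only.
my_data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
results = the_analysis(my_data)
print(results)
```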
Reality
Problem
I am an idiot and will make mistakes
(Partial) Solutions
- Automate (aka learn to program)
Write code in a (very) high-level language
- Python, MATLAB, R, Mathematica, Julia, etc
- Fewer mistakes, faster to code.
- Computer time is cheap. Programmer time is expensive.
- Ensure it's correct, then worry about speed.
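One way to act on "correct first, fast second" is to keep the slow, obvious version around and check the fast one against it. The sketch below assumes a moving average as the calculation; both function names and the data are invented for illustration.

```python
from itertools import accumulate


def moving_average_simple(values, window):
    """Slow but easy to check by eye: average each window from scratch."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def moving_average_fast(values, window):
    """Faster version using running totals; trusted only because it is checked below."""
    totals = [0.0] + list(accumulate(values))
    return [
        (totals[i + window] - totals[i]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up data, for illustration only.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
slow = moving_average_simple(data, 3)
fast = moving_average_fast(data, 3)

# Compare with a tolerance: floating point results may differ in the last digits.
assert all(abs(a - b) < 1e-12 for a, b in zip(slow, fast))
print(slow)
```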
Problem
I am an idiot and will make mistakes
(Partial) Solutions
- Automate (aka learn to program)
- Write code in a (very) high-level language
Two facts that, combined, worry me:
Scientists typically spend 30% or more of their time developing software
90% or more of them are primarily self-taught
Hannay JE, Langtangen HP, MacLeod C, Pfahl D, Singer J, et al. (2009) How do scientists develop and use scientific software? In: Proceedings Second International Workshop on Software Engineering for Computational Science and Engineering. pp. 1–8. doi:10.1109/SECSE.2009.5069155.
Prabhu P, Jablin TB, Raman A, Zhang Y, Huang J, et al. (2011) A survey of the practice of computational science. In: Proceedings 24th ACM/IEEE Conference on High Performance Computing, Networking, Storage and Analysis. pp. 19:1–19:12. doi:10.1145/2063348.2063374.
Get some training
Just enough Software Engineering to Perform
Problem
I am an idiot and will make mistakes
(Partial) Solutions
- Automate (aka learn to program)
- Write code in a (very) high-level language
- Get some training
Is this familiar?
- code_ver1.m
- code_ver1b_BROKEN.m
- code_ver1b_BROKEN_Working_march20.m
- code_ver1b_BROKEN_Working_march20_Bobs_mods_ForMike.m
Which version did the results come from?
Taking you back to your happy place
True Story
- Me: Can I see the code please?
- Them: I'll just get the changes from Bob folded in and email it
- Me: Shouldn't we be using version control?
- Them: No need - it's overkill. We don't have a VC problem.
- Me: The code you sent me doesn't work
- Them: Sorry. I sent the wrong version.
Which version control system should you use?
I like and use 'git' but use whatever your colleagues are using.
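Once the code is in git, "which version did the results come from?" can be answered by the results themselves. A rough sketch, assuming git is on the PATH and the script runs inside the repository; the function names and the "code_version" key are my own invention, not a standard.

```python
import subprocess


def current_git_commit():
    """Return the current commit hash, or 'unknown' outside a git repository."""
    try:
        completed = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or this directory is not a repository.
        return "unknown"
    return completed.stdout.strip()


def stamp_results(results):
    """Attach the code version to a results dictionary (hypothetical layout)."""
    stamped = dict(results)
    stamped["code_version"] = current_git_commit()
    return stamped


print(stamp_results({"mean": 5.0}))
```

This only tells the truth if everything was committed before the run, so treat it as a prompt to commit first rather than a guarantee.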
Problem
I am an idiot and will make mistakes
(Partial) Solutions
- Automate (aka learn to program)
- Write code in a (very) high-level language
- Get some training
- Use version control
Get a code buddy
Doesn't have to understand your research
Remit: Tell me where I could do better?
Problem 1: Get the code running on THEIR machine
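When the code runs for you but not for your buddy, a first thing to compare is what each machine is running. A small sketch using only the standard library (needs Python 3.8 or later); the function name and the package names passed in are examples of mine, so swap in whatever your code imports.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, OS and package versions for a code buddy to compare."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# The package names here are examples only; list whatever your code imports.
print(describe_environment(["numpy", "pandas"]))
```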
Problem
I am an idiot and will make mistakes
(Partial) Solutions
- Automate (aka learn to program)
- Write code in a (very) high-level language
- Get some training
- Use version control
- Get a code buddy
Share your code and data openly
You've come so far...
- You can get your results by entering one command
- Your code buddy has seen your code -> Show it to the world
- Your code is in git -> upload to public GitHub
Benefits
- It's the right thing to do
- Others will use, debug and enhance your work
- Others will reproduce and cite your work
- More opportunities to collaborate
If Python: write it as a module - future you will be grateful
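A sketch of what "as a module" could mean in practice: functions in a file, with the script behaviour tucked behind a main guard. The file name analysis.py, the function summarise and the numbers are all hypothetical.

```python
"""analysis.py: a hypothetical module holding the analysis code."""
import statistics


def summarise(values):
    """Return the mean and median of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def main():
    # Made-up numbers standing in for real data.
    print(summarise([1.0, 2.0, 3.0, 10.0]))


if __name__ == "__main__":
    # Runs only when executed as a script, not when imported.
    main()
```

Assuming the file is saved as analysis.py, a notebook or another script can then do "from analysis import summarise" without rerunning everything.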
Literate computing
Makes it easy for people to use your module.
Use Jupyter, Mathematica, RStudio, MATLAB Live Editor, etc
Problem
I am an idiot and will make mistakes
(Partial) Solutions
- Automate (aka learn to program)
- Write code in a (very) high-level language
- Get some training
- Use version control
- Get a code buddy
- Share your code and data openly
- Use literate computing technologies
Afraid to change your code?
Write tests
- Every decent language has a testing framework
- Learn how to use it (Software Carpentry)
- You write additional code that ensures your code gives the answers you expect
- Tests give you confidence to make changes
$ nosetests ./unittests.py
..............................
----------------------------------------------------------------------
Ran 30 tests in 0.152s
OK
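For a feel of what sits behind a run like that, here is a small sketch using Python's built-in unittest module. The function being tested, celsius_to_kelvin, is a made-up stand-in for your own code, and this is not the test file that produced the output above.

```python
import unittest


def celsius_to_kelvin(celsius):
    """Hypothetical function under test."""
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return celsius + 273.15


class TestCelsiusToKelvin(unittest.TestCase):
    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_absolute_zero(self):
        self.assertAlmostEqual(celsius_to_kelvin(-273.15), 0.0)

    def test_rejects_impossible_temperature(self):
        # The test passes only if the function refuses the bad input.
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)
```

Saved as, say, test_temperature.py (an invented file name), this can be run with "python -m unittest test_temperature". Note that one test checks that bad input is rejected, not just that good input works.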
Problem
I am an idiot and will make mistakes
(Partial) Solutions
- Automate (aka learn to program)
- Write code in a (very) high-level language
- Get some training
- Use version control
- Get a code buddy
- Share your code and data openly
- Use literate computing technologies
- Write tests