From f8c9747d3b6425b420839ff06931b63692318f03 Mon Sep 17 00:00:00 2001 From: Blaise Thompson Date: Thu, 5 Apr 2018 09:47:06 -0500 Subject: 2018-04-05 09:47 --- software/chapter.tex | 66 ++++++++++++++++++++++++++++++++++------------------ 1 file changed, 44 insertions(+), 22 deletions(-) (limited to 'software') diff --git a/software/chapter.tex b/software/chapter.tex index b52764b..b235ecb 100644 --- a/software/chapter.tex +++ b/software/chapter.tex @@ -1,6 +1,4 @@ -% TODO: add StoddenVictoria2016a (Enhancing reproducibility for computational methods) - -\chapter{Software} +\chapter{Software} \label{cha:sof} \begin{dquote} The following guidelines are to be used in the documentation of all software developed in the @@ -55,8 +53,8 @@ basic software engineering concepts. % This is in part due to the their general lack of formal training in programming and software development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software development through `informal self study', while \textcite{SegalJudith2004a} mentions that -``[scientists] do not describe themselves as software developers and have little formal education -or training in software development''. HannayJoErskine2009a agrees. JoppaLucasN2013a aggrees. +\emph{``[scientists] do not describe themselves as software developers and have little formal + education or training in software development''}. This lack of training is not in-and-of-itself a problem. % After all, academic scientists are required to be ``do-it-yourself''ers in many contexts for which @@ -105,7 +103,15 @@ Software development ``by-and-for'' scientists poses unique challenges. % In this section, I attempt to summarize the literature about these challenges, with a focus on those challenges that I have found most relevant. 
% -\textbf{Extensibility.} % TODO: cite +\textbf{``End-user developers.''} \cite{SegalJudith2005a, HannayJoErskine2009a, JoppaLucasN2013a} +% TODO: see Joppa ref 17, 21 22 +Typically the developers of scientific software are not trained software developers. % +This is perfectly appropriate, because scientific software development typically requires a large +amount of domain knowledge that only ``end-users'' possess. % +Software development practices may not be valued in a scientific environment. % + +\textbf{Extensibility.} \cite{SegalJudith2005a, CarverJeffreyC2007a, HannayJoErskine2009a, + PrabhuPrakash2011a} Many traditional software development paradigms demand an upfront articulation of goals and requirements. % This allows the developers to carefully design their software, even before a single line of code is @@ -122,13 +128,18 @@ of researchers and a contracted team of software engineers. % \end{dquote} -PrabhuPrakash2011a---lots of good stuff under ``Scientists do not rigorously test their programs'' +Scientific software is \emph{explorative}, and it needs to be flexible and extendable. % +Scientific software developers cannot know what will be required before they set out to try. % +This is probably the most fundamental challenge in such projects, and a big part of why science +cannot simply ``contract out'' a large part of its software development needs. % +Sometimes, a scientific problem is worked out through the iterative process of developing software +to solve it. % -\textbf{Lifetime.} -PrabhuPrakash2011a--- subsection ``long history of software development'' -Challenges with portability, and updating to ``modern standards''. +\textbf{Lifetime.} \cite{CarverJeffreyC2007a, PrabhuPrakash2011a} +Many scientific software projects have long life cycles, measured in decades or more. % +Such longevity brings challenges with portability and with updating to ``modern standards''. 
% -\textbf{Maintenance} +\textbf{Maintenance.} \cite{PrabhuPrakash2011a} Scientific software, especially software maintained by graduate students, tends to be very hard to maintain. % This problem is compounded by the long lifetime of such software, and the poorly defined @@ -138,9 +149,17 @@ written by generation upon generation of student. % Worse, software is sometimes abandoned or left untouched to become a crucial but arcane component of a scientific research project. % -\textbf{Optimization} -PrabhuPrakash2011a: ``scientists do not optimize for the common case'', ``scientists are unaware of -parallelization paradigms'' +\textbf{Testing.} \cite{SandersRebecca2008a, PrabhuPrakash2011a} +Testing is a huge part of software development practices, but many researchers do not engage in +sufficient testing of their software... % +The issue of testing is also consistent with the system of peer review... +Software is not typically peer reviewed... +Especially for domain-specific computational software, determining the ``correct outcome'' to test +against is often infeasible. % + +\textbf{Optimization.} \cite{PrabhuPrakash2011a} +Scientists do not optimize for the common case. % +Scientists are unaware of parallelization paradigms. % \section{Good-enough practices} % ================================================================ @@ -151,7 +170,7 @@ In this section, I attempt to very quickly summarize my personal perspective on software development good---with citations to literature that supports each idea. % These practices are not, generally, \emph{extra work}. % In fact, many of them save massive amounts of time and effort in the long \emph{and} short run, -when properly applied. % +when properly applied. \cite{WilsonGreg2006a} % \textbf{Do not reinvent.} \cite{WilsonGreg2017a} % Before you sit down and implement a piece of software, stop! % @@ -172,7 +191,7 @@ that accepts a set of arguments. 
% If your software package grows to contain multiple files, make those files modular. % As a general rule, once you have two classes you need multiple files. % -\textbf{Choose good data formats.} \cite{WilsonGreg2017a} % +\textbf{Choose good data formats.} \cite{BaxterSusanM2006a, WilsonGreg2017a} % Choose a non-proprietary format if at all possible---remember: you yourself might not have access to the proprietary software in 10 years. % Choose plain text if you can. % @@ -185,7 +204,7 @@ Make sure that it is clear what each piece of data means. % For tabular data, use headers. % Don't forget units. % -\textbf{Use version control.} % +\textbf{Use version control.} \cite{BaxterSusanM2006a, WilsonGreg2006a} % Version control systems allow programmers to save a software package such that they can always return to that save point. % All of the files in the package are saved together. % @@ -205,7 +224,7 @@ reason not to. % If the language you are using has a convention for representing the version programmatically, such as a \python{__version__} attribute in Python, comply with that convention. % -\textbf{Test.} \cite{WilsonGreg2017a} % +\textbf{Test.} \cite{BaxterSusanM2006a, WilsonGreg2006a, WilsonGreg2017a} % As the old saying goes, ``if it's not tested, it's broken''. % If you rely on a piece of functionality in your software, consider writing a test that defines that functionality. % @@ -220,7 +239,7 @@ unless your project becomes very important. % Distribute test datasets, when appropriate. % Remember, your tests can serve double duty as simple minimal examples. % -\textbf{Collaborate and share.} \cite{WilsonGreg2017a, BarnesNick2010a} % +\textbf{Collaborate and share.} \cite{BaxterSusanM2006a, WilsonGreg2017a, BarnesNick2010a} % If you are part of a team, consider sharing software and collaborating to create it. % Try using practices like code review and issue tracking, but don't feel obligated to use them if it doesn't make sense for your project. 
% @@ -232,7 +251,10 @@ Put your software on an open platform, like GitHub \cite{GitHub}, and mint a DOI Cite your software, and ask other people who are using your software to do the same. % Choose a license early, and choose permissive and commercially compatible unless you 1. know what you are doing and 2. plan to enforce. % -% TODO: cite 'publish your code it is good enough' +Afraid to share because your code needs more polish? % +If your software is good enough to be used in active scientific research, it's worth sharing. % +As Nick Barnes says, \emph{``Publish your computer code: it is good enough''}. +\cite{BarnesNick2010a} % \textbf{Write human readable code, and document it well.} \cite{WilsonGreg2017a} % Let the computer do the work, but write the program to be read by a human. % @@ -290,9 +312,9 @@ class Person(): Now I can make some instances of that class, and access their attributes and methods. % \begin{codefragment}{python} >>> mary = Person(name='Mary', favorite_food='pizza', hated_food='falafel') ->>> jane = Person(name='Jane', favorite_food='salad') +>>> jane = Person(name='Jane', favorite_food='salad') >>> mary.react_to('falafel') -'gross---no thank you''' +'gross---no thank you' >>> jane.react_to('salad') 'yum! my favorite' >>> mary.favorite_food -- cgit v1.2.3