Diffstat (limited to 'software')
-rw-r--r-- software/chapter.tex | 66
1 files changed, 44 insertions, 22 deletions
diff --git a/software/chapter.tex b/software/chapter.tex
index b52764b..b235ecb 100644
--- a/software/chapter.tex
+++ b/software/chapter.tex
 @@ -1,6 +1,4 @@
 -% TODO: add StoddenVictoria2016a (Enhancing reproducibility for computational methods)
 -
 -\chapter{Software}
 +\chapter{Software} \label{cha:sof}
  \begin{dquote}
    The following guidelines are to be used in the documentation of all software developed in the
 @@ -55,8 +53,8 @@ basic software engineering concepts.  %
  This is in part due to their general lack of formal training in programming and software
  development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software
  development through `informal self study', while \textcite{SegalJudith2004a} mentions that
 -``[scientists] do not describe themselves as software developers and have little formal education
 -or training in software development''. HannayJoErskine2009a agrees. JoppaLucasN2013a aggrees.
 +\emph{``[scientists] do not describe themselves as software developers and have little formal
 +  education or training in software development''}.
  This lack of training is not in-and-of-itself a problem.  %
  After all, academic scientists are required to be ``do-it-yourself''ers in many contexts for which
 @@ -105,7 +103,15 @@ Software development ``by-and-for'' scientists poses unique challenges.  %
  In this section, I attempt to summarize the literature about these challenges, with a focus on
  those challenges that I have found most relevant.  %
 -\textbf{Extensibility.}  % TODO: cite
 +\textbf{``End-user developers.''} \cite{SegalJudith2005a, HannayJoErskine2009a, JoppaLucasN2013a}
 +% TODO: see Joppa refs 17, 21, 22
 +Typically, the developers of scientific software are not trained software developers.  %
 +This is perfectly appropriate, because scientific software development usually requires a large
 +amount of domain knowledge that only ``end-users'' possess.  %
 +However, software development practices may not be valued in a scientific environment.  %
 +
 +\textbf{Extensibility.} \cite{SegalJudith2005a, CarverJeffreyC2007a, HannayJoErskine2009a,
 +  PrabhuPrakash2011a}
  Many traditional software development paradigms demand an upfront articulation of goals and
  requirements.  %
  This allows the developers to carefully design their software, even before a single line of code is
 @@ -122,13 +128,18 @@ of researchers and a contracted team of software engineers.  %
  \end{dquote}
 -PrabhuPrakash2011a---lots of good stuff under ``Scientists do not rigorously test their programs''
 +Scientific software is \emph{explorative}, and it needs to be flexible and extensible.  %
 +Scientific software developers cannot know in advance what will be required.  %
 +This is probably the most fundamental challenge in such projects, and a major reason why science
 +cannot simply ``contract out'' much of its software development.  %
 +Sometimes, a scientific problem is worked out through the iterative process of developing software
 +to solve it.  %
 -\textbf{Lifetime.}
 -PrabhuPrakash2011a--- subsection ``long history of software development''
 -Challenges with portability, and updating to ``modern standards''.
 +\textbf{Lifetime.} \cite{CarverJeffreyC2007a, PrabhuPrakash2011a}
 +Many scientific software projects have long life cycles, measured in decades or more.  %
 +Such long-lived software faces challenges with portability and with updating to ``modern standards''.  %
 -\textbf{Maintenance}
 +\textbf{Maintenance.} \cite{PrabhuPrakash2011a}
  Scientific software, especially software maintained by graduate students, tends to be very hard to
  maintain.  %
  This problem is compounded by the long lifetime of such software, and the poorly defined
 @@ -138,9 +149,17 @@ written by generation upon generation of students.  %
  Worse, software is sometimes abandoned or left untouched to become a crucial but arcane component
  of a scientific research project.  %
 -\textbf{Optimization}
 -PrabhuPrakash2011a: ``scientists do not optimize for the common case'', ``scientists are unaware of
 -parallelization paradigms''
 +\textbf{Testing.} \cite{SandersRebecca2008a, PrabhuPrakash2011a}
 +Testing is a huge part of software development practice, but many researchers do not test their
 +software sufficiently.  %
 +This gap mirrors the system of peer review, in which the software behind a paper is not typically
 +reviewed.  %
 +Especially for domain-specific computational software, determining the ``correct outcome'' to test
 +against is often infeasible.  %
 +
 +\textbf{Optimization.} \cite{PrabhuPrakash2011a}
 +Scientists often do not optimize for the common case.  %
 +Many scientists are unaware of parallelization paradigms.  %
  \section{Good-enough practices}  % ================================================================
 @@ -151,7 +170,7 @@ In this section, I attempt to very quickly summarize my personal perspective on
  software development good---with citations to literature that supports each idea.  %
  These practices are not, generally, \emph{extra work}.  %
  In fact, many of them save massive amounts of time and effort in the long \emph{and} short run,
 -when properly applied.  %
 +when properly applied. \cite{WilsonGreg2006a}  %
  \textbf{Do not reinvent.} \cite{WilsonGreg2017a}  %
  Before you sit down and implement a piece of software, stop!  %
 @@ -172,7 +191,7 @@ that accepts a set of arguments.  %
  If your software package grows to contain multiple files, make those files modular.  %
  As a general rule, once you have two classes you need multiple files.  %
 -\textbf{Choose good data formats.} \cite{WilsonGreg2017a}  %
 +\textbf{Choose good data formats.} \cite{BaxterSusanM2006a, WilsonGreg2017a}  %
  Choose a non-proprietary format if at all possible---remember: you yourself might not have access
  to the proprietary software in 10 years.  %
  Choose plain text if you can.  %
 @@ -185,7 +204,7 @@ Make sure that it is clear what each piece of data means.  %
  For tabular data, use headers.  %
  Don't forget units.  %
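Here is a minimal Python sketch of this advice that uses only the standard library's `csv` module. The table is stored as plain text, the first row is a header, and each header names its unit. The column names and the helper functions `to_csv_text` and `from_csv_text` are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical columns: each header carries its unit, so the data stays self-describing.
HEADER = ["time (s)", "temperature (K)"]


def to_csv_text(rows):
    """Serialize (time, temperature) pairs as plain-text CSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)  # header first, units included
    writer.writerows(rows)   # one line per measurement
    return buf.getvalue()


def from_csv_text(text):
    """Parse CSV text produced by to_csv_text back into a header and float rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [tuple(float(value) for value in row) for row in reader]
    return header, rows
```

To store the table on disk, write the returned string to a `.csv` file. Any text editor can then open it, with no proprietary software required.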
 -\textbf{Use version control.}  %
 +\textbf{Use version control.} \cite{BaxterSusanM2006a, WilsonGreg2006a}  %
  Version control systems allow programmers to save a software package such that they can always
  return to that save point.  %
  All of the files in the package are saved together.  %
 @@ -205,7 +224,7 @@ reason not to.  %
  If the language you are using has a convention for representing the version programmatically, such
  as a \python{__version__} attribute in Python, comply with that convention.  %
 -\textbf{Test.} \cite{WilsonGreg2017a}  %
 +\textbf{Test.} \cite{BaxterSusanM2006a, WilsonGreg2006a, WilsonGreg2017a}  %
  As the old saying goes, ``if it's not tested, it's broken''.  %
  If you rely on a piece of functionality in your software, consider writing a test that defines that
  functionality.  %
 @@ -220,7 +239,7 @@ unless your project becomes very important.  %
  Distribute test datasets, when appropriate.  %
  Remember, your tests can serve double duty as simple minimal examples.  %
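The following sketch illustrates tests that double as minimal examples. It uses Python's standard `doctest` module, which runs the interactive examples in a docstring and checks their output. Alongside it sits a plain `test_*` function of the kind test runners such as pytest collect. The function `kelvin_to_celsius` is a hypothetical example, not part of any package.

```python
import doctest


def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius.

    These examples are documentation and tests at the same time:

    >>> kelvin_to_celsius(273.15)
    0.0
    >>> kelvin_to_celsius(0)
    -273.15
    >>> kelvin_to_celsius(-1)
    Traceback (most recent call last):
        ...
    ValueError: temperature below absolute zero: -1
    """
    if kelvin < 0:
        raise ValueError(f"temperature below absolute zero: {kelvin}")
    return kelvin - 273.15


def test_freezing_point():
    # A conventional test that pins down behaviour the analysis relies on.
    assert abs(kelvin_to_celsius(273.15)) < 1e-12


if __name__ == "__main__":
    doctest.testmod()  # silently runs the docstring examples; reports only failures
```

Running `python -m doctest -v example.py` executes the docstring examples verbosely (`example.py` is a placeholder file name). A test runner would pick up `test_freezing_point`.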
 -\textbf{Collaborate and share.} \cite{WilsonGreg2017a, BarnesNick2010a}  %
 +\textbf{Collaborate and share.} \cite{BaxterSusanM2006a, WilsonGreg2017a, BarnesNick2010a}  %
  If you are part of a team, consider sharing software and collaborating to create it.  %
  Try using practices like code review and issue tracking, but don't feel obligated to use them if it
  doesn't make sense for your project.  %
 @@ -232,7 +251,10 @@ Put your software on an open platform, like GitHub \cite{GitHub}, and mint a DOI
  Cite your software, and ask other people who are using your software to do the same.  %
  Choose a license early, and choose a permissive, commercially compatible one unless you (1) know what
  you are doing and (2) plan to enforce it.  %
 -% TODO: cite 'publish your code it is good enough'
 +Afraid to share because your code needs more polish?  %
 +If your software is good enough to be used in active scientific research, it's worth sharing.  %
 +As Nick Barnes says, \emph{``Publish your computer code: it is good enough''}.
 +\cite{BarnesNick2010a}  %
  \textbf{Write human readable code, and document it well.} \cite{WilsonGreg2017a}  %
  Let the computer do the work, but write the program to be read by a human.  %
 @@ -290,9 +312,9 @@ class Person():
  Now I can make some instances of that class, and access their attributes and methods.  %
  \begin{codefragment}{python}
  >>> mary = Person(name='Mary', favorite_food='pizza', hated_food='falafel')
  >>> jane = Person(name='Jane', favorite_food='salad')
  >>> mary.react_to('falafel')
 -'gross---no thank you'''
 +'gross---no thank you'
  >>> jane.react_to('salad')
  'yum! my favorite'
  >>> mary.favorite_food
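The session above relies on a `Person` class whose full definition lies outside this diff. The following is a minimal hypothetical sketch that reproduces the outputs shown. It is not the chapter's own definition. The reaction to a food that is neither the favorite nor the hated one is an assumption, since the text does not show one.

```python
class Person:
    """A person with food preferences (hypothetical sketch, not the chapter's code)."""

    def __init__(self, name, favorite_food, hated_food=None):
        self.name = name
        self.favorite_food = favorite_food
        self.hated_food = hated_food  # optional: Jane is created without one

    def react_to(self, food):
        """Return this person's reaction to being offered `food`."""
        if food == self.favorite_food:
            return 'yum! my favorite'
        if self.hated_food is not None and food == self.hated_food:
            return 'gross---no thank you'
        return 'sure, thanks'  # assumed neutral reaction; not shown in the text
```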
