From f8c9747d3b6425b420839ff06931b63692318f03 Mon Sep 17 00:00:00 2001
From: Blaise Thompson <blaise@untzag.com>
Date: Thu, 5 Apr 2018 09:47:06 -0500
Subject: 2018-04-05 09:47

---
 software/chapter.tex | 66 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 22 deletions(-)

(limited to 'software')

diff --git a/software/chapter.tex b/software/chapter.tex
index b52764b..b235ecb 100644
--- a/software/chapter.tex
+++ b/software/chapter.tex
@@ -1,6 +1,4 @@
-% TODO: add StoddenVictoria2016a (Enhancing reproducibility for computational methods)
-
-\chapter{Software}
+\chapter{Software} \label{cha:sof}
 
 \begin{dquote}
   The following guidelines are to be used in the documentation of all software developed in the
@@ -55,8 +53,8 @@ basic software engineering concepts.  %
 This is in part due to their general lack of formal training in programming and software
 development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software
 development through `informal self study', while \textcite{SegalJudith2004a} mentions that
-``[scientists] do not describe themselves as software developers and have little formal education
-or training in software development''. HannayJoErskine2009a agrees. JoppaLucasN2013a aggrees.
+\emph{``[scientists] do not describe themselves as software developers and have little formal
+  education or training in software development''}.
 
 This lack of training is not in-and-of-itself a problem.  %
 After all, academic scientists are required to be ``do-it-yourself''ers in many contexts for which
@@ -105,7 +103,15 @@ Software development ``by-and-for'' scientists poses unique challenges.  %
 In this section, I attempt to summarize the literature about these challenges, with a focus on
 those challenges that I have found most relevant.  %
 
-\textbf{Extensibility.}  % TODO: cite
+\textbf{``End-user developers.''} \cite{SegalJudith2005a, HannayJoErskine2009a, JoppaLucasN2013a}
+% TODO: see Joppa ref 17, 21 22
+Typically the developers of scientific software are not trained software developers.  %
+This is perfectly appropriate, because scientific software development typically requires a large
+amount of domain knowledge that only ``end-users'' possess.  %
+Software development practices may not be valued in a scientific environment.  %
+
+\textbf{Extensibility.} \cite{SegalJudith2005a, CarverJeffreyC2007a, HannayJoErskine2009a,
+  PrabhuPrakash2011a}
 Many traditional software development paradigms demand an upfront articulation of goals and
 requirements.  %
 This allows the developers to carefully design their software, even before a single line of code is
@@ -122,13 +128,18 @@ of researchers and a contracted team of software engineers.  %
 
 \end{dquote}
 
-PrabhuPrakash2011a---lots of good stuff under ``Scientists do not rigorously test their programs''
+Scientific software is \emph{explorative}, and it needs to be flexible and extendable.  %
+Scientific software developers cannot know what will be required before they set out to try.  %
+This is probably the most fundamental challenge in such projects, and a big part of why science
+cannot simply ``contract out'' a large part of its software development needs.  %
+Sometimes, a scientific problem is worked out through the iterative process of developing software
+to solve it.  %
 
-\textbf{Lifetime.}
-PrabhuPrakash2011a--- subsection ``long history of software development''
-Challenges with portability, and updating to ``modern standards''.
+\textbf{Lifetime.} \cite{CarverJeffreyC2007a, PrabhuPrakash2011a}
+Many scientific software projects have long life cycles, measured in decades or more.  %
+This longevity brings challenges with portability and with updating to ``modern standards''.  %
 
-\textbf{Maintenance}
+\textbf{Maintenance.} \cite{PrabhuPrakash2011a}
 Scientific software, especially software maintained by graduate students, tends to be very hard to
 maintain.  %
 This problem is compounded by the long lifetime of such software, and the poorly defined
@@ -138,9 +149,17 @@ written by generation upon generation of student.  %
 Worse, software is sometimes abandoned or left untouched to become a crucial but arcane component
 of a scientific research project.  %
 
-\textbf{Optimization}
-PrabhuPrakash2011a: ``scientists do not optimize for the common case'', ``scientists are unaware of
-parallelization paradigms''
+\textbf{Testing.} \cite{SandersRebecca2008a, PrabhuPrakash2011a}
+Testing is a huge part of software development practice, but many researchers do not engage in
+sufficient testing of their software.  %
+This gap in testing is consistent with the system of peer review.  %
+Software is not typically peer reviewed.  %
+Especially for domain-specific computational software, determining the ``correct outcome'' to test
+against is often infeasible.  %
+
+\textbf{Optimization.} \cite{PrabhuPrakash2011a}
+Scientists often do not optimize for the common case.  %
+Many scientists are unaware of parallelization paradigms.  %
 
 \section{Good-enough practices}  % ================================================================
 
@@ -151,7 +170,7 @@ In this section, I attempt to very quickly summarize my personal perspective on
 software development good---with citations to literature that supports each idea.  %
 These practices are not, generally, \emph{extra work}.  %
 In fact, many of them save massive amounts of time and effort in the long \emph{and} short run,
-when properly applied.  %
+when properly applied. \cite{WilsonGreg2006a}  %
 
 \textbf{Do not reinvent.} \cite{WilsonGreg2017a}  %
 Before you sit down and implement a piece of software, stop!  %
@@ -172,7 +191,7 @@ that accepts a set of arguments.  %
 If your software package grows to contain multiple files, make those files modular.  %
 As a general rule, once you have two classes you need multiple files.  %
 
-\textbf{Choose good data formats.} \cite{WilsonGreg2017a}  %
+\textbf{Choose good data formats.} \cite{BaxterSusanM2006a, WilsonGreg2017a}  %
 Choose a non-proprietary format if at all possible---remember: you yourself might not have access
 to the proprietary software in 10 years.  %
 Choose plain text if you can.  %
@@ -185,7 +204,7 @@ Make sure that it is clear what each piece of data means.  %
 For tabular data, use headers.  %
 Don't forget units.  %
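As a minimal sketch of the advice on headers and units, the following writes a small table as plain-text CSV and reads it back. The column names, units, and values are hypothetical and chosen only for illustration. The helper names \python{write_table} and \python{read_table} are likewise my own assumptions, not part of any existing package.

```python
import csv
import io

# Hypothetical measurements; column names, units, and values are illustrative only.
rows = [(0.0, 1.25), (0.5, 1.31), (1.0, 1.42)]


def write_table(stream, rows):
    """Write rows as plain-text CSV with a header naming each column and its unit."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["delay (ps)", "signal (V)"])
    writer.writerows(rows)


def read_table(stream):
    """Read the table back, returning the header and the rows as tuples of floats."""
    reader = csv.reader(stream)
    header = next(reader)
    data = [tuple(float(value) for value in row) for row in reader]
    return header, data


# An in-memory buffer stands in for a file on disk so the sketch is self-contained.
buffer = io.StringIO()
write_table(buffer, rows)
buffer.seek(0)
header, data = read_table(buffer)
```

Because the header travels with the data, a reader who opens the file in any text editor can see what each column means and in which unit it was recorded.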
 
-\textbf{Use version control.}  %
+\textbf{Use version control.} \cite{BaxterSusanM2006a, WilsonGreg2006a}  %
 Version control systems allow programmers to save a software package such that they can always
 return to that save point.  %
 All of the files in the package are saved together.  %
@@ -205,7 +224,7 @@ reason not to.  %
 If the language you are using has a convention for representing the version programmatically, such
 as a \python{__version__} attribute in Python, comply with that convention.  %
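A minimal sketch of how a \python{__version__} attribute can be used programmatically is given here. The package name \python{mypackage} and the helpers \python{version_tuple} and \python{require} are hypothetical. The sketch assumes a purely numeric ``major.minor.patch'' string; real version strings can be more complicated and may call for a dedicated parsing library.

```python
import types

# Hypothetical package, built in memory so that the sketch is self-contained.
mypackage = types.ModuleType("mypackage")
mypackage.__version__ = "1.2.3"


def version_tuple(version):
    """Split a numeric 'major.minor.patch' string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def require(module, minimum):
    """Return True if the module's version is at least the given minimum."""
    return version_tuple(module.__version__) >= version_tuple(minimum)
```

Comparing tuples of integers rather than raw strings avoids the trap where \python{'1.10.0'} sorts before \python{'1.2.0'} alphabetically.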
 
-\textbf{Test.} \cite{WilsonGreg2017a}  %
+\textbf{Test.} \cite{BaxterSusanM2006a, WilsonGreg2006a, WilsonGreg2017a}  %
 As the old saying goes, ``if it's not tested, it's broken''.  %
 If you rely on a piece of functionality in your software, consider writing a test that defines that
 functionality.  %
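To illustrate what a test that ``defines'' functionality might look like, here is a minimal sketch. The function \python{wavelength_to_wavenumber} is a hypothetical stand-in for any piece of functionality you rely on. The test function follows the common \python{test_} naming convention that tools such as pytest use for discovery.

```python
def wavelength_to_wavenumber(nm):
    """Convert a wavelength in nanometers to a wavenumber in inverse centimeters."""
    return 1e7 / nm


def test_wavelength_to_wavenumber():
    # Known reference points define the behavior that the rest of the code relies on.
    assert abs(wavelength_to_wavenumber(1000.0) - 10000.0) < 1e-9
    assert abs(wavelength_to_wavenumber(500.0) - 20000.0) < 1e-9


# Calling the test directly keeps the sketch runnable without a test framework.
test_wavelength_to_wavenumber()
```

If a later change breaks the conversion, the test fails immediately instead of silently corrupting downstream results.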
@@ -220,7 +239,7 @@ unless your project becomes very important.  %
 Distribute test datasets, when appropriate.  %
 Remember, your tests can serve double duty as simple minimal examples.  %
 
-\textbf{Collaborate and share.} \cite{WilsonGreg2017a, BarnesNick2010a}  %
+\textbf{Collaborate and share.} \cite{BaxterSusanM2006a, WilsonGreg2017a, BarnesNick2010a}  %
 If you are part of a team, consider sharing software and collaborating to create it.  %
 Try using practices like code review and issue tracking, but don't feel obligated to use them if it
 doesn't make sense for your project.  %
@@ -232,7 +251,10 @@ Put your software on an open platform, like GitHub \cite{GitHub}, and mint a DOI
 Cite your software, and ask other people who are using your software to do the same.  %
 Choose a license early, and choose permissive and commercially compatible unless you 1. know what
 you are doing and 2. plan to enforce.  %
-% TODO: cite 'publish your code it is good enough'
+Afraid to share because your code needs more polish?  %
+If your software is good enough to be used in active scientific research, it's worth sharing.  %
+As Nick Barnes says, \emph{``Publish your computer code: it is good enough''}.
+\cite{BarnesNick2010a}  %
 
 \textbf{Write human readable code, and document it well.} \cite{WilsonGreg2017a}  %
 Let the computer do the work, but write the program to be read by a human.  %
@@ -290,9 +312,9 @@ class Person():
 Now I can make some instances of that class, and access their attributes and methods.  %
 \begin{codefragment}{python}
 >>> mary = Person(name='Mary', favorite_food='pizza', hated_food='falafel')
->>> jane = Person(name='Jane', favorite_food='salad')
+>>> jane = Person(name='Jane', favorite_food='salad')
 >>> mary.react_to('falafel')
-'gross---no thank you'''
+'gross---no thank you'
 >>> jane.react_to('salad')
 'yum! my favorite'
 >>> mary.favorite_food
-- 
cgit v1.2.3