-rw-r--r-- | software/chapter.tex | 134 |
1 files changed, 122 insertions, 12 deletions
diff --git a/software/chapter.tex b/software/chapter.tex
index 074e0b9..8bc81af 100644
--- a/software/chapter.tex
+++ b/software/chapter.tex
@@ -21,11 +21,26 @@ \clearpage
-% SOFTWARE IS PART OF SCIENCE
+\section{Science needs software} % ===============================================================
-Cutting-edge science increasingly relies on custom software. In their 2008 survey,
-\textcite{HannayJoErskine2009a} demonstrated just how important software is to the modern
-scientist. %
+Cutting-edge science increasingly relies on custom software. %
+Software does more than help scientists analyze data: it enables them to collect, analyze, and
+model results in ways that would otherwise be impossible. %
+
+How does scientific software get made? %
+Who makes it, and what is the quality of that product? %
+Much has been written about these questions. %
+To this author's knowledge, there are at least eight case studies and surveys dedicated to how
+scientists develop and use scientific software~\cite{CardDavidN1986a, SeamanCarolynB1997a,
+ MullerMatthias2001a, SegalJudith2004a, SegalJudith2005a, CarverJeffreyC2007a,
+ HannayJoErskine2009a, PrabuPrakash2011a}. %
+Although they focus on different disciplines and were published at different times, these articles
+present a remarkably consistent picture of the challenges that tend to arise when developing
+software ``by and for'' scientists. %
+
+Scientists do more than just use software: they develop it. %
+In their 2008 survey, \textcite{HannayJoErskine2009a} showed just how much of the work of science
+comes down to software development: %
\begin{ditemize}
\item 84.3\% of surveyed scientists state that developing scientific software is important or
very important for their own research.
@@ -36,20 +51,111 @@ scientist. %
\item On average, scientists spend approximately 30\% of their work time developing scientific
software.
\end{ditemize}
+PrabhuPrakash2011a---35\% developing, breakdown by type of work...
+
Despite the importance of software to science and scientists, most scientists are not familiar with
basic software engineering concepts. %
-% TODO: demonstrate that `most scientists are not familiar with basic software engineering concepts'
-This is in part due to the their general lack of formal training in programming and software development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software development through `informal self study'. Indeed, I myself have never been formally trained in software development.
+This is in part due to their general lack of formal training in programming and software
+development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software
+development through ``informal self study''. Similarly, \textcite{SegalJudith2004a} observes that
+``[scientists] do not describe themselves as software developers and have little formal education
+or training in software development''. %
+
+This lack of training is not in and of itself a problem. %
+After all, academic scientists are required to be do-it-yourselfers in many contexts for which
+they receive no formal training: everything from plumbing and electrical engineering to human
+resources and project management. %
+So why pay particular attention to software development practices and skills? %
+
+One reason to pay special attention to software is that software mistakes can have particularly
+dramatic consequences. %
+As experimentalists in the physical sciences, we are often tempted by the intuition that small
+mistakes lead to small errors. %
+This intuition does not typically hold for software: software is ``brittle'', and small bugs can
+have huge consequences. %
+In his 2015 opinion article ``Rampant software errors may undermine scientific results'', David A.
+W. Soergel attempts to estimate how many errors there might be in scientific software, and how
+far-reaching their consequences might be. %
+Quoting Soergel:
+
+\begin{dquote}
+ ...software is profoundly brittle: ``small'' bugs commonly have unbounded error propagation. %
+ A sign error, a missing semicolon, an off-by-one error in matching up two columns of data, etc.
+ will render the results complete noise. %
+ It is rare that a software bug would alter a small proportion of the data by a small amount. %
+ More likely, it systematically alters every data point, or occurs in some downstream aggregate
+ step with effectively global consequences. %
+ In general, software errors produce outcomes that are inaccurate, not merely imprecise. %
+
+\end{dquote}
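To make the column-matching example concrete, the following sketch (in Python, using hypothetical
data and a hypothetical helper \texttt{pearson\_r}) pairs a control column \texttt{x} with a
response column \texttt{y} that depends on it perfectly. Shifting one column by a single row, as
might happen if a header line is skipped in only one of two files, does not slightly perturb the
correlation: it destroys it.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length columns (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical experiment: each sample i has a control value x[i] and a
# measured response y[i] that depends on it perfectly.
rng = random.Random(0)
x = [rng.uniform(0.0, 1.0) for _ in range(1000)]
y = [2.0 * xi for xi in x]

# Correct pairing: row i of one column matches row i of the other.
r_correct = pearson_r(x, y)

# Off-by-one bug: y is shifted by one row before pairing (for example, a
# header row skipped in one file but not the other). Every pair is now wrong.
r_shifted = pearson_r(x[:-1], y[1:])

print(f"correct pairing:    r = {r_correct:.3f}")
print(f"off-by-one pairing: r = {r_shifted:.3f}")
```

With correct pairing the correlation is 1 to within floating-point rounding; after the one-row shift
it is close to zero, because each \texttt{x} value is now paired with an unrelated sample. The bug
alters every data point rather than a few, which is the kind of global, inaccurate (not merely
imprecise) error Soergel describes.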
+
+On a more positive note, better software development practices may be ``low-hanging fruit'' that
+can greatly improve researchers' lives without a large investment. %
+Great software makes science easier and faster, and often improves its quality. %
+Nor is making great software necessarily harder than following the development practices that
+scientists use today; indeed, sometimes following best practices is the easier path. %
+
+\section{Challenges in scientific software development} % ========================================
+
+Software development ``by and for'' scientists poses unique challenges. %
+
+\subsection{Extensibility} % ---------------------------------------------------------------------
+
+Many traditional software development paradigms demand an upfront articulation of goals and
+requirements. %
+This allows the developers to carefully design their software, even before a single line of code is
+written. %
+In her seminal 2005 case study, \textcite{SegalJudith2005a} describes a collaboration between a team
+of researchers and a contracted team of software engineers. %
+
+\begin{dquote}
+ Unlike traditional commercial software developers, but very much like developers in open source
+ projects or startups, scientific programmers usually don't get their requirements from customers,
+ and their requirements are rarely frozen.
+ In fact, scientists often can't know what their programs should do next until the current version
+ has produced some results.
+
+\end{dquote}
+
+\subsection{Testing} % ---------------------------------------------------------------------------
+
+PrabhuPrakash2011a: lots of good stuff under ``Scientists do not rigorously test their programs''
+
+\subsection{Lifetime} % --------------------------------------------------------------------------
+
+PrabhuPrakash2011a--- subsection ``long history of software development''
+
+\subsection{Optimization} % ----------------------------------------------------------------------
+
+PrabhuPrakash2011a: ``scientists do not optimize for the common case'', ``scientists are unaware of
+parallelization paradigms''
+
+\subsection{Maintenance} % -----------------------------------------------------------------------
-% GOOD SOFTWARE MAKES SCIENCE EASIER AND FASTER
+\section{Good-enough practices} % ================================================================
-% CHALLENGES IN SCIENTIFIC SOFTWARE DEVELOPMENT
+In their [...] perspective ``Good enough practices in scientific computing'' (from which this
+section takes its name), [WILSON ET AL] describe a set of techniques that, in their words, ``every
+researcher can and should consider adopting''. %
-% scientific software---focused on extensibility
+Let the computer do the work...
-Software development in a scientific context poses unique challenges. Many traditional software development paradigms demand an upfront articulation of goals and requirements. This allows the developers to carefully design their software, even before a single line of code is written. In her seminal 2005 case study \textcite{SegalJudith2005a} describes a collaboration between a team of researchers and a contracted team of software engineers. Ultimately
-% TODO: finish the discussion of SegalJudith2005a
-% TODO: segue to reccomendation of agile development practices: http://agilemanifesto.org/
+Write programs for people, not computers. %
+
+Don't repeat yourself, or others (we built on top of SciPy and HDF5).
+
+Plan for mistakes / use testing.
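A minimal sketch of planning for mistakes, assuming Python and a pytest-style test runner: the
analysis helper \texttt{fit\_slope} below is hypothetical, and the tests check it against synthetic
data whose answer is known exactly. A sign error in the fit, or columns of unequal length (which
\texttt{zip} would otherwise silently truncate), makes one of these tests fail instead of quietly
corrupting downstream results.

```python
def fit_slope(xs, ys):
    """Least-squares slope of ys against xs (hypothetical analysis helper)."""
    if len(xs) != len(ys):
        # zip() would silently drop the extra rows, so fail loudly instead.
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def test_fit_slope_recovers_known_slope():
    # Synthetic data generated from y = -3x + 5, so the correct answer is known.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [-3.0 * x + 5.0 for x in xs]
    assert abs(fit_slope(xs, ys) - (-3.0)) < 1e-12


def test_fit_slope_rejects_mismatched_columns():
    # An off-by-one while loading data can leave the columns different lengths.
    try:
        fit_slope([0.0, 1.0, 2.0], [1.0, 2.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched columns")


if __name__ == "__main__":
    test_fit_slope_recovers_known_slope()
    test_fit_slope_rejects_mismatched_columns()
    print("all tests passed")
```

Running \texttt{pytest} on a file containing this code collects both \texttt{test\_} functions
automatically; running the file directly with Python calls them through the \texttt{\_\_main\_\_}
block. The design choice is to test against a case with an exactly known answer, so that any
deviation points to a bug rather than to noise.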
+
+Write first, optimize later.
+
+Document, document, document.
+
+Collaborate.
+Code review...
+Issues...
+Make incremental changes...
+
+\subsection{Data formats} % ----------------------------------------------------------------------
% HDF5
@@ -57,6 +163,10 @@ Software development in a scientific context poses unique challenges. Many tradi
% OBJECT ORIENTED PROGRAMMING
+\subsection{Version control} % -------------------------------------------------------------------
+
% SOURCE CONTROL AND VERSIONING
+\subsection{Licensing and distribution} % --------------------------------------------------------
+
% LICENSING AND DISTRIBUTION
\ No newline at end of file