From 8011c6bab4bf2994f7f4ba7498e7a4c78db70463 Mon Sep 17 00:00:00 2001
From: Blaise Thompson
Date: Mon, 2 Apr 2018 17:12:47 -0500
Subject: 2018-04-02 17:12

---
 software/chapter.tex | 134 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 122 insertions(+), 12 deletions(-)

diff --git a/software/chapter.tex b/software/chapter.tex
index 074e0b9..8bc81af 100644
--- a/software/chapter.tex
+++ b/software/chapter.tex
@@ -21,11 +21,26 @@
 
 \clearpage
 
-% SOFTWARE IS PART OF SCIENCE
+\section{Science needs software} % ===============================================================
 
-Cutting-edge science increasingly relies on custom software. In their 2008 survey,
-\textcite{HannayJoErskine2009a} demonstrated just how important software is to the modern
-scientist. %
+Cutting-edge science increasingly relies on custom software. %
+Software does more than just help scientists analyze data---scientific software enables scientists
+to collect, analyze, and model results in ways that would otherwise be wholly impossible. %
+
+How does scientific software get made? %
+Who makes it, and what is the quality of that product? %
+Much has been written about these questions. %
+To this author's knowledge, there are at least 8 case studies and surveys dedicated to how
+scientists develop and use scientific software. \cite{CardDavidN1986a, SeamanCarolynB1997a,
+  MullerMatthias2001a, SegalJudith2004a, SegalJudith2005a, CarverJeffreyC2007a,
+  HannayJoErskine2009a, PrabuPrakash2011a} %
+Although they focus on different disciplines, and were published at different times, these articles
+present a remarkably consistent perspective on what challenges tend to arise when developing
+software ``by and for'' scientists. %
+
+Scientists do more than just use software: they develop it. %
+In their 2008 survey, \textcite{HannayJoErskine2009a} showed just how much of the work of science
+comes down to software development: %
 \begin{ditemize}
   \item 84.3\% of surveyed scientists state that developing scientific software is important or
     very important for their own research.
@@ -36,20 +51,111 @@ scientist. %
   \item On average, scientists spend approximately 30\% of their work time developing scientific
     software.
 \end{ditemize}
+PrabhuPrakash2011a---35\% developing, breakdown by type of work...
+
 Despite the importance of software to science and scientists, most scientists are not familiar with
 basic software engineering concepts. %
-% TODO: demonstrate that `most scientists are not familiar with basic software engineering concepts'
-This is in part due to the their general lack of formal training in programming and software development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software development through `informal self study'. Indeed, I myself have never been formally trained in software development.
+This is in part due to their general lack of formal training in programming and software
+development. \textcite{HannayJoErskine2009a} found that over 90\% of scientists learn software
+development through `informal self study', while \textcite{SegalJudith2004a} mentions that
+``[scientists] do not describe themselves as software developers and have little formal education
+or training in software development''. HannayJoErskine2009a agrees.
+
+This lack of training is not in and of itself a problem. %
+After all, academic scientists are required to be ``do-it-yourself''ers in many contexts for which
+they receive no formal training: everything from plumbing and electrical engineering to human
+resources and project management. %
+So why pay particular attention to software development practices and skills? %
+
+One reason to pay special attention to software is that software mistakes can have particularly
+dramatic consequences. %
+As experimentalists in the physical sciences, we are often tempted by the intuition that small
+mistakes lead to small errors. %
+These intuitions do not typically apply to software---software is ``brittle'' and small bugs have
+huge consequences. %
+In his 2015 opinion article ``Rampant software errors may undermine scientific results'', David A.
+W. Soergel attempts to estimate how many errors there might be in scientific software, and how
+far-reaching the consequences might be. %
+Quoting Soergel:
+
+\begin{dquote}
+  ...software is profoundly brittle: ``small'' bugs commonly have unbounded error propagation. %
+  A sign error, a missing semicolon, an off-by-one error in matching up two columns of data, etc.
+  will render the results complete noise. %
+  It is rare that a software bug would alter a small proportion of the data by a small amount. %
+  More likely, it systematically alters every data point, or occurs in some downstream aggregate
+  step with effectively global consequences. %
+  In general, software errors produce outcomes that are inaccurate, not merely imprecise. %
+
+\end{dquote}
+
+On a more positive note, better software development practices may be ``low-hanging fruit'' that
+can greatly improve researchers' lives without huge amounts of investment. %
+Great software makes science easier, faster, and often of higher quality. %
+And making great software isn't necessarily harder than the development practices that scientists
+are following today---indeed sometimes it is easier to follow best practices. %
+
+\section{Challenges in scientific software development} % ========================================
+
+Software development ``by and for'' scientists poses unique challenges. %
+
+\subsection{Extensibility} % ---------------------------------------------------------------------
+
+Many traditional software development paradigms demand an upfront articulation of goals and
+requirements. %
+This allows the developers to carefully design their software, even before a single line of code is
+written. %
+In her seminal 2005 case study, \textcite{SegalJudith2005a} describes a collaboration between a team
+of researchers and a contracted team of software engineers. %
+
+\begin{dquote}
+  Unlike traditional commercial software developers, but very much like developers in open source
+  projects or startups, scientific programmers usually don't get their requirements from customers,
+  and their requirements are rarely frozen.
+  In fact, scientists often can't know what their programs should do next until the current version
+  has produced some results.
+
+\end{dquote}
+
+\subsection{Testing} % ---------------------------------------------------------------------------
+
+PrabhuPrakash2011a---lots of good stuff under ``Scientists do not rigorously test their programs''
+
+\subsection{Lifetime} % --------------------------------------------------------------------------
+
+PrabhuPrakash2011a---subsection ``long history of software development''
+
+\subsection{Optimization} % ----------------------------------------------------------------------
+
+PrabhuPrakash2011a: ``scientists do not optimize for the common case'', ``scientists are unaware of
+parallelization paradigms''
+
+\subsection{Maintenance} % -----------------------------------------------------------------------
 
-% GOOD SOFTWARE MAKES SCIENCE EASIER AND FASTER
+\section{Good-enough practices} % ================================================================
 
-% CHALLENGES IN SCIENTIFIC SOFTWARE DEVELOPMENT
+In their [...] perspective, ``Good enough practices in scientific computing'' (from which this
+section gets its name), [WILSON ET AL] describe a set of techniques that, in their words, ``every
+researcher can and should consider adopting''. %
 
-% scientific software---focused on extensibility
+Let the computer do the work...
 
-Software development in a scientific context poses unique challenges. Many traditional software development paradigms demand an upfront articulation of goals and requirements. This allows the developers to carefully design their software, even before a single line of code is written. In her seminal 2005 case study \textcite{SegalJudith2005a} describes a collaboration between a team of researchers and a contracted team of software engineers. Ultimately
-% TODO: finish the discussion of SegalJudith2005a
-% TODO: segue to reccomendation of agile development practices: http://agilemanifesto.org/
+Write programs for people, not computers. %
+
+Don't repeat yourself, or others (we built on top of scipy, hdf5).
+
+Plan for mistakes / use testing.
+
+Write first, optimize later.
+
+Document document document.
+
+Collaborate.
+Code review...
+Issues...
+Make incremental changes...
+
+\subsection{Data formats} % ----------------------------------------------------------------------
 
 % HDF5
 
@@ -57,6 +163,10 @@ Software development in a scientific context poses unique challenges. Many tradi
 
 % OBJECT ORIENTED PROGRAMMING
 
+\subsection{Version control} % -------------------------------------------------------------------
+
 % SOURCE CONTROL AND VERSIONING
 
+\subsection{Licensing and distribution} % --------------------------------------------------------
+
 % LICENSING AND DISTRIBUTION
\ No newline at end of file
--
cgit v1.2.3