author	Blaise Thompson <blaise@untzag.com>	2017-11-12 18:51:13 -0600
committer	Blaise Thompson <blaise@untzag.com>	2017-11-12 18:51:13 -0600
commit	4ddc0bcecdd172e6fbed0df2e80dfc7663b6ab73 (patch)
tree	98247665aa5dfed337adb5a9f113ca30b2e160fd /software
parent	cc1859e9a25b7c2a54e66515a6bb45ce918d28c1 (diff)
structure
Diffstat (limited to 'software')
-rw-r--r--	software/chapter.tex	115
1 file changed, 115 insertions(+), 0 deletions(-)
diff --git a/software/chapter.tex b/software/chapter.tex
new file mode 100644
index 0000000..e2c9652
--- /dev/null
+++ b/software/chapter.tex
@@ -0,0 +1,115 @@
+% TODO: add StoddenVictoria2016.000 (Enhancing reproducibility for computational methods)
+% TODO: add MillmanKJarrod2011.000 (Python for Scientists and Engineers)
+% TODO: add vanderWaltStefan2011.000 (The NumPy Array: A Structure for Efficient Numerical Computation)
+% TODO: reference https://www.nsf.gov/pubs/2016/nsf16532/nsf16532.htm (Software Infrastructure for Sustained Innovation (SI2: SSE & SSI))
+
+\chapter{Software}
+
+Cutting-edge science increasingly relies on custom software. In their 2008 survey, \textcite{HannayJoErskine2009.000} demonstrated just how important software is to the modern scientist.
+\begin{enumerate}[topsep=-1.5ex, itemsep=0ex, partopsep=0ex, parsep=0ex, label=$\rightarrow$]
+ \item 84.3\% of surveyed scientists state that developing scientific software is important or very important for their own research.
+ \item 91.2\% of surveyed scientists state that using scientific software is important or very important for their own research.
+ \item On average, scientists spend approximately 40\% of their work time using scientific software.
+ \item On average, scientists spend approximately 30\% of their work time developing scientific software.
+\end{enumerate}
+Despite the importance of software to science and scientists, most scientists are not familiar with basic software engineering concepts.
+% TODO: demonstrate that `most scientists are not familiar with basic software engineering concepts'
+This is in part due to their general lack of formal training in programming and software development. \textcite{HannayJoErskine2009.000} found that over 90\% of scientists learn software development through `informal self study'. Indeed, I myself have never been formally trained in software development.
+
+Software development in a scientific context poses unique challenges. Many traditional software development paradigms demand an upfront articulation of goals and requirements. This allows developers to carefully design their software before a single line of code is written. In her seminal 2005 case study, \textcite{SegalJudith2005.000} describes a collaboration between a team of researchers and a contracted team of software engineers. Ultimately
+% TODO: finish the discussion of SegalJudith2005.000
+% TODO: segue to recommendation of agile development practices: http://agilemanifesto.org/
+
+\section{Overview}
+
+In the Wright Group, \gls{PyCMDS} replaces the old acquisition software packages `ps control', written by Kent Meyer, and `Control for Lots of Research in Spectroscopy', written by Schuyler Kain.
+
+\section{WrightTools}
+
+WrightTools is a software package at the heart of all work in the Wright Group.
+
+\section{PyCMDS}
+
+PyCMDS directly addresses the hardware during experiments.
+
+\subsection{Overview}
+
+PyCMDS has, through software improvements alone, dramatically reduced scan times. Several strategies contribute:
+
+\begin{itemize}[topsep=-1.5ex, itemsep=0ex, partopsep=0ex, parsep=0ex, label=$\rightarrow$]
+ \item simultaneous motor motion
+ \item digital signal processing % TODO: reference section when it exists
+ \item ideal axis positions (section \ref{sec:ideal_axis_positions})
+\end{itemize}
+
+\subsection{Ideal Axis Positions}\label{sec:ideal_axis_positions}
+
+Frequency domain multidimensional spectroscopy is a time-intensive process. A typical \gls{pixel} takes between one-half second and three seconds to acquire. Depending on the exact hardware being scanned and the signal being detected, this time may be dominated by hardware motion or by signal collection. Due to the \gls{curse of dimensionality}, a typical three-dimensional CMDS experiment contains roughly 100,000 pixels. CMDS hardware is only transiently reliable, so speeding up experiments is a crucial component of unlocking ever larger dimensionalities and higher resolutions.
+
+One obvious way to decrease the scan time is to take fewer pixels. Traditionally, multidimensional scans are done with linearly spaced points along each axis---this is the simplest configuration to program into the acquisition software. Because signal features are often sparse or slowly varying (especially so in high-dimensional scans), linear stepping means that \emph{most of the collected pixels} are duplicates or simply noise. A more intelligent choice of axis points can capture the same nonlinear spectrum in a fraction of the total pixel count.
+
+An ideal distribution of pixels is linearized in \emph{signal}, not coordinate. This means that every signal level (think of a contour in the N-dimensional case) has roughly the same number of pixels defining it. If some generic multidimensional signal goes between 0 and 1, one would want roughly 10\% of the pixels to be between 0.9 and 1.0, 10\% between 0.8 and 0.9, and so on. If the signal is sparse in the space explored (imagine a narrow two-dimensional Lorentzian in the center of a large 2D-Frequency scan), this would place the majority of the pixels near the narrow peak feature(s), with only a few of them defining the large (in axis space) low-signal floor. In contrast, linear stepping would allocate the vast majority of the pixels in the low-signal 0.0 to 0.1 region, with only a few being used to capture the narrow peak feature. Of course, linearizing pixels in signal requires prior expectations about the shape of the multidimensional signal---linear stepping is still an appropriate choice for low-resolution ``survey'' scans.
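As a one-dimensional illustration of this idea (a hypothetical helper, not part of the acquisition software), consider points linearized in signal for a Lorentzian lineshape: inverting the signal function clusters points near the peak and leaves only a few in the flat wings.

```python
import math

def lorentzian_signal(x):
    """Normalized Lorentzian with unit half-width, peaked at x = 0."""
    return 1.0 / (1.0 + x**2)

def signal_linearized_points(N):
    """Choose N points on x >= 0 so that the signal levels are evenly spaced.

    Inverting S = 1/(1 + x^2) gives x = sqrt(1/S - 1).  Stepping S through
    n/N for n = 1..N clusters points near the peak and leaves only a few
    in the sparse low-signal wings.
    """
    return [math.sqrt(N / n - 1.0) for n in range(1, N + 1)]

points = signal_linearized_points(10)
# points run from the far wing (x = 3, S = 0.1) down to the peak (x = 0, S = 1),
# so each tenth of the signal range is defined by exactly one point.
```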
+
+CMDS scans often possess correlated features in the multidimensional space. To capture such features as cheaply as possible, one would want to define regions of increased pixel density along the correlated (diagonal) lineshape. As a concession to reasonable simplicity, our acquisition software (PyCMDS) assumes that all scans constitute a regular array with respect to the scanned axes. We can acquire arbitrarily spaced points along each axis, but not arbitrary points within the multidimensional space. This means that we cannot achieve strictly ideal pixel distributions for arbitrary datasets. Still, we can do much better than linear spacing. % TODO: refer to PyCMDS/WrightTools 'regularity' requirement when that section exists
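The regularity constraint can be sketched as follows (hypothetical axis names and values, not the PyCMDS API): a scan is the Cartesian product of per-axis point lists, so each axis may be spaced arbitrarily, but every combination of points must be visited.

```python
from itertools import product

# Arbitrarily spaced points along each individual axis -- allowed.
w1_points = [6500.0, 6900.0, 7000.0, 7050.0, 7100.0]  # wavenumbers, dense near a peak
d2_points = [0.0, 25.0, 100.0, 400.0]                  # femtoseconds, dense near zero delay

# The scan itself must visit the full regular grid over those axes, so
# pixel density cannot follow a diagonal (correlated) feature exactly.
pixels = list(product(w1_points, d2_points))
```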
+
+Almost all CMDS lineshapes (in frequency and delay) can be described using just a few lineshape functions:
+
+\begin{itemize}[topsep=-1.5ex, itemsep=0ex, partopsep=0ex, parsep=0ex, label=$\rightarrow$]
+ \item exponential
+ \item Gaussian
+ \item Lorentzian
+ \item bimolecular
+\end{itemize}
+
+Exponential and bimolecular dynamics fall out of simple first- and second-order kinetics (I will ignore higher-order kinetics here). Gaussians come from our Gaussian pulse envelopes or from normally-distributed inhomogeneous broadening. The measured lineshapes are actually convolutions of the above. I will ignore the convolution except for a few illustrative special cases. More exotic lineshapes are possible in CMDS---quantum beating and breathing modes, for example---but I will ignore these as well. Derivations of the ideal pixel positions for each of these lineshapes appear below. %TODO: cite Wright Group quantum beating paper, Kambempati breathing paper
+
+\subsubsection{Exponential}
+
+Simple exponential decays are typically used to describe population and coherence-level dynamics in CMDS. For some generic exponential signal $S$ with time constant $\tau$,
+
+\begin{equation} \label{eq:simple_exponential_decay}
+S(t) = \me^{-\frac{t}{\tau}}.
+\end{equation}
+
+We can invert equation \ref{eq:simple_exponential_decay}, asking ``what $t$ do I need to reach a certain signal level?'':
+
+\begin{eqnarray}
+\log{(S)} &=& -\frac{t}{\tau} \\
+t &=& -\tau\log{(S)}.
+\end{eqnarray}
+
+So to step linearly in $S$, the delay positions must go as $-\tau\log{(S)}$.
+
+We want to go linearly in signal, meaning that we want to divide $S$ into even sections. If $S$ goes from 0 to 1 and we choose to acquire $N$ points,
+
+\begin{eqnarray}
+t_n &=& -\tau\log{\left(\frac{n}{N}\right)}.
+\end{eqnarray}
+
+Note that $t_n$ starts at long delays and approaches zero delay: $t_1$ corresponds to the smallest signal and $t_N$ to the largest.
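As a minimal sketch of this result (illustrative values, not the PyCMDS implementation), the ideal exponential axis positions follow directly from $t_n = -\tau\log{(n/N)}$:

```python
import math

def exponential_positions(tau, N):
    """Return N delay positions t_n = -tau * log(n / N).

    n runs from 1 to N: t_1 is the longest delay (smallest signal) and
    t_N is exactly zero delay (largest signal).  The resulting signal
    levels e^(-t/tau) are evenly spaced between 1/N and 1.
    """
    return [-tau * math.log(n / N) for n in range(1, N + 1)]

positions = exponential_positions(tau=1.0, N=10)
# positions decrease monotonically from tau * log(10) down to zero delay.
```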
+
+Now we can consider realistic cases, such as when $\tau$ is not exactly known or when longer dynamics persist (manifested as a static offset). Since these values are not separable in a general system, I'll keep $S$ normalized between 0 and 1.
+
+\begin{eqnarray}
+S &=& (1-c)\me^{-\frac{t}{\tau_{\mathrm{actual}}}} + c \\
+S_n &=& (1-c)\me^{-\frac{-\tau_{\mathrm{step}}\log{\left(\frac{n}{N}\right)}}{\tau_{\mathrm{actual}}}} + c \\
+S_n &=& (1-c)\me^{-\frac{\tau_{\mathrm{step}}}{\tau_{\mathrm{actual}}} \log{\left(\frac{N}{n}\right)}} + c \\
+S_n &=& (1-c)\left(\frac{N}{n}\right)^{-\frac{\tau_{\mathrm{step}}}{\tau_{\mathrm{actual}}}} + c \\
+S_n &=& (1-c)\left(\frac{n}{N}\right)^{\frac{\tau_{\mathrm{step}}}{\tau_{\mathrm{actual}}}} + c
+\end{eqnarray}
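As a quick numerical check of the derivation (illustrative values only), when $\tau_{\mathrm{step}} = \tau_{\mathrm{actual}}$ the final expression reduces to $S_n = (1-c)\frac{n}{N} + c$, so the measured signal levels remain linear in $n$ despite the offset:

```python
import math

def measured_signal(n, N, c, tau_step, tau_actual):
    """Signal at the n-th step position for an offset exponential decay."""
    t_n = -tau_step * math.log(n / N)  # step positions from the derivation
    return (1 - c) * math.exp(-t_n / tau_actual) + c

N, c = 10, 0.2
levels = [measured_signal(n, N, c, tau_step=1.0, tau_actual=1.0)
          for n in range(1, N + 1)]
# With tau_step == tau_actual, levels[n-1] matches (1 - c) * n / N + c
# to floating-point precision: evenly spaced signal, as intended.
```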
+
+\begin{figure}[p!]
+ \centering
+ \includegraphics[scale=0.5]{"software/PyCMDS/ideal axis positions/exponential"}
+ \caption[TODO]{TODO}
+ \label{fig:exponential_steps}
+\end{figure}
+
+\subsubsection{Gaussian}
+
+\subsubsection{Lorentzian}
+
+\subsubsection{Bimolecular}
+
+\section{WrightSim}
+
+WrightSim does simulations.