path: root/processing/chapter.tex
author Blaise Thompson <blaise@untzag.com> 2018-03-12 10:42:25 -0500
committer Blaise Thompson <blaise@untzag.com> 2018-03-12 10:42:25 -0500
commit 7a15287015fb33da2050ea2d75969a8c8ff3c49c (patch)
tree d7b316717d524063fa2edbd150141f24e196df00 /processing/chapter.tex
parent f5b039d94276ceb078d27b09c7719ae54c72139f (diff)
Diffstat (limited to 'processing/chapter.tex')
-rw-r--r--  processing/chapter.tex | 127
1 file changed, 55 insertions(+), 72 deletions(-)
diff --git a/processing/chapter.tex b/processing/chapter.tex
index b755fb0..9d50bac 100644
--- a/processing/chapter.tex
+++ b/processing/chapter.tex
@@ -1,95 +1,78 @@
\chapter{Processing}
-\section{Overview}
+% TODO: cool quote, if I can think of one
-In the Wright Group, \gls{PyCMDS} replaces the old acquisition software packages `ps control', written by Kent Meyer, and `Control for Lots of Research in Spectroscopy', written by Schuyler Kain.
+From a data science perspective, CMDS has several unique challenges:
+\begin{ditemize}
+ \item Dimensionality of datasets is typically greater than two, complicating
+ \textbf{representation}.
+ \item Shape and dimensionality change...
+ \item Data can be large (over one million points). % TODO: contextualize large (not BIG DATA)
+\end{ditemize}
+I have designed a software package that directly addresses these issues. %
-\section{WrightTools}
+WrightTools is a software package at the heart of all work in the Wright Group. %
-WrightTools is a software package at the heart of all work in the Wright Group.
+% TODO: more intro
-\section{PyCMDS}
+\section{Data object model} % ====================================================================
-PyCMDS directly addresses the hardware during experiments.
+WrightTools uses an object-oriented programming (OOP) strategy. %
-\subsection{Overview}
+It contains a central data ``container'' capable of storing all of the information about each
+multidimensional (or one-dimensional) spectrum. %
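+As an illustrative sketch (assuming the \texttt{wt.data.Data} constructor and its
+\texttt{create\_variable}, \texttt{create\_channel}, and \texttt{transform} methods, with made-up
+values), a small two-dimensional data object might be built and inspected like so: %
+\begin{minted}{python}
+import numpy as np
+import WrightTools as wt
+
+# construct a small synthetic 2D dataset (broadcast-shaped variables)
+w1 = np.linspace(6000, 8000, 51)[:, None]  # wavenumbers, shape (51, 1)
+w2 = np.linspace(6000, 8000, 51)[None, :]  # wavenumbers, shape (1, 51)
+signal = np.exp(-((w1 - 7000) ** 2 + (w2 - 7000) ** 2) / 500 ** 2)
+
+data = wt.data.Data(name='example')
+data.create_variable('w1', values=w1, units='wn')
+data.create_variable('w2', values=w2, units='wn')
+data.create_channel('signal', values=signal)
+data.transform('w1', 'w2')  # declare which variables serve as axes
+
+data.print_tree()  # summarize axes, variables, and channels
+\end{minted}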
-PyCMDS has, through software improvements alone, dramatically lessened scan times...
+\subsubsection{Python interface} % ---------------------------------------------------------------
-\begin{itemize}[topsep=-1.5ex, itemsep=0ex, partopsep=0ex, parsep=0ex, label=$\rightarrow$]
- \item simultaneous motor motion
- \item digital signal processing % TODO: reference section when it exists
- \item ideal axis positions \ref{sec:ideal_axis_positions}
-\end{itemize}
+WrightTools is written in Python, and endeavors to have a ``pythonic'', explicit and ``natural''
+application programming interface (API). %
+To use WrightTools, simply import it:
-\subsection{Ideal Axis Positions}\label{sec:ideal_axis_positions}
+\begin{minted}{python}
+import WrightTools as wt
+\end{minted}
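+As a follow-on usage sketch (assuming the \texttt{wt.open} entry point and a hypothetical file
+name), a previously saved data object can be reopened and inspected: %
+\begin{minted}{python}
+import WrightTools as wt
+
+data = wt.open('example.wt5')  # hypothetical path to a saved wt5 file
+print(data.axis_names)
+print(data.channel_names)
+data.close()
+\end{minted}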
-Frequency domain multidimensional spectroscopy is a time-intensive process. A typical \gls{pixel} takes between one-half second and three seconds to acquire. Depending on the exact hardware being scanned and the signal being detected, this time may be mostly due to hardware motion or signal collection. Due to the \gls{curse of dimensionality}, a typical three-dimensional CMDS experiment contains roughly 100,000 pixels. CMDS hardware is transiently reliable, so speeding up experiments is a crucial component of unlocking ever larger dimensionalities and higher resolutions.
+\subsubsection{wt5 file format} % ----------------------------------------------------------------
-One obvious way to decrease the scan time is to take fewer pixels. Traditionally, multidimensional scans are done with linearly arranged points in each axis---this is the simplest configuration to program into the acquisition software. Because signal features are often sparse or slowly varying (especially so in high-dimensional scans), linear stepping means that \emph{most of the collected pixels} are duplicates or simply noise. A more intelligent choice of axis points can capture the same nonlinear spectrum in a fraction of the total pixel count.
-An ideal distribution of pixels is linearized in \emph{signal}, not coordinate. This means that every signal level (think of a contour in the N-dimensional case) has roughly the same number of pixels defining it. If some generic multidimensional signal goes between 0 and 1, one would want roughly 10\% of the pixels to be between 0.9 and 1.0, 10\% between 0.8 and 0.9, and so on. If the signal is sparse in the space explored (imagine a narrow two-dimensional Lorentzian in the center of a large 2D frequency scan), this would place the majority of the pixels near the narrow peak feature(s), with only a few of them defining the large (in axis space) low-signal floor. In contrast, linear stepping would allocate the vast majority of the pixels to the low-signal 0.0 to 0.1 region, with only a few being used to capture the narrow peak feature. Of course, linearizing pixels in signal requires prior expectations about the shape of the multidimensional signal---linear stepping is still an appropriate choice for low-resolution ``survey'' scans.
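-This inversion can be made concrete with a short sketch (a hypothetical helper, not part of PyCMDS): given any monotonic signal model, evenly spaced signal levels are mapped back onto coordinate positions by numerical inversion.
-\begin{minted}{python}
-import numpy as np
-
-def signal_linearized_points(model, t_min, t_max, N, dense=10000):
-    """Place N points so that they are evenly spaced in signal, not coordinate."""
-    t_dense = np.linspace(t_min, t_max, dense)
-    S_dense = model(t_dense)
-    levels = np.linspace(S_dense.min(), S_dense.max(), N)
-    order = np.argsort(S_dense)  # np.interp needs an increasing abscissa
-    return np.interp(levels, S_dense[order], t_dense[order])
-
-# example: an exponential decay sampled out to five time constants
-points = signal_linearized_points(lambda t: np.exp(-t / 1.0), 0, 5, 10)
-\end{minted}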
+\section{Artists} % ==============================================================================
-CMDS scans often possess correlated features in the multidimensional space. In order to capture such features as cheaply as possible, one would want to define regions of increased pixel density along the correlated (diagonal) lineshape. As a concession to reasonable simplicity, our acquisition software (PyCMDS) assumes that all scans constitute a regular array with respect to the scanned axes. We can acquire arbitrarily spaced points along each individual axis, but not arbitrary point clouds across the full multidimensional space. This means that we cannot achieve strictly ideal pixel distributions for arbitrary datasets. Still, we can do much better than linear spacing. % TODO: refer to PyCMDS/WrightTools 'regularity' requirement when that section exists
-Almost all CMDS lineshapes (in frequency and delay) can be described using just a few lineshape functions:
-\begin{itemize}[topsep=-1.5ex, itemsep=0ex, partopsep=0ex, parsep=0ex, label=$\rightarrow$]
- \item exponential
- \item Gaussian
- \item Lorentzian
- \item bimolecular
-\end{itemize}
+\section{Fitting} % ==============================================================================
-Exponential and bimolecular dynamics fall out of simple first- and second-order kinetics (I will ignore higher-order kinetics here). Gaussians come from our Gaussian pulse envelopes or from normally-distributed inhomogeneous broadening. The measured lineshapes are actually convolutions of the above. I will ignore the convolution except for a few illustrative special cases. More exotic lineshapes are possible in CMDS---quantum beating and breathing modes, for example---but I will ignore these as well. Derivations of the ideal pixel positions for each of these lineshapes appear below. %TODO: cite Wright Group quantum beating paper, Kambempati breathing paper
-\subsubsection{Exponential}
-Simple exponential decays are typically used to describe population and coherence-level dynamics in CMDS. For some generic exponential signal $S$ with time constant $\tau$,
+\section{Distribution and licensing} % ===========================================================
-\begin{equation} \label{eq:simple_exponential_decay}
-S(t) = \me^{-\frac{t}{\tau}}.
-\end{equation}
-We can invert Equation~\ref{eq:simple_exponential_decay}, asking ``what $t$ do I need to reach a certain signal level?'':
-\begin{eqnarray}
-\log{(S)} &=& -\frac{t}{\tau} \\
-t &=& -\tau\log{(S)}.
-\end{eqnarray}
-
-So the delay needed to reach a given signal level $S$ goes as $-\tau\log{(S)}$.
-
-We want to go linearly in signal, meaning that we want to divide $S$ into even sections. If $S$ goes from 0 to 1 and we choose to acquire $N$ points,
-
-\begin{eqnarray}
-t_n &=& -\tau\log{\left(\frac{n}{N}\right)}.
-\end{eqnarray}
-
-Note that $t_n$ starts at long times and approaches zero delay: $t_1$ corresponds to the smallest signal level and $t_N$ to the largest.
-
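-As a quick numerical sketch (not PyCMDS code), the positions $t_n = -\tau\log{\left(\frac{n}{N}\right)}$ can be generated directly:
-\begin{minted}{python}
-import numpy as np
-
-tau = 1.0  # decay constant, arbitrary units
-N = 10     # number of points
-n = np.arange(1, N + 1)
-t = -tau * np.log(n / N)  # t_1 is the longest delay (smallest signal); t_N = 0
-print(np.round(t, 3))
-print(np.round(np.exp(-t / tau), 3))  # recovered signal levels are exactly n/N
-\end{minted}
-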
-Now we can consider more realistic cases, such as when $\tau$ is not precisely known and when some longer-lived dynamics persist (manifested as a static offset). Since these values are not separable in a general system, I'll keep $S$ normalized between 0 and 1.
-
-\begin{eqnarray}
-S &=& (1-c)\me^{-\frac{t}{\tau_{\mathrm{actual}}}} + c \\
-S_n &=& (1-c)\me^{-\frac{-\tau_{\mathrm{step}}\log{\left(\frac{n}{N}\right)}}{\tau_{\mathrm{actual}}}} + c \\
-S_n &=& (1-c)\me^{-\frac{\tau_{\mathrm{step}}}{\tau_{\mathrm{actual}}} \log{\left(\frac{N}{n}\right)}} + c \\
-S_n &=& (1-c)\left(\frac{N}{n}\right)^{-\frac{\tau_{\mathrm{step}}}{\tau_{\mathrm{actual}}}} + c \\
-S_n &=& (1-c)\left(\frac{n}{N}\right)^{\frac{\tau_{\mathrm{step}}}{\tau_{\mathrm{actual}}}} + c
-\end{eqnarray}
-
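-A short numerical sketch shows how the achieved signal levels deviate from even spacing when $\tau_{\mathrm{step}} \neq \tau_{\mathrm{actual}}$ or when a static offset $c$ is present:
-\begin{minted}{python}
-import numpy as np
-
-N = 10
-n = np.arange(1, N + 1)
-c = 0.1       # static offset
-ratio = 1.5   # tau_step / tau_actual
-S_n = (1 - c) * (n / N) ** ratio + c
-print(np.round(S_n, 3))  # no longer evenly spaced between 0 and 1
-\end{minted}
-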
-\begin{dfigure}[p!]
- \includegraphics[scale=0.5]{"processing/PyCMDS/ideal axis positions/exponential"}
- \caption[TODO]{TODO}
- \label{fig:exponential_steps}
-\end{dfigure}
-
-\subsubsection{Gaussian}
-
-\subsubsection{Lorentzian}
-
-\subsubsection{Bimolecular}
-
-\section{WrightSim}
-
-WrightSim performs numerical simulations of CMDS experiments.
+\section{Future directions} % ==================================================================== \ No newline at end of file