path: root/processing/chapter.tex
author Blaise Thompson <blaise@untzag.com> 2018-03-31 12:08:35 -0500
committer Blaise Thompson <blaise@untzag.com> 2018-03-31 12:08:35 -0500
commit e40e84ad9b891c96ffe7cda884087c0b9dc098c7 (patch)
tree d3b1d472553e26d5a4b25834fdb93c9e4fda7689 /processing/chapter.tex
parent 548cc56e7b65184d1e10a26711837e18f189c136 (diff)
2018-03-31 12:08
Diffstat (limited to 'processing/chapter.tex')
-rw-r--r-- processing/chapter.tex 221
1 files changed, 135 insertions, 86 deletions
diff --git a/processing/chapter.tex b/processing/chapter.tex
index db8da3b..81886c2 100644
--- a/processing/chapter.tex
+++ b/processing/chapter.tex
@@ -23,19 +23,49 @@
\clearpage
-From a data science perspective, CMDS has several unique challenges:
+CMDS takes a somewhat unique approach to instrumental science. %
+There are not that many well-defined, well-trodden experimental paths. %
+The basic ideas stay the same, but the real power is in the creativity and flexibility to tweak the
+experiment according to the particular question being asked. %
+How, then, could one go about making a data processing software package for CMDS? %
+The package has to be flexible enough to accommodate the diversity of experiments, but still solid
+enough to be a foundational tool. %
+
+When creating a toolkit for CMDS, there are several challenges worth considering:
\begin{ditemize}
\item Dimensionality of datasets can typically be greater than two, complicating
\textbf{representation}.
- \item Shape and dimensionality change...
- \item Data can be large (over one million points). % TODO: contextualize large (not BIG DATA)
+ \item Shape and dimensionality change, and relevant axes can be different from the scanned
+ dimensions. %
+ \item Data can be awkwardly large-ish (several million pixels), and can become legitimately large
+ in numerical simulations. %
+ \item There are no agreed-upon file formats for CMDS dataset storage. %
\end{ditemize}
-I have designed a software package that directly addresses these issues. %
-
-WrightTools is a software package at the heart of all work in the Wright Group. %
-
-% TODO: more intro
+The biggest challenge is to find a really good definition for what constitutes a CMDS dataset. %
+Once understood, this common denominator can be enshrined into software and built upon. %
+
+WrightTools is a software package written in Python, built using the excellent tools provided by
+the scientific Python collection of packages, especially SciPy and NumPy. % TODO: cite cite cite
+WrightTools defines a universal file format that is flexible enough to encompass the diversity of
+CMDS while still being entirely self-describing. %
+This file format is based on the popular binary format ``HDF5''. % TODO: cite
+This format allows for computers to interact with the arrays piece-by-piece in a very fast and
+reliable way, without loading the entire array in and out of memory. %
+WrightTools piggybacks on this, allowing users to interact with legitimately large CMDS datasets
+without worrying about memory overflow. %
+WrightTools takes a unique approach to representing CMDS data in array format, nicknamed
+``semi-structure'', that allows for greater flexibility in representing CMDS in different
+coordinate spaces. %
+
+WrightTools is written to be used in scripts and in the command line. %
+It does not have any graphical components built in, except for the ability to generate plots using
+matplotlib. % TODO: cite
+Being built in this way gives WrightTools users maximum flexibility, and allows for rapid
+collaborative development. %
+It also allows other software packages to use WrightTools as a ``back-end'' foundational software,
+as has already been done in simulation and acquisition software created in the Wright Group. %
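The piece-by-piece access described above can be illustrated with a minimal sketch. It deliberately does not use HDF5 or WrightTools: as an assumption made purely for illustration, it stores values in a raw binary file and uses only the Python standard library, and the helper names (\python{write_values}, \python{read_slice}, \python{chunked_max}) are hypothetical. The point is only that a slice, or a reduction over the whole dataset, can be computed while holding a small part of the data in memory.

```python
import array
import os
import tempfile

ITEMSIZE = array.array("d").itemsize  # bytes per stored float64 value


def write_values(path, values):
    """Write a flat sequence of floats to disk as raw float64 (stand-in for HDF5)."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def read_slice(path, start, stop):
    """Read only values[start:stop] from disk, leaving the rest of the file unread."""
    chunk = array.array("d")
    with open(path, "rb") as f:
        f.seek(start * ITEMSIZE)  # jump straight to the requested piece
        chunk.fromfile(f, stop - start)
    return list(chunk)


def chunked_max(path, n_values, chunk_size):
    """Find the maximum by visiting the file one chunk at a time."""
    best = float("-inf")
    for start in range(0, n_values, chunk_size):
        stop = min(start + chunk_size, n_values)
        best = max(best, max(read_slice(path, start, stop)))
    return best


def demo():
    """Write a synthetic signal, then read a small slice and reduce chunkwise."""
    n = 10_000
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "signal.bin")
        write_values(path, (float(i % 977) for i in range(n)))
        return read_slice(path, 10, 13), chunked_max(path, n, chunk_size=1024)
```

Calling \python{demo()} returns the three requested values together with the maximum of the full signal, without ever holding more than one chunk in memory at a time.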
+\clearpage
\section{Introduction to WrightTools} % ==========================================================
WrightTools is written in Python, and endeavors to have a ``pythonic'', explicit and ``natural''
@@ -47,7 +77,7 @@ To use WrightTools, simply import:
3.0.0
\end{codefragment}
I'll discuss more about how exactly WrightTools packaging, distribution, and installation works in
-\autoref{sec:processing_distbribution}.
+\autoref{pro:sec:processing_distribution}.
We can use the builtin Python function \python{dir} to interrogate the contents of the
WrightTools package. %
@@ -117,75 +147,86 @@ spectrum, and channels are each a particular kind of signal within that spectrum
Typical variables might be \python{[w1, w2, w3, d1, d2]}, and typical channels
\python{[pmt, pyro1, pyro2, pyro3]}. %
-As an overview, the following lexicographically lists the attributes and methods of
-\python{Data}. %
-\begin{ditemize}
- \item method \python{collapse}: Collapse along one dimension in a well-defined way.
- \item method \python{convert}: Convert all axes of a certain kind.
- \item method \python{create_channel}: Create a new channel.
- \item method \python{create_variable}: Create a new variable.
- \item method \python{fullpath}
- \item method \python{get_nadir}
- \item method \python{get_zenith}
- \item method \python{heal}
- \item attribute \python{kind}
- \item method \python{level}
- \item method \python{map_variable}
- \item attribute \python{natural_name}
- \item attribute \python{ndim}
- \item method \python{offset}
- \item method \python{print_tree}
- \item method \python{remove_channel}
- \item method \python{remove_variable}
- \item method \python{rename_channels}
- \item method \python{rename_variables}
- \item attribute \python{shape}
- \item method \python{share_nans}
- \item attribute \python{size}
- \item method \python{smooth}
- \item attribute \python{source}
- \item method \python{split}
- \item method \python{transform}
- \item attribute \python{units}
- \item attribute \python{variable_names}
- \item attribute \python{variables}
- \item method \python{zoom}
-\end{ditemize}
+\begin{table}
+ \begin{tabular}{c | c | l}
+ & type & description \\ \hline
+ \python{collapse} & method & Collapse along one dimension in a well-defined way. \\ \hline
+ \python{convert} & method & Convert all axes of a certain kind. \\ \hline
+ \python{create_channel} & method & Create a new channel. \\ \hline
+ \python{create_variable} & method & Create a new variable. \\ \hline
+ \python{fullpath} & attribute & \\ \hline
+ \python{get_nadir} & & \\ \hline
+ \python{get_zenith} & & \\ \hline
+ \python{heal} & & \\ \hline
+ \python{kind} & & \\ \hline
+ \python{level} & & \\ \hline
+ \python{map_variable} & & \\ \hline
+ \python{natural_name} & & \\ \hline
+ \python{ndim} & & \\ \hline
+ \python{offset} & & \\ \hline
+ \python{print_tree} & & \\ \hline
+ \python{remove_channel} & & \\ \hline
+ \python{remove_variable} & & \\ \hline
+ \python{rename_channels} & & \\ \hline
+ \python{shape} & & \\ \hline
+ \python{share_nans} & & \\ \hline
+ \python{size} & & \\ \hline
+ \python{smooth} & & \\ \hline
+ \python{source} & & \\ \hline
+ \python{split} & & \\ \hline
+ \python{transform} & & \\ \hline
+ \python{units} & & \\ \hline
+ \python{variable_names} & & \\ \hline
+ \python{variables} & & \\ \hline
+ \python{zoom} & & \\ \hline
+ \end{tabular}
+ \caption[Attributes and methods of Data]{
+ Key attributes and methods of data, lexicographically listed
+ }
+\end{table}
Each data object contains instances of \python{Channel} and \python{Variable} which represent the
principal multidimensional arrays. %
The following lexicographically lists the attributes of these instances. %
Certain methods and attributes are unique to only one type of dataset, and are marked as such. %
-\begin{ditemize}
- \item method \python{argmax}
- \item method \python{argmin}
- \item method \python{chunkwise}
- \item method \python{clip}
- \item method \python{convert}
- \item attribute \python{full}
- \item attribute \python{fullpath}
- \item attribute \python{label} (variable only)
- \item method \python{log}
- \item method \python{log10}
- \item method \python{log2}
- \item method \python{mag}
- \item attribute \python{major_extent} (channel only)
- \item method \python{max}
- \item method \python{min}
- \item attribute \python{minor_extent} (channel only)
- \item attribute \python{natural_name}
- \item method \python{normalize} (channel only)
- \item attribute \python{null} (channel only)
- \item attribute \python{parent}
- \item attribute \python{points}
- \item attribute \python{signed} (channel only)
- \item method \python{slices}
- \item method \python{symmetric_root}
- \item method \python{trim} (channel only)
-\end{ditemize}
+
Channels and variables also support direct indexing / slicing using \python{__getitem__}, as
discussed more in... % TODO: where is it discussed more?
-
+
+\begin{table}
+ \begin{tabular}{c | c | c | l}
+ & type & of & description \\ \hline
+ \python{argmax} & method & both & \\ \hline
+ \python{argmin} & & & \\ \hline
+ \python{chunkwise} & & & \\ \hline
+ \python{clip} & & & \\ \hline
+ \python{convert} & & & \\ \hline
+ \python{full} & & & \\ \hline
+ \python{fullpath} & & & \\ \hline
+ \python{label} & attribute & variable & \\ \hline
+ \python{log} & & & \\ \hline
+ \python{log10} & & & \\ \hline
+ \python{log2} & & & \\ \hline
+ \python{mag} & & & \\ \hline
+ \python{major_extent} & attribute & channel & \\ \hline
+ \python{max} & & & \\ \hline
+ \python{min} & & & \\ \hline
+ \python{minor_extent} & attribute & channel & \\ \hline
+ \python{natural_name} & & & \\ \hline
+ \python{normalize} & & channel & \\ \hline
+ \python{null} & & channel & \\ \hline
+ \python{parent} & & & \\ \hline
+ \python{points} & & & \\ \hline
+ \python{signed} & & channel & \\ \hline
+ \python{slices} & & & \\ \hline
+ \python{symmetric_root} & & & \\ \hline
+ \python{trim} & & channel & \\ \hline
+ \end{tabular}
+ \caption[Attributes and methods of Channel and Variable.]{
+ Key attributes and methods of channel and variable, lexicographically listed
+ }
+\end{table}
+
Axes are ways to organize data as functions of particular variables (and combinations thereof). %
The \python{Axis} class does not directly contain the respective arrays---it merely refers to the
associated variables. %
@@ -199,20 +240,28 @@ include \python{'w1'}, \python{'w1=wm'}, \python{'w1+w2'}, \python{'2*w1'}, \pyt
Axes can be directly indexed / sliced into using \python{__getitem__}, and they support many of the
``numpy-like'' attributes. %
A lexicographical list of axis attributes and methods follows.
-\begin{ditemize}
- \item attribute \python{full}
- \item attribute \python{label}
- \item attribute \python{natural_name}
- \item attribute \python{ndim}
- \item attribute \python{points}
- \item attribute \python{shape}
- \item attribute \python{size}
- \item attribute \python{units_kind}
- \item attribute \python{variables}
- \item method \python{convert}
- \item method \python{min}
- \item method \python{max}
-\end{ditemize} % TODO: actually lexicographical
+
+
+\begin{table}
+ \begin{tabular}{c | c | l}
+ & type & description \\ \hline
+ \python{full} & & \\ \hline
+ \python{label} & & \\ \hline
+ \python{natural_name} & & \\ \hline
+ \python{ndim} & & \\ \hline
+ \python{points} & & \\ \hline
+ \python{shape} & & \\ \hline
+ \python{size} & & \\ \hline
+ \python{units_kind} & & \\ \hline
+ \python{variables} & & \\ \hline
+ \python{convert} & & \\ \hline
+ \python{min} & & \\ \hline
+ \python{max} & & \\ \hline
+ \end{tabular}
+ \caption[Attributes and methods of Axis.]{
+ Key attributes and methods of axis, lexicographically listed
+ }
+\end{table}
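The idea that an axis merely refers to its variables, rather than containing arrays of its own, can be sketched in a few lines of plain Python. This is not the WrightTools implementation: the class below is a hypothetical stand-in that handles only one-dimensional lists and simple arithmetic expressions such as \python{'w1+w2'} or \python{'2*w1'}, and its \python{source} mapping is an assumption made for illustration.

```python
import re


class Axis:
    """Hypothetical stand-in: stores an expression, never the data itself."""

    def __init__(self, source, expression):
        self.source = source  # shared mapping of variable name -> list of floats
        self.expression = expression

    @property
    def variable_names(self):
        """Names of the variables that appear in the expression."""
        names = re.findall(r"[A-Za-z_]\w*", self.expression)
        return sorted(set(names))

    @property
    def points(self):
        """Evaluate the expression on demand from the current variable values."""
        used = [self.source[name] for name in self.variable_names]
        out = []
        for values in zip(*used):
            scope = dict(zip(self.variable_names, values))
            # builtins are disabled so only the named variables are visible
            out.append(eval(self.expression, {"__builtins__": {}}, scope))
        return out


variables = {"w1": [1.0, 2.0], "w2": [10.0, 20.0]}
summed = Axis(variables, "w1+w2")
doubled = Axis(variables, "2*w1")
before = summed.points      # computed from the variables as they are now
variables["w1"][0] = 5.0    # change the underlying variable in place
after = summed.points       # the axis follows, because it holds no copy
```

Because the points are recomputed from the variables each time, \python{after} differs from \python{before} even though the axis object itself was never modified.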
\section{Creating a data object} % ===============================================================
@@ -309,7 +358,7 @@ Conceptually, it behaves like a folder in a traditional file-system. %
The primary attributes and methods of \python{Collection} are
\begin{ditemize}
- \item attribute item_names
+ \item attribute \python{item_names}
\item attribute \python{fullpath}
\end{ditemize}
% TODO: finish adding attributes and methods