From e40e84ad9b891c96ffe7cda884087c0b9dc098c7 Mon Sep 17 00:00:00 2001
From: Blaise Thompson
Date: Sat, 31 Mar 2018 12:08:35 -0500
Subject: 2018-03-31 12:08

---
 processing/chapter.tex | 221 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 135 insertions(+), 86 deletions(-)

(limited to 'processing')

diff --git a/processing/chapter.tex b/processing/chapter.tex
index db8da3b..81886c2 100644
--- a/processing/chapter.tex
+++ b/processing/chapter.tex
@@ -23,19 +23,49 @@
 
 \clearpage
 
-From a data science perspective, CMDS has several unique challenges:
+CMDS takes a somewhat unique approach to instrumental science. %
+There are not that many well-defined, well-trodden experimental paths. %
+The basic ideas stay the same, but the real power is in the creativity and flexibility to tweak the
+experiment according to the particular question being asked. %
+How, then could one go about making a data processing software package for CMDS? %
+The package has to be flexible enough to accommodate the diversity of experiments, but still solid
+enough to be a foundational tool. %
+
+When creating a toolkit for CMDS, there are several challenges worth considering:
 \begin{ditemize}
   \item Dimensionality of datasets can typically be greater than two, complicating
     \textbf{representation}.
-  \item Shape and dimensionality change...
-  \item Data can be large (over one million points). % TODO: contextualize large (not BIG DATA)
+  \item Shape and dimensionality change, and relevant axes can be different from the scanned
+    dimensions. %
+  \item Data can be awkwardly large-ish (several million pixels), and can become legitimately large
+    in numerical simulations. %
+  \item There are no agreed-upon file formats for CMDS dataset storage. %
 \end{ditemize}
 
-I have designed a software package that directly addresses these issues. %
-
-WrightTools is a software package at the heart of all work in the Wright Group. %
-
-% TODO: more intro
+The biggest challenge is to find a really good definition for what constitutes a CMDS dataset. %
+Once understood, this common denominator can be enshrined into software and built upon. %
+
+WrightTools is a software package written in Python, built using the excellent tools provided by
+the scientific Python collection of packages, especially Scipy and Numpy. % TODO: cite cite cite
+WrightTools defines a universal file-format that is flexible enough to encompass the diversity of
+CMDS while still being entirely self-describing. %
+This file format is based on the popular binary format ``HDF5''. % TODO: cite
+This format allows for computers to interact with the arrays piece-by-piece in a very fast and
+reliable way, without loading the entire array in and out of memory. %
+WrightTools piggybacks on this, allowing users to interact with legitimately large CMDS datasets
+without worrying about memory overflow. %
+WrightTools takes a unique approach to representing CMDS data in array format, nick-named
+``semi-structure'', that allows for greater flexibility in representing CMDS in different
+coordinate spaces. %
+
+WrightTools is written to be used in scripts and in the command line. %
+It does not have any graphical components built in, except for the ability to generate plots using
+matplotlib. % TODO: cite
+Being built in this way gives WrightTools users maximum flexibility, and allows for rapid
+collaborative development. %
+It also allows other software packages to use WrightTools as a ``back-end'' foundational software,
+as has already been done in simulation and acquisition software created in the Wright Group. %
 
+\clearpage
 
 \section{Introduction to WrightTools} % ==========================================================
 WrightTools is written in Python, and endeavors to have a ``pythonic'', explicit and ``natural''
@@ -47,7 +77,7 @@ To use WrightTools, simply import:
 3.0.0
 \end{codefragment}
 I'll discuss more about how exactly WrightTools packaging, distribution, and instillation works in
-\autoref{sec:processing_distbribution}.
+\autoref{pro:sec:processing_distribution}.
 
 We can use the builtin Python function \python{dir} to interrogate the contents of the WrightTools
 package. %
@@ -117,75 +147,86 @@ spectrum, and channels are each a particular kind of signal within that spectrum
 Typical variables might be \python{[w1, w2, w3, d1, d2]}, and typical channels
 \python{[pmt, pyro1, pyro2, pyro3]}. %
 
-As an overview, the following lexicographically lists the attributes and methods of
-\python{Data}. %
-\begin{ditemize}
-  \item method \python{collapse}: Collapse along one dimension in a well-defined way.
-  \item method \python{convert}: Convert all axes of a certain kind.
-  \item method \python{create_channel}: Create a new channel.
-  \item method \python{create_variable}: Create a new variable.
-  \item method \python{fullpath}
-  \item method \python{get_nadir}
-  \item method \python{get_zenith}
-  \item method \python{heal}
-  \item attribute \python{kind}
-  \item method \python{level}
-  \item method \python{map_variable}
-  \item attribute \python{natural_name}
-  \item attribute \python{ndim}
-  \item method \python{offset}
-  \item method \python{print_tree}
-  \item method \python{remove_channel}
-  \item method \python{remove_variable}
-  \item method \python{rename_channels}
-  \item method \python{rename_variables}
-  \item attribute \python{shape}
-  \item method \python{share_nans}
-  \item attribute \python{size}
-  \item method \python{smooth}
-  \item attribute \python{source}
-  \item method \python{split}
-  \item method \python{transform}
-  \item attribute \python{units}
-  \item attribute \python{variable_names}
-  \item attribute \python{variables}
-  \item method \python{zoom}
-\end{ditemize}
+\begin{table}
+  \begin{tabular}{c | c | l}
+    & type & description \\ \hline
+    \python{collapse} & method & Collapse along one dimension in a well-defined way. \\ \hline
+    \python{convert} & method & Convert all axes of a certain kind. \\ \hline
+    \python{create_channel} & method & Create a new channel. \\ \hline
+    \python{create_variable} & method & Create a new variable. \\ \hline
+    \python{fullpath} & attribute & \\ \hline
+    \python{get_nadir} & & \\ \hline
+    \python{get_zenith} & & \\ \hline
+    \python{heal} & & \\ \hline
+    \python{kind} & & \\ \hline
+    \python{level} & & \\ \hline
+    \python{map_variable} & & \\ \hline
+    \python{natural_name} & & \\ \hline
+    \python{ndim} & & \\ \hline
+    \python{offset} & & \\ \hline
+    \python{print_tree} & & \\ \hline
+    \python{remove_channel} & & \\ \hline
+    \python{remove_variable} & & \\ \hline
+    \python{rename_channels} & & \\ \hline
+    \python{shape} & & \\ \hline
+    \python{share_nans} & & \\ \hline
+    \python{size} & & \\ \hline
+    \python{smooth} & & \\ \hline
+    \python{source} & & \\ \hline
+    \python{split} & & \\ \hline
+    \python{transform} & & \\ \hline
+    \python{units} & & \\ \hline
+    \python{variable_names} & & \\ \hline
+    \python{variables} & & \\ \hline
+    \python{zoom} & & \\ \hline
+  \end{tabular}
+  \caption[Attributes and methods of Data]{
+    Key attributes and methods of data, lexicographically listed
+  }
+\end{table}
 
 Each data object contains instances of \python{Channel} and \python{Variable} which represent the
 principle multidimensional arrays. %
 The following lexicographically lists the attributes of these instances. %
 Certain methods and attributes are unique to only one type of dataset, and are marked as such. %
-\begin{ditemize}
-  \item method \python{argmax}
-  \item method \python{argmin}
-  \item method \python{chunkwise}
-  \item method \python{clip}
-  \item method \python{convert}
-  \item attribute \python{full}
-  \item attribute \python{fullpath}
-  \item attribute \python{label} (variable only)
-  \item method \python{log}
-  \item method \python{log10}
-  \item method \python{log2}
-  \item method \python{mag}
-  \item attribute \python{major_extent} (channel only)
-  \item method \python{max}
-  \item method \python{min}
-  \item attribute \python{minor_extent} (channel only)
-  \item attribute \python{natural_name}
-  \item method \python{normalize} (channel only)
-  \item attribute \python{null} (channel only)
-  \item attribute \python{parent}
-  \item attribute \python{points}
-  \item attribute \python{signed} (channel only)
-  \item method \python{slices}
-  \item method \python{symmetric_root}
-  \item method \python{trim} (channel only)
-\end{ditemize}
+
 Channels and variables also support direct indexing / slicing using \python{__getitem__}, as
 discussed more in... % TODO: where is it discussed more?
-
+
+\begin{table}
+  \begin{tabular}{c | c | c | l}
+    & type & of & description \\ \hline
+    \python{argmax} & method & both & \\ \hline
+    \python{argmin} & & & \\ \hline
+    \python{chunkwise} & & & \\ \hline
+    \python{clip} & & & \\ \hline
+    \python{convert} & & & \\ \hline
+    \python{full} & & & \\ \hline
+    \python{fullpath} & & & \\ \hline
+    \python{label} & attribute & variable & \\ \hline
+    \python{log} & & & \\ \hline
+    \python{log10} & & & \\ \hline
+    \python{log2} & & & \\ \hline
+    \python{mag} & & & \\ \hline
+    \python{major_extent} & attribute & channel & \\ \hline
+    \python{max} & & & \\ \hline
+    \python{min} & & & \\ \hline
+    \python{minor_extent} & attribute & channel & \\ \hline
+    \python{natural_name} & & & \\ \hline
+    \python{normalize} & & channel & \\ \hline
+    \python{null} & & channel & \\ \hline
+    \python{parent} & & & \\ \hline
+    \python{points} & & & \\ \hline
+    \python{signed} & & channel & \\ \hline
+    \python{slices} & & & \\ \hline
+    \python{symmetric_root}
+    \python{trim} & & channel & \\ \hline
+  \end{tabular}
+  \caption[Attributes and methods of Channel and Variable.]{
+    Key attributes and methods of channel and variable, lexicographically listed
+  }
+\end{table}
+
 Axes are ways to organize data as functional of particular variables (and combinations thereof). %
 The \python{Axis} class does not directly contain the respective arrays---it merely refers to the
 associated variables. %
@@ -199,20 +240,28 @@ include \python{'w1'}, \python{'w1=wm'}, \python{'w1+w2'}, \python{'2*w1'}, \pyt
 Axes can be directly indexed / sliced into using \python{__getitem__}, and they support many of the
 ``numpy-like'' attributes. %
 A lexicographical list of axis attributes and methods follows.
-\begin{ditemize}
-  \item attribute \python{full}
-  \item attribute \python{label}
-  \item attribute \python{natural_name}
-  \item attribute \python{ndim}
-  \item attribute \python{points}
-  \item attribute \python{shape}
-  \item attribute \python{size}
-  \item attribute \python{units_kind}
-  \item attribute \python{variables}
-  \item method \python{convert}
-  \item method \python{min}
-  \item method \python{max}
-\end{ditemize} % TODO: actually lexicographical
+
+
+\begin{table}
+  \begin{tabular}{c | c | l}
+    & type & description \\ \hline
+    \python{full} & & \\ \hline
+    \python{label} & & \\ \hline
+    \python{natural_name} & & \\ \hline
+    \python{ndim} & & \\ \hline
+    \python{points} & & \\ \hline
+    \python{shape} & & \\ \hline
+    \python{size} & & \\ \hline
+    \python{units_kind} & & \\ \hline
+    \python{variables} & & \\ \hline
+    \python{convert} & & \\ \hline
+    \python{min} & & \\ \hline
+    \python{max} & & \\ \hline
+  \end{tabular}
+  \caption[Attributes and methods of Axis.]{
+    Key attributes and methods of axis, lexicographically listed
+  }
+\end{table}
 
 \section{Creating a data object} % ===============================================================
 
@@ -309,7 +358,7 @@ Conceptually, it behaves like a folder in a traditional file-system. %
 
 The primary attributes and methods of \python{Collection} are
 \begin{ditemize}
-  \item attribute item_names
+  \item attribute \python{item_names}
   \item attribute \python{fullpath}
 \end{ditemize} % TODO: finish adding attributes and methodsd
 
-- 
cgit v1.2.3