author | Blaise Thompson <blaise@untzag.com> | 2018-04-14 10:22:54 -0500 |
---|---|---|
committer | Blaise Thompson <blaise@untzag.com> | 2018-04-14 10:22:54 -0500 |
commit | 9b1b744d5c205f8bff39cd3b1cbe80626a46015c (patch) | |
tree | c2adf80a81bbb6fa375886a9ae39d0f0c966a563 | |
parent | 92b0eaa59ac85b9fa7b4bbb61ce84949e96e286b (diff) |
2018-04-14 10:22
-rw-r--r-- | processing/chapter.tex | 97 |
1 files changed, 69 insertions, 28 deletions
diff --git a/processing/chapter.tex b/processing/chapter.tex
index 997b297..f201721 100644
--- a/processing/chapter.tex
+++ b/processing/chapter.tex
@@ -24,9 +24,6 @@
\clearpage
CMDS takes a somewhat unique approach to instrumental science. %
-There are not that many well-defined, well-trodden experimental paths. %
-The basic ideas stay the same, but the real power is in the creativity and flexibility to tweak the
-experiment according to the particular question being asked. %
How, then, could one go about making a data processing software package for CMDS? %
The package has to be flexible enough to accommodate the diversity of experiments, but still solid
enough to be a foundational tool. %
@@ -36,12 +33,13 @@ When creating a toolkit for CMDS, there are several challenges worth considering
\item Dimensionality of datasets can typically be greater than two, complicating representation.
\item Shape and dimensionality change, and relevant axes can be different from the scanned
dimensions. %
- \item Data can be awkwardly large-ish (several million pixels), and can become legitimately large
- in numerical simulations. %
+ \item Data can range from awkwardly large-ish (several million pixels) to legitimately large: it
+ is not always possible to store entire arrays in memory. %
\item There are no agreed-upon file formats for CMDS dataset storage. %
\end{ditemize}
The biggest challenge is to find a really good definition for what constitutes a CMDS dataset. %
Once understood, this common denominator can be enshrined into software and built upon. %
+This chapter describes WrightTools, a software package that I created to process CMDS datasets. %
WrightTools is a software package written in Python, built using the excellent tools provided by
the scientific Python collection of packages, especially SciPy \cite{SciPy} and NumPy
@@ -50,12 +48,11 @@ WrightTools defines a universal file-format that is flexible enough to encompass
CMDS while still being entirely self-describing. %
This file format is based on the popular binary format ``HDF5'' \cite{FolkMike2011a}, as
interfaced by the h5py python library \cite{h5py}. %
-This format allows for computers to interact with the arrays piece-by-piece in a very fast and
+This format allows WrightTools to interact with the arrays piece-by-piece in a very fast and
reliable way, without loading the entire array in and out of memory. %
-Using object oriented programming, the main classes in WrightTools are children of h5py classes,
-allowing users to interact with legitimately large CMDS datasets without worrying about memory
+This allows users to interact with legitimately large CMDS datasets without worrying about memory
overflow. %
-WrightTools takes a unique approach to representing CMDS data in array format, nick-named
+WrightTools takes a unique approach to representing CMDS data in array format, which I call
``semi-structure'', that allows for greater flexibility in representing CMDS in different
coordinate spaces. %
@@ -112,24 +109,24 @@ functions, and classes. %
Finally, subpackages are literally folders that contain several \bash{.py} files: several
modules. %
+All spectra are stored within wt5 files as multidimensional arrays. %
+Arrays are containers that store many instances of the same data type, typically numerical
+datatypes. %
+These arrays have some \python{shape}, \python{size}, and
+\python{dtype}. %
+In the context of WrightTools, they can contain floats, integers, complex numbers and NaNs. %
+
WrightTools is designed around a universal ``wt5'' file format. %
wt5 files are simply extensions of the hdf5 format, with some additional requirements applied to
their internal structure. %
This puts wt5 files in the same category as other domain-specific hierarchical data formats (see
section ...). %
-One of the most important features of the hdf5 paradigm is the ability to access portions of the
+One of the most important features of the HDF5 paradigm is the ability to access portions of the
multidimensional arrays at a time. %
WrightTools takes full advantage of this, such that the WrightTools package is simply an
\emph{interface} to the data contained within the wt5 file, and arrays are not stored in memory until
needed. %
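
The idea of touching only part of an on-disk array can be illustrated without HDF5 at all. %
The sketch below is a stand-in under stated assumptions, not how h5py or WrightTools is
implemented: it uses only the Python standard library, a flat binary file of float64 values, and
two hypothetical helpers (\python{write_flat} and \python{read_slice}) to read a small window of
an array while the rest stays on disk. %

```python
import os
import struct
import tempfile

# Assumption for this sketch: the array is stored as consecutive
# little-endian float64 values in a flat file (HDF5 is far richer).
ITEM = struct.Struct("<d")


def write_flat(path, values):
    """Write a sequence of floats to a flat binary file."""
    with open(path, "wb") as f:
        for value in values:
            f.write(ITEM.pack(value))


def read_slice(path, start, stop):
    """Read elements [start, stop) without loading the whole file."""
    with open(path, "rb") as f:
        f.seek(start * ITEM.size)  # jump straight to the first element
        raw = f.read((stop - start) * ITEM.size)
    return [item[0] for item in ITEM.iter_unpack(raw)]


# Usage: store 1000 points, then pull out only five of them.
path = os.path.join(tempfile.mkdtemp(), "signal.bin")
write_flat(path, [float(i) for i in range(1000)])
window = read_slice(path, 10, 15)
```

Only the requested bytes are read here, which is the property that matters when an array is too
large to hold in memory. %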
-All spectra are stored within wt5 files as multidimensional arrays. %
-Arrays are containers that store many instances of the same data type, typically numerical
-datatypes. %
-These arrays have some \python{shape}, \python{size}, and
-\python{dtype}. %
-In the context of WrightTools, they can contain floats, integers, complex numbers and NaNs. %
-
There are two classes which are top-level components of the WrightTools package:
\python{Collection} and \python{Data}. %
@@ -144,18 +141,62 @@ collection objects---empowering users to organize their datasets into clearly st
labeled hierarchies within the wt5 file. %
See section ... for more information about \python{Collection}. %
-[PARAGRAPH ABOUT ARTISTS]
-
-[PARAGRAPH ABOUT FIT]
-
-[PARAGRAPH ABOUT DATASETS]
-
-[PARAGRAPH ABOUT DIAGRAMS]
-
-[PARAGRAPH ABOUT EXCEPTIONS]
-
-[PARAGRAPH ABOUT KIT]
+The \python{artists} subpackage contains all of the tools needed to plot \python{Data} objects. %
+There are ``quick'' artist functions made primarily for use in interactive plotting, and a larger,
+more flexible set of classes and functions that can be used to construct more elaborate figures. %
+See section ... for more information. %
+The \python{fit} subpackage is an interface which endeavors to make fitting multidimensional
+\python{Data} objects as easy as possible. %
+Towards this end, the \python{fit} subpackage takes a unique approach of dimensionality reduction
+via fitting. %
+See section ... for more information. %
+
+The \python{datasets} subpackage is simply a python interface to the set of raw data that is
+distributed within WrightTools. %
+\python{datasets} is not imported by default, so ``from'' syntax must be used. %
+\python{datasets} exposes full filepaths to the raw data, rather than instances of
+\python{Data} or \python{Collection}. %
+\begin{codefragment}{python}
+>>> from WrightTools import datasets
+>>> datasets.COLORS.v0p2_d1_d2_diagonal
+'.../WrightTools/datasets/COLORS/v0.2/d1_d2 diagonal.dat'
+\end{codefragment}
+This strategy is more flexible and allows the developers of WrightTools to write tests and examples
+using datasets that are guaranteed to be on every machine. %
+
+The \python{diagrams} subpackage is a small set of tools used for drawing diagrams, with a focus on
+diagrams commonly required by CMDS practitioners. %
+Currently \python{diagrams} can draw WMELs [CITE] and delay space labels (see figure ...). %
+\python{diagrams} interfaces well with \python{artists} since both are built on top of matplotlib,
+so it is easy for WrightTools users to draw diagrams in the same figure as other elements. %
+
+The \python{units} module handles all unit information, and conversion between values in different
+unit systems. %
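
As a rough illustration of what such a conversion involves, the following sketch converts between
three energy-type units by routing every value through wavelength. %
The function name \python{convert}, the unit keys, and the table layout are hypothetical and are
not taken from the WrightTools source; only the physical relations (wavenumber in cm$^{-1}$ equals
$10^7$ divided by wavelength in nm, and photon energy in eV is approximately $1239.841984$ divided
by wavelength in nm) are standard. %

```python
# Hypothetical sketch of a unit converter; not the WrightTools API.
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV nm

# Each entry converts a value in that unit *to* nanometers.
# All three relations are their own inverse, so the same table
# also converts *from* nanometers.
VIA_NM = {
    "nm": lambda x: x,
    "wn": lambda x: 1e7 / x,       # wavenumbers, cm^-1
    "eV": lambda x: HC_EV_NM / x,  # electronvolts
}


def convert(value, current, destination):
    """Convert value from the current unit to the destination unit."""
    if current not in VIA_NM or destination not in VIA_NM:
        raise ValueError("unknown unit")
    nanometers = VIA_NM[current](value)
    return VIA_NM[destination](nanometers)


# Usage: an 800 nm pulse expressed in wavenumbers.
in_wavenumbers = convert(800, "nm", "wn")
```

Routing through a single reference unit keeps the table linear in the number of units, rather than
requiring one formula per pair. %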
+
+The \python{exceptions} module defines the unique exceptions and warnings that WrightTools
+raises. %
+All exceptions are children of the \python{WrightToolsException} class, and all warnings are
+children of the \python{WrightToolsWarning} class. %
+In this way, users of WrightTools can easily intercept all exceptions/warnings coming from
+WrightTools itself (as opposed to packages that WrightTools relies upon) when debugging their
+application. %
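
The pattern described above can be sketched in a few lines. %
The two base class names come from the text; the child classes \python{ExampleShapeError} and
\python{ExampleMemoryWarning}, and the functions \python{risky} and \python{run}, are hypothetical
and exist only to show how a single \python{except} clause and a single category check intercept
everything that originates from the package. %

```python
import warnings


class WrightToolsException(Exception):
    """Base class for every exception raised by the package."""


class WrightToolsWarning(Warning):
    """Base class for every warning issued by the package."""


# Hypothetical children, for illustration only.
class ExampleShapeError(WrightToolsException):
    pass


class ExampleMemoryWarning(WrightToolsWarning):
    pass


def risky(n_dims):
    """Stand-in for a package function that may warn or raise."""
    if n_dims > 2:
        warnings.warn("large array", ExampleMemoryWarning)
    if n_dims < 1:
        raise ExampleShapeError("need at least one dimension")
    return n_dims


def run(n_dims):
    """Return (result, caught), where caught lists package-level messages."""
    caught = []
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")
        try:
            result = risky(n_dims)
        except WrightToolsException as error:  # one clause catches all children
            result = None
            caught.append("error: %s" % error)
    for item in record:
        if issubclass(item.category, WrightToolsWarning):
            caught.append("warning: %s" % item.message)
    return result, caught
```

Exceptions raised by other libraries, such as a \python{TypeError} from a bad argument, are not
subclasses of the base class and so pass through \python{run} untouched. %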
+
+Finally, the \python{kit} subpackage is a small menagerie of classes and functions that are useful,
+but have no other place within WrightTools. %
+Many of these are used internally throughout the rest of the program, and others are distributed to
+be used by WrightTools users. %
+As examples:
+\begin{ditemize}
+ \item The \python{TimeStamp} class represents a moment in time, and handles conversion between
+ different popular representations of time. %
+ \item The \python{INI} class is a very simple python interface to \bash{.ini} configuration
+ files. %
+ \item The \python{fft} function is a user-friendly interface for N-dimensional FFT operations. %
+ \item The \python{closest_pair} function finds the pair(s) of indices corresponding to the
+ closest elements in an array. %
+\end{ditemize}
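
To make the last item concrete, here is one way such a search could be written for a
one-dimensional sequence. %
This is a brute-force sketch under my own assumptions, not the WrightTools implementation: the name
\python{closest_pair_sketch}, its signature, and its return format (a list of index tuples) are
hypothetical. %

```python
from itertools import combinations


def closest_pair_sketch(values):
    """Return every index pair (i, j), i < j, whose elements are closest.

    Brute force over all pairs: simple and correct, but quadratic in
    the length of the input.
    """
    if len(values) < 2:
        raise ValueError("need at least two elements")
    index_pairs = list(combinations(range(len(values)), 2))
    smallest = min(abs(values[i] - values[j]) for i, j in index_pairs)
    return [(i, j) for i, j in index_pairs
            if abs(values[i] - values[j]) == smallest]


# Usage: indices 0 and 2 hold the two closest values.
pairs = closest_pair_sketch([10, 50, 15, 90])
```

Returning a list rather than a single tuple accounts for ties, which is what the ``pair(s)'' in
the description suggests. %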
+
\begin{table}
\begin{tabular}{c | c | l}
& type & description \\ \hline