ACID_code.DataList

A class that stores Data instances in a list indexed by order. The DataList is useful for running ACID over multiple orders with parallelization. Fundamentally, this class holds Data instances (which ACID updates with the results per order) as a list and can map the true order number from the instrument (stored in the config) to the index of the list. It handles missing or incomplete orders and supports appending new orders. For more information and a full example of how to use the DataList, see :ref:`datalist` in the documentation. Note that the DataList is not strictly necessary to run ACID over multiple orders; you can handle the multiple instances yourself.
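As a rough illustration of the order-to-index mapping described above, the following sketch uses plain Python rather than ACID_code itself. The function name build_o2i and the dictionary layout are assumptions made for illustration; the source only states that the class maps instrument order numbers to list indices.

```python
def build_o2i(order_labels):
    """Map each instrument order label to its position in the sorted list.

    Hypothetical helper: this is not part of ACID_code, only a sketch of
    what an order-to-index mapping could look like.
    """
    sorted_orders = sorted(order_labels)
    return {order: index for index, order in enumerate(sorted_orders)}


# Three orders labelled as on the instrument; order 101 is missing.
o2i = build_o2i([102, 100, 103])
print(o2i)
```

With a mapping like this, an order can be looked up by its instrument label even when some orders are absent from the list.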

The DataList class works with a required root directory, specified by the user, to access the same data across parallel processes and to save intermediate results and figures per order.

Initializes the DataList object. The DataList can be initialized in two ways: either by providing the wavelengths, flux, errors, and sn arrays directly in the class initialization (here), or using the :py:classmethod:`DataList.from_datalist` method with a list of Data objects. The former is useful for quickly initializing a DataList from raw data, while the latter is useful for loading a saved DataList or for more fine-grained control over the initialization of each Data object.

Parameters:

wavelengths (Array3D | Array2D | None, optional) – A 2D or 3D array of wavelengths for the input spectra. If a 2D array is provided, it is assumed to have shape (n_orders, n_pixels). If a 3D array is provided, it is assumed to have shape (n_orders, n_frames, n_pixels). Default is None. The format for the last 1 or 2 dimensions follows that of the “wavelengths” input in the :py:function:`Acid.ACID` method. FITS files sometimes store their frames in shape (n_frames, n_orders, n_pixels); in that case, you can swap the axes with np.swapaxes(wavelengths, 0, 1) to get them in the correct shape. It is also possible to input orders with different numbers of pixels, in which case the wavelengths should be a list of 2D arrays/lists.
flux (Array3D | Array2D | None, optional) – A 2D or 3D array of fluxes for the input spectra. Same shape assumptions as wavelengths. Default is None.
errors (Array3D | Array2D | None, optional) – A 2D or 3D array of errors for the input spectra. Same shape assumptions as wavelengths. Default is None.
sn (Array2D | Array1D | None, optional) – A 1D or 2D array of signal-to-noise ratios for the input spectra. If a 1D array is provided, it is assumed to have shape (n_orders,). If a 2D array is provided, it is assumed to have shape (n_orders, n_frames). Default is None. Follows the same logic as the “sn” input in the :py:function:`Acid.ACID` method, for approximating the errors (or vice versa) if one is not provided.
velocities (Array1D | None, optional) – The velocity grid to be used for all the orders. This should be a 1D array of velocity values in km/s. Follows the same format as the “velocities” input in the :py:function:`Acid.ACID` method. Default is None.
linelist (Array1D | str | LineList | dict | None, optional) – The linelist to be used for all the orders. This can be provided in the same formats as the “linelist” input in the :py:function:`Acid.ACID` method.
order_range (Array1D | None, optional) – A 1D array of order labels corresponding to the orders in the input data. The indices of this array should match the indices along the first dimension of the wavelengths, flux, errors, and sn arrays. For example, if your input data has 3 orders and they correspond to orders 100, 101, and 102 in the instrument, then you should input order_range = [100, 101, 102]. If not provided, it is assumed to be a Pythonic 0-indexed range of the same length as the number of orders in the input data. Default is None.
config (Config | list[Config] | None, optional) – A template Config object for all orders or a list of Config objects per order containing the configuration for the ACID run. If inputting a list, the index and length of the list must match the first dimension of the input data arrays and the order_range. These take higher priority than any config_kwargs passed in the initialization. Setting ‘order’ will not have any effect, as it will be overwritten by the order numbers in the order_range. If not provided, default Config values will be used. Default is None.
save_dir (str | None, optional) – The default directory to save results and figures for each order. By default, the DataList saves data.pkl and sampler.h5 to a subdirectory (named by the order number) inside this directory. If the Configs or kwargs passed contain their own save_path or sampler_path (see Acid), those are used instead. If None, no saving will be done; this is, however, not recommended. Default is None.
overwrite (bool, optional) – Whether to overwrite existing Data instances with new ones when using run_ACID, or to load and use existing Data instances if they exist. If True and a Data instance already exists for an order, it will be overwritten with the new Data instance generated from the ACID run for that order. Note that the new Data instance is only saved when run_ACID is run; otherwise it is just held in memory. If False and a Data instance already exists for an order, it will be loaded and used instead of generating a new Data instance from the ACID run for that order. Default is False.
verbose (int | bool | str | None, optional) – The verbosity level for printing information during the initialization. Follows the same format as the “verbose” input in the Config class. Default is None.
_load (Any, optional) – Not yet implemented, do not use. The idea is that you can input a Load object which has its own tools to pull s2d data from common instruments such as ESPRESSO, HARPS, etc. If you want to use this feature, please open an issue or contribute a pull request with the implementation.
_data_list (list[Data] | None, optional) – This is an internal argument used for initializing the DataList from a list of Data objects in the :py:classmethod:`DataList.from_datalist` method.
**config_kwargs – Additional keyword arguments to be passed with low priority to all of the generated Config objects. These kwargs will join with but NOT overwrite any existing keys in the input Config object(s). Setting ‘order’ will not have any effect, as it will be overwritten by the order numbers in the order_range. Inputting kwargs not part of the defaults in the Config class will cause an error. If not provided, default Config values will be used.
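The priority rules for config, config_kwargs, and order_range can be sketched with plain dictionaries. This is only an illustration under stated assumptions: the real Config is a class rather than a dict, merge_config is a hypothetical helper, the three keys are simply names mentioned on this page, and the KeyError is a guess, since the source only says that unknown kwargs "will cause an error".

```python
# Stand-in for the Config defaults (assumption: only key names that appear
# on this page are used; the real Config class has its own set of defaults).
CONFIG_DEFAULTS = {"order": None, "verbose": None, "save_path": None}


def merge_config(config, config_kwargs, order):
    """Combine explicit config values with low-priority kwargs for one order."""
    unknown = set(config_kwargs) - set(CONFIG_DEFAULTS)
    if unknown:
        # Assumed error type; the source does not name the exception.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    merged = dict(CONFIG_DEFAULTS)
    merged.update(config_kwargs)  # low priority: applied first
    merged.update(config)         # explicit config values win over kwargs
    merged["order"] = order       # always taken from the order range
    return merged


merged = merge_config(
    config={"verbose": 2, "order": 5},
    config_kwargs={"verbose": 0, "save_path": "out"},
    order=101,
)
print(merged)
```

Here the verbose value from the config survives, save_path is filled in from the kwargs, and the order set in the config is replaced by the label from the order range.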

Load a DataList from a list of Data objects. This is useful for loading a saved DataList or for more fine-grained control over the initialization of each Data object. All Data objects should be already properly initialised with linelists, velocities, configs and inputs, and the DataList will check for consistency across the list (e.g. all orders should have the same velocity grid, etc.).

Parameters:

data_list (list[Data] | Data) – A list of Data objects to initialize the DataList from. If a single Data object is provided, it will be converted to a list with one element.
save_dir (str | None, optional) – The directory to save intermediate results and figures for each order. If None, no saving will be done. Default is None.
verbose (int | bool | str | None, optional) – The verbosity level for printing information during the initialization. Follows the same format as the “verbose” input in the Config class. Default is None.

Returns:

A DataList object initialized from the provided list of Data objects.

Return type:

DataList

append(data: Data, overwrite: bool = False, extend: bool = False, force_order: IntLike | None = None) → None[source]

Appends a Data instance to the data list. Note that the order range of the class is kept; if you want to set a new order range, use the set_order_range() method first to change it.

Parameters:

data (Data) – The Data instance to append to the data list. The order of the Data instance is taken from its config, but can be overridden with the force_order argument.
overwrite (bool, optional) – If True, will overwrite an existing Data instance with the same order number. Default is False.
extend (bool, optional) – If True, will extend the order range to include the new order if it is not already present. Default is False.
force_order (int, optional) – If provided, will set the order of the Data instance to this value, overriding its config. Default is None.

set_order_range(order_range: ACID_code.Array1D) → None[source]

Sets the order range for the DataList. The new range must be a superset of the already saved orders in the list, otherwise a ValueError is raised.

Parameters:: order_range (Array1D) – The new order range to set for the DataList. This should be a 1D array of order numbers.
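The superset requirement can be expressed as a simple set comparison. The sketch below is not the library's implementation; check_new_order_range is a hypothetical helper that only mirrors the documented rule and the documented ValueError.

```python
def check_new_order_range(new_range, stored_orders):
    """Return the new range as a list if it covers every stored order.

    Hypothetical helper mirroring the documented rule: the new range must
    be a superset of the orders already held, otherwise ValueError.
    """
    missing = sorted(set(stored_orders) - set(new_range))
    if missing:
        raise ValueError(f"New order range is missing stored orders: {missing}")
    return list(new_range)


# Orders 100 and 102 are already stored; widening the range is allowed.
accepted = check_new_order_range([100, 101, 102, 103], stored_orders=[100, 102])
print(accepted)
```

A range such as [100, 101] would be rejected in this sketch, because the stored order 102 would no longer have a place in it.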

sort_by_order() → None[source]: Sorts the data list by order number, and updates the o2i mapping accordingly. Internally called whenever self.data_list is updated.

Runs ACID on the Data instances in the data list for the specified orders. If save_dir is not None, the results are saved there, with one pickle file per order containing the Result object. The idea is that you can run ACID on any orders you choose.

Parameters:

orders (Array1D | int | str | None, optional) – The orders to run ACID on. This can be provided as a single integer for one order, a list of integers for multiple specific orders, the string “all” to run on all orders, or None to run on all orders. Default is None, which will run all orders.
use_index_mapping (bool, optional) – If False, will not use the order to index mapping, instead orders are indexed directly. Default is True. Only applies for int or array inputs for orders.
worker (IntLike | None, optional) – Used in conjunction with nworkers. If an integer is provided, it specifies the worker number for this process. When both worker and nworkers are provided, all the orders specified in “orders” will be split evenly across the nworkers. For example, if there are 100 orders, and nworkers is 4, then worker 0 will run orders 0-24, worker 1 will run orders 25-49, etc. The workers are 0-indexed. Default is None, which means no splitting and all specified orders will be run in this process.
nworkers (IntLike | None, optional) – The total number of workers to use to split the orders. See the “worker” parameter for more details. Default is None.
store_sampler (bool, optional) – If True, the sampler object from the ACID run will be stored in the same folder as the resulting data pickles. This will take up more disk space, but allow for use of the Result methods requiring the sampler attribute. We recommend leaving this on True if using deterministic sampling, otherwise set to False. Default is True.
size_limit (Scalar | None, optional) – A hard size limit for the sampler in GB. If the sampler exceeds this size, it will not be stored, regardless of the store_sampler flag. This is to avoid accidentally storing very large samplers. If None, no limit is set. Default is 1 GB. A warning is printed if store_sampler was set to True but the size limit prevents the sampler from being stored.
overwrite (bool | None, optional) – If True, will allow overwriting existing data and sampler pickles in the save_dir. If False, ACID will be skipped for orders that already have result pickles in the save_dir. Default is None, which uses the class default behaviour set in initialization (which itself defaults to False).
overwrite_kwargs (bool, optional) – If True, any keys in the kwargs that are also in the config for the Data instance will be overwritten by the kwargs values. Use with caution. Default is False.
**kwargs – Additional keyword arguments to be passed to the ACID method for each order. These will not overwrite any existing keys unless overwrite_kwargs is set to True, in which case they will overwrite existing keys in the config for the Data instance for that order. The kwargs passed also allow you to add/overwrite the linelist and velocities in the Data instance with the same overwrite logic.
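The worker and nworkers splitting can be pictured as cutting the requested orders into contiguous chunks. The sketch below reproduces the documented example (100 orders over 4 workers), but it is not the library's code: orders_for_worker is a hypothetical helper, and the handling of a remainder (earlier workers take one extra order) is an assumption, since the source only describes the evenly divisible case.

```python
def orders_for_worker(orders, worker, nworkers):
    """Return the contiguous slice of orders assigned to one 0-indexed worker."""
    if not 0 <= worker < nworkers:
        raise ValueError("worker must satisfy 0 <= worker < nworkers")
    base, extra = divmod(len(orders), nworkers)
    # Assumed remainder rule: the first `extra` workers get one more order.
    start = worker * base + min(worker, extra)
    stop = start + base + (1 if worker < extra else 0)
    return list(orders[start:stop])


all_orders = list(range(100))
first = orders_for_worker(all_orders, worker=0, nworkers=4)
second = orders_for_worker(all_orders, worker=1, nworkers=4)
print(first[0], first[-1], second[0], second[-1])
```

Each parallel process would call this with its own worker number, so that no two processes are handed the same order.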

save(save_dir: str | None = None) → None[source]

Packs all of the Data instances into a single DataList pickle and saves the state of the DataList to it. Alternatively, the Data instances can always be reloaded separately from the individual result pickle files in the directory for their order, or from wherever the Config has them stored. The pickle file contains a dictionary with the list of Data objects (converted to dictionaries) and the save_dir. The filename is always “datalist.pkl”, as save_dir must be a directory. If save_dir is not provided, self.save_dir is used. If that is also None, a ValueError is raised. All the orders should be in memory to run this function; you can ensure they are all loaded by calling the load() method and pointing it to the directory with all the data pickles.

Parameters:: save_dir (str | None, optional) – The directory to save the DataList pickle file. If None, self.save_dir is used. Default is None.
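A minimal sketch of this pickle layout, using only the standard library, might look as follows. The dictionary keys "data_list" and "save_dir" and the helper names are assumptions for illustration; the source only states that the pickle holds a dictionary with the list of Data objects (as dictionaries) and the save_dir, under the fixed filename datalist.pkl.

```python
import pickle
import tempfile
from pathlib import Path


def save_datalist(data_dicts, save_dir):
    """Write one combined pickle named datalist.pkl into save_dir."""
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    # Assumed key names; the real pickle may label these differently.
    payload = {"data_list": data_dicts, "save_dir": str(save_dir)}
    target = save_dir / "datalist.pkl"
    with open(target, "wb") as handle:
        pickle.dump(payload, handle)
    return target


def load_datalist(path):
    """Accept either the directory or the pickle file itself."""
    path = Path(path)
    if path.is_dir():
        path = path / "datalist.pkl"
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    written = save_datalist([{"order": 100}, {"order": 101}], tmp)
    restored = load_datalist(tmp)
    file_name = written.name
    restored_orders = [entry["order"] for entry in restored["data_list"]]

print(file_name, restored_orders)
```

Keeping the filename fixed means that a directory path alone is enough to find the combined pickle again.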

classmethod load(path: str, verbose: int | str | bool | None = None) → DataList[source]

Loads a DataList from a pickle file. The pickle file should contain a dictionary with the list of Data objects (converted to dictionaries) and the save_dir. If the provided path is a directory, this will attempt to load from datalist.pkl inside it; otherwise it will attempt to load from the provided path directly. If neither of those works, it will attempt to load from result pickles in a results directory within the provided path.

Parameters:

path (str) – The directory containing the datalist.pkl file, or the datalist.pkl itself. Note that the directories containing the results should also be in here.
verbose (int | str | bool | None, optional) – The verbosity level to use when loading the DataList. If None, the verbosity level from the pickle file is used. Default is None. The verbosity only affects this function and will not overwrite the verbosity level of the DataList once it is loaded.

Returns:

The loaded DataList object.

Return type:

DataList

property save_dir

property combined_profile: tuple | None

Returns the combined profile and its errors. If the combined profile has not been calculated yet, it will attempt to combine the profiles without any exclusions.

Returns:: The combined profile and its errors, or None if not available.
Return type:: tuple[Array1D, Array1D]|None

property results

Returns a list of Result objects for each Data instance in the DataList. If a Data instance does not have a result, None is returned for that order. This property is useful for avoiding re-accessing the result each time a plot is made.

Returns:: A list of Result objects or None for each order in the DataList.
Return type:: list[Result|None]

property data_list

combine_profiles(exclude: int | Array1D | None = None, must_have_converged: bool = False) → None[source]

Calculates the combined profile and its errors across all orders, excluding any orders specified in the exclude argument.

Parameters:

exclude (int | list[int] | None) – Orders to exclude from the combined profile calculation.
must_have_converged (bool) – If True, only includes orders that have converged in the combined profile calculation. Default is False, which includes all orders regardless of convergence status.
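The source does not state how the per-order profiles are combined. Purely as an illustration of the exclude argument, the sketch below uses an inverse-variance weighted mean, which is a common choice but an assumption here and not necessarily what ACID does. The function name and its inputs are hypothetical, and must_have_converged is not modelled.

```python
def combine_profiles_sketch(profiles, errors, orders, exclude=None):
    """Inverse-variance weighted mean over orders, skipping excluded ones.

    Assumed method: this is a stand-in technique, not the library's code.
    """
    if exclude is None:
        excluded = set()
    elif isinstance(exclude, int):
        excluded = {exclude}
    else:
        excluded = set(exclude)

    kept = [i for i, order in enumerate(orders) if order not in excluded]
    if not kept:
        raise ValueError("No orders left after exclusion")

    combined, combined_errors = [], []
    for j in range(len(profiles[kept[0]])):
        weights = [1.0 / errors[i][j] ** 2 for i in kept]
        total = sum(weights)
        value = sum(w * profiles[i][j] for w, i in zip(weights, kept)) / total
        combined.append(value)
        combined_errors.append(total ** -0.5)
    return combined, combined_errors


profiles = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
errors = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
combined, combined_errors = combine_profiles_sketch(
    profiles, errors, orders=[100, 101, 102], exclude=102
)
print(combined, combined_errors)
```

Excluding order 102 keeps its outlying values from affecting the combined profile in this toy example.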

plot_combined_profile(return_fig: bool = False) → None | tuple[Figure, Axes][source]

Plots the combined profile across all orders.

Parameters:: return_fig (bool) – If True, returns the figure and axis objects instead of displaying the plot.
Returns:: The figure and axis objects if return_fig is True, otherwise None.
Return type:: tuple[plt.Figure, plt.Axes] | None

fit_profile(**kwargs) → None | tuple[Figure, Axes][source]

Fits the combined profile across all orders.

Parameters:: **kwargs (dict) – Keyword arguments to pass to the :py:function:`Profiles.plot_fit` method.
Returns:: The figure and axis objects from the profile fit plot if return_fig is True, otherwise None.
Return type:: tuple[plt.Figure, plt.Axes] | None

plot_all_profiles(return_fig: bool = False) → None | tuple[Figure, Axes][source]

Plots all the profiles for each order in the DataList.

Parameters:: return_fig (bool) – If True, returns the figure and axis objects instead of displaying the plot.
Returns:: The figure and axis objects if return_fig is True, otherwise None.
Return type:: tuple[plt.Figure, plt.Axes] | None

plot_mean_profile_errors(return_fig: bool = False) → None | tuple[Figure, Axes][source]

Plots the errors of all the profiles for each order in the DataList.

Parameters:: return_fig (bool) – If True, returns the figure and axis objects instead of displaying the plot.
Returns:: The figure and axis objects if return_fig is True, otherwise None.
Return type:: tuple[plt.Figure, plt.Axes] | None

plot_chi2(return_fig: bool = False) → None | tuple[Figure, Axes][source]

Plots the chi-squared values against order in the DataList. This helps diagnose which orders have a bad fit and may need to be excluded from the combined profile.

Parameters:: return_fig (bool) – If True, returns the figure and axis objects instead of displaying the plot.
Returns:: The figure and axis objects if return_fig is True, otherwise None.
Return type:: tuple[plt.Figure, plt.Axes] | None