ACID_code.Acid

Accurate Continuum fItting and Deconvolution (ACID) class. This class contains the ACID method, which fits the continuum of spectra and performs Least Squares Deconvolution (LSD) to obtain LSD profiles for each spectrum. It also contains many internal methods used within the main ACID method. See Dolan et al. (2024) for more details on the ACID method and its applications.

Notes

Initialises the Acid class with the input parameters. The class stores calculations in the Data class and run configuration in the Config class (itself stored in Data for convenience). Both the Data class and the Result class (returned after running ACID) have save and load methods for their state, and saving a Result also saves its Data. See Results and Plotting.

As of 2.0, ACID is designed to be run on one order at a time. To run and keep track of multiple orders, see the DataList class, which tracks which orders have and have not been run and stores the results for each order. The DataList class has been designed with parallelization on HPCs in mind, allowing orders (which are independent) to be run by different jobs. See also the Multiprocessing and datalists sections.

Important note: All defaults in the signature are None, meaning if any values are input, they will override the default Config and/or Data values or any values that have already been input. The defaults within the config are written below. The config defaults can also be accessed via ACID_code.Config.defaults (returning a dictionary of defaults for both initialisation and the ACID method).

All parameters below and in the ACID method are stored in the Config instance, unless explicitly stated to be in the Data instance. The Config instance is for runtime settings and the Data instance is for storing data and any calculations.
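The override behaviour described above can be pictured with a None sentinel. The following is a minimal sketch, not ACID's implementation: `DEFAULTS` and `resolve_config` are hypothetical stand-ins for `ACID_code.Config.defaults` and the stored Config, and only the default values themselves are taken from this page.

```python
# Hypothetical stand-in for ACID_code.Config.defaults (values taken from the
# defaults documented on this page).
DEFAULTS = {"poly_ord": 3, "bin_size": 100, "nsteps": 10000}


def resolve_config(stored, **inputs):
    """Merge defaults, previously stored values, and explicit inputs.

    An input of None means "not given", so it never overrides anything.
    """
    resolved = dict(DEFAULTS)
    resolved.update(stored)  # values already saved in a Config beat the defaults
    resolved.update({k: v for k, v in inputs.items() if v is not None})
    return resolved


stored = {"poly_ord": 5}  # e.g. loaded from an earlier run
config = resolve_config(stored, poly_ord=None, nsteps=2000)
print(config)  # {'poly_ord': 5, 'bin_size': 100, 'nsteps': 2000}
```

Leaving an argument as None keeps whatever is already stored, while any non-None input replaces it.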

Parameters:

velocities (Array1D, optional) – Velocity grid for LSD profiles (in km/s). For example, use np.arange(-25, 25, 0.82) to create one. If None, a default grid from -25 to 25 km/s is used, with a spacing calculated by calc_deltav after the wavelengths are provided. Choosing your own velocity grid is highly recommended. By default None, stored in the Data instance.
linelist (Array2D | str | LineList | dict, optional) – The linelist to use for LSD. The linelist should have wavelengths in angstroms and relative depths between 0 and 1. This is a required parameter. It can take any of the following forms:
- String: A path to a VALD linelist in string format. Support for other linelists may be added in the future or on request.
- Array2D: A 2D array-like object indexed such that 0 is wavelengths and 1 is depths.
- dict: A dictionary with keys “wavelengths” and “depths”, each containing array-like objects for the wavelengths and depths respectively.
- LineList: The LineList class is used to expose the linelist for masking or getting/plotting the linelist. You can input an instance if you have one.
order (IntLike, optional) – If this ACID instance is intended as a run on a specific order, then you can designate this instance for that order. This will allow the resulting Data instance to keep track of which order the profiles correspond to. Note that orders can follow the spectrograph's own indexing (i.e. some spectrographs start at order ~20). By default 0.
order_range (Array1D, optional) – Optionally also give ACID the full order range of the spectrograph for the observation. ACID only ever runs on one order at a time, but this allows ACID and eventually the DataList to keep track of which orders have been run and which haven’t, and will be used in the future for plotting and saving results. As with order (above), the orders can be indexed to the spectrograph orders. By default [0]
verbose (bool | IntLike | str, optional) – The verbosity for printing and plotting the progress and warnings of ACID. The verbosities are natively stored as integers corresponding to: 0: No printing or plotting, all warnings are ignored. 1: Only printing warnings. 2: Printing progress and warnings. 3: Printing progress and warnings, as well as additional plots and helpful information about the run. The possible input types are described below: - Integer: Must be between 0 and 3, corresponding to the verbosities described above. - Boolean: If True, defaults to 2. If False, defaults to 0. - String: Can be one of [“none”, “low”, “medium”, “high”] or their common variants. By default 2 (medium).
sampler_progress (bool, optional) – A verbosity override for just the MCMC sampling progress. By default None, which does not override. If True or False, that value overrides the verbosity setting and enables or disables the tqdm progress output for the sampler.
masking_lines (dict | MaskingLines, optional) – Telluric lines (in angstroms) and widths (in km/s) to mask from the wavelength regions. Unless you’d like to change the default masking lines, we recommend just using the defaults (leaving this as None), which are based on telluric lines and strong hydrogen/metal lines in the optical and near infrared. For a guide on using your own/modifying the defaults, see Using your own line masks. By default None, stored in the Config instance.
seed (IntLike, optional) – Random seed for reproducibility, leave it on None for a random seed, by default None.
save_path (str, optional) – The path to save the data instance (containing the results) to. If None, results are not saved to disk, by default None. If a string is input, the data instance will be saved to this path as a .pkl file when the results are finished. Should be a valid file path that ends with “.pkl”. If the directory containing it does not exist, it will be created. If a file already exists at this path, it will be overwritten on Acid initialization.
sampler_path (str, optional) – The path to save the sampler HDF5 backend file to. If None, the sampler is not saved and only stored in memory. By default None. Note that if your path points to an existing file, it will be overwritten on Acid initialization. If a path is given, the emcee HDF5 backend is used to store and load the sampler. Should be a valid file path that ends with “.h5”. If the directory containing it does not exist, it will be created. Note that if you later try to save the sampler through the data class, it is converted to an HDF5 backend.
data (Data | DataList, optional) – An optional backend Data object to use for storing data. Allows previously calculated results to be reused, so that those calculations are skipped. If None, a new Data object is created. Please note that if the Data class already has a saved ACID config class, then any inputs to the Acid initialisation and ACID method will overwrite these config values. If a DataList instance is inputted, the Data instance corresponding to the inputted order is used.
config (Config, optional) – An optional Config object to use for storing the configuration. Allows you to override the config values stored in the Data object, otherwise, inputs to the initialisation here and the ACID method will overwrite these config values again (if entered). If None, an empty Config is created and stored in the Data instance.
**kwargs (dict, optional) – Unused except to catch if users use the “linelist_path” input rather than the now “linelist” input.
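As an illustration of the velocities and linelist inputs listed above, the sketch below builds a velocity grid and the dict and Array2D forms of a linelist with NumPy. The three lines and their depths are made-up values, and nothing here calls ACID itself.

```python
import numpy as np

# Velocity grid in km/s, as suggested for the velocities parameter.
velocities = np.arange(-25, 25, 0.82)

# Hypothetical three-line linelist: wavelengths in angstroms, depths in [0, 1].
line_wavelengths = np.array([5000.1, 5002.7, 5005.3])
line_depths = np.array([0.35, 0.80, 0.12])

# dict form: keys "wavelengths" and "depths".
linelist_dict = {"wavelengths": line_wavelengths, "depths": line_depths}

# Array2D form: index 0 holds the wavelengths, index 1 holds the depths.
linelist_array = np.vstack([line_wavelengths, line_depths])

print(velocities[0], velocities[-1] < 25, linelist_array.shape)
```

Either form (or a path to a VALD linelist) would then be given as the linelist argument when creating the Acid instance.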

Raises:

BeartypeError – See Type Validations to understand input validation errors.
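The mapping of verbose inputs onto integer levels, described for the verbose parameter above, can be sketched as follows. `normalise_verbosity` is a hypothetical helper written for this illustration. The "common variants" of the string names that ACID accepts are not listed here, so the sketch only handles case and surrounding whitespace.

```python
def normalise_verbosity(verbose):
    """Map a bool, int, or str onto the integer levels 0 to 3 (hypothetical helper)."""
    names = {"none": 0, "low": 1, "medium": 2, "high": 3}
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(verbose, bool):
        return 2 if verbose else 0
    if isinstance(verbose, int):
        if 0 <= verbose <= 3:
            return verbose
        raise ValueError("verbose must be between 0 and 3")
    if isinstance(verbose, str):
        try:
            return names[verbose.strip().lower()]
        except KeyError:
            raise ValueError(f"unknown verbosity {verbose!r}") from None
    raise TypeError("verbose must be a bool, int, or str")


levels = [normalise_verbosity(v) for v in (True, False, 3, "Medium")]
print(levels)  # [2, 0, 3, 2]
```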

Notes

Fits the continuum of the given spectra and performs LSD on the continuum-corrected spectra, returning an LSD profile for each spectrum given. Spectra must cover a similar wavelength range.

Important note: All defaults in the signature are None, meaning if any values are input, they will override the default Config and/or Data values or any values that have already been input. The defaults within the config are written below. The config defaults can also be accessed by: ACID_code.Config.defaults (returning a dictionary of defaults for both initialisation and the ACID method).

All parameters below are stored in the Config instance, unless explicitly stated to be in the Data instance. The Config instance is for runtime settings and the Data instance is for storing data and any calculations.

Parameters:

wavelengths (Array1D | Array2D, optional) – An array of wavelengths for each frame (in Angstroms). For multiple frames this should be a 2D array such that wavelengths[i] corresponds to the wavelengths for the ith frame. Can only be None if a data instance was provided in initialisation. If a 2D array is provided, they are treated as multiple frames (not orders), by default None, stored in the Data instance.
flux (Array1D | Array2D, optional) – An array of spectral frames (in flux). For multiple frames this should be a 2D array such that flux[i] corresponds to the spectral fluxes for the ith frame. Can only be None if a data instance was provided in initialisation. If a 2D array is provided, they are treated as multiple frames (not orders), by default None, stored in the Data instance.
errors (Array1D | Array2D, optional) – Errors for each frame (in flux). For multiple frames this should be a 2D array such that errors[i] corresponds to the spectral errors for the ith frame. If a 2D array is provided, they are treated as multiple frames (not orders). If no errors are provided, but the SN is provided, the errors will be estimated from the flux and SN, but we highly recommend providing errors if possible, by default None, stored in the Data instance.
sn (Scalar | IntLike | Array1D, optional) – Average signal-to-noise ratio for each frame (used to calculate the minimum line depth to consider from the line list). Each frame should have only one S/N value, so for multiple frames this should be a 1D array such that sn[i] corresponds to the S/N for the ith frame. If you prefer to use a per-pixel S/N value, ACID will use the utils.collapse_SNR function to calculate a single S/N value for each frame from the central 2/3rds of the input spectra, in which case a 2D array can be provided. If None, the S/N will be estimated from the input spectra and errors. By default None, stored in the Data instance.
deterministic_profile (bool, optional) – If True, fits both the continuum and the LSD profile simultaneously. If False, only fits the continuum in MCMC, and the profile is inferred from the continuum fit. This is a new feature that has been set as the default because it significantly decreases convergence time and computation time per step, while fully maintaining accuracy. Setting this to False will match legacy behaviour. By default True.
poly_ord (IntLike, optional) – Order of polynomial to fit as the continuum, by default 3
continuum_percentile (IntLike, optional) – The percentile to use when fitting the continuum, by default 90. For example, if 90, the continuum fit will be performed on the points in the spectra that are above the 90th percentile in flux in each spectral bin (determined by bin_size below).
bin_size (IntLike, optional) – The size of bins to use when performing the continuum fit. The spectra are split into bins with this number of pixels, and the continuum is fit to the median wavelength and the specified percentile of flux in each bin. By default 100 pixels.
pix_chunk (IntLike, optional) – Size of ‘bad’ regions in pixels. ‘Bad’ areas are identified by the residuals between an initial model and the data. If the residuals deviate by a specified percentage (see dev_perc below) for this number (pix_chunk) of pixels, then this chunk of pixels is masked in the spectra. By default 20
dev_perc (IntLike, optional) – Allowed deviation percentage. ‘Bad’ areas are identified by the residuals between an initial model and the data. If a residual deviates by this percentage for a specified number of pixels (see pix_chunk above), then this chunk of pixels is masked in the spectra. By default 25
n_sig (IntLike, optional) – Number of sigma to keep in sigma clipping. Ill-fitting lines are identified by sigma-clipping the residuals between an initial model and the data. Regions that lie outside the median ± n_sig standard deviations are clipped. The clipped regions will be masked in the spectra. This masking is only applied to find the continuum fit and is removed when LSD is applied to obtain the final profiles. By default 3
skips (IntLike, optional) – An option to only run acid on one in every n pixels, where n is the integer argument. This is only useful for testing to get a quicker result especially for larger wavelength ranges or datasets, by default 1 (no skipping)
od (bool, optional) – If True, runs ACID in optical depth (OD); otherwise, the LSD methods and ACID fitting are performed in flux. By default None, which defaults to True. Note that the whole point of ACID is to run LSD in OD, so we highly recommend leaving this as is unless you specifically want to compare.
sampler_type (str, optional) – If you really wish to use the dynesty nested sampler, you can set this to “dynesty”. It is almost entirely unsupported by the rest of the code, other than to get a finished result object, and it is much slower. We highly recommend using None or “emcee” (default). The only reason I added this was to get the Bayesian evidence for model comparison. If “dynesty” is chosen, the dynesty package needs to be installed, and the nsteps parameter is treated as “nlive” to be passed to the NestedSampler.
parallel (bool, optional) – If True uses multiprocessing to calculate the profiles for each frame in parallel, see https://acid-code.readthedocs.io/en/stable/using_ACID.html#multiprocessing for more details. By default True
cores (IntLike, optional) – Number of cores to use if parallel=True. If None, all available cores will be used, by default None
nwalkers (IntLike, optional) – A manual override for the number of walkers for the MCMC sampler. By default, uses the emcee recommendation which is 3 times the number of dimensions. For the deterministic model, this is just the poly_ord + 1, for the non-deterministic model, it is poly_ord + 1 + nvelocity points.
nsteps (IntLike, optional) – Number of steps for the MCMC to run, by default 10000. The initial number of steps is stored in the config as nsteps, but the true count of steps taken is stored in the Data instance as Data.nsteps, which can be higher if continue_sampling is used to continue sampling after the initial run.
max_steps (IntLike, optional) – If set, the sampler will run until either max_steps is reached or convergence is reached, as estimated from the emcee autocorrelation time (tau). The sampler will check for convergence every ‘check_interval’ steps, and will require a minimum number of checks (‘min_checks’) and a minimum tau factor (‘min_tau_factor’) before it can stop. The stopping criterion is met when the change in tau is less than ‘tau_tol’ for all parameters. By default None, which means no maximum. If a value is inputted, the nsteps parameter is ignored. The continue_sampling method in Result or Acid can still be used normally to continue sampling after either stopping criterion is reached.
check_interval (IntLike, optional) – Interval (in steps) at which to check for MCMC convergence if max_steps is set, by default 1000. Only used if max_steps is set.
min_checks (IntLike, optional) – Minimum number of checks before MCMC can be stopped, by default 1. Only used if max_steps is set.
min_tau_factor (IntLike, optional) – Minimum tau factor for the MCMC stopping criterion, by default 50, which is the emcee recommendation. It is not recommended to set a value below 50 unless you want to force convergence for the deterministic_profile=False option. Only used if max_steps is set.
tau_tol (float, optional) – Tolerance for tau convergence in MCMC stopping criterion, by default 0.1. Only used if max_steps is set.
moves (list[tuple], optional) –
A list of tuples specifying the moves for the MCMC sampler. The format tries to follow the emcee documentation as closely as possible. However, the config cannot store classes directly, so move names are used instead and converted when building the sampler.

Each tuple should have the form:
```
(move_name: str, fraction: float, move_kwargs: dict | None)
```
where:
- “move_name” is the name of the emcee move. Supported variants currently include “RedBlueMove”, “StretchMove”, “WalkMove”, “KDEMove”, “DEMove”, “DESnookerMove”, “MHMove”, and “GaussianMove”. Refer to the emcee documentation for more details on each move type. Input move names are checked against the “emcee.moves” module, so other moves from that module will work.
- “fraction” is the fraction of walkers to which this move should be applied.
- “move_kwargs” is an optional dictionary of keyword arguments passed to the move class initialisation.
run_mcmc (bool, optional) – If True, runs the MCMC to fit the model, by default True. Can be set to False to perform all of the preparation for MCMC without actually running it. The ACID function will still update the class and data attributes. If True, the method returns a Result object, and if False, the method returns None, but attributes are updated.
**kwargs (dict, optional) – Unused except to catch accidental inputs of initialisation arguments into the ACID method and warn if so.

Returns:

A Result object containing the LSD profiles and associated data. See the Result class for available methods and attributes.

If “run_mcmc” is False, “None” is returned, but the class attributes are still updated.

Return type:

Result | None

Raises:

BeartypeError – See Type Validations to understand input validation errors.
ValueError – If other input arguments do not conform to the expected formats and requirements.
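The input shapes described for wavelengths, flux, errors, and sn can be checked with a few lines of NumPy. This is a rough sketch under stated assumptions: `prepare_frames` is a hypothetical helper, and the relation error = flux / (S/N) is assumed here because the exact estimate ACID uses is not given on this page.

```python
import numpy as np


def prepare_frames(wavelengths, flux, errors=None, sn=None):
    """Promote inputs to (n_frames, n_pixels) and fill in missing errors."""
    wavelengths = np.atleast_2d(np.asarray(wavelengths, dtype=float))
    flux = np.atleast_2d(np.asarray(flux, dtype=float))
    if wavelengths.shape != flux.shape:
        raise ValueError("wavelengths and flux must have the same shape")
    if errors is None:
        if sn is None:
            raise ValueError("provide errors or sn")
        sn = np.atleast_1d(np.asarray(sn, dtype=float))
        if sn.shape != (flux.shape[0],):
            raise ValueError("sn needs one value per frame")
        errors = np.abs(flux) / sn[:, None]  # assumed relation: error = flux / (S/N)
    errors = np.atleast_2d(np.asarray(errors, dtype=float))
    if errors.shape != flux.shape:
        raise ValueError("errors and flux must have the same shape")
    return wavelengths, flux, errors


wl = np.linspace(5000.0, 5001.0, 5)
frames_wl, frames_flux, frames_err = prepare_frames([wl, wl], np.ones((2, 5)), sn=[100, 50])
print(frames_err[:, 0])  # one error level per frame: 0.01 and 0.02
```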

ACID_HARPS(*args, **kwargs)[source]: This method is no longer supported in ACID. Please use the ACID function with the appropriate inputs for HARPS spectra instead. Future versions of ACID will provide functions to load and configure data from a range of different standard instruments. If you still really wish to use ACID_HARPS, the last stable version of ACID with the method is 1.4.5. Try: pip install ACID_code_v2==1.4.5

Combines the multiple inputted spectral frames into one spectrum, or just passes through the single frame if only one was input. The frames are interpolated onto a common wavelength grid of the spectrum with the highest S/N, and then a weighted average is used based on the errors. The S/N of the combined spectrum is also calculated based on the input S/N and the weights.

Parameters:

frame_wavelengths (Array1D | Array2D, optional) – Wavelengths for the spectral frames, by default None
frame_flux (Array1D | Array2D, optional) – Fluxes for the spectral frames, by default None
frame_errors (Array1D | Array2D, optional) – Errors for the spectral frames, by default None
frame_sns (Array1D | Array2D, optional) – Signal-to-noise ratio for the spectral frames, by default None
output (bool, optional) – Whether to output the combined spectrum, by default True

Returns:

If output is True, a tuple containing:

combined_wavelengths (np.ndarray) – Wavelengths for the combined spectrum
combined_spectrum (np.ndarray) – Fluxes for the combined spectrum
combined_errors (np.ndarray) – Errors for the combined spectrum
combined_sn (float) – Signal-to-noise ratio for the combined spectrum

If output is False, None is returned, but the combined spectrum is still saved in the data class attributes.

Return type:

tuple | None
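The combination step can be sketched with NumPy as below. This is an illustration under assumptions rather than the method's actual code: `combine_frames` is a hypothetical function, "weighted average based on the errors" is taken to mean inverse-variance weights, errors are interpolated linearly for simplicity, and the combined S/N is not reproduced because its formula is not given here.

```python
import numpy as np


def combine_frames(frame_wavelengths, frame_flux, frame_errors, frame_sns):
    """Inverse-variance weighted mean on the grid of the highest-S/N frame."""
    reference = int(np.argmax(frame_sns))
    grid = frame_wavelengths[reference]
    flux_on_grid, err_on_grid = [], []
    for wl, fl, er in zip(frame_wavelengths, frame_flux, frame_errors):
        # Linear interpolation; wl must be increasing for np.interp.
        flux_on_grid.append(np.interp(grid, wl, fl))
        err_on_grid.append(np.interp(grid, wl, er))
    flux_on_grid = np.array(flux_on_grid)
    weights = 1.0 / np.array(err_on_grid) ** 2
    combined_flux = np.sum(weights * flux_on_grid, axis=0) / np.sum(weights, axis=0)
    combined_errors = np.sqrt(1.0 / np.sum(weights, axis=0))
    return grid, combined_flux, combined_errors


wl = np.linspace(5000.0, 5001.0, 11)
wavelengths = np.array([wl, wl + 0.01])
flux = np.array([np.full(11, 1.0), np.full(11, 1.2)])
errors = np.array([np.full(11, 0.1), np.full(11, 0.2)])
grid, combined_flux, combined_errors = combine_frames(wavelengths, flux, errors, [20.0, 10.0])
print(combined_flux[0], combined_errors[0])
```

With weights of 100 and 25, the combined flux is (100 × 1.0 + 25 × 1.2) / 125 = 1.04, so the frame with the smaller errors dominates.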

continuumfit(wavelengths: ACID_code.Array1D, fluxes: ACID_code.Array1D, errors: ACID_code.Array1D, poly_ord: ACID_code.IntLike = 3, plot_result: bool = False, plot_type: str = 'initial') → tuple[source]

Provides an initial, normalised continuum fit using inputted spectra.

Parameters:

wavelengths (np.ndarray) – The wavelengths corresponding to the spectrum.
fluxes (np.ndarray) – The flux values of the spectrum.
errors (np.ndarray) – The error values associated with the spectrum.
poly_ord (int) – The order of the polynomial to fit to the continuum. By default 3.
plot_result (bool, optional) – Whether to plot the continuum fit result, by default False.
plot_type (str, optional) – The type of plot to generate, either “initial” or “masked”, by default “initial”

Returns:

A tuple containing:
Polynomial coefficients: The coefficients of the fitted polynomial, ordered from highest degree to lowest.
Normalized flux: The flux values normalized by the fitted continuum.
Normalized errors: The error values normalized by the fitted continuum.

Return type:

tuple
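A continuum fit of this kind, using the bin_size, continuum_percentile, and poly_ord settings described for the ACID method, might look like the sketch below. `sketch_continuumfit` is a hypothetical function, not the continuumfit method itself. Centring the wavelengths before fitting is a choice made here for numerical conditioning, so the coefficients refer to the centred axis.

```python
import numpy as np


def sketch_continuumfit(wavelengths, fluxes, errors, poly_ord=3,
                        bin_size=100, percentile=90):
    """Binned-percentile polynomial continuum fit (illustrative only)."""
    n_bins = len(fluxes) // bin_size  # a trailing partial bin is dropped
    used = n_bins * bin_size
    bin_wl = np.median(wavelengths[:used].reshape(n_bins, bin_size), axis=1)
    bin_flux = np.percentile(fluxes[:used].reshape(n_bins, bin_size), percentile, axis=1)
    # Centre the wavelengths to keep the polynomial fit well conditioned.
    centre = wavelengths.mean()
    coeffs = np.polyfit(bin_wl - centre, bin_flux, poly_ord)  # highest degree first
    continuum = np.polyval(coeffs, wavelengths - centre)
    return coeffs, fluxes / continuum, errors / continuum


# Synthetic spectrum: a sloped continuum with one absorption line.
wavelengths = np.linspace(5000.0, 5010.0, 1000)
true_continuum = 2.0 + 0.05 * (wavelengths - 5005.0)
line = 1.0 - 0.5 * np.exp(-0.5 * ((wavelengths - 5004.0) / 0.05) ** 2)
fluxes = true_continuum * line
errors = np.full_like(fluxes, 0.02)

coeffs, norm_flux, norm_errors = sketch_continuumfit(wavelengths, fluxes, errors)
print(coeffs.shape, np.median(norm_flux))
```

Using a high percentile in each bin keeps the fit on the upper envelope of the spectrum, so absorption lines pull the continuum down far less than they would in a fit to every pixel.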

residual_mask() → None[source]: Masks regions of the spectrum based on residuals from an initial model fit. This method is intended for internal use within the class only, so it accepts no inputs. It is only used once in ACID and could be placed directly in the main method, but keeping it separate provides a clearer checkpoint at which the result of the mask is saved for analysis.
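The chunk-based masking controlled by pix_chunk and dev_perc can be sketched as follows. This is one plausible reading, not the method's code: `mask_bad_chunks` is a hypothetical function, and the deviation is assumed to be the residual relative to the model, expressed as a percentage.

```python
import numpy as np


def mask_bad_chunks(flux, model, pix_chunk=20, dev_perc=25):
    """Mask runs of at least pix_chunk pixels deviating by more than dev_perc %."""
    deviating = np.abs(flux - model) / np.abs(model) * 100.0 > dev_perc
    mask = np.zeros(flux.size, dtype=bool)
    run_start = None
    for i, bad in enumerate(np.append(deviating, False)):  # sentinel closes the last run
        if bad and run_start is None:
            run_start = i
        elif not bad and run_start is not None:
            if i - run_start >= pix_chunk:
                mask[run_start:i] = True
            run_start = None
    return mask


model = np.ones(200)
flux = model.copy()
flux[50:80] = 0.5    # 30 deviating pixels: long enough to be masked
flux[120:125] = 0.5  # 5 deviating pixels: too short, left unmasked
mask = mask_bad_chunks(flux, model)
print(mask.sum())  # 30
```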

get_initial_state() → ndarray[source]

run_mcmc(nsteps: ACID_code.IntLike, state=None) → None[source]: Runs MCMC for a specified number of steps. A purely class method that I do not recommend you use directly. Use Acid.ACID(run_mcmc=True) to run MCMC for the first pass if not already done, which will skip already performed calculations. Otherwise, use Acid.continue_sampling or Result.continue_sampling if you have already run MCMC and want to continue.

run_mcmc_until_converged(max_steps: ACID_code.IntLike, state=None) → None[source]: Runs MCMC until convergence is reached. A purely class method that I do not recommend you use directly. Use Acid.ACID(run_mcmc=True) to run MCMC for the first pass if not already done, which will skip already performed calculations. Otherwise, use Acid.continue_sampling or Result.continue_sampling if you have already run MCMC and want to continue.
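The stopping rule built from min_checks, min_tau_factor, and tau_tol can be sketched as below, loosely following the convergence check in the emcee autocorrelation tutorial. `should_stop` is a hypothetical function. The page does not say whether the change in tau is relative or absolute, so a relative change is assumed.

```python
import numpy as np


def should_stop(step, tau, old_tau, n_checks,
                min_checks=1, min_tau_factor=50, tau_tol=0.1):
    """Decide whether sampling may stop, under the assumptions stated above."""
    tau = np.asarray(tau, dtype=float)
    old_tau = np.asarray(old_tau, dtype=float)
    enough_checks = n_checks >= min_checks
    long_enough = np.all(step > min_tau_factor * tau)  # chain spans many tau
    stable = np.all(np.abs(tau - old_tau) / tau < tau_tol)  # assumed relative change
    return bool(enough_checks and long_enough and stable)


# Two parameters with hypothetical tau estimates from consecutive checks.
stop_late = should_stop(3000, tau=[40.0, 55.0], old_tau=[38.0, 52.0], n_checks=3)
stop_early = should_stop(2000, tau=[40.0, 55.0], old_tau=[38.0, 52.0], n_checks=2)
print(stop_late, stop_early)  # True False
```

At step 2000 the chain is shorter than 50 × 55 = 2750 steps, so the check fails even though tau has barely changed.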

continue_sampling(nsteps: IntLike | None = None, max_steps: IntLike | None = None, max_steps_kwargs: dict | None = None, parallel: bool = None, cores: int = None, moves: dict = None, return_sampler: bool = False) → EnsembleSampler | None[source]

Continue MCMC sampling for additional steps. Users should call this through the Result class. It requires a Data instance to have been passed to the Acid initialisation.

Parameters:

nsteps (IntLike, optional) – Number of additional steps to run the MCMC for.
max_steps (IntLike, optional) – Maximum number of steps to run the MCMC for in total (including previous steps). If specified, the MCMC will stop if this number of steps is reached even if convergence has not been reached, by default None. If input, nsteps is ignored.
max_steps_kwargs (dict, optional) – Additional keyword arguments to be passed to the run_mcmc_until_converged function if max_steps is specified, by default None. The kwargs description can be found in Acid.ACID(), they are the 4 kwargs appearing after max_steps. Typos for kwargs are silently ignored.
parallel (bool, optional) – Overwrites config with whether to run the MCMC in parallel. If None, uses already existing configuration. Default is None.
cores (int, optional) – Overwrites config with the number of cores to use for parallel MCMC. If None, uses already existing configuration. Default is None.
moves (dict, optional) – Overwrites config with the dictionary specifying the moves to use for MCMC sampling. If None, uses already existing configuration. Default is None. See Acid.ACID for the format.
return_sampler (bool, optional) – Whether to return the sampler after continuing sampling. Default is False.

Returns:

The MCMC sampler after running for the additional steps, or None if return_sampler is False.

Return type:

emcee.EnsembleSampler | None
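Because typos in max_steps_kwargs are silently ignored, it can help to check the dictionary before passing it in. The sketch below is a hypothetical pre-check, not part of ACID: `filter_max_steps_kwargs` is made up for this illustration, and the allowed names are the four parameters documented after max_steps in Acid.ACID.

```python
# The four convergence settings documented after max_steps in Acid.ACID.
ALLOWED = ("check_interval", "min_checks", "min_tau_factor", "tau_tol")


def filter_max_steps_kwargs(max_steps_kwargs):
    """Split a kwargs dict into recognised settings and ignored (likely mistyped) keys."""
    kwargs = max_steps_kwargs or {}
    kept = {k: v for k, v in kwargs.items() if k in ALLOWED}
    ignored = sorted(set(kwargs) - set(ALLOWED))
    return kept, ignored


kept, ignored = filter_max_steps_kwargs({"check_interval": 500, "tau_tool": 0.05})
print(kept, ignored)  # {'check_interval': 500} ['tau_tool']
```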

property result: Result

Return a Result object for this instance or one passed explicitly.

Returns: The Result object for the given Acid instance.
Return type: Result

property sampler: Returns the sampler stored in the Data class.