Image Reduction Workflow

The package’s image reduction application contains the libraries and pipeline workflows needed to run aperture photometry on imaging datasets.

The application includes two core pipeline workflows, both in the `infrastructure/` subdirectory:

  • `aperture_pipeline.py`: Performs aperture photometry for a single dataset, consisting of multiple images obtained with the same class of imaging instrument and the same filter.

  • `reduction_manager.py`: Orchestrates the reduction of multiple imaging datasets in parallel.

Reducing a Single Dataset

Configuring a Single Dataset for Reduction

All images for a single dataset’s reduction are collected into a single directory, referred to as `red_dir`. In addition to the FITS image files, the directory must contain a copy of `microlensing_photometry/configuration/example_reduction_configuration.yaml`, renamed to `reduction_config.yaml`. The file must have exactly this name. It then needs to be edited to provide the parameters specific to that dataset.

target:
  name: 'TEST'
  RA: '17:30:25.5'      # Sexagesimal string
  Dec: '-25:30:30.5'    # Sexagesimal string
photometry:
  aperture_arcsec: 2.0
tom:
  upload: True
  config_file: /path/to/config.yaml
  data_label: 'TEST'
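To illustrate the sexagesimal coordinate strings used for `RA` and `Dec`, the following minimal sketch converts them to decimal degrees. It assumes the usual astronomical convention that RA is given in hours (`HH:MM:SS.s`) and Dec in degrees (`±DD:MM:SS.s`); the function name is hypothetical and is not taken from the package.

```python
def sexagesimal_to_degrees(value, is_ra=False):
    """Convert a colon-separated sexagesimal string to decimal degrees.

    Assumes RA strings are in hours and Dec strings are in degrees.
    """
    text = value.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    parts = [float(p) for p in text.lstrip("+-").split(":")]
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated fields, got {value!r}")
    first, minutes, seconds = parts
    magnitude = first + minutes / 60.0 + seconds / 3600.0
    # One hour of right ascension corresponds to 15 degrees
    return sign * magnitude * (15.0 if is_ra else 1.0)


ra_deg = sexagesimal_to_degrees("17:30:25.5", is_ra=True)
dec_deg = sexagesimal_to_degrees("-25:30:30.5")
```

In practice a coordinate library such as astropy can perform the same conversion; the explicit arithmetic is shown here only to make the expected string format clear.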

The `aperture_arcsec` parameter determines the radius of the aperture that will be used in the photometry.
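Since the aperture radius is specified in arcseconds, its size on the detector depends on the instrument's pixel scale. The short sketch below shows the conversion; the pixel scale of 0.4 arcsec/pixel is a made-up illustrative value, and the function name is hypothetical rather than part of the package.

```python
def aperture_radius_pixels(aperture_arcsec, pixel_scale_arcsec):
    """Return the aperture radius in pixels for a given pixel scale.

    pixel_scale_arcsec is the angular size of one pixel in arcseconds.
    """
    if aperture_arcsec <= 0 or pixel_scale_arcsec <= 0:
        raise ValueError("aperture and pixel scale must both be positive")
    return aperture_arcsec / pixel_scale_arcsec


# Illustrative only: 0.4 arcsec/pixel is an assumed pixel scale
radius_pix = aperture_radius_pixels(2.0, 0.4)
```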

The parameters in the `tom` dictionary control whether the timeseries photometry for the target object will be uploaded to a TOM system once the pipeline has completed its reduction. This function can be switched on or off via the `upload` parameter. The `config_file` parameter gives the full path to the user’s `tom_config.yaml` file. This should be a local copy of `/microlensing_photometry/image_reduction/configuration/example_tom_config.yaml`, updated to contain the URL of the TOM system and the user’s account details.
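Before running a reduction, it can be useful to confirm that a `red_dir` is laid out as described above. The following is a minimal sketch of such a check, not part of the package; the function name and the FITS file suffixes are assumptions chosen for illustration.

```python
from pathlib import Path

CONFIG_NAME = "reduction_config.yaml"  # file name required by the documentation


def check_red_dir(red_dir, fits_suffixes=(".fits", ".fits.fz")):
    """Return a list of problems found in a dataset directory (empty if none).

    The default FITS suffixes are an assumption; adjust them to your data.
    """
    red_dir = Path(red_dir)
    if not red_dir.is_dir():
        return [f"{red_dir} is not a directory"]
    problems = []
    if not (red_dir / CONFIG_NAME).is_file():
        problems.append(f"missing {CONFIG_NAME}")
    images = [
        p for p in red_dir.iterdir()
        if p.is_file() and p.name.endswith(tuple(fits_suffixes))
    ]
    if not images:
        problems.append("no FITS images found")
    return problems
```

An empty list indicates that the directory contains both the configuration file and at least one image.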

Running the Reduction

With the configuration file in place, the dataset can be reduced by passing the full path to the configuration file to the `aperture_pipeline.py` workflow.

Note that the Prefect server must be running before starting the reduction workflow (see Prefect Workflows for details).

venv> cd microlensing_photometry/
venv> poetry run python image_reduction/infrastructure/aperture_pipeline.py <path_to_dataset_dir>

The progress of the pipeline can be monitored from the Prefect dashboard as it runs.

The pipeline also writes detailed logging output for each stage to the `red_dir` in a file called `aperture_pipeline.log`.

Dataset Locks

To avoid accidentally starting multiple reductions of the same dataset, the `aperture_pipeline.py` workflow automatically locks an active `red_dir` until it has finished processing.
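The documentation does not describe how the lock is implemented. As a general illustration of the idea, the sketch below uses a lock file created atomically inside the dataset directory. The lock file name, the exception class and the function name are all hypothetical and may differ from what the package actually does.

```python
import os
from contextlib import contextmanager
from pathlib import Path

LOCK_NAME = "dataset.lock"  # hypothetical file name, not taken from the package


class DatasetLockedError(RuntimeError):
    """Raised when a dataset directory is already locked."""


@contextmanager
def dataset_lock(red_dir):
    """Hold an exclusive lock on red_dir for the duration of a with-block."""
    lock_path = Path(red_dir) / LOCK_NAME
    try:
        # O_EXCL makes creation fail if the file already exists, so the
        # existence check and the creation happen as a single atomic step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise DatasetLockedError(f"{red_dir} is locked") from None
    try:
        # Record the owning process ID to help diagnose stale locks
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield lock_path
    finally:
        # Release the lock whether the reduction succeeded or failed
        lock_path.unlink(missing_ok=True)
```

A reduction would then be wrapped as `with dataset_lock(red_dir): ...`, so that a second attempt on the same directory fails immediately instead of running concurrently.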

Reducing Multiple Datasets

For larger reduction tasks, `reduction_manager.py` provides a convenient way to parallelize the processing of multiple datasets.

This process assumes that each dataset already has a custom reduction configuration file in its `red_dir`, since in normal operations this file is created by the `data_download.py` pipeline. If it is not present, each dataset should be configured as described for a single dataset reduction.

Configuring Multiple Datasets for Reduction

The template reduction manager pipeline configuration should be copied to the user’s local configuration directory:

cp ./microlensing_photometry/image_reduction/configuration/example_reduction_manager_configuration.yaml <root_path>/<data_reduction_dir>/<config_dir>/reduction_manager_config.yaml

The parameters can then be configured as follows:

log_dir: '/path/to/logging/directory/'
data_reduction_dir: '/path/to/top-level/reduction/directory/'
software_dir: '/path/to/software/installation/directory/'
instrument_list: ['sinistro', 'qhy']
dataset_selection:
  group: 'file'
  file: '/path/to/datasets/file/reduce_datasets.txt'
  start_date: 'None'
  end_date: 'None'
  ndays: 0
max_parallel: 5

The directory path parameters should be the full path strings to the data directories, as described in Data Directory Structure. The software directory path should point to the top level of the package’s own installation, i.e. `<path>/microlensing_photometry/`.

Entries in the `instrument_list` parameter will be used to identify the subdirectories of each target’s directory that should be searched for data to process. That is, if the list includes `sinistro`, the pipeline will scan all target directories with `target/sinistro/` subdirectories for datasets to process.
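The directory scan described above can be sketched as follows. This is an illustration rather than the package's own code: the function name is hypothetical, and it assumes that each dataset is a subdirectory of an instrument directory, following the `<target>/<instrument>/<dataset>` layout.

```python
from pathlib import Path


def find_instrument_datasets(data_reduction_dir, instrument_list):
    """Return dataset directories found under <target>/<instrument>/."""
    root = Path(data_reduction_dir)
    found = []
    # Each top-level directory is treated as one target
    for target_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for instrument in instrument_list:
            inst_dir = target_dir / instrument
            if inst_dir.is_dir():
                # Assumption: every subdirectory here is one dataset
                found.extend(sorted(p for p in inst_dir.iterdir() if p.is_dir()))
    return found
```

Target directories without a matching instrument subdirectory are simply skipped.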

The pipeline allows the user to restrict the number of parallel reductions that can be triggered at any one time using the `max_parallel` parameter.
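The effect of a concurrency cap such as `max_parallel` can be illustrated with a bounded worker pool from the Python standard library. This is a generic sketch and not the mechanism used by the package, which runs its workflows through Prefect; `reduce_all` and `reduce_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def reduce_all(datasets, reduce_one, max_parallel=5):
    """Run reduce_one on every dataset, at most max_parallel at a time.

    Returns a dict mapping each dataset to its result, or to the
    exception raised while reducing it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(reduce_one, ds): ds for ds in datasets}
        for future in as_completed(futures):
            ds = futures[future]
            try:
                results[ds] = future.result()
            except Exception as exc:
                # Keep going so one failed dataset does not stop the rest
                results[ds] = exc
    return results
```

All datasets are queued up front, but the pool never runs more than `max_parallel` reductions concurrently.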

The reduction manager can be configured to process different groups of data using the `dataset_selection` dictionary.

`reduction_manager.py` respects dataset locks and will not duplicate a reduction under any configuration, so locked datasets are never processed. The following options are supported.

  • `group: 'all'`

    The pipeline will scan all available targets for unlocked datasets to process. The other `dataset_selection` configuration parameters will be ignored.

  • `group: 'date'`

    The pipeline will scan all available targets for unlocked datasets containing data obtained between the configured `start_date` and `end_date`. The format of both date strings should be `'%Y-%m-%d'`.

  • `group: 'recent'`

    The pipeline will scan all available targets for unlocked datasets with data obtained within `ndays` of the current date, where `ndays` can be an integer or float. This is used to set appropriate `start_date` and `end_date` parameters, overriding any given in the configuration.

  • `group: 'file'`

    The pipeline will load a file containing the paths of specific datasets to be processed. Note that it will still exclude any locked datasets. The `file` parameter should give the full path to this file, which should be an ASCII text file containing a list of paths relative to the `data_reduction_dir`, e.g.

OGLE-2024-BLG-0034/sinistro/gp
OGLE-2024-BLG-0034/sinistro/ip
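The selection options above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the package's implementation: both function names are hypothetical, the selection dictionary is assumed to have already been loaded from the YAML configuration, and blank lines in the dataset file are skipped as a convenience.

```python
from datetime import datetime, timedelta
from pathlib import Path

DATE_FORMAT = "%Y-%m-%d"


def resolve_date_window(selection, now=None):
    """Return (start, end) datetimes for 'date' or 'recent', else None."""
    group = selection["group"]
    if group == "date":
        start = datetime.strptime(selection["start_date"], DATE_FORMAT)
        end = datetime.strptime(selection["end_date"], DATE_FORMAT)
    elif group == "recent":
        # ndays may be an integer or a float, counted back from now
        end = now or datetime.now()
        start = end - timedelta(days=float(selection["ndays"]))
    else:
        # 'all' and 'file' do not restrict by date
        return None
    if start > end:
        raise ValueError("start_date must not be later than end_date")
    return start, end


def read_dataset_file(list_file, data_reduction_dir):
    """Return full dataset paths from a text file of relative paths."""
    root = Path(data_reduction_dir)
    entries = (line.strip() for line in Path(list_file).read_text().splitlines())
    return [root / entry for entry in entries if entry]
```

For example, `resolve_date_window({'group': 'recent', 'ndays': 1.5})` returns a window covering the last 36 hours, while a `'file'` selection would be passed to `read_dataset_file` instead.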

Running Multiple Reductions

The workflow can then be used to trigger parallelized pipeline runs by passing the full path to the configuration file to `reduction_manager.py`. As before, the Prefect server must be running before starting the reduction manager workflow (see Prefect Workflows for details).

venv> cd microlensing_photometry/
venv> poetry run python image_reduction/infrastructure/reduction_manager.py <path_to_reduction_manager_config.yaml>

The progress of the pipeline can be monitored from the Prefect dashboard as it runs.

The workflow also writes detailed logging output for each stage to the pipeline’s `log_dir` in a file called `arcon_pipeline.log`.