Speaker: Anastasia Aidini (Postdoctoral Researcher)
Date: 16th of April
Location: FORTH
Paper Abstract:
Astronomical data is full of holes. While there are many reasons for this missing
data, the data can be randomly missing, caused by things like data corruptions or
unfavourable observing conditions. We test some simple data imputation methods
(Mean, Median, Minimum, Maximum and k-Nearest Neighbours (kNN)), as well
as two more complex methods (Multivariate Imputation by using Chained Equation
(MICE) and Generative Adversarial Imputation Network (GAIN)) against data
where increasing amounts are randomly set to missing. We then use the imputed
datasets to estimate the redshift of the galaxies, using the kNN and Random
Forest ML techniques. We find that the MICE algorithm provides the lowest Root
Mean Square Error and consequently the lowest prediction error, with the GAIN
algorithm the next best.
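
The evaluation loop described in the abstract (set a growing fraction of entries to missing at random, impute them, then score the imputed values against the known truth with RMSE) can be sketched as follows. This is a minimal illustration and not the paper's code: the data are synthetic correlated features rather than galaxy photometry, the function names are hypothetical, only the Mean, Median and a simple hand-written kNN imputer are shown, and MICE, GAIN and the redshift-estimation step are left out.

```python
import numpy as np


def mask_at_random(X, frac, rng):
    """Return a copy of X with roughly `frac` of its entries set to NaN, plus the mask."""
    X_missing = X.copy()
    mask = rng.random(X.shape) < frac
    X_missing[mask] = np.nan
    return X_missing, mask


def impute_column_stat(X_missing, stat):
    """Fill each column's NaNs with a column statistic such as np.nanmean."""
    filled = X_missing.copy()
    col_values = stat(X_missing, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_values[cols]
    return filled


def impute_knn(X_missing, k=5):
    """Fill NaNs with the mean of the k nearest rows that observe that column.

    Distances use only the columns observed in both rows (an assumption of
    this sketch); rows with no usable neighbour fall back to the column mean.
    """
    filled = X_missing.copy()
    observed = ~np.isnan(X_missing)
    col_means = np.nanmean(X_missing, axis=0)
    for i in np.where(~observed.all(axis=1))[0]:
        shared = observed & observed[i]
        diff = np.where(shared, X_missing - X_missing[i], 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.full(len(X_missing), np.inf)
        ok = n_shared > 0
        dist[ok] = np.sqrt((diff[ok] ** 2).sum(axis=1) / n_shared[ok])
        dist[i] = np.inf  # a row is not its own neighbour
        order = np.argsort(dist)
        for j in np.where(~observed[i])[0]:
            donors = [r for r in order if observed[r, j] and np.isfinite(dist[r])][:k]
            filled[i, j] = X_missing[donors, j].mean() if donors else col_means[j]
    return filled


def rmse_on_masked(X_true, X_filled, mask):
    """RMSE computed only over the entries that were set to missing."""
    return float(np.sqrt(np.mean((X_true[mask] - X_filled[mask]) ** 2)))


# Synthetic stand-in data: four features driven by one latent variable.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
X = latent @ np.array([[1.0, 0.8, 1.2, 0.5]]) + 0.1 * rng.normal(size=(300, 4))

results = {}
for frac in (0.1, 0.3):
    X_missing, mask = mask_at_random(X, frac, rng)
    results[frac] = {
        "mean": rmse_on_masked(X, impute_column_stat(X_missing, np.nanmean), mask),
        "median": rmse_on_masked(X, impute_column_stat(X_missing, np.nanmedian), mask),
        "kNN": rmse_on_masked(X, impute_knn(X_missing, k=5), mask),
    }
    print(f"missing fraction {frac}: {results[frac]}")
```

Scoring only the masked entries keeps the comparison fair, since the observed entries are reproduced exactly by every imputer. In the study itself the imputed datasets are additionally passed to kNN and Random Forest regressors to estimate redshift.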