Curating flood extent data and leveraging citizen science for benchmarking machine learning solutions