Tutorial DEVINE the precomputed method 1
There are two ways to use DEVINE to downscale wind fields on entire maps. The first one is the “patch method”.
It is used in the article Emulating the Adaptation of Wind Fields to Complex Terrain with Deep Learning
https://journals.ametsoc.org/view/journals/aies/2/1/AIES-D-22-0034.1.xml
The patch method looks like this:
This method produces “chess board like patterns”, aka squared patches in predictions that are detrimental for snow applications.
This is why I developed an alternative method called the “pre computed method”, described in my PhD manuscript and implemented here.
From my PhD:
We came up with a different approach to DEVINE called the ”precomputed method” because it brings many advantages from DEVINE’s features. Firslty, we note that operations in DEVINE’s CNN only depend on the NWP direction, since the NWP speed is only used after calling the CNN, in the scaling step. Secondly, operations in a fully convolutional neural network are independent from the size of the input. In other word, it is possible to vary the size of the input topography in DEVINE’s CNN. Thus, the alternative implementation of DEVINE uses the following approach: (i) a topographic patch larger than the domain size is used as input to the CNN and (ii) the CNN precomputes each acceleration factors and deflection angles for each angular direction (0 to 360 with a step of 1). This is done by rotating the input with a rotation angle γ for γ ∈ [1 ... 360] and applying DEVINE’s CNN for each rotated input. This operation yields acceleration ratios and angular deflections for each incoming large scale direction on the whole input domain, that are saved on memory.
In this tutorial in two stages, we will see the implementation of the precomputed method.
In the first stage, we will see the implementation of the text below.
We came up with a different approach to DEVINE called the ”precomputed method” because it brings many advantages from DEVINE’s features. Firslty, we note that operations in DEVINE’s CNN only depend on the NWP direction, since the NWP speed is only used after calling the CNN, in the scaling step. Secondly, oper- ations in a fully convolutional neural network are independent from the size of the input. In other word, it is possible to vary the size of the input topography in DEVINE’s CNN. Thus, the alternative implementation of DEVINE uses the following approach: (i) a topographic patch larger than the domain size is used as input to the CNN and (ii) the CNN precomputes each acceleration factors and deflection angles for each angular direction (0 to 360 with a step of 1). This is done by rotating the input with a rotation angle γ for γ ∈ [1 ... 360] and ap- plying DEVINE’s CNN for each rotated input. This operation yields acceleration ratios and angular deflections for each incoming large scale direction on the whole input domain, that are saved on memory.
In the second stage, we will see:
Independently from this operation, NWP large scale horizontal wind components are linearly interpolated onto the target high resolution topographic map. Finally, for each large scale interpolated direction, a lookup function looks at the precomputed acceleration factor and angular deflection in the pre-saved maps obtained with the CNN. Then, scaling is performed as in the ”patch method” and downscaled winds are obtained.
Both tutorials permit to downscale wind maps from AROME (or any other NWP). As an example, we will downscale wind fields on the Grandes Rousses domain (domain test from Ange Haddjeri and Matthieu Baron PhDs)
# Librairies for DEVINE
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
First, you need to download and use my librairy “bias_correction”
So far, I do not use pip install (as I should), but prefer the sys.path.append method.
The library is available on github: https://github.com/louisletoumelin/bias_correction
Use the command “git clone https://github.com/louisletoumelin/bias_correction” in your terminal
# Path to module
import sys
sys.path.append("/home/letoumelinl/bias_correction/src/")
sys.path.append("//home/mrmn/letoumelinl/bias_correction/src/")
# Standard librairies
from typing import Union
import gzip
import gc
import os
# Import from my module
from bias_correction.config.config_custom_devine import config
from bias_correction.train.model import CustomModel
from bias_correction.train.dataloader import CustomDataHandler
from bias_correction.utils_bc.context_manager import timer_context
from bias_correction.train.experience_manager import ExperienceManager
from bias_correction.train.wind_utils import wind2comp, comp2speed, comp2dir
Configuration
The configuration file is initiated in bias_correction.config.config_custom_devine file. However, few other important parameters are specified in the cell below.
# We ensure that arctan scaling is not used in the first step of the precomputed method
# see the publication for arctan scaling
config["use_scaling"] = False
# The outputs of DEVINE will be a map for speed and a map for angular deviations ("alpha")
config["type_of_output"] = "map_speed_alpha"
# Batch size control how many maps are sent to the GPU at once for prediction
batch_size = 1
# The name of the file containing acceleration and deviations saved on disk
# You can choose the name you want.
result_name = "tutorial_pre_computed_method"
# Are you working on the labia network?
labia = False
# See next section
if labia:
large_dem_path= "/scratch/mrmn/letoumelinl/bias_correction/Data/1_Raw/DEM/domaine_grande_rousse_large.nc"
small_dem_path = "/scratch/mrmn/letoumelinl/bias_correction/Data/1_Raw/DEM/domaine_grande_rousse_small.nc"
else:
large_dem_path = "/home/letoumelinl/bias_correction/Data/1_Raw/DEM/domaine_grande_rousse_large.nc"
small_dem_path = "/home/letoumelinl/bias_correction/Data/1_Raw/DEM/domaine_grande_rousse_small.nc"
# If you want to plot intermediate results
_plot = True
# Where to save CNN predictions
if labia:
save_path = f"/scratch/mrmn/letoumelinl/bias_correction/Data/1_Raw/DEM/{result_name}.nc"
else:
save_path = f"{result_name}.nc"
Predict with DEVINE
This task consists of predicting acceleration and deviation factors with DEVINE on a given domain. For large domain (e.g. the Grandes Rousses domain), I recommend to use a GPU.
Load the large dem
In order to obtain downscaled wind fields on a given domain, we will need:
- The target domain “small_dem”
This domain can be characterized by any grid spacing larger than 30m. It should be given by a digital elevation model. For example, the Grandes Rousses domain at 250m is an example of target domain. A DEM on the same domain with 30m grid spacing also works. In other words, this is the DEM “given by your clients”, where they expect downscaled winds.
This will be referred to as small_dem.
- The working domain “large_dem”
The working domain is used to perform computations. It comes from a DEM at 30m horizontal grid spacing (important). If you don’t have any, 25m or close to 30m will work, but it’s not the perfect setup. This DEM is used to make downscaling computations.
This DEM should be larger than the smaller DEM. My calculations indicate that twice the size of the original DEM is enough. I recommend to use three times the dimensions, because when cropping data, we always have issues and keeping a margin is easy and works well.
We refer to this DEM as the large_dem.
# Load the DEM. In my case, the large_dem is saved as xarray DataArray.
# It can be also saved in other formats: .tiff, xarray Datasets ... etc
large_dem = xr.open_dataarray(large_dem_path)
# Note 1: Try except is used here because depending on how you save the dem into .nc file,
# elevation values can have different names. This operation will be removed and replaced by a cleaner dataloader in the future.
# Note 2: np.expand_dims permits to add another dimension, which is required in Tensorflow ("channels" in the last dimension)
try:
dem_array = np.expand_dims(large_dem.isel(band=0).values, axis=-1)
except AttributeError:
dem_array = np.expand_dims(large_dem.isel(band=0).band_data.values, axis=-1)
# The small dem is the target grid.
small_dem = xr.open_dataarray(small_dem_path)
Illustration
print("np.shape(dem_array): ", np.shape(dem_array))
print("np.shape(small_dem): ", np.shape(small_dem))
plt.figure()
plt.subplot(121)
plt.imshow(dem_array)
plt.title("dem_array (large_dem) 30m")
plt.subplot(122)
plt.imshow(small_dem)
plt.title("small_dem = target dem")
Create input data
We will build the input data so it fit tensorflow data pipelines (important to obtained good computing performances).
For my paper “A two-folds deep learning strategy to correct and downscale winds over mountains”, I built a pipeline that takes topography and wind at given observation stations and downscales it around the station.
This is different from the input data necessary for predicting on a large domain.
Usually, topographies are stored in a dictionary and wind conditions in a pandas DataFrame. Here we will build input with the same philosophy.
For the pre computed method, we downscale wind fields on the domain for each direction. The input speed is fixed at 3 m/s (as in the training step), and the scaling (transfer the 3 m/s to the real speed given by AROME or any other NWP) will be performed later.
# Create topo_dict
dict_topo_custom = {"custom": {"data": dem_array}}
# Create time_series
custom_time_series = pd.DataFrame()
# Speed = [3, 3, 3, ...] and direction = [0, 1, 2, ...]
custom_time_series["Wind"] = [3]*360
custom_time_series["Wind_DIR"] = list(range(360))
# Order of 'Wind' and 'Wind_DIR' in dataset is important
assert custom_time_series.columns[0] == "Wind"
assert custom_time_series.columns[1] == "Wind_DIR"
Illustration
custom_time_series
# Ignore this technical step.
# If you really want to know why: my code uses an ExperienceManager that is responsible for tracking my experiments, storing the results, saving the weights...etc. Here we don't need it but we need to let the code know that we don't need it
exp = None
Instantiate a dataloader
The dataloader is responsible for loading data (not in this tutorial), storing it and sending it to the GPU.
The dataloader understands builds upon tensorflow dataloaders (but can do many more tasks that are specific to my studies).
data_loader = CustomDataHandler(config)
Instantiate a custom model
cm = CustomModel(exp, config)
Compute matrix shapes
We are sure that given any rotation, we will find finite values in an array with min_shape, centered on the initial array.
This is the shape used to make predictions with the CNN. This is also the shape of the maps in the CNN outputs.
Rotations of the predictions will then produce nans again: these nans might be present in the outputs but will be dealt with in the second tutorial.
np.intp(np.min(np.squeeze(dem_array).shape) / np.sqrt(2))
min_shape = np.intp(np.min(np.squeeze(dem_array).shape) / np.sqrt(2)) + 1
output_shape = list(tuple([2]) + tuple([len(custom_time_series)]) + tuple([min_shape, min_shape]))
print(f"dem_array.shape: {dem_array.shape}")
print("\n")
print(f"min shape: {min_shape}")
print("\n")
print(f"output_shape: [wind component, wind direction, x, y]")
print(f"output_shape: {output_shape}")
# list is important because tuples are immutable and we require mutation in the computations
config["custom_input_shape"] = list(dem_array.shape)
Preprocess data
Just call the prepare_custom_devine_data method to preprocess data, it knows what to do.
data_loader.prepare_custom_devine_data(custom_time_series, dict_topo_custom)
Inputs are tensorflow datasets. This is a custom format to fit with the deep learning models. Use the method get_tf_zipped_inputs to obtain these inputs.
Bacth indicates the batch_size and defines how many inputs are sent to the GPU. The larger, the quicker the computations (this is particularly true on GPUs, less impacting on CPUs). However, larger batches increase memory consumption. On my personal laptop, I use a batch size of one.
inputs = data_loader.get_tf_zipped_inputs(mode="custom", output_shapes=config["custom_input_shape"]).batch(batch_size)
Predict
# Predict
with timer_context("Predict test set"):
results = cm.predict_multiple_batches(inputs, batch_size=batch_size, output_shape=output_shape, force_build=True)
print(results.shape)
Predictions are launched. Here we see that we need 20 seconds for one batch (“20s 20s/step”). That means it takes approximately 2h to downscale (on a CPU) the Grandes Rousses domain with DEVINE’s CNN (more precisely, to compute the acceleration factors and angular deviations).
If you use a GPU, it will be much faster (few minutes on the GPU from the LabIA).
Now a few more steps are required before saving the file
Postprocessing steps (mandatory)
Downscaled winds are in the “results” variable. The first dimension of the matrix indicates the variable (speed or direction). 0 is the dimension for speed predictions and 1 for angular deviation predictions.
Normalize
Speed predictions are valid for an input speed of 3 m/s. We need to divide the results by three in order to obtain acceleration factors.
with timer_context("results[0, :, :, :] = results[0, :, :, :]/3"):
results[0, :, :, :] = results[0, :, :, :] / 3
Downscast
I suggest to downscast results to 16 bits in order to save space on disk and on RAM.
Accelerations are between 0 and 10. Angular deviations are between -180° and +180°. Decimals are not crucial. Hence 16 bits is enough to save this type of result.
with timer_context("Change dtype"):
results[0, :, :, :] = np.float16(results[0, :, :, :])
results[1, :, :, :] = np.float16(results[1, :, :, :])
Save results
encoding is used to reduce the size of the file on disk. Experimentally, zlib=True, complevel=3 give good compression results.
Data are saved in netcdf format
with timer_context("Save results"):
ds = xr.Dataset(
data_vars={"acceleration": (("angle", "y", "x"), results[0, :, :, :]),
"alpha": (("angle", "y", "x"), results[1, :, :, :])})
comp = dict(zlib=True, complevel=3)
encoding = {"acceleration": {"zlib": True, "complevel": 3},
"alpha": {"zlib": True, "complevel": 3}}
ds.to_netcdf(save_path, encoding=encoding)