2.6.2. Data Assimilation
This page gives a summary of the functions available in the data assimilation library. Differentiation of persistence diagrams is exploited to optimize data driven model coefficients by minimizing topological differences between model the model forecast and measurements. More information on the details of the TADA algorithm can be found in, “Topological Approach for Data Assimilation.” We plan to implement more data assimilation tools here in the future.
Warning
TADA requires tensorflow for optimization features. Please install teaspoon using the command: pip install “teaspoon[full]” to install the necessary packages.
- teaspoon.DAF.data_assimilation.TADA(u_obs, window_size, model_parameters, n_epochs=1, train_len=4000, opt_params=[1e-05, 0.99], window_number=1, j2_sample_size=50)[source]
Compute optimal model weights using data assimilation and persistent homology. Heavily modified code from https://github.com/GUDHI/TDA-tutorial.git
- Parameters:
u_obs (array) – Array of observations (D x N) D is the dimension, N is the current number of time points (INCLUDING TRAINING DATA)
window_size (int) – Number of points included in the sliding window
model_parameters (list) – List of parameters used to generate a forecast. Must contain current model weights as a list tensorflow variables (W_A), original model weights (W0), list of model specific internal parameters (mu), forecast model function (G) and number of past points to use for predictions (p).
n_epochs (int) – Number of optimization epochs for each assimilation window
train_len (int) – Number of points used for training the original model
opt_params (list) – List of parameters used for gradient descent optimization. Must contain a learning rate and decay rate. Decay only occurs between assimilation windows.
window_number (int) – Current window number. Used to determine how many points to forecast into the future for the current window.
j2_sample_size (int) – Number of points to sample for J_2 loss function. This is a random sample of the training data.
- Returns:
Updated model weights using TADA algorithm.
- Return type:
(array)
- teaspoon.DAF.data_assimilation.get_initial_pds(X)[source]
Function to compute the initial persistence diagrams of the measurements and model forecast using the Vietoris Rips complex.
- Parameters:
X (array) – Point cloud array for computing persistence.
- Returns:
0D and 1D persistence pairs.
- Return type:
(2 arrays)
- teaspoon.DAF.data_assimilation.rafda(u_obs, Dr, Gamma, M=1000, g=1000, w=0.005, b=4.0, beta=4e-05, seed=None)[source]
Function for generating a RAFDA model based on the method presented in https://doi.org/10.1016/j.physd.2021.132911.
- Parameters:
u_obs (array) – Array of observations (D x N) D is the dimension, N is the number of training points.
Dr (int) – Reservoir dimension
Gamma (array) – Observational covariance matrix.
M (int) – Ensemble size.
g (float) – Initial random weight ensemble distribution parameter.
w (float) – Random feature weight matrix distribution width parameter.
b (float) – Random feature bias vector distribution parameter.
beta (float) – Ridge regression regularization parameter.
seed (int) – Random seed (optional)
- Returns:
Optimal RAFDA model weights. W_in (array): Random weight matrix. b_in (array): Random bias vector.
- Return type:
W_RAFDA (array)
Random Feature Map Example:
import numpy as np
from teaspoon.MakeData.DynSysLib.autonomous_dissipative_flows import lorenz
from teaspoon.DAF.data_assimilation import TADA
from teaspoon.DAF.forecasting import forecast_time
from teaspoon.DAF.forecasting import random_feature_map_model
from teaspoon.DAF.forecasting import get_forecast
from teaspoon.DAF.forecasting import G_rfm
import tensorflow as tf
# Set random seed
r_seed = 48824
np.random.seed(r_seed)
# Set TADA parameters
snr = 100.0
lr = 1e-6
train_len = 4000
forecast_len = 2000
window_size = 50
max_window_number = 100
# Get training and validation data at random initial condition
ics = np.random.uniform(0, 1, size=(3,))
t, ts = lorenz(L=500, fs=50, SampleSize=6001, parameters=[28,10.0,8.0/3.0], InitialConditions=ics)
ts = np.array(ts)
# Get signal and noise amplitudes using signal-to-noise ratio
a_sig = np.sqrt(np.mean(np.square(ts),axis=1))
a_noise = a_sig * 10**(-snr/20)
# Add noise to the signal
Gamma = np.diag(a_noise)
noise = np.random.normal(size=np.shape(ts[:,0:train_len+forecast_len]))
u_obs = ts[:,0:train_len+forecast_len] + Gamma@noise
# Train model
W_LR, W_in, b_in = random_feature_map_model(u_obs[:,0:train_len],Dr=300, seed=r_seed)
# Set optimization parameters
d_rate = 0.99
opt_params = [lr, d_rate]
W_opt = [tf.Variable(W_LR, trainable=True, dtype=tf.float64)]
mu = (W_in, b_in)
p=1
# TADA optimization loop
for window_number in range(1,max_window_number):
model_parameters = [W_opt, W_LR, mu, G_rfm, p]
W_opt = TADA(u_obs, window_size, model_parameters, train_len=train_len, n_epochs=1, opt_params=opt_params, window_number=window_number, j2_sample_size=100)
# Set forecast parameters
window_number = 1000
start = train_len
end = train_len + window_number + 1
# Forecast TADA and LR models and get measurements
X_model_tada = get_forecast(u_obs[:,train_len].reshape(-1,1), W_opt, mu,forecast_len=end-train_len, G=G_rfm)
X_model_lr =get_forecast(u_obs[:,train_len].reshape(-1,1), W_LR, mu,forecast_len=end-train_len, G=G_rfm)
X_meas = u_obs[:,start:end]
# Compute and print forecast times
tada_time = forecast_time(X_model_tada, X_meas, dt=0.02, lambda_max=0.91, threshold=0.05)
lr_time = forecast_time(X_model_lr, X_meas, dt=0.02, lambda_max=0.91, threshold=0.05)
print(f"TADA Forecast Time: {tada_time}")
print(f"LR Forecast Time: {lr_time}")
LSTM Example:
import numpy as np
from teaspoon.MakeData.DynSysLib.autonomous_dissipative_flows import lorenz
from teaspoon.DAF.forecasting import lstm_model
from teaspoon.DAF.forecasting import get_forecast
from teaspoon.DAF.forecasting import G_lstm
from sklearn.preprocessing import StandardScaler
from teaspoon.DAF.data_assimilation import TADA
from teaspoon.DAF.forecasting import forecast_time
import tensorflow as tf
r_seed = 48824
np.random.seed(r_seed)
# Set TADA parameters
snr = 60.0
lr = 1e-5
train_len = 4000
forecast_len = 2000
window_size = 50
max_window_number = 10
lr_time = 0
tada_time = 0
# Get training and tesing data at random initial condition
ICs = list(np.random.normal(size=(3,1)).reshape(-1,))
t, ts = lorenz(L=500, fs=50, SampleSize=6001, parameters=[28,10.0,8.0/3.0],InitialConditions=ICs)
ts = np.array(ts)
# Add noise to signals
noise = np.random.normal(scale=0.01, size=np.shape(ts[:,0:train_len+forecast_len]))
u_obs = ts[:,0:train_len+forecast_len] + noise
# Scale u_obs using StandardScaler
scaler = StandardScaler()
u_obs = scaler.fit_transform(u_obs.T).T
# Train model
lstm_units = 500
p=5
model = lstm_model(u_obs[:,0:train_len], p=p, epochs=5, units=lstm_units, batch_size=50)
# Set optimization parameters
d_rate = 0.99
opt_params = [lr, d_rate]
mu = (lstm_units, p, model)
W_lstm = model.get_weights()
W_opt = model.trainable_weights
# Set forecast parameters
window_number = 200
start = train_len
end = train_len + window_number
# Get initial forecast
X_model_initial = get_forecast(u_obs[:,train_len-p+1:train_len+1], W_lstm, mu=(lstm_units, p, model),forecast_len=end-train_len, G=G_lstm)
# TADA optimization loop
for window_number in range(1,max_window_number):
mu = (lstm_units, p, model)
model_parameters = [W_opt, W_lstm, mu, G_lstm, p]
W_opt = TADA(u_obs, window_size, model_parameters, train_len=train_len, n_epochs=1, opt_params=opt_params, window_number=window_number, j2_sample_size=100)
# Forecast original TADA model and get measurements
X_model_tada = get_forecast(u_obs[:,train_len-p+1:train_len+1], W_opt, mu=(lstm_units, p, model),forecast_len=end-train_len, G=G_lstm)
X_meas = u_obs[:,start:end]
# Invert the scale on X_model and X_meas
X_model_tada = scaler.inverse_transform(X_model_tada.T).T
X_model_initial = scaler.inverse_transform(X_model_initial.T).T
X_meas = scaler.inverse_transform(X_meas.T).T
# Compute and print forecast times
tada_time = forecast_time(X_model_tada, X_meas, dt=0.02, lambda_max=0.91, threshold=0.05)
initial_time = forecast_time(X_model_initial, X_meas, dt=0.02, lambda_max=0.91, threshold=0.05)
print(f"TADA Forecast Time: {tada_time}")
print(f"Initial Forecast Time: {initial_time}")
Note
Resulting forecast times may vary depending on the operating system.