HMC by Hand: Gradients, Heartbeats, PDEs and Other Tales
Creators & Contributors
When we exercise, our hearts run faster, which is associated with a decrease in the time between heartbeats. This time between heartbeats is also known by R-R intervals. If you have ever seen an electrocardiogram (EKG) curve before, each time your heart pumps you'll see a small spike in your EKG; well, those are your R-R intervals (RRi for short). So, you'll imagine that this RRi decreases during exercise (i.e., your heart run faster) but when you are resting your RRi increases (i.e., this is your heart slowing down). This is, in very simple terms, the overall dynamics of your RRi: How ever there's more to it than just saying that your heart run faster or slower. The curve describing its fluctuations can be described using the Castillo-Aguilar et al. (2025) RRi-vs-Time model, that describe the exercise-induced RRi dynamics. Here's an example RRi curve estimated using the aforementioned model: As you can see, the model parameters (e.g., decay rates, time constants) controlling the observed curve are numerous. So, as you can imagine, trying to estimate each parameter value to reproduce the observed curve, considering that the inter-individual cardiovascular response is highly heterogeneous, is for challenging (for say the least). In this context, many methods exist to estimate model parameters for a given target function. However, one method above all the others allow us to quantify the uncertainty associated with each of these curve knobs. Yes, Bayesian models are what we are discussing here, but not any type of Bayesian algorithm. We are talking about Hamiltonian Monte Carlo (HMC). Let's remember that HMC uses the analogy of a particle gliding through the parameter space, visiting more frequently the areas associated with greater probability, depicting an accurate representation of the true probability distribution of model parameters. In a previous post we've discussed how to implement this very same model (i.e., the RRi-vs-time model) using HMC in R. However, we did not delved into the methodology to estimate the gradient that will guide our particle over this parameter space. In HMC, the position The probability of a set of parameters is given by the likelihood function, which determines the amount of probability associated with a specific position in this space. So in order to simulate the movement of this particle through this parameter space, we need to estimate the slope that will move this particle using Hamiltonian mechanics principles. Hamiltonian mechanics provides a physics-inspired framework to explore the parameter space. Let's briefly recall its mathematical foundation Hamilton's equations describe a system using position ( This is denoted in the classical form of the Hamiltonian: And the set of differential equations that govern the overall dynamics of the system. Here, we say that the rate of change in the position ( This means that we can use the momentum parameter However, we use the gradient information (i.e., the slope of the parameter space) of the current position Nonetheless, the challenge arrives when we need to estimate the gradient of the surface we are located, which is the gradient of the likelihood function, which usually involves at least a vector of partial derivatives of every single parameter of our target function, updating the momentum that then will update the position, where we will again estimate the gradient to update the momentum and so on. To reflect this principle, the notation also can be imagined as: Many softwares and statistical packages compute the gradient of the target likelihood function numerically or even symbolically. However, manually deriving gradients builds intuition for why HMC outperforms gradient-free MCMC methods (e.g., Metropolis-Hastings) in complex models, avoiding computational bottlenecks like random-walk behavior. Even though you might think this is a monumental work (which in fact, it is), it's also a very educational journey that will give us full intuition of what is actually happening when we call Stan or your favorite Bayesian sample software (that uses HMC behind scenes). So buckle up! And let's go for a very math and nerdy ride! Warning: the following sections are very technical and dense in terms of notation, which might be hard to follow. I'll do my best to explain each step of the way the rationale and the manual derivations steps so you can follow along. Let's assume that the observed RRi signal ( However, we assume that this true signal So let However, we need to estimate the parameters of the data generation process ( Remember that we assumed that the data generation process came from a Gaussian distribution. Thus, lets consider that the Gaussian probability function has this form: So, the likelihood for a set of data points would have the following form: But for computational efficiency and stability, we work with logarithms (given the very small probabilities we are computing). Hence, the logarithm of our likelihood would look like the following: where we can simplify the first term by the quotient rule and the second one by canceling the euler (given that which can be further simplified by solving Now we can solve the summation symbolically for the first term and leave it only on the variable term, leaving with our final log likelihood gaussian function: And like this, we reach the final form for our likelihood function To estimate the gradient of our log likelihood function, we need to derive it with respect to each parameter term. Let's start by solving for Here, we see that the first term does not contain Here, we can remove out of the denominator the term And now we apply the chain rule, where we derive the exponent and then the Remember that So by passing the 2 out the summation, we can cancel the 2 in the outside denominator by Now to solve for Here, let's start with the first term, and remember that the And now, we can discard the first term given it doesn't contain our target parameter And by applying the power rule, we can pull out of the log term the square term out of the log, leaving the term Additionally and for convenience, let's pull the And finally, let's derive this bad boy. For the first term, the derivative of any log is the inverse of the term, so our Which can be expressed as: And this conclude our gradient estimation section for our log likelihood function. These functions can be further used to estimate the gradient and update the parameters when using gradient-based methods to explore the parameter space, like Hamiltonian Monte Carlo methods. Up next, we'll calculate analytically the partial derivatives of the Castillo-Aguilar RRi-vs-time model. Let's recall our original target function: So the gradient of this function But in order to produce this beauty, we need to derive each term separately. And you know the saying: every journey begins with a single step, let's go! Deriving the alpha ( Lucky for us, this is the easiest of the seven model parameters, given that it is a constant, it is not being transformed in any way. For this reason its derivative is just 1 (because the derivative of any constant is 1). Deriving the beta ( The beta parameter is present on both logistic functions. However it is on the top position, so we can derive it while mantaining the rest of the parameters constant and dropping the ones where the term Deriving the The derivation of Deriving the lambda ( Here is where it gets more difficult. Given that Let's recall the part of the function that contains Here, we can assume that which can be further rearranged as: Deriving the phi ( Here, we apply the same logic to the one we used earlier for which can be further rearranged into: Deriving the tau ( For So the derivative form (applying our previous derivation rule) would be the following: Note that the negative sign in the pulled rate parameter is propagated from the negative sign in the Deriving the delta ( For the There we have it! Now the final gradient of our function (by taking the whole equations) would be something like this: And this monstruosity is the whole gradient of our RRi-vs-time model, which as you can see, is a vector of partial derivatives that describe the change in the function value with respect a change in the parameter being derived (i.e., a measure of how sensitive is). Now we have all the necessary tools to compute the gradient of the log likelihood function, which remember, involves not only the parameters of To honor Hamilton's memory, we'll refer to the gradient of the log likelihood as However, its important to considerate that in Hamiltonian Monte Carlo (HMC), the potential energy is defined as Let's recall our derived (log) likelihood function with respect to And with respect to So our gradient function would look like something like this: But recall that This is the cherry and cream on the top of gradient-based estimation methods, because with this gradient function we can now compute the slope of this probability landscape and ride it without problems. Now the next challenge is transforming the hand-derived equations into a valid R code that will actually be useful to estimate anything. Okay, after almost igniting the latex out of my keyboard, now we'll have to pull these fancy equations into an efficient set of functions in R that will be able to actually do something. This "something" will be estimating the gradient of the surface where this particle is in space (using We'll start simple. First we will implement the Castillo-Aguilar Equation (the RRi-vs-Time model) in order to accept an input vector as time and a named list of parameter values. This will serve our purpose when we need to compute the predicted RRi values, to then compute the residuals for the log likelihood and bla bla bla (i.e., what we already have been discussing so far). Let's recall what is the function we'll implement in R code: Where Now let's ensure ourselves that this code is actually working. We'll simulate some rest-exercise-rest dynamics using our function and then visualize it using So as you can see, we can model not only a time-varying RRi, but also, we can simulate the variations in the standard deviation of RRi over time. See how the dispersion changes from rest, exercise and recovery around the true curve and the observed RRi curve. Cool! now we have to implement the log likelihood function, which is the function that will give us the probability (in log scale) of the current parameter values on this parameter space, which will inform us if we are going into a place of greater probability (or not). Let's recall the equation we arrived earlier: Let's go implementing this bad boy as a function in R: Now let's see if we can visualize something using this function: So as you can see, we can now explore the probability of our parameter space (at least, in two dimensions). However, in reality, our parameter space has 8 parameters (7 from the RRi-vs-time model and the Some algorithms that are designed to find optimal parameter values, work by maximizing the likelihood of seeing the data for a given set of the parameters (e.g., maximum likelihood estimation, MLE for short). In those cases, some methods might even take advantage of the gradient information similar to HMC methods, like gradient-descent for instance. However, almost all methods require you to able to compute the likelihood of observing the data given a set of parameter values (which is what we just plotted). Up next, we need to specify (in R code) the vector of partial derivatives we estimated earlier. Let's remember that this vector of derivatives (of the RRi-vs-time model) will be important to compute the gradient of the log likelihood, which is necessary to update the momentum vector to then update our parameter vector, by simulating our imaginary particle moving throught this parameter space. Let's recall our vector of partial derivatives for the RRi-vs-time model: Yeah… Its a picture hard to erase from memory. Even though this look like ITS A LOT, it truly isn't. Let's focus on the implementation of these set of equations in R: These set of equation only tell us how sensitive is each parameter with regard to the observed error in the predicted RRi. Let's plot this for further understanding. In this plot, we can observed the effect of modifying in one unit each parameter on the observed RRi dynamics. This crucial piece of information is obtained thanks to the magic of partial derivatives. It allow us to describe how a system, composed of multiple parameters controlling the behavior of the observed dynamics, change relative to each parameter at a time. In reality, this set of equations describe a richer picture that we are intending to show here (but that's an adventure for another blog post). Finally, we now need to implement the gradient of the log likelihood function However, most of the work is already done, so we basically need to assemble the functions that take as input our observed data (RRi and time vectors), the named list of parameter values and Let's see how this implementation looks like: It's important to not forget that the gradient of the potential energy Now that we have our gradient function I have already described the leapfrog integrator in previous blog posts, however, for the sake of not interrumpting the flow of the narrative, let me remind you what was about. In the leapfrog integrator, we start with a random momentum, commonly drawn from a standard normal distribution (with However, its important to note that we need to draw a random standard normal vector of the same length of the position parameter vector Once we start with a random momentum, we need to updated it right away based on the gradient of our parameter space. To do this, let's recall the form of the rate of change of the momentum paramter This means that in order to update Then, with the updated momentum, we update the position vector And then, we end up the algorithm by half-updating the momentum again, but this time with the new position: And this is the whole algorithm to simulate the movement of the particle through this parameter space. As you see, the complicated stuff is not the algorithm itself, its the process to obtaining the equations to compute such gradients in the first place. We can compact this algorithm into this tiny R function: Now let's try simulating this bad boy! And that's it! Now we have to check how well our leapfrog algorithm performed. Here we can leverage the famous Phase-Space plots, which illustrates the Position-vs-Momentum evolution of the system on the parameter space: With this plot, we can already see which parameters behave accordingly to what we should expect (i.e., cyclic behavior like a pendulum) and which doesn't. At first glance we can see that the The intellectual passage from the abstract realm of mathematical derivation to the concrete execution of Hamiltonian Monte Carlo (HMC) code for the Castillo-Aguilar RRi-vs-Time model exemplifies a profound synergy between theoretical exactitude, computational dexterity, and domain-specific biological understanding. This discussion delves into the broader ramifications of this synthesis, acknowledging its inherent limitations while distilling the critical lessons learned for both Bayesian inference methodologies and the intricate modeling of cardiovascular dynamics. Furthermore, it contemplates avenues for enhancing model veracity, computational performance, and the clarity of interpretation. Perhaps the most significant realization emerging from this exercise is the deep epistemological value derived from the manual derivation of the model's gradients. While contemporary probabilistic programming frameworks conveniently automate gradient computation through automatic differentiation (AD), the rigorous intellectual effort involved in manually applying the chain rule to a sophisticated formulation like the Castillo-Aguilar equation cultivates an irreplaceable, nuanced intuition. This meticulous process compels the modeler to grapple directly with potential analytical obstacles, such as singularities arising from specific mathematical terms or challenges in parameter identifiability, which automated systems might otherwise mask or silently fail upon. Moreover, dissecting individual gradient components, such as the partial derivative of the model with respect to exercise onset time ( The remarkable efficiency of HMC, particularly its advantage over gradient-free Markov Chain Monte Carlo techniques like Metropolis-Hastings, is fundamentally predicated on its intelligent exploitation of gradient information to navigate the complex topology of the posterior distribution. Our implementation vividly illustrates this advantage, showcasing HMC's capacity to circumvent entrapment in local probability maxima—a common failing of random-walk algorithms in high-dimensional spaces. By employing momentum to traverse the parameter landscape, HMC effectively explores correlated parameter spaces, such as the relationship between exercise decay rate ( Despite the inherent power of HMC, its application invariably surfaces significant challenges, demanding careful consideration. The performance of the sampler is acutely sensitive to the choice of hyperparameters: the step size ( This endeavor highlights the persistent, often necessary, tension in Bayesian modeling between achieving high mechanistic fidelity—capturing the nuances of the underlying biological processes—and maintaining computational tractability. The Castillo-Aguilar model, rooted in physiological assumptions about exercise and recovery, offers interpretability but risks misrepresenting the system if the true dynamics deviate significantly from its sigmoidal assumptions. Conversely, purely phenomenological approaches, such as Gaussian processes, provide greater flexibility but often sacrifice direct physiological interpretation. An optimal path may lie in hybrid models that strategically combine mechanistic components for well-understood dynamics with non-parametric elements to capture residual variation or unmodeled effects. Similarly, the simplifying assumption of constant, Gaussian noise ( Beyond the technicalities of Bayesian inference, this work resonates with broader challenges and opportunities within the field of computational physiology. The inherent inter-individual heterogeneity in cardiovascular responses is not merely statistical noise but a reflection of genuine biological diversity. Extending models like ours into hierarchical frameworks offers a powerful means to parse this variability, simultaneously estimating population-level trends while tailoring parameters like decay rates ( The trajectory for future work is rich with potential, branching into algorithmic enhancement, model elaboration, and empirical validation. Integrating adaptive samplers like NUTS could automate hyperparameter tuning, while leveraging GPU acceleration or exploring approximate Bayesian methods like Variational Inference could drastically reduce computation times for large-scale applications. Model extensions could incorporate hierarchical structures for population studies, allow key parameters like ( In conclusion, the implementation and application of HMC to the Castillo-Aguilar model represent a compelling fusion of art and science. It demands mathematical acuity for gradient derivation, computational proficiency for robust implementation, and deep domain knowledge for meaningful interpretation. The ultimate value, however, transcends these individual components, residing in the synergistic insight generated when mathematical formalism illuminates biological complexity. This journey underscores several core tenets of sophisticated modeling: the indispensable intuition fostered by rigorous derivation, the inescapable necessity of navigating trade-offs between model complexity, inferential accuracy, and computational resources, and the paramount importance of interpreting results within their specific physiological context. The seemingly "monstrous" gradient vector, therefore, is far more than a collection of partial derivatives; it is a high-dimensional map guiding exploration at the confluence of mathematics and biology, promising deeper understanding for those willing to navigate its intricate terrain, one calculated step at a time.Introduction
flowchart TD
a((Rest)) --->| High RRi | b((Exercise))
b --->| RRi decreases | c((Recovery))
c -->| RRi increases | a
A Matter of Guessing
represents the model parameters, while momentum
is an auxiliary variable that enables efficient exploration of the posterior distribution, avoiding the random-walk behavior of traditional MCMC.
-dimensional space corresponds to a unique combination of the
model parameters we aim to estimate.
Hamiltonian Drama
) and momentum (
) parameters, which are used to compute the potential (
) and kinetic energy (
) of a physical system. One of the key features of Hamilton mechanics is that the total energy of the system (
) is conserved over time. This means that a change in potential energy
is compensated by an inversely proportional change in kinetic energy
.
) of our imaginary particle the system is equal to the derivative of the system with respect to the momentum (
):
to update the position of our particle
inside our system over time.
to update the future momentum (
), updating momentum in the direction of the negative gradient (
), which is analogous to a ball rolling downhill, going faster down the steepest the surface.
All by Hand
Data Generation Process
) comes from a Gaussian distribution (as the data generation process), centered around the true signal
with variance
. This can be denoted as:
comes from our model
.
, with
, the parameters that control the output of our true signal
. Where:
), in order to be able to reproduce the observed data.
Arriving to the Log Likelihood
) term:
and applying the power rule to the square root inside the log using
which comes out of the log term:
Estimating the Gradient of the Function
With Respect to
:
, so we can drop it from our calculations, leaving this expression:
like this:
term. So it would look like this:
its a function with multiple parameters (
parameters), so "deriving
" actually means taking the partial derivative of the function with respect to each parameter
, also preserving the sign inside the parenthesis.
and passing the negative sign in the partial derivative we can cancel the negative sign at the beginning of the function. So the whole expression would end up like this:
With Respect to
let's remember where were we with the derived log likelihood:
is multiplying the log term. We need to have this present given that we want to pull the
out of the log term, but in order to do so, we need to also distribute the
term with it, leaving it this way:
, leaving us with:
canceling out the 2 like this:
out of the denominator and let's call
for ease of notation. Additionally, let's express the
with negative exponent for clarity when we derive:
. And in our second term, we cancel out the 2 by pulling out the -2 from the exponent, leaving
as
. This whole show will look like this:
Estimation of RRi-vs-Time Gradient
would be a column vector of partial derivatives with the following form:
Deriving the RRi-vs-Time Equations
) parameter
) parameter
is not used like
, like this:
parameter
is relatively trivial given that it only appears on the numerator of the recovery logistic function. So the final derivative form would look like this:
) parameter
appears on the exponent we need to apply the chain rule plus some additional calculus. However, we can use a shortcut here. For any function in the form of
it's derivative ends up being
.
:
correspond to the
we previusly mentioned. With this being said, the derivative of
with respect to
would be
. Consequently, applying the aforementioned transformations to our original equation, the function would have the following form:
) parameter
. So the functional form would be:
) parameter
things gets complicated. This is because this parameter is present in both logistic components (exercise-induced decay and recovery components). So we need to apply our same trick, but instead of pulling the time-specific segment of the exponent, we need to pull the rate component in the exponent. Let's recall the components of the function that contains the
parameters:
parameter. Rearrenging the signs and the numerators, the final form would be:
) parameter
parameter, the final product is similar for
. This, in the sense that the final form is similar in the recovery function, given that its relation with the rate parameter
is conserved with respect to
. That is why, we obtain the same form for that side of the equation:
Expansion of the Gradient of
, but also the
parameter. The gradient of this function will update the momentum
of the imaginary particle, that then will update the position coordinates of this particle, which itself correspond to the values we're trying to estimate.
, given that the "position" parameter
under the Hamiltonian framework are the parameter values (which work as coordinates in the parameter space), and the gradient of the log likelihood (
) will give us the updated momentum
to then, update our parameter coordinates
.
. Here, we focus on the gradient of the log-likelihood
for simplicity (assuming uniform priors). For non-uniform priors,
.
(
):
(
):
is the partial derivative of the model
with respecto to each parameter value
, so
can be expanded with respect to each parameter value using the elements of the gradient of our RRi-vs-time model
. By incorporating these changes into
the result is the following:
as a shorthand for
(as indicated at the beginning of the math section). So plase consider that when I refer to
is equivalent to
Translating Math Into Code
) and updating its position
by leveraging momentum
to carry us away, sliding over higher probability landscapes.
will be the time vector and
the list of parameters. Let's see how this turns out!
Code
# ==============================================
# 1. RRi Model (Castillo-Aguilar Equation)
# ==============================================
rr_model <- function(t, params) {
alpha <- params$alpha
beta <- params$beta
c <- params$c
lambda <- params$lambda
phi <- params$phi
tau <- params$tau
delta <- params$delta
term2 <- beta / (1 + exp(lambda * (t - tau)))
term3 <- (-c * beta) / (1 + exp(phi * (t - tau - delta)))
alpha + term2 + term3
}ggplot2 package:Code
# Time vector
t <- seq(0.01, 12, 0.01)
# list with RRi parameters
params <- list(alpha = 800, beta = -300, c = 0.9,
lambda = -3, phi = -2,
tau = 5, delta = 2)
# list with parameters for SD dynacmis of RRi over time
params_sd <- list(alpha = 30, beta = -30, c = 0.7,
lambda = -1, phi = -0.8,
tau = 5, delta = 3)
# Simulate the "true" (smooth) RRi curve
y_true <- rr_model(t, params)
# And the fluctuations for RRi SD over time
y_sd <- rr_model(t, params_sd)
# And simulate the observed RRi signal
y <- y_true + rnorm(n = length(t), sd = y_sd)
ggplot() +
geom_line(aes(t, y), linewidth = 1/4, color = "purple") +
geom_line(aes(t, y_true), linewidth = 1, color = "purple") +
geom_line(aes(t, y_true + y_sd), linewidth = 1/2, linetype = 2, color = "purple") +
geom_line(aes(t, y_true - y_sd), linewidth = 1/2, linetype = 2, color = "purple") +
geom_vline(xintercept = c(5, 7), linetype = 2, color = "gray50") +
annotate("text", label = "Rest", x = 3.5, y = 650, color = "gray50") +
annotate("text", label = "Exercise", x = 6, y = 750, color = "gray50") +
annotate("text", label = "Recovery", x = 8.5, y = 650, color = "gray50") +
labs(y = "RRi (ms)", x = "Time (min)",
title = "Exercise-Induced RRi Dynamics",
subtitle = "Cardiac Autonomic Modulation")Code
# ==============================================
# 2. Gaussian Log-Likelihood
# ==============================================
log_likelihood <- function(y, t, params, sigma) {
M <- rr_model(t, params)
residuals <- y - M
N <- length(residuals)
sigma_squared <- sigma^2
-N/2 * log(2 * pi * sigma_squared) - sum(residuals^2) / (2 * sigma_squared)
}Code
# Same parameters as before
# Observed signal
y <- y_true + rnorm(n = length(t), sd = 30)
# Input parameters
param_data <- expand.grid(
alpha = seq(710, 890, length.out = 100),
c = seq(0.3, 1.5, length.out = 100)
)
# Compute log likelihood
param_data$log_likelihood <-
apply(param_data, 1, function(x) {
new_params <- within(params, {
alpha <- x[["alpha"]];
c <- x[["c"]]
})
log_likelihood(y, t, new_params, sigma = 20)
})
param_data <- as.data.table(param_data)
# Get unique alpha and c values for axes
z_vals <- as.matrix(dcast(param_data, alpha ~ c, value.var = "log_likelihood")[,-1])
alpha_vals <- unique(param_data$alpha)
c_vals <- unique(param_data$c)
# Create the 3D surface plot
library(plotly)
plot_ly() |>
add_trace(
x = c_vals,
y = alpha_vals,
z = z_vals,
type = "surface",
colorscale = "Viridis",
showscale = TRUE
) |>
layout(
scene = list(
xaxis = list(title = "C (Recovery proportion)"),
yaxis = list(title = "Alpha parameter"),
zaxis = list(title = "Log-Likelihood"),
camera = list(eye = list(x = -1, y = -2.5, z = 0.5))
),
title = "Log-Likelihood Landscape for alpha and c Parameters"
) parameter), which increases exponentially the number of computations needed to use grid approximation.
Code
# ==============================================
# 3. Partial Derivatives of the RRi Model
# ==============================================
compute_partials <- function(t, params) {
alpha <- params$alpha
beta <- params$beta
c <- params$c
lambda <- params$lambda
phi <- params$phi
tau <- params$tau
delta <- params$delta
# Precompute common terms
t_tau <- t - tau
t_tau_delta <- t - tau - delta
exp_lambda <- exp(lambda * t_tau)
exp_phi <- exp(phi * t_tau_delta)
denom1 <- (1 + exp_lambda)^2
denom2 <- (1 + exp_phi)^2
list(
d_alpha = rep(1, length(t)), # Is constant, so we repeat 1's
d_beta = 1/(1 + exp_lambda) - c/(1 + exp_phi), # ∂M/∂β
d_c = -beta / (1 + exp_phi), # ∂M/∂c
d_lambda = -beta * t_tau * exp_lambda / denom1, # ∂M/∂λ
d_phi = c * beta * t_tau_delta * exp_phi / denom2, # ∂M/∂φ
d_tau = beta * lambda * exp_lambda / denom1 - c * beta * phi * exp_phi / denom2, # ∂M/∂τ
d_delta = -c * beta * phi * exp_phi / denom2 # ∂M/∂δ
)
}Code
partials <-
compute_partials(t, params) |>
as.data.table()
names(partials) <-
c("alpha", "beta", "c",
"lambda", "phi", "tau", "delta")
data <-
cbind(time = t, partials) |>
melt.data.table(id.vars = "time")
ggplot(data, aes(time, value, col = variable)) +
facet_grid(rows = vars(variable), scales = "free_y",
labeller = label_parsed) +
geom_line(show.legend = FALSE, linewidth = 1) +
geom_vline(xintercept = c(5, 7), linetype = 2, color = "gray50") +
labs(x = "Time (min)", y = "Effect on RRi (ms)",
title = "Effect of parameter change on RRi",
subtitle = "Change in RRi for a unit change for each parameter"), which let's recall, it looked like this:
.
Code
# ==============================================
# 4. Gradient of Log-Likelihood (∇U(q))
# ==============================================
grad_U <- function(y, t, params, sigma) {
M <- rr_model(t, params)
residuals <- y - M
partials <- compute_partials(t, params)
N <- length(residuals)
# Precompute inverses
sigma_sq_inv <- 1 / sigma^2
sigma_cu_inv <- 1 / sigma^3
# Gradients for model parameters
grad_params <- list(
alpha = sum(residuals * partials$d_alpha) * sigma_sq_inv,
beta = sum(residuals * partials$d_beta) * sigma_sq_inv,
c = sum(residuals * partials$d_c) * sigma_sq_inv,
lambda = sum(residuals * partials$d_lambda) * sigma_sq_inv,
phi = sum(residuals * partials$d_phi) * sigma_sq_inv,
tau = sum(residuals * partials$d_tau) * sigma_sq_inv,
delta = sum(residuals * partials$d_delta) * sigma_sq_inv
)
# Gradient for sigma
grad_sigma <- (-N / sigma) + sum(residuals^2) * sigma_cu_inv
# Return -∇logL(q)
gradLL <- c(grad_params, list(sigma = grad_sigma))
lapply(gradLL, function(x) -x)
}, used to update the momentum parameter
is itself a vector of gradients for all parameters, including
:
, we'll need to use it to simulate how our particle moves through the parameter space. The common way to do this is to use what we know as symplectic integrators, which are a special kind of algorithms specifically design to leverage Hamiltonian parameters (
and
) to simulate the evolution of a Hamiltonian system over time. The most famous symplectic integrator used in common Bayesian analysis software is the leapfrog integrator.
and
):
, so that we'll have a momentum
for each value in
.
:
we need add the value of the
in the opposite direction, analogous to a ball rolling downhill. We manage to do this by "half-updates":
like the following:
Code
leapfrog <- function(q, p, dt, y, t) {
# Extract sigma and model parameters
sigma <- q$sigma
params <- q[names(q) != "sigma"]
# Compute gradient
grad <- grad_U(y, t, params, sigma)
# Update momentum
p <- p - 0.5 * dt * unlist(grad)
# Update position
q_new <- as.list(x = unlist(q) + dt * p)
# Recompute gradient at new position
new_sigma <- q_new$sigma
new_params <- q_new[names(q_new) != "sigma"]
grad_new <- grad_U(y, t, new_params, new_sigma)
# Final momentum update
p <- p - 0.5 * dt * unlist(grad_new)
list(q = q_new, p = p)
}Code
# Initial parameters ----
## Castillo-Aguilar's model
params <- list(
alpha = 800, beta = -300, c = 0.8,
lambda = -2, phi = -1, tau = 6, delta = 3
)
## Parameter for the standard deviation
sigma <- 10
# Data ----
## Time
t <- seq(0, 15, length.out = 1000)
## Observed RRi
y <- rr_model(t, params) + rnorm(1000, sd = sigma)
# Parameters for simulation ----
## Number of steps
steps <- 10000
## Stepsize at each step
stepsize <- 0.00025
## Random seed
set.seed(1234)
## Initiate random momentum vector
p <- rnorm(n = 8, mean = 0, sd = 1)
## Parameter
q <- c(params, list(sigma = sigma))
out <- vector("list", length = steps)
## Leapfrog integrator
for (step in 1:steps) {
out[[step]] <- leapfrog(q = q, p = p, dt = stepsize, y = y, t = t)
q <- out[[step]]$q
p <- out[[step]]$p
}Code
position <- lapply(out, function(x) {
as.data.table(x$q)
}) |> rbindlist()
momentum <- lapply(out, function(x) {
as.list(x$p)
}) |> rbindlist()
position$step <- 1:steps
momentum$step <- 1:steps
position <- melt.data.table(position, id.vars = "step", value.name = "p")
momentum <- melt.data.table(momentum, id.vars = "step", value.name = "q")
sim_data <- position[momentum, on = c("step", "variable")]
sim_data[, `:=`(q = scale(q), p = scale(p)), list(variable)]
ggplot(sim_data, aes(p, q)) +
facet_grid(rows = vars(variable), labeller = label_parsed) +
geom_path(aes(group = 1, alpha = step, col = variable), show.legend = FALSE) +
labs(x = "Momentum (p)", y = "Position (q)",
title = "Phase Space of RRi Model",
subtitle = "Particle Moving on the Parameter Space",
caption = "Values are Standardized For Each Parameter") parameter didn't reach a stationary behavior. This could be for a number of different reasons like hyperparameter tuning, and in the same way, we could try to fix it using several methods (e.g., step size, trajectory length). However, that's another beast for another article.
Synthesizing Theory, Computation, and Cardiovascular Insight
), illuminates the complex interplay of competing physiological forces—exercise versus recovery dynamics—mathematically encoded within the model. This granular understanding directly informs parameter sensitivity analysis and fosters a crucial, well-founded confidence in the inferential outcomes, particularly when confronting complex posterior geometries or apparent non-stationarity in sampling traces, allowing one to distinguish true model behavior from mere artifacts of the inference algorithm or its tuning. This experience strongly advocates for considering hybrid analytical workflows, where the manual derivation of particularly insightful or sensitive gradient components is judiciously combined with the efficiency of AD for others, thus achieving an optimal balance between profound conceptual grasp and practical computational execution, especially pertinent in hierarchical modeling structures.
) and recovery rate (
) in the RRi model, navigating the characteristic ridges of such posteriors far more adeptly than undirected random proposals. This gradient-informed exploration mechanism also confers significant scalability; whereas grid-based evaluations or random walks suffer exponentially from the curse of dimensionality, HMC's computational cost typically scales polynomially, rendering complex models with numerous parameters computationally tractable. The symplectic nature of the leapfrog integrator further ensures a delicate balance between broad exploration of the parameter space, facilitated by sustained momentum across potentially long trajectories, and detailed exploitation of high-probability regions, maintaining the necessary statistical properties for valid inference. Observing the phase-space dynamics, where stable oscillations for parameters like baseline RRi (
) contrast with divergent behavior for others like exercise magnitude (
), provides direct diagnostic feedback on the interplay between the model's geometry and the sampler's hyperparameters.
), which dictates the granularity of the leapfrog integration; the trajectory length (
), which determines the extent of exploration in each iteration; and the mass matrix, which adapts the sampler to the local curvature and scale of the posterior. Suboptimal choices, as potentially indicated by the observed behavior of the (
) parameter in our simulations, can lead to integration instability or inefficient exploration. While adaptive HMC variants like the No-U-Turn Sampler (NUTS) offer automated tuning, they introduce additional algorithmic complexity. Furthermore, the very flexibility of sophisticated mechanistic models like the Castillo-Aguilar formulation can engender issues of parameter identifiability. Parameters governing timing, such as exercise onset (
) and recovery delay (
), may exhibit strong correlations, leading to elongated, curved posterior distributions that challenge even gradient-based samplers. Addressing this requires thoughtful strategies, including model reparameterization to decouple correlated variables or the incorporation of informative prior distributions derived from existing physiological knowledge. Finally, the computational burden associated with gradient evaluation, scaling linearly with the number of data points, remains a practical constraint, particularly for extensive datasets like continuous, long-term ECG recordings, potentially necessitating the adoption of stochastic gradient methods or hardware acceleration.
) overlooks the common physiological reality of heteroscedasticity in RR intervals, where variability often changes with physiological state. Enhancing the model to accommodate time-varying or state-dependent noise would increase realism but inevitably add layers of computational and inferential complexity, demanding a careful cost-benefit analysis guided by the specific research question.
,
) and timing (
,
) to individual subjects, thereby paving the way for more personalized physiological assessment. The translation of such theoretical models into practical clinical applications, however, necessitates bridging the gap between offline analysis and real-time monitoring. This requires developing robust prior distributions informed by population data, potentially exploring online HMC algorithms capable of processing streaming data, and ensuring rigorous validation against established clinical metrics. Furthermore, advancing the field responsibly hinges on a commitment to open science and reproducibility, extending beyond code sharing to encompass standardized data formats and the creation of benchmark datasets for objective model comparison.
) and (
) to vary dynamically with factors like exercise intensity, or employ more flexible non-Gaussian likelihoods (e.g., log-normal or Gamma distributions) to better capture the statistical properties of RR intervals. Crucially, the clinical relevance of these models must be established through rigorous benchmarking against existing gold-standard physiological measures and validated in intervention studies designed to assess whether estimated parameter shifts accurately reflect responses to therapeutic interventions or training regimens.
Citation
@misc{castillo-aguilar2025,
author = {Castillo-Aguilar, Matías},
title = {HMC by {Hand:} {Gradients,} {Heartbeats,} {PDEs} and {Other}
{Tales}},
date = {2025-04-19},
url = {https://bayesically-speaking.com/posts/2025-04-19 heartbeats-pdes-and-other-tales/},
langid = {en}
}
Additional details
Description
Introduction When we exercise, our hearts run faster, which is associated with a decrease in the time between heartbeats. This time between heartbeats is also known by R-R intervals. If you have ever seen an electrocardiogram (EKG) curve before, each time your heart pumps you'll see a small spike in your EKG;
Identifiers
- UUID
- dbeb14f1-a7ab-41f8-bb57-c47263c410fc
- GUID
- https://bayesically-speaking.com/posts/2025-04-19 heartbeats-pdes-and-other-tales/
- URL
- https://bayesically-speaking.com/posts/2025-04-19%20heartbeats-pdes-and-other-tales
Dates
- Issued
-
2025-04-19T03:00:00
- Updated
-
2025-04-19T03:00:00