Some underlying mathematics of Qgcomp
Here are some of the mathematical building blocks of the package. None of these are new, but they are helpful for understanding how the approach works.
Rules of variance:
Variance of a sum of random variables
\[Var(X + Z) = var(X) + Var(Z) + Cov(X,Z) + Cov(Z,X)\]
\[Var(X - Z) = Var(X + -Z)\]
\[= Var(X) + Var(-Z) + Cov(X,-Z) + Cov(-Z,X)\]
\[= Var(X) + Var(Z) - Cov(X,Z) - Cov(Z,X)\]
Variance of a sum of sums of random variables
\[A=X+Z, B=Y+W\]
\[Var(A)= Var(X) + Var(Z) + 2*Cov(X,Z)\]
\[Var(B)= Var(Y) + Var(W) + 2*Cov(Y,W)\]
\[Var(A+B)= Var(A) + Var(B) + 2*Cov(A,B)\]
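These identities hold exactly for sample variances and covariances as well (both are bilinear in the same way), so they can be checked numerically. A quick sketch in Julia using the `Statistics` standard library:

```julia
using Statistics, Random

Random.seed!(0)
x = randn(1000)
z = randn(1000)
y = randn(1000)
w = randn(1000)

# Var(X + Z) = Var(X) + Var(Z) + 2*Cov(X,Z)
@assert var(x .+ z) ≈ var(x) + var(z) + 2 * cov(x, z)

# Var(X - Z) = Var(X) + Var(Z) - 2*Cov(X,Z)
@assert var(x .- z) ≈ var(x) + var(z) - 2 * cov(x, z)

# Sum of sums: A = X + Z, B = Y + W
a = x .+ z
b = y .+ w
@assert var(a .+ b) ≈ var(a) + var(b) + 2 * cov(a, b)
```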
Variance of a product of a random variable and a constant
\[Var(X*c) = E[(X*c-E(X*c))^2]\]
\[= E[(X*c-c*E(X))^2]\]
\[= E[(c(X-E(X)))^2]\]
\[= c^2 * E[(X-E(X))^2]\]
\[= c^2 * Var(X)\]
Covariance of a random variable and a product of a random variable and a constant
\[Cov(X*c, Z) = E[(X*c-E(X*c))(Z-E(Z))]\]
\[= c*E[(X-E(X))(Z-E(Z))]\]
\[= c * Cov(X, Z)\]
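Both constant-scaling rules also hold exactly for the sample versions, so they are easy to verify numerically:

```julia
using Statistics, Random

Random.seed!(1)
x = randn(1000)
z = randn(1000)
c = 2.5

# Var(c*X) = c^2 * Var(X)
@assert var(c .* x) ≈ c^2 * var(x)

# Cov(c*X, Z) = c * Cov(X, Z)
@assert cov(c .* x, z) ≈ c * cov(x, z)
```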
Generalized linear model characteristics
\[g(E(y|x)) = β0 + β1*x1 + β2*x2\]
Standard error of a linear combination
\[RD = E(y|x=1) - E(y|x=0)\]
\[= β0 + β1*1 + β2*1 - (β0 + β1*0 + β2*0)\]
\[=β1 + β2\]
\[Var(RD) = Var(β1 + β2)\]
\[= Var(β1) + Var(β2) + 2*Cov(β1,β2)\]
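The same calculation can be written as a quadratic form: with gradient vector $g = (1, 1)'$, $Var(β1 + β2) = g' Vcov(β1, β2) g$. A sketch using a made-up 2×2 covariance matrix (the values are for illustration only):

```julia
using LinearAlgebra

# Hypothetical covariance matrix of (β1, β2) - illustration only
V = [0.04 0.01;
     0.01 0.09]

# scalar rule: Var(β1) + Var(β2) + 2*Cov(β1, β2)
scalar = V[1,1] + V[2,2] + 2 * V[1,2]

# equivalent quadratic form g' * V * g with g = (1, 1)'
g = [1.0, 1.0]
@assert scalar ≈ g' * V * g
```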
Variance of a prediction
Define a prediction for a two exposure linear model as $E(y|x) = β0 + β1*x1 + β2*x2$
\[Var(E(y|x)) = Var(β0 + β1*x1 + β2*x2)\]
\[= Var(β0) + Var(β1*x1) + Var(β2*x2) + 2*Cov(β0,β1*x1) + 2*Cov(β0,β2*x2) + 2*Cov(β1*x1,β2*x2)\]
\[= Var(β0) + x1^2*Var(β1) + x2^2*Var(β2) + 2*Cov(β0,β1)*x1 + 2*Cov(β0,β2)*x2 + 2*x1*Cov(β1,β2)*x2\]
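The term-by-term expansion above is equivalent to the quadratic form $x' Vcov(β) x$ with design row $x = (1, x1, x2)'$, which is how it is computed in practice. A sketch with a hypothetical 3×3 coefficient covariance matrix (values chosen only for illustration):

```julia
using LinearAlgebra

# Hypothetical covariance matrix of (β0, β1, β2) - illustration only
V = [0.050 0.010 0.005;
     0.010 0.040 0.008;
     0.005 0.008 0.090]
x1, x2 = 0.7, 1.3
x = [1.0, x1, x2]   # design row (1, x1, x2)

# term-by-term expansion of Var(β0 + β1*x1 + β2*x2)
expanded = V[1,1] + x1^2 * V[2,2] + x2^2 * V[3,3] +
           2 * V[1,2] * x1 + 2 * V[1,3] * x2 + 2 * x1 * V[2,3] * x2

# compact quadratic form x' * Vcov(β) * x
@assert expanded ≈ x' * V * x
```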
Variance/covariance matrix for a vector of predictions
\[\mathbf{1} = (1, 1, ..., 1)'\]
\[X1 = (x1_1, x1_2, ..., x1_n)'\]
\[X2 = (x2_1, x2_2, ..., x2_n)'\]
\[\mathbf{X} = (\mathbf{1}, X1, X2)\]
\[\mathbf{β} = (β0, β1, β2)'\]
\[Vcov(E(y|\mathbf{X})) = Vcov(β0 + β1*X1 + β2*X2)\]
\[Vcov(E(y|\mathbf{X})) = Vcov(\mathbf{Xβ})\]
\[= E[(\mathbf{Xβ} - E(\mathbf{Xβ}))(\mathbf{Xβ} - E(\mathbf{Xβ}))']\]
\[= E[\mathbf{X}(\mathbf{β} - E(\mathbf{β}))(\mathbf{β} - E(\mathbf{β}))'\mathbf{X}']\]
\[= \mathbf{X}\,Vcov(\mathbf{β})\,\mathbf{X}'\]
```julia
using GLM, Random, LinearAlgebra

x = rand(100, 2)
y = 0.1 .+ x * [1.0, 0.3] + randn(100)
X = hcat(ones(length(y)), x)
ft = fit(LinearModel, X, y)
V = GLM.vcov(ft)
diag(X * V * X')  # variance of each prediction: diagonal of X*Vcov(β)*X'

println("Coefficient estimate from Qgcomp")
grad = zeros(length(coef(ft)))
grad[2:3] .= 1  # second and third coefficients are exposures from the mixture
println(sum(coef(ft)[findall(grad .== 1)]))
println("Coefficient standard error from Qgcomp")
println(sqrt(grad' * V * grad))
```

```
Coefficient estimate from Qgcomp
2.020850598572289
Coefficient standard error from Qgcomp
0.5470802529717882
```

```julia
using Qgcomp, DataFrames

qgcomp_glm_noboot(@formula(y~x1+x2), DataFrame(y=y, x1=x[:,1], x2=x[:,2]), [:x1, :x2], nothing, Normal())
```

```
Scaled effect size (negative direction)
(none)
Scaled effect size (positive direction)
2×4 DataFrame
 Row │ exposure  coef     weight    ψ_partial
     │ String    Float64  Float64   Float64
─────┼────────────────────────────────────────
   1 │ x2        0.29543  0.146191    2.02085
   2 │ x1        1.72542  0.853809    2.02085

StatsBase.CoefTable(Any[[-0.2616967804947699, 2.0208505985723], [0.30087650915174197, 0.5470802529717883], [-0.8697813638976634, 3.6938832787234794], [0.3844198977654208, 0.00022085520870221434], [-0.8514039022263211, 0.9485930060945311], [0.32801034123678136, 3.093108191050069]], ["Coef.", "Std. Error", "z", "Pr(>|z|)", "Lower 95%", "Upper 95%"], ["(Intercept)", "mixture"], 4, 3)
```