I apologize for my ignorance: I'm jumping into an analysis midway and would appreciate guidance on the appropriate functions to use for estimating odds ratios (ORs).

Here’s the situation: I’ve been provided with a dataset that has already been imputed using the mice package. The original dataset with missing values is not available, and the imputed dataset appears to have been generated using mice::complete(imputed_df, "long", include = FALSE). This is my starting point.

Here's the code to reproduce this dataset:

library(mice)
library(rms)
library(parameters)
library(splines)

set.seed(123) # For reproducibility
# Number of observations
n <- 100

# outcome
y <- rbinom(n, size = 1, prob = 0.07)

# bmi
bmi <- runif(n, min = 16, max = 60)
bmi[sample(1:n, size = round(0.12 * n))] <- NA # Introduce missing values

# tiktok_ban
tiktok_ban <- sample(1:90, size = n, replace = TRUE)
#tiktok_ban[sample(1:n, size = round(0.05 * n))] <- 0 # Force some values to be 0

# sex
child_sex <- sample(c("Male", "Female"), size = n, replace = TRUE, prob = c(0.49, 0.51))
child_sex[sample(1:n, size = round(0.05 * n))] <- NA # Introduce missing values

# Combine into a data frame
df <- data.frame(y, bmi, tiktok_ban, child_sex)

# View first rows of the dataset
head(df)

# Impute missing values with mice (20 imputations, predictive mean matching)

imputed_df <- mice(df, m = 20, seed = 24415, method = "pmm", printFlag = FALSE)

imputed_df_l <- mice::complete(imputed_df, "long", include = FALSE)


The goal is to estimate the odds ratio of an event (y).

I am modelling tiktok_ban as a nonlinear spline with a jump at day 20:

d    <- imputed_df_l
dd   <- datadist(d);  options(datadist='dd')


Hmisc::describe(imputed_df_l$tiktok_ban)

k  <- attr(rcs(imputed_df_l$tiktok_ban, 6), 'parms')
k

h <- function(x) {
  z <- cbind(rcspline.eval(x, k),
             jump=x >= 20)
  attr(z, 'nonlinear') <- 2 : ncol(z)
  z
}

I am able to estimate the OR for this model on a single imputed dataset:

f <- lrm(y ~ child_sex + bmi +
             gTrans(tiktok_ban, h),
             data= subset(imputed_df_l, .imp == 1)) 

summary(f)

How do I scale this code to run on all of the imputed datasets, generate pooled odds ratios, and plot the pooled odds ratios against tiktok_ban?

Please note that I cannot change the imputed dataset imputed_df_l; this is what I was given. Thanks in advance for any help.

I have tried fit.mult.impute, but I ran into errors and was unable to make it work for my use case.

Asked Nov 19, 2024 at 1:32 by Clifton Pinto; edited Nov 19, 2024 at 16:28 by jay.sf.

1 Answer

I've done something similar elsewhere. It follows Rubin's logic [1, 2].

> PF <- by(imputed_df_l, ~ .imp, \(x) {
+   summary(lrm(y ~ child_sex + bmi + gTrans(tiktok_ban, h), data=x))
+ }) |> 
+   simplify2array()
> 
> m. <- length(unique(imputed_df_l$.imp))
> Q <- rowMeans(PF[, 'Effect', ])  ## calculate mean estimates
> U <- rowMeans(PF[, 'S.E.', ]^2)  ## calculate within variances (mean of squared S.E.s)
> B <- rowSums(((PF[, 'Effect', ] - Q)^2))/(m. - 1)  ## calculate between variances 
> T <- U + (1 + 1/m.)*B  ## calculate total variances 
> cbind(Estimate=Q)  ## pooled estimates; sqrt(T) holds the pooled standard errors
                             Estimate
bmi                        -0.2392742
 Odds Ratio                 0.8175118
tiktok_ban                  9.2942707
 Odds Ratio             10985.6518883
child_sex - Male:Female     0.6758367
 Odds Ratio                 1.9660930

I leave the CIs and the plot to you.
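
For the CIs and the plot, one possible route is to pool the full coefficient vectors and covariance matrices instead of the summary() rows, and then compute odds ratios as contrasts on the log-odds scale. The sketch below uses base R only. The names pool_rubin and pooled_or are hypothetical helpers, not part of rms or mice, and the interval uses a normal approximation rather than Rubin's t reference distribution, which is an assumption that makes the interval slightly too narrow for small m. Note that the within-imputation variance is the mean of the variances (squared standard errors), and that odds ratios are obtained by exponentiating the pooled log odds ratio rather than by averaging per-imputation odds ratios.

```r
# Rubin's rules for m fits of the same model (base R only, hypothetical helper).
# coefs: list of m named coefficient vectors on the log-odds scale
# vcovs: list of m covariance matrices, in the same order as coefs
pool_rubin <- function(coefs, vcovs) {
  m <- length(coefs)
  stopifnot(m >= 2, length(vcovs) == m)
  Qmat <- do.call(rbind, coefs)     # m x p matrix, one row per imputation
  Qbar <- colMeans(Qmat)            # pooled estimates
  Ubar <- Reduce(`+`, vcovs) / m    # within-imputation covariance (mean of variances)
  B    <- stats::cov(Qmat)          # between-imputation covariance, divisor m - 1
  list(coef = Qbar, vcov = Ubar + (1 + 1 / m) * B)  # total covariance
}

# Pooled odds ratios for each row of x_new versus one reference row x_ref.
# coef and vcov must already be subset to the design columns of the
# predictor of interest; all other terms cancel in the contrast.
# Assumption: normal approximation for the confidence interval.
pooled_or <- function(coef, vcov, x_new, x_ref, level = 0.95) {
  d   <- sweep(x_new, 2, x_ref)              # subtract the reference row from every row
  est <- drop(d %*% coef)                    # pooled log odds ratios
  se  <- sqrt(rowSums((d %*% vcov) * d))     # standard error of each contrast
  z   <- stats::qnorm(1 - (1 - level) / 2)
  data.frame(or    = exp(est),
             lower = exp(est - z * se),
             upper = exp(est + z * se))
}

# Toy check with made-up numbers: two "imputations", two coefficients.
coefs  <- list(c(a = 1, b = 2), c(a = 3, b = 2))
vcovs  <- list(diag(c(0.5, 0.1)), diag(c(0.5, 0.1)))
pooled <- pool_rubin(coefs, vcovs)
toy_or <- pooled_or(pooled$coef, pooled$vcov,
                    x_new = rbind(c(1, 0), c(0, 0)), x_ref = c(0, 0))
```

With the data from the question, coefs and vcovs could come from coef() and vcov() applied to the lrm fit for each value of .imp. Evaluating h() on a grid of tiktok_ban values would give x_new and h() at a chosen reference day would give x_ref, assuming the columns of h() line up with the tiktok_ban coefficients; that ordering should be checked against names(coef(f)) before subsetting. The resulting or, lower and upper columns can then be plotted against the grid.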

Tags: regression, r. Source: "lrm from rms package and imputed data from mice", Stack Overflow.