regression - r lrm from rms package and imputed data from mice


I apologize for my ignorance. I'm jumping into an analysis midway and would appreciate guidance on the appropriate functions to use for estimating odds ratios (ORs).

Here’s the situation: I’ve been provided with a dataset that has already been imputed using the mice package. The original dataset with missing values is not available, and the imputed dataset appears to have been generated using mice::complete(imputed_df, "long", include = FALSE). This is my starting point.

Here is code that reproduces a dataset of this kind:

library(mice)
library(rms)
library(parameters)
library(splines)

set.seed(123) # For reproducibility
# Number of observations
n <- 100

# outcome
y <- rbinom(n, size = 1, prob = 0.07)

# bmi
bmi <- runif(n, min = 16, max = 60)
bmi[sample(1:n, size = round(0.12 * n))] <- NA # Introduce missing values

# tiktok_ban
tiktok_ban <- sample(1:90, size = n, replace = TRUE)
#tiktok_ban[sample(1:n, size = round(0.05 * n))] <- 0 # Force some values to be 0

# sex
child_sex <- sample(c("Male", "Female"), size = n, replace = TRUE, prob = c(0.49, 0.51))
child_sex[sample(1:n, size = round(0.05 * n))] <- NA # Introduce missing values

# Combine into a data frame
df <- data.frame(y, bmi, tiktok_ban, child_sex)

# View first rows of the dataset
head(df)

# Impute missing values with mice: 20 imputations, predictive mean matching
imputed_df <- mice(df, m = 20, seed = 24415, method = "pmm", printFlag = FALSE)

imputed_df_l <- mice::complete(imputed_df, "long", include = FALSE)

The goal is to estimate the odds ratio of an event (y).

I am modelling tiktok_ban as a nonlinear (restricted cubic) spline with a jump at day 20:

d    <- imputed_df_l
dd   <- datadist(d);  options(datadist='dd')


Hmisc::describe(imputed_df_l$tiktok_ban)

k  <- attr(rcs(imputed_df_l$tiktok_ban, 6), 'parms')
k

h <- function(x) {
  z <- cbind(rcspline.eval(x, k),
             jump=x >= 20)
  attr(z, 'nonlinear') <- 2 : ncol(z)
  z
}

I am able to estimate the ORs for this model on a single imputed dataset:

f <- lrm(y ~ child_sex + bmi +
             gTrans(tiktok_ban, h),
             data= subset(imputed_df_l, .imp == 1)) 

summary(f)

How do I scale this code to run on all of the imputed datasets, generate pooled odds ratios, and plot the odds ratios against tiktok_ban based on the final pooled results?

Please note that I cannot change the imputed dataset imputed_df_l; this is what I was given. Thanks in advance for any help.

I have tried fit.mult.impute but ran into errors and could not make it work for my use case.


Asked Nov 19, 2024 at 1:32 by Clifton Pinto; edited Nov 19, 2024 at 16:28 by jay.sf


1 Answer

I've done something similar before. It follows Rubin's rules^{1, 2}.
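For reference, Rubin's rules for a single scalar estimate can be written out as follows. This is the standard textbook form rather than anything specific to rms: m is the number of imputations, \hat{Q}_i is the estimate from imputation i, and U_i is its squared standard error (a variance, not the standard error itself).

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i
        = \frac{1}{m}\sum_{i=1}^{m}\widehat{\mathrm{SE}}_i^{\,2}, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^{2}, \qquad
T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
```

The pooled standard error is \sqrt{T}. For odds ratios, the pooling is normally done on the log-odds scale and the pooled estimate is exponentiated afterwards.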

> PF <- by(imputed_df_l, ~ .imp, \(x) {
+   summary(lrm(y ~ child_sex + bmi + gTrans(tiktok_ban, h), data=x))
+ }) |> 
+   simplify2array()
> 
> m. <- length(unique(imputed_df_l$.imp))
> Q <- rowMeans(PF[, 'Effect', ])  ## mean estimates (the 'Odds Ratio' rows average exp(effect))
> U <- rowMeans(PF[, 'S.E.', ])  ## mean of S.E.; Rubin's within variance is the mean of S.E.^2
> B <- rowSums((PF[, 'Effect', ] - Q)^2)/(m. - 1)  ## between variances
> T <- U + (1 + 1/m.)*B  ## total variances (requires U on the variance scale)
> cbind(Estimate=Q, 'Std. Error'=sqrt(T))
                             Estimate Std. Error
bmi                        -0.2392742  0.8981817
 Odds Ratio                 0.8175118         NA
tiktok_ban                  9.2942707  2.8305722
 Odds Ratio             10985.6518883         NA
child_sex - Male:Female     0.6758367  0.9786733
 Odds Ratio                 1.9660930         NA

I leave the CIs and the plot to you.
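One possible way to finish the CIs and the plot is sketched below, under several assumptions. It is not the rms workflow from the question: it swaps lrm/gTrans/rcspline.eval for base glm and splines::ns so that it runs without extra packages, it uses simulated stand-in data (`long`, the knots `kn`, and the reference day `ref` are all hypothetical), and it uses normal-approximation intervals instead of Rubin's degrees of freedom. The idea is to pool the whole coefficient vector and its covariance matrix, then form contrasts between each tiktok_ban value and the reference value.

```r
library(splines)  # ns(); ships with R

set.seed(42)
m <- 5      # number of imputations (hypothetical)
n <- 400    # rows per completed dataset (hypothetical)

# Hypothetical stand-in for a long-format mice::complete() result:
# y and tiktok_ban are fully observed, bmi differs between imputations.
base <- data.frame(tiktok_ban = sample(1:90, n, replace = TRUE),
                   bmi        = runif(n, 16, 60))
base$y <- rbinom(n, 1, plogis(-1.5 + 0.8 * (base$tiktok_ban >= 20)))
long <- do.call(rbind, lapply(seq_len(m), function(i) {
  d <- base
  d$bmi  <- d$bmi + rnorm(n, sd = 2)  # mimic between-imputation noise
  d$.imp <- i
  d
}))

# Fixed knots so every fit (and the prediction grid) shares one basis
kn   <- c(30, 50, 70)
bk   <- c(1, 90)
form <- y ~ bmi + ns(tiktok_ban, knots = kn, Boundary.knots = bk) +
  I(tiktok_ban >= 20)

# One logistic fit per completed dataset
fits <- lapply(split(long, long$.imp), function(d)
  glm(form, family = binomial, data = d))

# Rubin's rules for the whole coefficient vector
est  <- t(sapply(fits, coef))                 # m x p matrix of estimates
Qbar <- colMeans(est)                         # pooled coefficients
Ubar <- Reduce(`+`, lapply(fits, vcov)) / m   # mean within-imputation covariance
B    <- cov(est)                              # between-imputation covariance
Tmat <- Ubar + (1 + 1 / m) * B                # total covariance

# Coefficient-level odds ratios with normal-approximation 95% CIs
se       <- sqrt(diag(Tmat))
or_table <- exp(cbind(OR = Qbar, lower = Qbar - 1.96 * se,
                      upper = Qbar + 1.96 * se))

# Odds ratio curve: each tiktok_ban value versus a reference value
ref  <- 10
grid <- 1:90
tt   <- delete.response(terms(fits[[1]]))
Xg   <- model.matrix(tt, data.frame(tiktok_ban = grid, bmi = 30))
Xr   <- model.matrix(tt, data.frame(tiktok_ban = ref,  bmi = 30))
D    <- sweep(Xg, 2, Xr[1, ])                 # design-row contrasts vs. ref
log_or <- drop(D %*% Qbar)
se_or  <- sqrt(rowSums((D %*% Tmat) * D))     # sqrt of diag(D T D')

plot(grid, exp(log_or), type = "l", log = "y",
     xlab = "tiktok_ban", ylab = paste("Odds ratio vs. day", ref),
     ylim = range(exp(c(log_or - 1.96 * se_or, log_or + 1.96 * se_or))))
lines(grid, exp(log_or - 1.96 * se_or), lty = 2)
lines(grid, exp(log_or + 1.96 * se_or), lty = 2)
abline(v = 20, lty = 3)                       # location of the jump
```

With the real data, `long` would be imputed_df_l and child_sex would go back into the formula. Because bmi is held at the same value in both design rows, it cancels out of the contrast, so the curve depends only on the spline and jump terms.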

