Use KKT conditions to speed up mode search #12

Open
msuchard opened this issue Jan 8, 2015 · 4 comments

@msuchard
Member

msuchard commented Jan 8, 2015

Hi @schuemie and @pbr6cornell,

I just implemented (in a15bb66) a mode-search strategy that should be much faster than before on large problems. If you can generate a dataset with 100,000 or more covariates and many, many rows, please let me know your mileage so I can tinker a bit with performance. The R commands are:

    # massiveCyclopsData: a large cyclopsData object built from your own data
    slowFit <- fitCyclopsModel(massiveCyclopsData,
                               forceNewObject = TRUE, # Cold start for fair comparison
                               prior = createPrior("laplace"))

    fastFit <- fitCyclopsModel(massiveCyclopsData,
                               forceNewObject = TRUE, # Cold start for fair comparison
                               prior = createPrior("laplace"),
                               control = createControl(noiseLevel = "quiet",
                                                       useKKTSwindle = TRUE,
                                                       tuneSwindle = 10)) # Maybe try 50, 100 as well
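For readers unfamiliar with the idea in the issue title: in L1-penalized (Laplace-prior) regression, the Karush-Kuhn-Tucker (KKT) conditions say a coefficient held at zero is optimal only if the absolute gradient of the negative log-likelihood at that coordinate is at most the penalty λ (for a Laplace prior with variance v, λ = sqrt(2 / v)). So one can fit a small active set of covariates and then scan only the excluded ones for violations. The base-R sketch below is a hypothetical illustration of that idea, not Cyclops's implementation: it assumes logistic regression with no intercept, swaps in a simple proximal-gradient (ISTA) inner solver rather than Cyclops's own, and the names `kktSwindle`, `kktViolators`, `fitActive` and the `maxAdd` knob are invented here. Whether `tuneSwindle` controls anything like `maxAdd` is an assumption, not something this thread states.

```r
# Hypothetical sketch of a KKT-based active-set search for L1-penalized
# logistic regression (no intercept). Not Cyclops's implementation.

sigmoid <- function(z) 1 / (1 + exp(-z))

# Gradient of the logistic negative log-likelihood with respect to beta.
nllGradient <- function(X, y, beta) {
  as.vector(crossprod(X, sigmoid(X %*% beta) - y))
}

# Indices j with beta_j == 0 that violate the KKT condition |grad_j| <= lambda,
# ordered from the largest violation to the smallest.
kktViolators <- function(X, y, beta, lambda, tol = 1e-6) {
  g <- nllGradient(X, y, beta)
  idx <- which(beta == 0 & abs(g) > lambda + tol)
  idx[order(-abs(g[idx]))]
}

# Proximal-gradient (ISTA) updates restricted to the columns in `active`;
# every other coefficient stays fixed at zero.
fitActive <- function(X, y, lambda, active, beta, step, iters = 500) {
  Xa <- X[, active, drop = FALSE]
  for (i in seq_len(iters)) {
    b <- beta[active] - step * nllGradient(Xa, y, beta[active])
    beta[active] <- sign(b) * pmax(abs(b) - step * lambda, 0)  # soft-threshold
  }
  beta
}

# Outer loop: fit the active set, check KKT on all excluded covariates,
# admit up to `maxAdd` of the worst violators, and stop when none remain.
kktSwindle <- function(X, y, lambda, maxAdd = 10, maxRounds = 50) {
  beta <- numeric(ncol(X))
  # The loss Hessian is bounded by X'X / 4, and any column subset of X has a
  # spectral norm no larger than X's, so this step is safe for every active set.
  step <- 4 / norm(X, "2")^2
  active <- integer(0)
  for (iter in seq_len(maxRounds)) {
    viol <- setdiff(kktViolators(X, y, beta, lambda), active)
    if (length(viol) == 0) break  # KKT holds for every excluded covariate
    active <- sort(union(active, head(viol, maxAdd)))
    beta <- fitActive(X, y, lambda, active, beta, step)  # warm start
  }
  list(beta = beta, active = active)
}

# Usage on small simulated data: 3 true signals among 50 covariates.
set.seed(42)
X <- matrix(rnorm(200 * 50), 200, 50)
y <- rbinom(200, 1, sigmoid(X[, 1:3] %*% c(2, -2, 1.5)))
lambda <- 0.5 * max(abs(nllGradient(X, y, numeric(50))))
fit <- kktSwindle(X, y, lambda)
fit$active  # covariates admitted by the KKT checks
```

In this sketch, the potential saving is that the inner solver only touches the admitted columns, while a full pass over all covariates happens once per round. If the checks keep admitting covariates over many rounds, those repeated scans and refits become overhead rather than savings.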
@msuchard msuchard self-assigned this Jan 8, 2015
@schuemie
Member

Did a first assessment with a 'medium-sized' set (24k covariates, 450k rows), which was not a big success:
With the KKT swindle: 8.5 hours
Without the KKT swindle: 16 minutes
Will try with a much bigger set now.

@msuchard
Member Author

Well, that is certainly disappointing, and it runs counter to the small examples I played with, where using the swindle never took longer.

Can you share the data files? Or provide R commands that I can run against cdm_sim4 to generate something similar?

@msuchard
Member Author

The following shows an approximately 2-fold speed-up:

    library(Cyclops)
    library(testthat) # for expect_equal()

    seed <- 666
    tolerance <- 1E-4
    set.seed(seed) # make the simulated data reproducible

    data <- simulateCyclopsData(nstrata = 1,
                                nrows = 10000,
                                ncovars = 20000,
                                zeroEffectSizeProp = 0.99,
                                model = "logistic")

    cyclopsData <- convertToCyclopsData(data$outcomes,
                                        data$covariates,
                                        modelType = "lr",
                                        addIntercept = TRUE)

    slowFit <- fitCyclopsModel(cyclopsData,
                               forceNewObject = TRUE, # Cold start for fair comparison
                               prior = createPrior("laplace", variance = 0.01, exclude = c(0)))

    fastFit <- fitCyclopsModel(cyclopsData,
                               forceNewObject = TRUE, # Cold start for fair comparison
                               prior = createPrior("laplace", variance = 0.01, exclude = c(0)),
                               control = createControl(noiseLevel = "silent",
                                                       useKKTSwindle = TRUE,
                                                       tuneSwindle = 100))

    slowFit$timeFit
    fastFit$timeFit

    slowFit$timeLog # Shows an apparent inefficiency in moving coefficient estimates back into R
    fastFit$timeLog

    # Both strategies should reach the same mode
    expect_equal(coef(slowFit), coef(fastFit), tolerance = tolerance)

@msuchard msuchard added this to the release 1.1 milestone Jan 27, 2015
@msuchard msuchard added bug and removed bug labels Jan 27, 2015
@msuchard
Member Author

msuchard commented Jan 4, 2020

Still wondering about this...

@schuemie schuemie removed this from the release 1.1 milestone Jun 29, 2022