detecting databricks cluster is up #265

Open
mx2323 opened this issue Sep 10, 2024 · 8 comments

Comments

@mx2323

mx2323 commented Sep 10, 2024

when connecting to a databricks warehouse, if it is not serverless, it can take minutes to start up.

we would like a nonblocking way of detecting whether a databricks warehouse is up and ready to go. is there a best practice on doing this in a nonblocking manner with this sdk?

the API currently blocks statements and waits for the cluster to be ready... we could probably do something like a SELECT 1 and wait 1 second for it to complete (and cancel the operation if it doesn't succeed in 1 second), but was just curious if there is a better way to check whether the cluster is live and ready to go.

@kravets-levko
Collaborator

Hi @mx2323! Unfortunately, there's no way to check cluster state using this library. Please also note that if you run a query and then cancel it, the cluster will continue to start up. I need to check if there's anything that can help with your case. Will get back to you soon

@mx2323
Author

mx2323 commented Sep 12, 2024

thanks @kravets-levko for responding. we are OK if the cluster continues to start up, since we are waiting for it to start up anyway...

agree though, some kind of a definitive check or guidelines on how to check for readiness would be great!

@kravets-levko
Collaborator

@mx2323 there is a REST API endpoint you can use for this purpose: https://docs.databricks.com/api/workspace/clusters/get I have no prior experience with it, so you'd have to figure things out yourself. But if you struggle with it, feel free to ask, I'll do my best to help you

@kravets-levko
Collaborator

Actually, it turned out super simple:

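// Run as an ES module (top-level await) on Node.js 18+, where fetch is built in.
const host = '....';
const clusterId = '....';
const token = 'dapi....';

const params = new URLSearchParams({
  cluster_id: clusterId,
});

// Look up the cluster by ID via the Clusters REST API.
const response = await fetch(`https://${host}/api/2.1/clusters/get?${params}`, {
  method: 'GET',
  headers: {
    Authorization: `Bearer ${token}`,
  },
});

// Fail loudly on auth or lookup errors instead of reading an error body.
if (!response.ok) {
  throw new Error(`Request failed with HTTP ${response.status}`);
}

const data = await response.json();

// Print the reported cluster state, e.g. 'PENDING' or 'RUNNING'.
console.dir(data.state);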
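
To turn the single request above into a readiness check that does not block on a statement, one option is to poll the state on a timer. The following is only a sketch: `waitUntilRunning`, `getState`, `intervalMs` and `maxAttempts` are hypothetical names introduced here, and it assumes the API reports a ready cluster as the string 'RUNNING'.

```javascript
// Minimal polling sketch (not part of the SDK).
// `getState` is any async function that resolves to the current state string,
// for example a wrapper around the clusters/get request shown above.
async function waitUntilRunning(getState, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const state = await getState();
    // Assumption: 'RUNNING' is the state value that means "ready".
    if (state === 'RUNNING') {
      return true;
    }
    // Sleep between attempts, but not after the last one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  // Gave up: the cluster did not report RUNNING within the attempt budget.
  return false;
}
```

Because each attempt is a short HTTP request followed by a timer, the event loop stays free for other work while the cluster starts.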

@kravets-levko
Collaborator

For a SQL warehouse everything is the same, just use a different API endpoint: https://docs.databricks.com/api/workspace/warehouses/get
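
For illustration, a warehouse lookup might look like the sketch below. It assumes the endpoint path `/api/2.0/sql/warehouses/{id}` and a `state` field in the response, as described on the linked documentation page; `getWarehouseState` and the `fetchImpl` parameter are hypothetical names added here so the request can be stubbed in tests.

```javascript
// Sketch: read the state of a SQL warehouse through the REST API.
// Assumptions: the path /api/2.0/sql/warehouses/{id} and the `state` field
// follow the linked API docs; `fetchImpl` defaults to the built-in fetch (Node.js 18+).
async function getWarehouseState({ host, warehouseId, token, fetchImpl = fetch }) {
  const url = `https://${host}/api/2.0/sql/warehouses/${encodeURIComponent(warehouseId)}`;

  const response = await fetchImpl(url, {
    method: 'GET',
    headers: {
      Authorization: `Bearer ${token}`,
    },
  });

  // Surface auth and lookup failures instead of returning undefined.
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }

  const data = await response.json();
  return data.state;
}
```

Usage would be along the lines of `const state = await getWarehouseState({ host, warehouseId, token });`, with the result compared against the state values listed in the API docs.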

@kravets-levko
Collaborator

TBH - I don't know what people usually do. As for SQL Warehouses, the startup is usually quite fast (rarely more than a minute on all instances I use for testing, often even 20-30s). Compute clusters indeed take a significant amount of time to start, but that's expected. And, of course, for both warehouses and clusters you can disable auto-stop, and they will remain running until manually stopped

@mx2323
Author

mx2323 commented Sep 12, 2024

are there any issues with submitting a query of SELECT 1, setting a timer for it to complete in 5 seconds, and cancelling if it doesn't complete in that time? that would be easier for us since it wouldn't require us to redo auth through the REST api.... in the positive case, it'd just return quickly, and in the negative case it'd wait 5 seconds, which would be OK since we are just waiting.
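
A rough sketch of that idea, for reference. It assumes `session` is a session object from this SDK exposing `executeStatement`, and that the returned operation exposes `fetchAll`, `cancel` and `close`; `isComputeReady` and `timeoutMs` are hypothetical names. Since `executeStatement` itself may not return until the compute is up, the operation can only be cancelled once a handle exists.

```javascript
// Sketch: race a `SELECT 1` probe against a timer.
// Resolves to true if the probe finishes in time, false if the timer wins.
async function isComputeReady(session, timeoutMs = 5000) {
  let operation;
  let timedOut = false;
  let timer;

  const probe = (async () => {
    operation = await session.executeStatement('SELECT 1');
    if (timedOut) {
      // The deadline passed while we were waiting for a handle: cancel it now.
      await operation.cancel();
      return false;
    }
    await operation.fetchAll();
    await operation.close();
    return true;
  })();
  // Swallow late failures (for example, a rejection caused by the cancel below).
  probe.catch(() => {});

  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => {
      timedOut = true;
      // Cancel only if executeStatement has already returned a handle.
      if (operation) {
        operation.cancel().catch(() => {});
      }
      resolve(false);
    }, timeoutMs);
  });

  try {
    return await Promise.race([probe, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

Errors raised by the probe before the deadline propagate to the caller, so a misconfigured connection is not silently reported as "not ready yet".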

@kravets-levko
Collaborator

@mx2323 you can do it if it works for you. You can also use the queryTimeout option instead of a timer. Just keep in mind that this option doesn't work with SQL Warehouses, only with clusters. See #167 (comment) and #167 (comment)
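
A sketch of the `queryTimeout` variant, under assumptions: the option is passed in the second argument of `executeStatement`, its value is in seconds, and an expired timeout surfaces as a rejected promise. `probeWithQueryTimeout` and `timeoutSeconds` are hypothetical names. As noted above, this would only apply to clusters, not SQL Warehouses.

```javascript
// Sketch: let the server enforce the deadline via queryTimeout (clusters only).
// Assumptions: queryTimeout is given in seconds, and a timed-out statement
// causes executeStatement or fetchAll to reject.
async function probeWithQueryTimeout(session, timeoutSeconds = 5) {
  let operation;
  try {
    operation = await session.executeStatement('SELECT 1', {
      queryTimeout: timeoutSeconds,
    });
    await operation.fetchAll();
    return true;
  } catch (err) {
    // Any failure, including a timeout, is treated as "not ready".
    return false;
  } finally {
    if (operation) {
      // Best-effort cleanup; ignore errors from closing a failed operation.
      await operation.close().catch(() => {});
    }
  }
}
```

Treating every error as "not ready" keeps the probe simple, at the cost of hiding real failures such as bad credentials, so logging `err` may be worthwhile in practice.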


2 participants