This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

add gcp storage to xgboost-operator #81

Open
wants to merge 37 commits into master
Conversation

xfate123
Contributor

Thinking about adding another storage option for our xgboost-operator. Still working on it.

@kubeflow-bot

This change is Reviewable

@k8s-ci-robot

Hi @xfate123. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign johnugeorge
You can assign the PR to them by writing /assign @johnugeorge in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Draft updated. I would appreciate further review.
'feature_importance.json')

gcp_path = gcp_parameters['path']
logger.info('---- export model ----')
Contributor

Export model to GCP?

Contributor Author

Yes, it's to GCP

Contributor

@merlintang merlintang left a comment

Also update the YAML and the README to help users as well.

fscore_dict = booster.get_fscore()
with open(feature_importance, 'w') as file:
file.write(json.dumps(fscore_dict))
logger.info('---- chief dump model successfully!')
Contributor

Dump model to local?

Contributor Author

I learnt it from the dump-to-OSS module. I think the logic is to dump the model locally first, then upload it from local to the cloud.

upload_gcp(gcp_parameters, model_fname, aux_path)
upload_gcp(gcp_parameters, text_model_fname, aux_path)
upload_gcp(gcp_parameters, feature_importance, aux_path)
else:
Contributor

Add a log to say that the model was uploaded successfully?

Contributor Author

for sure
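The dump-then-upload flow discussed in this thread, including the success log the reviewer asked for, could look roughly like this. This is a hypothetical sketch: `upload_gcp`'s real signature in the PR and the exact keys in `gcp_parameters` (e.g. `access_bucket`) are assumptions based on the surrounding comments, not confirmed code.

```python
import logging

logger = logging.getLogger(__name__)

def upload_gcp(gcp_parameters, local_file, gcp_path):
    """Upload a locally dumped model artifact to a GCS bucket.

    Hypothetical sketch; the helper in this PR may differ. Assumes
    gcp_parameters carries an 'access_bucket' key as discussed above.
    """
    # imported lazily so the module still loads when google-cloud-storage
    # is not installed (e.g. for OSS-only users)
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(gcp_parameters['access_bucket'])
    blob = bucket.blob('%s/%s' % (gcp_path, local_file))
    blob.upload_from_filename(local_file)
    # per the review feedback above: log that the upload succeeded
    logger.info('---- model %s uploaded to gs://%s/%s successfully!',
                local_file, gcp_parameters['access_bucket'], gcp_path)
```

The model would first be dumped to a local file (as in the OSS module), then each artifact (model, text dump, feature importance) passed through this helper.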

@xfate123
Contributor Author

@merlintang I updated the README for users' convenience, and also specified the yaml for OSS users and GCP users. I would appreciate further review.

@@ -41,15 +46,36 @@ For Eg:
--oss_param=endpoint:http://oss-ap-south-1.aliyuncs.com,access_id:XXXXXXXXXXX,access_key:XXXXXXXXXXXXXXXXXXX,access_bucket:XXXXXX
Similarly, xgboostjob_v1alpha1_iris_predict.yaml is used to configure XGBoost job batch prediction.

**Configure GCP parameter**
For training jobs in GCP , you could configure xgboostjob_v1alpha1_iris_train.yaml and xgboostjob_v1alpha1_iris_predict.yaml
Contributor

the yaml file name is correct.

For training jobs in GCP , you could configure xgboostjob_v1alpha1_iris_train.yaml and xgboostjob_v1alpha1_iris_predict.yaml
Note, we use [GCP](https://cloud.google.com/) to store the trained model,
thus, you need to specify the GCP parameter in the yaml file. Therefore, remember to fill the GCP parameter in xgboostjob_v1alpha1_iris_train.yaml and xgboostjob_v1alpha1_iris_predict.yaml file.
The oss parameter includes the account information such as type, client_id, client_email,private_key_id,private_key and access_bucket.
Contributor

oss?
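By analogy with the `--oss_param` format shown in the README (comma-separated `key:value` pairs), the `--gcp_param` string could be parsed like this. `parse_gcp_param` is a hypothetical helper for illustration, not code from this PR:

```python
def parse_gcp_param(param_str):
    """Split a 'key1:value1,key2:value2' flag value into a dict.

    Hypothetical helper mirroring the --oss_param convention. Note that
    a real private_key value contains commas and newlines, so it would
    need to be delivered another way (e.g. a mounted Kubernetes secret)
    rather than inline in this flag.
    """
    params = {}
    for pair in param_str.split(','):
        # partition on the first ':' only, so URL-like values survive
        key, _, value = pair.partition(':')
        params[key.strip()] = value.strip()
    return params

gcp = parse_gcp_param(
    'access_bucket:my-bucket,client_email:svc@example.iam.gserviceaccount.com')
```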

spec:
containers:
- name: xgboostjob
image: docker.io/merlintang/xgboost-dist-iris:1.1
Contributor

The image name is not correct; you need to build a new image with the new code.

Contributor Author

for sure, thanks for your advice

Contributor Author

Just to double check: you mean build a new image with the new python code, and update the new image in all yaml files in this folder.
Do I understand correctly?

@merlintang
Contributor

merlintang commented May 16, 2020 via email

@xfate123
Contributor Author

@merlintang I already created the new image and updated it in all the yaml files; it still needs further testing.
I am also thinking about a better way for users to add arguments, so we don't need so many different yaml files for different storage types.
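One way to realize that idea is a single entrypoint whose flags select the storage backend, instead of one manifest per storage type. A rough argparse sketch, with flag names taken from the manifests in this PR (the `choices` lists and defaults are assumptions):

```python
import argparse

parser = argparse.ArgumentParser(description='xgboost-dist entrypoint sketch')
parser.add_argument('--job_type', choices=['Train', 'Predict'], required=True)
parser.add_argument('--model_path', required=True)
parser.add_argument('--model_storage_type',
                    choices=['local', 'oss', 'gcp'], default='local')
parser.add_argument('--oss_param', default='')
parser.add_argument('--gcp_param', default='')

# one manifest can now switch backends by changing a single arg:
args = parser.parse_args(['--job_type', 'Predict',
                          '--model_path', 'autoAI/xgb-opt/2',
                          '--model_storage_type', 'gcp'])
```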

@merlintang
Contributor

Change the PR title; you still have work in progress.

@xfate123 xfate123 changed the title add gcp storage to xgboost-operator, still working on in add gcp storage to xgboost-operator[WIP] May 17, 2020
- --job_type=Predict
- --model_path=autoAI/xgb-opt/2
- --model_storage_type=gcp
- --gcp_param=unknown
Member

Why unknown here?

spec:
containers:
- name: xgboostjob
image: docker.io/xfate123/xgboost-dist-iris:1.1
Member

Use a local image

spec:
containers:
- name: xgboostjob
image: docker.io/xfate123/xgboost-dist-iris:1.1
Member

same here

imagePullPolicy: Always
args:
- --job_type=Predict
- --model_path=autoAI/xgb-opt/2
Member

can we simplify the model path?

@@ -42,7 +42,7 @@ spec:
claimName: xgboostlocal
containers:
- name: xgboostjob
image: docker.io/merlintang/xgboost-dist-iris:1.1
image: docker.io/xfate123/xgboost-dist-iris:1.1
Member

@terrytangyuan terrytangyuan May 17, 2020

Do we have a Dockerfile for this image in this repo?

Contributor Author

We only have the image in this repo.

@xfate123 xfate123 changed the title add gcp storage to xgboost-operator[WIP] add gcp storage to xgboost-operator May 17, 2020