
feat: ai-content-moderation plugin #11541

Merged

Conversation

shreemaan-abhishek (Contributor) commented Aug 30, 2024

Description

The content-moderation plugin processes the request body to check for toxicity and rejects the request if it exceeds the configured threshold.

In later PRs, other plugins like ai-prompt-decorator and ai-prompt-template can use functions from this plugin to ensure content moderation for requests proxied to LLMs.
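For reference, a minimal sketch of what such a configuration could look like, written as a Lua table; the field names follow the attribute table discussed later in this review, and all values, including the credentials, are illustrative placeholders rather than a confirmed schema:

```lua
-- Hypothetical plugin configuration (placeholder values, not real credentials).
local conf = {
    provider = {
        aws_comprehend = {
            access_key_id = "<access-key-id>",
            secret_access_key = "<secret-access-key>",
            region = "us-east-1",
        },
    },
    -- optional per-category thresholds, each between 0 and 1
    moderation_categories = {
        PROFANITY = 0.5,
    },
    -- overall toxicity threshold; request bodies scoring above it are rejected
    toxicity_level = 0.5,
}
```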

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@shreemaan-abhishek marked this pull request as ready for review August 30, 2024 16:32
@dosubot (bot) added the size:XL (This PR changes 500-999 lines, ignoring generated files), doc (Documentation things), and plugin labels Aug 30, 2024
@dosubot (bot) added the size:XXL (This PR changes 1000+ lines, ignoring generated files) label and removed the size:XL label Sep 2, 2024
@shreemaan-abhishek marked this pull request as draft September 2, 2024 07:39
@shreemaan-abhishek marked this pull request as ready for review September 10, 2024 08:33
@@ -334,6 +335,26 @@ function _M.get_body(max_size, ctx)
end


function _M.get_body_table()
shreemaan-abhishek (Contributor, Author):

ai-proxy PR also has this code so later we can merge from master after ai-proxy is merged.

Contributor:

note that the name of the method there changed :D
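For readers skimming the diff, a minimal sketch of what a body-parsing helper of this kind typically looks like; the code below is illustrative only (the real helper appears to live in APISIX's core request module and, as noted above, its name later changed in the ai-proxy PR):

```lua
local core = require("apisix.core")

local _M = {}

-- Illustrative sketch: read the raw request body and decode it as JSON,
-- returning the decoded table, or nil plus an error message on failure.
function _M.get_body_table()
    local body, err = core.request.get_body()
    if not body then
        return nil, "failed to read request body: " .. (err or "empty body")
    end

    local body_tab, decode_err = core.json.decode(body)
    if not body_tab then
        return nil, "failed to decode request body as JSON: " .. decode_err
    end

    return body_tab
end

return _M
```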


The `ai-content-moderation` plugin processes the request body to check for toxicity and rejects the request if it exceeds the configured threshold.

**_This plugin must be used in routes that proxy requests to LLMs only._**
Member:

Just routes? no services?

Or do you just want to stress the upstream should be LLM providers.

Member:

on routes?

shreemaan-abhishek (Contributor, Author):

> Or do you just want to stress the upstream should be LLM providers.

this.

Member:

In that case I actually think this sentence is redundant. How about just mentioning "It is used when integrating with LLMs." in the paragraph above?

"ai-proxy": {
"auth": {
"header": {
"Authorization": "Bearer token"
kayx23 (Member), Sep 27, 2024:

Suggested change:
- "Authorization": "Bearer token"
+ "Authorization": "Bearer <your-api-token>"

or

 "Authorization": "Bearer '"$OPENAI_API_KEY"'"

this form takes the token from an environment variable


```shell
curl http://127.0.0.1:9080/post -i -XPOST -H 'Content-Type: application/json' -d '{
"info": "<some very seriously profane message>"
```

kayx23 (Member), Sep 27, 2024:

Is this a dummy? Shouldn't the format to OpenAI be something like this?

    "messages": [
      { "role": "system", "content": "system prompt goes here" },
      { "role": "user", "content": "offensive user prompts" }
    ]

```
request body exceeds toxicity threshold
```

Send a request with normal request body:
Member:

Suggested change:
- Send a request with normal request body:
+ Send a request with compliant content in the request body:

Send a request with normal request body:

```shell
curl http://127.0.0.1:9080/post -i -XPOST -H 'Content-Type: application/json' -d 'APISIX is wonderful'
```
kayx23 (Member), Sep 27, 2024:

The opening paragraph says "This plugin must be used in routes that proxy requests to LLMs only", yet the example does not involve proxying to an LLM. It feels a bit self-conflicting.

The example actually demonstrates that the integration could be used for general-purpose checking of requests that are NOT proxied to an LLM.

| provider.aws_comprehend.secret_access_key | Yes | String | AWS secret access key |
| provider.aws_comprehend.region | Yes | String | AWS region |
| provider.aws_comprehend.endpoint | No | String | AWS Comprehend service endpoint. Must match the pattern `^https?://` |
| moderation_categories | No | Object | Configuration for moderation categories. Must be one of: PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT |
kayx23 (Member), Sep 29, 2024:

Suggested change:
- | moderation_categories | No | Object | Configuration for moderation categories. Must be one of: PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT |
+ | moderation_categories | No | Object | Key-value pairs of moderation category and their score. In each pair, the key should be one of the `PROFANITY`, `HATE_SPEECH`, `INSULT`, `HARASSMENT_OR_ABUSE`, `SEXUAL`, or `VIOLENCE_OR_THREAT`; and the value should be between 0 and 1 (inclusive). |

| provider.aws_comprehend.region | Yes | String | AWS region |
| provider.aws_comprehend.endpoint | No | String | AWS Comprehend service endpoint. Must match the pattern `^https?://` |
| moderation_categories | No | Object | Configuration for moderation categories. Must be one of: PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT |
| toxicity_level | No | Number | Threshold for overall toxicity detection. Range: 0 - 1. Default: 0.5 |
Member:

Suggested change:
- | toxicity_level | No | Number | Threshold for overall toxicity detection. Range: 0 - 1. Default: 0.5 |
+ | toxicity_level | No | Number | The degree to which content is harmful, offensive, or inappropriate. A higher value indicates more toxic content allowed. Range: 0 - 1. Default: 0.5 |
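To make the two thresholds discussed above concrete, here is a simplified, hypothetical sketch of the kind of decision the plugin could make with the scores returned by the moderation provider; the function and field names (`should_reject`, `results`, `Name`, `Score`) are assumptions for illustration, not the plugin's actual code:

```lua
-- Hypothetical helper: decide whether to reject a request, given a list of
-- per-category results ({ Name = "<CATEGORY>", Score = <0..1> }) and an
-- overall toxicity score, both assumed to come from the moderation provider.
local function should_reject(conf, results, toxicity)
    -- per-category thresholds apply only to the categories the user configured
    if conf.moderation_categories then
        for _, label in ipairs(results) do
            local threshold = conf.moderation_categories[label.Name]
            if threshold and label.Score > threshold then
                return true, "request body exceeds " .. label.Name .. " threshold"
            end
        end
    end

    -- otherwise fall back to the overall toxicity threshold (default 0.5)
    local toxicity_level = conf.toxicity_level or 0.5
    if toxicity > toxicity_level then
        return true, "request body exceeds toxicity threshold"
    end

    return false
end
```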

@kayx23 mentioned this pull request Sep 30, 2024

**_This plugin must be used in routes that proxy requests to LLMs only._**

**_As of now only the AWS Comprehend service is supported for content moderation, PRs for introducing support for other service providers are welcome._**
Member:

Suggested change:
- **_As of now only the AWS Comprehend service is supported for content moderation, PRs for introducing support for other service providers are welcome._**
+ **_As of now, the plugin only supports the integration with [AWS Comprehend](https://aws.amazon.com/comprehend/) for content moderation. PRs for introducing support for other service providers are welcomed._**

function _M.check_schema(conf)
    return core.schema.check(schema, conf)
end

Contributor:

Two blank lines between functions?

shreemaan-abhishek (Contributor, Author):

fixed.

type = "object",
properties = {
provider = {
type = "object",
Contributor:

Suggested change:
- type = "object",
+ type = "object",
+ maxProperties = 1,

To make sure next(conf.provider) always returns aws_comprehend

shreemaan-abhishek (Contributor, Author):

done

return bad_request, "messages not found in request body"
end

local provider = conf.provider[next(conf.provider)]
Contributor:

The current schema definition does not prevent multiple provider properties from being configured by mistake. Consider adding a `maxProperties = 1` constraint to the schema.
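A short Lua snippet illustrating why `maxProperties = 1` matters here: `next()` on the `provider` table is only guaranteed to pick the intended provider when there is exactly one key (the `some_other_provider` key below is purely hypothetical):

```lua
-- With exactly one provider configured, next() reliably yields that provider.
local conf = { provider = { aws_comprehend = { region = "us-east-1" } } }
local name = next(conf.provider)       -- "aws_comprehend"
local provider = conf.provider[name]   -- the aws_comprehend settings table
print(name, provider.region)

-- If the schema allowed a second (hypothetical) provider, next() would return
-- an arbitrary key, since Lua table iteration order is undefined, and the
-- plugin could silently pick the wrong one. maxProperties = 1 rules this out.
local bad_conf = { provider = { aws_comprehend = {}, some_other_provider = {} } }
print(next(bad_conf.provider))         -- either key; order is not guaranteed
```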

zhoujiexiong (Contributor):

LGTM

@bzp2010 self-requested a review October 9, 2024 03:31
@Revolyssup self-requested a review October 9, 2024 19:08
@shreemaan-abhishek merged commit 695ea3c into apache:master Oct 10, 2024
33 checks passed
@shreemaan-abhishek deleted the content-moderation branch October 10, 2024 15:41