
feat: ai-content-moderation plugin #11541

Merged

Conversation

shreemaan-abhishek (Contributor) commented Aug 30, 2024

Description

The content-moderation plugin processes the request body to check for toxicity and rejects the request if it exceeds the configured threshold.

In later PRs, other plugins like ai-prompt-decorator and ai-prompt-template can use functions from this plugin to ensure content moderation for requests proxied to LLMs.
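For reference, a minimal sketch of what such a configuration could look like, written as a Lua table; the field names follow the attribute table discussed later in this review, and all values, including the credentials, are illustrative placeholders rather than a confirmed schema:

```lua
-- Hypothetical plugin configuration (placeholder values, not real credentials).
local conf = {
    provider = {
        aws_comprehend = {
            access_key_id = "<access-key-id>",
            secret_access_key = "<secret-access-key>",
            region = "us-east-1",
        },
    },
    -- optional per-category thresholds, each between 0 and 1
    moderation_categories = {
        PROFANITY = 0.5,
    },
    -- overall toxicity threshold; request bodies scoring above it are rejected
    toxicity_level = 0.5,
}
```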

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@shreemaan-abhishek marked this pull request as ready for review August 30, 2024 16:32
@dosubot (bot) added the size:XL (This PR changes 500-999 lines, ignoring generated files), doc (Documentation things), and plugin labels Aug 30, 2024
@dosubot (bot) added the size:XXL (This PR changes 1000+ lines, ignoring generated files) label and removed the size:XL label Sep 2, 2024
@shreemaan-abhishek marked this pull request as draft September 2, 2024 07:39
@shreemaan-abhishek marked this pull request as ready for review September 10, 2024 08:33
@@ -334,6 +335,26 @@ function _M.get_body(max_size, ctx)
end


function _M.get_body_table()
shreemaan-abhishek (Contributor, Author):

ai-proxy PR also has this code so later we can merge from master after ai-proxy is merged.

Contributor:

note that the name of the method there changed :D
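For readers skimming the diff, a minimal sketch of what a body-parsing helper of this kind typically looks like; the code below is illustrative only (the real helper appears to live in APISIX's core request module and, as noted above, its name later changed in the ai-proxy PR):

```lua
local core = require("apisix.core")

local _M = {}

-- Illustrative sketch: read the raw request body and decode it as JSON,
-- returning the decoded table, or nil plus an error message on failure.
function _M.get_body_table()
    local body, err = core.request.get_body()
    if not body then
        return nil, "failed to read request body: " .. (err or "empty body")
    end

    local body_tab, decode_err = core.json.decode(body)
    if not body_tab then
        return nil, "failed to decode request body as JSON: " .. decode_err
    end

    return body_tab
end

return _M
```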


The `ai-content-moderation` plugin processes the request body to check for toxicity and rejects the request if it exceeds the configured threshold.

**_This plugin must be used in routes that proxy requests to LLMs only._**
Member:

Just routes? no services?

Or do you just want to stress the upstream should be LLM providers.

Member:

on routes?

shreemaan-abhishek (Contributor, Author):

> Or do you just want to stress the upstream should be LLM providers.

this.

Member:

In that case I actually think this sentence is redundant. How about just mentioning "It is used when integrating with LLMs." in the paragraph above?

"ai-proxy": {
"auth": {
"header": {
"Authorization": "Bearer token"
kayx23 (Member), Sep 27, 2024:

Suggested change:
- "Authorization": "Bearer token"
+ "Authorization": "Bearer <your-api-token>"

or

 "Authorization": "Bearer '"$OPENAI_API_KEY"'"

this form takes the token from an environment variable


```shell
curl http://127.0.0.1:9080/post -i -XPOST -H 'Content-Type: application/json' -d '{
"info": "<some very seriously profane message>"
```

kayx23 (Member), Sep 27, 2024:

Is this a dummy? Shouldn't the format to OpenAI be something like this?

    "messages": [
      { "role": "system", "content": "system prompt goes here" },
      { "role": "user", "content": "offensive user prompts" }
    ]

```
request body exceeds toxicity threshold
```

Send a request with normal request body:
Member:

Suggested change:
- Send a request with normal request body:
+ Send a request with compliant content in the request body:

Send a request with normal request body:

```shell
curl http://127.0.0.1:9080/post -i -XPOST -H 'Content-Type: application/json' -d 'APISIX is wonderful'
```
kayx23 (Member), Sep 27, 2024:

The opening paragraph says "This plugin must be used in routes that proxy requests to LLMs only", yet the example does not involve proxying to an LLM. It feels a bit self-conflicting.

The example actually demonstrates that the integration could be used for general-purpose checking of requests that are NOT proxied to an LLM.

| provider.aws_comprehend.secret_access_key | Yes | String | AWS secret access key |
| provider.aws_comprehend.region | Yes | String | AWS region |
| provider.aws_comprehend.endpoint | No | String | AWS Comprehend service endpoint. Must match the pattern `^https?://` |
| moderation_categories | No | Object | Configuration for moderation categories. Must be one of: PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT |
kayx23 (Member), Sep 29, 2024:

Suggested change:
- | moderation_categories | No | Object | Configuration for moderation categories. Must be one of: PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT |
+ | moderation_categories | No | Object | Key-value pairs of moderation category and their score. In each pair, the key should be one of the `PROFANITY`, `HATE_SPEECH`, `INSULT`, `HARASSMENT_OR_ABUSE`, `SEXUAL`, or `VIOLENCE_OR_THREAT`; and the value should be between 0 and 1 (inclusive). |

| provider.aws_comprehend.region | Yes | String | AWS region |
| provider.aws_comprehend.endpoint | No | String | AWS Comprehend service endpoint. Must match the pattern `^https?://` |
| moderation_categories | No | Object | Configuration for moderation categories. Must be one of: PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, VIOLENCE_OR_THREAT |
| toxicity_level | No | Number | Threshold for overall toxicity detection. Range: 0 - 1. Default: 0.5 |
Member:

Suggested change:
- | toxicity_level | No | Number | Threshold for overall toxicity detection. Range: 0 - 1. Default: 0.5 |
+ | toxicity_level | No | Number | The degree to which content is harmful, offensive, or inappropriate. A higher value indicates more toxic content allowed. Range: 0 - 1. Default: 0.5 |
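To make the two thresholds discussed above concrete, here is a simplified, hypothetical sketch of the kind of decision the plugin could make with the scores returned by the moderation provider; the function and field names (`should_reject`, `results`, `Name`, `Score`) are assumptions for illustration, not the plugin's actual code:

```lua
-- Hypothetical helper: decide whether to reject a request, given a list of
-- per-category results ({ Name = "<CATEGORY>", Score = <0..1> }) and an
-- overall toxicity score, both assumed to come from the moderation provider.
local function should_reject(conf, results, toxicity)
    -- per-category thresholds apply only to the categories the user configured
    if conf.moderation_categories then
        for _, label in ipairs(results) do
            local threshold = conf.moderation_categories[label.Name]
            if threshold and label.Score > threshold then
                return true, "request body exceeds " .. label.Name .. " threshold"
            end
        end
    end

    -- otherwise fall back to the overall toxicity threshold (default 0.5)
    local toxicity_level = conf.toxicity_level or 0.5
    if toxicity > toxicity_level then
        return true, "request body exceeds toxicity threshold"
    end

    return false
end
```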

@kayx23 mentioned this pull request Sep 30, 2024

**_This plugin must be used in routes that proxy requests to LLMs only._**

**_As of now only the AWS Comprehend service is supported for content moderation, PRs for introducing support for other service providers are welcome._**
Member:

Suggested change:
- **_As of now only the AWS Comprehend service is supported for content moderation, PRs for introducing support for other service providers are welcome._**
+ **_As of now, the plugin only supports the integration with [AWS Comprehend](https://aws.amazon.com/comprehend/) for content moderation. PRs for introducing support for other service providers are welcomed._**

function _M.check_schema(conf)
    return core.schema.check(schema, conf)
end

Contributor:

Two blank lines between functions?

shreemaan-abhishek (Contributor, Author):

fixed.

type = "object",
properties = {
provider = {
type = "object",
Contributor:

Suggested change:
- type = "object",
+ type = "object",
+ maxProperties = 1,

To make sure next(conf.provider) always returns aws_comprehend

shreemaan-abhishek (Contributor, Author):

done

return bad_request, "messages not found in request body"
end

local provider = conf.provider[next(conf.provider)]
Contributor:

The current schema definition does not prevent multiple provider properties from being configured by mistake. Consider adding a `maxProperties = 1` constraint to the schema.
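A short Lua snippet illustrating why `maxProperties = 1` matters here: `next()` on the `provider` table is only guaranteed to pick the intended provider when there is exactly one key (the `some_other_provider` key below is purely hypothetical):

```lua
-- With exactly one provider configured, next() reliably yields that provider.
local conf = { provider = { aws_comprehend = { region = "us-east-1" } } }
local name = next(conf.provider)       -- "aws_comprehend"
local provider = conf.provider[name]   -- the aws_comprehend settings table
print(name, provider.region)

-- If the schema allowed a second (hypothetical) provider, next() would return
-- an arbitrary key, since Lua table iteration order is undefined, and the
-- plugin could silently pick the wrong one. maxProperties = 1 rules this out.
local bad_conf = { provider = { aws_comprehend = {}, some_other_provider = {} } }
print(next(bad_conf.provider))         -- either key; order is not guaranteed
```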

zhoujiexiong (Contributor):

LGTM

@bzp2010 self-requested a review October 9, 2024 03:31
@Revolyssup self-requested a review October 9, 2024 19:08
@shreemaan-abhishek merged commit 695ea3c into apache:master Oct 10, 2024
33 checks passed
@shreemaan-abhishek deleted the content-moderation branch October 10, 2024 15:41