Getting Started with GuardrailsOrchestrator

GuardrailsOrchestrator is a service for large language model guardrailing underpinned by the open-source project fms-guardrails-orchestrator. GuardrailsOrchestrator is a component of the TrustyAI Kubernetes Operator. In this tutorial, you will learn how to create a GuardrailsOrchestrator CR to perform detections on text generation output.

GuardrailsOrchestrator is available in RHOAI 2.19+ via KServe Raw Deployment mode.

In order to use it on Open Data Hub or OpenShift AI, first enable KServe Raw Deployment. In the DataScienceIntialization resource, set the value of managementState for the serviceMesh component to Removed.

---
serviceMesh:
auth:
    audiences:
    - 'https://kubernetes.default.svc'
controlPlane:
    metricsCollection: Istio
    name: data-science-smcp
    namespace: istio-system
managementState: Removed
---

Next, in the DataScienceCluster resource,under the spec.components section, set the value of of kserve.serving.managementState to Removed.

The GuardrailsOrchestrator Service

The GuardrailsOrchestrator service defines a new Custom Resource Definition named GuardrailsOrchestrator. GuardrailsOrchestrator objects are monitored by the TrustyAI Kubernetes operator. A GuardrailsOrchestrator object represents an orchestration service that invokes detectors on text generation input/output and standalone detections.

Here is a minimal example of a GuardrailsOrchestrator object:

---
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: gorch-sample
spec:
  orchestratorConfig: "fms-orchestr8-config-nlp" (1)
  replicas: 1 (2)
---

1	The orchestratorConfig field specifies a ConfigMap object that contains generator, detector, and chunker arguments.
2	The replicas field specifies the number of replicas for the orchestrator.

Here is a minimal example of an orchestratorConfig object:

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: fms-orchestr8-config-nlp
data:
  config.yaml: |
    generation: (1)
      service:
        hostname: llm-predictor.guardrails-test.svc.cluster.local
        port: 8033
    detectors: (2)
      hap <2.1>:
        service:
          hostname: http:/detector-host/api/v1/text/contents
          port: 8000
        chunker_id: whole_doc_chunker
        default_threshold: 0.5
---

1	The generation field specifies the hostname and port of the Large Language Model (LLM) predictor service.
2	The detectors field specifies the name, hostname, and port of the detector service, the chunker ID, and the default threshold. <2.1> The name of the detector. In this example, we are specifiying it as a Hateful and Profance (HAP) detector.

After you apply the example orchestratorConfig ConfigMap and GuardrailsOrchestrator CR above, you can guardrail against your LLM inputs and outputs:

Verify the orchestrator pod is up and running:

---
oc get pods -n <TEST_NAMESPACE> | grep gorch-sample
---

The expected output is:

---
gorch-sample-6776b64c58-xrxq9                    3/3     Running   0          4h19m
---

Retrieve the external HTTP route for the orchestrator:

---
GORCH_ROUTE_HTTP=$(oc get routes gorch-sample-http -o jsonpath='{.spec.host}' -n <TEST_NAMESPACE>)
---

Send a request to the v2/chat/completions-detection endpoint, specifying detections against HAP content in input text and generated outputs.

---
curl -X 'POST' \
 "https://$GORCH_ROUTE_HTTP/api/v2/chat/completions-detection" \
 -H 'accept: application/json' \
 -H 'Content-Type: application/json' \
 -d '{
   "model": "llm",
   "messages": [
       {
           "content": "You dotard, I really hate this stuff",
           "role": "user"
       }
   ],
   "detectors": {
       "input": {
           "hap": {}
       },
       "output": {
           "hap": {}
       }
   }
}'
---

Example output with HAP content detected:

---
{"id":"086980692dc1431f9c32cd56ba607067",
  "object":"",
  "created":1743084024,
  "model":"llm",
  "choices":[],"
  usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0},
  "detections":{
    "input":[{
      "message_index":0,
      "results":[{
        "start":0,"end":36,"text":"<explicit_text>, I really hate this stuff",
        "detection":"sequence_classifier",
        "detection_type": "sequence_classification",
        "detector_id":"hap",
        "score":0.9634239077568054
        }]
      }]
    },
  "warnings":[{
    "type":"UNSUITABLE_INPUT",
    "message":"Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed."
  }]
}
---

Details of GuardrailsOrchestrator

In this section, let’s review all the possible parameters for the GuardrailsOrchestrator object and their usage.

Parameter Description

Parameter	Description
`replicas`	The number of orchestrator pods to spin up
`orchestratorConfig`	The name of the ConfigMap object that contains generator, detector, and chunker arguments
`enableRegexDetectors (optional)`	Boolean value to inject the regex detector sidecar container into the orchestrator pod. The regex detector is a lightweight HTTP server designed to parse text using predefined patterns or custom regular expressions.
`enableGuardrailsGateway (optional)`	Boolean value to enable controlled interaction with the orchestrator service by enforcing stricter access to its exposed endpoints. It provides a mechanism of configuring fixed detector pipelines, and then provides a unique /v1/chat/completions endpoint per configured detector pipeline.
`guardrailsGatewayConfig (optional)`	The name of the ConfigMap object that specifies gateway configurations
`otelExporter (optional)`	List of paired name and value arguments for configuring OpenTelemetry traces and/or metrics `protocol` - sets the protocol for all the OTLP endpoints. Acceptable values are `grpc` or`http` `tracesProtocol` - overrides the protocol for traces. Acceptable values are `grpc` or `http` `metricsProtocol` - overrides the protocol for metrics. Acceptable values are either `grpc` or `http` `otlpEndpoint` - sets the OTLP endpoint. Defaults are `gRPC localhost:4317` and `HTTP localhost:4318` `metricsEndpoint` - overrides the OTLP endpoint for metrics `tracesEndpoint` - overrides the OTLP endpoint for traces `otlpExport` - specifies a list of data types to export. Acceptable values are `traces`, `metrics`, or `traces,metrics`

replicas

The number of orchestrator pods to spin up

orchestratorConfig

The name of the ConfigMap object that contains generator, detector, and chunker arguments

enableRegexDetectors (optional)

Boolean value to inject the regex detector sidecar container into the orchestrator pod. The regex detector is a lightweight HTTP server designed to parse text using predefined patterns or custom regular expressions.

enableGuardrailsGateway (optional)

Boolean value to enable controlled interaction with the orchestrator service by enforcing stricter access to its exposed endpoints. It provides a mechanism of configuring fixed detector pipelines, and then provides a unique /v1/chat/completions endpoint per configured detector pipeline.

guardrailsGatewayConfig (optional)

The name of the ConfigMap object that specifies gateway configurations

otelExporter (optional)

List of paired name and value arguments for configuring OpenTelemetry traces and/or metrics

protocol - sets the protocol for all the OTLP endpoints. Acceptable values are grpc or`http`
tracesProtocol - overrides the protocol for traces. Acceptable values are grpc or http
metricsProtocol - overrides the protocol for metrics. Acceptable values are either grpc or http
otlpEndpoint - sets the OTLP endpoint. Defaults are gRPC localhost:4317 and HTTP localhost:4318
metricsEndpoint - overrides the OTLP endpoint for metrics
tracesEndpoint - overrides the OTLP endpoint for traces
otlpExport - specifies a list of data types to export. Acceptable values are traces, metrics, or traces,metrics

Optional Configurations for GuardrailsOrchestrator

Configuring the Regex Detector and Guardrails Gateway

The regex detector and guardrails gateway are two sidecar images that can be used with the GuardrailsOrchestrator service, either individually or together. They are enabled via the GuardrailsOrchestrator CR.

Here is an example of a GuardrailsOrchestrator CR that references the regex detector and guardrails gateway images:

---
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: gorch-sample
spec:
  orchestratorConfig: "fms-orchestr8-config-nlp"
  enableBuiltInDetectors: True (1)
  enableGuardrailsGateway: True (2)
  guardrailsGatewayConfig: "fms-orchestr8-config-gateway" (3)
  replicas: 1
---

1	The enabledBuiltInDetectors, if set to True, injects regex detectors as a sidecar container into the orchestrator pod
2	The enableGuardrailsGateway, if set to True, injects the vLLM gateway as a sidecar conatiner into the orchestrator pod
3	The guardrailsGatewayConfig field specifies a ConfigMap that reroutes the orchestrator and regex detector routes to specific paths

Here is an example orchestratorConfig named fms-orchestr8-config-nlp. Please take note that it differs from the previous example:

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: fms-orchestr8-config-nlp
data:
  config.yaml: |
    chat_generation:
      service:
        hostname: llm-predictor.guardrails-test.svc.cluster.local
        port: 8032
    detectors:
      regex:
        type: text_contents
        service:
            hostname: "127.0.0.1"
            port: 8080
        chunker_id: whole_doc_chunker
        default_threshold: 0.5
---

Here is an example of a guardrailsGatewayConfig named fms-orchestr8-config-gateway:

---
kind: ConfigMap
apiVersion: v1
metadata:
  name: fms-orchestr8-config-gateway
  labels:
    app: fmstack-nlp
data:
  config.yaml: |
    orchestrator:
      host: "localhost"
      port: 8032
    detectors:
      - name: regex
        detector_params:
          input: true
          output: true
          regex:
            - email
            - ssn
      - name: other_detector
    routes:
      - name: pii
        detectors:
          - regex
      - name: passthrough
        detectors:
---

Let’s review all the required arguments for the guardrailsGatewayConfig:

Parameter Description

Parameter	Description
`orchestrator`	The orchestrator service
`detectors`	A list of preconfigured regexes for common detection actions
`routes`	The resulting endpoints for detections

orchestrator

The orchestrator service

detectors

A list of preconfigured regexes for common detection actions

routes

The resulting endpoints for detections

Send a request to the /v1/chat/completions endpoint, specifying detections against PII content in input text and generated outputs.

---
curl "https://$GORCH_ROUTE_HTTP/pii/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "say hello to me at someemail@somedomain.com"
            },
            {
                "role": "user",
                "content": "btw here is my social 123456789"
            }
        ]
    }'
---

Example output with PII content detected:

---
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "audio": null,
        "content": "I'm sorry, I'm afraid I can't do that.",
        "refusal": null,
        "role": "assistant",
        "tool_calls": null
      }
    }
  ],
  "created": 1741182848,
  "detections": {
    "input": null,
    "output": [
      {
        "choice_index": 0,
        "results": [
          {
            "detection": "EmailAddress",
            "detection_type": "pii",
            "detector_id": "regex-language",
            "end": 176,
            "score": 1.0,
            "start": 152,
            "text": "someemail@somedomain.com"
          }
        ]
      }
    ]
  },
  "id": "16a0abbf4b0c431e885be5cfa4ff1c4b",
  "model": "Qwen/Qwen2.5-1.5B-Instruct",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 83,
    "prompt_tokens": 61,
    "total_tokens": 144
  },
  "warnings": [
    {
      "message": "Unsuitable output detected.",
      "type": "UNSUITABLE_OUTPUT"
    }
  ]
}
---

Configuring the OpenTelemetry Exporter for Metrics & Tracing

Traces and metrics are provided for the observability of the GuardrailsOrchestrator service via the OpenTelemetry Operator.

Pre-requisites:

Install the Red Hat OpenShift distributed tracing platform from the OperatorHub. Create a Jaeger instance using the default settings.
Install the Red Hat build of OpenTelemetry from the OperatorHub. Create an OpenTelemetry instance

Here is a minimal example of a GuardrailsOrchestrator object that has OpenTelemetry configured:

---
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: gorch-test
spec:
  orchestratorConfig: "fms-orchestr8-config-nlp"
  replicas: 1
  otelExporter:
    protocol: "http"
    otlpEndpoint: "localhost:4318"
    otlpExport: "metrics"
---