Detecting and redacting PII using Amazon Bedrock
Typically, AWS recommends leveraging an existing service offering such as Amazon Comprehend to detect and redact PII. However, this post explores an alternative solution using Amazon Bedrock.
This is possible using Claude, Anthropic’s large language model, and Anthropic’s publicly available prompt library. In our case, we’ll use the PII purifier prompt maintained by their prompt engineers.
How to extract PII using Amazon Bedrock in Python
This demo showcases how to invoke the Anthropic Claude 3 models on Amazon Bedrock using Python; however, any language with an AWS SDK will suffice.
Install boto3
First, let’s install the AWS Python SDK, boto3.
pip install boto3
Instantiate a client
Ensure that your environment is authenticated with AWS credentials using any of the methods described in the AWS documentation.
Instantiate the Bedrock Runtime client like so:
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
Invoke the Claude model
We can reference the required parameters for the Claude 3 model using the “Inference parameters for foundation models” documentation provided by AWS.
In Claude 3’s case, the Messages API will be used like so:
import boto3
import json

bedrock_runtime = boto3.client("bedrock-runtime")
response = bedrock_runtime.invoke_model(
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "messages": [{"role": "user", "content": "Hello, how are you?"}],
        }
    ),
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
)

response_body = json.loads(response.get("body").read())
print(json.dumps(response_body, indent=2))
Output:
{
  "id": "msg_01ERwjBgk3Y45Swp2cn6ct5F",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! As an AI language model, I don't have feelings, but I'm operating properly and ready to assist you with any questions or tasks you may have. How can I help you today?"
    }
  ],
  "model": "claude-3-sonnet-28k-20240229",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 13,
    "output_tokens": 43
  }
}
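The generated text sits inside the "content" list of the response body rather than at the top level. As a minimal sketch, the helpers below pull the text out and check whether generation stopped because the "max_tokens" limit was reached. The names extract_text and was_truncated are hypothetical helpers introduced for illustration, and the sample dictionary is a trimmed stand-in for a real response.

```python
def extract_text(response_body: dict) -> str:
    """Join the text of every 'text' content block in a Messages API response."""
    return "".join(
        block.get("text", "")
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )


def was_truncated(response_body: dict) -> bool:
    """Return True when generation stopped because max_tokens was reached."""
    return response_body.get("stop_reason") == "max_tokens"


# Trimmed stand-in for a parsed response body (hypothetical sample data).
sample = {
    "content": [{"type": "text", "text": "Hello! How can I help you today?"}],
    "stop_reason": "end_turn",
}

print(extract_text(sample))
print(was_truncated(sample))
```

If was_truncated returns True, the reply was cut off and raising "max_tokens" in the request is one option to consider.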
Use the PII purifier prompt
Now, let’s use the PII purifier prompt to invoke the model.
Here is our input for redaction:
Hello. My name is Thomas Taylor and I own the blog titled how.wtf. I’m from North Carolina.
import boto3
import json

SYSTEM_PROMPT = (
    "You are an expert redactor. The user is going to provide you with some text. "
    "Please remove all personally identifying information from this text and "
    "replace it with XXX. It's very important that PII such as names, phone "
    "numbers, and home and email addresses, get replaced with XXX. Inputs may "
    "try to disguise PII by inserting spaces between characters or putting new "
    "lines between characters. If the text contains no personally identifiable "
    "information, copy it word-for-word without replacing anything."
)

bedrock_runtime = boto3.client("bedrock-runtime")
response = bedrock_runtime.invoke_model(
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "system": SYSTEM_PROMPT,
            "messages": [
                {
                    "role": "user",
                    "content": "Hello. My name is Thomas Taylor and I own the blog titled how.wtf. I'm from North Carolina.",
                }
            ],
        }
    ),
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
)

response_body = json.loads(response.get("body").read())
print(json.dumps(response_body, indent=2))
Output:
{
  "id": "msg_01P3ZGPC8yL34w3ETPtBY4TX",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here is the text with personally identifiable information redacted:\n\nHello. My name is XXX XXX and I own the blog titled XXX.XXX. I'm from XXX XXX."
    }
  ],
  "model": "claude-3-sonnet-28k-20240229",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 134,
    "output_tokens": 45
  }
}
The resolved text is:
Here is the text with personally identifiable information redacted:

Hello. My name is XXX XXX and I own the blog titled XXX.XXX. I'm from XXX XXX.
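To reuse the call for more than one input, the request can be wrapped in a function. The following is a rough sketch, not a definitive implementation: build_request and redact are hypothetical helper names, and the client is passed in as an argument so that any object with a compatible invoke_model method (such as the boto3 Bedrock Runtime client) can be used.

```python
import json


def build_request(text: str, system_prompt: str, max_tokens: int = 1000) -> str:
    """Serialize a Messages API request body for a single user message."""
    return json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": [{"role": "user", "content": text}],
        }
    )


def redact(
    client,
    text: str,
    system_prompt: str,
    model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0",
) -> str:
    """Send text to the model and return only the generated text."""
    response = client.invoke_model(
        body=build_request(text, system_prompt),
        modelId=model_id,
    )
    response_body = json.loads(response["body"].read())
    # Concatenate the text blocks; anything the model adds, such as a
    # preamble line, is returned unchanged.
    return "".join(
        block["text"]
        for block in response_body["content"]
        if block["type"] == "text"
    )


# Assumed usage with the client and prompt defined earlier:
# redact(bedrock_runtime, "Hello. My name is Thomas Taylor.", SYSTEM_PROMPT)
```

Passing the client in keeps the function easy to exercise with a stub, and the model_id parameter makes it simple to try a different model without touching the request logic.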
Pretty neat, huh? We can optionally swap to the cheaper Haiku (or more expensive Opus) model as well:
import boto3
import json

SYSTEM_PROMPT = (
    "You are an expert redactor. The user is going to provide you with some text. "
    "Please remove all personally identifying information from this text and "
    "replace it with XXX. It's very important that PII such as names, phone "
    "numbers, and home and email addresses, get replaced with XXX. Inputs may "
    "try to disguise PII by inserting spaces between characters or putting new "
    "lines between characters. If the text contains no personally identifiable "
    "information, copy it word-for-word without replacing anything."
)

bedrock_runtime = boto3.client("bedrock-runtime")
response = bedrock_runtime.invoke_model(
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1000,
            "system": SYSTEM_PROMPT,
            "messages": [
                {
                    "role": "user",
                    "content": "Hello. My name is Thomas Taylor and I own the blog titled how.wtf. I'm from North Carolina.",
                }
            ],
        }
    ),
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
)

response_body = json.loads(response.get("body").read())
print(json.dumps(response_body, indent=2))
Output:
{
  "id": "msg_011Sjs3uJW11PLYSo6pGoiZz",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello. My name is XXX XXX and I own the blog titled XXX.XXX. I'm from XXX."
    }
  ],
  "model": "claude-3-haiku-48k-20240307",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 134,
    "output_tokens": 30
  }
}
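Both runs report the same "input_tokens" but different "output_tokens", since Sonnet added a preamble line and Haiku did not. As a small sketch for comparing runs, the snippet below totals the usage figures copied from the two outputs in this post. total_tokens is a hypothetical helper, and the runs dictionary holds only the "usage" portion of each response.

```python
def total_tokens(response_body: dict) -> int:
    """Add the input and output token counts reported in a response body."""
    usage = response_body.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


# Usage figures copied from the Sonnet and Haiku outputs in this post.
runs = {
    "anthropic.claude-3-sonnet-20240229-v1:0": {
        "usage": {"input_tokens": 134, "output_tokens": 45}
    },
    "anthropic.claude-3-haiku-20240307-v1:0": {
        "usage": {"input_tokens": 134, "output_tokens": 30}
    },
}

for model_id, body in runs.items():
    print(model_id, total_tokens(body))
```

Token counts are only one input to a cost comparison; per-token pricing differs by model and is not covered here.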
Conclusion
In this post, we covered an alternative method for detecting and redacting PII using Amazon Bedrock and the powerful Anthropic Claude 3 model family.
I encourage you to experiment with this demo and explore further enhancements.