
Amazon Bedrock Runtime examples using boto3

Thomas Taylor


Amazon Bedrock is a managed AWS service that provides foundational models at your fingertips through a unified API. The service offers a range of features, including foundational model invocations, fine-tuning, agents, guardrails, knowledge base searching, and more!

To read more about the service offerings, refer to its documentation.

What is Amazon Bedrock Runtime?

In this article, we'll dive into using boto3, the AWS SDK for Python, to call foundational models. Amazon Bedrock Runtime is the API entry point for those model invocations.

As of the time of writing (December 16th, 2023), the supported API actions are:

  1. InvokeModel
  2. InvokeModelWithResponseStream
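
Each action maps one-to-one onto a snake_case method of the boto3 bedrock-runtime client: InvokeModel becomes invoke_model, and InvokeModelWithResponseStream becomes invoke_model_with_response_stream. As a quick sanity check, you can list the matching methods on the client (a minimal sketch, assuming your AWS credentials and region are already configured):

import boto3

# Both runtime actions surface as snake_case methods on the client
client = boto3.client("bedrock-runtime")
print([m for m in dir(client) if m.startswith("invoke")])
# expected: ['invoke_model', 'invoke_model_with_response_stream']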

How to call models using Python boto3

Installing boto3

Invoking a foundational model using the InvokeModel API call is easy in Python! To begin, ensure that you have boto3 installed:

pip3 install boto3

How to use the invoke_model method

Let’s instantiate the Amazon Bedrock Runtime client using boto3:

import boto3

client = boto3.client("bedrock-runtime")

Then look up the inference parameters needed for the body here. For the sake of this example, I'm invoking the Anthropic Claude model. Its inference parameters are:

{
    "prompt": "\n\nHuman:<prompt>\n\nAssistant:",
    "temperature": 0.5,
    "top_p": 1,
    "top_k": 250,
    "max_tokens_to_sample": 200,
    "stop_sequences": ["\n\nHuman:"]
}

The bare minimum request for Claude is:

{
    "prompt": "\n\nHuman:<prompt>\n\nAssistant:",
    "max_tokens_to_sample": 200
}

The body passed into the client must be bytes or a seekable file-like object, and for Claude the contentType is application/json.

import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Serialize the request body to JSON and encode it to bytes
body = json.dumps(
    {
        "prompt": "\n\nHuman:What is your name?\n\nAssistant:",
        "max_tokens_to_sample": 200,
    }
).encode()

response = client.invoke_model(
    body=body,
    modelId="anthropic.claude-v2",
    accept="application/json",
    contentType="application/json",
)

# response["body"] is a StreamingBody; read it and parse the JSON payload
response_body = json.loads(response["body"].read())
print(response_body)

Here is a breakdown of the code above:

  1. Instantiate the bedrock-runtime client for us-east-1
  2. Serialize the request body to JSON and encode it to bytes
  3. Call invoke_model with the body, model ID, and JSON content type
  4. Read the response body and parse it as JSON

For more information about StreamingBody, refer to its documentation here. We simply need to call the read method on the response body to get its contents.

Output:

{'completion': ' My name is Claude.', 'stop_reason': 'stop_sequence', 'stop': '\n\nHuman:'}

The Claude JSON output includes a completion attribute with the text.

Here’s how to grab that information:

import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps(
    {
        "prompt": "\n\nHuman:What is your name?\n\nAssistant:",
        "max_tokens_to_sample": 200,
    }
).encode()

response = client.invoke_model(
    body=body,
    modelId="anthropic.claude-v2",
    accept="application/json",
    contentType="application/json",
)

response_body = json.loads(response["body"].read())
# Extract and trim the generated text
completion = response_body["completion"].strip()
print(completion)

Output:

My name is Claude.
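
Putting the pieces together, it can be handy to wrap the call in a small helper function. Here's a minimal sketch; the ask_claude name and its defaults are my own illustration, not part of the Bedrock API:

import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")


def ask_claude(prompt: str, max_tokens: int = 200) -> str:
    # Hypothetical helper: formats the prompt, invokes Claude, parses the reply
    body = json.dumps(
        {
            "prompt": f"\n\nHuman:{prompt}\n\nAssistant:",
            "max_tokens_to_sample": max_tokens,
        }
    ).encode()
    response = client.invoke_model(
        body=body,
        modelId="anthropic.claude-v2",
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read())["completion"].strip()


print(ask_claude("What is your name?"))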

How to stream model responses using Python boto3

Amazon Bedrock allows streaming LLM responses as well!

How to use the invoke_model_with_response_stream method

Using the same example from above, let’s use the invoke_model_with_response_stream method:

import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps(
    {
        "prompt": "\n\nHuman:Write me a 100 word essay about snickers candy bars\n\nAssistant:",
        "max_tokens_to_sample": 200,
    }
).encode()

response = client.invoke_model_with_response_stream(
    body=body,
    modelId="anthropic.claude-v2",
    accept="application/json",
    contentType="application/json",
)

# Iterate over the EventStream, decoding each chunk's bytes as JSON
stream = response["body"]
if stream:
    for event in stream:
        chunk = event.get("chunk")
        if chunk:
            print(json.loads(chunk.get("bytes").decode()))

Output:

{'completion': ' Here', 'stop_reason': None, 'stop': None}
{'completion': ' is a 100 word essay', 'stop_reason': None, 'stop': None}
{'completion': ' about Snickers candy', 'stop_reason': None, 'stop': None}
{'completion': ' bars:\n\nS', 'stop_reason': None, 'stop': None}
{'completion': 'nickers is one of', 'stop_reason': None, 'stop': None}
{'completion': ' the most popular candy bars around. Introdu', 'stop_reason': None, 'stop': None}
{'completion': 'ced in 1930, it consists of nougat topped with', 'stop_reason': None, 'stop': None}
{'completion': ' caramel and peanuts that is encased in milk chocolate', 'stop_reason': None, 'stop': None}
{'completion': '. With its sweet and salty taste profile,', 'stop_reason': None, 'stop': None}
{'completion': ' Snickers provides the perfect balance of flavors. The candy', 'stop_reason': None, 'stop': None}
{'completion': " bar got its name from the Mars family's", 'stop_reason': None, 'stop': None}
{'completion': ' favorite horse. Bite', 'stop_reason': None, 'stop': None}
{'completion': ' into a Snickers and the rich', 'stop_reason': None, 'stop': None}
{'completion': ' chocolate and caramel intermingle in your mouth while the', 'stop_reason': None, 'stop': None}
{'completion': ' crunch of peanuts adds text', 'stop_reason': None, 'stop': None}
{'completion': 'ural contrast. Loaded with sugar, Snick', 'stop_reason': None, 'stop': None}
{'completion': 'ers gives you a quick burst of energy. It', 'stop_reason': None, 'stop': None}
{'completion': "'s a classic candy bar that has endured for", 'stop_reason': None, 'stop': None}
{'completion': ' decades thanks to its irresistible combination', 'stop_reason': None, 'stop': None}
{'completion': ' of chocolate,', 'stop_reason': None, 'stop': None}
{'completion': ' caramel, noug', 'stop_reason': None, 'stop': None}
{'completion': "at and peanuts. Snickers' popularity shows", 'stop_reason': None, 'stop': None}
{'completion': ' no signs of waning anytime soon.', 'stop_reason': 'stop_sequence', 'stop': '\n\nHuman:', 'amazon-bedrock-invocationMetrics': {'inputTokenCount': 21, 'outputTokenCount': 184, 'invocationLatency': 8756, 'firstByteLatency': 383}}

Instead of returning a StreamingBody like before, response["body"] is an EventStream that can be iterated over chunk by chunk.
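
To display the text as it arrives and still end up with the full completion, accumulate the fragments while iterating. Here's a minimal sketch (the prompt is my own example; everything else mirrors the streaming call above):

import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps(
    {
        "prompt": "\n\nHuman:Write me a haiku about candy\n\nAssistant:",
        "max_tokens_to_sample": 200,
    }
).encode()

response = client.invoke_model_with_response_stream(
    body=body,
    modelId="anthropic.claude-v2",
    accept="application/json",
    contentType="application/json",
)

# Print each completion fragment as it arrives and keep it for later
parts = []
for event in response["body"]:
    chunk = event.get("chunk")
    if chunk:
        payload = json.loads(chunk["bytes"].decode())
        fragment = payload.get("completion", "")
        parts.append(fragment)
        print(fragment, end="", flush=True)

print()
full_completion = "".join(parts)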

#aws   #python   #generative-ai  
