App Usage
Initial Setup
After downloading the app, tap the gear icon to open the settings and fill them in to match your backend configuration (see Webhook Configuration below).
App Settings
OpenAI API Key
Speech-to-text (STT) and text-to-speech (TTS) require an OpenAI API key. Generate one at https://platform.openai.com/api-keys (requires login and purchased credits).
Webhook URL
The URL of the registered webhook that the app calls to communicate with the custom agent.
Webhook Header Auth Value
The credential that secures the webhook. The authentication method is Header Auth, and the app sends this value in the Authorization header.
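On the backend side, a check of this header value might look like the following. This is only a minimal sketch, not the app's or n8n's actual implementation: the function name `is_authorized` and the constant `EXPECTED_AUTH_VALUE` are hypothetical, and the value is the example UUID from the Authentication section below.

```python
import hmac

# Hypothetical stored secret (assumption); in practice this would come
# from your backend's configuration, not from source code.
EXPECTED_AUTH_VALUE = "f077852d-44ac-42ae-a1df-2dbd94688333"


def is_authorized(headers, expected=EXPECTED_AUTH_VALUE):
    """Return True if the Authorization header matches the expected value."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("authorization", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

A request without the header, or with a different value, would be rejected before the agent is invoked.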
Send and Receive Messages
To speak to the custom AI agent, tap the microphone icon. When you have finished recording, tap the stop icon. It may take a few seconds until your voice has been transcribed, the text has been sent to the agent through the webhook, and the response has been processed. In the meantime, your original message is added to the chat history.
Finally, the spoken version of the response (the speech field of the webhook response) is read aloud, and the text version is added to the chat history.
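The webhook part of this round trip can be sketched roughly as follows, using only the Python standard library. The function names are hypothetical, the `Content-Type` header is an assumption, and the payload shape simply mirrors the example in the Payload section below. The app's real client code may differ.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, auth_value, prompt, session_id):
    """Build (but do not send) the POST request for one transcribed message."""
    payload = {"body": {"prompt": prompt, "sessionID": session_id}}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": auth_value,
            # Assumption: the backend expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def read_webhook_response(raw_bytes):
    """Split the webhook response into (text for the chat, text to speak)."""
    reply = json.loads(raw_bytes)["response"]
    return reply["text"], reply["speech"]
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and handing the response bytes to `read_webhook_response`.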
Interrupt Speech
While the response is being read aloud, tap the wave animation to interrupt the speech.
Reset Session
To start a new session, open the three-dot menu and tap “Start new Session”. This clears the chat history and generates a new session ID, which is sent to the custom AI agent.
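The reset behavior can be illustrated with a small sketch. The class `ChatSession` and its methods are hypothetical and only model the description above (clear the history, generate a fresh random UUID). They are not taken from the app's source.

```python
import uuid


class ChatSession:
    """Hypothetical model of the app's session state."""

    def __init__(self):
        self.history = []
        self.session_id = str(uuid.uuid4())

    def add_message(self, role, text):
        # Each entry records who spoke and what was said.
        self.history.append((role, text))

    def reset(self):
        # Clear the chat history and switch to a new random session ID.
        self.history.clear()
        self.session_id = str(uuid.uuid4())
```

Because the agent receives the session ID with each message, a new ID lets the backend treat the following messages as a separate conversation.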
Webhook Configuration
Request Method
HTTP Request Method: POST
Authentication
Header Auth is required.
Key: “Authorization”
Value: any string, e.g. a UUID
Example:
"authorization": "f077852d-44ac-42ae-a1df-2dbd94688333"
Payload
The request contains a JSON object:
- body.prompt contains the chat message transcribed from speech
- body.sessionID contains a random UUID that stays the same until the user resets the session in the app
Example:
{
"body": {
"prompt": "Could you block focus time for tomorrow between 2 and 4PM?",
"sessionID": "1456015c-0b90-4730-9116-3cb5165ae137"
}
}
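A backend that receives this payload might validate it along the following lines. This is a sketch under the assumption that the JSON arrives exactly in the shape shown in the example above. The function name `parse_payload` is hypothetical.

```python
import json
import uuid


def parse_payload(raw):
    """Return (prompt, session_id) from the raw JSON payload, or raise ValueError."""
    data = json.loads(raw)
    body = data.get("body") if isinstance(data, dict) else None
    if not isinstance(body, dict):
        raise ValueError("payload must contain a 'body' object")

    prompt = body.get("prompt")
    session_id = body.get("sessionID")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("body.prompt must be a non-empty string")
    if not isinstance(session_id, str):
        raise ValueError("body.sessionID must be a string")

    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    uuid.UUID(session_id)
    return prompt, session_id
```

The returned session ID can then serve as the key under which the agent stores conversation memory.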
Response
The response must be a JSON object:
- response.text is the message to display as text (supports Markdown)
- response.speech is the message to convert to speech
Example:
{
"response": {
"text": "Here is the event. Shall I create it?\n\n> **Focus Time** \n> September 20, 2024 \n> 3:00pm-4:00pm",
"speech": "Here is the event. Shall I create it?"
}
}
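Producing a response in this shape could look like the sketch below. The function name `build_response` is hypothetical. The only requirement taken from the description above is that both `text` and `speech` are present inside `response`.

```python
import json


def build_response(text, speech):
    """Serialize the agent's reply into the JSON shape the app expects."""
    if not text or not speech:
        raise ValueError("both text and speech are required")
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps({"response": {"text": text, "speech": speech}}, ensure_ascii=False)
```

Keeping `speech` free of Markdown and long details, as in the example above, avoids having formatting characters or lists read aloud.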
Multi-Agent Template for n8n
Start with a simple template that contains everything needed to connect it to other (existing) agents.
Setup n8n
Cloud version (affiliate): https://n8n.partnerlinks.io/e9nxy47g2jt2
Self-host: https://docs.n8n.io/hosting/
Clone & setup template
Copy the template from here and follow the instructions given on the canvas:
Click the orange link below the viewer to reveal the code and paste it into a new workflow in n8n.