App Usage

Initial Setup

After downloading the App, click the gear icon to open the settings and configure them to match your Backend (see below).

App Settings

OpenAI API Key

To use the speech-to-text (STT) and text-to-speech (TTS) functionality, you must provide an OpenAI API key. Generate a key here: https://platform.openai.com/api-keys (requires login and purchased credits).

Webhook URL

The URL of the registered Webhook that the App calls to communicate with the custom Agent.

Webhook Header Auth Value

A credential that secures the Webhook. The authentication method is Header Auth (Key: Authorization).

Send and Receive Messages

To speak to the custom AI Agent, tap the microphone icon. When you have finished recording, tap the stop icon. It may take a few seconds until your voice has been transcribed, the message has been sent to the Agent through the Webhook, and the response has been processed. In the meantime, your original message is added to the chat history.

Finally, the spoken version of the resulting message (defined in the Webhook response) is read out loud, and the text version is added to the chat history.
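
The round trip described above can be sketched as follows. This is a minimal illustration, not the App's actual implementation: the function name handle_recording and the three injected callables (transcribe, call_agent, speak) are hypothetical stand-ins for the STT call, the Webhook request, and the TTS playback. The order of the last two steps is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: the App's real STT, Webhook and TTS calls
# are not shown in this document.
Transcribe = Callable[[bytes], str]
CallAgent = Callable[[str, str], Dict[str, str]]
Speak = Callable[[str], None]


def handle_recording(
    audio: bytes,
    session_id: str,
    history: List[Tuple[str, str]],
    transcribe: Transcribe,
    call_agent: CallAgent,
    speak: Speak,
) -> None:
    """Run one voice round trip and update the chat history in place."""
    prompt = transcribe(audio)                 # speech to text
    history.append(("user", prompt))           # original message shows up first
    reply = call_agent(prompt, session_id)     # Webhook call, returns text and speech
    history.append(("agent", reply["text"]))   # text version goes to the chat history
    speak(reply["speech"])                     # spoken version is read out loud
```

Passing the three steps in as callables keeps the sketch independent of any particular STT, HTTP, or TTS library.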

Interrupt Speech

While the response is being read out loud, you can interrupt the speech by tapping the wave animation.

Reset Session

To start a new session, open the three-dot menu and select “Start new Session”. This clears the chat history and generates a new Session ID, which is sent to the custom AI Agent.
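
The reset behaviour can be pictured with a small sketch. The class name ChatSession and its fields are hypothetical, and the use of a version 4 UUID is an assumption based on the "random UUID" wording in the Payload section.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


def _new_session_id() -> str:
    # Assumption: a random (version 4) UUID, rendered as a string.
    return str(uuid.uuid4())


@dataclass
class ChatSession:
    """Hypothetical model of the App's session state."""

    session_id: str = field(default_factory=_new_session_id)
    history: List[str] = field(default_factory=list)

    def reset(self) -> None:
        """Clear the chat history and switch to a fresh Session ID."""
        self.history.clear()
        self.session_id = _new_session_id()
```

Because the Session ID is the only link between requests, a backend that keeps conversation memory would likely key it on this value.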

Webhook Configuration

Request Method

HTTP Request Method: POST

Authentication

Header Auth is required.

Key: “Authorization”

Value: String, e.g. a UUID

Example:

"authorization": "f077852d-44ac-42ae-a1df-2dbd94688333"
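
For a backend that validates the header itself, the check might look like the sketch below. The function name is_authorized is hypothetical. The lookup is case-insensitive because HTTP header names are case-insensitive, which also covers the "Authorization" and "authorization" spellings used above.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_value: str) -> bool:
    """Return True if the Authorization header matches the configured value."""
    # Normalize header names so "Authorization" and "authorization" both match.
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get("authorization", "")
    # Constant-time comparison, so timing does not reveal how much matched.
    return hmac.compare_digest(
        received.encode("utf-8"), expected_value.encode("utf-8")
    )
```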

Payload

The request contains a JSON object:

  • body.prompt contains the chat message converted from speech
  • body.sessionID contains a random UUID which stays the same until the user resets the session in the App

Example:

{
    "body": {
        "prompt": "Could you block focus time for tomorrow between 2 and 4PM?",
        "sessionID": "1456015c-0b90-4730-9116-3cb5165ae137"
    }
}
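
To test a Webhook without the App, a request of this shape could be built with Python's standard library, as sketched below. The function name build_webhook_request is hypothetical, and the Content-Type header is an assumption. The sketch also assumes the JSON is sent exactly as in the example; the outer "body" key may instead reflect how the receiving workflow exposes the request body, so verify this against your Backend.

```python
import json
import urllib.request


def build_webhook_request(
    url: str, auth_value: str, prompt: str, session_id: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like the example above."""
    # Assumption: the payload is sent literally as documented.
    payload = {"body": {"prompt": prompt, "sessionID": session_id}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": auth_value,         # Header Auth value
            "Content-Type": "application/json",  # assumed, not stated in the source
        },
        method="POST",
    )
```

Calling urllib.request.urlopen on the returned request sends it.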

Response

The response must be a JSON object:

  • response.text is the message that should be displayed as text (supports Markdown)
  • response.speech is the message that should be converted to speech

Example:   

{
    "response": {
        "text": "Here is the event. Shall I create it?\n\n> **Focus Time** \n> September 20, 2024 \n> 3:00pm-4:00pm",
        "speech": "Here is the event. Shall I create it?"
    }
}
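
A client or test script could validate a response of this shape as sketched below. The function name parse_agent_response is hypothetical, and treating both fields as mandatory strings is an assumption; the source does not say how the App reacts to a missing field.

```python
import json
from typing import Tuple


def parse_agent_response(raw: str) -> Tuple[str, str]:
    """Return (text, speech) from a Webhook response body."""
    data = json.loads(raw)
    response = data.get("response") if isinstance(data, dict) else None
    if not isinstance(response, dict):
        raise ValueError('missing "response" object')
    text = response.get("text")
    speech = response.get("speech")
    # Assumption: both fields are required and must be strings.
    if not isinstance(text, str) or not isinstance(speech, str):
        raise ValueError('"response.text" and "response.speech" must be strings')
    return text, speech
```

Keeping text and speech separate lets the Agent show Markdown details on screen while speaking only a short summary.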


   

Multi-Agent Template for n8n

Start with a simple template that contains all the fundamentals needed to connect it to other (existing) agents.

Set up n8n

Cloud version (affiliate): https://n8n.partnerlinks.io/e9nxy47g2jt2

Self-host: https://docs.n8n.io/hosting/

Clone & set up template

Copy the template from here and follow the instructions given on the canvas:

Click the orange link below the viewer to reveal the code and paste it into a new workflow in n8n.