Chat about a video clip using the powerful OpenAI GPT-4 Vision or GPT-4o.

`chat-about-video` is an open-source NPM package designed to accelerate the development of conversational applications about video content. Harnessing the capabilities of GPT-4 Vision or GPT-4o services from Microsoft Azure or OpenAI, this package opens up a range of usage scenarios with minimal effort.
Key features:
- ChatGPT models hosted in both Azure and OpenAI are supported.
- Frame images are extracted from the input video and uploaded for ChatGPT to consume.
- Throttling responses (HTTP status code 429) from the API are automatically retried.
- Options supported by the underlying API are exposed for customisation.
There are two approaches for feeding video content into GPT-4 Vision. `chat-about-video` supports both of them.
Frame image extraction:
- Integrate GPT-4 Vision or GPT-4o from Microsoft Azure or OpenAI effortlessly.
- Utilize ffmpeg integration provided by this package for frame image extraction or opt for a DIY approach.
- Store frame images with ease, supporting Azure Blob Storage and AWS S3.
- GPT-4 Vision hosted in Azure allows analysis of up to 10 frame images.
- GPT-4 Vision or GPT-4o hosted in OpenAI allows analysis of more than 10 frame images.
Video indexing with Microsoft Azure:
- Exclusively supported by GPT-4 Vision from Microsoft Azure.
- Ingest videos seamlessly into Microsoft Azure's Video Retrieval Index.
- Automatic extraction of up to 20 frame images using Video Retrieval Indexer.
- Default integration of speech transcription for enhanced comprehension.
- Flexible storage options with support for Azure Blob Storage and AWS S3.
Add `chat-about-video` as a dependency to your Node.js application using the following command:

```shell
npm i chat-about-video
```

If you intend to utilize ffmpeg for extracting video frame images, ensure it is installed on your system. You can install it using either a system package manager or a helper NPM package:

```shell
sudo apt install ffmpeg
# or
npm i @ffmpeg-installer/ffmpeg
```
If you plan to use Azure Blob Storage, include the following dependency:

```shell
npm i @azure/storage-blob
```

For using AWS S3, install the following dependencies:

```shell
npm i @handy-common-utils/aws-utils @aws-sdk/s3-request-presigner @aws-sdk/client-s3
```
To integrate `chat-about-video` into your Node.js application, follow these simple steps:

- Instantiate the `ChatAboutVideo` class by creating an instance. The constructor allows you to pass in configuration options.
  - Most configuration options come with sensible default values, but you can specify your own for further customization.
  - The second constructor argument is a logger. If not specified, a default logger will be created for logging to the console. If logging is not needed, you can pass in `undefined`.
- Use the `startConversation(videoFilePath)` function to initiate a conversation about a video clip. This function returns a `Conversation` object. The video file or its frame images are uploaded to Azure Blob Storage or AWS S3 during this step.
- Interact with GPT by using the `say(question, { maxTokens: 2000 })` function within the conversation. You pass in a question and receive an answer.
  - Message history is automatically kept during the conversation, providing context for a more coherent dialogue.
  - The second parameter of the `say(...)` function allows you to pass in options, such as `maxTokens`, for further customization of each request.
- Wrap up the conversation using the `end()` function. This ensures proper clean-up and resource management.
Below is an example chat application, which

- uses GPT deployment (in this example, it is named 'gpt4vision') hosted in Microsoft Azure;
- uses ffmpeg to extract video frame images;
- stores video frame images in Azure Blob Storage;
  - container name: 'vision-experiment-input'
  - object path prefix: 'video-frames/'
- reads credentials from environment variables;
- reads the input video file path from environment variable 'DEMO_VIDEO'.
```typescript
import readline from 'node:readline';
import { ChatAboutVideo } from 'chat-about-video';

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
const prompt = (question: string) => new Promise<string>((resolve) => rl.question(question, resolve));

async function demo() {
  const chat = new ChatAboutVideo({
    openAiEndpoint: process.env.AZURE_OPENAI_API_ENDPOINT!, // This line is not needed if you are using GPT provided by OpenAI rather than by Microsoft Azure.
    openAiApiKey: process.env.OPENAI_API_KEY!, // This is the API key.
    azureStorageConnectionString: process.env.AZURE_STORAGE_CONNECTION_STRING!, // This line is not needed if you'd like to use AWS S3.
    openAiDeploymentName: 'gpt4vision', // For GPT provided by OpenAI, this is the model name. For GPT provided by Microsoft Azure, this is the deployment name.
    storageContainerName: 'vision-experiment-input', // Blob container name in Azure or S3 bucket name in AWS
    storagePathPrefix: 'video-frames/',
  });

  const conversation = await chat.startConversation(process.env.DEMO_VIDEO!);

  while (true) {
    const question = await prompt('\nUser: ');
    if (!question) {
      continue;
    }
    if (['exit', 'quit'].includes(question.toLowerCase().trim())) {
      break;
    }
    const answer = await conversation.say(question, { maxTokens: 2000 });
    console.log('\nAI: ' + answer);
  }
  await conversation.end(); // proper clean-up, as described above
  rl.close(); // release stdin so the process can exit
}
demo().catch((error) => console.error(error));
```
Below is an example showing how to create an instance of `ChatAboutVideo` that

- uses GPT provided by OpenAI;
- uses ffmpeg to extract video frame images;
- stores video frame images in AWS S3;
  - bucket name: 'my-s3-bucket'
  - object path prefix: 'video-frames/'
- reads the API key from environment variable 'OPENAI_API_KEY'.
```typescript
const chat = new ChatAboutVideo({
  openAiApiKey: process.env.OPENAI_API_KEY!,
  openAiDeploymentName: 'gpt-4-vision-preview', // or 'gpt-4o'
  storageContainerName: 'my-s3-bucket',
  storagePathPrefix: 'video-frames/',
  extractVideoFrames: {
    limit: 30, // override the default value 10
    interval: 2, // override the default value 5
  },
});
```
Below is an example showing how to create an instance of `ChatAboutVideo` that

- uses GPT deployment (in this example, it is named 'gpt4vision') hosted in Microsoft Azure;
- uses Microsoft Video Retrieval Index to extract frames and analyse the video;
  - A randomly named index is created automatically.
  - The index is also deleted automatically when the conversation ends.
- stores the video file in Azure Blob Storage;
  - container name: 'vision-experiment-input'
  - object path prefix: 'videos/'
- reads credentials from environment variables.
```typescript
const chat = new ChatAboutVideo({
  openAiEndpoint: process.env.AZURE_OPENAI_API_ENDPOINT!,
  openAiApiKey: process.env.AZURE_OPENAI_API_KEY!,
  azureStorageConnectionString: process.env.AZURE_STORAGE_CONNECTION_STRING!,
  openAiDeploymentName: 'gpt4vision',
  storageContainerName: 'vision-experiment-input',
  storagePathPrefix: 'videos/',
  videoRetrievalIndex: {
    endpoint: process.env.AZURE_CV_API_ENDPOINT!,
    apiKey: process.env.AZURE_CV_API_KEY!,
    createIndexIfNotExists: true,
    deleteIndexWhenConversationEnds: true,
  },
});
```
Modules:

- aws
- azure
- azure/video-retrieval-api-client
- chat
- client-hack
- index
- storage
- storage/types
- video
- video/ffmpeg
- video/types
azure/video-retrieval-api-client.VideoRetrievalApiClient

• new VideoRetrievalApiClient(endpointBaseUrl, apiKey, apiVersion?)

Name | Type | Default value |
---|---|---|
endpointBaseUrl | string | undefined |
apiKey | string | undefined |
apiVersion | string | '2023-05-01-preview' |
▸ createIndex(indexName, indexOptions?): Promise<void>

Name | Type |
---|---|
indexName | string |
indexOptions? | CreateIndexOptions |

Returns: Promise<void>
▸ createIndexIfNotExist(indexName, indexOptions?): Promise<void>

Name | Type |
---|---|
indexName | string |
indexOptions? | CreateIndexOptions |

Returns: Promise<void>
▸ createIngestion(indexName, ingestionName, ingestion): Promise<void>

Name | Type |
---|---|
indexName | string |
ingestionName | string |
ingestion | IngestionRequest |

Returns: Promise<void>
▸ deleteDocument(indexName, documentUrl): Promise<void>

Name | Type |
---|---|
indexName | string |
documentUrl | string |

Returns: Promise<void>
▸ deleteIndex(indexName): Promise<void>

Name | Type |
---|---|
indexName | string |

Returns: Promise<void>
▸ getIndex(indexName): Promise<undefined | IndexSummary>

Name | Type |
---|---|
indexName | string |

Returns: Promise<undefined | IndexSummary>
▸ getIngestion(indexName, ingestionName): Promise<IngestionSummary>

Name | Type |
---|---|
indexName | string |
ingestionName | string |

Returns: Promise<IngestionSummary>
▸ ingest(indexName, ingestionName, ingestion, backoff?): Promise<void>

Name | Type |
---|---|
indexName | string |
ingestionName | string |
ingestion | IngestionRequest |
backoff | number[] |

Returns: Promise<void>
▸ listDocuments(indexName): Promise<DocumentSummary[]>

Name | Type |
---|---|
indexName | string |

Returns: Promise<DocumentSummary[]>
▸ listIndexes(): Promise<IndexSummary[]>

Returns: Promise<IndexSummary[]>
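For illustration, below is a minimal sketch of driving this client directly. It assumes the class can be imported from the `azure/video-retrieval-api-client` module listed above; the index name, ingestion name, and environment variable names are placeholders.

```typescript
import { VideoRetrievalApiClient } from 'chat-about-video/azure/video-retrieval-api-client';

async function indexOneVideo(videoUrl: string): Promise<void> {
  const client = new VideoRetrievalApiClient(
    process.env.AZURE_CV_API_ENDPOINT!, // Computer Vision endpoint
    process.env.AZURE_CV_API_KEY!, // Computer Vision API key
  );
  // Create the index only if it does not already exist, with vision and speech features enabled.
  await client.createIndexIfNotExist('demo-index', {
    features: [{ name: 'vision' }, { name: 'speech' }],
  });
  // Ingest one video document; the optional backoff parameter (not used here) controls wait/retry timing.
  await client.ingest('demo-index', 'demo-ingestion', {
    videos: [{ mode: 'add', documentUrl: videoUrl }],
    includeSpeechTranscript: true,
  });
  console.log(await client.listDocuments('demo-index'));
  await client.deleteIndex('demo-index');
}
```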
chat.ChatAboutVideo

• new ChatAboutVideo(options, log?)

Name | Type |
---|---|
options | ChatAboutVideoConstructorOptions |
log | undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> |

Property | Description |
---|---|
Protected client: OpenAIClient | |
Protected log: undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | |
Protected options: ChatAboutVideoOptions | |
▸ Protected prepareVideoFrames(conversationId, videoFile, extractVideoFramesOptions?): Promise<PreparationResult>

Name | Type |
---|---|
conversationId | string |
videoFile | string |
extractVideoFramesOptions? | Partial<{ extractor: VideoFramesExtractor; height: undefined \| number; interval: number; limit: number; width: undefined \| number }> |

Returns: Promise<PreparationResult>
▸ Protected prepareVideoRetrievalIndex(conversationId, videoFile, videoRetrievalIndexOptions?): Promise<PreparationResult>

Name | Type |
---|---|
conversationId | string |
videoFile | string |
videoRetrievalIndexOptions? | Partial<{ apiKey: string; createIndexIfNotExists?: boolean; deleteDocumentWhenConversationEnds?: boolean; deleteIndexWhenConversationEnds?: boolean; endpoint: string; indexName?: string }> |

Returns: Promise<PreparationResult>
▸ startConversation(videoFile, options?): Promise<Conversation>

Start a conversation about a video.

Name | Type | Description |
---|---|---|
videoFile | string | Path to a video file in the local file system. |
options? | Object | Overriding options for this conversation |
options.chatCompletions? | Partial<ChatOptions> | - |
options.extractVideoFrames? | Partial<{ extractor: VideoFramesExtractor; height: undefined \| number; interval: number; limit: number; width: undefined \| number }> | - |
options.videoRetrievalIndex? | Partial<{ apiKey: string; createIndexIfNotExists?: boolean; deleteDocumentWhenConversationEnds?: boolean; deleteIndexWhenConversationEnds?: boolean; endpoint: string; indexName?: string }> | - |

Returns: Promise<Conversation> (the conversation)
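Given a `ChatAboutVideo` instance `chat` created as in the earlier examples, a conversation with per-conversation overrides might look like the following sketch (the file path is a placeholder; the option shapes follow the table above):

```typescript
const conversation = await chat.startConversation('./clips/demo.mp4', {
  chatCompletions: { maxTokens: 1500 }, // cap the completion size for this conversation
  extractVideoFrames: { limit: 5, interval: 10 }, // fewer frames, sampled every 10 seconds
});
```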
chat.Conversation

• new Conversation(client, deploymentName, conversationId, messages, options?, cleanup?, log?)

Name | Type |
---|---|
client | OpenAIClient |
deploymentName | string |
conversationId | string |
messages | ChatRequestMessage[] |
options? | GetChatCompletionsOptions |
cleanup? | () => Promise<void> |
log | undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> |

Property | Description |
---|---|
Protected Optional cleanup: () => Promise<void> | |
Protected client: OpenAIClient | |
Protected conversationId: string | |
Protected deploymentName: string | |
Protected log: undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | |
Protected messages: ChatRequestMessage[] | |
Protected Optional options: GetChatCompletionsOptions | |
▸ end(): Promise<void>

Returns: Promise<void>
▸ say(message, options?): Promise<undefined | string>

Say something in the conversation, and get the response from the AI.

Name | Type | Description |
---|---|---|
message | string | The message to say in the conversation. |
options? | ChatOptions | Options for fine control. |

Returns: Promise<undefined | string> (the response/completion)
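A hedged usage sketch, assuming the `throttleBackoff` entries (from `ChatOptions`, documented further below) are wait times in milliseconds between retries on HTTP 429 responses:

```typescript
const answer = await conversation.say('Summarise what happens in this video.', {
  maxTokens: 800,
  throttleBackoff: [1000, 5000, 10000], // assumed: retry delays in milliseconds
});
console.log(answer ?? '(no response)');
await conversation.end(); // wrap up and let the cleanup callback run
```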
azure/video-retrieval-api-client.CreateIndexOptions

Property | Description |
---|---|
Optional features: IndexFeature[] | |
Optional metadataSchema: IndexMetadataSchema | |
Optional userData: object | |
azure/video-retrieval-api-client.DocumentSummary

Property | Description |
---|---|
createdDateTime: string | |
documentId: string | |
Optional documentUrl: string | |
lastModifiedDateTime: string | |
Optional metadata: object | |
Optional userData: object | |
azure/video-retrieval-api-client.IndexFeature

Property | Description |
---|---|
Optional domain: "surveillance" \| "generic" | |
Optional modelVersion: string | |
name: "vision" \| "speech" | |
azure/video-retrieval-api-client.IndexMetadataSchema

Property | Description |
---|---|
fields: IndexMetadataSchemaField[] | |
Optional language: string | |
azure/video-retrieval-api-client.IndexMetadataSchemaField

Property | Description |
---|---|
filterable: boolean | |
name: string | |
searchable: boolean | |
type: "string" \| "datetime" | |
azure/video-retrieval-api-client.IndexSummary

Property | Description |
---|---|
createdDateTime: string | |
eTag: string | |
Optional features: IndexFeature[] | |
lastModifiedDateTime: string | |
name: string | |
Optional userData: object | |
azure/video-retrieval-api-client.IngestionRequest

Property | Description |
---|---|
Optional filterDefectedFrames: boolean | |
Optional generateInsightIntervals: boolean | |
Optional includeSpeechTranscript: boolean | |
Optional moderation: boolean | |
videos: VideoIngestion[] | |
azure/video-retrieval-api-client.IngestionStatusDetail

Property | Description |
---|---|
documentId: string | |
documentUrl: string | |
lastUpdatedTime: string | |
succeeded: boolean | |
azure/video-retrieval-api-client.IngestionSummary

Property | Description |
---|---|
Optional batchName: string | |
createdDateTime: string | |
Optional fileStatusDetails: IngestionStatusDetail[] | |
lastModifiedDateTime: string | |
name: string | |
state: "NotStarted" \| "Running" \| "Completed" \| "Failed" \| "PartiallySucceeded" | |
azure/video-retrieval-api-client.VideoIngestion

Property | Description |
---|---|
Optional documentId: string | |
documentUrl: string | |
Optional metadata: object | |
mode: "update" \| "remove" \| "add" | |
Optional userData: object | |
chat.ChatAboutVideoOptions

Option settings for ChatAboutVideo

Property | Description |
---|---|
Optional extractVideoFrames: Object | Options for extracting frame images from the video (see ExtractVideoFramesOptions below). |
fileBatchUploader: FileBatchUploader | Function for uploading files |
Optional initialPrompts: ChatRequestMessage[] | Initial prompts to be added to the chat history before frame images. |
openAiDeploymentName: string | Name/ID of the deployment |
Optional startPrompts: ChatRequestMessage[] | Prompts to be added to the chat history right after frame images. |
storageContainerName: string | Storage container for storing frame images of the video. |
storagePathPrefix: string | Path prefix to be prepended for storing frame images of the video. |
tmpDir: string | Temporary directory for storing temporary files. If not specified, the temporary directory of the OS will be used. |
Optional videoRetrievalIndex: Object | Options for using Azure's Video Retrieval Index (see VideoRetrievalIndexOptions below). |
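For instance, `initialPrompts` and `startPrompts` could be used to frame the conversation around the uploaded frame images. A sketch, assuming message objects follow the `ChatRequestMessage` shape from @azure/openai; the prompt texts are illustrative:

```typescript
const chat = new ChatAboutVideo({
  openAiApiKey: process.env.OPENAI_API_KEY!,
  openAiDeploymentName: 'gpt-4o',
  storageContainerName: 'my-s3-bucket',
  storagePathPrefix: 'video-frames/',
  // Added to the chat history before the frame images.
  initialPrompts: [{ role: 'system', content: 'You are a careful video analyst.' }],
  // Added to the chat history right after the frame images.
  startPrompts: [{ role: 'user', content: 'The images above are frames sampled from a single video.' }],
});
```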
▸ createAwsS3FileBatchUploader(s3Client, expirationSeconds, parallelism?): FileBatchUploader

Name | Type | Default value |
---|---|---|
s3Client | S3Client | undefined |
expirationSeconds | number | undefined |
parallelism | number | 3 |
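A sketch of creating an uploader this way, assuming the function is exported from the `aws` module:

```typescript
import { S3Client } from '@aws-sdk/client-s3';
import { createAwsS3FileBatchUploader } from 'chat-about-video/aws'; // assumed module path

// Presigned download URLs expire after one hour; the default parallelism of 3 applies.
const uploader = createAwsS3FileBatchUploader(new S3Client({}), 3600);
```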
Re-exports CreateIndexOptions
Re-exports DocumentSummary
Re-exports IndexFeature
Re-exports IndexMetadataSchema
Re-exports IndexMetadataSchemaField
Re-exports IndexSummary
Re-exports IngestionRequest
Re-exports IngestionStatusDetail
Re-exports IngestionSummary
Re-exports PaginatedWithNextLink
Re-exports VideoIngestion
Re-exports VideoRetrievalApiClient
▸ createAzureBlobStorageFileBatchUploader(blobServiceClient, expirationSeconds, parallelism?): FileBatchUploader

Name | Type | Default value |
---|---|---|
blobServiceClient | BlobServiceClient | undefined |
expirationSeconds | number | undefined |
parallelism | number | 3 |
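And the Azure counterpart, assuming the function is exported from the `azure` module:

```typescript
import { BlobServiceClient } from '@azure/storage-blob';
import { createAzureBlobStorageFileBatchUploader } from 'chat-about-video/azure'; // assumed module path

// Download URLs expire after one hour; the default parallelism of 3 applies.
const uploader = createAzureBlobStorageFileBatchUploader(
  BlobServiceClient.fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!),
  3600,
);
```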
- CreateIndexOptions
- DocumentSummary
- IndexFeature
- IndexMetadataSchema
- IndexMetadataSchemaField
- IndexSummary
- IngestionRequest
- IngestionStatusDetail
- IngestionSummary
- VideoIngestion
Ƭ PaginatedWithNextLink<T>: Object

Type parameters:

Name |
---|
T |

Type declaration:

Name | Type |
---|---|
nextLink? | string |
value | T[] |
Ƭ ChatAboutVideoConstructorOptions: Partial<Omit<ChatAboutVideoOptions, "videoRetrievalIndex" | "extractVideoFrames">> & Required<Pick<ChatAboutVideoOptions, "openAiDeploymentName" | "storageContainerName">> & { extractVideoFrames?: Partial<Exclude<ChatAboutVideoOptions["extractVideoFrames"], undefined>>; videoRetrievalIndex?: Partial<ChatAboutVideoOptions["videoRetrievalIndex"]> & Pick<Exclude<ChatAboutVideoOptions["videoRetrievalIndex"], undefined>, "endpoint" | "apiKey"> } & { azureStorageConnectionString?: string; downloadUrlExpirationSeconds?: number; openAiApiKey: string; openAiEndpoint?: string }
Ƭ ChatOptions: GetChatCompletionsOptions & { throttleBackoff?: number[] }

Ƭ ExtractVideoFramesOptions: Exclude<ChatAboutVideoOptions["extractVideoFrames"], undefined>

Ƭ VideoRetrievalIndexOptions: Exclude<ChatAboutVideoOptions["videoRetrievalIndex"], undefined>
▸ fixClient(openAIClient): void

Name | Type |
---|---|
openAIClient | any |

Returns: void
Re-exports ChatAboutVideo
Re-exports ChatAboutVideoConstructorOptions
Re-exports ChatAboutVideoOptions
Re-exports ChatOptions
Re-exports Conversation
Re-exports ExtractVideoFramesOptions
Re-exports FileBatchUploader
Re-exports VideoFramesExtractor
Re-exports VideoRetrievalIndexOptions
Re-exports extractVideoFramesWithFfmpeg
Re-exports lazyCreatedFileBatchUploader
Re-exports lazyCreatedVideoFramesExtractor
Re-exports FileBatchUploader
▸ lazyCreatedFileBatchUploader(creator): FileBatchUploader

Name | Type |
---|---|
creator | Promise<FileBatchUploader> |
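This allows the uploader, and the SDK it depends on, to be created asynchronously on first use. A sketch reusing the AWS helper shown earlier (module paths are assumptions):

```typescript
import { lazyCreatedFileBatchUploader } from 'chat-about-video';
import { createAwsS3FileBatchUploader } from 'chat-about-video/aws'; // assumed module path

// The S3 client is only constructed once the dynamic import resolves.
const uploader = lazyCreatedFileBatchUploader(
  import('@aws-sdk/client-s3').then(({ S3Client }) =>
    createAwsS3FileBatchUploader(new S3Client({}), 3600),
  ),
);
```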
Ƭ FileBatchUploader: (dir: string, fileNames: string[], containerName: string, blobPathPrefix: string) => Promise<string[]>

▸ (dir, fileNames, containerName, blobPathPrefix): Promise<string[]>

Parameters:

Name | Type |
---|---|
dir | string |
fileNames | string[] |
containerName | string |
blobPathPrefix | string |

Returns: Promise<string[]>
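A custom uploader only needs to match this signature. Below is a minimal sketch that copies files into a local directory instead of cloud storage; it assumes the returned strings are meant to be download URLs for the uploaded files:

```typescript
import fs from 'node:fs/promises';
import path from 'node:path';
import { FileBatchUploader } from 'chat-about-video';

const localUploader: FileBatchUploader = async (dir, fileNames, containerName, blobPathPrefix) => {
  const urls: string[] = [];
  for (const fileName of fileNames) {
    // Mirror the container/path-prefix layout under /tmp instead of uploading.
    const target = path.join('/tmp', containerName, blobPathPrefix, fileName);
    await fs.mkdir(path.dirname(target), { recursive: true });
    await fs.copyFile(path.join(dir, fileName), target);
    urls.push(`file://${target}`);
  }
  return urls;
};
```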
Re-exports VideoFramesExtractor
Re-exports extractVideoFramesWithFfmpeg
▸ lazyCreatedVideoFramesExtractor(creator): VideoFramesExtractor

Name | Type |
---|---|
creator | Promise<VideoFramesExtractor> |
▸ extractVideoFramesWithFfmpeg(inputFile, outputDir, intervalSec, format?, width?, height?, startSec?, endSec?): Promise<string[]>

Name | Type |
---|---|
inputFile | string |
outputDir | string |
intervalSec | number |
format? | string |
width? | number |
height? | number |
startSec? | number |
endSec? | number |

Returns: Promise<string[]>
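For example, the following sketch extracts one 200-pixel-wide JPEG every 5 seconds from an entire video; it assumes the returned array lists the generated frame image files:

```typescript
import { extractVideoFramesWithFfmpeg } from 'chat-about-video';

const frameFiles = await extractVideoFramesWithFfmpeg('./demo.mp4', '/tmp/frames', 5, 'jpg', 200);
console.log(`Extracted ${frameFiles.length} frames`);
```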
Ƭ VideoFramesExtractor: (inputFile: string, outputDir: string, intervalSec: number, format?: string, width?: number, height?: number, startSec?: number, endSec?: number) => Promise<string[]>

▸ (inputFile, outputDir, intervalSec, format?, width?, height?, startSec?, endSec?): Promise<string[]>

Parameters:

Name | Type |
---|---|
inputFile | string |
outputDir | string |
intervalSec | number |
format? | string |
width? | number |
height? | number |
startSec? | number |
endSec? | number |

Returns: Promise<string[]>
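A custom extractor only needs to match this signature. As a sketch, the wrapper below delegates to the bundled ffmpeg extractor while enforcing default format and width values (the defaults chosen here are illustrative):

```typescript
import { extractVideoFramesWithFfmpeg, VideoFramesExtractor } from 'chat-about-video';

const myExtractor: VideoFramesExtractor = (inputFile, outputDir, intervalSec, format, width, height, startSec, endSec) =>
  extractVideoFramesWithFfmpeg(inputFile, outputDir, intervalSec, format ?? 'jpg', width ?? 320, height, startSec, endSec);
```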