Chat about a video clip using the powerful OpenAI GPT-4 Vision or GPT-4o.

`chat-about-video` is an open-source NPM package designed to accelerate the development of conversational applications about video content. Harnessing the capabilities of GPT-4 Vision or GPT-4o services from Microsoft Azure or OpenAI, this package opens up a range of usage scenarios with minimal effort.
Key features:
- ChatGPT models hosted in both Azure and OpenAI are supported.
- Frame images are extracted from the input video and uploaded for ChatGPT to consume.
- Throttling responses (HTTP status code 429) from the API are automatically retried.
- Options supported by the underlying API are exposed for customisation.
There are two approaches for feeding video content into GPT-4 Vision. `chat-about-video` supports both of them.
Frame image extraction:
- Integrate GPT-4 Vision or GPT-4o from Microsoft Azure or OpenAI effortlessly.
- Utilize ffmpeg integration provided by this package for frame image extraction or opt for a DIY approach.
- Store frame images with ease, supporting Azure Blob Storage and AWS S3.
- GPT-4 Vision hosted in Azure allows analysis of up to 10 frame images.
- GPT-4 Vision or GPT-4o hosted in OpenAI allows analysis of more than 10 frame images.
Video indexing with Microsoft Azure:
- Exclusively supported by GPT-4 Vision from Microsoft Azure.
- Ingest videos seamlessly into Microsoft Azure's Video Retrieval Index.
- Automatic extraction of up to 20 frame images using Video Retrieval Indexer.
- Default integration of speech transcription for enhanced comprehension.
- Flexible storage options with support for Azure Blob Storage and AWS S3.
Add `chat-about-video` as a dependency to your Node.js application using the following command:

```shell
npm i chat-about-video
```

If you intend to utilize ffmpeg for extracting video frame images, ensure it is installed on your system. You can install it using either a system package manager or a helper NPM package:

```shell
sudo apt install ffmpeg
# or
npm i @ffmpeg-installer/ffmpeg
```
If you plan to use Azure Blob Storage, include the following dependency:

```shell
npm i @azure/storage-blob
```

For using AWS S3, install the following dependencies:

```shell
npm i @handy-common-utils/aws-utils @aws-sdk/s3-request-presigner @aws-sdk/client-s3
```
To integrate `chat-about-video` into your Node.js application, follow these simple steps:

- Instantiate the `ChatAboutVideo` class by creating an instance. The constructor allows you to pass in configuration options.
  - Most configuration options come with sensible default values, but you can specify your own for further customization.
  - The second constructor argument is a logger. If not specified, a default logger will be created for logging to the console. If logging is not needed, you can pass in `undefined`.
- Use the `startConversation(videoFilePath)` function to initiate a conversation about a video clip. This function returns a `Conversation` object. The video file or its frame images are uploaded to Azure Blob Storage or AWS S3 during this step.
- Interact with GPT by using the `say(question, { maxTokens: 2000 })` function within the conversation. You pass in a question and receive an answer.
  - Message history is automatically kept during the conversation, providing context for a more coherent dialogue.
  - The second parameter of the `say(...)` function allows you to pass in options, such as `maxTokens`, for further customization of each request.
- Wrap up the conversation using the `end()` function. This ensures proper clean-up and resource management.
Below is an example chat application, which

- uses GPT deployment (in this example, it is named 'gpt4vision') hosted in Microsoft Azure;
- uses ffmpeg to extract video frame images;
- stores video frame images in Azure Blob Storage;
  - container name: 'vision-experiment-input'
  - object path prefix: 'video-frames/'
- reads credentials from environment variables;
- reads the input video file path from environment variable 'DEMO_VIDEO'.
```typescript
import readline from 'node:readline';
import { ChatAboutVideo } from 'chat-about-video';

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
const prompt = (question: string) => new Promise<string>((resolve) => rl.question(question, resolve));

async function demo() {
  const chat = new ChatAboutVideo({
    openAiEndpoint: process.env.AZURE_OPENAI_API_ENDPOINT!, // This line is not needed if you are using GPT provided by OpenAI rather than by Microsoft Azure.
    openAiApiKey: process.env.OPENAI_API_KEY!, // This is the API key.
    azureStorageConnectionString: process.env.AZURE_STORAGE_CONNECTION_STRING!, // This line is not needed if you'd like to use AWS S3.
    openAiDeploymentName: 'gpt4vision', // For GPT provided by OpenAI, this is the model name. For GPT provided by Microsoft Azure, this is the deployment name.
    storageContainerName: 'vision-experiment-input', // Blob container name in Azure or S3 bucket name in AWS
    storagePathPrefix: 'video-frames/',
  });

  const conversation = await chat.startConversation(process.env.DEMO_VIDEO!);

  while (true) {
    const question = await prompt('\nUser: ');
    if (!question) {
      continue;
    }
    if (['exit', 'quit'].includes(question.toLowerCase().trim())) {
      break;
    }
    const answer = await conversation.say(question, { maxTokens: 2000 });
    console.log('\nAI: ' + answer);
  }
  await conversation.end(); // proper clean-up, as described above
  rl.close(); // release stdin so the process can exit
}
demo().catch((error) => console.error(error));
```
Below is an example showing how to create an instance of `ChatAboutVideo` that

- uses GPT provided by OpenAI;
- uses ffmpeg to extract video frame images;
- stores video frame images in AWS S3;
  - bucket name: 'my-s3-bucket'
  - object path prefix: 'video-frames/'
- reads the API key from environment variable 'OPENAI_API_KEY'.
```typescript
const chat = new ChatAboutVideo({
  openAiApiKey: process.env.OPENAI_API_KEY!,
  openAiDeploymentName: 'gpt-4-vision-preview', // or 'gpt-4o'
  storageContainerName: 'my-s3-bucket',
  storagePathPrefix: 'video-frames/',
  extractVideoFrames: {
    limit: 30, // override the default value 10
    interval: 2, // override the default value 5
  },
});
```
Below is an example showing how to create an instance of `ChatAboutVideo` that

- uses GPT deployment (in this example, it is named 'gpt4vision') hosted in Microsoft Azure;
- uses Microsoft Video Retrieval Index to extract frames and analyse the video;
  - A randomly named index is created automatically.
  - The index is also deleted automatically when the conversation ends.
- stores the video file in Azure Blob Storage;
  - container name: 'vision-experiment-input'
  - object path prefix: 'videos/'
- reads credentials from environment variables.
```typescript
const chat = new ChatAboutVideo({
  openAiEndpoint: process.env.AZURE_OPENAI_API_ENDPOINT!,
  openAiApiKey: process.env.AZURE_OPENAI_API_KEY!,
  azureStorageConnectionString: process.env.AZURE_STORAGE_CONNECTION_STRING!,
  openAiDeploymentName: 'gpt4vision',
  storageContainerName: 'vision-experiment-input',
  storagePathPrefix: 'videos/',
  videoRetrievalIndex: {
    endpoint: process.env.AZURE_CV_API_ENDPOINT!,
    apiKey: process.env.AZURE_CV_API_KEY!,
    createIndexIfNotExists: true,
    deleteIndexWhenConversationEnds: true,
  },
});
```
Modules:

- aws
- azure
- azure/video-retrieval-api-client
- chat
- client-hack
- index
- storage
- storage/types
- video
- video/ffmpeg
- video/types
azure/video-retrieval-api-client.VideoRetrievalApiClient

• new VideoRetrievalApiClient(endpointBaseUrl, apiKey, apiVersion?)

Name | Type | Default value |
---|---|---|
endpointBaseUrl | string | undefined |
apiKey | string | undefined |
apiVersion | string | '2023-05-01-preview' |
▸ createIndex(indexName, indexOptions?): Promise<void>

Name | Type |
---|---|
indexName | string |
indexOptions? | CreateIndexOptions |

Returns: Promise<void>
▸ createIndexIfNotExist(indexName, indexOptions?): Promise<void>

Name | Type |
---|---|
indexName | string |
indexOptions? | CreateIndexOptions |

Returns: Promise<void>
▸ createIngestion(indexName, ingestionName, ingestion): Promise<void>

Name | Type |
---|---|
indexName | string |
ingestionName | string |
ingestion | IngestionRequest |

Returns: Promise<void>
▸ deleteDocument(indexName, documentUrl): Promise<void>

Name | Type |
---|---|
indexName | string |
documentUrl | string |

Returns: Promise<void>
▸ deleteIndex(indexName): Promise<void>

Name | Type |
---|---|
indexName | string |

Returns: Promise<void>
▸ getIndex(indexName): Promise<undefined | IndexSummary>

Name | Type |
---|---|
indexName | string |

Returns: Promise<undefined | IndexSummary>
▸ getIngestion(indexName, ingestionName): Promise<IngestionSummary>

Name | Type |
---|---|
indexName | string |
ingestionName | string |

Returns: Promise<IngestionSummary>
▸ ingest(indexName, ingestionName, ingestion, backoff?): Promise<void>

Name | Type |
---|---|
indexName | string |
ingestionName | string |
ingestion | IngestionRequest |
backoff | number[] |

Returns: Promise<void>
▸ listDocuments(indexName): Promise<DocumentSummary[]>

Name | Type |
---|---|
indexName | string |

Returns: Promise<DocumentSummary[]>
▸ listIndexes(): Promise<IndexSummary[]>

Returns: Promise<IndexSummary[]>
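For illustration, below is a minimal sketch of driving this client directly. It assumes the class can be imported from the `azure/video-retrieval-api-client` module listed above; the index name, ingestion name, and environment variable names are placeholders.

```typescript
import { VideoRetrievalApiClient } from 'chat-about-video/azure/video-retrieval-api-client';

async function indexOneVideo(videoUrl: string): Promise<void> {
  const client = new VideoRetrievalApiClient(
    process.env.AZURE_CV_API_ENDPOINT!, // Computer Vision endpoint
    process.env.AZURE_CV_API_KEY!, // Computer Vision API key
  );
  // Create the index only if it does not already exist, with vision and speech features enabled.
  await client.createIndexIfNotExist('demo-index', {
    features: [{ name: 'vision' }, { name: 'speech' }],
  });
  // Ingest one video document; the optional backoff parameter (not used here) controls wait/retry timing.
  await client.ingest('demo-index', 'demo-ingestion', {
    videos: [{ mode: 'add', documentUrl: videoUrl }],
    includeSpeechTranscript: true,
  });
  console.log(await client.listDocuments('demo-index'));
  await client.deleteIndex('demo-index');
}
```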
chat.ChatAboutVideo

• new ChatAboutVideo(options, log?)

Name | Type |
---|---|
options | ChatAboutVideoConstructorOptions |
log | undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> |

Property | Description |
---|---|
Protected client: OpenAIClient | |
Protected log: undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | |
Protected options: ChatAboutVideoOptions | |
▸ Protected prepareVideoFrames(conversationId, videoFile, extractVideoFramesOptions?): Promise<PreparationResult>

Name | Type |
---|---|
conversationId | string |
videoFile | string |
extractVideoFramesOptions? | Partial<{ extractor: VideoFramesExtractor; height: undefined \| number; interval: number; limit: number; width: undefined \| number }> |

Returns: Promise<PreparationResult>
▸ Protected prepareVideoRetrievalIndex(conversationId, videoFile, videoRetrievalIndexOptions?): Promise<PreparationResult>

Name | Type |
---|---|
conversationId | string |
videoFile | string |
videoRetrievalIndexOptions? | Partial<{ apiKey: string; createIndexIfNotExists?: boolean; deleteDocumentWhenConversationEnds?: boolean; deleteIndexWhenConversationEnds?: boolean; endpoint: string; indexName?: string }> |

Returns: Promise<PreparationResult>
▸ startConversation(videoFile, options?): Promise<Conversation>

Start a conversation about a video.

Name | Type | Description |
---|---|---|
videoFile | string | Path to a video file in the local file system. |
options? | Object | Overriding options for this conversation |
options.chatCompletions? | Partial<ChatOptions> | - |
options.extractVideoFrames? | Partial<{ extractor: VideoFramesExtractor; height: undefined \| number; interval: number; limit: number; width: undefined \| number }> | - |
options.videoRetrievalIndex? | Partial<{ apiKey: string; createIndexIfNotExists?: boolean; deleteDocumentWhenConversationEnds?: boolean; deleteIndexWhenConversationEnds?: boolean; endpoint: string; indexName?: string }> | - |

Returns: Promise<Conversation> (the conversation)
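Given a `ChatAboutVideo` instance `chat` created as in the earlier examples, a conversation with per-conversation overrides might look like the following sketch (the file path is a placeholder; the option shapes follow the table above):

```typescript
const conversation = await chat.startConversation('./clips/demo.mp4', {
  chatCompletions: { maxTokens: 1500 }, // cap the completion size for this conversation
  extractVideoFrames: { limit: 5, interval: 10 }, // fewer frames, sampled every 10 seconds
});
```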
chat.Conversation

• new Conversation(client, deploymentName, conversationId, messages, options?, cleanup?, log?)

Name | Type |
---|---|
client | OpenAIClient |
deploymentName | string |
conversationId | string |
messages | ChatRequestMessage[] |
options? | GetChatCompletionsOptions |
cleanup? | () => Promise<void> |
log | undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> |

Property | Description |
---|---|
Protected Optional cleanup: () => Promise<void> | |
Protected client: OpenAIClient | |
Protected conversationId: string | |
Protected deploymentName: string | |
Protected log: undefined \| LineLogger<(message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void, (message?: any, ...optionalParams: any[]) => void> | |
Protected messages: ChatRequestMessage[] | |
Protected Optional options: GetChatCompletionsOptions | |
▸ end(): Promise<void>

Returns: Promise<void>
▸ say(message, options?): Promise<undefined | string>

Say something in the conversation, and get the response from the AI.

Name | Type | Description |
---|---|---|
message | string | The message to say in the conversation. |
options? | ChatOptions | Options for fine control. |

Returns: Promise<undefined | string> (the response/completion)
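A hedged usage sketch, assuming the `throttleBackoff` entries (from `ChatOptions`, documented further below) are wait times in milliseconds between retries on HTTP 429 responses:

```typescript
const answer = await conversation.say('Summarise what happens in this video.', {
  maxTokens: 800,
  throttleBackoff: [1000, 5000, 10000], // assumed: retry delays in milliseconds
});
console.log(answer ?? '(no response)');
await conversation.end(); // wrap up and let the cleanup callback run
```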
azure/video-retrieval-api-client.CreateIndexOptions

Property | Description |
---|---|
Optional features: IndexFeature[] | |
Optional metadataSchema: IndexMetadataSchema | |
Optional userData: object | |
azure/video-retrieval-api-client.DocumentSummary

Property | Description |
---|---|
createdDateTime: string | |
documentId: string | |
Optional documentUrl: string | |
lastModifiedDateTime: string | |
Optional metadata: object | |
Optional userData: object | |
azure/video-retrieval-api-client.IndexFeature

Property | Description |
---|---|
Optional domain: "surveillance" \| "generic" | |
Optional modelVersion: string | |
name: "vision" \| "speech" | |
azure/video-retrieval-api-client.IndexMetadataSchema

Property | Description |
---|---|
fields: IndexMetadataSchemaField[] | |
Optional language: string | |
azure/video-retrieval-api-client.IndexMetadataSchemaField

Property | Description |
---|---|
filterable: boolean | |
name: string | |
searchable: boolean | |
type: "string" \| "datetime" | |
azure/video-retrieval-api-client.IndexSummary

Property | Description |
---|---|
createdDateTime: string | |
eTag: string | |
Optional features: IndexFeature[] | |
lastModifiedDateTime: string | |
name: string | |
Optional userData: object | |
azure/video-retrieval-api-client.IngestionRequest

Property | Description |
---|---|
Optional filterDefectedFrames: boolean | |
Optional generateInsightIntervals: boolean | |
Optional includeSpeechTranscript: boolean | |
Optional moderation: boolean | |
videos: VideoIngestion[] | |
azure/video-retrieval-api-client.IngestionStatusDetail

Property | Description |
---|---|
documentId: string | |
documentUrl: string | |
lastUpdatedTime: string | |
succeeded: boolean | |
azure/video-retrieval-api-client.IngestionSummary

Property | Description |
---|---|
Optional batchName: string | |
createdDateTime: string | |
Optional fileStatusDetails: IngestionStatusDetail[] | |
lastModifiedDateTime: string | |
name: string | |
state: "NotStarted" \| "Running" \| "Completed" \| "Failed" \| "PartiallySucceeded" | |
azure/video-retrieval-api-client.VideoIngestion

Property | Description |
---|---|
Optional documentId: string | |
documentUrl: string | |
Optional metadata: object | |
mode: "update" \| "remove" \| "add" | |
Optional userData: object | |
chat.ChatAboutVideoOptions

Option settings for ChatAboutVideo

Property | Description |
---|---|
Optional extractVideoFrames: Object | Options for extracting frame images from the video (see ExtractVideoFramesOptions below). |
fileBatchUploader: FileBatchUploader | Function for uploading files |
Optional initialPrompts: ChatRequestMessage[] | Initial prompts to be added to the chat history before frame images. |
openAiDeploymentName: string | Name/ID of the deployment |
Optional startPrompts: ChatRequestMessage[] | Prompts to be added to the chat history right after frame images. |
storageContainerName: string | Storage container for storing frame images of the video. |
storagePathPrefix: string | Path prefix to be prepended for storing frame images of the video. |
tmpDir: string | Temporary directory for storing temporary files. If not specified, the temporary directory of the OS will be used. |
Optional videoRetrievalIndex: Object | Options for using Azure's Video Retrieval Index (see VideoRetrievalIndexOptions below). |
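For instance, `initialPrompts` and `startPrompts` could be used to frame the conversation around the uploaded frame images. A sketch, assuming message objects follow the `ChatRequestMessage` shape from @azure/openai; the prompt texts are illustrative:

```typescript
const chat = new ChatAboutVideo({
  openAiApiKey: process.env.OPENAI_API_KEY!,
  openAiDeploymentName: 'gpt-4o',
  storageContainerName: 'my-s3-bucket',
  storagePathPrefix: 'video-frames/',
  // Added to the chat history before the frame images.
  initialPrompts: [{ role: 'system', content: 'You are a careful video analyst.' }],
  // Added to the chat history right after the frame images.
  startPrompts: [{ role: 'user', content: 'The images above are frames sampled from a single video.' }],
});
```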
▸ createAwsS3FileBatchUploader(s3Client, expirationSeconds, parallelism?): FileBatchUploader

Name | Type | Default value |
---|---|---|
s3Client | S3Client | undefined |
expirationSeconds | number | undefined |
parallelism | number | 3 |
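A sketch of creating an uploader this way, assuming the function is exported from the `aws` module:

```typescript
import { S3Client } from '@aws-sdk/client-s3';
import { createAwsS3FileBatchUploader } from 'chat-about-video/aws'; // assumed module path

// Presigned download URLs expire after one hour; the default parallelism of 3 applies.
const uploader = createAwsS3FileBatchUploader(new S3Client({}), 3600);
```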
Re-exports CreateIndexOptions
Re-exports DocumentSummary
Re-exports IndexFeature
Re-exports IndexMetadataSchema
Re-exports IndexMetadataSchemaField
Re-exports IndexSummary
Re-exports IngestionRequest
Re-exports IngestionStatusDetail
Re-exports IngestionSummary
Re-exports PaginatedWithNextLink
Re-exports VideoIngestion
Re-exports VideoRetrievalApiClient
▸ createAzureBlobStorageFileBatchUploader(blobServiceClient, expirationSeconds, parallelism?): FileBatchUploader

Name | Type | Default value |
---|---|---|
blobServiceClient | BlobServiceClient | undefined |
expirationSeconds | number | undefined |
parallelism | number | 3 |
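And the Azure counterpart, assuming the function is exported from the `azure` module:

```typescript
import { BlobServiceClient } from '@azure/storage-blob';
import { createAzureBlobStorageFileBatchUploader } from 'chat-about-video/azure'; // assumed module path

// Download URLs expire after one hour; the default parallelism of 3 applies.
const uploader = createAzureBlobStorageFileBatchUploader(
  BlobServiceClient.fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!),
  3600,
);
```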
- CreateIndexOptions
- DocumentSummary
- IndexFeature
- IndexMetadataSchema
- IndexMetadataSchemaField
- IndexSummary
- IngestionRequest
- IngestionStatusDetail
- IngestionSummary
- VideoIngestion
Ƭ PaginatedWithNextLink<T>: Object

Type parameters:

Name |
---|
T |

Type declaration:

Name | Type |
---|---|
nextLink? | string |
value | T[] |
Ƭ ChatAboutVideoConstructorOptions: Partial<Omit<ChatAboutVideoOptions, "videoRetrievalIndex" | "extractVideoFrames">> & Required<Pick<ChatAboutVideoOptions, "openAiDeploymentName" | "storageContainerName">> & { extractVideoFrames?: Partial<Exclude<ChatAboutVideoOptions["extractVideoFrames"], undefined>>; videoRetrievalIndex?: Partial<ChatAboutVideoOptions["videoRetrievalIndex"]> & Pick<Exclude<ChatAboutVideoOptions["videoRetrievalIndex"], undefined>, "endpoint" | "apiKey"> } & { azureStorageConnectionString?: string; downloadUrlExpirationSeconds?: number; openAiApiKey: string; openAiEndpoint?: string }
Ƭ ChatOptions: GetChatCompletionsOptions & { throttleBackoff?: number[] }

Ƭ ExtractVideoFramesOptions: Exclude<ChatAboutVideoOptions["extractVideoFrames"], undefined>

Ƭ VideoRetrievalIndexOptions: Exclude<ChatAboutVideoOptions["videoRetrievalIndex"], undefined>
▸ fixClient(openAIClient): void

Name | Type |
---|---|
openAIClient | any |

Returns: void
Re-exports ChatAboutVideo
Re-exports ChatAboutVideoConstructorOptions
Re-exports ChatAboutVideoOptions
Re-exports ChatOptions
Re-exports Conversation
Re-exports ExtractVideoFramesOptions
Re-exports FileBatchUploader
Re-exports VideoFramesExtractor
Re-exports VideoRetrievalIndexOptions
Re-exports extractVideoFramesWithFfmpeg
Re-exports lazyCreatedFileBatchUploader
Re-exports lazyCreatedVideoFramesExtractor
Re-exports FileBatchUploader
▸ lazyCreatedFileBatchUploader(creator): FileBatchUploader

Name | Type |
---|---|
creator | Promise<FileBatchUploader> |
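This allows the uploader, and the SDK it depends on, to be created asynchronously on first use. A sketch reusing the AWS helper shown earlier (module paths are assumptions):

```typescript
import { lazyCreatedFileBatchUploader } from 'chat-about-video';
import { createAwsS3FileBatchUploader } from 'chat-about-video/aws'; // assumed module path

// The S3 client is only constructed once the dynamic import resolves.
const uploader = lazyCreatedFileBatchUploader(
  import('@aws-sdk/client-s3').then(({ S3Client }) =>
    createAwsS3FileBatchUploader(new S3Client({}), 3600),
  ),
);
```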
Ƭ FileBatchUploader: (dir: string, fileNames: string[], containerName: string, blobPathPrefix: string) => Promise<string[]>

▸ (dir, fileNames, containerName, blobPathPrefix): Promise<string[]>

Parameters:

Name | Type |
---|---|
dir | string |
fileNames | string[] |
containerName | string |
blobPathPrefix | string |

Returns: Promise<string[]>
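A custom uploader only needs to match this signature. Below is a minimal sketch that copies files into a local directory instead of cloud storage; it assumes the returned strings are meant to be download URLs for the uploaded files:

```typescript
import fs from 'node:fs/promises';
import path from 'node:path';
import { FileBatchUploader } from 'chat-about-video';

const localUploader: FileBatchUploader = async (dir, fileNames, containerName, blobPathPrefix) => {
  const urls: string[] = [];
  for (const fileName of fileNames) {
    // Mirror the container/path-prefix layout under /tmp instead of uploading.
    const target = path.join('/tmp', containerName, blobPathPrefix, fileName);
    await fs.mkdir(path.dirname(target), { recursive: true });
    await fs.copyFile(path.join(dir, fileName), target);
    urls.push(`file://${target}`);
  }
  return urls;
};
```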
Re-exports VideoFramesExtractor
Re-exports extractVideoFramesWithFfmpeg
▸ lazyCreatedVideoFramesExtractor(creator): VideoFramesExtractor

Name | Type |
---|---|
creator | Promise<VideoFramesExtractor> |
▸ extractVideoFramesWithFfmpeg(inputFile, outputDir, intervalSec, format?, width?, height?, startSec?, endSec?): Promise<string[]>

Name | Type |
---|---|
inputFile | string |
outputDir | string |
intervalSec | number |
format? | string |
width? | number |
height? | number |
startSec? | number |
endSec? | number |

Returns: Promise<string[]>
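For example, the following sketch extracts one 200-pixel-wide JPEG every 5 seconds from an entire video; it assumes the returned array lists the generated frame image files:

```typescript
import { extractVideoFramesWithFfmpeg } from 'chat-about-video';

const frameFiles = await extractVideoFramesWithFfmpeg('./demo.mp4', '/tmp/frames', 5, 'jpg', 200);
console.log(`Extracted ${frameFiles.length} frames`);
```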
Ƭ VideoFramesExtractor: (inputFile: string, outputDir: string, intervalSec: number, format?: string, width?: number, height?: number, startSec?: number, endSec?: number) => Promise<string[]>

▸ (inputFile, outputDir, intervalSec, format?, width?, height?, startSec?, endSec?): Promise<string[]>

Parameters:

Name | Type |
---|---|
inputFile | string |
outputDir | string |
intervalSec | number |
format? | string |
width? | number |
height? | number |
startSec? | number |
endSec? | number |

Returns: Promise<string[]>
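A custom extractor only needs to match this signature. As a sketch, the wrapper below delegates to the bundled ffmpeg extractor while enforcing default format and width values (the defaults chosen here are illustrative):

```typescript
import { extractVideoFramesWithFfmpeg, VideoFramesExtractor } from 'chat-about-video';

const myExtractor: VideoFramesExtractor = (inputFile, outputDir, intervalSec, format, width, height, startSec, endSec) =>
  extractVideoFramesWithFfmpeg(inputFile, outputDir, intervalSec, format ?? 'jpg', width ?? 320, height, startSec, endSec);
```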