[BUG] Error when either the bucket name or the prefix in the S3 data source is duplicated #664

Open
konokenj opened this issue Dec 27, 2024 · 2 comments
Labels
bug Something isn't working identified Indicates that the root cause of the bug has been determined.

Comments

@konokenj
Contributor

Describe the bug

In the Knowledge settings, if the user enters S3 data sources that share a bucket name or a prefix, cdk synth fails with an error.

If the bucket name is duplicated:

  • s3://hoge/fuga/
  • s3://hoge/piyo/
Error: There is already a Construct with name 'hoge' in BedrockCustomBotStack [BrChatKbStack01JG3V7FQZYG5NGX7R87966JTT]
    at Node.addChild (/codebuild/output/src1957739235/src/cdk/node_modules/constructs/src/construct.ts:447:13)
    at new Node (/codebuild/output/src1957739235/src/cdk/node_modules/constructs/src/construct.ts:71:17)
    at new Construct (/codebuild/output/src1957739235/src/cdk/node_modules/constructs/src/construct.ts:499:17)
    at new Resource (/codebuild/output/src1957739235/src/cdk/node_modules/aws-cdk-lib/core/lib/resource.js:1:1309)
    at new BucketBase (/codebuild/output/src1957739235/src/cdk/node_modules/aws-cdk-lib/aws-s3/lib/bucket.js:1:2172)
    at new Import (/codebuild/output/src1957739235/src/cdk/node_modules/aws-cdk-lib/aws-s3/lib/bucket.js:1:16085)
    at Function.fromBucketAttributes (/codebuild/output/src1957739235/src/cdk/node_modules/aws-cdk-lib/aws-s3/lib/bucket.js:1:16997)
    at Function.fromBucketName (/codebuild/output/src1957739235/src/cdk/node_modules/aws-cdk-lib/aws-s3/lib/bucket.js:1:14937)
    at /codebuild/output/src1957739235/src/cdk/lib/bedrock-custom-bot-stack.ts:353:29
    at Array.forEach (<anonymous>)

If the prefix is duplicated:

  • s3://hoge/fuga/
  • s3://piyo/fuga/
Error: There is already a Construct with name 'DataSourcehoge--' in BedrockCustomBotStack [BrChatKbStack01JG3V7FQZYG5NGX7R87966JTT]
    at Node.addChild (/codebuild/output/src3955119784/src/cdk/node_modules/constructs/src/construct.ts:447:13)
    at new Node (/codebuild/output/src3955119784/src/cdk/node_modules/constructs/src/construct.ts:71:17)
    at new Construct (/codebuild/output/src3955119784/src/cdk/node_modules/constructs/src/construct.ts:499:17)
    at new Resource (/codebuild/output/src3955119784/src/cdk/node_modules/aws-cdk-lib/core/lib/resource.js:1:1309)
    at new DataSourceBase (/codebuild/output/src3955119784/src/cdk/node_modules/@cdklabs/generative-ai-cdk-constructs/src/cdk-lib/bedrock/data-sources/base-data-source.ts:92:1)
    at new DataSourceNew (/codebuild/output/src3955119784/src/cdk/node_modules/@cdklabs/generative-ai-cdk-constructs/src/cdk-lib/bedrock/data-sources/base-data-source.ts:162:1)
    at new S3DataSource (/codebuild/output/src3955119784/src/cdk/node_modules/@cdklabs/generative-ai-cdk-constructs/src/cdk-lib/bedrock/data-sources/s3-data-source.ts:98:5)
    at /codebuild/output/src3955119784/src/cdk/lib/bedrock-custom-bot-stack.ts:119:16
    at Array.map (<anonymous>)
    at new BedrockCustomBotStack (/codebuild/output/src3955119784/src/cdk/lib/bedrock-custom-bot-stack.ts:116:49)
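The ID in this error ends in "--" rather than "/". Our understanding is that the constructs library replaces its path separator "/" with "--" when it stores a construct ID, so the trailing slash of the prefix shows up as "--". Because the bucket name is not part of `DataSource${prefix}`, any two URLs with the same prefix end up with the same stored ID. The sketch below re-creates that derivation in plain TypeScript; `sanitizeId` and `dataSourceIdFor` are hypothetical helpers (not library APIs), and it assumes the prefix is everything after the bucket name.

```typescript
// Hypothetical re-creation of how the stack's data-source ID ends up stored.
// Assumption: the constructs library replaces "/" with "--" in construct IDs.
function sanitizeId(id: string): string {
  return id.replace(/\//g, "--");
}

// Mirrors the `DataSource${prefix}` ID used in bedrock-custom-bot-stack.ts.
// The bucket name is captured by the regex but, as in the stack, not used.
function dataSourceIdFor(url: string): string {
  const match = /^s3:\/\/([^/]+)\/(.*)$/.exec(url);
  if (!match) {
    throw new Error(`Not an S3 URL: ${url}`);
  }
  const prefix = match[2];
  return sanitizeId(`DataSource${prefix}`);
}

console.log(dataSourceIdFor("s3://hoge/fuga/")); // DataSourcefuga--
console.log(dataSourceIdFor("s3://piyo/fuga/")); // DataSourcefuga-- (same ID)
console.log(dataSourceIdFor("s3://piyo/hoge/")); // DataSourcehoge--
```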

Also, the prefix must end with "/", but this requirement is not explained anywhere, and the UI shows no error message when it is violated. The underlying error is:

"Value error, Invalid S3 URL format (must end with a '/'): s3://hoge/hoge"

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'bot console'
  2. Click on 'create new bot'
  3. Scroll down to 'S3 datasources'
  4. Enter S3 URLs that share either:
     • a bucket name, e.g. s3://hoge/fuga/ and s3://hoge/piyo/
     • a prefix, e.g. s3://hoge/fuga/ and s3://piyo/fuga/
  5. See the errors in CodeBuild
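The UI accepts these inputs and the problem only surfaces in CodeBuild. As a stopgap until the construct IDs are fixed, a client-side check could report both problems, plus the missing trailing slash, before the bot is saved. This is a minimal sketch under assumptions, not the project's code: `validateS3Urls` and `parseS3Url` are hypothetical names (the stack's own `parseS3Url` may behave differently), and the trailing-slash message copies the backend error quoted above.

```typescript
// Hypothetical client-side validation for the S3 data-source URLs. Not the
// project's code: the names and messages are assumptions for illustration.
interface ParsedS3Url {
  bucket: string;
  prefix: string;
}

// Bucket = text between "s3://" and the next "/"; prefix = everything after it.
function parseS3Url(url: string): ParsedS3Url | null {
  const match = /^s3:\/\/([^/]+)\/(.*)$/.exec(url);
  return match ? { bucket: match[1], prefix: match[2] } : null;
}

// Returns one message per problem; an empty array means the input is accepted.
function validateS3Urls(urls: string[]): string[] {
  const errors: string[] = [];
  const seenBuckets = new Set<string>();
  const seenPrefixes = new Set<string>();

  for (const url of urls) {
    if (!url.endsWith("/")) {
      // Same rule as the backend error quoted in the report.
      errors.push(`Invalid S3 URL format (must end with a '/'): ${url}`);
      continue;
    }
    const parsed = parseS3Url(url);
    if (parsed === null) {
      errors.push(`Invalid S3 URL format: ${url}`);
      continue;
    }
    // Both values currently become CDK construct IDs, so repeats break cdk synth.
    if (seenBuckets.has(parsed.bucket)) {
      errors.push(`Duplicate bucket name: ${parsed.bucket}`);
    }
    if (seenPrefixes.has(parsed.prefix)) {
      errors.push(`Duplicate prefix: "${parsed.prefix}"`);
    }
    seenBuckets.add(parsed.bucket);
    seenPrefixes.add(parsed.prefix);
  }
  return errors;
}
```

Note that an empty prefix counts too: s3://a/ and s3://b/ both map to the ID "DataSource" in the current stack, so the check flags them as a duplicate prefix.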


@statefb
Contributor

statefb commented Jan 15, 2025

Cause: duplicate CDK construct IDs (source)

  • Duplicate construct ID (bucket name)
if (props.existingS3Urls && props.existingS3Urls.length > 0) {
  props.existingS3Urls.forEach((url) => {
    const { bucketName, prefix } = this.parseS3Url(url);
    docBucketsAndPrefixes.push({
      // The construct ID is the bucket name, so s3://hoge/fuga/ and
      // s3://hoge/piyo/ both register a child named "hoge" in the stack.
      bucket: s3.Bucket.fromBucketName(this, bucketName, bucketName),
      prefix: prefix,
    });
  });
}
  • Duplicate construct ID (data source)
const dataSources = docBucketsAndPrefixes.map(({ bucket, prefix }) => {
  bucket.grantRead(kb.role);
  const inclusionPrefixes = prefix === "" ? undefined : [prefix];
  // The construct ID depends only on the prefix, so s3://hoge/fuga/ and
  // s3://piyo/fuga/ both produce the ID `DataSourcefuga/`.
  return new S3DataSource(this, `DataSource${prefix}`, {
    bucket: bucket,
    knowledgeBase: kb,
    dataSourceName: bucket.bucketName,
    chunkingStrategy: props.chunkingStrategy,
    parsingStrategy: props.parsingModel ? ParsingStategy.foundationModel({
      parsingModel: props.parsingModel.asIModel(this),
    }) : undefined,
    inclusionPrefixes: inclusionPrefixes,
  });
});
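Both collisions come from deriving a construct ID from only part of the URL. One possible fix, sketched here under assumptions and not as the project's implementation, is to import each bucket once and reuse it, and to build each data-source ID from both the bucket name and the prefix, appending a counter if an ID is still taken. `planDataSources` and the `importBucket` callback are hypothetical; in the stack, `importBucket` would wrap `s3.Bucket.fromBucketName(this, bucketName, bucketName)`, and each plan's `id` would replace `DataSource${prefix}` in `new S3DataSource(...)`. The "DataSource-<bucket>-<prefix>" format is an illustrative choice.

```typescript
// Minimal sketch (assumption, not the project's fix): plan the constructs for a
// list of S3 URLs so that neither kind of construct ID can repeat.
//  1. Each bucket name is imported once and reused (avoids the "hoge" clash).
//  2. Each data-source ID includes the bucket name and gets a counter if it is
//     still taken (avoids the "DataSource<prefix>" clash).
interface DataSourcePlan<B> {
  id: string; // construct ID to pass to new S3DataSource(this, id, ...)
  bucket: B;
  prefix: string;
}

function planDataSources<B>(
  urls: string[],
  importBucket: (bucketName: string) => B, // placeholder for s3.Bucket.fromBucketName
): DataSourcePlan<B>[] {
  const buckets = new Map<string, B>(); // bucket name -> single imported bucket
  const usedIds = new Set<string>(); // IDs in stored form ("/" replaced by "--")

  return urls.map((url) => {
    const match = /^s3:\/\/([^/]+)\/(.*)$/.exec(url);
    if (!match) {
      throw new Error(`Invalid S3 URL: ${url}`);
    }
    const [, bucketName, prefix] = match;

    // Only the first URL for a bucket creates an import construct.
    let bucket = buckets.get(bucketName);
    if (bucket === undefined) {
      bucket = importBucket(bucketName);
      buckets.set(bucketName, bucket);
    }

    // Bucket + prefix makes the ID unique for distinct URLs in practice;
    // the counter covers exact repeats and any remaining edge cases.
    const base = `DataSource-${bucketName}-${prefix}`.replace(/\//g, "--");
    let id = base;
    for (let n = 2; usedIds.has(id); n++) {
      id = `${base}${n}`;
    }
    usedIds.add(id);

    return { id, bucket, prefix };
  });
}
```

Under this scheme every data source gets a new construct ID, including those in stacks that deploy fine today, which is exactly the compatibility question raised in the follow-up comment.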

@statefb statefb added bug Something isn't working identified Indicates that the root cause of the bug has been determined. and removed needs-triage labels Jan 15, 2025
@statefb
Contributor

statefb commented Jan 15, 2025

Before implementing a fix, we need to confirm that changing the DataSource construct ID does not alter the behavior of existing knowledge bases.
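The concern is real: CDK derives the CloudFormation logical ID from the construct path, and CloudFormation treats a changed logical ID as a different resource, so renaming the construct would likely delete and recreate an existing data source. One way to limit that, sketched here as an assumption rather than a decided design, is to keep today's `DataSource${prefix}` ID verbatim whenever it is still free and only give a bucket-qualified ID to entries that would otherwise collide. `assignDataSourceIds` and `toStoredId` are hypothetical names.

```typescript
// Hypothetical compatibility-preserving ID assignment (an assumption, not the
// project's design). Any URL list that deploys today has no repeated
// `DataSource${prefix}` IDs, so for those lists every entry keeps its current
// ID verbatim; only entries that would collide get a bucket-qualified ID.
interface S3Entry {
  bucketName: string;
  prefix: string;
}

// Compare IDs in the form constructs stores them ("/" becomes "--").
const toStoredId = (id: string): string => id.replace(/\//g, "--");

function assignDataSourceIds(entries: S3Entry[]): string[] {
  const used = new Set<string>();
  return entries.map(({ bucketName, prefix }) => {
    let id = `DataSource${prefix}`; // today's ID, kept whenever it is free
    for (let n = 1; used.has(toStoredId(id)); n++) {
      const suffix = n === 1 ? "" : `${n}`;
      id = `DataSource-${bucketName}-${prefix}${suffix}`;
    }
    used.add(toStoredId(id));
    return id;
  });
}
```

The trade-off is that, among entries sharing a prefix, which one keeps the legacy ID depends on list order, so reordering those URLs later would still rename (and likely replace) data sources.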
