Ayhan Sipahi

Triggering AppSync Subscriptions From Outside AppSync

AppSync subscriptions fire only on mutations. This explores bridging downstream BFF events into a NONE-data-source mutation with EventBridge and CDK.

AWS AppSync gives you managed WebSocket subscriptions, but they fire only as a response to an AppSync GraphQL mutation. In a multi-service Backend-for-Frontend (BFF), the truth lives downstream in the order, inventory, and payments services, and those state changes never pass through an AppSync mutation. So “managing subscriptions” in a BFF is really one problem: bridging downstream events into an internal mutation backed by a NONE data source, then letting AppSync fan the payload out. The wiring below is an exploration in AWS CDK; the proof of concept runs after publishing, so it is described in future tense and nothing here is yet measured.

The naive mental model is that you subscribe to a field and AppSync streams database changes to you. That model is wrong, and the docs are blunt about it: “Subscriptions in AWS AppSync are invoked as a response to a mutation.” There is no database-change-to-subscription path. Something has to call a mutation. For AppSync resolver, data source, and caching basics, see the production AppSync guide; those fundamentals are assumed here, and the focus stays on the bridge.

Why subscriptions are mutation-bound

A subscription field carries an @aws_subscribe directive that names the mutations it listens to:

type Mutation {
  publishOrderUpdate(input: OrderUpdateInput!): OrderUpdate
    @aws_iam
}

type Subscription {
  onOrderUpdate(orderId: ID): OrderUpdate
    @aws_subscribe(mutations: ["publishOrderUpdate"])
}

The subscription field itself names no data source. The mutation it listens to carries the resolver, and AppSync sends the mutation’s selection set to subscribers. The transport is pure WebSockets between client and service; the older MQTT-over-WebSocket protocol was removed on January 1, 2022, so any guide still mentioning MQTT is stale.

Because the trigger is always a mutation, a downstream event that wants to reach a subscriber must end up calling one. That single fact shapes the entire pattern.
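
To make the mutation-bound rule concrete, here is a toy in-memory model. It is not AppSync code: the ToyGraphqlApi class and its methods are hypothetical names invented for this sketch, and only the idea (subscriptions are registered against mutation names, so only a mutation notifies anyone) comes from the text above.

```typescript
// Toy in-memory model, not AppSync code: listeners are keyed by mutation name.
type Listener<T> = (payload: T) => void;

class ToyGraphqlApi<T> {
  private listeners: Record<string, Listener<T>[]> = {};
  readonly store: T[] = [];

  // Mirrors @aws_subscribe(mutations: [...]): register against mutation names.
  subscribe(mutations: string[], listener: Listener<T>): void {
    for (const name of mutations) {
      this.listeners[name] = [...(this.listeners[name] ?? []), listener];
    }
  }

  // Running a mutation is the only path that notifies subscribers.
  mutate(name: string, payload: T): T {
    for (const listener of this.listeners[name] ?? []) listener(payload);
    return payload;
  }

  // A write that bypasses the API: state changes, but nobody is notified.
  writeDirectly(item: T): void {
    this.store.push(item);
  }
}

interface OrderUpdate {
  orderId: string;
  status: string;
}

const toyApi = new ToyGraphqlApi<OrderUpdate>();
const received: OrderUpdate[] = [];
toyApi.subscribe(['publishOrderUpdate'], (update) => {
  received.push(update);
});

// A downstream write on its own reaches no subscriber.
toyApi.writeDirectly({ orderId: 'o-1', status: 'paid' });
// Only the mutation call does.
toyApi.mutate('publishOrderUpdate', { orderId: 'o-1', status: 'shipped' });
```

After both calls, received holds only the shipped update. The direct write changed state without reaching any subscriber, which is the gap the bridge has to close.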

The bridge pattern

This is the AWS-documented recipe, generalized for a BFF where many services publish events. AWS re:Post frames the question directly: how do you notify subscribers of external updates that client-side mutations do not perform? The answer is a four-step bridge.

  1. A downstream service emits an event. The re:Post article uses a DynamoDB stream; for a multi-service BFF, prefer EventBridge so any service can publish without coupling to AppSync.
  2. EventBridge (natively, or via a Lambda) calls an internal AppSync mutation. This is what causes AppSync to notify subscribers.
  3. That mutation field has a local resolver backed by a NONE data source. The resolver just echoes its arguments; field resolution never leaves AppSync.
  4. A subscription field subscribes to that mutation through @aws_subscribe, and AppSync fans the payload out.

The NONE data source is the key. The docs describe a local resolver as one that “will just forward the result of the request handler to the response handler,” and name the most popular use case as “to publish notifications without triggering a data source call.” The trigger mutation must be a pure echo; if you write to a datastore inside it, you create a double write and drift.

// resolvers/publishOrderUpdate.js  (NONE data source, JS runtime)
export function request(ctx) {
  return { payload: ctx.args.input };
}

export function response(ctx) {
  return ctx.result;
}

Sequence diagram: Order Service → EventBridge (emit OrderStatusChanged) → AppSync publishOrderUpdate with NONE resolver (invoke publishOrderUpdate mutation; local resolver echoes args, no data source call) → Subscribers on onOrderUpdate (fan out via enhanced filters).

The trigger leg: native target first, Lambda as fallback

The trigger leg is where the most interesting practical tension lives. The CDK README documents two ways to invoke that mutation, and the right default is the one with the least code.

Option A, the native EventBridge target. The aws-cdk-lib/aws-events-targets package ships a targets.AppSync target that calls a GraphQL operation directly from an EventBridge rule. No Lambda sits in the path. The README notes the API “needs to be configured with AWS_IAM authorization mode” and that the target relies on graphQLEndpointArn.

import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';

rule.addTarget(
  new targets.AppSync(api, {
    graphQLOperation:
      'mutation Publish($input: OrderUpdateInput!) { publishOrderUpdate(input: $input) { orderId status } }',
    variables: events.RuleTargetInput.fromObject({
      input: {
        orderId: events.EventField.fromPath('$.detail.orderId'),
        status: events.EventField.fromPath('$.detail.status'),
      },
    }),
  }),
);

If the event payload is usable as-is, this is the whole bridge. There is no function to deploy, no cold start, and no extra code path to test.
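
What "usable as-is" means in practice is that every mutation variable can be read straight from a path in the event. The sketch below is a toy model of that lookup, not EventBridge's implementation. readPath and buildVariables are hypothetical helpers, the sample event is made up, and only simple dotted paths are handled.

```typescript
// Toy model of a path-based input mapping; not EventBridge code.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Resolves simple dotted paths such as '$.detail.orderId' (no arrays or wildcards).
function readPath(source: Json, path: string): Json | undefined {
  if (path.slice(0, 2) !== '$.') throw new Error(`unsupported path: ${path}`);
  let node: Json | undefined = source;
  for (const key of path.slice(2).split('.')) {
    if (node === null || typeof node !== 'object' || Array.isArray(node)) return undefined;
    node = node[key];
  }
  return node;
}

// Builds the mutation's input object from a field-to-path mapping.
function buildVariables(
  source: Json,
  mapping: Record<string, string>,
): Record<string, Json | undefined> {
  const input: Record<string, Json | undefined> = {};
  for (const field of Object.keys(mapping)) input[field] = readPath(source, mapping[field]);
  return input;
}

const sampleEvent: Json = {
  source: 'orders', // hypothetical event source name
  'detail-type': 'OrderStatusChanged',
  detail: { orderId: 'o-42', status: 'shipped', internalNote: 'not forwarded' },
};

const variables = {
  input: buildVariables(sampleEvent, {
    orderId: '$.detail.orderId',
    status: '$.detail.status',
  }),
};
```

Only the mapped fields end up in variables.input. If a value would need computing rather than copying, the event is no longer usable as-is, and that is the signal to consider Option B.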

Option B, the Lambda trigger. Reach for a Lambda only when you need code between the event and the mutation: aggregating or transforming the payload, enforcing custom auth logic, or fanning one event into several mutations. The Lambda makes a SigV4-signed GraphQL call to the AppSync endpoint, and its execution role needs appsync:GraphQL scoped to the specific mutation field.
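
As a rough sketch of the shaping a Lambda would do, the function below fans one event into several mutation request bodies. The BatchDetail event shape and the toPublishRequests helper are assumptions made for illustration. The SigV4 signing and the HTTP call are deliberately left out, since they depend on an AWS signing library.

```typescript
// Hypothetical event shape: one downstream event carrying several order changes.
interface BatchDetail {
  changes: { orderId: string; status: string }[];
}

interface GraphqlRequest {
  query: string;
  variables: { input: { orderId: string; status: string } };
}

const PUBLISH_MUTATION =
  'mutation Publish($input: OrderUpdateInput!) { publishOrderUpdate(input: $input) { orderId status } }';

// Pure shaping step: one request body per change, with status normalized to lower case.
function toPublishRequests(detail: BatchDetail): GraphqlRequest[] {
  return detail.changes.map((change) => ({
    query: PUBLISH_MUTATION,
    variables: {
      input: { orderId: change.orderId, status: change.status.toLowerCase() },
    },
  }));
}

const requests = toPublishRequests({
  changes: [
    { orderId: 'o-1', status: 'SHIPPED' },
    { orderId: 'o-2', status: 'Delivered' },
  ],
});
// Each element would then be JSON-serialized, SigV4-signed, and POSTed to the
// AppSync endpoint by the function; that step is not shown here.
```

Keeping the shaping pure means it can be unit-tested without AWS credentials, which leaves the signed call as the only part that needs an integration test.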

Decision flow for a downstream event:

  • Clients speak GraphQL and event maps to a mutation? No, raw bidirectional: API Gateway WebSocket. No, device-scale fan-out: AWS IoT Core. Yes: check the payload.
  • Payload usable as-is? Yes: native EventBridge to AppSync target. No (transform, fan-out, custom auth): Lambda trigger, SigV4 call.

The trade-off is small but real. The native target removes an entire moving part, which is why it is the default. The Lambda buys you a place to run logic, at the cost of a function to own, a cold-start path, and an IAM role that must be scoped tightly. Start native; add the Lambda only when an event genuinely cannot be shaped into the mutation variables.

CDK wiring

The API uses IAM as its default authorization mode so the server-side trigger can call the mutation, and adds Cognito or API key as additional modes for client traffic. The current way to attach a schema is Definition.fromFile; the older schema: prop is superseded.

import * as path from 'path';
import * as appsync from 'aws-cdk-lib/aws-appsync';

const api = new appsync.GraphqlApi(this, 'BffApi', {
  name: 'bff',
  definition: appsync.Definition.fromFile(
    path.join(__dirname, 'schema.graphql'),
  ),
  authorizationConfig: {
    defaultAuthorization: { authorizationType: appsync.AuthorizationType.IAM },
    // add Cognito / API key as additionalAuthorizationModes for client traffic
  },
  xrayEnabled: true,
});

The aggregation query resolvers that make this a BFF use a Direct Lambda data source, which reads cleanly for a platform audience:

import * as lambda from 'aws-cdk-lib/aws-lambda';

const aggregateFn = new lambda.Function(this, 'AggregateFn', {
  /* runtime, handler, code */
});
const aggregateDs = api.addLambdaDataSource('aggregateDs', aggregateFn);

aggregateDs.createResolver('GetOrderResolver', {
  typeName: 'Query',
  fieldName: 'order',
});

A Direct Lambda data source needs no mapping templates: AppSync lets you supply a request template, a response template, or neither, and with neither it passes the full context to the function and returns its result unchanged. That is exactly what the createResolver call above does; reach for MappingTemplate.lambdaRequest() / lambdaResult() only when you need to reshape the payload between AppSync and the function.

The trigger mutation gets a NONE data source and the JS local resolver shown earlier:

const noneDs = api.addNoneDataSource('publishDs');

noneDs.createResolver('PublishOrderUpdateResolver', {
  typeName: 'Mutation',
  fieldName: 'publishOrderUpdate',
  runtime: appsync.FunctionRuntime.JS_1_0_0,
  code: appsync.Code.fromAsset(
    path.join(__dirname, 'resolvers/publishOrderUpdate.js'),
  ),
});

The README’s JS examples mostly show pipeline resolvers (AppsyncFunction plus Resolver with pipelineConfig). A unit (non-pipeline) JS resolver via createResolver({ runtime, code }) may or may not be accepted by the current L2 for this NONE trigger. Both shapes are documented, so the plan is to try the unit form first and fall back to the pipeline shape if cdk synth rejects it:

// Fallback if the unit JS resolver is not accepted:
const publishFn = new appsync.AppsyncFunction(this, 'PublishFn', {
  api,
  dataSource: noneDs,
  name: 'publishOrderUpdate',
  runtime: appsync.FunctionRuntime.JS_1_0_0,
  code: appsync.Code.fromAsset(
    path.join(__dirname, 'resolvers/publishOrderUpdate.js'),
  ),
});
new appsync.Resolver(this, 'PublishResolver', {
  api,
  typeName: 'Mutation',
  fieldName: 'publishOrderUpdate',
  runtime: appsync.FunctionRuntime.JS_1_0_0,
  code: appsync.Code.fromInline(
    'export function request(){return {}} export function response(ctx){return ctx.prev.result}',
  ),
  pipelineConfig: [publishFn],
});

IAM for the server-side mutation

Client-facing fields keep their Cognito or API key mode; the trigger mutation must allow IAM. Mark the field with @aws_iam in the schema (shown earlier), then grant the caller appsync:GraphQL scoped to that field’s ARN. CDK provides grant helpers so no ARN is hand-written:

// for the native EventBridge target's role, or the Lambda trigger's role
api.grantMutation(triggerRole, 'publishOrderUpdate');

// granular equivalent:
api.grant(
  triggerRole,
  appsync.IamResource.ofType('Mutation', 'publishOrderUpdate'),
  'appsync:GraphQL',
);

The underlying policy these helpers produce scopes the action to one field:

{
  "Effect": "Allow",
  "Action": ["appsync:GraphQL"],
  "Resource": [
    "arn:aws:appsync:REGION:ACCOUNT_ID:apis/GRAPHQL_ID/types/Mutation/fields/publishOrderUpdate"
  ]
}
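
One way to guard against a grant that quietly widens is a small check over a rendered policy statement, for example one read back from IAM after a deploy. The sketch below is an assumption-laden helper, not part of CDK or any AWS SDK. PolicyStatement, scopedField, and grantsOnlyField are hypothetical names, and the pattern simply follows the ARN shape shown above.

```typescript
interface PolicyStatement {
  Effect: 'Allow' | 'Deny';
  Action: string[];
  Resource: string[];
}

// arn:<partition>:appsync:<region>:<account>:apis/<apiId>/types/<Type>/fields/<field>
// Wildcards are rejected in every segment so a broad grant cannot pass as field-scoped.
const FIELD_ARN_PATTERN =
  /^arn:[^:*]+:appsync:[^:*]+:[^:*]+:apis\/[^\/*]+\/types\/([^\/*]+)\/fields\/([^\/*]+)$/;

// Returns the type and field a resource ARN points at, or null if it is not field-scoped.
function scopedField(resource: string): { typeName: string; fieldName: string } | null {
  const match = FIELD_ARN_PATTERN.exec(resource);
  return match ? { typeName: match[1], fieldName: match[2] } : null;
}

// True only when the statement allows appsync:GraphQL on exactly the given field.
function grantsOnlyField(
  statement: PolicyStatement,
  typeName: string,
  fieldName: string,
): boolean {
  if (statement.Effect !== 'Allow') return false;
  if (statement.Action.length !== 1 || statement.Action[0] !== 'appsync:GraphQL') return false;
  return (
    statement.Resource.length > 0 &&
    statement.Resource.every((resource) => {
      const field = scopedField(resource);
      return field !== null && field.typeName === typeName && field.fieldName === fieldName;
    })
  );
}
```

A statement whose resource ends in fields/* or stops at apis/GRAPHQL_ID/* fails the check, which is the mis-scoping this section is trying to avoid.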

The auth-mode mismatch is a common trap. Clients authenticate with Cognito or an API key; the server authenticates with IAM and SigV4. With multi-auth configured, the trigger field must allow both the client mode and IAM, or the server-side mutation fails silently from the client’s point of view.

Hardening the trigger leg: retries and a dead-letter queue

The silent-failure risk deserves more than a pitfall bullet. When the trigger leg drops an event, no subscriber sees an error; the update simply never arrives. The safety net is the retry policy and dead-letter queue that every EventBridge target accepts, so I plan to wire both onto the trigger target from the start rather than bolt them on after the first lost update.

import { Duration } from 'aws-cdk-lib';
import * as sqs from 'aws-cdk-lib/aws-sqs';

const triggerDlq = new sqs.Queue(this, 'PublishTriggerDlq');

rule.addTarget(
  new targets.AppSync(api, {
    graphQLOperation:
      'mutation Publish($input: OrderUpdateInput!) { publishOrderUpdate(input: $input) { orderId status } }',
    variables: events.RuleTargetInput.fromObject({ /* mapped from the event */ }),
    retryAttempts: 5,
    maxEventAge: Duration.hours(2),
    deadLetterQueue: triggerDlq,
  }),
);

Two failure modes behave differently, and the docs are explicit about it. A transient failure is retried with backoff up to retryAttempts, or until maxEventAge elapses, and only then does the event land in the DLQ. Other errors skip retries entirely: missing permissions, a target that no longer exists, or an invalid endpoint go straight to the DLQ, because retrying cannot help until the underlying problem is fixed. The DLQ is a standard SQS queue (FIFO is not supported), and each dead-lettered message carries the error code, the exhausted-retry condition, the retry-attempt count, and the rule and target ARNs, which is enough to tell a bad IAM grant apart from a genuine downstream outage.
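
The two failure modes can be written down as a small decision function. This is a toy model of the behavior described above under a configured DLQ, not EventBridge's actual implementation. The Attempt shape and the retriable flag are assumptions introduced for the sketch.

```typescript
type Outcome = 'delivered' | 'retry' | 'dead-letter';

interface RetryPolicy {
  retryAttempts: number;
  maxEventAgeSeconds: number;
}

interface Attempt {
  succeeded: boolean;
  retriable: boolean; // false models errors such as missing permissions
  retriesSoFar: number; // retries already made before this attempt
  eventAgeSeconds: number;
}

// Decides what happens after one delivery attempt, assuming a DLQ is attached.
function nextStep(policy: RetryPolicy, attempt: Attempt): Outcome {
  if (attempt.succeeded) return 'delivered';
  if (!attempt.retriable) return 'dead-letter'; // retries are skipped entirely
  if (attempt.retriesSoFar >= policy.retryAttempts) return 'dead-letter';
  if (attempt.eventAgeSeconds >= policy.maxEventAgeSeconds) return 'dead-letter';
  return 'retry';
}

// Same values as the CDK snippet above: 5 retries, 2 hours.
const triggerPolicy: RetryPolicy = { retryAttempts: 5, maxEventAgeSeconds: 2 * 60 * 60 };
```

With triggerPolicy, a throttled call on its third attempt is retried, while a permissions error on the very first attempt goes straight to the queue.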

The Lambda fallback path adds its own layer: an on-failure destination or a dead-letter queue on the function catches what fails inside the function after EventBridge has handed the event over. Either way, a lost update surfaces only on the server side, so the plan is a CloudWatch alarm on the rule’s failed-invocation metric and on DLQ depth, with a redrive once the cause is fixed. Without that, a broken trigger leg looks exactly like a quiet system.

Managing the subscriptions: per-client routing

Without filters, every connected client receives every event. That is both a firehose and a cost problem, because each delivered message is billed as a real-time update. Enhanced subscription filters fix this. Filters are written as JSON in the response handler of a resolver attached to the subscription field (NONE data source), using extensions.setSubscriptionFilter():

import { util, extensions } from '@aws-appsync/utils';

export function request(ctx) {
  return { payload: null };
}

export function response(ctx) {
  const filter = {
    or: [
      { orderId: { eq: ctx.args.orderId } },
      { status: { in: ['shipped', 'delivered'] } },
    ],
  };
  extensions.setSubscriptionFilter(
    util.transform.toSubscriptionFilter(filter),
  );
  return null; // required when using enhanced filters
}

Within a single filter, rules combine with AND; across filters in a group, they combine with OR. Enhanced filtering and basic (argument-only) filtering are mutually exclusive: once the resolver sets an enhanced filter, AppSync stops applying argument-based matching. That makes the subscription resolver the single place that decides what each client sees, which is exactly what you want for authorization:

  • Authorization constraint. Read the caller’s identity in the resolver and inject their user id into the filter, so a client only ever receives its own orders.
  • Client-driven narrowing. Expose a filter: String argument, let the client pass a JSON string, and merge it with the authorization constraint inside the resolver. Because the resolver owns the merge, the client can narrow the stream but never widen it past what the backend allows.

For per-user authorization, the subscription resolver reads ctx.identity and injects the user id into the filter. Under Cognito user pools that field is ctx.identity.sub, the UUID of the authenticated user. Watch the asymmetry: the subscriber path is Cognito, but the trigger path is IAM, and an IAM identity has no sub — it exposes username, userArn, and cognitoIdentityId instead.
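
The merge of authorization constraint and client filter can be sketched as a pure function, which makes the "narrow but never widen" property easy to test outside AppSync. Everything named here is an assumption: userId is a hypothetical field on the published payload, and parseClientFilter and withOwnerConstraint are illustrative helpers. In the resolver, the result would be what gets passed to util.transform.toSubscriptionFilter.

```typescript
type FieldRule = Record<string, unknown>; // e.g. { eq: 'x' } or { in: ['a', 'b'] }
type FieldFilter = Record<string, FieldRule>; // rules inside one filter AND together
interface FilterGroup {
  or: FieldFilter[]; // filters inside the group OR together
}

// Parses the client's JSON string; anything unusable falls back to "no narrowing".
// Invalid JSON throws, which would surface as an error on the subscribe request.
function parseClientFilter(raw?: string | null): FilterGroup {
  if (!raw) return { or: [{}] };
  const parsed = JSON.parse(raw) as Partial<FilterGroup> | null;
  if (!parsed || !Array.isArray(parsed.or) || parsed.or.length === 0) return { or: [{}] };
  return { or: parsed.or };
}

// Adds the owner rule to every OR branch, so no branch can escape it.
function withOwnerConstraint(userId: string, raw?: string | null): FilterGroup {
  const client = parseClientFilter(raw);
  return {
    // The client's rules are spread first, so the owner rule wins on conflict.
    or: client.or.map((filter) => ({ ...filter, userId: { eq: userId } })),
  };
}
```

A client that sends its own userId rule has it overwritten, and a client that sends nothing still gets a filter pinned to its own id. Because rules within one filter combine with AND, putting the owner rule in every branch is what keeps the OR from widening the stream.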

One subtlety worth internalizing: for a subscription argument such as owner, passing owner: null registers for items with no owner, which is different from omitting owner entirely (all items). It is easy to wire a client filter that silently asks for the wrong slice.
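
The difference between null and omitted is easier to see in a toy matcher. This models the behavior described above and is not AppSync's implementation. matchesArgs and the sample items are made up for the sketch.

```typescript
type ToyItem = Record<string, string | null | undefined>;
type ToyArgs = Record<string, string | null | undefined>;

// Toy model of argument matching: an omitted argument adds no constraint,
// while an explicit null only matches items where the field has no value.
function matchesArgs(args: ToyArgs, item: ToyItem): boolean {
  return Object.keys(args).every((key) => {
    const wanted = args[key];
    if (wanted === undefined) return true; // omitted: no constraint
    if (wanted === null) return item[key] == null; // explicit null: field must be empty
    return item[key] === wanted;
  });
}

const owned: ToyItem = { id: '1', owner: 'ann' };
const unowned: ToyItem = { id: '2', owner: null };

const omittedMatches = [owned, unowned].filter((item) => matchesArgs({}, item));
const nullMatches = [owned, unowned].filter((item) => matchesArgs({ owner: null }, item));
```

Here omittedMatches contains both items and nullMatches contains only the unowned one. A client library that serializes a missing value as null silently switches between the two.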

Cost and limits

AppSync GraphQL pricing has three components plus data transfer at EC2 rates. The trigger mutation is billed as a data-modification operation, and every subscriber that receives a message counts as one real-time update, so fan-out multiplies that line.

| Component | Rate | What counts |
| --- | --- | --- |
| Query and data modification operations | $4.00 per million | Mutations, including the trigger mutation |
| Real-time updates | $2.00 per million | Each outbound broadcast message and WebSocket operation; fan-out multiplies this |
| Connection-minutes | $0.08 per million minutes | Time a client holds a WebSocket open |

AWS publishes a worked example on the pricing page: a chat app with 2,500 monthly active users, each connected for 1,500 minutes and each sending 1,000 and receiving 1,000 messages per month, totals $10.00 operations + $0.21 data transfer + $5.00 real-time + $0.30 connectivity = $15.51 per month. Treat that as AWS’s illustration, not a projection for any specific BFF.
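
The arithmetic behind that example can be reproduced from the three rates in the table. The sketch below is only a calculator for AWS's illustration. MonthlyUsage and monthlyCostUsd are hypothetical names, and the $0.21 data-transfer figure is taken as given from the example rather than computed.

```typescript
const RATES_USD = {
  operationsPerMillion: 4.0,
  realTimeUpdatesPerMillion: 2.0,
  connectionMinutesPerMillion: 0.08,
};

interface MonthlyUsage {
  users: number;
  minutesPerUser: number;
  sentPerUser: number; // each sent message is one mutation
  receivedPerUser: number; // each received message is one real-time update
  dataTransferUsd: number; // taken as an input, not modeled
}

const roundToCents = (value: number): number => Math.round(value * 100) / 100;
const perMillion = (count: number, rate: number): number => (count / 1e6) * rate;

function monthlyCostUsd(usage: MonthlyUsage) {
  const operations = perMillion(usage.users * usage.sentPerUser, RATES_USD.operationsPerMillion);
  const realTime = perMillion(
    usage.users * usage.receivedPerUser,
    RATES_USD.realTimeUpdatesPerMillion,
  );
  const connectivity = perMillion(
    usage.users * usage.minutesPerUser,
    RATES_USD.connectionMinutesPerMillion,
  );
  return {
    operations: roundToCents(operations),
    realTime: roundToCents(realTime),
    connectivity: roundToCents(connectivity),
    total: roundToCents(operations + realTime + connectivity + usage.dataTransferUsd),
  };
}

const awsExample = monthlyCostUsd({
  users: 2500,
  minutesPerUser: 1500,
  sentPerUser: 1000,
  receivedPerUser: 1000,
  dataTransferUsd: 0.21,
});
```

For a BFF, the number to watch is receivedPerUser: one trigger mutation delivered to N subscribers is one operation but N real-time updates, which is why per-client filters also act as a cost control.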

A few documented limits to design against: subscription message payload is capped at 240 KB per message and is not adjustable; request execution time is 30 seconds; handler code size is 32 KB; the evaluated resolver response is 5 MB; the schema document is 1 MB. There is also a default cap of 200 subscriptions per client connection, which is adjustable. The quotas table does not publish a per-account ceiling on concurrent subscriptions, so do not assume one there.

Common pitfalls

  • Writing to a datastore inside the trigger mutation. The NONE resolver must be a pure echo. A write here creates a double write and data drift against the downstream source of truth.
  • Forgetting per-client filters. Every client then receives every event, which is both a firehose and a real-time-update cost blowup, since each delivery is billed.
  • Mis-scoped IAM. appsync:GraphQL must target the field ARN, and the field must carry @aws_iam for the server caller. With multi-auth, the field must allow both the client mode and IAM.
  • Silent trigger-leg failure. A dropped event reaches no subscriber and raises no client-side error, so the retry policy and dead-letter queue covered earlier are the only safety net.
  • Mixing up the products. appsync.EventApi with channel namespaces is a separate pub/sub product. Do not paste its addChannelNamespace snippets into a GraphqlApi build; they look similar in search results.
  • Returning a non-null value from the subscription resolver when enhanced filters are on. It must return null.

A note on legacy: current AWS tutorials use the JS @aws-appsync/utils runtime for both the NONE local resolver and enhanced filters. Many community posts and older AWS pages still show VTL such as $util.toJson($context.arguments). VTL still works, but JS (FunctionRuntime.JS_1_0_0) is the recommended path.

Closing

For an AppSync BFF where clients already speak GraphQL and downstream events can be modeled as domain mutations, the bridge pattern is the right default: route events through EventBridge into a NONE-data-source mutation and let enhanced filters do per-client routing, with no second WebSocket stack to run. Within that bridge, start with the native EventBridge to AppSync target and add a Lambda trigger only when you need payload transformation, multi-mutation fan-out, or custom auth. The boundary is clear: when the push is not naturally a mutation or clients are not GraphQL clients, API Gateway WebSocket gives raw bidirectional messaging; at device scale with topic-tree fan-out, AWS IoT Core is the purpose-built tool. The next step is to cdk synth the one question still open above — whether the NONE trigger takes a unit JS resolver or needs the pipeline shape — before any deploy.
