From months to minutes: How we rebuilt Webflow's billing infrastructure

Rebuilding billing at Webflow for speed, clarity, and scale

We’re hiring!

We’re looking for product and engineering talent to join us on our mission to bring development superpowers to everyone.

Explore open roles
Written by
Erick Bett
Staff Software Engineer

After spending months updating pricing logic, we knew something had to change. This post walks through how we rebuilt Webflow’s billing infrastructure, giving teams the flexibility to ship updates without engineering bottlenecks.

When a simple pricing update required three engineers and two months, it was clear our billing system had become a bottleneck. Routine changes turned into complex, error-prone work that slowed product development and frustrated teams across the company. This post shares how we rebuilt Webflow’s billing infrastructure from the ground up, turning multi-month projects into simple updates and freeing teams to ship features without engineering lift.

Problems with our billing infrastructure

Webflow's billing infrastructure evolved organically over time. What started as a simple Stripe wrapper became a complex internal system that, while more robust, introduced friction that slowed product velocity.

After our latest pricing update took three engineers two months to complete, it was clear that our billing infrastructure needed an overhaul, especially because the pricing changes ahead of us were far more complex. We identified four key problems that were limiting our velocity:

#1: Simple pricing updates required deep engineering effort

Routine pricing changes like adding a plan or updating discounts required navigating brittle logic. Each change meant touching multiple systems, coordinating across teams, and managing a high risk of regressions. Engineers avoided these updates because they were slow, error-prone, and pulled focus from roadmap work. We couldn't move fast on pricing strategy.

#2: Limited billing model flexibility prevented product iteration

We originally designed our billing infrastructure for simple subscription models, but Webflow had evolved far beyond that. Use cases like usage-based billing, tiered pricing, add-ons, and enterprise-specific features required complex workarounds. Each new billing requirement became a custom implementation instead of a natural extension of our system, creating maintenance overhead and technical debt.

#3: Billing bottlenecks threatened system reliability

As Webflow grew, our billing infrastructure struggled to keep pace. Our synchronization engine called Stripe for every subscription change, creating performance bottlenecks that became more pronounced with scale. Database queries for billing operations consumed significant resources, and our architecture couldn't efficiently handle the volume of transactions our growing customer base required.

#4: Complexity and technical debt slowed development

Years of iterative updates without architectural improvements created a complex system that was difficult to understand, modify, and extend. Each pricing change required handling grace periods for existing customers, managing multiple plan versions, and ensuring backward compatibility. We built enterprise features as bolt-ons instead of first-class citizens, implemented through workarounds like beta flags and discount coupons instead of proper billing constructs.

These problems made it clear that incremental improvements wouldn't suffice — we needed a fundamental rethinking of our billing architecture to restore development velocity and enable future growth.

Billing infrastructure principles

Despite our initial hesitation to depart from our familiar system, the benefits of adopting a robust billing solution quickly became clear. As we began evaluating alternatives, our selection criteria included:

  • Scalability: The ability to handle 10x growth in billing operations with low-latency entitlement checks, persistent caching to minimize API calls, and reduced dependency on synchronous Stripe calls that created performance bottlenecks.
  • Legacy pricing management: Maintaining compatibility with previous pricing and packaging while offering clear pathways to newer models. Our last solution lacked simple migration processes, forcing us to develop specialized code for individual pricing adjustments, which resulted in increased support requirements and accumulated technical debt.
  • Flexibility: Ease of integration and the ability to update pricing and packaging.
  • Zero-deploy entitlement updates: The ability to make entitlement changes through a user interface without engineering work or deployments, with updates reflected across the platform immediately. Our static configuration file approach required a code change and a deployment for every entitlement modification.

After extensive evaluation, we selected Stigg because it met all of our criteria. Its robust entitlement service, which was central to our needs, stood out in particular.

Proof of concept

We built a Proof of Concept (POC) to validate our hypothesis and evaluate core features. Our POC focused on three key success factors:

  1. Improving the team's velocity in implementing pricing and packaging changes.
  2. Verifying that the entitlement system with appropriate caching could deliver fast response times.
  3. Ensuring successful implementation and management of enterprise plan subscriptions.

A small team built a focused integration that validated our core hypotheses around velocity, performance, and enterprise capabilities. After observing Stigg in a production-like environment, we confirmed that it met both our engineering and our product criteria. The POC gave us confidence that this wasn't just a technology swap; it was a fundamental improvement to how we could operate our billing system.

Migration process

Given the complexity of this undertaking, we adopted the Strangler Fig approach to replace our in-house billing system gradually and avoid downtime. Here's how we approached the migration:

  • Initial integration and double write/update: We first implemented the billing service to start actively syncing with Stigg. During this phase, we performed double writes — updating our in-house billing system and, in parallel, sending updates to Stigg via a background job. This ensured consistent data across both systems during the migration process.
  • Data migration: We then executed a one-time migration to export all existing subscriptions to Stigg. The double write process was crucial here, as it prevented any lapse in updates between the two systems during the export process.
  • Handling unexpected scenarios: We developed a backfill mechanism to address unforeseen situations, such as backup restores, site clones, or system crashes. When a specific error such as "ResourceNotFound" occurred, we ran the backfill logic before retrying the function. Additionally, we created a diffing engine that detected and logged any inconsistencies between Stigg and our internal billing platform. This mechanism was crucial for diagnosing issues, and it sometimes triggered the backfill process to correct missing or incorrectly configured subscriptions and entitlements.

These steps gave us the confidence to transition to Stigg as our primary source of truth for billing, ensuring a smooth migration process with minimal disruption.
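
The double-write, diffing, and backfill steps above can be sketched roughly as follows. This is a minimal illustration, not our production code: the `Subscription` shape, the in-memory stores standing in for the legacy system and Stigg, and every function name here are assumptions made for the example.

```typescript
// Hypothetical record shape; the real subscription model is not shown in this post.
type Subscription = { workspaceId: string; planId: string };

type Discrepancy =
  | { kind: "missing_in_stigg"; workspaceId: string }
  | { kind: "plan_mismatch"; workspaceId: string; legacyPlan: string; stiggPlan: string };

// In-memory stand-ins for the legacy billing store and for Stigg.
const legacyStore = new Map<string, Subscription>();
const stiggStore = new Map<string, Subscription>();
// Stand-in for the background job queue that forwards updates to Stigg.
const pendingStiggWrites: Subscription[] = [];

// Double write: update the legacy system immediately and queue the same
// update for Stigg, so user requests never wait on the new system.
function writeSubscription(sub: Subscription): void {
  legacyStore.set(sub.workspaceId, sub);
  pendingStiggWrites.push(sub);
}

// Background job: drain every queued update into Stigg.
function flushStiggWrites(): void {
  for (const sub of pendingStiggWrites.splice(0)) {
    stiggStore.set(sub.workspaceId, sub);
  }
}

// Diffing engine: report legacy subscriptions that are absent from Stigg
// or that carry a different plan there.
function diffStores(): Discrepancy[] {
  const found: Discrepancy[] = [];
  legacyStore.forEach((legacy, workspaceId) => {
    const remote = stiggStore.get(workspaceId);
    if (remote === undefined) {
      found.push({ kind: "missing_in_stigg", workspaceId });
    } else if (remote.planId !== legacy.planId) {
      found.push({
        kind: "plan_mismatch",
        workspaceId,
        legacyPlan: legacy.planId,
        stiggPlan: remote.planId,
      });
    }
  });
  return found;
}

// Backfill: for each discrepancy, copy the legacy record over to Stigg.
function backfill(discrepancies: Discrepancy[]): void {
  for (const d of discrepancies) {
    const legacy = legacyStore.get(d.workspaceId);
    if (legacy !== undefined) {
      stiggStore.set(d.workspaceId, legacy);
    }
  }
}
```

In this sketch the legacy store is treated as authoritative, which matches the migration phase described above; once Stigg becomes the source of truth, the direction of the backfill would flip.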

System architecture: Designing for gradual migration

The core challenge wasn't just integrating with Stigg; it was doing so without breaking our existing system or forcing a risky big-bang deployment. We needed a way to gradually migrate workspaces (the primary organizational units where teams collaborate on sites) while maintaining full functionality for those not yet enrolled.

From our previous billing system implementations, we knew that directly coupling application code to third-party APIs creates brittle systems that are difficult to test and migrate safely. This time, we built a billing service layer that completely encapsulates Stigg from the rest of our application.

This abstraction layer gave us three key capabilities: centralized API access for simplified integration management, comprehensive logging and monitoring at a single point, and a unified interface that allowed the rest of our application to remain agnostic about the underlying billing provider. Most importantly, it enabled us to swap implementations without touching application code.

We organized our billing service into four core domains that map to our business needs:

  • Customer Service for workspace lifecycle management
  • Plan Service for pricing information
  • Subscription Service for billing operations
  • Entitlement Service for feature access control

Each service needed to support multiple implementations to enable our gradual rollout strategy.

Architecture diagram showing Webflow's billing system setup. The Webflow App routes requests through a Strategy layer, which then selects the relevant Strategy. Based on the selected strategy, requests are directed to one of four core services: Entitlement Service, Subscription Service, Customer Service, or Plan Service. The entitlement service interacts with a Redis cache for data retrieval. On a cache miss, data is fetched from the Stigg API, and updates are asynchronously pushed back to Redis via an SQS queue. The diagram illustrates a modular, decoupled system designed for flexibility, caching, and efficient external API integration.
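
The read path in the diagram can be sketched as a cache-aside lookup. This is an illustration under stated assumptions rather than our actual service: `EntitlementCache`, `FetchEntitlement`, and `EnqueueCacheWrite` are hypothetical ports standing in for Redis, the Stigg API, and the SQS queue, and the in-memory cache exists only so the sketch runs on its own.

```typescript
type Entitlement = { featureId: string; hasAccess: boolean };

// Hypothetical ports; in the diagram these are Redis, the Stigg API, and SQS.
interface EntitlementCache {
  get(key: string): Promise<Entitlement | undefined>;
  set(key: string, value: Entitlement): Promise<void>;
}
type FetchEntitlement = (workspaceId: string, featureId: string) => Promise<Entitlement>;
type EnqueueCacheWrite = (key: string, value: Entitlement) => void;

class CachedEntitlementReader {
  private readonly cache: EntitlementCache;
  private readonly fetchFromProvider: FetchEntitlement;
  private readonly enqueueCacheWrite: EnqueueCacheWrite;

  constructor(
    cache: EntitlementCache,
    fetchFromProvider: FetchEntitlement,
    enqueueCacheWrite: EnqueueCacheWrite
  ) {
    this.cache = cache;
    this.fetchFromProvider = fetchFromProvider;
    this.enqueueCacheWrite = enqueueCacheWrite;
  }

  async getBooleanEntitlement(workspaceId: string, featureId: string): Promise<Entitlement> {
    const key = `${workspaceId}:${featureId}`;

    // Cache hit: answer without calling the provider.
    const cached = await this.cache.get(key);
    if (cached !== undefined) {
      return cached;
    }

    // Cache miss: read from the provider, then hand the cache write to a
    // queue so the caller is not blocked on it.
    const fresh = await this.fetchFromProvider(workspaceId, featureId);
    this.enqueueCacheWrite(key, fresh);
    return fresh;
  }
}

// In-memory stand-in so the sketch runs without Redis.
class InMemoryEntitlementCache implements EntitlementCache {
  private readonly entries = new Map<string, Entitlement>();

  async get(key: string): Promise<Entitlement | undefined> {
    return this.entries.get(key);
  }

  async set(key: string, value: Entitlement): Promise<void> {
    this.entries.set(key, value);
  }
}
```

A queue consumer would call `cache.set` for each queued entry. Until that consumer has run, repeated misses for the same key still reach the provider, which is the trade-off of writing to the cache asynchronously.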

Strategy pattern for zero-downtime rollouts

We implemented the Strategy Design Pattern across all our billing services to solve the gradual migration challenge. This pattern allowed us to maintain three different implementations of each service: a Stigg implementation for enrolled workspaces, a null no-op implementation for legacy workspaces, and a mock implementation for testing.

export function getEntitlementService(
  opts: ServiceFactoryOptions
): EntitlementServiceInterface {
  const loggerInstance = opts.logger ?? createDefaultLogger();
  const eventEmitterInstance = opts.eventEmitter ?? getEventEmitter();

  switch (serviceResolver(opts)) {
    case SERVICE_PROVIDERS.MOCK:
      return getTestingMockForService(StiggEntitlementService);

    case SERVICE_PROVIDERS.NULL:
      return new NullEntitlementService({
        logger: loggerInstance,
        eventEmitter: eventEmitterInstance,
      });

    case SERVICE_PROVIDERS.STIGG:
      return new StiggEntitlementService({
        logger: loggerInstance,
        eventEmitter: eventEmitterInstance,
        stigg: getConfiguredStiggClient(loggerInstance),
      });

    default:
      throw new Error('Unknown service type');
  }
}

The best part of this approach was that calling code remained oblivious to which strategy it was using — it calls the same interface and receives consistent behavior whether it's hitting Stigg's live API, a no-op implementation, or mock data:

const strategy = await getBillingServiceStrategy(workspaceId);
const service = EntitlementService(strategy);
const entitlement = await service.getBooleanEntitlement({
  workspaceId,
  featureId: CAN_LOCALIZE_CMS,
  siteId,
});

if (entitlement.hasAccess) {
  // has access to localize CMS
}

Each strategy implementation served a specific purpose. Our Stigg strategy handled the full integration with their API, including data transformation between their format and ours:

class StiggEntitlementService {
  async getBooleanEntitlement(params: GetBooleanEntitlement) {
    const entitlement = await this.stiggClient.getBooleanEntitlement(params);
    const transformed = stiggToWebflowEntitlementTransformer(entitlement);

    this.eventEmitter.emit('entitlement.get-boolean', transformed);
    return transformed;
  }
}

Our null strategy provided a safe no-op implementation that preserved monitoring capabilities:

class NullEntitlementService {
  async getBooleanEntitlement(params: GetBooleanEntitlement) {
    this.eventEmitter.emit('entitlement.get-boolean', params);
    return {};
  }
}

This pattern was essential for our migration success. It enabled zero-downtime rollouts, gradual workspace migration, comprehensive testing with mocks, and the flexibility to swap billing providers in the future by simply implementing new strategies.
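
The factory shown earlier delegates the choice of strategy to `serviceResolver`, whose body this post doesn't include. One plausible shape for that decision is sketched below. Everything here is an assumption for illustration: the string values behind `SERVICE_PROVIDERS`, the `ResolverInput` fields, and the simplified `EntitlementReader` interface.

```typescript
// Provider names mirror the post; the string values are assumptions.
const SERVICE_PROVIDERS = { MOCK: "mock", NULL: "null", STIGG: "stigg" } as const;
type ServiceProvider = (typeof SERVICE_PROVIDERS)[keyof typeof SERVICE_PROVIDERS];

// Hypothetical inputs; the post does not show ServiceFactoryOptions.
type ResolverInput = {
  isTestEnvironment: boolean;
  isWorkspaceEnrolled: boolean;
};

function resolveProvider(input: ResolverInput): ServiceProvider {
  // Tests never talk to the live API, regardless of enrollment.
  if (input.isTestEnvironment) {
    return SERVICE_PROVIDERS.MOCK;
  }
  // Enrolled workspaces use Stigg; everyone else gets the no-op strategy.
  return input.isWorkspaceEnrolled ? SERVICE_PROVIDERS.STIGG : SERVICE_PROVIDERS.NULL;
}

// Simplified interface shared by every strategy.
interface EntitlementReader {
  hasAccess(featureId: string): boolean;
}

// Stand-in for what the provider would report for an enrolled workspace.
const grantedByProvider = new Set<string>(["can_localize_cms"]);

const strategies: Record<ServiceProvider, EntitlementReader> = {
  mock: { hasAccess: () => true }, // canned answer for tests
  null: { hasAccess: () => false }, // no-op for workspaces not yet enrolled
  stigg: { hasAccess: (featureId: string) => grantedByProvider.has(featureId) },
};

// Calling code asks for a reader and never learns which strategy it received.
function getEntitlementReader(input: ResolverInput): EntitlementReader {
  return strategies[resolveProvider(input)];
}
```

Because the decision lives in one function, moving a workspace from the null strategy to the Stigg strategy changes only the resolver's input, not any call site.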

Keeping two systems in sync

The strategy pattern solved our code organization challenge. However, we still faced a coordination problem: ensuring that workspaces enrolled in our feature flag rollout were properly enrolled on Stigg's platform. This synchronization was key for maintaining data consistency as we handed over subscription management from our legacy system to Stigg.

The core challenge was that we had two independent systems making enrollment decisions. Our feature flag system determined which workspaces should use the new billing infrastructure, while Stigg needed to know which customers it should actively manage. A mismatch between these systems could result in billing inconsistencies or service disruptions.

We solved this with an enrollment synchronization function that runs whenever a workspace interacts with our billing system:

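export async function isStiggBillingSyncEnabled(
  workspace: Workspace
): Promise<boolean> {
  try {
    return await getAndSetRolloutValue(workspace);
  } catch (err: unknown) {
    // TypeScript only allows `unknown` or `any` on catch variables,
    // so narrow the error before reading its message.
    const message = err instanceof Error ? err.message : String(err);
    if (shouldRunCustomerBackfill(message)) {
      try {
        // Create the missing Stigg customer, then retry the check once.
        await backfillMissingCustomer(String(workspace._id), message);
        return await getAndSetRolloutValue(workspace);
      } catch (backfillError: unknown) {
        // Log the backfill failure itself, not the original error.
        logger.error('Error backfilling Stigg customer', backfillError);
      }
    }
    // Fail closed: treat the workspace as not enrolled.
    return false;
  }
}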

async function getAndSetRolloutValue(workspace: Workspace) {
  const isEnrolled = featureConfig.getFeatureFlag(FEATURE_FLAG);
  await ensureEnrolledInStigg(workspace, isEnrolled);
  return isEnrolled;
}

This function became our source of truth reconciliation. It detected mismatches between our feature flags and Stigg's enrollment state, corrected them automatically, and cached the results to minimize performance impact.
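
The reconciliation step, represented above by the call to `ensureEnrolledInStigg`, can be pictured along these lines. This is a hedged sketch and not the real function: `EnrollmentProvider`, `reconcileEnrollment`, and the in-process `lastReconciled` cache are hypothetical names, and the in-memory provider stands in for Stigg.

```typescript
// Hypothetical port for the provider-side enrollment state.
interface EnrollmentProvider {
  isEnrolled(workspaceId: string): Promise<boolean>;
  setEnrolled(workspaceId: string, enrolled: boolean): Promise<void>;
}

type ReconcileResult = "cached" | "in_sync" | "corrected";

// Remembers the last reconciled flag value per workspace so repeat calls
// can skip the provider entirely.
const lastReconciled = new Map<string, boolean>();

async function reconcileEnrollment(
  workspaceId: string,
  flagEnabled: boolean,
  provider: EnrollmentProvider
): Promise<ReconcileResult> {
  if (lastReconciled.get(workspaceId) === flagEnabled) {
    return "cached";
  }

  const enrolled = await provider.isEnrolled(workspaceId);
  const result: ReconcileResult = enrolled === flagEnabled ? "in_sync" : "corrected";
  if (result === "corrected") {
    // The feature flag wins: bring the provider in line with it.
    await provider.setEnrolled(workspaceId, flagEnabled);
  }

  lastReconciled.set(workspaceId, flagEnabled);
  return result;
}

// In-memory stand-in so the sketch runs without a billing provider.
class InMemoryEnrollmentProvider implements EnrollmentProvider {
  private readonly enrolled = new Set<string>();

  async isEnrolled(workspaceId: string): Promise<boolean> {
    return this.enrolled.has(workspaceId);
  }

  async setEnrolled(workspaceId: string, enrolled: boolean): Promise<void> {
    if (enrolled) {
      this.enrolled.add(workspaceId);
    } else {
      this.enrolled.delete(workspaceId);
    }
  }
}
```

A cache like this trades freshness for speed: if enrollment changes on the provider side out of band, the mismatch is only noticed once the cached entry is dropped or the flag value changes.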

Performance and reliability

The new billing system delivered notable performance improvements across the entire platform.

  • By eliminating the synchronous Stripe API calls that previously blocked every plan change, we achieved a 95% reduction in sync operations for workspaces with large numbers of sites. This architectural change removed a major bottleneck that caused timeouts and degraded user experience.
  • In our entitlement system, we replaced a static capabilities config with dynamic, cached entitlement checks that provided centralized usage tracking, something we never had before. Previously, different features implemented their own tracking methods (or didn't track usage at all), creating inconsistencies and performance overhead. The new system improved page load times while reducing code complexity across the platform.
  • The caching strategy we implemented eliminated redundant API calls, automatically detected and corrected data inconsistencies, and ensured the system remained functional even during external service outages. This resilience was key during the migration period when we needed to maintain service reliability while coordinating between multiple systems.
  • We developed distinct error-handling strategies based on the criticality of each operation. We classified enrollment failures as critical to prevent billing inconsistencies, and logged less critical operations without interfering with user workflows. Additionally, we implemented comprehensive backfill logic to address edge cases such as workspace restores, site clones, and temporary system failures.
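
The criticality-based error handling in the last bullet can be sketched as a small wrapper. This is an assumed shape for illustration only: `Criticality`, `OperationLogger`, and `runBillingOperation` are hypothetical names, not part of our codebase.

```typescript
// Hypothetical classification: critical failures must reach the caller,
// best-effort failures are logged and swallowed.
type Criticality = "critical" | "best_effort";

type OperationLogger = { error(message: string, err: unknown): void };

async function runBillingOperation<T>(
  name: string,
  criticality: Criticality,
  operation: () => Promise<T>,
  logger: OperationLogger
): Promise<T | undefined> {
  try {
    return await operation();
  } catch (err: unknown) {
    // Every failure is logged, whatever its criticality.
    logger.error(`billing operation failed: ${name}`, err);
    if (criticality === "critical") {
      // For example, an enrollment failure: surface it so the caller
      // cannot continue with inconsistent billing state.
      throw err;
    }
    // Best effort: the user workflow continues without a result.
    return undefined;
  }
}
```

Callers of a best-effort operation need to handle the `undefined` result, which keeps the decision to degrade gracefully visible at each call site.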

Throughout the rollout, we monitored performance closely with dedicated Datadog dashboards that tracked cache hit rates, API response times, synchronization failures, and enrollment mismatches in real time. This monitoring proved invaluable for identifying and resolving issues before they could impact customers, and it gave us the confidence to accelerate the rollout timeline.

Reflections on the migration

The migration to the new billing system has transformed how we handle pricing and packaging changes. What once took months of engineering effort has become a non-event, with product managers now able to update plans and entitlements with little to no engineering involvement. This shift has empowered teams to implement changes quickly and confidently.

Screenshot of a Slack message stating, "Feels really good to get to a place where launching new features and incorporating them into existing plans is a non-event for our engineering team and I can manage the launch process on my own." The message has several positive emoji reactions, including hearts, flames, and thumbs up.

The strategic design decisions and comprehensive documentation have also enabled feature teams to add new features and entitlements with minimal Subscriptions & Payments team guidance. While we continue to clean up the remaining sections of the legacy system, this has laid a strong foundation for future growth and innovation at Webflow.

Lessons learned

Migrating a key system like billing taught us valuable lessons that will inform future large-scale migrations:

1. Map all edge cases early, especially cross-system workflows. We failed to account for site workflows like restores, snapshot reverts, and workspace transfers. These overlooked scenarios led to subscription discrepancies that required backfilling. We now systematically catalog every workflow that touches the system being migrated before starting implementation.

2. Plan for rate limits and external API constraints from day one. Because some workspaces have a large number of sites, we hit rate limits on Stigg's API, which led to service disruptions. We worked with Stigg to raise the limits and adopted an edge API backed by DynamoDB, which reduced our dependency on real-time external queries and improved resilience.

3. Legacy cleanup is more complex than the initial migration. While the new system worked beautifully, removing legacy implementations proved more complicated than anticipated. Coordinating with feature teams to migrate to entitlements while avoiding performance degradation required careful sequencing and extensive testing. We learned to allocate more time for cleanup phases in future migrations.

4. Monitoring and observability are migration accelerators. Our comprehensive Datadog dashboards didn't just help us catch issues; they gave us the confidence to move faster. Real-time visibility into cache hit rates, API response times, and system health allowed us to accelerate rollout timelines because we could identify and resolve problems before they impacted customers.

Conclusion

Rebuilding our billing infrastructure was a complex but necessary endeavor to support Webflow's next decade of growth. This gave us a scalable and flexible foundation that reduces the engineering effort required to manage pricing and packaging changes while better supporting our future growth and customer expectations.

Among the key gains from this migration, we've dramatically improved the efficiency of our billing updates, allowing changes that once took months to be made in a fraction of the time. The new system enables us to introduce new features and pricing models without the heavy lifting previously required. It has been designed with scalability in mind, ready to support Webflow's expanding user base without becoming a bottleneck. We've also improved latency by eliminating the synchronous API calls that previously caused timeouts, which makes billing operations faster and more reliable across the platform. Another important outcome of this project has been the decision to encapsulate third-party tools behind an internal service layer. This pattern simplified the current rollout and ensured that we could swap out third-party providers in the future without requiring significant changes across our codebase.

Looking ahead, there's still a lot to build. We've laid the foundation, but many of the most challenging and rewarding problems in billing, entitlements, and monetization are still ahead of us. If solving these kinds of challenges excites you, we'd love to hear from you—check out our open roles.

Last Updated
June 9, 2025