Genesys Cloud - Developer Community!

Terraform Extremely Slow

  • 1.  Terraform Extremely Slow

    Posted 4 days ago

    Hello,

    With Provider 1.74.0, we are experiencing extreme delays when objects are being pushed into the environment.

    Pushing a single object, even during a targeted 'terraform plan', takes over 5 minutes.  Our Teams deal with a large number of objects, so this is causing significant delays.

    Detailed log files have been uploaded to Case #0003992543.

    Thank you in advance.


    #CXasCode

    ------------------------------
    Juraj Makacek
    ------------------------------


  • 2.  RE: Terraform Extremely Slow

    Posted 4 days ago

    Hello Juraj, 

    Thank you for bringing this issue to our attention. We understand this delay may be disruptive to your workflow. 

    Our CXasCode engineering team is based in Ireland and India, so we will prioritize this first thing tomorrow morning. To help us investigate, can you please confirm whether you only started seeing the issue after upgrading to v1.74.0? If so, which version were you on before the upgrade?

    Thank you, 

    Naeem Tai



    ------------------------------
    Naeem Tai
    Team Lead, Lead Software Engineer
    ------------------------------



  • 3.  RE: Terraform Extremely Slow

    Posted 4 days ago

    Hello,

    I only noticed it today, during our attempt to recover some deleted objects.  We upgraded about a month ago, if that makes any difference, but didn't have any reports from our Teams until today.  As such, I doubt it is related to the version; could it be due to restoring objects which were previously deleted?

    In either case, we would like to rely on Cx-As-Code to be able to recover objects as quickly as possible.  Our environments have a lot of objects, so we would welcome any performance improvements.

    Thank you.



    ------------------------------
    Juraj Makacek
    ------------------------------



  • 4.  RE: Terraform Extremely Slow

    Posted 4 days ago

    Thank you for the additional information, Juraj. We have also retrieved the logs you provided through the case. 



    ------------------------------
    Naeem Tai
    Team Lead, Lead Software Engineer
    ------------------------------



  • 5.  RE: Terraform Extremely Slow

    Posted 3 days ago

    Hi @Juraj Makacek

    After analyzing your logs from Case #0003992543, we identified the root cause of the delays.

    The resource module.routing_queue.genesyscloud_routing_queue.xxxx_test (ID: xxxxx) exists in your Terraform state but has been deleted from Genesys Cloud, so the API returns 404 Not Found. When this occurs, the provider enters a retry loop with exponential backoff (500ms, 1s, 2s, 4s, 8s, then 10s intervals) for up to 5 minutes. The retry exists to handle eventual consistency: for a brief period after a resource is created, the API may temporarily return 404 before the data is fully propagated. The logs confirm this: repeated 404 responses from 15:34:56 until the timeout at 15:39:56, exactly 5 minutes. After the timeout, the provider correctly removes the resource from state and the plan continues.
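To make the timing concrete, here is a minimal Python sketch of the schedule described above (500ms doubling up to a 10s cap, inside a 5-minute window). It is illustrative arithmetic only, not the provider's or the SDK's code; the function names are hypothetical, and request latency is ignored.

```python
def backoff_delays(initial=0.5, factor=2.0, cap=10.0):
    """Yield the waits between retries: 0.5, 1, 2, 4, 8, then 10 repeatedly.

    The defaults mirror the intervals quoted in the post; they are assumptions
    for illustration, not values read from the provider source.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def attempts_within(timeout_s=300.0):
    """Count how many reads fit into the timeout for a resource that never appears."""
    elapsed = 0.0
    attempts = 1  # the first read happens immediately
    for delay in backoff_delays():
        if elapsed + delay > timeout_s:
            break  # the next wait would overrun the timeout
        elapsed += delay
        attempts += 1
    return attempts
```

Under these assumptions, attempts_within(300) comes to 34 reads of a resource that will never come back, while attempts_within(0) is a single read.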

    Since you're recovering deleted objects, the Terraform state still contains references to old resource IDs that no longer exist in Genesys Cloud. Terraform attempts to refresh these resources, receives 404 responses, and waits 5 minutes per deleted resource before proceeding.

    To speed up recovery, you can clean your Terraform state before running the plan:

    1. Remove specific resources from state:
       terraform state rm module.routing_queue.genesyscloud_routing_queue.xxxx_test

    2. Or start fresh with an empty state (if recovering the entire environment):
       mv terraform.tfstate terraform.tfstate.backup
       terraform plan
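When many resources are affected, option 1 can be scripted. The following is a rough Python sketch, assuming you already have a list of the resource addresses known to be deleted (for example, collected from the 404 entries in the provider log). The helper names are hypothetical. It relies only on `terraform state rm` accepting several addresses per call and on its `-dry-run` option.

```python
import subprocess


def build_state_rm_commands(addresses, batch_size=50, dry_run=True):
    """Group addresses into batched `terraform state rm` invocations.

    Batching keeps the number of state writes low; batch_size=50 is an
    arbitrary illustrative value. With dry_run=True Terraform only prints
    what it would remove.
    """
    commands = []
    for start in range(0, len(addresses), batch_size):
        cmd = ["terraform", "state", "rm"]
        if dry_run:
            cmd.append("-dry-run")
        commands.append(cmd + list(addresses[start:start + batch_size]))
    return commands


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A cautious workflow would be to run the dry-run commands first, review the output, and only then repeat with dry_run=False after backing up the state file.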


    Thanks



    ------------------------------
    Hemanth Dogiparthi
    Manager, Software Engineering
    ------------------------------



  • 6.  RE: Terraform Extremely Slow

    Posted 3 days ago

    Hello,

    Thank you very much.  The "backoff" explanation makes a lot of sense for other delays we've experienced as well.

    First, a compliment, as this part I was not expecting to work: The run eventually does the "right thing", re-creates the object and updates the state file with the new ID.

    However, the backoff process is a challenge, as it blocks everything else from executing.  For example, if it were "retry every minute for 2 retries total, while allowing other objects to be processed in the meantime", it would be an easier sell.  As-is, everything is blocked with no indication of what's happening, and with 1000s of objects, as in our case, it is effectively unusable for the purpose we have it for in the first place: being able to recover.

    To address your 2 points specifically:

    1. Yes, this is an option; we are aware of it and do have a process for it.  My challenge here would be: why have automation at all if we have to do things manually?  We would have to review all object types, note the ones missing, get the proper TF resource names for each, and run a command for each.  Even if we assume 0 human error, that is a lot of manual work that is hard to split up and run in parallel, especially in our current situation, where we had a "bad run" and are literally talking about 1000s of objects.
    2. In our current situation, "start fresh" is unfortunately also not an option, as "not everything" was deleted.  We would have long wait times and failures the other way ("Object already exists"), which, combined with the long run times we are discussing, I am not sure would be any easier.

    Apologies for the long response.  The short version is that with the current delays, the resulting timeouts, and the fact that backoff is a blocking operation, our partners have lost faith in our ability to recover.

    Thank you.



    ------------------------------
    Juraj Makacek
    ------------------------------



  • 7.  RE: Terraform Extremely Slow

    Posted 3 days ago

    Hi @Juraj Makacek

    Thank you for the detailed feedback.

    We understand the challenge: when recovering from a "bad run" with thousands of deleted objects, the current 5-minute retry backoff will not help the cause. 

    Here is a solution we propose:

    We will make the retry timeout configurable. This will be delivered in an upcoming release. You'll be able to set the timeout interval via an environment variable or provider configuration; setting it to 0 will mean the provider tries only once and immediately recognizes the resource as deleted, eliminating the backoff entirely. This will allow you to choose between:
    • Production deployments: Keep the default 5-minute timeout for eventual consistency
    • Recovery scenarios: Set to 0 for immediate fail-fast behavior

    In the meantime, you can try increasing Terraform's parallelism (the -parallelism=N option of terraform plan and terraform apply, which defaults to 10) to process more resources concurrently. This won't eliminate the per-resource delay, but it will allow more resources to be processed in parallel.

    Note: Higher parallelism may trigger Genesys Cloud API rate limits, so you may need to experiment to find the right balance for your environment.
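As a rough feel for how parallelism and the timeout interact, here is a back-of-the-envelope Python sketch. It assumes each deleted resource holds one concurrency slot for the full timeout, and it ignores API rate limits, dependency ordering, and every other resource in the run. The function name is hypothetical, and the default of 10 reflects Terraform's default -parallelism value.

```python
import math


def estimated_refresh_hours(deleted_resources, timeout_s=300.0, parallelism=10):
    """Estimate wall-clock hours spent waiting on resources that return 404.

    Simplified model: resources are refreshed in waves of `parallelism`,
    and every wave waits for the full timeout.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    waves = math.ceil(deleted_resources / parallelism)
    return waves * timeout_s / 3600.0
```

With this simplified model, 1000 deleted resources at the default settings cost about 8.3 hours of waiting, and 40 concurrent operations bring that to roughly 2.1 hours. A timeout of 0 removes the wait altogether, which is why the configurable timeout matters more than the parallelism setting.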
    We'll follow up with release details once the configurable timeout is available.

    Thanks


    ------------------------------
    Hemanth Dogiparthi
    Manager, Software Engineering
    ------------------------------



  • 8.  RE: Terraform Extremely Slow

    Posted 3 days ago

    Hello,

    Thank you very much, that would be HUGE!

    If I may ask, since there's an opportunity for new parameters, may I please ask for another one?

    This would be in addition to "parallelism" and "timeoutInterval" (naming up to you).

    retryInterval (or backoffInterval): Time, in seconds, that determines how often the "retry" happens.  In combination with "timeoutInterval", this would be super helpful for protection against running into Genesys Cloud API rate limits.  I feel "500ms, 1s, 2s, 4s, 8s, then 10s" may be a bit "too aggressive", but I am interested in your opinion as well.

    Thank you again!



    ------------------------------
    Juraj Makacek
    ------------------------------



  • 9.  RE: Terraform Extremely Slow

    Posted 9 hours ago
    Edited by Venkata Hemanth Dogiparthi 9 hours ago




  • 10.  RE: Terraform Extremely Slow

    Posted 9 hours ago
    Edited by Venkata Hemanth Dogiparthi 9 hours ago




  • 11.  RE: Terraform Extremely Slow

    Posted 3 days ago
    Edited by Juraj Makacek 2 days ago

    Apologies for double-posting; the above is a lot more important, and this is more to satisfy my curiosity.

    I've been thinking about this statement:

    "API is returning 404 Not Found. When this occurs, the provider enters a retry loop with exponential backoff "

    The question I have is, why?  Especially during 'terraform plan', I'm wondering, what is the advantage or purpose of the backoff?

    The only reason I can think of is to account for dependency resolution without having to define order, but for that I'd think some kind of queue-based system would be better - sending the request to the back of the queue to retry later.

    Thank you.



    ------------------------------
    Juraj Makacek
    ------------------------------



  • 12.  RE: Terraform Extremely Slow

    Posted 11 hours ago
    Edited by Juraj Makacek 10 hours ago
    Hello,
     
    As we are going through the restore, we have noticed a few more things.  
     
    Please let me know if you'd like separate issues open for this, and I will be able to provide logs later if needed (still restoring).
     
    1. Some data sources take a very long time to fail
     
    data.genesyscloud_user.XXX: Still reading... [24m10s elapsed]
     
    2. Callable Time Sets crash the plugin.  Removing them from the state file works; Cx-As-Code can then re-create them.
     
    │ Error: Plugin did not respond
    │ 
    │   with module.outbound_callabletimeset.genesyscloud_outbound_callabletimeset.XXX,
    │   on modules/outbound_callabletimeset/XXX.tf line 1, in resource "genesyscloud_outbound_callabletimeset" "XXX":
    │    1: resource "genesyscloud_outbound_callabletimeset" "XXX" {
    │ 
    │ The plugin encountered an error, and failed to respond to the
    │ plugin.(*GRPCProvider).ReadResource call. The plugin logs may contain more
    │ details.
     
    3. Some flows needed to be added manually (Cx-As-Code error message below).  In other words, other object types seem to recover (although slowly), whereas flows seem to fail.
     
    API Error: 410 - Flow 'XXX' has been deleted.
     
    Thank you.



    ------------------------------
    Juraj Makacek
    ------------------------------



  • 13.  RE: Terraform Extremely Slow

    Posted 9 hours ago

    Hi @Juraj Makacek

    Please find my responses below.

    On backoff interval times:
    The retry/backoff interval (500ms, 1s, 2s, 4s, 8s, 10s...) is part of the Terraform SDK's retry.RetryContext function (github.com/hashicorp/terraform-plugin-sdk/v2/helper/retry), the standard retry mechanism used across Terraform providers.

    The exponential backoff (500ms → 1s → 2s → 4s → 8s → 10s cap) is actually designed to help with rate limits by progressively slowing down retries. The delays increase quickly and are capped at 10 seconds between attempts for most of the retries.

    Why a backoff is needed:

    The ReadContext function is the standard Terraform SDK method that gets called for all operations: terraform plan, terraform apply, and our custom export process. This is built into the Terraform framework itself; whenever Terraform needs to read the current state of a resource from the public API, it invokes ReadContext. There's no mechanism in the SDK to distinguish which operation triggered the call; the framework treats all read operations identically by design.
    The exponential backoff retries are essential for terraform apply scenarios, where after creating or updating a resource, Terraform immediately calls ReadContext to verify the state matches what was configured. Due to eventual consistency, the newly created resource may not be immediately available, so the retry mechanism ensures Terraform can successfully read and confirm the resource state before proceeding.
    The configurable timeout interval we proposed previously should solve your problem. For recovery scenarios during terraform plan, you can set the timeout to 0 to skip retries and fail fast on deleted resources. For normal terraform apply operations where eventual consistency matters, you can rely on the default timeout to ensure reliable state verification.
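The intended behaviour of a configurable timeout can be sketched in a few lines of Python. This is a simplified model of the idea, not the provider's Go implementation: the names (read_with_retry, NotFound, timeout_s) are hypothetical, and the real option name and semantics will be whatever the release ships.

```python
import time


class NotFound(Exception):
    """Stands in for a 404 response from the API."""


def read_with_retry(read, timeout_s=300.0, sleep=time.sleep, clock=time.monotonic):
    """Call read() until it succeeds or timeout_s elapses.

    Returns the resource, or None when it is still missing at the deadline
    (the caller would then drop it from state). With timeout_s=0 exactly one
    read is made and no waiting occurs.
    """
    delay = 0.5  # illustrative starting interval, doubling up to a 10s cap
    deadline = clock() + timeout_s
    while True:
        try:
            return read()
        except NotFound:
            remaining = deadline - clock()
            if remaining <= 0:
                return None  # fail fast: treat the resource as deleted
            sleep(min(delay, remaining))
            delay = min(delay * 2, 10.0)
```

With the default timeout, a freshly created resource that returns 404 for the first couple of reads is still picked up. With timeout_s=0, a deleted resource is recognised after a single request.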


    On data sources taking a very long time to fail:

    This looks like a bug rather than expected behaviour. Can you attach logs to the same case for further investigation?


    On API Error: 410, no recovery of flows:

    We will check and get back to you. Please attach logs to the same case.

    On Callable Time Sets crashing the plugin.  Did not try removing and re-importing yet.

    Please attach logs to the same case for further investigation.

    Thanks



    ------------------------------
    Hemanth Dogiparthi
    Manager, Software Engineering
    ------------------------------



  • 14.  RE: Terraform Extremely Slow

    Posted an hour ago

    Thank you.  We will upload some logs as soon as our environments are validated.

    Do you have an ETA for when we can expect the configurable "retry timeout", please?

    Thank you again.



    ------------------------------
    Juraj Makacek
    ------------------------------