39 results found
-
When there are errors on the Service Bus we don't log them. It would help if we logged errors for troubleshooting purposes.
When there are issues with the Service Bus, it is often not possible to see what went wrong. You can go to the Service Bus and look at the graph and see that there are "server errors", "user errors", and "throttled messages", but there is no way to see the details of the server errors or the user errors.
Sometimes we can see a log entry in the CMS logs, but not always; I guess it depends on the type of error.
If we had more information, it would help when troubleshooting customers' problems with the Service Bus.
14 votes -
Immutable Backups
Immutable backups are increasingly becoming a cybersecurity and resilience best practice.
Currently, CMS PaaS data is not covered by immutable backups. Having this as a standard, or at least optional, feature would be highly desirable: it would further protect us against ransomware attacks and accidental or malicious data tampering, providing a reliable last line of defense for fast and trustworthy recovery after a cyber incident or operational failure.
Any update on the timeline for availability would be very much appreciated.
6 votes -
Failover alerts
We need to receive alerts when a site goes into failover. The failover CMS should show a warning that it's disabled during failover (instead of an error message).
9 votes -
Allow disabling warmup for integration and preproduction
Integration and preproduction environments are sometimes protected by IP whitelists, and in those cases the warmup step always fails with status code 401. The warmup system waits for about 15 minutes until these requests time out. This increases delivery times to these environments, especially when CI/CD is set up.
Ex:
2026-03-10 12:40:24 Information Starting to warm up the targets slots...
2026-03-10 12:40:25 Information Preparing target slot for Go Live (<masked>/slot) (warming up the slot)
2026-03-10 12:52:21 Warning Timed out waiting for all instances for webapp <masked> and slot "slot" to become ready!
2026-03-10 12:52:21 Information Validating deployment ID uniqueness between slots…
7 votes -
Dynamic Scaling in DXP - improve performance and reduce costs
The default hardware SKU for Optimizely is P1V3. P1V3 has only 2 cores, while P2V3 has 4. Because the number of cores available to the system dictates the number of default threads the system attempts to regulate, it would be best to move from our current default hardware SKU, P1V3, to P2V3. However, doing so would increase costs, which we do not necessarily want. This is a proposal to decrease costs while simultaneously increasing hardware resources during peak hours.
The idea is to increase a customer's lower environment hardware to P2V3 during "working hours" (to…
1 vote -
HSTS on the root domain
We are experiencing some redirection issues we have no control over as they are done in the root domain.
http://oldsite.com redirects to https://www.oldsite.com then finally to https://www.newsite.com
The redirection should be as follows:
http://oldsite.com -> https://oldsite.com -> https://www.newsite.com
Please update Optimizely’s configuration to do these redirects properly.
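The difference between the two chains matters for HSTS because a browser only records the policy for a host it actually visits over HTTPS, so https://oldsite.com has to be one of the hops. A minimal Python sketch of that check, assuming the hops have already been collected (for example from a browser's network tab); the function name is hypothetical:

```python
from urllib.parse import urlsplit


def upgrades_same_host_first(chain):
    """Return True if the first redirect only switches http to https on the same host."""
    if len(chain) < 2:
        return False
    first, second = urlsplit(chain[0]), urlsplit(chain[1])
    return (
        first.scheme == "http"
        and second.scheme == "https"
        and first.hostname == second.hostname
    )


# The two chains quoted in this idea.
current = ["http://oldsite.com", "https://www.oldsite.com", "https://www.newsite.com"]
requested = ["http://oldsite.com", "https://oldsite.com", "https://www.newsite.com"]
```

With these inputs the current chain fails the check and the requested chain passes it.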
30 votes -
Regenerate Content Graph keys and secrets through PaaS portal
Clients should have the ability to regenerate Content Graph keys and secrets in the self-service PaaS portal.
6 votes
Good news - this idea is now being explored by our product and design teams. We’re researching potential solutions and scoping out what an implementation might look like. We’ll share updates here as our thinking evolves.
-
IP Address restriction in Cloudflare
It would be great if we could configure an IP address whitelist in Cloudflare so that only a specific set of source IP addresses can access our DXP instances. This would allow us to block public access to non-production environments.
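The rule being asked for amounts to a source-address allow-list. A minimal Python sketch of that check, not Cloudflare's actual configuration; the function name and the example ranges (reserved documentation networks) are assumptions for illustration:

```python
import ipaddress


def is_allowed(source_ip, allowed_networks):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(network)
        for network in allowed_networks
    )


# Hypothetical allow-list: one office range and one single build agent.
ALLOWED = ["203.0.113.0/24", "198.51.100.17/32"]
```

A request from 203.0.113.7 would be let through, while any address outside both ranges would be blocked.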
2 votes -
App Insights alerts
Self-serve creation of alerts in App Insights, triggered when specific metrics exceed a threshold, would help us be proactive in responding to potential performance issues or other problems.
11 votes -
Add .NET Counter publication to App Insights on Startup
There is a whole suite of .NET counters available to publish to App Insights. This is a low-lift modification that gives us many insights, directly inside App Insights Metrics, that we can use to diagnose customer issues.
This enables us to do performance investigations that otherwise require manual intervention (such as downloading .ETL files and opening them in PerfView or capturing dump files and analyzing them).
We can skip these manual steps and jump right to "close to root causes" by publishing this information. There is little cost and no significant performance degradation associated with…
1 vote -
Ability to smoke-test more than 1 site during smooth deploy (slot domains)
As a CMS developer and QA specialist we want to be able to smoke-test multiple sites so that we can detect potential issues on all our sites during deployment with DXP Cloud Platform to our multi-site CMS platform.
When we deploy to PREP/PROD we get a temporary SLOT to run our smoke-tests against. For example:
https://projectidprep-slot.dxcloud.episerver.net/
https://projectidprod-slot.dxcloud.episerver.net/
These are the default URLs provided by Optimizely and are configured on the first website. We would like to be able to smoke-test multiple sites on the SLOT instance.
We could…
5 votes -
25 Production Service Buses Broken
We need to define what "working" means for a service bus so that reliability engineering can maintain reliability based on these metrics.
When a service bus no longer functions, reliability engineering should be equipped with the capacity to upgrade service buses in order to meet a component-specific SLA.
We need product management to define what "working" means for a service bus so reliability engineering can respond appropriately when a service bus is "broken", rather than having to go back to PM as though we need an exception for every broken service bus.
We also need monitoring in place to ensure…
1 vote -
Assess binding settings to resolve resets
We need to evaluate whether applying the setting suggested by Microsoft is worthwhile to resolve resets. As part of that effort, we would have to evaluate the risks of doing so.
1 vote -
Align IQI Health Check with Dashboard Actionable Reports
Currently, the IQI report offers information such as a breakdown of HTTP status codes. Unfortunately, that does not give the customer or partner information they can act on, because it requires further investigation on their part to make it actionable.
We want to bridge the gap, on our side, between the data and being able to take action. The information we offer within the Dashboard is much closer to directly actionable; in many cases it is directly actionable.
For example, on the dashboard we offer an EpiCms database deadlock report. The customer or partner needs to identify which…
1 vote -
IQI Report - App Insights as the Source of Truth
There are multiple similar functions within Optimizely. They all need to use the same data as the single source of truth. They are:
Incident Management, primary data is App Insights.
Deep diagnostics done by Problem Management, primary data is App Insights
App Insights Dashboard
IQI Health Check reporting, primary information (aside from Pingdom's availability info which for legal reasons) is Cloudflare for health information.
The inconsistency of source data causes issues because we cannot make a consistent presentation of the information when the source being utilized is entirely different.
It also adds possibility of…
1 vote -
Identify and Aid Customers with Production Live-Locks
There's the concept of a "dead" lock and a "live" lock. A live-lock is essentially a race condition within a production environment. It causes CPU usage to stair-step upward, usually until the server crashes.
This often happens when a developer accidentally uses a non-thread-safe object in a multi-threaded manner.
The object being used (for example a HashSet) needs to be identified and replaced with a thread-safe type so that the live-lock goes away and the CPU goes back to normal.
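The kind of fix described above can be sketched as follows. The example in this idea is a .NET HashSet; this Python sketch only illustrates the general pattern of serializing access to a shared set through a lock, and the class and function names are hypothetical:

```python
import threading


class LockedSet:
    """A small set wrapper that serializes every access through one lock."""

    def __init__(self):
        self._items = set()
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:
            self._items.add(item)

    def __contains__(self, item):
        with self._lock:
            return item in self._items

    def __len__(self):
        with self._lock:
            return len(self._items)


def fill_concurrently(shared, workers=8, per_worker=1000):
    """Have several threads add disjoint ranges of integers to the shared set."""
    def work(offset):
        for i in range(per_worker):
            shared.add(offset * per_worker + i)

    threads = [threading.Thread(target=work, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

In practice a platform's built-in concurrent collection is usually preferable to a hand-written wrapper; the wrapper is shown only because it makes the locking explicit.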
1 vote -
Down-Sampling Service Bus App Insights
Much of the log analytics costs come from voluminous amounts of service bus activity that is largely useless for analytics purposes. We could generally use a fraction of the analytics and we would be just fine.
For diagnostic purposes, we generally need to inspect the contents of the service bus itself to identify problems.
1 vote -
Consider shifting from Adaptive Sampling to Fixed Sampling
There's a known bug in Adaptive Sampling that prevents us from getting accurate analytics from App Insights. It calls the values in App Insights largely into question because we cannot tell whether the metrics within App Insights are accurate.
Moving from Adaptive Sampling to Fixed Rate Sampling resolves this issue, but it can also increase log analytics volume and risk exceeding the log quota.
If we can come up with a weekly way to determine the appropriate sampling rate for a given type of log data for a customer and tweak the analytics…
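One way such a periodic rate calculation could look, as a rough Python sketch. The function name and both inputs (an estimate of the unsampled volume and a volume budget, in the same unit) are assumptions for illustration. The result is kept in the form 100/N on the understanding that App Insights recommends such values for fixed-rate sampling, which should be verified against the current documentation:

```python
import math


def fixed_sampling_percentage(observed_gb, budget_gb):
    """Pick a fixed sampling percentage of the form 100/N that fits the budget."""
    if observed_gb <= 0 or budget_gb <= 0:
        raise ValueError("volumes must be positive")
    # N is the smallest whole "keep one in N" ratio that brings
    # the observed volume down to the budget or below.
    keep_one_in = max(1, math.ceil(observed_gb / budget_gb))
    return 100.0 / keep_one_in
```

For example, an estimated 50 GB of unsampled data against a 10 GB budget gives N = 5, so a 20% rate; data already under budget stays at 100%.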
1 vote -
Identify and Communicate Crashing Instance Causes by Exception/Log Inspection
Due to my elevated level of access, nobody on the team except Erik gets these error emails from Microsoft.
We need an easy way to address these with customers through the ticketing system.
These are causing instance crashes for the sites. Here's an example of what's happening on moco.
Here are the email threads I'm receiving from Microsoft that shows they're having outages.
1 vote -
Upload very large single files using CMS
People need to upload very large files, such as video files, and are running into Cloudflare file size limits.
Being able to upload a very large file into the storage blobs using the Deployment API and then being able to refer to it in the CMS may be a way to work around the size limits in Cloudflare.
The problem with doing this now is that a file uploaded via the Deployment API won't land in the same container that the CMS is configured to read from. Also, usually when people add media, a reference in the DB is…
4 votes