If a piece of software is described in any way, shape or form with the word “enterprise” it’s a piece of garbage.
-Remy’s Law of Enterprise Software
“Enterprise” software products live in an uncomfortable space. They need to fulfill the needs of a business, without being specific to any one business. They’re a one-size-fits-all solution, and companies like Oracle or SAP compete on their feature sets and the customizability of their inner platforms.
Maciek recently had a horrifying encounter with Microsoft’s Enterprise Resource Planning tool, Dynamics AX. Dynamics AX is a monster glued together out of the left-over parts of every product Microsoft makes. It integrates with SharePoint, Office, Project Server, and of course SQL Server and SQL Server Reporting Services (SSRS), because a tool like this is not useful without some sort of reporting system. It’s extensible through both .NET and COM. It’s a mess that’s been featured before.
Dynamics AX first entered Maciek’s life when his end-users complained that one of the Accounts Receivable reports was “incorrect”. Specifically, they needed the “Invoice Date” field to be color coded. If the invoice went out to a customer on a Monday, the field should be green, but if it went out the second Thursday of the month, it should be red, but black if it was any other day, except for Wednesdays, which should always be yellow unless they fall on an odd numbered calendar day or it’s a leap year. Maciek didn’t ask the users why this particular insane business rule existed, because they wouldn’t have explained it to him anyway. One specific user, Vlad, had a vision of what the report was supposed to look like, and Maciek was responsible for fulfilling that vision.
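For anyone trying to keep that rule straight, here is one possible reading of it as a short Python sketch. This is not the SSRS formatting expression Maciek wrote, which the story doesn't show. The function name `invoice_date_color` is hypothetical, and the sketch assumes that a Wednesday which hits one of the exceptions (odd calendar day or leap year) falls back to black, since the rule as stated doesn't say.

```python
import calendar
from datetime import date


def invoice_date_color(d: date) -> str:
    """Return the color for an invoice date under one reading of the rule."""
    weekday = d.weekday()  # Monday == 0 ... Sunday == 6

    if weekday == 0:
        return "green"  # any Monday

    # The second Thursday of a month always lands on day 8 through 14.
    if weekday == 3 and 8 <= d.day <= 14:
        return "red"

    if weekday == 2:
        # Assumption: an "excepted" Wednesday gets the default color.
        if d.day % 2 == 1 or calendar.isleap(d.year):
            return "black"
        return "yellow"

    return "black"  # every other day


if __name__ == "__main__":
    for sample in (date(2023, 1, 2), date(2023, 1, 4), date(2023, 1, 12)):
        print(sample.isoformat(), invoice_date_color(sample))
```

Even in this compact form, the Wednesday branch needs two exceptions and a guess about what happens when they apply.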
Maciek dug into Dynamics AX and opened up the report in Business Intelligence Development Studio (BIDS), Microsoft’s tool for editing SSRS reports. The report itself was a canned report, developed by Microsoft and bundled with Dynamics AX. Since “naming things” is one of the “hard” problems in computer science, whoever developed the report didn’t bother trying to solve it. Every single field in the report was named “textbox45” or “textbox94”. Since every field used a customized expression to control what was displayed, Maciek couldn’t tell which field was the “Invoice Date” without examining the properties of hundreds of fields.
It took hours to find the field, and then Maciek carefully built a formatting expression that met Vlad’s vision. He ran the report with a few different sets of parameters, confirmed the output, then he sent sample reports to Vlad. “Can you confirm these reports are correct? If they are, I can roll the changes out to AX.”
A week went by with no reply from Vlad. Maciek moved on to other tasks. Another week went by. Maciek started to forget about Vlad entirely. Then, suddenly, a message with a subject line of “REPORTS ARE BROKEN!!!!!” arrived in Maciek’s inbox.
“Why isn’t this working?” Vlad’s email demanded. “I went to AX and ran the report and the invoice date field isn’t color coded correctly. FIX IT.”
“I didn’t put the changes in production yet,” Maciek replied. “I need you to confirm that the sample reports I sent are correct.”
“I DON’T CARE,” Vlad replied, “I JUST WANT THE REPORTS FIXED.”
This email exchange ballooned into a series of CCs and BCCs. Six levels of management swooped in to solve this crisis, and they solved it by dragging Maciek through twelve hours of meetings. Eventually, Vlad sent a follow-up email. “Was there something I was supposed to look at? Could you resend?” Finally, Maciek was able to get the changes validated and released to production. He thought that would be the end of it.
A month later, Vlad sent another email with the subject, “THE REPORTS ARE BROKEN!!!!!” Specifically, the AR invoice report, which Maciek was the last person to touch, just printed out an error code and didn’t generate data.
As the “expert”, Maciek verified the error in production, and then pulled up the report in BIDS to see if he could debug the problem. When he ran the report from BIDS, even against production data, it printed out 3,000 records just fine. Since it worked on his machine, that meant the problem had to be somewhere in Dynamics AX. AX didn’t just run reports; it also had hooks where X++ (AX’s platform-specific programming language) could interact with the report lifecycle.
Maciek grabbed a machete and plunged into the thicket of Microsoft’s X++ code. There, he found this:
/// <summary>
/// Provides the opportunity for validation prior to running the report.
/// </summary>
protected container preRunValidate()
{
    // Record count is a good proxy for overall time on this
    // report. However, each record requires a significant amount
    // of processing and costly balance queries, so the limits
    // are set significantly lower for this report than other
    // reports. 100 records will take around 10 seconds to process
    // and 2500 records will take around 15 minutes to process.
    #define.WarningLimit(100)
    #define.ErrorLimit(2500)
    Query countQuery = this.getFirstQuery();
    int recordCount;

    recordCount = QueryRun::getQueryRowCount(countQuery);
    if (recordCount > #ErrorLimit)
    {
        // Processing over the error limit should take around 20 minutes, so even
        // with some error possible due to overlap in counting this still
        // means the report will timeout on a machine with low volume and
        // no load.
        validateResult = [SrsReportPreRunState::Error];
    }
    else if (recordCount > #WarningLimit)
    {
        // Processing up to the warning limit should take around 10 seconds
        validateResult = [SrsReportPreRunState::Warning];
    }
    return validateResult;
}
Note that this method runs its own row-count query against the database before the report renders. If that count comes back above 2,500 records, the method sets an error state and the report never runs. That explained why the report worked from BIDS, which didn’t go through the AX hooks, and failed in production: the report now covered 3,000 records. From the comments, Maciek determined that the original developer believed that rendering a row on the report was neither CPU nor IO bound, but instead was a function of time. Maciek didn’t believe that. Even if it were true, Vlad wouldn’t mind waiting longer for the report to run if he got the results he wanted. But with that error limit set, increasing the timeout wouldn’t do anything.
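To see why the timeout is irrelevant, here is a minimal Python sketch of the same gate. It is an illustration, not the AX implementation: `PreRunState`, `pre_run_validate`, and `run_report` are hypothetical names, and the "OK" state for small reports is an assumption, since the X++ snippet doesn't show what `validateResult` holds by default.

```python
from enum import Enum

WARNING_LIMIT = 100   # mirrors #define.WarningLimit(100)
ERROR_LIMIT = 2500    # mirrors #define.ErrorLimit(2500)


class PreRunState(Enum):
    OK = "ok"            # assumed default; not shown in the X++ snippet
    WARNING = "warning"
    ERROR = "error"


def pre_run_validate(record_count: int) -> PreRunState:
    """Classify a report purely by how many rows it would return."""
    if record_count > ERROR_LIMIT:
        return PreRunState.ERROR
    if record_count > WARNING_LIMIT:
        return PreRunState.WARNING
    return PreRunState.OK


def run_report(record_count: int, timeout_minutes: int) -> str:
    """Hypothetical runner: the gate fires before the timeout can matter."""
    if pre_run_validate(record_count) is PreRunState.ERROR:
        # timeout_minutes is never consulted on this path.
        return "error"
    return f"rendered {record_count} rows"


if __name__ == "__main__":
    print(run_report(3000, timeout_minutes=15))
    print(run_report(3000, timeout_minutes=600))
    print(run_report(2500, timeout_minutes=15))
```

In this sketch, a 3,000-record report is rejected whether the timeout is fifteen minutes or ten hours, because the record count is the only input the check looks at.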
On a whim, Maciek decided to live dangerously, and disabled the error limit check. Thus far, the report continues to run just fine.