@ferstar 2024-04-02T05:04:35.000000Z

UAT Environment Data Loss Incident Report



First Issue

Phenomenon

In JURA5, after batch 1 was completed, the annotation data for 60 codes had to be exported for accuracy calculation. During the export, we discovered that more than 500 rule annotation records had been lost.

Cause Analysis

During enhancement development of the Jura2 model, our engineers manually synchronized data from the Jura production environment to the UAT server to meet data-testing requirements. However, some of the synchronized data conflicted with the Jura5 annotation data. Our synchronization mechanism did not account for such conflicts and overwrote existing data by default, so the affected annotation data was replaced with blank data from the Jura production environment. In addition, by the time the loss was discovered, it was older than the UAT database's backup retention period, which was one month at the time, so the data could not be recovered by rolling back.

Solutions and Follow-ups

  1. Rewrote the data synchronization logic so that conflicting data is merged rather than overwritten by default.
  2. Restricted access to the UAT server so that only manager-level development and operations staff can operate it.
  3. Built two new UAT servers to provide independent test environments for Jura2.1, Jura5, and Jura2.
  4. Extended the data backup retention period of the UAT server to two years.
  5. Re-annotated the manual annotation data that had been accidentally lost.
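A minimal sketch of the difference between the old overwrite behavior and the merge default in item 1, assuming each record is a flat dictionary of fields and that an empty string or `None` counts as blank. The record shape and the functions `is_blank`, `sync_overwrite`, and `sync_merge` are hypothetical and are not taken from the actual synchronization code.

```python
from typing import Any, Dict

# Hypothetical record shape: field name -> value.
Record = Dict[str, Any]


def is_blank(value: Any) -> bool:
    """Treat None and empty/whitespace-only strings as carrying no information."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def sync_overwrite(target: Record, source: Record) -> Record:
    """Old behavior (hypothetical): every source field replaces the target field, even when blank."""
    result = dict(target)
    result.update(source)
    return result


def sync_merge(target: Record, source: Record) -> Record:
    """Merge behavior (hypothetical): a blank source value never replaces an existing target value."""
    result = dict(target)
    for key, value in source.items():
        if is_blank(value) and not is_blank(result.get(key)):
            continue  # keep the existing UAT-side annotation
        result[key] = value
    return result


uat = {"rule": "A1", "annotation": "compliant", "page": 12}
prod = {"rule": "A1", "annotation": "", "page": 13}

print(sync_overwrite(uat, prod))  # {'rule': 'A1', 'annotation': '', 'page': 13}
print(sync_merge(uat, prod))      # {'rule': 'A1', 'annotation': 'compliant', 'page': 13}
```

In this sketch a non-blank incoming value still wins a conflict; whether production or UAT data should take precedence in that case is a policy decision the real synchronization logic would need to make explicitly.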

Second Issue

Phenomenon

The CG Report module in the UAT environment lost two rules: A(a) - Application of Principles and A(b) - Compliance with CPs.

Cause Analysis

The logic that updates grouping information in the CG Report module contained a bug: when parsing data from the user group information initialization table, it did not strip extra whitespace. As a result, the two rules A(a) - Application of Principles and A(b) - Compliance with CPs were misjudged and removed by the cleanup step, and their rule information was lost.
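The failure mode can be illustrated with a small sketch, assuming rules are matched by their display names and that any existing rule not found in the parsed table is treated as obsolete and cleaned up. The helper names and the sample table cells are hypothetical; the point is that comparing raw strings lets a trailing or doubled space turn a still-valid rule into a "stale" one, while normalizing whitespace on both sides before comparing avoids this.

```python
import re
from typing import Iterable, List


def normalize_rule_name(name: str) -> str:
    """Collapse internal whitespace runs to one space and trim both ends."""
    return re.sub(r"\s+", " ", name).strip()


def stale_rules_buggy(existing: Iterable[str], from_table: Iterable[str]) -> List[str]:
    """Hypothetical buggy check: raw string comparison, so stray whitespace marks a rule as stale."""
    wanted = set(from_table)
    return [rule for rule in existing if rule not in wanted]


def stale_rules_fixed(existing: Iterable[str], from_table: Iterable[str]) -> List[str]:
    """Hypothetical fixed check: normalize both sides before deciding a rule is obsolete."""
    wanted = {normalize_rule_name(cell) for cell in from_table}
    return [rule for rule in existing if normalize_rule_name(rule) not in wanted]


existing = [
    "A(a) - Application of Principles",
    "A(b) - Compliance with CPs",
]
# Hypothetical cells read from the initialization table, with extra whitespace.
table_cells = [
    "A(a) - Application of Principles ",
    "A(b)  - Compliance with CPs",
]

print(stale_rules_buggy(existing, table_cells))  # both rules wrongly flagged for cleanup
print(stale_rules_fixed(existing, table_cells))  # []
```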

Solutions and Follow-ups

  1. Refactored the grouping-information update code and improved the logic that distinguishes new rules from old ones.
  2. Added stricter unit tests that verify, among other things, the total number of rules, the number of rules in each group for Jura1 to Jura5, and the number and order of subgroups.
  3. Placed greater emphasis on code review and raised the quality bar for code that updates grouping information.
  4. Checked the annotation data and found no loss for the two rules above.
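The checks listed in item 2 could take roughly the following shape with Python's standard `unittest` module. The grouping structure, the `load_grouping` loader, and all expected counts are hypothetical placeholders, trimmed to two models for brevity; real tests would load the actual CG Report grouping and compare it against a reviewed baseline for Jura1 to Jura5.

```python
import unittest
from typing import Dict, List, Tuple

# Hypothetical structure: model name -> ordered list of (subgroup name, rule ids).
Grouping = Dict[str, List[Tuple[str, List[str]]]]


def load_grouping() -> Grouping:
    """Hypothetical stand-in for the real loader, trimmed to two models."""
    return {
        "Jura1": [("A", ["A(a)", "A(b)"]), ("B", ["B(a)"])],
        "Jura2": [("C", ["C(a)", "C(b)", "C(c)"])],
    }


# Hypothetical baseline values; real tests would use a reviewed reference.
EXPECTED_TOTAL = 6
EXPECTED_PER_MODEL = {"Jura1": 3, "Jura2": 3}
EXPECTED_SUBGROUPS = {"Jura1": ["A", "B"], "Jura2": ["C"]}


class GroupingStructureTest(unittest.TestCase):
    def setUp(self) -> None:
        self.grouping = load_grouping()

    def test_total_rule_count(self) -> None:
        total = sum(
            len(rules)
            for subgroups in self.grouping.values()
            for _, rules in subgroups
        )
        self.assertEqual(total, EXPECTED_TOTAL)

    def test_rule_count_per_model(self) -> None:
        for model, expected in EXPECTED_PER_MODEL.items():
            count = sum(len(rules) for _, rules in self.grouping[model])
            self.assertEqual(count, expected, msg=model)

    def test_subgroup_count_and_order(self) -> None:
        # Comparing ordered lists checks both the number and the order of subgroups.
        for model, expected in EXPECTED_SUBGROUPS.items():
            names = [name for name, _ in self.grouping[model]]
            self.assertEqual(names, expected, msg=model)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GroupingStructureTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```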

Further Measures

  1. We first checked hardware resources: CPU, memory, and disk usage were all below 80%, which leaves enough headroom for the UAT environment to run stably and perform well.
  2. We also reviewed the data backup mechanism. To keep data safe and complete, UAT data is backed up to a remote location every day in the early morning and retained for at least two years, guarding against accidental loss. If the UAT environment ever needs to be cleared or redeployed, we will apply to HKEX in advance.
  3. Two business models currently share the UAT environment, running model extraction and prediction for more than 100 rules on the same annual reports, with frequent manual synchronization, adjustments, changes, and internal testing. This makes it difficult to keep data consistent. To address this, UAT2 and UAT3 environments have been set up so that, together with the original UAT environment, the JURA2.1, JURA5, and JURA2 teams each have a separate test platform and testing can proceed smoothly.
  4. Regression testing in the UAT environment will be strengthened: every major code release will be followed by a round of regression testing to confirm that all functions still work correctly in the new version.
  5. To standardize operations and reduce the risk of human error, we have given the relevant colleagues another round of detailed training, established standard operating procedures, and restricted UAT environment operations to administrator-level users.
  6. Regression testing of the UAT2 and UAT3 environments has passed.
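To illustrate why the one-month retention window made the first incident unrecoverable, and what a daily backup with two-year retention changes, here is a small timeline sketch. It assumes simple age-based pruning, approximates one month as 30 days and two years as 730 days, and uses an invented incident date; the functions are hypothetical and do not reflect the actual backup tooling.

```python
from datetime import date, timedelta
from typing import List, Optional


def daily_backups(start: date, end: date) -> List[date]:
    """One backup per day (e.g. taken early each morning), both ends inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def retained(backups: List[date], today: date, retention_days: int) -> List[date]:
    """Backups still kept on `today` under a hypothetical age-based retention policy."""
    cutoff = today - timedelta(days=retention_days)
    return [b for b in backups if b >= cutoff]


def last_good_backup(backups: List[date], incident: date) -> Optional[date]:
    """Most recent surviving backup taken strictly before the incident, if any."""
    earlier = [b for b in backups if b < incident]
    return max(earlier) if earlier else None


# Hypothetical timeline: data overwritten on 1 Jan 2024, loss noticed 45 days later.
incident = date(2024, 1, 1)
discovered = incident + timedelta(days=45)
history = daily_backups(date(2023, 6, 1), discovered)

for days in (30, 730):
    kept = retained(history, discovered, days)
    print(days, last_good_backup(kept, incident))
```

With the 30-day window, every backup taken before the incident has already been pruned by the time the loss is noticed, so the loop prints `None`; with the 730-day window, the backup from 2023-12-31 is still available for a rollback.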