Epic Work Items Rollout Plan
## Conditions for Go vs No Go
### Engineering
* [x] https://gitlab.com/groups/gitlab-org/-/epics/12738+ completed
* [x] https://gitlab.com/groups/gitlab-org/-/epics/12751+ completed
* [x] https://gitlab.com/groups/gitlab-org/-/epics/13056+ completed
* [x] Data is in sync on SaaS
* [x] Fallback plan in place with guidelines on how to debug
* [x] Rolled out on `gitlab-org` and `gitlab-com` without any issues for at least 3 days in production
### Product
* [ ] Epics QA https://gitlab.com/gitlab-org/gitlab/-/issues/472056
## Fallback plan and monitoring
**Monitoring**
* Check sync events on the [Sync Error Dashboard](https://log.gprd.gitlab.net/app/dashboards#/view/802801b0-d256-11ee-9f58-4f945bb0e111?\_g=h@699c3c1)
* [Sentry Errors](https://new-sentry.gitlab.net/organizations/gitlab/issues/?project=3&query=is%3Aunresolved+SyncAsWorkItemError&referrer=issue-list&statsPeriod=14d)
* [DebugBear](https://www.debugbear.com/project/26125?filterId=362&info=custom%3A&interval=month&share=0tp5aA3o0lk9yoOlxsPUeW4qd) (WebVitals Performance)
* [Sentry Performance](https://new-sentry.gitlab.net/organizations/gitlab/performance/summary/?project=4&referrer=performance-transaction-summary&statsPeriod=24h&transaction=projects%3Aissues%3Aindex&unselectedSeries=avg%28%29&unselectedSeries=p100%28%29)
**What do we do in the case of mismatches after rollout?**
1. Evaluate whether it's an actual mismatch or a false positive (check the prod database)
   1. Either you have read access to a replica, or you can use postgres.ai
   2. If replica access is not enough for the data you need to check, you can request prod access via Teleport
2. Evaluate the impact (how many records are affected)
   1. The [PLSQL script](https://gitlab.com/gitlab-org/gitlab/-/issues/470685#note_2025787511 "Epic Work Items Rollout Plan") is a good starting point for the most common data we need to check and the number of records affected
3. Check with Product whether the mismatch is a blocker for GA
   1. Depending on the data, Product may decide that it's okay to still move forward with GA
4. Evaluate whether the mismatch comes from the WorkItem \> Legacy Epic syncing or from the Legacy Epic \> WorkItem syncing
   1. When a mismatch comes from WorkItem to Legacy Epic syncing, it might be easier to rewrite the APIs to read from the WorkItem and get rid of the Legacy Epic data as SSoT.
5. Evaluate whether we need a full re-sync or whether we could read from the legacy epic in the meantime
   1. To prevent a full re-sync of the data, we could also solve some mismatches by reading from the legacy epic.
   2. Use this approach with caution as it only works when the logic on the WorkItem side is correct.
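Steps 1 and 2 above (confirm the mismatch, then size the impact) can be roughed out in code. This is a minimal Python sketch, not the script linked above: it assumes record pairs have already been exported as dictionaries, and the field names in `SYNCED_FIELDS` are hypothetical placeholders rather than the real synced attributes.

```python
from collections import Counter

# Hypothetical field names, chosen for illustration only.
SYNCED_FIELDS = ("title", "start_date", "due_date", "parent_id")


def find_mismatches(legacy_epic, work_item, fields=SYNCED_FIELDS):
    """Return {field: (legacy_value, work_item_value)} for each differing field."""
    return {
        field: (legacy_epic.get(field), work_item.get(field))
        for field in fields
        if legacy_epic.get(field) != work_item.get(field)
    }


def count_affected(pairs, fields=SYNCED_FIELDS):
    """Count, per field, how many (legacy_epic, work_item) pairs disagree."""
    counts = Counter()
    for legacy_epic, work_item in pairs:
        counts.update(find_mismatches(legacy_epic, work_item, fields).keys())
    return dict(counts)


if __name__ == "__main__":
    pairs = [
        ({"title": "A", "due_date": "2024-10-15"}, {"title": "A", "due_date": "2024-10-15"}),
        ({"title": "B", "due_date": "2024-10-15"}, {"title": "B", "due_date": "2024-10-16"}),
    ]
    print(count_affected(pairs))
```

A per-field count like this gives Product a quick view of which data is affected and how widely, which feeds the GA-blocker decision in step 3.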
## How to roll out
### Feature Flags
<table>
<tr>
<th>Feature Flag</th>
<th>Description</th>
</tr>
<tr>
<td>
`work_item_epics`
</td>
<td>Group-based feature flag</td>
</tr>
<tr>
<td>
`work_items_rolledup_dates`
</td>
<td>Group-based feature flag</td>
</tr>
<tr>
<td>
`epic_and_work_item_associations_unification`
</td>
<td>
Group-based feature flag.
:warning:️ This feature flag must not be rolled back once it has been rolled out to a group! Otherwise users would not see data they modified on the work item side.
</td>
</tr>
</table>
### Plan Team Member dogfooding
**`gitlab-org`**
* [x] Enable `work_items_rolledup_dates` for `gitlab-org`
* [x] Enable `epic_and_work_item_associations_unification` for `gitlab-org`
* [x] Enable `work_item_epics` for `gitlab-org`
**`gitlab-com`**
* [x] Enable `work_items_rolledup_dates` for `gitlab-com`
* [x] Enable `epic_and_work_item_associations_unification` for `gitlab-com`
* [x] Enable `work_item_epics` for `gitlab-com`
**`gitlab-data`**
* [x] Enable `work_items_rolledup_dates` for `gitlab-data`
* [x] Enable `epic_and_work_item_associations_unification` for `gitlab-data`
* [x] Enable `work_item_epics` for `gitlab-data`
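The per-group checklists above all follow the same flag order, so the commands can be generated rather than typed by hand. This is a small Python sketch under two assumptions: that the group-scoped ChatOps form `--group=<path>` applies as described in the GitLab feature flag controls documentation, and that the checklist order (with `work_item_epics` last) is the intended order. Verify both before running anything in production.

```python
# Order taken from the checklists above: `work_item_epics` is enabled last.
FLAG_ORDER = (
    "work_items_rolledup_dates",
    "epic_and_work_item_associations_unification",
    "work_item_epics",
)


def group_enable_commands(group, flags=FLAG_ORDER):
    """Build one group-scoped ChatOps command per flag, in checklist order."""
    return [f"/chatops run feature set --group={group} {flag} true" for flag in flags]


if __name__ == "__main__":
    for group in ("gitlab-org", "gitlab-com", "gitlab-data"):
        print("\n".join(group_enable_commands(group)))
```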
### GitLab Team Member dogfooding
* [x] Enable `work_item_epics_rollout` for GitLab team members. [See docs](https://docs.gitlab.com/ee/development/feature_flags/controls.html) for how to enable it on a [feature group](https://docs.gitlab.com/ee/development/feature_flags/controls.html#:\~:text=If%20you%20would%20like%20to%20gather%20feedback%20internally%20first%2C%20feature%20flags%20scoped%20to%20a%20user%20can%20also%20be%20enabled%20for%20GitLab%20team%20members%20with%20the%20gitlab_team_members%20feature%20group%3A)
```
/chatops run feature set --feature-group=gitlab_team_members work_item_epics_rollout true
```
### Before SaaS rollout
* [x] Get https://gitlab.com/gitlab-org/gitlab/-/issues/478191+ merged if we agree that we want to continue with SaaS rollout
* [x] Change `work_item_epics` feature flag to only check for the root ancestor
* [x] Enable `work_items_rolledup_dates`. Observe the data, as this changes the behaviour of the dates calculation for groups that do not have `work_item_epics` enabled. If all is fine, the flag can be removed
* [x] Enable `epic_and_work_item_associations_unification` for all and observe any potential performance problems. If all is fine, the flag can be removed
* [x] Enable `work_item_epics_rollout` for all (This would enable everyone to see the epic work items on `gitlab-org` / `gitlab-com`). The flag can be removed too.
### SaaS rollout
1. Enable for _all_

   Since `work_item_epics` enables work item epics per group with a percentage-based rollout, we cannot guarantee that the following feature flags are enabled for the same groups. For this reason we need to enable them for all.
   - [x] `epic_and_work_item_associations_unification`
     1. :warning: This feature flag must not be rolled back once it has been rolled out to a group! Otherwise users would not see data they modified on the work item side.
   - [x] `work_items_rolledup_dates`
2. Use a **Percentage of actors** rollout for `work_item_epics`

   Percentage rollout steps:
   * [x] 10%
   * [x] 25% (after 12h)
   * [x] 50% (after 12h)
   * [x] 75% (after 12h)
   * [x] 100% (after 12h)
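To illustrate why the companion flags are enabled for all, here is a minimal Python sketch of how a percentage-of-actors gate can bucket groups. It is loosely modeled on how Flipper-style gates are commonly described (a checksum of flag name plus actor ID) and is an assumption, not GitLab's actual implementation. The `Group:<n>` actor IDs are hypothetical.

```python
import zlib

SCALING_FACTOR = 1_000  # lets percentages carry fractions


def enabled_for(flag, actor_id, percentage):
    """True if the actor's stable bucket for this flag falls under the percentage."""
    bucket = zlib.crc32(f"{flag}{actor_id}".encode("utf-8")) % (100 * SCALING_FACTOR)
    return bucket < percentage * SCALING_FACTOR


def enabled_groups(flag, group_ids, percentage):
    """Return the subset of group IDs for which the flag is on at this percentage."""
    return {gid for gid in group_ids if enabled_for(flag, gid, percentage)}


if __name__ == "__main__":
    groups = [f"Group:{n}" for n in range(1, 1001)]  # hypothetical actor IDs
    previous = set()
    for step in (10, 25, 50, 75, 100):
        current = enabled_groups("work_item_epics", groups, step)
        assert previous <= current  # groups enabled at an earlier step stay enabled
        print(step, len(current))
        previous = current
```

Under this scheme a group's bucket depends on the flag name, so running two flags at the same percentage would generally select different sets of groups. Raising the percentage only ever adds groups, which is what makes the stepwise 10% to 100% rollout safe to observe between steps.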
### Before SM rollout
* [x] Check service ping metrics introduced by https://gitlab.com/gitlab-org/gitlab/-/merge_requests/161331+ to make sure data is synced correctly
### SM rollout
* [x] Enable `work_item_epics` by default
* [x] Enable `work_items_rolledup_dates` by default
* [ ] Enable `epic_and_work_item_associations_unification` by default
## How to roll back
In case something goes wrong:
**When only rolled out to Plan team members**
* Disable `work_item_epics` on `gitlab-org`. It's fine to enable it again on `gitlab-org/plan-stage`
**When already rolled out to GitLab team members**
* Either disable `work_item_epics` for `gitlab-org`, or, if we're fine with Plan team members still having access, restrict the feature flag actors on `work_item_epics_rollout`
**When already rolled out to SaaS**
* Disable `work_item_epics`
* Disable `work_item_epics_rollout`
* Potentially: Disable `work_items_rolledup_dates`
  * We might not need to disable it. Disable it only when there are syncing mismatches for dates or for the child hierarchy
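The rollback cases above can be captured as a small lookup, which also guards against disabling the one flag that must never be rolled back. This is a hedged Python sketch: the stage names are hypothetical labels for the three cases, the ChatOps `false` form is assumed from the feature flag controls documentation, and the alternative of restricting actors on `work_item_epics_rollout` is not modeled.

```python
# Per the feature flag table, this flag must not be rolled back once enabled.
NEVER_DISABLE = frozenset({"epic_and_work_item_associations_unification"})


def rollback_commands(stage, date_or_hierarchy_mismatch=False):
    """Return the ChatOps commands for a rollback at the given (hypothetical) stage."""
    if stage in ("plan_team", "team_members"):
        commands = ["/chatops run feature set --group=gitlab-org work_item_epics false"]
    elif stage == "saas":
        flags = ["work_item_epics", "work_item_epics_rollout"]
        if date_or_hierarchy_mismatch:
            # Only needed for date or child-hierarchy syncing mismatches.
            flags.append("work_items_rolledup_dates")
        commands = [f"/chatops run feature set {flag} false" for flag in flags]
    else:
        raise ValueError(f"unknown stage: {stage}")
    # Safety net: never emit a command that touches a protected flag.
    if any(flag in command for command in commands for flag in NEVER_DISABLE):
        raise RuntimeError("refusing to roll back a protected flag")
    return commands


if __name__ == "__main__":
    print("\n".join(rollback_commands("saas", date_or_hierarchy_mismatch=True)))
```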
## Timeline
<table>
<tr>
<th>Date</th>
<th>Description</th>
<th>Notes</th>
</tr>
<tr>
<td>Aug 12</td>
<td>Finish pre-checks. Make sure all data is in sync</td>
<td>
:white_check_mark: [Done](https://gitlab.com/gitlab-org/gitlab/-/issues/470685#note_1984919729 "Epic Work Items Rollout Plan")
</td>
</tr>
<tr>
<td>Aug 12</td>
<td>
**Plan-team dogfooding live 17.4**
Enable epic work items on `gitlab-org`
</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>Aug 14</td>
<td>
**Plan-team dogfooding live**
Enable epic work items on `gitlab-com`
</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>Aug 14</td>
<td>Pre-checks. Make sure all data is in sync</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>Aug 14</td>
<td>
**GitLab Team member dogfooding live**
Enable `work_item_epics_rollout` for feature-group `gitlab_team_members`
</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>Sept 4</td>
<td>
Enable for all users of `gitlab-org` and `gitlab-com`
</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>Before</td>
<td>Pre-checks. Make sure all data is in sync on SaaS</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>October 10</td>
<td>
**SaaS go/no-go decision**
</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>October 15</td>
<td>
**Enable on SaaS incrementally 17.6**
</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>Before</td>
<td>Pre-checks. Make sure all data is in sync for SM customers as reported by service ping</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>TBC</td>
<td>
**SM go/no-go decision**
</td>
<td>
:white_check_mark:
</td>
</tr>
<tr>
<td>December 19</td>
<td>
**SM live with 17.7**
</td>
<td>
:white_check_mark:
</td>
</tr>
</table>