Design of snapshots in a transactional database along with versioning of reference data

Disclaimer: I have read everything that I can read on the topic of snapshots and versioning on both stack overflow and on internet. My requirement is not version tracking for audit trail or the database-level snapshots. I have spent more than 1 week to research on my own and to think through the possible options. Sorry, I could have missed some links - if the solution to my problem is already discussed in some other thread, please point me there.

It is a bit long; please bear with me.

Here is the situation: We are trying to create a generic design to store snapshots of the transactional data in our transactional database and also to keep a revision history of reference data.

As part of the business process, a user can press a button to publish certain object. For the purpose of illustration, let us say that the user can publish a proposal from the vendor before negotiation starts. Then, at different points in time through the negotiation process, the user can publish the proposal data. The proposal contains a budget, sales targets and many other items. When a proposal is snapshotted, all the linked entities have to be snapshotted. Finally, after the negotiation, a contract is signed. At this point, a complete snapshot of the contract has to be created. Not all the entities in the contract are there in the proposal – there are lot of overlapping entities, but there are unique entities attached to proposal and contract.

We have to keep both these published versions and the latest active versions available. Published versions are made available on a website to be referenced by both vendors as well the management team. Not all published versions are made available on the website, but the last published proposal and latest published contract are always available in the website. This website also has to be populated from the same database.

Also, a finance user can decide to snapshot only the budget alone and a sales manager can snapshot the sales targets. So, snapshots are available at multiple granularities.

We also have a requirement to track versions of the master data. It is a business requirement to track all the changes to key master data columns over time. For example, we have region information associated with the sales targets. The name of the region can change and we want to track these changes. Let us assume that at the time of proposal, the region’s name is R1 and a snapshot is created. Then, the name of the region changes to R2 and then 2 other snapshots are created. We want to able to link the sales targets to the correct region name at those points in time, not necessarily to the latest region name.

We have some flexibility in modelling as we have both a transaction DB and a data warehouse DB and we can decide to store some of this information either in the transaction DB or in the data warehouse DB.

Here is our design. We have a Publication table which captures basic information about the published data – who published and the date, the reason, and the type of object published (proposal or budget or sales targets).

We store the snapshots in the same table as the original data. So, proposal snapshots would be stored with live proposals in the proposals table. We have a column called Publication ID in every table which has to be published. This column is a FK to the Publication table. If the Publication ID is null, that record is the active version.

I realized that the post is very lengthy. Hence, rather than listing the scenario details, I thought of quickly summarizing the design considerations in a mind map. Snapshot Design Considerations

Теперь есть два решения, к которым мы склоняемся - оба будут хранить снимок всех данных, независимо от того, изменились они или нет. Поддержание только дельты при сохранении неповрежденных структур таблиц потребует очень сложной хранимой процедуры, которая должна выполняться при каждой вставке / обновлении любого из созданных снимков объектов. Я не хочу идти по этому пути, так как это займет больше времени, да и объемы в любом случае не такие уж большие.

Решение 1. Каждый раз, когда публикуется объект (например, предложение или бюджет), мы заполняем дерево XML и сохраняем его в базе данных. На веб-сайте должна быть доступна только последняя версия, а старые версии нужны редко. Учитывая это, столкнусь ли я с большой проблемой производительности из-за использования XML? Мы используем SQL Server. Объемы данных невелики.

Решение 2: Все таблицы транзакций будут иметь идентификатор публикации, а справочные данные будут иметь даты начала и окончания. Каждый раз, когда объект публикуется, мы делаем копию всех записей транзакции и помещаем туда идентификатор публикации, а также копируем все записи справочных данных и помещаем дату моментального снимка в качестве даты окончания. Это позволило бы нам иметь нормальное управление версиями справочных данных вне процесса публикации.

Здесь мне нужны мнения опытных умов относительно недостатков этих двух подходов и того, есть ли другой лучший сценарий.

6
задан DaveInCaz 20 July 2016 в 19:44
поделиться