Modeling discussions: Data deletions


A decade(!) ago I wrote that you should avoid soft deletes. Today I ran into a question on the mailing list and remembered writing about this. It turns out that there was quite a discussion about it at the time.

The context of the discussion at the time was deleting data from relational systems, but the same principles apply here. The question I just fielded asked how you can translate a Delete() operation inside the RavenDB client into a soft delete (IsDeleted = true) operation. The RavenDB client API offers a few ways to customize how we talk to the underlying database, including some pretty interesting hooks deep into the pipeline.

What it doesn’t offer, though, is a way to turn a Delete() operation into an update (or an update into a delete). We do have facilities in place that allow you to detect (and abort) invalid operations. For example, invoices should never be deleted. You can tell the RavenDB client API that it should throw whenever an invoice is about to be deleted, but you have no way of saying that we should take the Delete(invoice) and turn that into a soft delete operation.

This is intentional, by design.

Having a way to transform basic operations (like delete –> update) is a good way to end up pretty confused about what is actually going on in the system. It is better to allow the user to enforce the required behavior (invoices cannot be deleted) and let the calling code handle this differently.
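To make the distinction concrete, here is a minimal sketch in Python of a hook that can veto a delete but cannot rewrite it. The `Session` class, `on_before_delete`, and `forbid_invoice_deletes` are hypothetical names made up for illustration. This is not the RavenDB client API, only the shape of the idea.

```python
class DeleteNotAllowed(Exception):
    """Raised when calling code tries to delete a protected document."""


class Invoice:
    def __init__(self, number):
        self.number = number


class Note:
    def __init__(self, text):
        self.text = text


class Session:
    """Hypothetical unit of work; an assumption, not the RavenDB client."""

    def __init__(self):
        self._before_delete = []  # callbacks invoked before every delete
        self.deleted = []         # documents queued for deletion

    def on_before_delete(self, callback):
        self._before_delete.append(callback)

    def delete(self, document):
        # Hooks may abort the operation by raising. They get no way to
        # turn the delete into some other operation.
        for callback in self._before_delete:
            callback(document)
        self.deleted.append(document)


def forbid_invoice_deletes(document):
    if isinstance(document, Invoice):
        raise DeleteNotAllowed("invoices cannot be deleted")


session = Session()
session.on_before_delete(forbid_invoice_deletes)

session.delete(Note("call back on Monday"))  # allowed, queued for deletion
try:
    session.delete(Invoice("INV-1"))         # aborted by the hook
except DeleteNotAllowed as error:
    print(error)
```

The calling code sees a loud failure at the point where it asked for the wrong thing, instead of a delete that quietly did something else.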

The natural response here, of course, is that this places a burden on the calling code. Surely we want to be able to follow DRY and not write conditionals when the user clicks on the delete button. But this isn’t a case of extra, duplicated code.

An invoice is never deleted, it is cancelled. There are tax implications to that, and you need to get it right.
A payment is never removed, it is refunded.

You absolutely want to block deletions of those types of documents, and you need to treat them (very) differently in code.
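As a rough sketch of what "treating them differently" can look like, the Python below models cancellation and refund as explicit operations with their own rules and their own recorded state. The field names (`status`, `cancelled_at`, `cancellation_reason`, `refunds`) are assumptions chosen for illustration, and real tax or accounting requirements will dictate much more than this.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import List, Optional


@dataclass
class Invoice:
    number: str
    total: Decimal
    status: str = "issued"
    cancelled_at: Optional[datetime] = None
    cancellation_reason: Optional[str] = None

    def cancel(self, reason: str) -> None:
        # Cancelling keeps the invoice and records when and why it happened.
        if self.status == "cancelled":
            raise ValueError(f"invoice {self.number} is already cancelled")
        self.status = "cancelled"
        self.cancelled_at = datetime.now(timezone.utc)
        self.cancellation_reason = reason


@dataclass
class Payment:
    amount: Decimal
    refunds: List[Decimal] = field(default_factory=list)

    @property
    def refundable(self) -> Decimal:
        # What is still left to refund after earlier refunds.
        return self.amount - sum(self.refunds, Decimal("0"))

    def refund(self, amount: Decimal) -> None:
        # A refund is a new fact about the payment, not its removal.
        if amount <= 0:
            raise ValueError("refund amount must be positive")
        if amount > self.refundable:
            raise ValueError("cannot refund more than was paid")
        self.refunds.append(amount)


invoice = Invoice("INV-1", Decimal("100.00"))
invoice.cancel("issued to the wrong customer")

payment = Payment(Decimal("100.00"))
payment.refund(Decimal("40.00"))
print(invoice.status, payment.refundable)
```

None of this logic would fit behind a generic IsDeleted flag: each operation has its own validation and leaves its own audit trail.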

In the ensuing decade since the blog posts at the top of this post were written, there have been a number of changes. Some of them are architecturally minor, such as the database technology of choice or the guiding principles for maintainable software development. Some of them are pretty significant.

One such change is the GDPR.

“Huh?!” I can imagine you thinking. How does the GDPR apply to an architectural discussion of soft deletes vs. business operations? It turns out that it is very relevant. One of the things that the GDPR mandates (and there are similar laws elsewhere, such as the CCPA) is the right to be forgotten. So if you are using soft deletes, you might actually run into real problems down the line. “I asked to be deleted, they told me they did, but they secretly kept my data!” The one thing that I keep hearing about the GDPR is that no one ever found it humorous. Not with the kind of penalties that are attached to it.

So when thinking about deletes in your system, you need to consider quite a few factors:

Does it make sense, from a business perspective, to actually lose that data? Deleting a note from a customer’s record is probably just fine. Removing the record of the customer altogether? Probably not.
Do I need to keep this data? Invoices are one thing that comes to mind.
Do I need to forget this data? That is the opposite requirement, and what you can forget, and how, can be really complex.
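One way to keep these questions from being answered ad hoc is to declare the answer per document type. The Python sketch below does that under assumptions of my own: the three policy names, the document type names, and the wording of each outcome are all hypothetical, and a real system would attach actual handlers instead of strings.

```python
from enum import Enum


class DeletePolicy(Enum):
    HARD_DELETE = "hard_delete"  # losing the data is fine
    RETAIN = "retain"            # must be kept; use a business operation
    FORGET = "forget"            # personal data, erasable on request


# Hypothetical mapping from document type to policy.
POLICIES = {
    "CustomerNote": DeletePolicy.HARD_DELETE,
    "Invoice": DeletePolicy.RETAIN,
    "CustomerProfile": DeletePolicy.FORGET,
}


def plan_delete(doc_type: str) -> str:
    """Describe what a delete request means for this document type."""
    policy = POLICIES.get(doc_type)
    if policy is None:
        # Force a decision for every type instead of guessing a default.
        raise KeyError(f"no delete policy declared for {doc_type!r}")
    if policy is DeletePolicy.HARD_DELETE:
        return "remove the document"
    if policy is DeletePolicy.RETAIN:
        return "reject the delete and call the business operation instead"
    return "erase or anonymize the personal fields"


for doc_type in POLICIES:
    print(doc_type, "->", plan_delete(doc_type))
```

Raising on an undeclared type is a deliberate choice in this sketch: a new document type has to answer the three questions before anything can delete it.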

At any rate, for all but the simplest scenarios, just marking IsDeleted = true is likely not going to be sufficient. And all the other arguments that have been raised (which I’m not going to repeat, read the posts, they are good ones) are still in effect.

Comments

04 Apr 2019
22:55 PM

Stuart Bale

".. enthusing decade .." should perhaps be ".. ensuing decade .." ?

05 Apr 2019
05:36 AM

Christian

One thing about the GDPR right to be forgotten is that it often collides with reality - e.g. imagine a forum. I reply to a thread and other people respond to me - how can I delete that without making the entire thread unreadable? There is an ongoing discussion in such cases that the right is overridden by the majority, but it is an open question how our courts will weigh this.

05 Apr 2019
11:19 AM

Craig

With regard to the forum post in relation to GDPR, you would not delete the forum post at all, but rather you would somehow obscure the author of the forum post. What the "right to be forgotten" actually provides for is the removal of P.I.I. (Personally Identifiable Information). This means the text of your forum post can remain (meaning the entire thread continues to make sense) but your post would be changed to be attributed to "Anonymous" or some other non-personally identifiable name.

The general approach to adhere to GDPR, specifically in more event-driven or event-sourced systems, is not to delete any data at all but to use Crypto Shredding to prevent decryption of previously encrypted data.

06 Apr 2019
05:54 AM

Oren Eini

Stuart, thanks, fixed.

06 Apr 2019
05:55 AM

Oren Eini

Christian, Looking at Reddit, where such things (due to moderation, not GDPR) happen quite a lot, that is workable. But people have found ways to avoid it. For example, you have bots that create a copy of a post to avoid later modifications, deletes, etc.

This gets really interesting (for a legal question, at least): what do you do with someone else's post that quotes you? Or another's post that mentions you by name?

06 Apr 2019
05:57 AM

Oren Eini

Craig, The issue in this case is that some of the data has never been encrypted at all. And what do you do about PII references from other users to the user who wants to be deleted?

06 Apr 2019
08:51 AM

Stuart Bale

And how does it work if I include personal information in the post - such as saying here that my last name is Bale and my first name is Stuart - now if someone quotes this post, how would the site be able to remove my personal details from the text of this message?

06 Apr 2019
09:55 AM

Craig

Oren - Unencrypted data can always be encrypted in retrospect.

Oren / Stuart - Re: PII references and PII inside posts. Those are very good points, and I don't really know the answer to how to deal with users who would, for example, put their own identifiable information inside the text of their forum post, short of ensuring that all "quoted" posts never copy the raw text, only a reference to the original post, and trying hard to remove or obfuscate personal data (an almost impossible task - much like a profanity filter).

That said, what we're getting to here are the fundamental flaws with GDPR and the so-called "right to be forgotten". Even if we did delete the entire forum post, what happens when I've already read the forum post prior to you deciding to have it removed under GDPR? What happens if half the world reads that post before it's deleted? How do you make me (or anyone else who read your post) forget what we've already read?

06 Apr 2019
12:30 PM

Oren Eini

Stuart, No good options here, I'm guessing. Very likely not, and can cause problems down the road. I assume that telling the user that they need to find all instances of their name for the site to remove isn't valid. And given your name, what do you do if you have another Stuart that talks about moving bales of hay? No real good answer here.

06 Apr 2019
12:34 PM

Oren Eini

Craig, As I understand it, the whole point of crypto shredding is to prepare in advance to be able to "forget" stuff. If all your data are encrypted with your key, I can forget everything about you by deleting the key. However, if I haven't done that, you have the problem of finding out what data belongs to whom. After all, deleting the data or encrypting it is pretty much the same thing.

Another factor here is what happens when you have conflicting requirements. For example, imagine that a user comes to you and wants to be forgotten. You remove the data, then you get a subpoena from a court because this user is involved in a lawsuit and these posts are evidence.


Oren Eini

CEO of RavenDB