Negative feature response: Automatic attachment compression in RavenDB

RavenDB - High-Performance NoSQL Document Database

Oct 20 2021

time to read 1 min | 187 words

Following my previous post, which mentioned that you can save significantly on disk space if you store a plain-text attachment using gzip, we got a feature request:

Perhaps in future attachments could have built-in compression as well?

The answer to that is no, but I thought it was worth a post to explain why not.

Let’s consider the typical types of attachments that you’ll store in RavenDB. Based on experience, we usually see:

PDF files
Word / Excel / PowerPoint
Images (JPEG, PNG, GIF, etc.)
Videos
Designs (floor plans, CAD / DWG, etc.)
Text files

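Aside from the text files, pretty much all the data you’ll store as an attachment is already compressed. Modern Office documents (.docx, .xlsx, .pptx) are ZIP archives, PDF files typically compress their content streams, and JPEG, PNG, GIF, and virtually all video formats are compressed by design. In fact, you’ll be hard-pressed today to find a common binary file format that does not already have built-in compression.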

Compressing already compressed data is… suboptimal. It will not usually lead to significant space savings and can actually make the file larger, since the compressor adds its own framing on top of data that has little redundancy left to remove. It also burns CPU cycles unnecessarily.
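To make the point concrete, here is a small Python sketch (not RavenDB code; the names are illustrative) using the standard gzip and os modules. Random bytes stand in for the payload of an already-compressed file. That is an assumption and an approximation: real JPEG or .docx content is close to, but not exactly, incompressible.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size; below 1.0 means gzip saved space."""
    return len(gzip.compress(data)) / len(data)


# Repetitive plain text, standing in for a text attachment.
text = ("RavenDB stores attachments alongside documents. " * 2000).encode("utf-8")

# Random bytes approximate an already-compressed file (JPEG, PNG, ZIP-based .docx):
# there is essentially no redundancy left for gzip to remove.
already_compressed_like = os.urandom(100_000)

if __name__ == "__main__":
    print(f"plain text:   {gzip_ratio(text):.3f}")
    print(f"random bytes: {gzip_ratio(already_compressed_like):.3f}")
```

The text shrinks dramatically, while the random payload comes out slightly larger than it went in, because gzip falls back to storing the data and still adds its header, trailer, and block framing.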

It is better to shift the responsibility to the users in this case, since they have a lot more information about what they actually put into RavenDB and won’t have to guess.

Comments

17 Oct 2021
22:26 PM

Milosz

These are fair points. I guess if somebody insisted, it could be overcome by introducing an EligibleForCompression property to the attachments infrastructure, but I totally understand that attachments are a general-purpose mechanism, special cases are problematic, and in this case it may not be worth it.

If someone really insisted on having collection-wide compression of texts, I guess they could emulate attachments to some degree and store the texts in normal documents with their own entity type, e.g. TextAttachment or, to be more domain-specific, EbookContent.

Regarding user-based compression, I was wondering how one would know whether the text in an attachment is compressed and, if so, how it is compressed. Two ideas came to mind, both making use of the attachment's ContentType property:

1. Use the Media Type's Structured Syntax Name Suffixes. There are 3 compression-related suffixes registered at the moment: +zip, +gzip, +zstd
   (https://www.iana.org/assignments/media-type-structured-suffix)
   Example: text/plain+gzip;charset=utf-8
   Using unregistered suffixes is not recommended "given the possibility of conflicts with future suffix definitions"
   (https://www.rfc-editor.org/rfc/rfc6838.html#section-4.2.8)
2. Use your own Media Type (https://en.wikipedia.org/wiki/Media_type#Registration_trees, https://www.rfc-editor.org/rfc/rfc6838.html#section-3.1)

I could even imagine creating IAttachmentsSessionOperations extension methods called StoreCompressedText and GetCompressedText that would wrap a stream in a compressing (or decompressing) stream and construct (or parse) the ContentType string.
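The suffix idea can be sketched as a pair of plain functions. This is a hypothetical Python illustration, not the RavenDB client API: the function names are made up, the ContentType format follows the commenter's text/plain+gzip;charset=utf-8 example, and only the +gzip suffix is handled. Actually uploading the payload with that ContentType is left to the real client.

```python
import gzip
from typing import Tuple

# Hypothetical helpers sketching the StoreCompressedText/GetCompressedText idea.
# NOT part of the RavenDB client API; only the registered +gzip suffix is handled.
GZIP_SUFFIX = "+gzip"


def compressed_content_type(media_type: str, params: str = "") -> str:
    """Append the +gzip suffix to the subtype, e.g. text/plain -> text/plain+gzip."""
    content_type = media_type + GZIP_SUFFIX
    return f"{content_type};{params}" if params else content_type


def is_gzip_content_type(content_type: str) -> bool:
    """Look for the +gzip suffix on the media type, ignoring any ;parameters."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.endswith(GZIP_SUFFIX)


def store_compressed_text(text: str, media_type: str = "text/plain") -> Tuple[bytes, str]:
    """Return (payload, content_type) to hand to whatever stores the attachment."""
    payload = gzip.compress(text.encode("utf-8"))
    return payload, compressed_content_type(media_type, "charset=utf-8")


def get_compressed_text(payload: bytes, content_type: str) -> str:
    """Decompress when the ContentType carries +gzip; assumes UTF-8 text either way."""
    if is_gzip_content_type(content_type):
        payload = gzip.decompress(payload)
    return payload.decode("utf-8")


if __name__ == "__main__":
    data, ct = store_compressed_text("hello " * 1000)
    print(ct)                                  # text/plain+gzip;charset=utf-8
    print(get_compressed_text(data, ct)[:11])  # hello hello
```

Because the compression is recorded in the ContentType rather than guessed from the bytes, a reader never has to sniff the content, and uncompressed attachments with a plain text/plain type pass through unchanged.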

17 Oct 2021
22:50 PM

Milosz

Sorry, I forgot to clarify that the hypothetical EligibleForCompression property would be set by the user when storing an attachment.

19 Oct 2021
09:26 AM

Oren Eini

Milosz,

Yes, technically speaking you can reuse document compression in RavenDB to compress across texts. Not something that I actually thought of, but it would work. In general, EligibleForCompression is the same as just sending gzip (or zstd, etc.) values; there is no need to get anything inside RavenDB involved.

19 Oct 2021
09:55 AM

Milosz

Sure, it wouldn't be very helpful if it just compressed the attachment locally. What I meant is that with EligibleForCompression it would behave like the smart compression of values in documents that you presented as the very first approach in the previous post (also described here https://ravendb.net/articles/ravendb-5-0-features-smart-document-compression).
But then again, I totally understand that the gains here are probably negligible compared to the cost of implementing and owning the feature.

20 Oct 2021
14:29 PM

Steve

We just added two extension methods to IAttachmentsSessionOperations, StoreGzipped and TryGetGzipped, for the (very) few cases where we do store some larger text files. StoreGzipped tries to gzip the content and keeps the original if the gzipped result is larger, and TryGetGzipped first checks for the gzip binary marker to determine whether the content was gzipped:

bool IsGzipCompressed(byte[] data) => data.Length > 1 && data[0] == 0x1F && data[1] == 0x8B;

This also meant that we didn't have to gzip all the existing documents and could keep the same logic. When you really need an attachment directly from the database, you can just download it, add .gz to the filename, and use WinRAR or any other tool to decompress it.
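The approach in this comment fits in a few lines. The sketch below is Python rather than C#, and the function names are hypothetical stand-ins for StoreGzipped/TryGetGzipped, not their actual implementation: gzip on write, fall back to the original bytes when gzip does not help, and on read check for the 0x1F 0x8B magic bytes before decompressing.

```python
import gzip

# Hypothetical Python counterparts of the StoreGzipped/TryGetGzipped extension
# methods described in the comment; names and signatures are illustrative only.


def is_gzip_compressed(data: bytes) -> bool:
    """Same check as the C# one-liner: gzip streams start with 0x1F 0x8B (RFC 1952)."""
    return len(data) > 1 and data[0] == 0x1F and data[1] == 0x8B


def prepare_for_store(data: bytes) -> bytes:
    """Gzip the content, keeping the original bytes when gzip does not make it smaller."""
    compressed = gzip.compress(data)
    return compressed if len(compressed) < len(data) else data


def read_stored(data: bytes) -> bytes:
    """Decompress only when the stored bytes carry the gzip magic number."""
    return gzip.decompress(data) if is_gzip_compressed(data) else data


if __name__ == "__main__":
    log = ("2021-10-20 INFO request handled\n" * 500).encode("utf-8")
    stored = prepare_for_store(log)
    print(len(log), "->", len(stored))
    print(read_stored(stored) == log)            # True
    print(read_stored(prepare_for_store(b"hi")))  # b'hi'
```

One caveat of detecting compression from the content alone: an upload that already starts with 0x1F 0x8B (for example, a .gz file that gzip could not shrink further and so was kept as-is) would come back decompressed. For UTF-8 text files this cannot happen, because 0x8B is a continuation byte and cannot follow 0x1F in valid UTF-8.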

20 Oct 2021
15:25 PM

Oren Eini

Steve,

Awesome that this is that easy to integrate.

Comments have been closed on this topic.

Oren Eini

CEO of RavenDB
