RavenDB - High-Performance NoSQL Document Database

Aug 20 2009

A design question: What do you do with inaccurate data?


I am intentionally not giving out enough information about this problem; I want to see what your opinion is.

I have an application where a certain action invalidates some of the data that the user is shown. It is quite expensive to recalculate that data, so we can’t just recalculate it right then and there, and in many cases the result will be exactly the same data the user is currently being shown.

The question is, what should we do with this data?

1. Ignore the invalidation and just show the (possibly invalid) data to the user until the application refreshes itself normally.
2. Remove the data altogether. However, the lack of the data is already meaningful in the application.
3. Put some notification that the data is invalid.
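For readers who want something concrete, here is a minimal Python sketch of option 3, assuming the expensive calculation can be wrapped in a callback. It keeps the last computed value, marks it stale on invalidation instead of discarding it, and recalculates only when asked. It also keeps "never calculated" separate from "stale", since the post notes that missing data already has a meaning. `CachedResult` and `compute` are hypothetical names for illustration, not the application's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class CachedResult(Generic[T]):
    """An expensive-to-compute value plus a flag saying it may be out of date (hypothetical)."""

    compute: Callable[[], T]  # the expensive recalculation, supplied by the caller
    value: Optional[T] = None
    stale: bool = False

    def invalidate(self) -> None:
        # Something may have changed the inputs: keep the old value, but mark it.
        self.stale = True

    def refresh(self) -> T:
        # Pay for the recalculation only when explicitly asked to.
        self.value = self.compute()
        self.stale = False
        return self.value

    def render(self) -> str:
        # "Never calculated" is distinct from "stale", because missing data has its own meaning.
        if self.value is None:
            return "(not calculated yet)"
        notice = " [may be out of date, refresh?]" if self.stale else ""
        return f"{self.value}{notice}"
```

In a UI, `render()` would drive whatever stale indicator you pick (grey text, a red border), and `refresh()` would sit behind a button.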

I have my own opinions, but I would like to hear what you think we should do…

36 comments

Tags:

Design

Comments

20 Aug 2009
20:41 PM

Mike G

I would show the user that the data is invalid (like a red border around the box containing the data) and give them an obvious way to recalculate it if they want to.

20 Aug 2009
20:45 PM

Andy Miller

If this is a business application, the issue has probably come up before (though not necessarily with this data) and has possibly been solved in another application. If so, then I would go with something similar to (or at least not the opposite of) the other solutions.

If this is the first time the issue has come up and the audience has no expectations, then I would go with #3: indicate that the data is possibly invalid.

20 Aug 2009
20:49 PM

yonderboi

IMHO, the user should be notified that the data he is looking at might not be valid, especially if it will influence the decision that he is about to make.

Additionally, you can give the user an option to refresh the data, while making it clear that the operation might be time-consuming due to the complexity of the calculations that will take place.

If you audit the decision-making process and the statistics show that the user is happy with having some invalid data, you can remove the "force refresh" option after all.

20 Aug 2009
20:55 PM

Phil

+1 for what Mike and Andy said, indicate that the data is invalid. I would probably be more likely to show a message to the effect that there is no valid data (and just show the message, not the data). Showing possibly bad data to an end user seems like a really bad idea, even with disclaimers.

Although it kind of depends what you mean by the second bullet point. Do you mean the lack of data shown to a user means something to the user, or the lack of data being there (i.e. the data actually gets deleted) means something?

If you mean the former, I would likely take the step of removing the data from the system completely, as having possibly inaccurate data in the database seems like a bad idea to me as well.

20 Aug 2009
21:01 PM

Jan Willem B

It depends on the type of data that is shown. Incorrect data in certain contexts (medical application for example) could lead to incorrect decisions and to serious damage.

20 Aug 2009
21:15 PM

Scott White

I catch it and handle it, but I don't put it in the transactional table, because no data is better than bad data.

20 Aug 2009
21:22 PM

zvolkov

I would show the stale data. The user could have loaded that page/screen a second earlier, and the app would legitimately have presented the same data that you now consider stale. Unless, of course, the same user can be the source of the change; then it would be non-intuitive for him to have changed the data but still see the old version on another screen.

20 Aug 2009
21:23 PM

Shane Courtrille

+1 to Jan

The problem with the "show the invalid data" response is that it ignores human nature. You've stated that most of the time the data will be the same. If you show the invalid data and a way to update it, then for the first week or so people will very likely update the data when they need it. But when the only thing that changes is the color of the box (or whatever indicator you come up with), they're going to decide that the color of the box isn't really that important. Then they're likely to start ignoring the indicator and just use the data (since it's almost always the same anyway, right?).

The question becomes: does this really matter? How real-time do you need to be? If someone can die, then I would absolutely never show incorrect data. I think context is quite important here.

20 Aug 2009
21:32 PM

Mats Helander

#3, without a doubt. Don't deny the user the opportunity to see the old data (#2), but make it very clear that it is invalid (make it gray?) and offer a refresh option.

20 Aug 2009
21:37 PM

Stephen

Isn't this entirely an "it depends" answer?

As with anything, the context should be evaluated. There are times when it is critical, and showing wrong data is unacceptable EVEN WITH a notice that the data may be incorrect and is being rechecked (waiting for the updated data). There are other times when you might as well not say anything, because the information is not critical; even if the 'warning' system comes for free, the additional UI complexity isn't something you want.

I would imagine the majority case would be to show incorrect data with a notice.

20 Aug 2009
21:47 PM

Joshua

As a general rule I would go with displaying the stale data, since it’s the easiest thing and in most situations it won't adversely affect the user experience. That said if there was something particular about the data that made it important to the user decision path I would indicate it was invalid with maybe an option to refresh. The option of not showing the data seems like a bad idea since it will likely confuse the user.

20 Aug 2009
21:49 PM

Fabio Maulo

What kind of "data" are you talking about?

20 Aug 2009
21:53 PM

dave-ilsw

Since the lack of data already has a meaning, then removing the data is probably as bad of an idea as leaving the usually correct, but sometimes wrong data fully visible.

One way to make it clear that there used to be valid data here and that there will eventually be valid data here again is to put a screen over the entire area that holds the invalid data - don't just screen off the data, but screen off the text labels and white space too, leaving only valid data and actionable controls fully visible.

20 Aug 2009
22:00 PM

Roger

Unless the data is read-only, aren't we developers always showing potentially inaccurate data to the user?

Whether the acceptable limit is data that is 1 ns old or 1 year old, I would say it totally depends on the context.

20 Aug 2009
22:15 PM

Frank Quednau

If removing the data has a meaning, wouldn't logic dictate that marking it as invalid also obtains a meaning?

What does invalid mean anyway if you say that the recalculation may lead to just the same result?

Can further processing of the data before recalculation, be it an interpretation or an action, lead to issues regarding a user's confidence in the application, or to her taking the wrong actions or bringing the system into an unwanted state?

Without an answer to those questions I see all three alternatives as perfectly viable.

20 Aug 2009
22:41 PM

Paul

Make it grey. Give it a tooltip that says it's stale... It's probably still gonna be useful.

20 Aug 2009
23:14 PM

Chris Holmes

I think what Oren means by "the lack of the data is already meaningful in the application" is that the absence of the data has a specific meaning.

Take the NHibernate Profiler, for instance. If you want to look at some data from it, perform an action, and expect some data to come to the screen but it does not, then that is meaningful to you as the user.

So in this case, what I would probably want to do is show the data, but then gray it out and pop a progress bar informing the user that we're going to recalculate the data because it is no longer valid.

21 Aug 2009
00:59 AM

Ward Bell

#1 and #2 strike me as almost always wrong.

I'm inclined toward #3 but it may be insufficient.

Whether it is sufficient or not depends upon the consequences if the user relies on the invalid data ... despite the fact that it is clearly invalid.

For example, if the label says "Are there land mines?" and the invalid answer is "No" ... then #3 is not going to be sufficient. It's pretty expensive to look for land mines, so I can see why you wouldn't do it automatically. But #3 is clearly the wrong approach.

If the label says "Credit Score:" ... well it's not as dangerous as a land mine ... but it could be too risky to let stand.

If the label says "Number of Orders Placed This Year:" ... then #3 is good enough.

That's what I mean by "it depends".

BTW: It might be important to explain in what way the data are invalid ... and what steps would be necessary to make them valid.

21 Aug 2009
01:19 AM

Jason Y

#3, and disable taking any action based on the data. That is, gray out the data AND any buttons involving doing something with it. And show something to let the user know that the app is refreshing the data.

21 Aug 2009
03:31 AM

Mike

This does seem like an "it depends" answer, but do you need the kind of control that something like a state pattern would give you? You could show the (possibly bad) data by persisting the event that invalidates it, have the ability to show both versions, and then work with that event/action within the solution you are presenting to the user.

If this would be too heavy then I'd remove the data in the context I work in because bad data has serious ramifications on the decisions the users make outside the application.

21 Aug 2009
05:27 AM

Venu

This would depend on the context and on whether the data would influence the user's decision. For mission-critical data like stock prices or medical information, where the user has to make a decision NOW, it is better to notify them that the data is invalid.

In situations where the user can be led through a workflow and the calculation can be performed again (like an e-commerce shopping cart wizard), you might even get away with a message notifying the user that the action might have invalidated the price and that the final calculation will be performed before the order is placed.

If you audit these changes and see that these numbers are always coming in favor of the user, then you can perform an offline compensating action and email the user that price has been adjusted in his favor.

21 Aug 2009
05:56 AM

Set

If you don't have a performance requirement, just recalculate the data (well, unless it takes 5 minutes to spit it out).

Beyond that, giving the user the ability to update the data at will would be best in most cases.

21 Aug 2009
06:18 AM

Markus Zywitza

Don't model the data as invalid data. After all, the data is not invalid or inaccurate, it is just not up to date.

At the time of calculation, the data is neither invalid nor inaccurate, so I would show exactly this to the user, either as a combination of value and calculation timestamp or, even better, as a short historic list of values and timestamps (the 5 most recent values or the 5 most recent differing values), so that the user can decide for herself whether she wants to recalculate the value.
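Markus's suggestion could be sketched in Python roughly like this, assuming each recalculation produces a single comparable value. It keeps the five most recent differing values, each with the time it was last calculated; a repeated result only moves the timestamp forward. `Snapshot` and `TimestampedHistory` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Deque, List, Optional


@dataclass(frozen=True)
class Snapshot:
    value: float
    calculated_at: datetime


class TimestampedHistory:
    """Keeps the N most recent differing results, each with when it was last calculated."""

    def __init__(self, max_entries: int = 5) -> None:
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self._entries: Deque[Snapshot] = deque(maxlen=max_entries)

    def record(self, value: float, at: Optional[datetime] = None) -> None:
        if at is None:
            at = datetime.now(timezone.utc)
        latest = self.latest()
        if latest is not None and latest.value == value:
            # Same result as before: the value is still current as of this time.
            self._entries[-1] = Snapshot(value, at)
        else:
            self._entries.append(Snapshot(value, at))

    def latest(self) -> Optional[Snapshot]:
        return self._entries[-1] if self._entries else None

    def history(self) -> List[Snapshot]:
        # Oldest first, so the UI can show how the value evolved.
        return list(self._entries)
```

The screen would show `latest()` with its timestamp ("as of ..."), and the user decides from `history()` whether a recalculation is worth it.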

21 Aug 2009
07:05 AM

Dave Mertens

I have the same issue with some credit score dossiers. With an entity extension (much like a Windsor extension or an NHibernate interceptor), I can add additional functionality to my base entity class, which is a combination of Caliburn, CSLA and DependencyObject/DependencyProperty. I've created an extension that invalidates the current screen only if a property that is displayed to the user changes. If an entity is invalidated, a notification is shown at the top of the screen. It's up to the user to invoke the 'Update' procedure (which can take some time) or to continue with the current invalidated data.
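Dave's setup is .NET (Caliburn, CSLA); the core filtering idea can be sketched in plain Python as below, under the assumption that each screen knows which properties it displays. Changes to properties that aren't on screen are ignored, the first relevant change raises a single notification, and the update only runs when the user asks for it. All names here are hypothetical.

```python
from typing import Callable, Iterable, List, Set


class ScreenInvalidationTracker:
    """Raises a notification only when a property the current screen displays has changed."""

    def __init__(self, displayed_properties: Iterable[str]) -> None:
        self.displayed: Set[str] = set(displayed_properties)
        self.pending: Set[str] = set()  # displayed properties changed since the last update
        self.listeners: List[Callable[[Set[str]], None]] = []

    def on_property_changed(self, name: str) -> None:
        if name not in self.displayed:
            return  # the change is invisible on this screen, so ignore it
        first_change = not self.pending
        self.pending.add(name)
        if first_change:
            # Notify once (e.g. show a banner at the top of the screen), not on every change.
            for notify in self.listeners:
                notify(set(self.pending))

    @property
    def is_invalidated(self) -> bool:
        return bool(self.pending)

    def update(self, recalculate: Callable[[], None]) -> None:
        # The user chose to pay for the (possibly slow) update.
        recalculate()
        self.pending.clear()
```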

21 Aug 2009
08:46 AM

firefly

I agree with most. #1 is definitely a no no. #2 and #3 are debatable. Whatever it is the user should be made aware of what is happening.

I like Markus's idea, especially if the old data are of any interest to the user. We can save them to a set and then implement some paging. Each time the user enters some new data that requires calculation, a new set of data is created.

21 Aug 2009
08:50 AM

Dhananjay Goyani

As the 'lack of data' is fine, #3 looks good. But again, this depends on the kind of application/scenario.

21 Aug 2009
08:58 AM

Reshef Mann

I would go for showing the data and notifying the user that it might be invalid. I think it is the http://en.wikipedia.org/wiki/Worse_is_better way to go.

Anyway, I would make sure that he wouldn't miss the message.

21 Aug 2009
09:08 AM

Peter Morris

In the past I have had a separate persistent class for the totals. For example

MyClassTotals

Whenever something changed which would affect those totals I deleted the totals. When the user tried to read the totals I would show them (if present) or calculate them there and then and update the DB for the next query.

If the calculations are VERY expensive, I'd have a nightly job calculating the non-existent totals, and give the user a "Calculate" button in the UI which is enabled only when the totals are not present, so they can pay the cost of viewing the totals if it is important enough to them.
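Peter's approach can be sketched in Python as follows, with an in-memory dictionary standing in for the persisted MyClassTotals table (an assumption; in his description the totals live in the database). Changes delete the cached row, cheap totals are computed on read, and for very expensive totals the UI enables a "Calculate" button only while no totals exist, with a nightly job filling in the gaps. `TotalsStore` and its methods are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional


class TotalsStore:
    """Stand-in for a persisted MyClassTotals table, keyed by owner id (hypothetical)."""

    def __init__(self, calculate: Callable[[str], float]) -> None:
        self._rows: Dict[str, float] = {}
        self._calculate = calculate  # the expensive aggregation

    def on_source_changed(self, owner_id: str) -> None:
        # Anything that affects the totals simply deletes the cached row.
        self._rows.pop(owner_id, None)

    def get(self, owner_id: str) -> Optional[float]:
        # Read path for very expensive totals: show them only if present.
        return self._rows.get(owner_id)

    def can_calculate(self, owner_id: str) -> bool:
        # Drives the "Calculate" button: enabled only when no totals exist.
        return owner_id not in self._rows

    def calculate(self, owner_id: str) -> float:
        # Run the calculation and keep the result for the next query.
        total = self._calculate(owner_id)
        self._rows[owner_id] = total
        return total

    def get_or_calculate(self, owner_id: str) -> float:
        # Read path for cheaper totals: calculate on the spot if missing.
        cached = self._rows.get(owner_id)
        return cached if cached is not None else self.calculate(owner_id)

    def nightly_job(self, owner_ids: Iterable[str]) -> int:
        # Fill in any missing totals off-hours; returns how many were recalculated.
        missing = [o for o in owner_ids if o not in self._rows]
        for owner_id in missing:
            self.calculate(owner_id)
        return len(missing)
```

Deleting rather than flagging keeps the invalidation path cheap; the price is that the screen shows nothing (or a button) until someone pays for the recalculation.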

21 Aug 2009
11:44 AM

Fredy

It really depends.

With the given info I guess I would:

• Put some notification that the data is invalid.

And let the user refresh manually if he likes.

21 Aug 2009
13:30 PM

Sean Gough

Without more information my vote goes to showing a warning that the data is potentially invalid.

It really depends on context and how the data is being used though. If there is a requirement for accurate data at all times and decisions are being made on said data then even #3 might not be sufficient (unless it's very obvious).

On the flip side, if the data is informational only and accuracy is not critical, #3 should suffice. I'd probably still give them a "refresh on demand" option, though.

21 Aug 2009
14:11 PM

Eli Thompson

Show the stale data, with a notice that it's stale, and give the user a way to refresh it.

21 Aug 2009
14:54 PM

James L

Hide the data, have a button to go get the data if they really need to see it.

Stale data is no use to anyone, and it's human nature to assume it's 'good enough' and make some bad decision based on it.

21 Aug 2009
16:12 PM

Michel Grootjans

I would ask the customer ;-)

You can create a whole project thinking about all the possibilities. In the end, the customer has to make the decision, and you can assist him/her and inform him/her of the consequences of the alternatives.

21 Aug 2009
22:24 PM

zihotki

We should never underestimate the 'stupidity' of users. And we should never flood them with warning messages, because this may cause effects like 'Mooom, what does this application want from me? Did I break it?'. And we can't ask the user each time we need to recalculate the data.

The best we can do here is to add command batching. For example, a user clicks the sort button to sort some complex set of data, and the sorting action is added to a batching window (or whatever). Then he clicks some other button to filter the data on some pattern, and we add this action to the batch too. If the user wants to see the result, he can click a 'Run all commands' button. If he wants to add some other manipulation logic, he can add it. And the data will stay valid. We can also show a time estimate for the operations and let the user rest and drink a cup of tea or coffee, and we can even grab the latest quotes from bash.org or wherever to amuse the user.
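The batching idea can be sketched in Python like this, assuming each view operation (sort, filter, and so on) can be expressed as a function from rows to rows. Queued commands don't touch the displayed data, so what the user sees stays consistent until they press "Run all commands". `CommandBatch` is a hypothetical name, and the time-estimation part of the suggestion is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rows = List[dict]
Command = Callable[[Rows], Rows]


@dataclass
class CommandBatch:
    """Collects view operations and applies them in one go when the user asks."""

    data: Rows
    queued: List[Command] = field(default_factory=list)

    def add(self, command: Command) -> None:
        # Queuing is cheap; the displayed data stays exactly as it was.
        self.queued.append(command)

    def run_all(self) -> Rows:
        # Apply every queued command in order, then clear the batch.
        for command in self.queued:
            self.data = command(self.data)
        self.queued.clear()
        return self.data
```

For example, queuing a sort by name and then a filter on quantity leaves `data` untouched until `run_all()` is called, at which point the expensive work runs once.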

This approach isn't perfect, but from my point of view it's preferable to notifications about stale data (a lot of users don't read notifications or product documentation). It also saves the user from waiting for the end of each operation (e.g. 'What? I need to wait for some stupid program? Argh... It makes me crazy, sometimes I'll break this stupid box...').

And it also seems to me that the answer to your question depends very much on the approximate time needed to recalculate the data.

Ayende, do you intend to share your opinions?

22 Aug 2009
07:32 AM

Anne

I agree that this is a "It depends" question:

Option 2 seems like the worst option to me. It also looks a bit like finding an easy way out of a tricky design question, which is why I probably wouldn't do it that way.

In my opinion, option 3 sounds like what I try to go with. This has to do with believing that the user should get as much information as possible from the system, allowing them to take appropriate actions based on the data that is displayed. In this case I'd probably go for a design that shows the stale data, along with a note telling the user that their data is probably out-of-date. Make sure that the field with the stale data is properly visualized (e.g. a red border has been suggested somewhere in the comments), so that it's unlikely that the user will miss that information.

Additional features could be that this note is only displayed if the data is indeed stale (not sure if that can be handled by the system). Otherwise just always display a warning message. Plus, if it's possible to provide the user with the possibility to trigger an update, that would be nice too.

As I said, it all depends on what the data is. You might want to go with the first option, just displaying the stale data, if the chances of that happening are pretty slim, the differences between the displayed data and the actual data are marginal, and the effects on the user aren't that bad.

I think the guidelines for me are always: Don't confuse the user and give them as much information as they need to be able to properly work with the system. However, if a problem might affect less than 1% of the users and the effort to fix it is high, you might want to consider, whether you can live with this edge case happening or not.

22 Aug 2009
19:46 PM

Jimmy Zimms

I'd ask the domain expert what needs to happen. This is a risk that ONLY the experts can answer. EACH business problem is going to be totally different. Sometimes the answer will be simple and sometimes not. THIS IS NOT A TECHNOLOGY DECISION; this is a business decision that will be implemented by our technology!


Comments have been closed on this topic.


Oren Eini

CEO of RavenDB
